Standard Operating Procedure
PoliPrism v2 — Site Details
Architecture, tech stack, database schema, deploy workflow, and operational reference for the PoliPrism civic intelligence platform.
Platform Overview
PoliPrism is a civic intelligence platform built on structured public data (legislators, bills, committees, campaign finance) that supports research, LLM analysis (Prism Factor), and cross-domain intelligence.
The platform has three decoupled layers:
Site Ops
Frontend build + deploy. Never touches the database. Astro static output served by nginx.
Data Processing
Ingest goblins fetch from external APIs → staging → promote to canonical tables → refresh MVs. Never triggers a site build.
Statistics & AI Analytics
Two-suite analytics: deterministic statistics (DuckDB + Polars + scikit-learn + NetworkX — ideology, centrality, pairwise similarity) and LLM-interpretive Prism Factor scoring (vLLM + Qwen3). Both run on the stats/LLM box; results land in stat_* tables. Never triggers build or MV refresh.
Tech Stack
Frontend
Backend
Infrastructure
Database Schema
Canonical Tables
| Table | Domain | Key Identifiers |
|---|---|---|
| entities_legislators | Legislators | id (PK), people_slug (NOT NULL, canonical URL slug), bioguide_id (federal), openstates_id (state) |
| entities_bills | Bills | id (PK), congress + bill_type + bill_number (federal), openstates_bill_id or nyleg_bill_id (state) |
| entities_committees | Committees | id (PK), bioguide_committee_id |
| entities_organizations | Finance/Orgs | id (PK), fec_committee_id |
| entities_districts | Districts | id (PK), district_slug (e.g. ca-12, us-ca, us-all), level (federal_house / state / national) |
| entities_lobbying_registrants | Lobbying firms | id (PK), registrant_slug, lda_registrant_id |
| entities_lobbying_clients | Lobbying clients | id (PK), client_slug, lda_client_id, org_id → entities_organizations |
| entities_lobbyists | Lobbyists | id (PK), lobbyist_slug, lda_lobbyist_id, legislator_id → entities_legislators (revolving door) |
| entities_fec_committees | FEC committee master | id (PK), committee_id (C00…), organization_id → entities_organizations. Identity layer for every PAC. |
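The `district_slug` examples above (ca-12, us-ca, us-all) suggest a naming convention that lines up with the three `level` values. The sketch below is a minimal illustration that assumes this convention holds for every row. It is inferred from those three examples only, and the helper name `level_for_slug` is hypothetical.

```python
import re

# Hypothetical helper: infers entities_districts.level from district_slug.
# The patterns are inferred from the documented examples (ca-12, us-ca,
# us-all) and are an assumption, not the canonical rule.
_HOUSE = re.compile(r"^([a-z]{2})-(\d{1,2})$")  # e.g. ca-12
_STATE = re.compile(r"^us-([a-z]{2})$")          # e.g. us-ca (state rollup)


def level_for_slug(slug: str) -> str:
    """Return 'federal_house', 'state', or 'national' for a district slug."""
    if slug == "us-all":
        return "national"
    if _STATE.match(slug):
        return "state"
    if _HOUSE.match(slug):
        return "federal_house"
    raise ValueError(f"unrecognized district_slug: {slug!r}")
```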
Junction / Transactional Tables
Materialized Views (API Read Model)
API Architecture
Modular FastAPI at api/. Every response is a Pydantic model. List endpoints return PaginatedResponse[T] with items, total, page, per_page, pages.
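The list envelope can be sketched as follows. This version uses a stdlib generic dataclass as a dependency-free stand-in for the Pydantic `PaginatedResponse[T]` model. The field names come from the text above. The `paginate` helper, the 1-indexed `page`, and the `pages = ceil(total / per_page)` rule are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class PaginatedResponse(Generic[T]):
    # Same fields the API documents for every list endpoint.
    items: List[T]
    total: int
    page: int
    per_page: int
    pages: int


def paginate(rows: Sequence[T], page: int, per_page: int) -> PaginatedResponse[T]:
    """Slice an in-memory result set into one page (hypothetical helper)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    total = len(rows)
    # Assumed rule: an empty result set reports 0 pages.
    pages = math.ceil(total / per_page)
    start = (page - 1) * per_page
    return PaginatedResponse(
        items=list(rows[start:start + per_page]),
        total=total,
        page=page,
        per_page=per_page,
        pages=pages,
    )
```

In a database-backed endpoint, the slice would more likely be a `LIMIT`/`OFFSET` query plus a `COUNT(*)`. The in-memory version only illustrates the arithmetic.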
Design Rules
- Canonical slug = people_slug (never bioguide_id in URLs)
- Flat scores: prism_score: float | None (not nested objects)
- State-members merged into /api/legislators?level=state
- State-bills merged into /api/bills?level=state
- Connection pooling via psycopg2 ThreadedConnectionPool
Endpoint Groups
Legislators
/api/legislators
/api/legislators/{slug}
/api/legislators/stats
/api/legislators/facets
Bills
/api/bills
/api/bills/{slug}
/api/bills/stats
/api/bills/facets
Finance
/api/finance/overview
/api/finance/organizations
/api/finance/contributions
/api/finance/independent-expenditures
Committees
/api/committees
/api/committees/{slug}
/api/committees/stats
/api/committees/facets
Districts
/api/districts
/api/districts/{slug}
/api/districts/stats
/api/districts/facets
Lobbying
/api/lobbying/registrants
/api/lobbying/clients
/api/lobbying/lobbyists
/api/lobbying/stats
Search (Meilisearch-backed, federated)
/api/search?q=…&types=legislator,bill,…&page=…&sort=field:asc|desc
Federated across 8 indexes: legislators, bills, committees, districts, organizations, lobbying_registrants, lobbying_clients, lobbyists.
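Before the federated query reaches Meilisearch, the `types` and `sort` parameters need validation. The sketch below shows one way to do it. The singular type names other than `legislator` and `bill` are assumptions, and so is the function name. The `field:asc|desc` form matches Meilisearch's own sort syntax.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from `types` values to the eight Meilisearch indexes.
# Only "legislator" and "bill" appear in the documented query string; the
# other singular names are assumptions.
TYPE_TO_INDEX = {
    "legislator": "legislators",
    "bill": "bills",
    "committee": "committees",
    "district": "districts",
    "organization": "organizations",
    "lobbying_registrant": "lobbying_registrants",
    "lobbying_client": "lobbying_clients",
    "lobbyist": "lobbyists",
}


def parse_search_params(
    types: Optional[str], sort: Optional[str]
) -> Tuple[List[str], Optional[str]]:
    """Validate `types` and `sort`; return (index_uids, meilisearch_sort)."""
    if types:
        names = [t.strip() for t in types.split(",") if t.strip()]
        unknown = [t for t in names if t not in TYPE_TO_INDEX]
        if unknown:
            raise ValueError(f"unknown types: {unknown}")
        # dict.fromkeys de-duplicates while keeping the caller's order.
        indexes = list(dict.fromkeys(TYPE_TO_INDEX[t] for t in names))
    else:
        indexes = list(TYPE_TO_INDEX.values())  # no filter: search all eight

    sort_expr = None
    if sort:
        field, sep, direction = sort.rpartition(":")
        if not sep or not field or direction not in ("asc", "desc"):
            raise ValueError("sort must look like field:asc or field:desc")
        sort_expr = f"{field}:{direction}"
    return indexes, sort_expr
```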
Deploy Workflow
Frontend Deploy (2-3 seconds)
- git push (from dev machine to NAS bare repo)
- VM: git pull (from NAS mount at /mnt/nas/PoliPrism.git)
- VM: npm ci && astro build (~60,000 static pages across all entity types)
- VM: rsync dist/ → /usr/share/nginx/html/poliprism/
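The VM-side steps above can be scripted as a fail-fast sequence. This is a minimal sketch with several assumptions. The working-copy path is hypothetical. The `origin` remote is assumed to already point at /mnt/nas/PoliPrism.git. `npx astro build` stands in for `astro build`. `rsync -a --delete` is an assumed choice so that stale pages are removed.

```python
import subprocess
from typing import List, Sequence, Tuple

WEB_ROOT = "/usr/share/nginx/html/poliprism/"


def deploy_steps(workdir: str) -> List[Tuple[Sequence[str], str]]:
    """Ordered (argv, cwd) pairs for the VM-side frontend deploy."""
    return [
        # Assumes origin already points at the NAS bare repo mount.
        (["git", "pull", "--ff-only"], workdir),
        (["npm", "ci"], workdir),
        (["npx", "astro", "build"], workdir),
        # Trailing slash on dist/ copies its contents, not the directory.
        (["rsync", "-a", "--delete", "dist/", WEB_ROOT], workdir),
    ]


def deploy(workdir: str, dry_run: bool = True) -> List[str]:
    log = []
    for argv, cwd in deploy_steps(workdir):
        log.append(" ".join(argv))
        if not dry_run:
            # check=True stops at the first failing step, so a broken
            # build never reaches the rsync into the nginx web root.
            subprocess.run(argv, cwd=cwd, check=True)
    return log
```

Usage: `deploy("/srv/poliprism", dry_run=True)` prints nothing and returns the planned commands. The path is hypothetical.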
Data Pipeline (independent)
- Goblins ingest from external APIs → staging tables
- Promote scripts move staging → canonical tables
- MV refresh runs ONCE after processing
- API reads from MVs — site sees new data on next request
No site rebuild needed when data changes.
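The ordering rule above (ingest, then promote, then a single MV refresh) fits in a small orchestrator. This is a minimal sketch. The goblin and promote callables, the `execute` hook (for example, a cursor's `execute`), and the MV names are hypothetical placeholders. In PostgreSQL, `REFRESH MATERIALIZED VIEW CONCURRENTLY` keeps the view readable during the refresh, but it requires a unique index on each view.

```python
from typing import Callable, Iterable

# Hypothetical MV names; substitute the real API read-model views.
READ_MODEL_MVS = ("mv_legislators", "mv_bills", "mv_committees")


def run_pipeline(
    goblins: Iterable[Callable[[], None]],
    promotes: Iterable[Callable[[], None]],
    execute: Callable[[str], None],
    mvs: Iterable[str] = READ_MODEL_MVS,
) -> None:
    # 1. Ingest: each goblin writes only to its staging tables.
    for goblin in goblins:
        goblin()
    # 2. Promote staging rows into canonical tables.
    for promote in promotes:
        promote()
    # 3. Refresh each read-model MV exactly once, after every promote.
    for mv in mvs:
        execute(f"REFRESH MATERIALIZED VIEW CONCURRENTLY {mv}")
    # No site build is triggered here (Data Processing layer rule).
```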
AI Scoring (independent)
- LLM models score bills (Prism Factor: policy alignment, category, summary)
- Legislator scores computed from bill positions + finance + effectiveness
- Scores written to entities_legislators and entities_bills
- MV refresh makes scores visible to API
Prism Factor
Two-suite analytics architecture — deterministic statistics that never ask an LLM to count, plus a versioned LLM layer for interpretive work only.
Deterministic Statistics Suite
Computed on the stats box (DuckDB attached to VM Postgres over LAN; Polars + scikit-learn + NetworkX). Results land in stat_* tables.
Ideology (W-NOMINATE-style)
PCA on pairwise vote similarity across bill_votes.
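The sketch below is a dependency-free illustration of PCA on pairwise similarity. It builds a legislator-by-legislator agreement matrix, double-centers it, and takes the leading eigenvector via power iteration, which is a kernel-PCA / classical-MDS reading of the step. It is not W-NOMINATE itself. It assumes votes are coded +1 (yea) / -1 (nay) with absences omitted. The function names and the neutral 0.5 for pairs with no shared rolls are also assumptions. Production code would more likely use NumPy or scikit-learn.

```python
import math
import random
from typing import Dict

Votes = Dict[str, Dict[str, int]]  # legislator -> {roll_id: +1 yea / -1 nay}


def agreement(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Share of commonly voted rolls on which a and b cast the same vote."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.5  # assumption: no overlap is treated as neutral
    return sum(a[r] == b[r] for r in shared) / len(shared)


def ideology_scores(votes: Votes, iters: int = 500, seed: int = 0) -> Dict[str, float]:
    """First principal coordinate of the double-centered agreement matrix."""
    ids = sorted(votes)
    n = len(ids)
    k = [[agreement(votes[i], votes[j]) for j in ids] for i in ids]

    # Double-center: K~ = K - row mean - col mean + grand mean (K is symmetric).
    row = [sum(r) / n for r in k]
    grand = sum(row) / n
    kc = [[k[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

    # Shift the spectrum non-negative (Gershgorin bound) so power iteration
    # converges to the largest algebraic eigenvalue, not the largest magnitude.
    shift = max(sum(abs(x) for x in r) for r in kc)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(kc[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]

    # Rayleigh quotient on the unshifted matrix recovers the eigenvalue.
    lam = sum(v[i] * sum(kc[i][j] * v[j] for j in range(n)) for i in range(n))
    scale = math.sqrt(max(lam, 0.0))
    # The sign is arbitrary; orient it against a reference legislator.
    return {ids[i]: scale * v[i] for i in range(n)}
```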
Leadership Centrality
Graph centrality on the 139K-edge cosponsorship network.
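The text does not name the centrality measure, so the sketch below picks one as an illustration. It computes normalized in-degree on a directed cosponsor → sponsor graph, meaning distinct cosponsors attracted divided by (n - 1). That is the same normalization `networkx.in_degree_centrality` uses. The edge direction, the de-duplication, and the 0.0 result for graphs with fewer than two nodes are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def in_degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Distinct cosponsors per sponsor, normalized by (n - 1)."""
    nodes: Set[str] = set()
    supporters: Dict[str, Set[str]] = defaultdict(set)
    for cosponsor, sponsor in edges:
        nodes.update((cosponsor, sponsor))
        if cosponsor != sponsor:  # ignore self-loops
            supporters[sponsor].add(cosponsor)
    n = len(nodes)
    if n < 2:
        return {v: 0.0 for v in nodes}
    return {v: len(supporters[v]) / (n - 1) for v in nodes}
```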
Bipartisanship
Rate of cross-party cosponsorships — already computed (Layer 1).
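The bipartisanship rate can be sketched from the cosponsor's side. For each member, it is the share of that member's cosponsorships on bills whose sponsor belongs to a different party. Measuring from the cosponsor (rather than the sponsor) is an assumption, and independents count as "different" from both parties.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bipartisan_rate(
    cosponsorships: Iterable[Tuple[str, str]],  # (cosponsor_id, sponsor_id)
    party: Dict[str, str],                      # legislator_id -> party code
) -> Dict[str, float]:
    total: Counter = Counter()
    cross: Counter = Counter()
    for cosponsor, sponsor in cosponsorships:
        total[cosponsor] += 1
        if party[cosponsor] != party[sponsor]:
            cross[cosponsor] += 1
    return {m: cross[m] / total[m] for m in total}
```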
Attendance / Voting
Party unity %, yea/nay/absent counts.
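Party unity and the vote counts can be derived from per-roll positions. In this sketch, a member "votes with the party" when their yea/nay matches the majority of their party's yea/nay votes on that roll. Tied rolls are skipped. This simplified definition is an assumption; it does not restrict to the narrower set of "party unity votes" where the two party majorities oppose each other.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def voting_summary(
    records: Iterable[Tuple[str, str, str]],  # (roll_id, legislator_id, yea|nay|absent)
    party: Dict[str, str],
) -> Dict[str, dict]:
    records = list(records)
    # Tally yea/nay per (roll, party) to find each party's majority position.
    tally: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for roll, member, pos in records:
        if pos in ("yea", "nay"):
            tally[(roll, party[member])][pos] += 1

    counts: Dict[str, Counter] = defaultdict(Counter)
    with_party: Counter = Counter()
    eligible: Counter = Counter()
    for roll, member, pos in records:
        counts[member][pos] += 1
        if pos not in ("yea", "nay"):
            continue
        t = tally[(roll, party[member])]
        if t["yea"] == t["nay"]:
            continue  # tie: party has no majority position on this roll
        majority = "yea" if t["yea"] > t["nay"] else "nay"
        eligible[member] += 1
        with_party[member] += pos == majority
    return {
        m: {
            "yea": c["yea"],
            "nay": c["nay"],
            "absent": c["absent"],
            "party_unity_pct": 100.0 * with_party[m] / eligible[m] if eligible[m] else None,
        }
        for m, c in counts.items()
    }
```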
LLM-Interpretive Suite (Prism Factor)
vLLM serving Qwen3 (14B-FP8 → 32B / 30B-A3B MoE once the hardware is online). Cached by (content_hash, prompt_version, model_id). Never counts and never does graph math; interpretive only.
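The sketch below shows how the (content_hash, prompt_version, model_id) cache key might work. Changing the prompt or the model produces a new key, so earlier results are never reused under a different prompt or model. It assumes SHA-256 over whitespace-stripped UTF-8 text. The in-memory `PrismCache` class and the `score` callable (standing in for a vLLM request) are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]  # (content_hash, prompt_version, model_id)


def content_hash(text: str) -> str:
    # Assumed normalization: strip surrounding whitespace, then SHA-256.
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


class PrismCache:
    """Hypothetical in-memory cache; production would persist to a table."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, dict] = {}
        self.misses = 0

    def get_or_score(
        self, text: str, prompt_version: str, model_id: str,
        score: Callable[[str], dict],
    ) -> dict:
        key = (content_hash(text), prompt_version, model_id)
        if key not in self._store:
            self.misses += 1
            self._store[key] = score(text)  # only call the LLM on a miss
        return self._store[key]
```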
Prism Score (0–100)
Political-alignment rating from bill-text + voting-pattern analysis.
Effectiveness
Bills passed vs introduced (deterministic component + LLM context).
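The deterministic component of Effectiveness reduces to a ratio. In this sketch, it returns `None` when nothing was introduced, which fits the flat `float | None` score convention. What counts as "passed" (one chamber vs. enacted) is left to the caller, and the function name is hypothetical.

```python
from typing import Optional


def effectiveness_ratio(passed: int, introduced: int) -> Optional[float]:
    """Share of introduced bills that passed; None if none were introduced."""
    if passed < 0 or introduced < 0 or passed > introduced:
        raise ValueError("expected 0 <= passed <= introduced")
    if introduced == 0:
        return None
    return passed / introduced
```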
Donor Alignment
Policy vs donor-industry gap. Deterministic rollup + LLM industry tagging.
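The sketch below covers only the deterministic rollup half of Donor Alignment. It groups a legislator's contributions by the LLM-assigned industry tag and returns each industry's share, largest first. The text does not define how the policy-vs-donor gap is then computed, so that step is omitted. The input shapes and the "untagged" bucket are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def industry_shares(
    contributions: Iterable[Tuple[str, float]],  # (org_id, amount)
    industry_of: Dict[str, str],                 # org_id -> industry tag
) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    for org_id, amount in contributions:
        totals[industry_of.get(org_id, "untagged")] += amount
    grand = sum(totals.values())
    if grand == 0:
        return {}
    # Largest share first; dicts preserve insertion order.
    return {ind: amt / grand for ind, amt in sorted(totals.items(), key=lambda kv: -kv[1])}
```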
Policy Depth / Integrity
LLM-read analysis of bill substance and legislator consistency.
Experimental. Scores are interpretive, not legal or political advice.
Data Domains
| Domain | Status | Description |
|---|---|---|
| Legislators | Active | 8,009 federal + state officeholders; 538 federal, ~7,470 state |
| Bills | Active | 5,531 federal (target ~15K) + 39,931 state (NY + AL state-primary + generic OpenStates). Sponsors, committees, actions. |
| Bill Votes | Active | 10,659 roll-call records from 1,207 rolls (House Clerk + Senate XML), 88% resolved to legislators |
| Campaign Finance | Active | 101,516 contributions ($1.1B) across 6,217 PACs |
| Independent Expenditures | Ingesting | FEC Schedule E (Super PACs + 501(c)(4) groups). 2024 + 2026 cycles loading; 2020 + 2022 backfill queued. |
| FEC Committee Master | Ingesting | Identity layer for every PAC (~88K committees). Links contributions to committee_type / designation / party / connected org. |
| Committees | Active | 229 federal committees, 3,891 memberships |
| Districts & Demographics | Active | 491 districts (439 congressional + 51 state rollups + national) with Census ACS demographics — population, income, education, race/ethnicity |
| Lobbying (LDA) | Active | Senate Lobbying Disclosure filings + LD-203 contributions. Registrants, clients, lobbyists (incl. revolving door). |
| USAspending | Active | Federal contracts + grants by district. 1,314 legislator-year summaries. |
| Elections & Candidates | Planned | FEC Form 2 candidate filings, race derivation, /elections/2026 product surface |
| State Votes | Planned | OpenStates roll calls — complex to normalize across 50 states |
Compute Topology
Hardware boundary enforces the layer separation — heavy compute stays off the VM so Postgres + nginx + the API stay responsive.
Lab VM (live)
CentOS Stream 9 · 192.168.50.152 (LAN) / 209.71.19.183 (public)
Hosts: PostgreSQL · nginx (site + /api/ reverse proxy) · FastAPI v2 API (systemd, :8077) · Meilisearch (Podman Quadlet, :7700) · ingest goblins (v1 + v2) · orchestrator cron
Stats / LLM Box (planned)
Ubuntu 24.04 LTS · Intel Core Ultra 9 285K (24-core) · 64GB RAM · RTX 5090 (32GB VRAM) · static LAN IP
Will host: vLLM serving Qwen3 · DuckDB attached to VM Postgres over LAN · Polars + scikit-learn + NetworkX stats venv · Prism Factor LLM inference · writes stat_* tables back to VM.
Documentation
Standard Operating Procedures live in the repo under poliprism/docs/ and are rendered on-site at build time.
- Pipeline Architecture SOP
Staging-first universal architecture: canonical tables, promote pattern, orchestrator template, full ingestion inventory (every goblin with counts + file paths), and finance-track integration notes.
- State-Primary Ingest SOP (poliprism/docs/STATE-PRIMARY-INGEST-SOP.md)
Per-vendor discovery, API stewardship tiers, feasibility template. Reference implementations: NY (Open Legislation) and AL (OpenStates v3 via dedicated processor).
- Three Decoupled Layers (docs/poliprism-sop/Three_Decoupled_Layers.txt)
Layer rules: Site Ops never touches the DB; Data Processing never triggers a site build; Statistics & AI Analytics never triggers build or MV refresh. Hardware boundary reinforces the split.
PoliPrism v2 · Civic Intelligence Platform · Not affiliated with any government entity