Standard Operating Procedure

PoliPrism v2 — Site Details

Architecture, tech stack, database schema, deploy workflow, and operational reference for the PoliPrism civic intelligence platform.

Platform Overview

PoliPrism is a civic intelligence platform that structures public data (legislators, bills, committees, campaign finance) to support research, LLM analysis (Prism Factor), and cross-domain intelligence.

The platform has three decoupled layers:

Site Ops

Frontend build + deploy. Never touches the database. Astro static output served by nginx.

Data Processing

Ingest goblins fetch from external APIs → staging → promote to canonical tables → refresh MVs. Never triggers a site build.

Statistics & AI Analytics

Two-suite analytics: deterministic statistics (DuckDB + Polars + scikit-learn + NetworkX — ideology, centrality, pairwise similarity) and LLM-interpretive Prism Factor scoring (vLLM + Qwen3). Both run on the stats/LLM box; results land in stat_* tables. Never triggers build or MV refresh.

Tech Stack

Frontend

Framework: Astro 6 (static output)
Interactive Islands: Svelte 5 (runes)
Styling: Tailwind CSS v4
Components: shadcn-svelte (Bits UI)
Data Fetching: TanStack Query (Svelte)
Data Tables: TanStack Table (Svelte)
Charts: LayerChart (D3)
Validation: Zod
Search: Fuse.js
Icons: Lucide

Backend

API Framework: FastAPI (Python)
Response Models: Pydantic v2
Database: PostgreSQL
Read Model: Materialized Views
DB Driver: psycopg2 (connection pool)
Search Backend: Meilisearch (Podman Quadlet)
Vector Search: pgvector (bill text embeddings, planned)

Infrastructure

Web Server: nginx
Process Mgmt: systemd
TLS: Let's Encrypt / Certbot
DNS: BIND (split-horizon)
Build Runtime: Node.js 22
API Runtime: Python 3.9
OS: CentOS Stream 9

Database Schema

Canonical Tables

Table | Domain | Key Identifiers
entities_legislators | Legislators | id (PK), people_slug (NOT NULL, canonical URL slug), bioguide_id (federal), openstates_id (state)
entities_bills | Bills | id (PK), congress + bill_type + bill_number (federal), openstates_bill_id or nyleg_bill_id (state)
entities_committees | Committees | id (PK), bioguide_committee_id
entities_organizations | Finance/Orgs | id (PK), fec_committee_id
entities_districts | Districts | id (PK), district_slug (e.g. ca-12, us-ca, us-all), level (federal_house / state / national)
entities_lobbying_registrants | Lobbying firms | id (PK), registrant_slug, lda_registrant_id
entities_lobbying_clients | Lobbying clients | id (PK), client_slug, lda_client_id, org_id → entities_organizations
entities_lobbyists | Lobbyists | id (PK), lobbyist_slug, lda_lobbyist_id, legislator_id → entities_legislators (revolving door)
entities_fec_committees | FEC committee master | id (PK), committee_id (C00…), organization_id → entities_organizations. Identity layer for every PAC.

Junction / Transactional Tables

bill_sponsorships, bill_votes, committee_memberships, campaign_contributions, independent_expenditures, usaspending_district_summary, district_demographics, lobbying_filings, lobbying_contributions

Materialized Views (API Read Model)

mv_legislator_profiles — denormalized legislator + scores + finance aggregations + office info
mv_bills_list — bills with sponsor info joined from entities_legislators
mv_bill_detail — extended bill view with committees and vote counts
mv_finance_ideology_gap — donor-ideology alignment analysis per legislator per industry
mv_district_profile — latest demographics per district
mv_legislator_district_context — legislators joined to district demographics
mv_lobbying_registrant_summary / mv_lobbying_client_summary / mv_lobbying_issue_summary — lobbying rollups
mv_fec_committee_summary — FEC committee × total contributions × recipient count

API Architecture

The API is a modular FastAPI application under api/. Every response is a Pydantic model; list endpoints return PaginatedResponse[T] with the fields items, total, page, per_page, and pages.

Design Rules

  • Canonical slug = people_slug (never bioguide_id in URLs)
  • Flat scores: prism_score: float | None (not nested objects)
  • State members merged into /api/legislators?level=state
  • State bills merged into /api/bills?level=state
  • Connection pooling via psycopg2 ThreadedConnectionPool
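A minimal sketch of the list-endpoint envelope, assuming Pydantic v2. LegislatorSummary, its name field, and the paginate/page_offset helpers are hypothetical; only people_slug, prism_score, and the five envelope fields come from this SOP. Because the API runtime is Python 3.9, the sketch spells the flat score as Optional[float]: the float | None form can only be evaluated natively at runtime on Python 3.10+.

```python
import math
from typing import Generic, List, Optional, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class PaginatedResponse(BaseModel, Generic[T]):
    """Envelope returned by every list endpoint."""

    items: List[T]
    total: int      # matching rows across all pages
    page: int       # 1-based page number
    per_page: int
    pages: int      # total page count


class LegislatorSummary(BaseModel):
    """Hypothetical list-row model illustrating the flat-score rule."""

    people_slug: str                     # canonical slug, never bioguide_id
    name: str                            # hypothetical display field
    prism_score: Optional[float] = None  # flat score, not a nested object


def page_offset(page: int, per_page: int) -> int:
    """SQL OFFSET for a 1-based page number (hypothetical helper)."""
    return (max(page, 1) - 1) * per_page


def paginate(rows: list, total: int, page: int, per_page: int) -> dict:
    """Envelope fields for one page; `rows` is already sliced (hypothetical helper)."""
    pages = math.ceil(total / per_page) if per_page > 0 else 0
    return {"items": rows, "total": total, "page": page,
            "per_page": per_page, "pages": pages}


rows = [LegislatorSummary(people_slug="jane-doe", name="Jane Doe", prism_score=61.5)]
resp = PaginatedResponse[LegislatorSummary](**paginate(rows, total=41, page=2, per_page=20))
print(resp.pages)  # 3
```

When pages is derived from total and per_page in a single helper, every endpoint's envelope stays consistent.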

Endpoint Groups

Legislators
  • /api/legislators
  • /api/legislators/{slug}
  • /api/legislators/stats
  • /api/legislators/facets

Bills
  • /api/bills
  • /api/bills/{slug}
  • /api/bills/stats
  • /api/bills/facets

Finance
  • /api/finance/overview
  • /api/finance/organizations
  • /api/finance/contributions
  • /api/finance/independent-expenditures

Committees
  • /api/committees
  • /api/committees/{slug}
  • /api/committees/stats
  • /api/committees/facets

Districts
  • /api/districts
  • /api/districts/{slug}
  • /api/districts/stats
  • /api/districts/facets

Lobbying
  • /api/lobbying/registrants
  • /api/lobbying/clients
  • /api/lobbying/lobbyists
  • /api/lobbying/stats

Search (Meilisearch-backed, federated)

/api/search?q=…&types=legislator,bill,…&page=…&sort=field:asc|desc

Federated across 8 indexes: legislators, bills, committees, districts, organizations, lobbying_registrants, lobbying_clients, lobbyists.
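This is a sketch of how /api/search parameters might translate into a single federated Meilisearch request (POST /multi-search with a federation object, available in recent Meilisearch releases). The TYPE_TO_INDEX mapping and build_federated_search are hypothetical. The documented query string elides most type names, so only legislator and bill are taken from it. Sorting is left out because its interaction with federation should be checked against the deployed Meilisearch version.

```python
from typing import Dict, List

# Hypothetical mapping from /api/search `types` values to the 8 index uids.
TYPE_TO_INDEX: Dict[str, str] = {
    "legislator": "legislators",
    "bill": "bills",
    "committee": "committees",
    "district": "districts",
    "organization": "organizations",
    "lobbying_registrant": "lobbying_registrants",
    "lobbying_client": "lobbying_clients",
    "lobbyist": "lobbyists",
}


def build_federated_search(q: str, types: List[str], page: int = 1,
                           per_page: int = 20) -> dict:
    """Body for POST /multi-search in federated mode (hypothetical helper).

    With `federation` set, pagination goes on the federation object
    (offset/limit) rather than on the individual queries.
    """
    unknown = [t for t in types if t not in TYPE_TO_INDEX]
    if unknown:
        raise ValueError(f"unknown search types: {unknown}")
    selected = types or list(TYPE_TO_INDEX)  # no types given: search all indexes
    return {
        "federation": {"offset": (max(page, 1) - 1) * per_page, "limit": per_page},
        "queries": [{"indexUid": TYPE_TO_INDEX[t], "q": q} for t in selected],
    }
```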

Deploy Workflow

Frontend Deploy (2-3 seconds)

  1. git push (from dev machine to NAS bare repo)
  2. VM: git pull (from NAS mount at /mnt/nas/PoliPrism.git)
  3. VM: npm ci && astro build (~60,000 static pages across all entity types)
  4. VM: rsync dist/ → /usr/share/nginx/html/poliprism/
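Steps 2 to 4 could be scripted on the VM roughly as follows. This is a sketch with several assumptions. REPO_DIR is a hypothetical checkout path, and the repo's git remote is assumed to point at /mnt/nas/PoliPrism.git. npx astro build stands in for however the project actually invokes the Astro CLI. rsync --delete, which prunes pages that no longer exist, is an assumed flag and not part of the documented flag set.

```python
import subprocess
from typing import List

REPO_DIR = "/opt/poliprism"                    # hypothetical checkout path on the VM
WEB_ROOT = "/usr/share/nginx/html/poliprism/"  # nginx docroot from this SOP


def deploy_commands(web_root: str = WEB_ROOT) -> List[List[str]]:
    """Steps 2-4 as argv lists, run from the repo root without a shell."""
    return [
        ["git", "pull"],                                # remote assumed to be the NAS bare repo
        ["npm", "ci"],                                  # clean install from package-lock.json
        ["npx", "astro", "build"],                      # writes static pages to dist/
        ["rsync", "-a", "--delete", "dist/", web_root], # trailing slash copies dist/ contents; --delete assumed
    ]


def deploy(repo_dir: str = REPO_DIR, dry_run: bool = False) -> None:
    for argv in deploy_commands():
        print("+", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=repo_dir, check=True)  # abort on the first failing step
```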

Data Pipeline (independent)

  1. Goblins ingest from external APIs → staging tables
  2. Promote scripts move staging → canonical tables
  3. MV refresh runs ONCE after processing
  4. API reads from MVs — site sees new data on next request
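The promote-then-refresh ordering in steps 2 and 3 can be sketched with psycopg2. The staging table names, column lists, and conflict keys here are hypothetical (the Pipeline Architecture SOP holds the real inventory), and the ON CONFLICT target assumes a unique constraint on that column. REFRESH MATERIALIZED VIEW CONCURRENTLY keeps the API readable during the refresh, but it requires a unique index on each view.

```python
from typing import List, Sequence, Tuple

# Hypothetical promotion spec: (staging table, canonical table, conflict key, columns).
PROMOTIONS: List[Tuple[str, str, str, List[str]]] = [
    ("staging_legislators", "entities_legislators", "people_slug",
     ["people_slug", "bioguide_id", "openstates_id"]),
]
MATERIALIZED_VIEWS = ["mv_legislator_profiles", "mv_bills_list", "mv_bill_detail"]


def promote_sql(staging: str, canonical: str, key: str, columns: List[str]) -> str:
    """Upsert staging rows into a canonical table (identifiers are trusted constants)."""
    cols = ", ".join(columns)
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    return (f"INSERT INTO {canonical} ({cols}) SELECT {cols} FROM {staging} "
            f"ON CONFLICT ({key}) DO UPDATE SET {updates}")


def build_plan(promotions: Sequence[tuple],
               views: Sequence[str]) -> Tuple[List[str], List[str]]:
    """All promotes first; each MV refreshed exactly once afterwards."""
    promotes = [promote_sql(*p) for p in promotions]
    refreshes = [f"REFRESH MATERIALIZED VIEW CONCURRENTLY {mv}"
                 for mv in dict.fromkeys(views)]  # dedupe, keep order
    return promotes, refreshes


def run_plan(dsn: str, promotes: List[str], refreshes: List[str]) -> None:
    import psycopg2  # imported lazily so plan building works without the driver

    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur:  # transaction 1: promotes commit together
            for sql in promotes:
                cur.execute(sql)
        with conn, conn.cursor() as cur:  # transaction 2: the single MV refresh pass
            for sql in refreshes:
                cur.execute(sql)
    finally:
        conn.close()
```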

No site rebuild needed when data changes.

AI Scoring (independent)

  1. The LLM scores bills (Prism Factor: policy alignment, category, summary)
  2. Legislator scores are computed from bill positions + finance + effectiveness
  3. Scores are written to entities_legislators and entities_bills
  4. The next MV refresh makes the scores visible to the API

Prism Factor

Two-suite analytics architecture — deterministic statistics that never ask an LLM to count, plus a versioned LLM layer for interpretive work only.

Deterministic Statistics Suite

Computed on the stats box (DuckDB attached to VM Postgres over LAN; Polars + scikit-learn + NetworkX). Results land in stat_* tables.
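Here is a minimal sketch of the DuckDB side of that setup. It uses DuckDB's postgres extension to attach the VM database read-only. The database name, role, and helper names are assumptions; the LAN address is the VM's address from Compute Topology. .pl() hands the result to Polars.

```python
def attach_sql(host: str, dbname: str, user: str, alias: str = "pg") -> str:
    """ATTACH statement for DuckDB's postgres extension, read-only (hypothetical helper)."""
    dsn = f"host={host} dbname={dbname} user={user}"
    return f"ATTACH '{dsn}' AS {alias} (TYPE postgres, READ_ONLY)"


def load_bill_votes(host: str = "192.168.50.152",
                    dbname: str = "poliprism",  # hypothetical database name
                    user: str = "stats"):       # hypothetical read-only role
    import duckdb  # lazy import: only needed when actually connecting

    con = duckdb.connect()
    con.execute("INSTALL postgres")
    con.execute("LOAD postgres")
    con.execute(attach_sql(host, dbname, user))
    return con.execute("SELECT * FROM pg.public.bill_votes").pl()  # Polars DataFrame
```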

Ideology (W-NOMINATE-style)

PCA on pairwise vote similarity across bill_votes.
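A sketch of the ideology step, under stated assumptions: bill_votes is pivoted into a legislators × rolls matrix coded +1 yea, -1 nay, 0 absent (a hypothetical encoding), and agreement rate serves as the similarity measure. PCA is computed with a NumPy SVD, which matches scikit-learn's PCA on the same centered matrix up to sign. This is a PCA approximation, not the W-NOMINATE spatial model itself. Because the sign of PC1 is arbitrary, the scores are oriented against an anchor legislator.

```python
import numpy as np


def agreement_matrix(votes: np.ndarray) -> np.ndarray:
    """Pairwise agreement rate between legislators.

    votes: (n_legislators, n_rolls), +1 = yea, -1 = nay, 0 = absent.
    Entry [i, j] is the share of rolls both voted on where they voted alike.
    """
    present = (votes != 0).astype(float)
    both = present @ present.T                            # rolls where i and j both voted
    signed = votes.astype(float) @ votes.T.astype(float)  # agreements minus disagreements
    agree = (signed + both) / 2.0
    with np.errstate(invalid="ignore", divide="ignore"):
        sim = np.where(both > 0, agree / both, np.nan)
    # Pairs with no shared rolls get the overall mean (a simple imputation choice).
    return np.where(np.isnan(sim), np.nanmean(sim), sim)


def first_dimension(sim: np.ndarray, anchor: int = 0) -> np.ndarray:
    """Project each legislator's similarity profile onto the first principal component."""
    centered = sim - sim.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]
    return -scores if scores[anchor] < 0 else scores  # fix the arbitrary sign
```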

Leadership Centrality

Graph centrality on the 139K-edge cosponsorship network.
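The SOP does not say which centrality measure is used, so this sketch shows one common choice: PageRank on a directed graph where each cosponsorship is an edge from cosponsor to sponsor. Both the edge direction and the weighting are assumptions. Production runs use NetworkX; this pure-Python version only illustrates the computation. Repeated cosponsorships add edge weight, and nodes with no outgoing edges spread their rank uniformly.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def pagerank(edges: Iterable[Tuple[Hashable, Hashable]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[Hashable, float]:
    """PageRank over (cosponsor, sponsor) edges; repeated pairs add weight."""
    weights: Dict[Hashable, Dict[Hashable, float]] = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for src, dst in edges:
        weights[src][dst] += 1.0
        nodes.update((src, dst))
    n = len(nodes)
    if n == 0:
        return {}
    out_total = {v: sum(weights[v].values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if out_total[v] == 0)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            if out_total[src]:
                share = damping * rank[src] / out_total[src]
                for dst, w in weights[src].items():
                    new[dst] += share * w
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```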

Bipartisanship

Rate of cross-party cosponsorships — already computed (Layer 1).
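For reference, here is one plausible definition of the cross-party rate. It assumes the rate is the share of a legislator's cosponsorships that land on bills sponsored by a member of another party. The input shapes and the function name are hypothetical. The SOP does not state how Layer 1 treats independents, so this sketch simply compares party codes as given.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def cross_party_rate(cosponsorships: Iterable[Tuple[str, str]],
                     party: Mapping[str, str]) -> Dict[str, float]:
    """Share of each legislator's cosponsorships on bills sponsored by another party.

    cosponsorships: (cosponsor_slug, sponsor_slug) pairs; party: slug -> party code.
    Pairs with an unknown party on either side are skipped.
    """
    total: Dict[str, int] = defaultdict(int)
    cross: Dict[str, int] = defaultdict(int)
    for cosponsor, sponsor in cosponsorships:
        if cosponsor not in party or sponsor not in party:
            continue
        total[cosponsor] += 1
        if party[cosponsor] != party[sponsor]:
            cross[cosponsor] += 1
    return {slug: cross[slug] / n for slug, n in total.items()}
```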

Attendance / Voting

Party unity %, yea/nay/absent counts.
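A sketch of these counts, using the common Congressional Quarterly definition of a party-unity vote: a roll call where a majority of one party votes against a majority of the other. The input shapes, the vote labels, and the D/R party codes are assumptions. A tie within a party produces no majority for that party, so that roll is not counted as a unity vote.

```python
from collections import Counter, defaultdict
from typing import Dict, Mapping, Tuple

VOTES = ("yea", "nay")


def party_majorities(roll: Mapping[str, str], party: Mapping[str, str]) -> Dict[str, str]:
    """Majority position per party on one roll call; ties give no majority."""
    tallies: Dict[str, Counter] = defaultdict(Counter)
    for slug, vote in roll.items():
        if vote in VOTES and slug in party:
            tallies[party[slug]][vote] += 1
    return {p: ("yea" if c["yea"] > c["nay"] else "nay")
            for p, c in tallies.items() if c["yea"] != c["nay"]}


def voting_stats(rolls: Mapping[str, Mapping[str, str]], party: Mapping[str, str],
                 parties: Tuple[str, str] = ("D", "R")) -> Dict[str, dict]:
    """Yea/nay/absent counts plus party unity % per legislator."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    unity: Dict[str, list] = defaultdict(lambda: [0, 0])  # [with party, unity votes cast]
    a, b = parties
    for roll in rolls.values():
        maj = party_majorities(roll, party)
        is_unity_vote = a in maj and b in maj and maj[a] != maj[b]
        for slug, vote in roll.items():
            counts[slug][vote] += 1
            p = party.get(slug)
            if is_unity_vote and vote in VOTES and p in parties:
                unity[slug][1] += 1
                if vote == maj[p]:
                    unity[slug][0] += 1
    return {
        slug: {
            "yea": c["yea"], "nay": c["nay"], "absent": c["absent"],
            "party_unity": 100.0 * unity[slug][0] / unity[slug][1] if unity[slug][1] else None,
        }
        for slug, c in counts.items()
    }
```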

LLM-Interpretive Suite (Prism Factor)

vLLM serves Qwen3, starting with 14B-FP8 and moving to 32B or the 30B-A3B MoE once the hardware is online. Results are cached by (content_hash, prompt_version, model_id). This suite never counts and never does graph math; it is interpretive only.
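The cache key can be sketched as follows. Here content_hash is assumed to be SHA-256 over the text with normalized line endings, and PrismCache is a hypothetical in-memory stand-in for whatever store backs the real cache. Because prompt_version and model_id are part of the key, a prompt revision or a model upgrade re-scores bills instead of serving stale results.

```python
import hashlib
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]  # (content_hash, prompt_version, model_id)


def content_hash(text: str) -> str:
    """SHA-256 of the text with line endings normalized (assumed hashing scheme)."""
    return hashlib.sha256(text.replace("\r\n", "\n").encode("utf-8")).hexdigest()


class PrismCache:
    """Hypothetical in-memory stand-in for the Prism Factor result cache."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, dict] = {}
        self.misses = 0

    def get_or_score(self, text: str, prompt_version: str, model_id: str,
                     score_fn: Callable[[str], dict]) -> dict:
        key = (content_hash(text), prompt_version, model_id)
        if key not in self._store:
            self.misses += 1
            self._store[key] = score_fn(text)  # the LLM call happens only on a miss
        return self._store[key]
```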

Prism Score (0–100)

Political-alignment rating from bill-text + voting-pattern analysis.

Effectiveness

Bills passed vs introduced (deterministic component + LLM context).
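The deterministic half of effectiveness reduces to a per-sponsor ratio of bills passed to bills introduced. In this sketch, the (sponsor_slug, status) input and the PASSED_STATUSES vocabulary are hypothetical. Which stages count as "passed" is a policy choice that the SOP does not pin down.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical status vocabulary; which stages count as "passed" is a policy choice.
PASSED_STATUSES = {"passed_both_chambers", "enacted"}


def passage_rates(bills: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Passed / introduced per sponsor from (sponsor_slug, status) pairs."""
    introduced: Counter = Counter()
    passed: Counter = Counter()
    for sponsor, status in bills:
        introduced[sponsor] += 1
        if status in PASSED_STATUSES:
            passed[sponsor] += 1
    return {s: passed[s] / n for s, n in introduced.items()}
```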

Donor Alignment

Policy vs donor-industry gap. Deterministic rollup + LLM industry tagging.
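The deterministic rollup can be sketched as follows: each legislator's contribution dollars are grouped by industry tag and expressed as shares of that legislator's total. Industry tags are assumed to arrive already attached by the LLM tagging step, and the tuple input shape is hypothetical. Negative amounts, such as refunds, are summed as-is.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def industry_shares(
        contributions: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Each legislator's contribution dollars by industry tag, as shares of their total.

    contributions: (legislator_slug, industry_tag, amount); tags assumed LLM-assigned upstream.
    """
    totals: Dict[str, float] = defaultdict(float)
    by_industry: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for slug, industry, amount in contributions:
        totals[slug] += amount
        by_industry[slug][industry] += amount
    return {
        slug: {ind: amt / totals[slug] for ind, amt in inds.items()}
        for slug, inds in by_industry.items()
        if totals[slug] > 0  # skip legislators whose net total is zero or negative
    }
```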

Policy Depth / Integrity

LLM-read analysis of bill substance and legislator consistency.

Experimental. Scores are interpretive, not legal or political advice.

Data Domains

Domain | Status | Description
Legislators | Active | 8,009 federal + state officeholders; 538 federal, ~7,470 state
Bills | Active | 5,531 federal (target ~15K) + 39,931 state (NY + AL state-primary + generic OpenStates). Sponsors, committees, actions.
Bill Votes | Active | 10,659 roll-call records from 1,207 roll calls (House Clerk + Senate XML), 88% resolved to legislators
Campaign Finance | Active | 101,516 contributions ($1.1B) across 6,217 PACs
Independent Expenditures | Ingesting | FEC Schedule E (Super PACs + 501(c)(4)s). 2024 + 2026 cycles loading; 2020 + 2022 backfill queued.
FEC Committee Master | Ingesting | Identity layer for every PAC (~88K committees). Links contributions to committee_type / designation / party / connected org.
Committees | Active | 229 federal committees, 3,891 memberships
Districts & Demographics | Active | 491 districts (439 congressional + 51 state rollups + national) with Census ACS demographics: population, income, education, race/ethnicity
Lobbying (LDA) | Active | Senate Lobbying Disclosure filings + LD-203 contributions. Registrants, clients, lobbyists (incl. revolving door).
USAspending | Active | Federal contracts + grants by district. 1,314 legislator-year summaries.
Elections & Candidates | Planned | FEC Form 2 candidate filings, race derivation, /elections/2026 product surface
State Votes | Planned | OpenStates roll calls; complex to normalize across 50 states

Compute Topology

Hardware boundary enforces the layer separation — heavy compute stays off the VM so Postgres + nginx + the API stay responsive.

Lab VM (live)

CentOS Stream 9 · 192.168.50.152 (LAN) / 209.71.19.183 (public)

Hosts: PostgreSQL · nginx (site + /api/ reverse proxy) · FastAPI v2 API (systemd, :8077) · Meilisearch (Podman Quadlet, :7700) · ingest goblins (v1 + v2) · orchestrator cron

Stats / LLM Box (planned)

Ubuntu 24.04 LTS · Intel Ultra 9 285K (24-core) · 64GB RAM · RTX 5090 (32GB VRAM) · static LAN IP

Will host: vLLM serving Qwen3 · DuckDB attached to VM Postgres over LAN · Polars + scikit-learn + NetworkX stats venv · Prism Factor LLM inference · writes stat_* tables back to VM.

Documentation

Standard Operating Procedures live in the repo under poliprism/docs/ and are rendered on-site at build time.

  • Pipeline Architecture SOP

    Staging-first universal architecture: canonical tables, promote pattern, orchestrator template, full ingestion inventory (every goblin with counts + file paths), and finance-track integration notes.

  • State-Primary Ingest SOP (poliprism/docs/STATE-PRIMARY-INGEST-SOP.md)

    Per-vendor discovery, API stewardship tiers, feasibility template. Reference implementations: NY (Open Legislation) and AL (OpenStates v3 via dedicated processor).

  • Three Decoupled Layers (docs/poliprism-sop/Three_Decoupled_Layers.txt)

    Layer rules: Site Ops never touches the DB; Data Processing never triggers a site build; Statistics & AI Analytics never triggers build or MV refresh. Hardware boundary reinforces the split.

PoliPrism v2 · Civic Intelligence Platform · Not affiliated with any government entity