PoliPrism · Methodology

How we compute the scores

Every score on every detail page is reproducible from the canonical PoliPrism database snapshot plus the code version stamped in the record. We document the formulas here so the score is never opaque. AI is bounded to summary and interpretation; the numbers themselves are always deterministic.

The hard line: stats vs. AI

PoliPrism splits computed signals into two layers, with a hard boundary:

Statistical layer

Vote counts, attendance, money totals, graph centrality, ideology from PCA — all from published formulas over canonical primary records. Reproducible from the snapshot. Lives in stat_* tables, computed on the Labwizard CPU venv with Polars / scikit-learn / NetworkX.

AI inference layer

Summarization, sentiment, fuzzy entity resolution, narrative — served by Gemma-4-26B-Thinking via vLLM on Labwizard. Every inference persists method + confidence + model/prompt version stamps. AI never generates roll-call math, money totals, or graph metrics.

Prism Factor composite_v1

A 0–100 composite per (legislator, congress) from stat_prism_composite. Built from four weighted subscores that capture showing up, doing things, working across the aisle, and collaborating broadly. Ideology extremity is shown as a diagnostic, not weighted in.

Subscore formulas

Subscore         | Formula                                                         | Weight
Bipartisanship   | 0.5 × bipartisan_voting_pct + 0.5 × bipartisan_edge_pct         | 0.30
Productivity     | 0.6 × sponsored_pctile_chamber + 0.4 × min(100, law_rate × 10)  | 0.25
Collaboration    | 100 × eigenvector_pctile_within_chamber_congress                | 0.25
Civic Engagement | attendance_pct (capped 0–100)                                   | 0.20

Composite = Σ (subscore × weight). Weights sum to 1.0. Bipartisanship carries the largest weight because we treat working with people unlike you as the rarest signal and the clearest predictor of effectiveness.
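As an illustration, the table and the weighted sum can be sketched in a few lines of Python. This is a minimal sketch, not the production job: the dictionary keys and function names are hypothetical, and it assumes eigenvector_pctile_within_chamber_congress is stored as a 0–1 fraction (since the formula multiplies it by 100) while the other inputs are already on a 0–100 scale.

```python
# Weights copied from the subscore table; key names are illustrative only.
WEIGHTS = {
    "bipartisanship": 0.30,
    "productivity": 0.25,
    "collaboration": 0.25,
    "civic_engagement": 0.20,
}


def clamp(x, lo=0.0, hi=100.0):
    """Cap a value to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def subscores(m):
    """Derive the four subscores from one legislator's raw metrics."""
    return {
        "bipartisanship": 0.5 * m["bipartisan_voting_pct"]
        + 0.5 * m["bipartisan_edge_pct"],
        "productivity": 0.6 * m["sponsored_pctile_chamber"]
        + 0.4 * min(100.0, m["law_rate"] * 10),
        # Assumption: the percentile is stored as a 0-1 fraction.
        "collaboration": 100.0 * m["eigenvector_pctile_within_chamber_congress"],
        "civic_engagement": clamp(m["attendance_pct"]),
    }


def composite(m):
    """Composite = sum of subscore x weight."""
    s = subscores(m)
    return sum(s[name] * weight for name, weight in WEIGHTS.items())


example = {
    "bipartisan_voting_pct": 40.0,
    "bipartisan_edge_pct": 60.0,
    "sponsored_pctile_chamber": 80.0,
    "law_rate": 5.0,
    "eigenvector_pctile_within_chamber_congress": 0.7,
    "attendance_pct": 95.0,
}
print(round(composite(example), 2))  # 68.5
```

In this made-up example the subscores are 50, 68, 70 and 95, so the composite is 15 + 17 + 17.5 + 19 = 68.5.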

Diagnostic — not weighted

Ideology Extremity = 100 × |ideology_dim1| / max_abs_in_(chamber, congress). Computed via 2D PCA on the yea/nay matrix; sign-flipped so dim1 > 0 is conservative. Shown as a 5th cell on the Prism Factor panel for context — it is not part of the composite formula.
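The normalization step can be sketched as follows. The PCA itself is not shown; the sketch assumes ideology_dim1 has already been computed and sign-flipped, and the row layout and function name are hypothetical.

```python
from collections import defaultdict


def ideology_extremity(rows):
    """rows: iterable of (legislator_id, chamber, congress, ideology_dim1).

    Returns {(legislator_id, chamber, congress): extremity on a 0-100 scale},
    normalized by the largest |dim1| within each (chamber, congress) group.
    """
    rows = list(rows)
    max_abs = defaultdict(float)
    for _, chamber, congress, dim1 in rows:
        key = (chamber, congress)
        max_abs[key] = max(max_abs[key], abs(dim1))

    out = {}
    for leg, chamber, congress, dim1 in rows:
        denom = max_abs[(chamber, congress)]
        # Guard against a degenerate group where every dim1 is zero.
        out[(leg, chamber, congress)] = 100.0 * abs(dim1) / denom if denom else 0.0
    return out


scores = ideology_extremity([
    ("A", "house", 119, -2.0),
    ("B", "house", 119, 1.0),
    ("C", "senate", 119, 0.5),
])
```

Because the denominator is the group maximum, the most extreme member of each chamber in each congress always scores 100, regardless of sign.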

Confidence

score_confidence = min of upstream confidences (currently only ideology_confidence). When a legislator has fewer than 100 cast votes, ideology_confidence is reduced and the composite confidence drops with it, so sparse-data members never get a high-confidence score from thin evidence.

Authoritative sources

Inputs trace back to stat_legislator_voting (attendance, party-unity, bipartisan vote share), stat_legislator_network (cosponsorship eigenvector centrality + bipartisan edges), stat_legislator_productivity (bills sponsored, passage, chamber percentile), and stat_legislator_ideology (PCA on yea/nay vectors). Each compute job is reproducible from the canonical voting tables; see Prism Studio to query the underlying data.

Money Profile money_v1

Per (legislator, cycle) analytical signals over campaign finance from stat_money_profile. This is the layer above the raw "total raised" figures: concentration, top-funder share, sector breakdown, IE rollups.

Signals

  • Total receipts + distinct orgs + contribution count — raw scale.
  • Top-1 / Top-5 / Top-10 share % — what fraction of receipts come from the largest individual contributors.
  • Concentration (HHI) — Herfindahl-Hirschman Index: the sum of squared per-org contribution shares. Bands: Diversified (<0.15), Moderate (0.15–0.25), High (≥0.25). A high HHI means receipts are concentrated in a small number of donors, at the extreme a single one.
  • Top sector + industry classified % — the dominant industry/sector and how much of total $ has been industry-classified (the unclassified residue is a known data gap and is shown explicitly).
  • IE for / against + counts — independent expenditure activity targeting this legislator. Outside money that doesn't flow into their committee but fights for or against them.
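The concentration signals above can be sketched from a map of per-org totals. This is a minimal sketch with hypothetical names. It assumes the Top-k shares are computed over the same per-org totals as the HHI, which the text does not state explicitly.

```python
def money_signals(org_totals):
    """org_totals: {org_id: dollars received from that org} for one
    (legislator, cycle). Returns None when there are no receipts."""
    total = sum(org_totals.values())
    if total <= 0:
        return None

    # Per-org shares of total receipts, largest first.
    shares = sorted((amt / total for amt in org_totals.values()), reverse=True)
    hhi = sum(s * s for s in shares)  # Herfindahl-Hirschman Index, 0-1 scale

    if hhi < 0.15:
        band = "Diversified"
    elif hhi < 0.25:
        band = "Moderate"
    else:
        band = "High"

    return {
        "total_receipts": total,
        "distinct_orgs": len(org_totals),
        "top1_share_pct": 100.0 * sum(shares[:1]),
        "top5_share_pct": 100.0 * sum(shares[:5]),
        "top10_share_pct": 100.0 * sum(shares[:10]),
        "hhi": hhi,
        "band": band,
    }


profile = money_signals({"org_a": 50_000, "org_b": 30_000, "org_c": 20_000})
```

With shares of 0.5, 0.3 and 0.2 the HHI is 0.25 + 0.09 + 0.04 = 0.38, which lands in the High band even though no single donor holds a majority.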

Authoritative sources

campaign_contributions + fec_itemized_contributions + independent_expenditures, joined to entities_organizations.industry_label (classifier-driven) for the sector breakdown. Unclassified contributions are excluded from sector calculations but counted in the totals.

Donor → Vote Alignment donor_vote_alignment_v1

The marquee cross-domain composite. Per (donor_org, legislator, cycle) from stat_donor_vote_alignment, records the donor relationship (amount, rank in the org's recipient list) AND a derived peer alignment score measuring how typical this legislator's voting pattern is relative to other legislators the same org funded in the same cycle.

Question answered

"Among legislators receiving from this donor, how typical is THIS legislator's voting pattern?" High alignment (≥0.6) = typical recipient cluster — the donor bets on members who vote like its other recipients. Low or negative alignment = outlier — got the money, but votes unlike the cohort.

Computation

For each (org, cycle), find the set of legislators that received money from the org. For each (org, legislator, cycle):

  1. Look up this legislator's vote-vector similarity (from stat_vote_similarity) against EACH other recipient in the same chamber.
  2. Average those same-chamber peer similarities.
  3. Persist the average as peer_alignment_score with the count of contributing peers (peer_count).

Vote similarity is itself chamber-scoped (House and Senate vote on different things), so cross-chamber peers contribute nothing. For orgs that give to both chambers, alignment is calculated separately per chamber.
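The three steps can be sketched for a single (org, cycle) as below. The in-memory shapes are assumptions made for illustration: stat_vote_similarity is represented as a dict keyed by unordered legislator pairs, and a legislator with no same-chamber peers gets a score of None.

```python
def peer_alignment(recipients, chamber_of, similarity):
    """recipients: legislator ids funded by one org in one cycle.
    chamber_of: {legislator_id: "house" | "senate"}.
    similarity: {frozenset({a, b}): vote similarity} for same-chamber pairs.

    Returns {legislator_id: (peer_alignment_score or None, peer_count)}.
    """
    out = {}
    for leg in recipients:
        sims = []
        for other in recipients:
            # Step 1: only other recipients in the same chamber count as peers.
            if other == leg or chamber_of[other] != chamber_of[leg]:
                continue
            s = similarity.get(frozenset((leg, other)))
            if s is not None:
                sims.append(s)
        # Step 2: average the peer similarities. Step 3: keep the peer count.
        score = sum(sims) / len(sims) if sims else None
        out[leg] = (score, len(sims))
    return out


alignment = peer_alignment(
    recipients=["A", "B", "C", "D"],
    chamber_of={"A": "house", "B": "house", "C": "house", "D": "senate"},
    similarity={
        frozenset(("A", "B")): 0.8,
        frozenset(("A", "C")): 0.4,
        frozenset(("B", "C")): 0.6,
    },
)
```

Here A averages 0.8 and 0.4 over two House peers, while D is the org's only Senate recipient and therefore has no peers to average.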

Cycle ↔ Congress mapping

Cycle 2024 + cycle 2026 both map to the 119th Congress — 2024 is past donor support, 2026 is the live fundraising posture against ongoing 119th-Congress votes. Cycle 2022 contributions live in fec_itemized_contributions (bulk Schedule A) only, not campaign_contributions, so are excluded here until a future v2 unifies both feeds.

What this is NOT

  • NOT a quid-pro-quo claim. Alignment correlates donor preference with voting pattern. Causation could run in either direction and cannot be established from this data alone.
  • NOT bill-position-aware. Doesn't know whether the donor supported or opposed any specific bill — only that other recipients voted similarly or not.
  • NOT predictive. Past recipient cohort behavior, not future voting prediction.

Candidate Profile candidate_profile_v1

Per (candidate, cycle) from stat_candidate_profile. Mirrors Money Profile but candidate-keyed so non-incumbent challengers get the same analytical layer. Receipts, disbursements, cash on hand come straight from FEC; we add the cohort percentile and IE rollup.

Cohort percentile

For each candidate, computes their receipts rank within the (office, cycle) cohort. Top 10% (≥90th percentile) shown green; top 25% cyan; top 50% gold; below 50th red. Cohort size is shown alongside so a "top 25% of 8" reads differently from "top 25% of 400".
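A minimal sketch of the ranking and color banding follows. The exact percentile definition is not given above, so this sketch assumes "percent of the cohort with receipts at or below the candidate's"; function names are hypothetical.

```python
from bisect import bisect_right


def cohort_percentiles(receipts):
    """receipts: {candidate_id: total receipts} for one (office, cycle) cohort.

    Assumed definition: percentile = percent of the cohort whose receipts
    are at or below this candidate's receipts.
    """
    values = sorted(receipts.values())
    n = len(values)
    return {
        cid: 100.0 * bisect_right(values, amt) / n
        for cid, amt in receipts.items()
    }


def band_color(percentile):
    """Map a percentile to the panel colors described above."""
    if percentile >= 90:
        return "green"  # top 10%
    if percentile >= 75:
        return "cyan"   # top 25%
    if percentile >= 50:
        return "gold"   # top 50%
    return "red"


cohort = {"a": 100, "b": 200, "c": 300, "d": 400}
pct = cohort_percentiles(cohort)
colors = {cid: band_color(p) for cid, p in pct.items()}
cohort_size = len(cohort)  # shown alongside the percentile
```

With only four candidates each rank step is worth 25 percentile points, which is why the cohort size is displayed next to the band.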

Org Influence org_influence_v1

Per-org lifetime rollup over campaign_contributions, from stat_org_influence. Captures scale (total $ given, contribution count), breadth (distinct legislators reached, cycles active), partisan tilt (D/R/Other share + bipartisan share = min(D-share, R-share) × 2), and the top single recipient with their share of org total.

Bipartisan band

Bipartisan share peaks at 1.0 when an org gives 50/50 D/R and drops toward 0 as giving concentrates on one party. Bands shown on the panel: ≥0.6 Balanced, 0.3–0.6 Tilted, 0.05–0.3 Partisan, <0.05 One-party. Most aggregator PACs (ActBlue, WinRed) score 0 because they aggregate single-party donor preferences; the bipartisan signal is meant for orgs that actually give to both sides.
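The share formula and bands can be sketched as below. One assumption is labeled in the code: D-share and R-share are taken over the org's total giving including Other, which the text does not spell out. Function names are hypothetical.

```python
def bipartisan_share(d_dollars, r_dollars, other_dollars=0.0):
    """bipartisan share = min(D-share, R-share) x 2.

    Assumption: shares are fractions of total giving, Other included.
    Returns None when the org has no giving on record.
    """
    total = d_dollars + r_dollars + other_dollars
    if total <= 0:
        return None
    return 2.0 * min(d_dollars / total, r_dollars / total)


def bipartisan_band(share):
    """Map a bipartisan share to the panel band."""
    if share >= 0.6:
        return "Balanced"
    if share >= 0.3:
        return "Tilted"
    if share >= 0.05:
        return "Partisan"
    return "One-party"


even_split = bipartisan_share(50_000, 50_000)    # 1.0
single_party = bipartisan_share(100_000, 0)      # 0.0
eighty_twenty = bipartisan_share(80_000, 20_000)
```

An 80/20 split gives min(0.8, 0.2) × 2 = 0.4, which falls in the Tilted band.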

Filter

Only orgs with at least one campaign_contributions row are scored — keeps the table to ~5K active political donors out of 531K total entities_organizations. Orgs without contributions on file render no panel.

Committee Activity committee_activity_v1

Per (committee, congress) hearings rollup from stat_committee_activity. Joins hearing_committees × hearings × hearing_witnesses × hearing_bills to produce four signals: hearings held, hearings with witnesses, total witnesses, distinct bills referenced. Plus the activity date range. Latest congress wins on the detail panel.

Sources

  • hearing_committees — committee ↔ hearing junction
  • hearings — congress, chamber, meeting_date
  • hearing_witnesses — count per hearing
  • hearing_bills — distinct bills mentioned
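The rollup over these four sources can be sketched in memory as below. The row shapes are simplified assumptions rather than the actual table schemas, and the output field names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def committee_activity(hearing_committees, hearings, hearing_witnesses, hearing_bills):
    """hearing_committees: iterable of (committee_id, hearing_id)
    hearings: {hearing_id: {"congress": int, "meeting_date": date}}
    hearing_witnesses: {hearing_id: witness_count}
    hearing_bills: iterable of (hearing_id, bill_id)

    Returns {(committee_id, congress): rollup dict}.
    """
    bills_by_hearing = defaultdict(set)
    for hearing_id, bill_id in hearing_bills:
        bills_by_hearing[hearing_id].add(bill_id)

    hearings_by_key = defaultdict(set)
    for committee_id, hearing_id in hearing_committees:
        congress = hearings[hearing_id]["congress"]
        hearings_by_key[(committee_id, congress)].add(hearing_id)

    out = {}
    for key, ids in hearings_by_key.items():
        witness_counts = [hearing_witnesses.get(h, 0) for h in ids]
        dates = [hearings[h]["meeting_date"] for h in ids]
        bills = set()
        for h in ids:
            bills |= bills_by_hearing.get(h, set())
        out[key] = {
            "hearings_held": len(ids),
            "hearings_with_witnesses": sum(1 for c in witness_counts if c > 0),
            "total_witnesses": sum(witness_counts),
            "distinct_bills": len(bills),  # a bill cited twice counts once
            "first_date": min(dates),
            "last_date": max(dates),
        }
    return out


rollup = committee_activity(
    hearing_committees=[("C1", 1), ("C1", 2)],
    hearings={
        1: {"congress": 119, "meeting_date": date(2025, 2, 1)},
        2: {"congress": 119, "meeting_date": date(2025, 3, 1)},
    },
    hearing_witnesses={1: 3},
    hearing_bills=[(1, "hr1"), (2, "hr1"), (2, "hr2")],
)
```

Keying the result by (committee, congress) lets the detail panel pick the entry with the highest congress number.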

Donor Twins donor_overlap_v1

Pairwise legislator-legislator donor-base similarity from stat_donor_overlap. For each pair (A, B) per cycle/chamber: Jaccard over the set of donor orgs and cosine_amount over the per-org $ vector. Cosine is the headline metric (amount-weighted, robust to one-shot small donors); Jaccard is shown alongside.
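Both metrics can be sketched from two per-org dollar maps. This is a minimal sketch with hypothetical names. It assumes an org missing from one legislator's map contributes 0 to that legislator's vector.

```python
import math


def donor_overlap(a, b):
    """a, b: {org_id: dollars} for two legislators in the same cycle and chamber.

    Returns (jaccard, cosine_amount).
    """
    orgs_a, orgs_b = set(a), set(b)
    shared = orgs_a & orgs_b
    union = orgs_a | orgs_b

    # Jaccard: shared donor orgs over all donor orgs, ignoring amounts.
    jaccard = len(shared) / len(union) if union else 0.0

    # Cosine over the per-org dollar vectors; only shared orgs add to the dot product.
    dot = sum(a[o] * b[o] for o in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
    return jaccard, cosine


jaccard, cosine = donor_overlap({"x": 3.0, "y": 4.0}, {"x": 4.0, "z": 3.0})
```

In this example one of three orgs is shared, so Jaccard is 1/3, while the cosine is 12 / (5 × 5) = 0.48 because the shared org accounts for a large part of both dollar vectors.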

Prism Insight (LLM narratives) *_summary_v1

Per-entity 1-2 sentence neutral summary written by Gemma-4-26B-Thinking on Labwizard, drawing on canonical PoliPrism evidence (sponsored bill titles for legislators; recent hearings + bill-references for committees; bill metadata + sponsor for bills). Distinct from the deterministic numeric Prism Factor composite — this is the narrative layer, NOT a rating of effectiveness or partisan characterization.

Legislators (legislator_summary_v1)

Source: entities_legislators.prism_summary. Evidence: ≤10 most-recent sponsored bill titles, party + state. ~1,100 active members. Generated by score_legislators.py.

Committees (committee_summary_v1)

Source: entities_committees.prism_summary. Evidence: ≤8 recent hearing titles + ≤6 bills referenced in those hearings. ~200 federal committees. Generated by score_committees.py.

Quality + safety

  • Confidence ≥ 0.55 required for write. The model self-reports its confidence; below the floor we skip rather than guess.
  • Neutral-tone prompting: "describe what they work on, NOT whether they do it well; not partisan framing; not predictions". Per architecture rule #11.
  • "AI-derived" badge always shown alongside the summary in the UI. Model + scored-at version stamps persisted.
  • Idempotent re-scoring: jobs skip already-scored rows by default; pass --rescore to force regeneration.
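The confidence floor and the idempotent skip can be sketched together as below. This is not the code in score_legislators.py or score_committees.py: the function names, the row shape, and the generate callable are hypothetical stand-ins for the real job and model call.

```python
CONFIDENCE_FLOOR = 0.55  # from the quality rules above


def needs_scoring(existing_summary, rescore=False):
    """Skip rows that already have a summary unless --rescore was passed."""
    return rescore or not existing_summary


def accept(confidence):
    """Persist only summaries whose self-reported confidence clears the floor."""
    return confidence is not None and confidence >= CONFIDENCE_FLOOR


def process(row, generate, rescore=False):
    """row: dict with an optional "prism_summary" entry.
    generate: hypothetical callable returning (summary_text, confidence).

    Returns a string describing the action taken.
    """
    if not needs_scoring(row.get("prism_summary"), rescore):
        return "skipped_existing"  # checked first, so no inference is spent
    text, confidence = generate(row)
    if not accept(confidence):
        return "skipped_low_confidence"
    row["prism_summary"] = text
    row["confidence"] = confidence
    return "written"
```

Checking for an existing summary before calling the model keeps a default rerun cheap, since already-scored rows never trigger inference.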

Versioning + reproducibility

Every row in every stat_* table carries a method_version stamp (currently composite_v1, money_v1, etc.) and a computed_at timestamp. When a formula changes, the version bumps and we recompute. When the upstream input HWM (high-water mark) hasn't moved, the recompute auto-skips, keeping the pipeline cheap.

The stats_runs log captures every Labwizard execution (input HWM, method version, row count, status) so any score on the site is traceable to a specific run.
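The auto-skip decision can be sketched as a comparison against the most recent stats_runs entry. The record shape and the "ok" status value are assumptions for illustration. The HWM is treated as any comparable value, such as a timestamp or a monotonically increasing id.

```python
def should_recompute(last_run, input_hwm, method_version):
    """last_run: None, or a dict with "input_hwm", "method_version", "status"
    taken from the most recent stats_runs entry for this job.

    Recompute when there is no successful prior run, when the formula
    version changed, or when the upstream high-water mark has advanced.
    """
    if last_run is None or last_run["status"] != "ok":
        return True
    if last_run["method_version"] != method_version:
        return True
    return input_hwm > last_run["input_hwm"]


last = {"input_hwm": 10, "method_version": "composite_v1", "status": "ok"}
unchanged = should_recompute(last, 10, "composite_v1")    # False: auto-skip
new_input = should_recompute(last, 11, "composite_v1")    # True: HWM moved
new_formula = should_recompute(last, 10, "composite_v2")  # True: version bump
```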

What we don't publish (yet)

  • Bill-level Prism scores (LLM-derived legislative-impact scoring on individual bills). Coverage is currently sparse; the panel renders only on the small set with finalized scores. Backfill of all federal bills from the 118th and 119th Congresses is in progress.
  • Donor families. Curated 50-family seed is live; algorithmic family resolution beyond the seed is an open project.
  • Further cross-domain composites beyond Donor → Vote Alignment (e.g. "donor → sponsor → vote alignment" indices). The graph exists; indices on top of it will be published as deliberately designed work, not generated as side effects of the existing scoring.

Found something inconsistent or under-explained? Open an issue or hit Prism Studio to query the underlying records directly.