PoliPrism · Methodology
How we compute the scores
Every score on every detail page is reproducible from the canonical PoliPrism database snapshot plus the code version stamped in the record. We document the formulas here so the score is never opaque. AI is bounded to summary and interpretation; the numbers themselves are always deterministic.
The hard line: stats vs. AI
PoliPrism splits computed signals into two layers, with a hard boundary:
Statistical layer
Vote counts, attendance, money totals, graph centrality, ideology
from PCA — all from published formulas over canonical
primary records. Reproducible from the snapshot. Lives in
stat_* tables, computed on the Labwizard CPU venv with
Polars / scikit-learn / NetworkX.
AI inference layer
Summarization, sentiment, fuzzy entity resolution, narrative —
served by Gemma-4-26B-Thinking via vLLM on Labwizard. Every
inference persists method + confidence +
model/prompt version stamps. AI never generates
roll-call math, money totals, or graph metrics.
Prism Factor composite_v1
A 0–100 composite per (legislator, congress) from
stat_prism_composite. Built from four weighted subscores
that capture showing up, doing things,
working across the aisle, and collaborating broadly.
Ideology extremity is shown as a diagnostic, not weighted in.
Subscore formulas
| Subscore | Formula | Weight |
|---|---|---|
| Bipartisanship | 0.5 × bipartisan_voting_pct + 0.5 × bipartisan_edge_pct | 0.30 |
| Productivity | 0.6 × sponsored_pctile_chamber + 0.4 × min(100, law_rate × 10) | 0.25 |
| Collaboration | 100 × eigenvector_pctile_within_chamber_congress | 0.25 |
| Civic Engagement | attendance_pct (capped 0–100) | 0.20 |
Composite = Σ (subscore × weight). The weights sum to 1.0. Bipartisanship carries the largest weight because working with people unlike you is treated as the rarest signal and the clearest predictor of effectiveness.
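As an illustration, the table can be written out as a short Python sketch. It is not the production job: the function names are hypothetical, and it assumes the percentage inputs are on a 0–100 scale and the eigenvector percentile is a fraction in 0–1 (as the `100 ×` factor suggests).

```python
# Weights from the subscore table; they sum to 1.0.
WEIGHTS = {
    "bipartisanship": 0.30,
    "productivity": 0.25,
    "collaboration": 0.25,
    "civic_engagement": 0.20,
}


def clamp(value, low=0.0, high=100.0):
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


def subscores(bipartisan_voting_pct, bipartisan_edge_pct,
              sponsored_pctile_chamber, law_rate,
              eigenvector_pctile, attendance_pct):
    """Apply the four subscore formulas from the table."""
    return {
        "bipartisanship": 0.5 * bipartisan_voting_pct + 0.5 * bipartisan_edge_pct,
        "productivity": 0.6 * sponsored_pctile_chamber + 0.4 * min(100.0, law_rate * 10),
        "collaboration": 100.0 * eigenvector_pctile,
        "civic_engagement": clamp(attendance_pct),
    }


def composite(subs):
    """Weighted sum of subscores; result is on a 0-100 scale."""
    return sum(subs[name] * weight for name, weight in WEIGHTS.items())


# Made-up inputs, for illustration only.
example = subscores(40, 60, 80, 5, 0.7, 95)
print(round(composite(example), 1))  # 68.5
```

With these made-up inputs the subscores are 50, 68, 70 and 95, so the composite is 15 + 17 + 17.5 + 19 = 68.5.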
Diagnostic — not weighted
Ideology Extremity = 100 × |ideology_dim1| / max_abs_in_(chamber, congress). Computed via 2D PCA on the yea/nay matrix; sign-flipped so dim1 > 0 is conservative. Shown as a 5th cell on the Prism Factor panel for context — it is not part of the composite formula.
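The normalization step can be sketched as follows. This covers only the final scaling, not the PCA itself, and assumes the dim1 scores for one (chamber, congress) group are already available in a dict; the function name and the input shape are hypothetical.

```python
def ideology_extremity(dim1_by_member):
    """Scale |dim1| so the most extreme member of the group scores 100.

    dim1_by_member maps a member id to that member's first PCA
    dimension, for a single (chamber, congress) group.
    """
    if not dim1_by_member:
        return {}
    max_abs = max(abs(v) for v in dim1_by_member.values())
    if max_abs == 0:
        # Degenerate group: every member sits at the origin.
        return {member: 0.0 for member in dim1_by_member}
    return {
        member: 100.0 * abs(v) / max_abs
        for member, v in dim1_by_member.items()
    }


# Made-up scores: the sign (liberal vs. conservative) does not matter here.
scores = ideology_extremity({"a": -0.8, "b": 0.4, "c": 0.2})
```

Because the denominator is the group maximum, the value is relative to the chamber and congress, not comparable across them.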
Confidence
score_confidence = min of upstream confidences (currently
ideology_confidence). When a legislator has fewer than 100 cast votes,
ideology_confidence is reduced and the composite confidence drops
with it, so sparse-data members never get a high-confidence score from
thin evidence.
Authoritative sources
Inputs trace back to stat_legislator_voting (attendance,
party-unity, bipartisan vote share), stat_legislator_network
(cosponsorship eigenvector centrality + bipartisan edges),
stat_legislator_productivity (bills sponsored, passage,
chamber percentile), and stat_legislator_ideology (PCA
on yea/nay vectors). Each compute job is reproducible from the
canonical voting tables; see Prism Studio
to query the underlying data.
Money Profile money_v1
Per (legislator, cycle) analytical signals over campaign
finance from stat_money_profile. This is the layer above
the raw "total raised" totals: concentration, top-funder share,
sector breakdown, IE rollups.
Signals
- Total receipts + distinct orgs + contribution count: raw scale.
- Top-1 / Top-5 / Top-10 share %: the fraction of receipts that comes from the largest 1, 5, and 10 contributors.
- Concentration (HHI): Herfindahl-Hirschman Index over per-org contribution shares, on a 0–1 scale. Bands: Diversified (<0.15), Moderate (0.15–0.25), High (≥0.25). A high HHI means receipts are concentrated in a small number of donors.
- Top sector + industry classified %: the dominant industry/sector and how much of total $ has been industry-classified (the unclassified residue is a known data gap and is shown explicitly).
- IE for / against + counts: independent expenditure activity targeting this legislator. This is outside money that does not flow into their committee but is spent for or against them.
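The concentration signals can be sketched in a few lines of Python. This is an illustration, not the production job: it assumes contributions have already been summed per org into a dict, that top-k shares are taken over those per-org totals, and the function name is hypothetical.

```python
def concentration_signals(amount_by_org):
    """Top-k shares and HHI from per-org contribution totals."""
    total = sum(amount_by_org.values())
    if total <= 0:
        raise ValueError("no receipts to analyse")

    # Each org's share of total receipts, largest first.
    shares = sorted((amt / total for amt in amount_by_org.values()), reverse=True)

    # HHI is the sum of squared shares: 1/N for N equal donors, 1.0 for one donor.
    hhi = sum(s * s for s in shares)

    if hhi < 0.15:
        band = "Diversified"
    elif hhi < 0.25:
        band = "Moderate"
    else:
        band = "High"

    return {
        "top1_share_pct": 100.0 * sum(shares[:1]),
        "top5_share_pct": 100.0 * sum(shares[:5]),
        "top10_share_pct": 100.0 * sum(shares[:10]),
        "hhi": hhi,
        "hhi_band": band,
    }


# Made-up totals: three orgs giving 50 / 30 / 20.
signals = concentration_signals({"A": 50.0, "B": 30.0, "C": 20.0})
print(round(signals["hhi"], 2), signals["hhi_band"])  # 0.38 High
```

In the made-up example the shares are 0.5, 0.3 and 0.2, so HHI = 0.25 + 0.09 + 0.04 = 0.38, which falls in the High band.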
Authoritative sources
campaign_contributions + fec_itemized_contributions +
independent_expenditures, joined to
entities_organizations.industry_label (classifier-driven)
for the sector breakdown. Unclassified contributions are excluded from
sector calculations but counted in the totals.
Donor → Vote Alignment donor_vote_alignment_v1
The marquee cross-domain composite. Per (donor_org,
legislator, cycle) from stat_donor_vote_alignment,
records the donor relationship (amount, rank in the org's recipient
list) AND a derived peer alignment score measuring
how typical this legislator's voting pattern is relative to other
legislators the same org funded in the same cycle.
Question answered
"Among legislators receiving from this donor, how typical is THIS legislator's voting pattern?" High alignment (≥0.6) = typical recipient cluster — the donor bets on members who vote like its other recipients. Low or negative alignment = outlier — got the money, but votes unlike the cohort.
Computation
For each (org, cycle), find the set of legislators that received money from the org. For each (org, legislator, cycle):
- Look up this legislator's vote-vector similarity (from
stat_vote_similarity) against each other recipient in the same chamber.
- Average those same-chamber peer similarities.
- Persist the average as
peer_alignment_score with the count of contributing peers (peer_count).
Vote similarity is itself chamber-scoped (House and Senate vote on different things), so cross-chamber peers contribute nothing: orgs that give to both chambers get a separate alignment calculation per chamber.
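The steps above can be sketched as a single function. This is a minimal illustration under stated assumptions: pairwise similarities are supplied as a dict keyed by an unordered pair of legislator ids, chamber membership is a plain dict, and all names other than peer_alignment_score and peer_count are hypothetical.

```python
def peer_alignment(legislator, recipients, chamber_of, similarity):
    """Average vote similarity between one recipient and its same-chamber peers.

    recipients: legislators funded by the org in the cycle.
    chamber_of: legislator id -> "House" or "Senate".
    similarity: frozenset({a, b}) -> vote-vector similarity.
    Returns (peer_alignment_score, peer_count); score is None with no peers.
    """
    sims = []
    for peer in recipients:
        if peer == legislator:
            continue
        if chamber_of[peer] != chamber_of[legislator]:
            continue  # cross-chamber peers contribute nothing
        pair = frozenset((legislator, peer))
        if pair in similarity:
            sims.append(similarity[pair])

    if not sims:
        return None, 0
    return sum(sims) / len(sims), len(sims)


# Made-up example: three House recipients and one Senate recipient.
chambers = {"h1": "House", "h2": "House", "h3": "House", "s1": "Senate"}
sims = {frozenset(("h1", "h2")): 0.8, frozenset(("h1", "h3")): 0.4}
score, count = peer_alignment("h1", list(chambers), chambers, sims)
```

Here "h1" gets a score of about 0.6 from two peers, while "s1" has no same-chamber peers and gets no score.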
Cycle ↔ Congress mapping
Cycle 2024 + cycle 2026 both map to the 119th Congress — 2024 is
past donor support, 2026 is the live fundraising posture against
ongoing 119th-Congress votes. Cycle 2022 contributions live in
fec_itemized_contributions (bulk Schedule A) only,
not campaign_contributions, so are excluded here
until a future v2 unifies both feeds.
What this is NOT
- NOT a quid-pro-quo claim. Alignment correlates donor preference with voting pattern. Causation could run in either direction and cannot be established from this data alone.
- NOT bill-position-aware. Doesn't know whether the donor supported or opposed any specific bill — only that other recipients voted similarly or not.
- NOT predictive. Past recipient cohort behavior, not future voting prediction.
Candidate Profile candidate_profile_v1
Per (candidate, cycle) from stat_candidate_profile.
Mirrors Money Profile but candidate-keyed so non-incumbent
challengers get the same analytical layer. Receipts, disbursements,
cash on hand come straight from FEC; we add the cohort percentile
and IE rollup.
Cohort percentile
For each candidate, computes their receipts rank within the
(office, cycle) cohort. Top 10%
(≥90th percentile) shown green; top 25% cyan;
top 50% gold; below 50th red. Cohort size is
shown alongside so a "top 25% of 8" reads differently from
"top 25% of 400".
Org Influence org_influence_v1
Per-org lifetime rollup over campaign_contributions,
from stat_org_influence. Captures
scale (total $ given, contribution count),
breadth (distinct legislators reached, cycles
active), partisan tilt (D/R/Other share +
bipartisan share = min(D-share, R-share) × 2), and the
top single recipient with their share of org
total.
Bipartisan band
Bipartisan share peaks at 1.0 when an org gives 50/50 D/R; drops toward 0 as giving concentrates on one party. Bands shown on the panel: ≥0.6 Balanced, 0.3-0.6 Tilted, 0.05-0.3 Partisan, <0.05 One-party. Most aggregator PACs (ActBlue, WinRed) score 0 because they aggregate single-party donor preferences — the bipartisan signal is for orgs that actually give to both sides.
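A short sketch of the share and its bands, for illustration only. It assumes D-share and R-share are each taken over the org's total giving (including the Other bucket); the function names are hypothetical.

```python
def bipartisan_share(d_amount, r_amount, other_amount=0.0):
    """min(D-share, R-share) x 2: 1.0 at a 50/50 split, 0.0 for one-party giving."""
    total = d_amount + r_amount + other_amount
    if total <= 0:
        return 0.0
    return 2.0 * min(d_amount / total, r_amount / total)


def bipartisan_band(share):
    """Map a bipartisan share onto the panel's four bands."""
    if share >= 0.6:
        return "Balanced"
    if share >= 0.3:
        return "Tilted"
    if share >= 0.05:
        return "Partisan"
    return "One-party"


# Made-up giving totals (D, R).
for d, r in [(50, 50), (80, 20), (90, 10), (100, 0)]:
    share = bipartisan_share(d, r)
    print(d, r, round(share, 2), bipartisan_band(share))
```

The four made-up splits land in Balanced (1.0), Tilted (0.4), Partisan (0.2) and One-party (0.0) respectively.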
Filter
Only orgs with at least one campaign_contributions row
are scored — keeps the table to ~5K active political donors out of
531K total entities_organizations. Orgs without
contributions on file render no panel.
Committee Activity committee_activity_v1
Per (committee, congress) hearings rollup from
stat_committee_activity. Joins
hearing_committees × hearings × hearing_witnesses × hearing_bills
to produce four signals (hearings held, hearings with witnesses,
total witnesses, distinct bills referenced) plus the activity date
range. When a committee has rows for several congresses, the detail
panel shows the latest one.
Sources
- hearing_committees: committee ↔ hearing junction
- hearings: congress, chamber, meeting_date
- hearing_witnesses: count per hearing
- hearing_bills: distinct bills mentioned
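The rollup over these four sources can be sketched in plain Python. The input shapes below are assumptions made for illustration (junction rows as tuples, witness counts as a dict, ISO-formatted date strings), not the actual table schemas, and the function name is hypothetical.

```python
from collections import defaultdict


def committee_activity(hearing_committees, hearings, hearing_witnesses, hearing_bills):
    """Roll hearings up to one row per (committee, congress).

    hearing_committees: iterable of (committee_id, hearing_id), one row per pair.
    hearings: hearing_id -> {"congress": int, "meeting_date": "YYYY-MM-DD"}.
    hearing_witnesses: hearing_id -> witness count.
    hearing_bills: iterable of (hearing_id, bill_id).
    """
    bills_by_hearing = defaultdict(set)
    for hearing_id, bill_id in hearing_bills:
        bills_by_hearing[hearing_id].add(bill_id)

    acc = {}
    for committee_id, hearing_id in hearing_committees:
        hearing = hearings[hearing_id]
        key = (committee_id, hearing["congress"])
        row = acc.setdefault(key, {"held": 0, "with_witnesses": 0,
                                   "witnesses": 0, "bills": set(), "dates": []})
        witnesses = hearing_witnesses.get(hearing_id, 0)
        row["held"] += 1
        row["with_witnesses"] += 1 if witnesses > 0 else 0
        row["witnesses"] += witnesses
        row["bills"] |= bills_by_hearing[hearing_id]
        row["dates"].append(hearing["meeting_date"])

    return {
        key: {
            "hearings_held": row["held"],
            "hearings_with_witnesses": row["with_witnesses"],
            "total_witnesses": row["witnesses"],
            "distinct_bills": len(row["bills"]),
            # ISO date strings sort chronologically.
            "first_date": min(row["dates"]),
            "last_date": max(row["dates"]),
        }
        for key, row in acc.items()
    }
```

Bills are collected into a set per (committee, congress) so a bill referenced in several hearings is counted once.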
Donor Twins donor_overlap_v1
Pairwise legislator-legislator donor-base similarity from
stat_donor_overlap. For each pair (A, B) per cycle/
chamber: Jaccard over the set of donor orgs and
cosine_amount over the per-org $ vector. Cosine
is the headline metric (amount-weighted, robust to one-shot small
donors); Jaccard is shown alongside.
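Both metrics are standard and can be sketched directly. The example assumes each legislator's donor base is a dict of org id to total dollars for one cycle; the function names are illustrative, not the names used in the compute job.

```python
import math


def jaccard(donors_a, donors_b):
    """Shared donor orgs divided by all donor orgs across both legislators."""
    set_a, set_b = set(donors_a), set(donors_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine_amount(donors_a, donors_b):
    """Cosine similarity of the per-org dollar vectors."""
    shared = donors_a.keys() & donors_b.keys()
    dot = sum(donors_a[org] * donors_b[org] for org in shared)
    norm_a = math.sqrt(sum(v * v for v in donors_a.values()))
    norm_b = math.sqrt(sum(v * v for v in donors_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up donor bases: one shared org ("y") out of three in total.
a = {"x": 3.0, "y": 4.0}
b = {"y": 4.0, "z": 3.0}
print(round(jaccard(a, b), 3), round(cosine_amount(a, b), 2))  # 0.333 0.64
```

In the made-up pair the single shared org carries most of the money on both sides, so cosine (0.64) reads much higher than Jaccard (0.333), which is the amount-weighting effect described above.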
Prism Insight (LLM narratives) *_summary_v1
Per-entity 1-2 sentence neutral summary written by Gemma-4-26B-Thinking on Labwizard, drawing on canonical PoliPrism evidence (sponsored bill titles for legislators; recent hearings + bill-references for committees; bill metadata + sponsor for bills). Distinct from the deterministic numeric Prism Factor composite — this is the narrative layer, NOT a rating of effectiveness or partisan characterization.
Legislators (legislator_summary_v1)
Source: entities_legislators.prism_summary. Evidence:
≤10 most-recent sponsored bill titles, party + state. ~1,100 active
members. Generated by score_legislators.py.
Committees (committee_summary_v1)
Source: entities_committees.prism_summary. Evidence:
≤8 recent hearing titles + ≤6 bills referenced in those hearings.
~200 federal committees. Generated by score_committees.py.
Quality + safety
- Confidence ≥ 0.55 required for write — model self-reports its own confidence; below the floor we skip rather than guess.
- Neutral-tone prompting: "describe what they work on, NOT whether they do it well; not partisan framing; not predictions". Per architecture rule #11.
- "AI-derived" badge always shown alongside the summary in the UI. Model + scored-at version stamps persisted.
- Idempotent re-scoring: jobs skip already-scored rows by default; pass
--rescore to force regeneration.
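The confidence floor and the skip-by-default behaviour amount to two small checks. The sketch below is illustrative only: the helper names are hypothetical and are not taken from score_legislators.py or score_committees.py.

```python
CONFIDENCE_FLOOR = 0.55  # minimum self-reported confidence required for a write


def should_score(existing_summary, rescore=False):
    """Send a row to the model only if it is unscored, or if a rescore is forced."""
    return rescore or existing_summary is None


def summary_to_write(summary, confidence):
    """Return the summary to persist, or None to skip the write."""
    return summary if confidence >= CONFIDENCE_FLOOR else None


# An already-scored row is skipped unless --rescore is passed.
assert should_score("existing text") is False
assert should_score("existing text", rescore=True) is True

# A low-confidence result is dropped, not written.
assert summary_to_write("Works on water policy.", 0.40) is None
```

Keeping the two checks separate means a forced rescore still respects the confidence floor.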
Versioning + reproducibility
Every row in every stat_* table carries a
method_version stamp (currently composite_v1,
money_v1, etc.) and a computed_at timestamp.
When a formula changes, the version bumps and we recompute. When the
upstream input HWM (high-water mark) hasn't moved, the recompute
auto-skips, keeping the pipeline cheap.
The stats_runs log captures every Labwizard execution
(input HWM, method version, row count, status) so any score on the
site is traceable to a specific run.
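The skip decision can be sketched as a comparison against the last logged run. This is an assumption-laden illustration: the run record is modelled as a dict with the fields named above, the "ok" status value is hypothetical, and the HWM is assumed to be any comparable value such as a row id or an ISO timestamp.

```python
def needs_recompute(last_run, current_hwm, current_version):
    """Decide whether a stat job has to run again.

    last_run: the most recent stats_runs entry for the job, or None.
    """
    if last_run is None:
        return True  # never computed
    if last_run["status"] != "ok":  # "ok" is an assumed status value
        return True  # last attempt did not finish cleanly
    if last_run["method_version"] != current_version:
        return True  # formula changed, version bumped
    # Same formula: recompute only if new input has arrived.
    return current_hwm > last_run["input_hwm"]


last = {"status": "ok", "method_version": "composite_v1", "input_hwm": 1200}
print(needs_recompute(last, 1200, "composite_v1"))  # False
print(needs_recompute(last, 1350, "composite_v1"))  # True
```

Checking the version before the HWM means a formula change always triggers a recompute, even when no new input has arrived.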
What we don't publish (yet)
- Bill-level Prism scores (LLM-derived legislative-impact scoring on individual bills). Coverage is currently sparse; the panel renders only on the small set with finalized scores. Backfill of all federal bills 118–119 is in progress.
- Donor families. Curated 50-family seed is live; algorithmic family resolution beyond the seed is an open project.
- Cross-domain composites (e.g. "donor → sponsor → vote alignment" indices). The graph exists; the published indices on top of it need deliberate design work and are not generated as side effects of the existing scoring.
Found something inconsistent or under-explained? Open an issue or hit Prism Studio to query the underlying records directly.