Objective measurement first
Every score starts with reproducible measurements taken straight from your code, your reviews, and your CI — numbers a careful engineer could verify by hand. AI is the layer that interprets, never the layer that decides.
Most engineering metrics quietly fail one of two tests: they don’t actually predict the outcome leaders care about, or they don’t hold up the first time someone looks behind the number. GitDash is designed so every dimension passes both. Here’s how.
Every claim points to the specific pull requests, comments, and code that drove it. If a leader doesn’t believe a number, they’re two clicks away from the underlying facts. No black boxes.
Every dimension is tuned against a labeled set built from your own senior engineers’ judgments — not a generic benchmark. If a dimension can’t earn enough agreement with your team, we don’t ship it on your dashboard.
Definitions evolve. Org charts change. Teams reorganize. When that happens, your history doesn’t get thrown away — the entire back-catalogue gets re-scored under the new view, so trends stay comparable.
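One way to picture re-scoring under a new view is the minimal sketch below. It assumes facts are stored per pull request and the team rollup is recomputed from them on demand. The record fields, the directory-based ownership maps, and the `rollup` helper are all hypothetical illustrations, not GitDash's actual schema or API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-PR facts. Nothing here is keyed by team, so the
# history survives any reorganization.
prs = [
    {"merged": date(2024, 1, 8), "path": "billing/api.py", "rework": True},
    {"merged": date(2024, 1, 10), "path": "billing/db.py", "rework": False},
    {"merged": date(2024, 1, 11), "path": "search/index.py", "rework": False},
]

def rollup(prs, ownership):
    """Rework rate per (team, ISO year, ISO week) under a given ownership map."""
    totals = defaultdict(lambda: [0, 0])  # key -> [reworked, total]
    for pr in prs:
        top_dir = pr["path"].split("/", 1)[0]
        team = ownership.get(top_dir, "unowned")
        iso = pr["merged"].isocalendar()
        key = (team, iso[0], iso[1])
        totals[key][0] += int(pr["rework"])
        totals[key][1] += 1
    return {key: reworked / total for key, (reworked, total) in totals.items()}

# Assumed org charts before and after a reorganization.
old_map = {"billing": "payments", "search": "discovery"}
new_map = {"billing": "platform", "search": "platform"}

before = rollup(prs, old_map)  # history as the old org saw it
after = rollup(prs, new_map)   # the same history, re-scored under the new view
```

Because the team label is applied at read time, changing the map re-scores the whole back-catalogue without touching the stored facts.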
A scoped GitHub App install, approved by your admin. Read-only access to the metadata we need — pull requests, reviews, comments, CI status. Short-lived, tenant-isolated credentials. Nothing your security team hasn’t reviewed a hundred times before.
GitHub App · Read-only · Tenant isolation
Before you see your first dashboard, GitDash silently catches up on a full year of pull-request history. Trends start populated, so you don’t spend a quarter waiting for data to accrue.
12-month history · Idempotent
Complexity, duplication, churn, test-to-code ratio, rework after the first review, architectural boundary violations, security findings already in your CI. Cheap, reproducible, fact-based numbers: the foundation every score is built on.
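To show what "verifiable by hand" can look like, here is a minimal sketch of one such measurement, test-to-code ratio, computed from the changed-line counts of a single PR. The path convention for spotting test files and the function names are assumptions made for illustration; they are not GitDash's actual rules.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

def is_test_file(path: str) -> bool:
    """Assumed convention: anything under a tests/ directory or named test_*."""
    p = PurePosixPath(path)
    return "tests" in p.parts or p.name.startswith("test_")

def compute_test_to_code_ratio(changes: Dict[str, int]) -> Optional[float]:
    """Changed test lines divided by changed non-test lines.

    `changes` maps file path -> lines changed in the PR.
    Returns None when the PR touches no non-test code.
    """
    test_lines = sum(n for path, n in changes.items() if is_test_file(path))
    code_lines = sum(n for path, n in changes.items() if not is_test_file(path))
    return test_lines / code_lines if code_lines else None

# Hypothetical PR: 150 lines of code changed, 60 lines of tests changed.
example_pr = {
    "src/billing/api.py": 120,
    "src/billing/models.py": 30,
    "tests/test_api.py": 60,
}
ratio = compute_test_to_code_ratio(example_pr)
```

The same inputs always give the same answer, and anyone can recount the lines in the diff to check it.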
Reproducible · Auditable
Reading every review comment across thousands of PRs is the one thing AI is actually good at here, and it’s where most leaders are flying blind. GitDash categorizes each comment (architecture, correctness, security, tests, style, docs), assigns a severity, and ties it back to the line of code that triggered it.
Architecture · Correctness · Security · Tests · Style
A reopened or rewritten PR isn’t a number; it’s a story. We line up the first-review state of a PR against what got merged, classify what changed (a redesign? a missing test? a security catch?), and link it back to the human comment that drove the rewrite. Rework rate finally has a "why."
Cause-attributed
Every dimension on your dashboard reports a confidence interval. If we’re not sure, we say so, and a human reviewer in your org gates high-impact or low-confidence scores before they publish. Every number links back to the PRs and comments that produced it. No black boxes.
Confidence per dimension · Audit trail
Code ownership maps every PR into the org structure your leadership already uses. You see week-over-week and quarter-over-quarter trends, never a single noisy snapshot, and the view changes cleanly as teams reorganize.
Time-series · Re-mappable
Every score is continuously compared against a benchmark labeled by your own senior engineers. Where the agreement is high, we publish. Where it’s not, we tune, or we drop the dimension. The dashboard you trust today is the dashboard we re-prove every week.
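As an illustration of what "agreement" could mean in practice, the sketch below computes Cohen's kappa between senior-engineer labels and dashboard labels, then applies a publish threshold. Cohen's kappa is a standard agreement statistic chosen here as an example. The text above does not say which measure GitDash uses, and the 0.6 threshold and the `decide` helper are assumptions.

```python
from collections import Counter

def cohens_kappa(gold, predicted):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label sequences must be non-empty and the same length")
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, predicted)) / n
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    # Agreement expected if both raters labeled independently at their own base rates.
    expected = sum(gold_counts[label] * pred_counts[label] for label in gold_counts) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

def decide(kappa, threshold=0.6):
    """Hypothetical gate: publish the dimension only above the agreement threshold."""
    return "publish" if kappa >= threshold else "tune_or_drop"

# Hypothetical labels for eight PRs on one dimension.
gold = ["high", "high", "low", "low", "high", "low", "high", "low"]
dash = ["high", "high", "low", "low", "high", "low", "low", "low"]

kappa = cohens_kappa(gold, dash)
decision = decide(kappa)
```

Re-running this comparison on each fresh batch of labels is what makes the check continuous rather than a one-time calibration.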
Human-labeled gold set · Tuned to your team
The published research on AI pull-request review is unambiguous: the best available models catch fewer than a third of the issues that experienced human reviewers flag. The numbers get worse, not better, when you feed the AI more context. Anyone selling you an "AI that grades your PRs" hasn’t read the literature, or hopes you haven’t.
That’s why GitDash never lets AI render a verdict. AI is the layer that reads the human conversation at scale — categorizing, summarizing, attributing — while objective measurement and your own senior engineers decide what "good" looks like. It’s the only honest answer we found that survives a CTO’s third question.
Every score is anchored in measurements you could verify by hand — not in an AI’s opinion. Numbers your senior engineers already trust become the spine of the dashboard.
We use AI to read thousands of review conversations the way a careful engineer would — categorizing, attributing, summarizing — not to decide whether a PR is good.
Every dimension carries a confidence score. Low-confidence or high-impact results route to a human in your org before they publish. The dashboard you see is the one your team would have approved.
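The routing rule described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the `Score` fields, the 0.8 confidence cutoff, and the `route` function are hypothetical, not GitDash's actual data model or thresholds.

```python
from dataclasses import dataclass

@dataclass
class Score:
    dimension: str
    value: float
    confidence: float  # assumed to be on a 0.0 to 1.0 scale
    high_impact: bool  # assumed to be flagged upstream

def route(score: Score, min_confidence: float = 0.8) -> str:
    """Send low-confidence or high-impact scores to a human before publishing."""
    if score.high_impact or score.confidence < min_confidence:
        return "human_review"
    return "publish"

# Hypothetical scores for one team in one week.
scores = [
    Score("rework_rate", 0.21, confidence=0.93, high_impact=False),
    Score("review_depth", 0.64, confidence=0.55, high_impact=False),
    Score("security_findings", 0.10, confidence=0.97, high_impact=True),
]
routes = {s.dimension: route(s) for s in scores}
```

Here only `rework_rate` publishes directly. The other two wait for a reviewer, one for low confidence and one for high impact.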
We’ll backfill 90 days of merged PRs and walk you through the rollup live.