Turn interview transcripts into a structured scorecard

A consistent, evidence-backed scorecard for every candidate, generated from the interview transcript against the same rubric every time, so panel decisions stop being a clash of gut feelings.

What you'll have when you're done

A scorecard per candidate that scores the competencies you actually care about, cites a verbatim quote for each, and looks the same across every interviewer and every candidate. Instead of four people remembering four different things, you get one comparable record built from what was actually said. Hiring decisions get faster and more honest, and you have evidence behind every rating. The human still decides; the AI makes the inputs consistent.

Panel interviews produce four sets of notes that never agree

You run a candidate through four interviewers and get back four blurry impressions: one loved them, one had "a feeling," nobody wrote down why, and none of it lines up into a decision. The notes are inconsistent because the process is: each person listened for different things and recorded them differently. I have sat in debriefs where the loudest, most confident voice won, and then watched us regret the hire six months later, not because the data was wrong but because there was no data, just whoever argued hardest. The expensive part was never the disagreement. It was that we had nothing to disagree about.

Fix the consistency and you fix the decision. If every interview is transcribed and run through the same rubric, you get scorecards you can actually compare. This is the Granola meeting-transcript pipeline applied to hiring: the call gets captured automatically, and a saved prompt turns it into a structured evaluation. The catch is legal and ethical: recording people requires consent, so that comes first.

What you need first

Granola (or your meeting-notes tool) to transcribe interviews, and consent to record. This is not optional: Illinois's Artificial Intelligence Video Interview Act requires notice and consent when AI is used to analyze video interviews, and two-party-consent states require every participant's consent before any recording. Tell candidates that you record and why.
A competency rubric: the 3-5 things you score every candidate on (e.g., communication, judgment, ownership), defined as job-related, not "culture fit."
A Granola Recipe (its saved-prompt feature) or a Claude Project on a business plan to run the rubric consistently.

Step-by-step

Step 1: Get consent, then capture the interview

Before anything, tell candidates the interview is recorded and transcribed for evaluation, and get their consent. Beyond the legal requirement, it is the decent thing and it sets the tone. Then let Granola capture the conversation without a bot awkwardly joining the call.

Step 2: Build the rubric once, as a Recipe

Define your scorecard as a reusable Granola Recipe (or a Project instruction). The Recipe is what enforces consistency, because every interview gets scored the same way:

Score this interview transcript 1-5 on each: Communication, Judgment under
ambiguity, Ownership. For each score, cite a verbatim quote from the transcript
as evidence. List any red flags with the quote that prompted them. Score ONLY
these competencies. Do not infer demographics, personality, or culture fit.
Do not recommend hire/no-hire; that is the panel's call.
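
If you keep the rubric outside the tool as well, one way to stop the wording from drifting is to generate the prompt from a single list of competencies. This is a minimal sketch, not a Granola or Claude API: `COMPETENCIES` and `build_scoring_prompt` are hypothetical names, and the output is just text you paste into your Recipe or Project instruction.

```python
# Hypothetical helper: renders the one shared scoring prompt.
# Nothing here calls Granola or Claude; it only builds a string.
COMPETENCIES = ["Communication", "Judgment under ambiguity", "Ownership"]


def build_scoring_prompt(competencies: list[str]) -> str:
    """Return the scoring prompt for a rubric of three to five competencies."""
    if not 3 <= len(competencies) <= 5:
        raise ValueError("keep the rubric to 3-5 competencies")
    names = ", ".join(competencies)
    return (
        f"Score this interview transcript 1-5 on each: {names}. "
        "For each score, cite a verbatim quote from the transcript as evidence. "
        "List any red flags with the quote that prompted them. "
        "Score ONLY these competencies. "
        "Do not infer demographics, personality, or culture fit. "
        "Do not recommend hire/no-hire; that is the panel's call."
    )


prompt = build_scoring_prompt(COMPETENCIES)
print(prompt)
```

Changing the role then means changing the list, and the guardrail sentences travel with it unchanged.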

Step 3: Run it on every interview

After each interview, run the Recipe on the transcript. You get a scorecard with scores and the exact quotes behind them. Run it across multiple candidates and you can ask the tool to lay them side by side on the same competencies, an apples-to-apples comparison built from evidence, not memory.
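
If you export the scores rather than asking the tool for the comparison, the side-by-side view is a small transformation. The sketch below assumes you have copied each candidate's scores into a plain dictionary; `side_by_side` is a hypothetical helper, and Candidate A's numbers are invented for illustration.

```python
# Hypothetical helper: lays candidates side by side on the same competencies.
COMPETENCIES = ["Communication", "Judgment under ambiguity", "Ownership"]


def side_by_side(scorecards: dict[str, dict[str, int]], competencies: list[str]) -> str:
    """Return a tab-separated table with one row per competency, one column per candidate."""
    rows = ["\t".join(["Competency", *scorecards])]
    for competency in competencies:
        cells = []
        for scores in scorecards.values():
            score = scores.get(competency)
            # A missing score is shown as "n/a" instead of being silently dropped.
            cells.append(f"{score}/5" if score is not None else "n/a")
        rows.append("\t".join([competency, *cells]))
    return "\n".join(rows)


scorecards = {
    "Candidate A": {"Communication": 3, "Judgment under ambiguity": 4, "Ownership": 4},  # invented
    "Candidate C": {"Communication": 4, "Judgment under ambiguity": 5, "Ownership": 3},
}
table = side_by_side(scorecards, COMPETENCIES)
print(table)
```

The table only compares numbers. Keep the quotes attached to each score somewhere the panel can reach them, because the comparison is only as good as the evidence behind it.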

Here is an illustrative scorecard for one candidate from one interviewer:

Candidate C · Interviewer: VP Eng · Role: Senior PM

Competency	Score	Evidence
Communication	4/5	"I'd frame it as three bets, rank them by reversibility, and ship the cheapest test first."
Judgment under ambiguity	5/5	"We didn't have the data, so I ran a one-week painted-door test before committing eng time."
Ownership	3/5	"The launch slipped, and the design team was behind." (deflected; no first-person fix)

Red flags: On Ownership, attributed the slip to another team without naming their own corrective action. Worth probing in a follow-up.

The scores are not the product. The quotes are. A "3/5 on Ownership" you could argue with; "the design team was behind, no first-person fix" is a specific thing you can re-read and judge for yourself.
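
If you store scorecards anywhere outside the tool, you can make that rule structural: a rating without a quote is not a rating. This is a minimal sketch under that assumption; the `Rating` class and its field names are hypothetical, not something Granola produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """One competency score plus the verbatim quote that justifies it (hypothetical shape)."""

    competency: str
    score: int
    quote: str
    note: str = ""

    def __post_init__(self) -> None:
        if not 1 <= self.score <= 5:
            raise ValueError("score must be between 1 and 5")
        if not self.quote.strip():
            # Refuse a score that arrives without its evidence.
            raise ValueError(f"{self.competency}: a score needs a verbatim quote")


ownership = Rating(
    competency="Ownership",
    score=3,
    quote="The launch slipped, and the design team was behind.",
    note="deflected; no first-person fix",
)
print(f'{ownership.competency} {ownership.score}/5: "{ownership.quote}"')
```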

Step 4: The panel decides, with the evidence in front of them

Bring the scorecards to the debrief. Now the conversation is "candidate A scored higher on judgment, here is the quote, do we agree?" instead of "I just liked them more." The AI made the inputs consistent and evidence-backed; the panel still makes the human decision. Never let the scorecard auto-select the hire.

Watch what that does to the room. Without the scorecards, the debrief is: "I really liked C." "Hm, I had a worry but I can't name it." Forty minutes of vibes, decided by whoever is most confident. With them, the VP Eng's 3/5 on Ownership and its quote sit right next to another interviewer's 4/5, and the disagreement gets specific and fast: "I scored Ownership higher because later they said they rebuilt the timeline themselves. Did you catch that line?" You either reconcile in five minutes or you have found the exact thing to probe in a follow-up call. The scorecards do not make the decision. They make the disagreement about the evidence instead of about who is most sure of themselves.
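
To walk into the debrief already knowing where the panel splits, you can compare interviewers' scores for the same candidate. A rough sketch, assuming the scores have been copied into dictionaries: `disagreements` is a hypothetical helper, and Interviewer 2's Communication and Judgment scores are invented (only the 4/5 on Ownership comes from the example above).

```python
def disagreements(
    by_interviewer: dict[str, dict[str, int]], min_gap: int = 1
) -> dict[str, dict[str, int]]:
    """Return competencies where the highest and lowest scores differ by at least min_gap."""
    competencies = sorted({c for scores in by_interviewer.values() for c in scores})
    flagged = {}
    for competency in competencies:
        given = {
            who: scores[competency]
            for who, scores in by_interviewer.items()
            if competency in scores
        }
        if len(given) >= 2 and max(given.values()) - min(given.values()) >= min_gap:
            flagged[competency] = given
    return flagged


candidate_c = {
    "VP Eng": {"Communication": 4, "Judgment under ambiguity": 5, "Ownership": 3},
    "Interviewer 2": {"Communication": 4, "Judgment under ambiguity": 5, "Ownership": 4},
}
to_discuss = disagreements(candidate_c)
print(to_discuss)  # {'Ownership': {'VP Eng': 3, 'Interviewer 2': 4}}
```

This produces an agenda, not a verdict: it tells the panel which quotes to read together first.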

Step 5: Spot-check the quotes

Transcription is not perfect, especially with accents, names, or technical terms, and a misheard line can distort a score. Before a score swings a decision, confirm the cited quote against the transcript. The evidence-citation step (Step 2) makes this a quick check rather than a re-listen. A concrete example of why it matters: a transcript once rendered a candidate saying "I owned the migration" as "I joined the migration," and the model docked them a point on Ownership for sounding like a bystander. One word, one score, potentially one wrong rejection. You only catch that by reading the cited line, which takes seconds when the quote is sitting right there in the scorecard.
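
Part of this check can be automated: confirming that each cited quote really appears in the transcript, so a paraphrased or invented quote gets flagged. The sketch below assumes you have the transcript as plain text; `unverified_quotes` is a hypothetical helper and the paraphrased quote is invented. It cannot catch the "owned" versus "joined" case, because there the transcript itself is wrong. That still needs a human reading the line, or the audio.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace before comparing."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(quotes: list[str], transcript: str) -> list[str]:
    """Return the quotes that do not appear verbatim in the transcript."""
    haystack = normalize(transcript)
    return [quote for quote in quotes if normalize(quote) not in haystack]


transcript = (
    "We didn't have the data, so I ran a one-week painted-door test "
    "before committing eng time. The launch slipped, and the design team was behind."
)
cited = [
    "I ran a one-week painted-door test before committing eng time.",
    "The launch slipped because design was behind.",  # invented paraphrase, not in transcript
]
to_review = unverified_quotes(cited, transcript)
print(to_review)  # ['The launch slipped because design was behind.']
```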

How you'll know it's working

Debriefs get shorter and sharper, because everyone is looking at the same evidence. Disagreements become productive ("I read that quote differently") instead of vibes-based. And you can defend any hiring decision with a consistent, documented process, which is exactly what you want if a rejected candidate ever questions it: the same rubric, applied to everyone, with the verbatim quotes that drove each score, is a far stronger answer than "the panel felt candidate B was a better fit."

When it breaks

Scores feel inconsistent across candidates. Someone ran a different prompt. The whole point is one Recipe for everyone; lock it down.
The transcript is garbled. Accents and cross-talk hurt accuracy. Verify quotes before trusting scores, and consider re-recording key segments.
The AI recommends a hire. Your prompt slipped. It scores competencies with evidence; the panel decides. Restate that.
A candidate objects to recording. Honor it and do not record. Have a manual-notes fallback for anyone who declines, and never penalize the decline.
It scored "culture fit" or inferred a trait. Cut it. Score only the defined, job-related competencies (see adverse impact for why).
Every candidate scores a 4 or a 5. The rubric has no teeth. Anchor each competency with what a 2 versus a 4 actually looks like ("2 = states an opinion; 4 = states it and names the tradeoff they weighed"), and the scores start to spread.
One interviewer scores consistently harder than the rest. That is calibration drift, not a tool fault. Have the panel score the same practice transcript once and compare; the scorecard makes the gap visible, which is the first step to closing it.
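
The last two failure modes are easy to measure once scores are in one place. This is a rough sketch with invented numbers: `share_of_high_scores` shows how bunched the scores are at 4 and 5, and `interviewer_offsets` shows how far each interviewer sits from the panel average on the same practice transcript. Both names are hypothetical.

```python
from statistics import mean


def share_of_high_scores(scores: list[int], high: int = 4) -> float:
    """Fraction of scores at or above `high`; a value near 1.0 suggests the rubric has no teeth."""
    return sum(score >= high for score in scores) / len(scores)


def interviewer_offsets(practice_scores: dict[str, list[int]]) -> dict[str, float]:
    """Each interviewer's mean on the same practice transcript, minus the panel mean."""
    means = {who: mean(scores) for who, scores in practice_scores.items()}
    panel_mean = mean(list(means.values()))
    return {who: round(m - panel_mean, 2) for who, m in means.items()}


# Invented scores: every candidate rated 4 or 5 on one competency.
ceiling = share_of_high_scores([4, 5, 4, 4, 5])

# Invented scores: three interviewers rating the same practice transcript.
offsets = interviewer_offsets({
    "Interviewer 1": [4, 4, 4],
    "Interviewer 2": [3, 3, 3],
    "Interviewer 3": [5, 5, 5],
})
print(ceiling, offsets)
```

A negative offset marks the harder scorer. Neither number fixes anything by itself; they tell you whether to rewrite the anchors or run a calibration session.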

Make it yours

Keep it to three to five competencies, no more, or the scores stop being comparable across candidates. Swap the competencies to fit the role: an IC engineer might be scored on technical depth and collaboration, a sales leader on pipeline rigor and coaching. Define the rubric once per role level and reuse it for everyone in that loop, which is the whole source of the consistency.

Where this fits in your harness

This is the middle of the hiring pipeline: candidates who clear the resume screen get interviewed and scored here. It is also the same Granola pipeline that powers pre-meeting briefs and customer-call follow-ups elsewhere on this site: one capture layer, many uses. A sharp job description upstream means the candidates you are scoring were the right ones to interview in the first place.