    When AI Does the Work, Who Owns the Outcome?
    AI & HR Technology
    Leadership & Strategy

    By Graham Thornton


    AI makes work faster. Without validation, it makes failure faster. What talent teams must change as automation scales.

    Late last year, Deloitte agreed to partially refund the Australian government after delivering a $440,000 report that included fake academic citations, fabricated court quotes, and nonexistent references. The culprit wasn’t malice or incompetence. It was AI hallucinations that nobody caught before the work went out the door.

    People like to say that 20–30% of AI outputs are simply wrong. Whatever the real number, the risk is not abstract: this time the errors surfaced in a government-commissioned report from one of the world’s most respected consulting firms.

    This could have been any firm. Deloitte just happened to be the one that got caught publicly. Versions of this are happening everywhere right now: consulting firms, law practices, agencies, internal strategy teams, and recruiting departments. Someone is using AI to draft analysis, write reports, screen candidates, or generate outreach. Someone else is skimming it, thinking “looks good enough,” and shipping it.

    The problem isn’t that people are using AI. Everyone should be experimenting with AI. The problem is that almost nobody has figured out what quality assurance actually looks like when a machine is doing the first draft.


    The Old Model Is Breaking Down

    The traditional professional services model worked like this: junior people did the grunt work (research, data compilation, slide formatting, resume screening). Senior people reviewed it, added strategic insight, and made sure it didn’t say anything stupid. Managers and partners owned the client relationship and the outcome.

    That model assumed juniors would eventually become seniors by learning judgment through repetition. Do enough slide decks. Screen enough resumes. Review enough candidates. Eventually, you internalize what “good” looks like.

    If AI is doing the grunt work now, the obvious question is: how does judgment develop?

    There’s also an economic question most organizations are avoiding. If AI does the work faster, but you still need the same level of human review to ensure quality, where are the savings?

    In Deloitte’s case, AI presumably sped up delivery. But someone still needed to validate the work. They didn’t. Or they did it poorly. The result was a refund, reputational damage, and scrutiny that likely cost far more than $440K.

    The trap is assuming faster automatically means cheaper. If faster also means “needs more intensive review,” you haven’t eliminated cost. You’ve just moved it.

    And if you skip the review to capture the savings, you eventually end up writing checks and issuing apologies.


    What Quality Assurance Actually Looks Like Now

    Most organizations are still running QA processes designed for human-generated work. They’re checking for typos, formatting errors, and math mistakes—not hallucinations or fabricated information.

    The questions they should be asking look different:

    • Is this conclusion logical, or just confident-sounding?

    • Would a real expert in this field agree with it?

    • Did a human actually verify this data source?

    • Are these citations real?

    • Does this candidate actually match the job requirements, or just the keywords?

    AI doesn’t make random mistakes. It makes confident mistakes.

    It fabricates academic paper titles that sound plausible. It cites court cases that feel real. It flags candidates as qualified based on pattern matching that has nothing to do with whether they can actually do the job.

    And if you’re not explicitly looking for those problems, you won’t catch them. Your brain skims right past them because everything looks right.

    We’ve written before about “workslop”: AI-generated work that appears fine at first glance but lacks substance. A bad recruiting email is annoying. A $440K report with fake citations is a crisis. Both come from the same root cause: speed without validation.


    This Is Already Happening in Talent Acquisition

    If you think this is just a consulting problem, you’re missing what’s happening inside TA teams right now.

    AI sourcing tools are pulling candidates based on keyword matches that don’t reflect actual qualifications. Recruiters are reaching out to hundreds of “matches” who aren’t remotely qualified because nobody validated what the system considers a match.

    AI-generated outreach sounds generic because everyone is using similar prompts. Candidates are getting nearly identical “personalized” messages from five different employers. Employer brand erosion is happening quietly, message by message.

    AI screening tools are making pass/fail decisions based on pattern recognition. Good candidates are being filtered out systematically. Nobody’s checking why, because the system is “working” — meaning it’s moving fast.

    Chatbots are answering candidate questions with confident but incorrect information about benefits, policies, or job requirements. Candidates make decisions based on that information. The fallout shows up months later.

    Most TA teams are measuring:

    • Number of candidates sourced

    • Outreach open rates

    • Time to fill

    • Cost per hire

    What they’re not measuring:

    • Quality of candidates sourced beyond “did they get hired”

    • Whether AI outreach is damaging employer brand perception

    • Whether AI screening is systematically biasing against good candidates

    • Whether automated responses are accurate

    • Whether tools are actually finding people who can do the job
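
    To make that concrete, below is a minimal sketch, in Python, of one way to look past activity metrics: compare how far candidates from each sourcing channel actually get in the pipeline. It assumes candidate records exported as plain dictionaries; the field names and stage labels here are hypothetical placeholders, not anyone’s actual ATS schema.

```python
# Minimal sketch: measure sourcing quality beyond activity counts.
# Field names ("source", "stage") and stage labels are hypothetical;
# adapt them to whatever your ATS export actually contains.
from collections import defaultdict

# Ordered pipeline stages; a candidate's "stage" is the furthest one reached.
STAGES = ["sourced", "screened", "interviewed", "offered", "hired"]

def pass_through_by_source(candidates):
    """For each sourcing channel, return the share of candidates who
    reached each stage. Weak sourced-to-interview conversion on an
    AI-sourced channel is an early signal that the 'matches' aren't matches."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for c in candidates:
        source = c["source"]              # e.g. "ai_sourcing_tool", "referral"
        reached = STAGES.index(c["stage"])
        totals[source] += 1
        for stage in STAGES[: reached + 1]:
            counts[source][stage] += 1
    return {
        src: {stage: counts[src][stage] / totals[src] for stage in STAGES}
        for src in totals
    }

if __name__ == "__main__":
    sample = [
        {"source": "ai_sourcing_tool", "stage": "sourced"},
        {"source": "ai_sourcing_tool", "stage": "screened"},
        {"source": "referral", "stage": "interviewed"},
        {"source": "referral", "stage": "hired"},
    ]
    for src, rates in pass_through_by_source(sample).items():
        print(src, {stage: round(rate, 2) for stage, rate in rates.items()})
```

    If the AI-sourced channel generates plenty of “sourced” and very little “interviewed,” the tool is producing activity, not candidates.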


    The Diagnostic Gap

    This conversation happens constantly:

    Us: “How do you know your AI sourcing tool is finding qualified people?”

    Client: “Our recruiters review candidates before reaching out.”

    Us: “How much time are they spending per candidate?”

    Client: “They’re looking at hundreds of profiles a day… maybe 30 seconds each?”

    Us: “So they’re not actually reviewing them. They’re speed-checking that a human exists and clicking approve.”

    Client: “Yeah.”

    That’s not quality assurance. That’s the illusion of human oversight.

    We’ve made versions of this mistake ourselves earlier in our careers. Anyone optimizing for speed eventually does—until something breaks.

    This is why our AI & Talent Technology Consulting work starts with diagnostics, not implementation. Before you deploy tools, you need to answer:

    • What work actually requires human judgment?

    • What could go wrong if the technology makes mistakes?

    • How would we know if it did?

    • What does “good” actually look like, and can we measure it?

    Most companies skip straight to implementation. They buy the tool, turn it on, track activity metrics, and hope for the best. Then they’re surprised when quality drops or candidate experience suffers.


    What Actually Needs to Change

    You can’t bolt AI onto existing processes and hope it works. You have to redesign the work.

    Define What Requires Human Judgment

    Not everything needs human validation. But you need to be explicit about what does.

    • Low stakes, high volume, clear success metrics: automation with light monitoring

    • High stakes, candidate-facing, nuanced interpretation: human validation mandatory

    • Novel situations, strategic decisions, reputational risk: human-led, AI-supported

    The Deloitte report was high stakes and client-facing. It needed human validation. It didn’t get adequate validation. That was the failure.
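
    One way to keep that triage honest is to write it down as an explicit policy rather than a shared understanding. The sketch below is illustrative only: the task attributes and tier names are assumptions, and your own risk categories will look different.

```python
# Illustrative sketch of an explicit review policy for automated work.
# The attributes and tier names are assumptions, not a standard.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    high_stakes: bool       # reputational, legal, or client-facing risk
    candidate_facing: bool  # a candidate sees the output directly
    novel: bool             # no precedent or established playbook

def required_oversight(task: Task) -> str:
    """Map a task to the minimum level of human involvement it requires."""
    if task.novel:
        return "human-led, AI-supported"
    if task.high_stakes or task.candidate_facing:
        return "human validation mandatory"
    return "automation with light monitoring"

# A client deliverable like the Deloitte report is high stakes, so it can
# never ship on "light monitoring".
print(required_oversight(Task("client report", high_stakes=True,
                              candidate_facing=False, novel=False)))
```

    The value isn’t the code. It’s that the policy exists somewhere other than people’s heads, and someone can be held to it.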


    Build Validation Workflows, Not Just Automation Workflows

    Every automated process needs a corresponding QA workflow. Not a checkbox. A real system.

    Ask:

    • What could go wrong here?

    • How would we detect it?

    • Who owns catching it?

    • What does “good” mean, and can we measure it?

    In recruiting, that looks like:

    • Random sampling of AI-sourced candidates using a quality rubric

    • A/B testing AI-generated outreach against human-written for response quality and brand impact

    • Regular audits of screening decisions with demographic breakdowns

    • Candidate feedback on automated touchpoints

    • Periodic chatbot transcript reviews to catch misinformation

    You can’t assume technology is working. You have to check.
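
    As an illustration, here is a minimal sketch, in Python, of what the sampling and audit checks above might look like, assuming candidate and screening records arrive as plain dictionaries. Every field name, rubric item, and sample size is a hypothetical placeholder.

```python
# Minimal sketch of two validation checks: random-sample rubric review of
# AI-sourced candidates, and screening pass rates broken down by group.
# Field names and rubric items are hypothetical placeholders.
import random

RUBRIC = ["meets_core_requirements", "relevant_experience", "location_ok"]

def sample_for_review(ai_sourced, k=25, seed=None):
    """Pull a random sample of AI-sourced candidates for manual rubric review."""
    rng = random.Random(seed)
    return rng.sample(ai_sourced, min(k, len(ai_sourced)))

def rubric_pass_rate(reviews):
    """reviews: dicts mapping each rubric criterion to True/False, filled in
    by a human reviewer. Returns the share of sampled candidates who met
    every criterion."""
    if not reviews:
        return 0.0
    passed = sum(all(review[c] for c in RUBRIC) for review in reviews)
    return passed / len(reviews)

def screening_pass_rates(decisions, group_key="self_reported_group"):
    """Pass rate of the automated screen per demographic group, as raw input
    for the 'regular audits with demographic breakdowns' step above."""
    totals, passes = {}, {}
    for d in decisions:
        group = d[group_key]
        totals[group] = totals.get(group, 0) + 1
        passes[group] = passes.get(group, 0) + (1 if d["passed_screen"] else 0)
    return {group: passes[group] / totals[group] for group in totals}
```

    None of this is sophisticated. The point is that each check is a small, repeatable routine that someone owns and reviews on a schedule, with the authority to pause the tool when the numbers look wrong.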


    Develop New Skills Alongside New Tools

    When AI handles execution, different skills matter:

    • Pattern recognition

    • Domain expertise

    • Quality design

    • Systems thinking

    These aren’t skills you get from vendor training. They’re judgment developed intentionally over time.


    Who Will Win

    The old competitive advantage in TA was access to great recruiters who could do the work.

    The new advantage is recruiters who know what technology can and can’t do—and have the judgment to tell the difference.

    That requires more than buying tools. It requires:

    • Rethinking how work gets done

    • Redefining quality

    • Building validation into automation

    • Developing judgment that doesn’t scale like code

    At Talivity, we help organizations implement AI and talent technology with quality built in from the start. With metrics that go beyond speed and volume. With validation workflows designed to catch problems early.

    Because the alternative is familiar: expensive mistakes, quietly compounding, until they’re impossible to ignore.


    The Bottom Line

    The Deloitte refund wasn’t really about AI. It was about optimizing for speed without redesigning for quality.

    Organizations that succeed with AI over the next five years won’t be the ones with the most tools. They’ll be the ones who decided, explicitly, who owns the outcome after technology does the work.

    In talent acquisition, the damage rarely makes headlines. It shows up as declining candidate quality, eroding employer brand, rising cost per hire, and frustrated recruiters who can’t explain what’s broken.

    Those problems compound quietly—until leadership finally asks why nothing seems to be working.

    This is a judgment problem, not a technology problem. And judgment doesn’t scale the way code does.

    If you’re implementing AI or automation in talent acquisition and aren’t confident you can validate whether it’s actually working, we should talk.