Can't I just tell AI not to make things up?

You can sharpen it: write custom instructions that tell it to flag guesses and push back, or ask it to rate its own work on defensibility. That helps. But you can't install permanent trustworthiness; it still predicts language, and it still drifts toward your assumptions. You have to keep verifying at the source.

How does this fit into what AI can't replace in leadership?

Calibrated judgment is one of the five capacities in the Agentic Leadership Framework — the one that turns a confident answer into a trustworthy decision. The companion piece on keeping your own point of view when using AI covers what comes after the call: committing to a direction and owning it.

How Do You Know When to Trust What AI Gives You?

Q: How do you know when to trust what AI gives you?

By checking it against what the model doesn't have: your own experience and the original source. AI sounds authoritative whether or not it's right, because it predicts plausible language rather than verifying facts. Trust the answer once your judgment — and the source — confirm it, not because it reads well.

You know when to trust what AI gives you by checking it against the one thing the model doesn't have — your own experience. AI returns confident, polished answers whether or not they're correct; it's built to predict the language you want to hear. Trust comes from your judgment, not its tone. Verify the source, weigh it against the stakes, then decide.

If you've spent fifteen or twenty years becoming the person others come to when it gets complicated, this one's for you. You've started to wonder, maybe, where AI leaves all of that. Stay with me — the answer isn't the one you're bracing for.

AI will confidently walk you off a cliff

I was prepping a leadership seminar and asked AI for an exercise to run with a room of executives. It gave me a good one — a small-group discussion on a sensitive topic, the kind that lands hard when it works. On paper, solid. What the model couldn't know is that this particular room held senior leaders sitting three feet from their own direct reports. A power imbalance in every pairing. Run that exercise in that room and nobody says the true thing — you've just lit a fire and asked people to stand in it.

I'm Dr. Natasha Ganem, and I've spent nearly twenty years in rooms like that one. AI didn't see the trap. I did. Not because I'm smarter than the machine — most days it writes a cleaner seminar description than I do — but because it doesn't know what it doesn't know. Someone once called it a brilliant six-year-old, and that's exactly right. Gifted. Fast. Missing the part where it's actually been in the room.

Here's the line I've started using. AI will go down any rabbit hole with you, tell you it's a great idea, and keep going. So you have to know the topography. You have to be the one who can tell a path from a drop.

AI will politely walk you off a cliff.

Sounding done is not the same as being done

A large language model is a language prediction engine — it assembles the words most likely to sound right. And it is very, very good at sounding right. Authoritative. Cited. Like it did the research, ran the analysis, checked the source. Most of the time, it didn't.

A colleague of mine at Emory told me about a graduate student who submitted a paper to a journal. Good science: real experiment, real analysis, real write-up. It came back rejected, with a note: we can tell AI was used, because some of your citations don't exist. The model had invented authors, titles, years, and journals, producing citations that looked exactly like something those scholars would have written. They just never wrote them. And where the citations were real, it had pulled the wrong numbers out of the actual papers and reported them as fact. The student had pored over that paper for months to get it right. He didn't know what he didn't know, and that's the whole trap. The advisor, forty-five years in the field, needed one look.

And this isn't one nervous student. When a peer-reviewed Stanford study stress-tested the AI tools built specifically for legal research (the careful ones, the purpose-built ones), those tools still fabricated or misstated sources on roughly one in six queries, and on as many as one in three. General-purpose models like GPT-4 did worse still, hallucinating between 58% and 82% of the time on legal questions. The polish is reliable. The accuracy underneath it is not.

Now move that into your world. You're building a pitch deck, you ask AI for the market statistics that frame the problem your product solves, and you walk in and present them. If you didn't go check the sources, you might be handing bad information to the exact people you need to trust you. This is what people mean when they say AI hallucinates — it fabricates, and it does it confidently. The fix isn't a better tool. It's finding the document and reading it yourself.

Trust is something you give on purpose

You can make AI more honest, and you should. In its custom instructions you can tell it: don't fabricate, don't guess silently, say so when you're unsure, push back when I'm headed down the wrong path, and don't tell me it's good work unless it is — and if it is, tell me why.

You can also pressure-test any single answer. It hands you a draft that reads like gospel. Ask it to rate its own work, one to a hundred, on defensibility — and it says sixty-five. Wait — what? Ask it again on accuracy, and it says seventy-five. Yeah, it admits, I was making some of that up. That's the most useful trick I know, and it works for one reason.

The confidence in the writing was never a measure of the truth underneath it.

But here's the part that doesn't go away: you can't install permanent trustworthiness. The model remembers who you are and quietly funnels you back toward your own assumptions, a hall of mirrors, until you tell it to step out of the role. Ask it to answer as the CFO, not the CMO, and the whole thing changes. Trust here isn't a setting you flip once. It's calibrated judgment, the capacity to weigh a confident answer against real stakes, applied again, every single time.

This is the moment your experience got more valuable, not less

The thing that caught those fake citations wasn't a plugin. It was a person who'd read enough real papers to feel the wrong one. That's the work of a career — judgment built rep by rep, by being the one who's accountable for the call. Seasoned people recognize patterns they can't always explain, because they've seen the situation before. Daniel Kahneman spent his career studying where human intuition fails. In Thinking, Fast and Slow, he landed on what makes it trustworthy: a world regular enough to learn from, and enough reps in it with honest feedback. Not age. Reps.

Here's the asymmetry that matters. AI has never had a judgment proven wrong by reality and felt the cost — it's trained on patterns in text, not on consequences. It's good at guessing how to read a room; it has never actually sat in one. This isn't nostalgia for the grey-haired expert — judgment is earned, not aged, and anyone logging honest reps is building it. But if that's you — if you've been the one who knows — hear this clearly: the machine that made you think your job was already gone? It didn't make your experience obsolete. It made it the most valuable thing in the building.

For a while it felt like the people slinging fast, polished AI work were getting ahead while your judgment counted for nothing. Watch what happens next. AI can produce the beautiful, confident draft — but someone still has to know whether it's true, whether it'll work in this room, whether it's even the right question. That someone is you. AI amplifies human potential. Humans amplify AI's. You're not behind this thing. You're the reason it works at all.

Key Takeaways

AI returns confident, polished output whether or not it's correct — it predicts the language you want to hear, not the language that's true. Tone is not evidence.
The skill that catches the error is calibrated judgment: earned rep by rep over a career of being accountable for the call, and the one thing a language model can't generate for you.
That makes experienced professionals more valuable in the AI era, not less. AI amplifies human potential; humans amplify AI's — and your judgment is what makes the tool worth trusting.

Calibrated judgment is one of the five capacities AI can't replace — the through-line of what AI can't replace in leadership. The professionals who win this era won't be the ones who learned the tools fastest. They'll be the ones whose judgment was sharp enough to know when the tool was wrong.

If you've spent twenty years becoming the person others come to when it gets complicated, that judgment didn't just survive AI. It became the most valuable thing you bring to it.