Why I want my AI projects blessed by Jesuits

The hardest problems in AI aren't in the code. They're problems of judgment. And someone already solved them.

Jun 30, 2026

In the early 18th century, Maharaja Jai Singh II built five Jantar Mantar complexes. Astronomical observatories with no lenses, no electronics, and no moving parts. His astronomers used them, in part, to cast astrological birth charts more accurately.

Earlier this year, this centuries-old approach to astronomy and astrology solved a problem I had while trying to fix a dashboard.

The dashboard’s mine. It sits on top of a system I’ll come back to later - an AI pipeline that turns customer signals into executable work. The dashboard tells me if I can still trust the machine’s judgement.

It had a problem every dashboard faces. It couldn’t see its own drift.

If the AI slowly gets worse, and my own instinct for “good enough” slides along with it, the dashboard will show green lights all the way down. It’s not lying, but it’s quietly going wrong.

The astronomers at the Jantar Mantar solved the problem centuries ago.

They knew their instruments drifted. They knew the assumptions built into their calendar would, over time, pull away from the actual stars in the heavens.

They also didn’t trust the running system to catch its own decay. They re-anchored, on a schedule, against a fixed external baseline. The mathematical correction is called Ayanamsha. It anchors their astrology to the fixed stars and accounts for the slow precession of the earth’s axis.

I didn’t learn this solution from a digital product blog about dashboards.

But I did get it on purpose. By colliding the problem in front of me with a domain that had nothing to do with it. And once I started doing that deliberately, I couldn’t stop noticing a pattern. The hardest problems in the newest technology - AI - are often old problems wearing new clothes. And the ancient answers are often better than the ones we’re busy reinventing.

The library five years shallow

Most people building with AI are reasoning from five years of data at most. Last quarter’s framework, the most recent SaaS playbook, or the pattern that worked last time.

That’s not a knock. It’s a fast-moving field. Five years ago feels like forever. But it means we’re solving ancient problems with a shallow frame of reference.

And it’s not just a story about code. That’s the one we’re telling the most. AI writes a function. AI reviews a pull request. AI ships a feature. That’s real, interesting, and only a slice of the whole.

I lead product. When I’m using AI it’s not primarily about writing code. I’m triaging customer signals, internal data, and predictions about the future. And using that data to recommend how to route work and draft specs.

How to decide what’s worth building.

Those are judgement calls, and I need to know how much of that judgement I feel comfortable trusting.

Is something ready? Calibrated? True? Is that prediction honest?

These aren’t engineering problems. They’re the problems guilds, courts, councils, and churches have stress-tested over centuries - and we can use the answers they wrote down.

So I went looking for the answers on purpose. This is how I did that, and what I found when I started building with them.

Reading old code

The how is a method I built into a skill called inv-collide. Part of a broader suite of invention-related skills.

It stems from an idea called bisociation, coined by the author Arthur Koestler. The skill operationalizes that idea.

Take the problem in front of you and a domain that has nothing to do with it, and force them through three steps.

Map both as structures. The roles, processes, constraints, feedback loops, value flows, and failure modes. Not what they’re about, but the bones of what they do.
Find the places where the bones are identical. Isomorphisms.
Generate concepts at that intersection. For example, if this domain solves problem P with mechanism M, and I also have a version of problem P, can I transplant M?
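As a rough illustration of the shape of those three steps, here is a minimal Python sketch. It is not the inv-collide skill itself. The `Structure` type, the facet labels, and the exact-match comparison are all assumptions made for the example; real structural matching takes judgement, not string equality.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """Step 1: a domain mapped as structure rather than subject matter."""
    name: str
    # facet -> abstract pattern, e.g. "failure_mode" -> "silent drift"
    facets: dict[str, str]
    # abstract pattern -> mechanism this domain uses to handle it
    mechanisms: dict[str, str] = field(default_factory=dict)


def isomorphisms(problem: Structure, domain: Structure) -> list[tuple[str, str]]:
    """Step 2: facets where both structures show the same abstract pattern."""
    return sorted(
        (facet, pattern)
        for facet, pattern in problem.facets.items()
        if domain.facets.get(facet) == pattern
    )


def transplants(problem: Structure, domain: Structure) -> list[str]:
    """Step 3: for each shared pattern P, propose the domain's mechanism M."""
    return [
        f"{domain.name} handles '{pattern}' with '{domain.mechanisms[pattern]}'"
        f" -> candidate for {problem.name}"
        for _, pattern in isomorphisms(problem, domain)
        if pattern in domain.mechanisms
    ]


# Hypothetical inputs, loosely based on the dashboard example.
dashboard = Structure(
    name="trust dashboard",
    facets={"failure_mode": "silent drift", "feedback_loop": "self-audit"},
)
observatory = Structure(
    name="observatory",
    facets={"failure_mode": "silent drift", "feedback_loop": "external baseline"},
    mechanisms={"silent drift": "scheduled re-anchoring against a fixed baseline"},
)

for idea in transplants(dashboard, observatory):
    print(idea)
```

Only the facet where both structures share a pattern produces a candidate. Everything else stays out of the output, which is the point of matching bones rather than themes.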

It’s not the same as brainstorming. Brainstorming is free association. Bisociation matches the bones of the thing. That discipline makes the output usable.

And that discipline requires a step people might skip.

Disciplined enough to discard it

It’s all very romantic. I found a poetic parallel between a dashboard and the stars in the sky. Well, the universe is really big and it’s easy to make up a metaphor.

If the method only produced nice-sounding coincidences, it would be a party trick.

But it also generates a big discard pile.

When I collided the trust dashboard with Jyotish astronomy and the Jaipur observatory, it surfaced twelve structural matches.

Astrological muhurta - an auspicious window where conditions align and you can move forward - mapped directly onto a promotion gate in my system. The point where an AI capability becomes trusted to act more autonomously.

That got thrown out. It wasn’t bad, but my system already had a streak counter and encoded readiness gates. This added costume, not structure.

The collisions are only worth integrating when they survive an honest attempt to kill them. Re-anchoring survived. It runs in production.

Running old code in my product flywheel

Underneath the status dashboard is what I call my product flywheel. It’s an AI-and-human pipeline to turn customer signal and internal consensus into execution-ready work, without a person writing every issue.

It reads from various sources - customer knowledge base, Slack, Zendesk. Then it classifies and routes what’s changed. It drafts a business case, independently assesses it, and scaffolds the approved work into Linear. Then it recommends routing and priority, and provides a rate-limited queue for engineering to pull from.

This is product work. The product flywheel feeds execution, but it isn’t the execution. This is triage, routing, specs, and prioritization. Engineering pulls from its output in order to execute.

This isn’t a story about code review. These are rules for intelligence from medieval guilds, eighteenth-century astronomers, and the Talmud, applied to the most difficult parts of trusting AI with product judgement.

Here are three that are running.

Earned trust, and the medieval masterwork

One of the oldest problems in management. When do you let someone work unsupervised?

If you get it wrong early, you’re letting unqualified hands do a lot of damage in your name.

If you get it wrong late, you’re throttling the potential of someone who was ready.

Every craft tradition that’s lasted seems to have solved this in the same way. With a gate. A guild apprentice submitted a masterwork, and the sitting masters judged it. A Jesuit student reached a point where his judgement was, in their words, formed. Trust was domain-specific, slow to build, easy to lose.

Most AI tooling uses a flag. We set a permission mode, and we crank the autonomy up and down depending on how brave we’re feeling. But the system isn’t demonstrating anything.

The flywheel doesn’t have a dial. Its agents earn the right to act, and they don’t start with it.

I track trust per stream. Incoming customer signals, routing, drafting, and the queue controller all have their own standing. Being good at one thing is no evidence of being good at anything else.

Each capability can climb through three tiers. The tier changes what the agent is allowed to do. An apprentice intake agent reports what it would have done, line by line, and needs a human to confirm every call. A journeyman provides a summary of what it would do, and asks for a single confirmation of that summary. A master executes and then reports back for review.

An agent moves from apprentice to journeyman on fourteen consecutive clean runs - each confirmed, by a human, as correct. Journeyman to master takes thirty. And this isn’t an average. One bad run resets the streak to zero. And mistakes made by more “senior” agents mean demotion and a doubled threshold.

That’s the medieval guild structure, as a YAML file.
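The YAML itself isn’t reproduced here. As a rough illustration, the same rules can be sketched in Python. The thresholds of fourteen and thirty, the reset on a bad run, and the demotion with a doubled threshold come from the description above. The class and field names are hypothetical, and three details are assumptions: a demotion drops one tier, the streak restarts from zero after a promotion, and each repeat demotion doubles the threshold again.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    APPRENTICE = 0   # reports line by line, human confirms every call
    JOURNEYMAN = 1   # summarizes, human confirms once
    MASTER = 2       # executes, then reports back for review


# Consecutive clean runs needed to reach each tier (figures from the text).
BASE_THRESHOLD = {Tier.JOURNEYMAN: 14, Tier.MASTER: 30}


@dataclass
class StreamTrust:
    """Trust standing for one stream, e.g. intake or routing."""
    tier: Tier = Tier.APPRENTICE
    streak: int = 0
    # Assumption: every demotion doubles the cost of regaining the lost tier.
    multiplier: dict[Tier, int] = field(
        default_factory=lambda: {Tier.JOURNEYMAN: 1, Tier.MASTER: 1}
    )

    def threshold(self, target: Tier) -> int:
        return BASE_THRESHOLD[target] * self.multiplier[target]

    def record_run(self, clean: bool) -> None:
        """Update standing after a human has judged a run clean or not."""
        if clean:
            self.streak += 1
            if self.tier < Tier.MASTER:
                target = Tier(self.tier + 1)
                if self.streak >= self.threshold(target):
                    self.tier = target
                    self.streak = 0  # assumption: next streak starts fresh
            return
        # One bad run resets the streak, whatever the tier.
        self.streak = 0
        if self.tier > Tier.APPRENTICE:
            self.multiplier[self.tier] *= 2  # winning it back costs double
            self.tier = Tier(self.tier - 1)


# Trust is tracked per stream; standing in one says nothing about another.
trust = {"intake": StreamTrust(), "routing": StreamTrust()}
```

Because the counter is a streak rather than an average, a single rejected run costs the agent everything it has accumulated towards the next tier.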

Trust is isolated by domain. It’s earned slowly, lost easily, and more expensive to win back a second time. That’s not how permission flags work, but every master craftsman who ever lived would recognize it.

That makes autonomy something earned - and easily lost.

Calibrating my judgement against the stars

Let me close the loop on that dashboard.

I’m not worried about the AI “breaking”. That will be loud and obvious. What I’m worried about is silent drift. Miscalibration, where the AI degrades slowly and the gap never shows up on the dashboard.

Those astronomers in Jaipur had an answer. Re-anchor on a fixed, external baseline. On a schedule. Not trusting the instrument to audit itself.

My flywheel uses two versions of that.

A deliberately imperfect target. My runs are clean if I override the AI’s call no more than fifteen percent of the time. Not zero. Zero overrides is a person not paying attention. We want a target that keeps someone in the loop.
You can’t re-baseline a scoring system - even if it’s an improvement - without re-scoring at least five historical projects against the new rules, recording the sign-off, and noting the discontinuity. You can’t move a baseline and also erase the evidence that you moved it.
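Both rules are simple enough to sketch in a few lines of Python. The fifteen percent ceiling and the five-project minimum come from the text. The function, class, and field names are hypothetical, and the append-only list is an assumed way of keeping the evidence that a baseline moved.

```python
from dataclasses import dataclass
from datetime import date

MAX_OVERRIDE_PERCENT = 15   # from the text: clean if overrides stay at or under 15%
MIN_RESCORED_PROJECTS = 5   # from the text: re-score at least five projects


def is_clean_run(overrides: int, calls: int) -> bool:
    """A run is clean when the human overrode at most 15% of the AI's calls."""
    if calls <= 0:
        raise ValueError("a run needs at least one call to judge")
    # Integer comparison avoids floating-point edge cases at exactly 15%.
    return overrides * 100 <= MAX_OVERRIDE_PERCENT * calls


@dataclass(frozen=True)
class Rebaseline:
    new_version: str
    rescored_projects: tuple[str, ...]
    signed_off_by: str
    effective: date


class ScoringHistory:
    """Current scoring version plus a record of every time it changed."""

    def __init__(self, version: str) -> None:
        self.version = version
        self.discontinuities: list[Rebaseline] = []  # appended to, never edited

    def rebaseline(self, change: Rebaseline) -> None:
        if len(set(change.rescored_projects)) < MIN_RESCORED_PROJECTS:
            raise ValueError("re-score at least five historical projects first")
        if not change.signed_off_by:
            raise ValueError("a re-baseline needs a recorded sign-off")
        self.discontinuities.append(change)  # note the discontinuity
        self.version = change.new_version
```

The guard refuses the change outright rather than warning, so a new baseline cannot exist without the record that explains it.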

That’s Jantar Mantar. Re-anchoring against a fixed point, on a cadence. And keeping its receipts.

Adversarial truth-seeking, in the room and in the spec

If we review for consensus we throw information away.

Two smart, competent people disagree about a tough problem. When that disagreement gets resolved, we move on.

But the disagreement itself is a signal. It tells you the problem has more than one shape. And the losing argument might be right under conditions that develop in future.

The Jesuits formed students through disputatio. A structured, adversarial defense. You don’t prove competence by claiming it. You prove it when a skeptic attacks your work.

The Talmud has preserved the minority ruling alongside the majority one, deliberately, for nearly two thousand years. A defeated argument might become the right argument when the world changes.

Those arguments need to be written down. If they aren’t, we don’t remember them. A dissent gets aired in a meeting, a decision gets made, and the losing argument is - at best - noted in a retro document that nobody reads. That’s even worse for product decisions - routing, prioritizing, build this not that - than for code, because the record’s thinner.

The flywheel runs these institutions against product judgement.

Disputatio is in every routing decision. Recommendations don’t come from one agent, they come from a pair. One proposes the route and the priority, and the second is designed specifically to challenge it. It attacks the recommendation before it’s committed. The proposal has to survive the examination. That’s the Jesuit insight - that the defense is where the judgement is formed - applied to a decision about a customer request.

Chavruta, the Talmudic tradition of study through paired argument, is the basis for how the system records disagreements. Reviews produce dissents. The dissents don’t need to be resolved, but they are committed. And not just as a stale record. They’re structured objects that include conditions under which they wake up again. Recheck at the next live run. Resurface if a specific metric flatlines. When the world changes to match a trigger, that old losing argument comes back on its own and demands a second hearing.
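A dissent that can wake itself up is a small data structure. The Python sketch below is one way it might look, not the flywheel’s actual schema. The `Dissent` and `DissentLedger` names, the metric name, and the tolerance used to decide that a metric has “flatlined” are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# A snapshot maps metric names to their latest period-over-period change.
Snapshot = Mapping[str, float]


@dataclass
class Dissent:
    decision_id: str
    argument: str
    trigger: Callable[[Snapshot], bool]  # True when the world matches
    trigger_note: str                    # human-readable wake-up condition
    resurfaced: bool = False


def flatlined(metric: str, tolerance: float) -> Callable[[Snapshot], bool]:
    """Build a trigger that fires when a metric's change is close to zero."""
    def trigger(snapshot: Snapshot) -> bool:
        change = snapshot.get(metric)
        return change is not None and abs(change) <= tolerance
    return trigger


class DissentLedger:
    """Committed dissents, unresolved, each with its own wake-up condition."""

    def __init__(self) -> None:
        self._dissents: list[Dissent] = []

    def commit(self, dissent: Dissent) -> None:
        self._dissents.append(dissent)

    def check(self, snapshot: Snapshot) -> list[Dissent]:
        """Return dissents whose trigger now matches, each at most once."""
        woken = []
        for dissent in self._dissents:
            if not dissent.resurfaced and dissent.trigger(snapshot):
                dissent.resurfaced = True
                woken.append(dissent)
        return woken


ledger = DissentLedger()
ledger.commit(Dissent(
    decision_id="route-123",  # hypothetical identifier
    argument="This request belongs with onboarding, not billing.",
    trigger=flatlined("onboarding_completion_change", tolerance=0.01),
    trigger_note="Resurface if onboarding completion flatlines.",
))
```

Calling `ledger.check(...)` on each live run is what lets the losing argument come back on its own, without anyone having to remember it.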

Consensus is lossy. The Talmud knew that nearly two thousand years ago. My flywheel acts on that memory.

Others still on the bench

There are a couple more collisions that produced things I haven’t shipped yet. The same method, but in the design stage.

Drift lineage. A text copied by hand for a thousand years has scribal drift - small errors that harden into official fact because the chain is trusted. Traditions that survive have apparatus for this - lay one manuscript next to another and see where the text mutated. I’m building the same thing for a claim - a walk backwards to the primary source that flags where the numbers or the conclusions changed. Not whether those changes were right or wrong, but a lineage of provenance and drift from source.
Avoiding “vaticinium ex eventu”. This is prophecy after the fact. Re-reading the record once you know the outcome, and bending the record to show you were right. AI multiplies confident predictions, and decision journals are editable. Medieval clerks had the answer. A boring, comprehensive one. Witnessed, dated, tamper-evident entries. Predictions logged and stamped, and scored at outcome time. It’s about not lying to yourself about a prediction.
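Since the prediction log is still in design, the following is only one plausible shape for “witnessed, dated, tamper-evident”. It uses a hash chain, which is a technique swapped in for this example rather than anything described above: each entry’s hash covers the previous entry’s hash, so editing an old prediction breaks every link after it. All names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


@dataclass(frozen=True)
class Entry:
    logged_at: str    # ISO 8601 timestamp supplied by the caller
    witness: str      # who saw the prediction being logged
    prediction: str
    prev_hash: str
    entry_hash: str


def _digest(logged_at: str, witness: str, prediction: str, prev_hash: str) -> str:
    payload = json.dumps([logged_at, witness, prediction, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PredictionLog:
    """Append-only log in which each entry is chained to the one before it."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def append(self, logged_at: str, witness: str, prediction: str) -> Entry:
        prev = self.entries[-1].entry_hash if self.entries else GENESIS
        entry = Entry(logged_at, witness, prediction, prev,
                      _digest(logged_at, witness, prediction, prev))
        self.entries.append(entry)
        return entry


def verify(entries: list[Entry]) -> bool:
    """Recompute the chain; any edited or reordered entry makes this False."""
    prev = GENESIS
    for e in entries:
        expected = _digest(e.logged_at, e.witness, e.prediction, prev)
        if e.prev_hash != prev or e.entry_hash != expected:
            return False
        prev = e.entry_hash
    return True
```

A chain like this only shows that the record is internally consistent. Someone who can rewrite the whole log can also recompute every hash, so the latest hash would need to be held by the witness, outside the system, for the entries to count as witnessed.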

We keep reinventing what was proven

All of this is about trust.

Transferring trust to something that acts in your name. Keeping your own judgement from drift. Not letting disagreements die. And being honest about predictions.

None of these are about AI writing code.

It’s about trusting AI’s judgement, calibrating it against my own, auditing it, and keeping us both honest.

This match to old institutions is real, not a flourish. History is not just a charming source of cool metaphors. It’s where these experiments were already run. Where the answers were written down in a language nobody speaks any more.

It’s the debugging history of our species, and we’re reaching past it for last quarter’s new framework.

When you hand an AI something that matters - a decision, a forecast, a release - find the institutions that already solved it.

Read their old code.

Get the thing blessed by Jesuits.

Robin Cannon
