Pillar essay · May 27, 2026

The autonomous, self-improving company

Four feedback loops that turn a company from a system you operate into a system that gets sharper at the work on its own. One of mine is running clean today. The other three are the pattern, and here's how to install each.

The voice profile that Desk Theory's AI drafts against got sharper this week, and I didn't write the thing that sharpened it.

Here's what happened. I co-edited a draft on Sunday: the AI wrote a workflow article, I rewrote the parts that didn't sound like me, and I shipped the version I liked. The difference between what the AI handed me and what I shipped is data. A skill I built once, months ago, reads that difference. It extracts the principles I applied without naming them ("you cut every sentence that started with 'in today's world'," "you replaced the abstract noun with the concrete one"), writes them into the right sections of the voice profile, and bumps the version number. The profile is on its tenth version now. It started as a paragraph I typed myself. Every version since has been refined by my own edits, fed back in by a loop I'm no longer in.

I don't write any code on the days that loop runs. I wrote the skill once. The profile has been improving against my taste ever since.

That is one loop. There are three more shapes like it, on different timescales, against different parts of the business. And here's the honest part, because the honest part is the whole point: one of the four is running clean in my stack today. The other three I'm building toward, on top of pieces that are already running. I'll tell you which is which as we go, because the gap between "running" and "building toward" is exactly the gap most CEOs paper over when they talk about this, and papering over it is how you end up with a slide deck instead of an operating system.

None of this is the same thing as "using AI." Using AI is what every CEO is doing now. What I'm describing is the altitude above that: the company that gets sharper at the work without the CEO getting sharper at the company. Tools are leverage on effort. Feedback loops are leverage on time. Once you've installed the first loop, the second one is easier, and the third one stops feeling like a project and starts feeling like a habit.

If the last twelve months were about installing AI, the next twelve are about installing the loops.

What "self-improving" actually means

A self-improving system has one property a normal system doesn't: its outputs change its own inputs. A static system takes fixed inputs and produces outputs from them, the same way every time. A self-improving system takes its outputs, feeds them back into the front of itself, and the next run is measurably better than the last. The only question that matters is whether the loop closes.

Most of what gets sold as "self-improving AI" doesn't close the loop. It produces an output, you look at the output, and nothing about your judgment of that output makes the next output better. The model doesn't get smarter because you frowned at it. The loop is open. You're the only feedback mechanism, and you don't scale.

The thing I'm describing closes the loop with engineering, not magic. My edits to a draft become instructions the next draft is written against. The alerts I act on versus the ones I ignore become the threshold for which alerts fire next week. Last quarter's review becomes the starting point for next quarter's plan. In each case, a small piece of code captures the signal I was already generating and routes it back to where it changes the next run.

Let me draw the boundary clearly, because the boundary is where most of the nonsense lives. This is not AGI. It is not "the AI runs the company." It is bounded feedback loops, each scoped to a specific surface of the business, that compound against your taste, your data, and your decisions. The loops only get sharper at the work because you keep deciding what "sharper" means. Take your judgment out of the loop and it doesn't improve toward anything; it just drifts. The human stays in. What changes is that the human stops doing the part a loop can do.

Why does the language matter this much? Because most CEOs talking about "AI strategy" right now are talking about tools. They're asking which model, which seat license, which vendor. Those are effort questions, and effort questions cap out. A tool makes a given task faster. A loop makes every future version of that task better than the last one. The compounding doesn't come from the model. It comes from the loops.

Why this is the next altitude up from a normal AI install

For a long stretch I had what I'd have called a good AI install. The [CLAUDE.md][11] was tight. The skills folder had grown to a couple dozen workflows. Routines ran on a schedule. The agent earned its keep every week. If you'd asked me whether I was getting leverage out of AI, I'd have said obviously, and I'd have been right.

What I did not have, until more recently, was anything that got better without me touching it. Every improvement to that "good install" came from me sitting down and improving it. I'd notice the morning brief was missing something, so I'd edit the prompt. I'd notice a skill had drifted, so I'd rewrite it. The install was leverage on my effort, and it was a lot of leverage, but it had a ceiling, and the ceiling was me. The whole apparatus only got as sharp as the hours I personally spent sharpening it.

The phase transition happens when the loop closes on even one surface. The first time I watched the voice profile improve from my edits alone, with no separate "go fix the voice profile" task on my list, the curve changed shape. The improvement was no longer coming out of my week. It was coming out of work I was already doing for another reason. That's the tell. A loop is real when the improvement is a byproduct of work you'd do anyway, not a separate chore you have to remember.

Here's why this matters strategically, and I'm going to be blunt about it. The CEOs who install the loops will look, twelve months from now, like they got smarter. They didn't. Their stack got smarter while they slept, on the back of decisions they were making anyway. From the outside the compounding is invisible: you can't see someone's feedback loops, you can only see that their company seems to be pulling away. By the time the lead is visible, it's structural, because the gap compounds and yours started later. That's the generational asymmetry I keep coming back to. The advantage doesn't arrive as a thunderclap. It arrives as a company that's a little sharper every month for reasons nobody outside can point to.

And here's why most CEOs won't get there, even after reading this: loops are infrastructure, and most CEOs treat infrastructure as someone else's problem. They'll delegate "AI" to a smart person on the team, that person will install tools, and the tools will help. But the loop design, the decision about what "sharper" means at the altitude of the business, is the one piece that can't be delegated, because only the CEO knows what the company is supposed to be getting better at. Tools are leverage on effort; feedback loops are leverage on time. The first you can buy. The second you have to design.

Loop 1: documentation that sharpens against your edits

This is the loop running clean in my stack today, so I'll spend the most time here. It's also the one I'd tell you to install first, for reasons I'll get to.

What it is. A [skill][2] that reads the difference between an AI draft and the version I actually shipped, after I've co-edited it. It extracts the principles my edits imply, and it writes them into the document the AI drafts against next time. For me that document is a voice profile, but the shape is general: any document that instructs the AI, improved automatically by the gap between what the AI produced and what you accepted.
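A minimal sketch of that shape, not the actual skill described here. It assumes the draft and the shipped version are available as plain text, uses Python's standard `difflib` to find what changed, and treats the principle-extraction step as a pluggable function. `VoiceProfile`, `changed_spans`, `update_profile`, and `extract_principles` are all hypothetical names; in practice the extraction step would be a model call.

```python
import difflib
import re
from dataclasses import dataclass, field


@dataclass
class VoiceProfile:
    # Hypothetical stand-in for the document the AI drafts against.
    version: int = 1
    principles: list = field(default_factory=list)


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def changed_spans(draft, shipped):
    """Return (before, after) pairs for every span the editor changed."""
    a, b = split_sentences(draft), split_sentences(shipped)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs


def update_profile(profile, draft, shipped, extract_principles):
    """Feed the edits back into the profile; bump the version only on change.

    extract_principles is an assumed callable: it takes the list of
    (before, after) pairs and returns a list of principle strings.
    """
    edits = changed_spans(draft, shipped)
    new = [p for p in extract_principles(edits) if p not in profile.principles]
    if new:
        profile.principles.extend(new)
        profile.version += 1
    return profile
```

The version number only moves when a run actually adds a principle, so a draft shipped without edits leaves the profile untouched.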

What it costs. Most of an afternoon to write the skill the first time. After that, near zero. The whole point is that invoking it is cheap and the maintenance is cheaper. I run it as a step in my normal ship process, not as a separate task I have to remember, which matters more than it sounds like, because the loops you have to remember are the loops that die.

What it produces. A voice profile that has gone from a paragraph I seeded to its tenth version, every revision earned by an actual edit I made on an actual draft. The compounding is concrete and you can feel it: the drafts I get back today land noticeably closer to publishable than the ones I got back when the profile was young, because the system that writes them already carries my edits. I'm not a better editor than I was. The thing I'm editing is just starting from a much better place.

Where it breaks. When my edits are stylistic rather than principled, the loop tries to extract a rule from noise. I changed a word because it sounded better in that one sentence, not because there's a principle, and a naive version of this loop would write down "prefer the word 'sharp' to the word 'good'" as if it were law. The fix is a filter: the skill has to be told to extract only the edits that generalize, and to leave the one-off stylistic moves alone. Getting that filter right is the actual work of this loop. The rest is plumbing.
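One possible way to back that filter with a mechanical check, sketched under a stated assumption: a candidate rule counts as "generalizing" only if it has been extracted from several different drafts. Recurrence is a stand-in chosen for illustration, not the filter described above, and `generalizing_rules` and `min_drafts` are hypothetical names.

```python
from collections import defaultdict


def generalizing_rules(observations, min_drafts=3):
    """Keep only rules that showed up in at least min_drafts distinct drafts.

    observations: iterable of (draft_id, rule_text) pairs, one per
    candidate rule extracted from one draft's edits.
    """
    drafts_by_rule = defaultdict(set)
    for draft_id, rule in observations:
        # Normalize lightly so trivial variations count as the same rule.
        drafts_by_rule[rule.strip().lower()].add(draft_id)
    return sorted(
        rule for rule, drafts in drafts_by_rule.items()
        if len(drafts) >= min_drafts
    )


observations = [
    ("draft-1", "Replace abstract nouns with concrete ones"),
    ("draft-2", "replace abstract nouns with concrete ones"),
    ("draft-3", "Replace abstract nouns with concrete ones "),
    ("draft-3", "Prefer the word 'sharp' to the word 'good'"),
]
kept = generalizing_rules(observations)
```

Here the one-off word swap is held back because it appeared in a single draft, while the rule seen across three drafts is promoted. Exact string matching is crude; two phrasings of the same principle would need to be merged before counting.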

What's next for it. Right now this loop sharpens one document. The obvious extension is pointing the same pattern at the other documents that instruct the AI: the CLAUDE.md, the glossary, even the commit-message conventions. Each is a place where my corrections are currently evaporating instead of being captured. That extension is on the bench, not in production, and I'm flagging that honestly.

If you want the conceptual version of what a skill is before you build one, the [explainer is here][2]. If you want to build your first one in an afternoon, the [walk-through is here][3]. This is the loop I'd install first if I were starting over, because it's the one that taught me what the other three should look like.

Loop 2: memory that compounds across sessions

This loop sits on a substrate that's already running in my stack: tiered memory through the [harness][4] I run my companies on. The substrate is real. The self-improving layer on top of it, the part that maintains itself, is the part I'm building toward, and I'll mark the seam.

What it is. Memory kept in tiers, so the AI knows your business at every level of how-fast-it-changes. Tier one is the stuff that almost never changes: what the company is, who the team is, what a normal week looks like. Tier two changes on a quarterly rhythm: this quarter's priorities, the current top customers, the decisions you made and don't want to relitigate. Tier three changes session to session: what's open right now, what you're trying to do today. The harness loads the right tier at the right time so the agent never has to ask you something it should already know.
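The three tiers can be sketched as a small data structure. This is an illustration under assumptions, not the harness's real format: `Tier`, `build_context`, and the `max_age_days` field are hypothetical, and the 100-day limit on tier two is only a rough proxy for "quarterly."

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Tier:
    name: str
    content: str
    last_refreshed: date
    max_age_days: Optional[int] = None  # None means the tier never expires

    def is_stale(self, today):
        if self.max_age_days is None:
            return False
        return (today - self.last_refreshed).days > self.max_age_days


def build_context(tiers, today):
    """Assemble the tiers, slowest-changing first, into one context block."""
    parts = []
    for tier in tiers:
        header = f"## {tier.name}"
        if tier.is_stale(today):
            header += " (STALE: refresh before relying on this)"
        parts.append(f"{header}\n{tier.content}")
    return "\n\n".join(parts)


today = date(2026, 5, 27)
tiers = [
    Tier("Tier 1: company", "What the company is, who the team is.", date(2025, 1, 1)),
    Tier("Tier 2: quarter", "This quarter's priorities and decisions.", date(2026, 1, 1), 100),
    Tier("Tier 3: session", "What is open right now.", today, 1),
]
context = build_context(tiers, today)
```

Marking a tier as stale in the assembled context, rather than silently loading it, gives the agent a reason to treat old priorities with caution.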

What it costs. A weekend to write tier one well the first time. Around half an hour a quarter to refresh tier two. Tier three takes care of itself.

What it produces. An agent that operates with context instead of from scratch. It doesn't ask "what's our priority this quarter," because tier two told it. It can answer "what did we decide about hiring back in the spring," because the decision went into tier two when you made it. The compounding here is subtle, and almost all of it lives in tier two. Tier one is the install everyone does and tier three is automatic, but tier two is the layer that turns a generic assistant into one that has been in the room for your last four quarters.

Where it breaks. Tier-two drift. The refresh is a discipline, and disciplines decay. Skip a quarter and the agent starts making confident recommendations on stale priorities, which is worse than no recommendation, because a confidently wrong brief costs you more than a blank one.

What's next for it (the seam). The self-improving version of this loop detects its own drift: a skill that reads the current tier-two file against the last ninety days of what actually happened and flags the lines that no longer match reality, so the quarterly refresh stops depending on me remembering to do it. That's the part I'm building toward. The tiered memory is running; the loop that maintains the memory is the frontier. If you want the foundation under all of this, the [explainer on the harness is here][4], and the full operator's manual is the [book][5].
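A rough sketch of what such a drift check could look like, assuming tier two is a list of one-line statements and "what actually happened" is a dated activity log. Keyword overlap is a deliberately crude proxy for "still matches reality," and `flag_drift`, `key_terms`, and the log format are all hypothetical.

```python
import re
from datetime import date, timedelta


def key_terms(text):
    # Lowercased words longer than three characters; crude on purpose.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def flag_drift(tier_two_lines, activity, today, window_days=90):
    """Return tier-two lines with no echo in the last window_days of activity.

    activity: iterable of (date, text) entries, e.g. from reviews or logs.
    """
    cutoff = today - timedelta(days=window_days)
    recent_terms = set()
    for day, text in activity:
        if day >= cutoff:
            recent_terms |= key_terms(text)
    return [
        line for line in tier_two_lines
        if key_terms(line) and not (key_terms(line) & recent_terms)
    ]


tier_two = ["Launch the partner program", "Hire a sales lead"]
activity = [
    (date(2026, 5, 1), "Interviewed two sales candidates"),
    (date(2025, 11, 1), "Partner program kickoff"),  # outside the window
]
flagged = flag_drift(tier_two, activity, today=date(2026, 5, 27))
```

The output is a list of lines to review, not a rewrite of the file. A more capable version would likely hand the flagged lines and the recent log to a model and ask whether each line still holds.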

Loop 3: routines that get quieter as they learn

The substrate here is also already running: [Routines][6], which are agents on a schedule, and specifically a pipeline-radar routine I built that reads the tone of deal conversations and flags the ones drifting the wrong way. That [workflow is its own walk-through][7]. The routine runs today. The self-tuning layer on top is the pattern, and I'll be precise about which part is which.

What it is. A scheduled agent whose own configuration gets adjusted by the loop it runs inside. The pipeline radar alerts me when the tone of a deal conversation turns. The self-improving version watches which alerts I act on and which I wave off, and moves its own threshold accordingly. Act on the alert, the threshold for that pattern holds. Ignore three in a row, the threshold backs off, and that pattern stops interrupting me.
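The rule in that paragraph (act and the threshold holds, ignore three in a row and it backs off) can be sketched as a small state machine. The numeric score, the step size, and the ceiling are assumptions added for illustration; `PatternThreshold` is a hypothetical name, not part of the routine described here.

```python
from dataclasses import dataclass


@dataclass
class PatternThreshold:
    threshold: float = 0.5    # an alert fires when score >= threshold
    step: float = 0.1         # how far to back off after three ignores
    ceiling: float = 0.95     # never back off so far the pattern goes silent
    ignores_in_a_row: int = 0

    def should_alert(self, score):
        return score >= self.threshold

    def record(self, acted_on):
        """Update state after the operator acts on or waves off an alert."""
        if acted_on:
            # Acting on an alert resets the streak; the threshold holds.
            self.ignores_in_a_row = 0
            return
        self.ignores_in_a_row += 1
        if self.ignores_in_a_row >= 3:
            # Three ignores in a row: raise the bar for this pattern.
            self.threshold = min(self.ceiling, self.threshold + self.step)
            self.ignores_in_a_row = 0


tone_shift = PatternThreshold()
for acted in (False, False, True, False, False, False):
    tone_shift.record(acted)
```

In this run the first two ignores are cancelled by the acted-on alert, and only the final three consecutive ignores raise the threshold one step. Keeping one `PatternThreshold` per alert pattern lets a noisy pattern quiet down without muting the others.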

What it costs. Building the routine is the cost of any workflow, an afternoon or so. Building the tuning layer on top is another short session. The pair is cheap relative to what it saves, which is the slow tax of an alerting system you stop trusting.

What it produces. A monitoring layer that gets quieter over time as it learns what you actually want to be told about. That's the opposite of how almost every alerting system you've ever used behaves. Most get noisier the longer you run them, because every incident adds a rule and no one ever removes one, until you're ignoring the whole channel. Most CEOs add more alerts when they want to know more; the trick is to make the alerts you already have smarter, not to add more of them.

Where it breaks. Recency bias. A naive threshold-tuner overweights last week and forgets that the quiet month before it was the normal state. It needs a longer memory than the last few signals, or it chases noise. Building that longer-window normalizer is the real work, and being honest, it's the part of this loop that separates a clean version from a janky one.
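One way to give the tuner a longer memory, sketched under assumptions: blend the act rate over a long window with the act rate over the last few alerts, weighting the long window more heavily. The window sizes and the 0.7 weight are illustrative values, and `blended_act_rate` is a hypothetical helper.

```python
def act_rate(outcomes):
    """Fraction of alerts acted on, or None if there is no history."""
    return sum(outcomes) / len(outcomes) if outcomes else None


def blended_act_rate(history, short_n=5, long_n=60, long_weight=0.7):
    """Blend long-window and short-window act rates.

    history: list of booleans, oldest first; True means the alert was acted on.
    """
    long_rate = act_rate(history[-long_n:])
    short_rate = act_rate(history[-short_n:])
    if long_rate is None:
        return None
    return long_weight * long_rate + (1 - long_weight) * short_rate


# Forty acted-on alerts, then one quiet stretch of five ignores.
history = [True] * 40 + [False] * 5
naive = act_rate(history[-5:])
blended = blended_act_rate(history)
```

A tuner that looked only at the last five alerts would see an act rate of zero and back the pattern off hard. The blended rate stays above one half, so a single quiet week does not erase a long record of useful alerts.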

What's next for it. Routines that learn from each other. One routine notices that another routine's alerts reliably correlate with me taking action, and weights accordingly. That's further out and I'm not going to pretend it's running.

Loop 4: reviews that change next week's plan

This loop is the pattern more than a thing I run end to end today, and I want to say that plainly because it's the one it'd be easiest to oversell. The pieces exist in my stack: a review cadence, and a [team-and-investor-update workflow][8] that turns the week into something I send. The wiring that makes the review feed the plan automatically is the part I'm describing as a pattern for you to build, not a finished machine I'm operating.

What it is. A review that doesn't just summarize the period. It compares what happened to what you planned, and the difference becomes the input to the next plan. The weekly version reads the week against last week's intentions and seeds Monday's plan with the delta. The quarterly version reads a season of weekly reviews and adjusts the cadence one altitude up.

What it costs. If you've already got a review skill, the feedback wiring is one more short skill: read the review, write the delta into the top of next period's plan. Twenty minutes of work for a loop that changes how every Monday starts.
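The wiring step (read the review, write the delta into the top of the next plan) might look like the sketch below. It assumes plans and reviews can be reduced to lists of short item strings; `review_delta` and `seed_next_plan` are hypothetical names, and a real review would carry more than done and not-done.

```python
def review_delta(planned, done):
    """Compare what was planned against what actually got done."""
    planned_set, done_set = set(planned), set(done)
    return {
        "carried_over": [p for p in planned if p not in done_set],
        "unplanned": [d for d in done if d not in planned_set],
    }


def seed_next_plan(planned, done, next_plan_body=""):
    """Write the delta at the top of the next period's plan."""
    delta = review_delta(planned, done)
    lines = ["Delta from last period's review:"]
    lines += [f"- carried over: {item}" for item in delta["carried_over"]]
    lines += [f"- unplanned work: {item}" for item in delta["unplanned"]]
    header = "\n".join(lines)
    return f"{header}\n\n{next_plan_body}" if next_plan_body else header


planned = ["ship pricing page", "call top three customers"]
done = ["ship pricing page", "fix billing outage"]
monday_plan = seed_next_plan(planned, done, "This week:")
```

The plan starts with what slipped and what intruded, so Monday begins as an edit of last week's reality rather than a blank page.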

What it produces. A planning rhythm that never starts from a blank page. Friday's review feeds Monday's plan; Monday's plan already knows what Friday said about Thursday. The plan stops being a thing you write from scratch under time pressure and starts being a thing you edit. Editing beats authoring, every time, on both speed and quality.

Where it breaks. Plan inertia. If next week's plan hugs last week's too closely, the loop is overfitting, and you've built a machine for doing the same week forever. It needs a deliberate jolt: a prompt that forces a step outside the recursion and asks what you'd do if last week hadn't happened. Without that, a review-to-plan loop quietly optimizes you into a rut.
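A sketch of one way to detect that inertia, assuming each plan can be reduced to a set of item strings. Jaccard similarity between consecutive plans is a stand-in measure chosen for illustration, the 0.8 cutoff is arbitrary, and `maybe_jolt` is a hypothetical helper. The jolt question itself comes from the paragraph above.

```python
JOLT_PROMPT = "If last week had not happened, what would this week's plan be?"


def jaccard(a, b):
    """Overlap between two collections of items, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maybe_jolt(last_plan, next_plan, max_overlap=0.8):
    """Return the jolt prompt when the new plan hugs the old one too closely."""
    if jaccard(last_plan, next_plan) >= max_overlap:
        return JOLT_PROMPT
    return None


last_week = ["pipeline review", "investor update", "hiring sync", "write article"]
same_again = ["pipeline review", "investor update", "hiring sync", "write article"]
fresh_week = ["pipeline review", "pricing test", "customer visits", "board prep"]

stuck = maybe_jolt(last_week, same_again)
moving = maybe_jolt(last_week, fresh_week)
```

Identical plans trigger the prompt; a plan that shares only one item with last week's does not. Exact string matching will miss reworded items, so a real check would need some normalization.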

What's next for it. The strategic-altitude version: a review that reads the last four quarterly reviews and asks where you're overfitting at the level of the quarter, not the week. That's the version I most want and least have.

What's still ahead, and what shouldn't be a loop

I've been marking the seams between running and building-toward as I go, so let me pull them together honestly.

The loop I'm trying to install next is the one that couples the others. Each of the four is scoped to its own surface today. The next one would notice when an improvement in one loop should change another (sharper memory in loop two resetting a threshold in loop three, for instance) and adjust the coupling. That's the frontier. It's a working idea I poke at, not a thing I run.

The thing that is not a loop and never should be: the actual decisions. The loops can sharpen the inputs to a decision, what data you look at, what context you carry, what gets flagged for your attention, but they should not make the call. A loop that started making strategic decisions wouldn't be leverage on my time; it'd be a slow abdication of the one job that's actually mine. The human judgment stays human. The loops feed it; they don't replace it.

What honestly isn't ready: anything that asks the loops from one company to talk to the loops from another. I run more than one business, and the loops don't cross between them, and they probably shouldn't, because what "sharper" means at one is not what it means at the other. Cross-company learning sounds elegant and is mostly a way to import the wrong instincts.

Where I'm actually headed is a stack where the loops handle the compounding and I handle the judgment. I'm not there. Today it's one loop I trust completely, two substrates I'm wiring loops onto, and a planning pattern I run half by hand. The honest state of the art in my own company is "further along than most, nowhere near finished." Anyone who tells you they've got the whole self-improving company humming is selling you something.

The install order

If you're going to build these, build them in this order. The order isn't arbitrary; each loop teaches you what you need to build the next.

Loop 1 first: documentation that sharpens against your edits. It's the cheapest, the improvement shows up fastest, and it teaches you what a closed loop feels like on a surface where mistakes are harmless. A bad voice-profile rule is easy to delete. Start where the blast radius is small.

Loop 2 second: memory in tiers. Tier one is the foundation everything else assumes. Plan a weekend for it. You can't build smart routines or useful reviews on top of an agent that doesn't know your business, so this comes before the loops that depend on context.

Loop 4 third: reviews that feed the plan. Once you have memory and at least one skill, the review loop connects them into a weekly rhythm. It's the loop that makes the others feel like a system instead of a pile of parts.

Loop 3 last: routines that self-tune. Routines are easy to start and hard to make smart, and the smart part, the threshold tuning, needs the memory tier to know what "smart" means for your business. Hold it until the rest is running.

A realistic cadence is one loop a month for four months. Faster than that and you won't have run any single loop long enough to see whether it compounds, which means you're flying blind on whether to keep it. Slower than that and you lose the momentum that makes the second loop feel easy after the first. One loop a month for four months: fast enough to keep the thread, slow enough to actually learn.

Why most CEOs will never get here

I'll close where the honesty has to land hardest.

Most CEOs delegate infrastructure, and loops are infrastructure. The CEO who hands "AI strategy" to a lieutenant gets tools, because tools are what you get when the person installing them doesn't own the definition of "better" at the business altitude. Loop design is the rare piece of technical work that's genuinely a CEO's job, because it's the design of how your company gets better, and that's not delegable in the way buying software is.

Most CEOs install and don't return. The first loop runs three weeks, the compounding hasn't shown up yet because compounding shows up at month three and not week three, and they quietly abandon it for the next shiny thing. The loop was working. They left before the curve bent.

Most CEOs over-engineer the first one. They want the coupling meta-loop before they've run a single simple loop end to end. The order above exists precisely to stop that. Build the harmless one first.

Most CEOs treat self-improvement as a feature instead of a discipline. They buy a product with "self-improving" on the box and assume the loop got installed in their operating system. It didn't. The loop is a thing you build and tend in your own business, against your own definition of sharper. Nobody sells you the discipline, which is exactly why it's an edge.

So here's the real talk. Most people reading this will find it interesting and move on, and that's fine. But the few who actually install even one of these loops will, in a year, look measurably better-run than their peers, and the gap will keep widening after that, because that's what loops do. I'd rather be honest about how few will do it than pretend it's easy. It's not easy. It's just available, and almost nobody is taking it.

Do this next

The weekly version of this thinking lands in The Thursday 3: one workflow at a time, every Thursday, free. The loops above are the strategic frame. The Thursday 3 is the tactical piece, the specific skill or routine you can build that week. Subscribe at desktheory.com (the form's in the masthead and the footer, take your pick).

Then go install the first loop. Pick the document your AI drafts against, wire the skill that learns from your edits, and run it for a month. Tell me in thirty days what compounded. I read every reply, and the first loop is the one I most love hearing about, because it's the one where people feel the curve change shape for the first time.

Andrew

Related reading

The architecture behind these articles.
