
The RAM ran out, so we rebuilt the entire stack

Notes on what changes when you stop optimising your stack for one developer at a time, and start optimising for one developer plus four agents working in parallel.


Next.js was the first thing to break under the new workload.

Every agent we have working on a problem runs its own dev server. There is no other honest way to do it; an agent that cannot exercise the code it just wrote is an agent guessing at whether the change works. Each of those servers sits at roughly 8GB of resident memory in Next. The laptop survives two. It does not survive four. The framework was sized for one developer in front of one running instance, which had stopped being a useful description of our setup some months back.

Going from one running instance to four sounds like a small shift in the question a stack has to answer. It changes most of the answers.

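Before I get into the why, here’s what we actually moved: Next.js to TanStack Start on Cloudflare, Supabase to Neon, the Supabase SDK to Kysely, and local .env files to 1Password, alongside a monorepo and a core API built on Effect’s HttpApi.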

None of those changes is novel on its own; plenty of teams have made any one of them. What’s interesting is the underlying constraint they’re all answers to, and the order in which each one became obvious.

I joined Life Scientific about two months ago. The internal tooling stack already existed when I arrived. It had been shipped roughly three months earlier, built in a handful of weeks as a controlled experiment in how fast a small team could ship internal software with AI in the loop, and how much further you could push that velocity by enabling semi-technical people across the company to contribute alongside the engineers. As a proof of concept, it had succeeded completely. Very little work had gone into it in the months between then and my arrival.

It also looked exactly like a prototype that had succeeded completely. The TypeScript was technically TypeScript, but it was peppered liberally with as any, and the type system was doing approximately none of the load-bearing work it could have been doing. There were hundreds of Supabase migrations stacked up in a way that nobody had had time to consolidate. Auth was living inside the database itself via row-level security, which is a perfectly reasonable pattern in isolation but had been extended in our codebase to the point where data access logic was distributed across a dozen unrelated places and very difficult to reason about as a whole. Build times had crept onto the tragic side of three minutes. None of this was anyone’s fault. It was the entirely predictable shape of code written under pressure to demonstrate that something was possible.
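To make the as any point concrete, here is a minimal sketch of what the compiler stops checking once a value is cast that way. The Product shape and the function names are hypothetical illustrations, not code from the actual codebase.

```typescript
// Hypothetical row shape, used only for illustration.
interface Product {
  id: number;
  name: string;
}

// With `as any`, the compiler accepts a misspelled property name.
// The typo only shows up at runtime, as the string "undefined".
function labelUnsafe(row: unknown): string {
  const p = row as any;
  return `${p.id}: ${p.nmae}`; // typo compiles without complaint
}

// A small runtime guard narrows `unknown` to `Product`, so the
// compiler can check every property access that follows.
function isProduct(row: unknown): row is Product {
  if (typeof row !== "object" || row === null) return false;
  const r = row as Record<string, unknown>;
  return typeof r["id"] === "number" && typeof r["name"] === "string";
}

function labelSafe(row: unknown): string {
  if (!isProduct(row)) throw new Error("row is not a Product");
  // Writing `row.nmae` here would be a compile-time error.
  return `${row.id}: ${row.name}`;
}
```

Given the same input, labelUnsafe quietly returns a label with "undefined" in it, while the typed version cannot contain that typo in the first place. That is the load-bearing work the type system was not being allowed to do.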

My brief was to take that demonstration and turn it into something a wider group of semi-technical people across the company could safely and productively contribute to. Almost every architectural decision I made over the following months flows from that.

The new shape of development

For most of the last decade, the resource cost of your framework was measured per developer. One person, one editor, one dev server running on one machine. Frameworks competed on cold start time, hot reload speed, and build duration, but always for the case of one running instance at a time. That was the assumption baked into every benchmark, every blog post comparing framework X to framework Y, and every social media thread arguing about who had won the JavaScript meta-framework war this week.

That assumption is starting to break.

When you’re working alongside agents in any serious way, you aren’t running one instance of anything. You’re running three or four, in separate git worktrees, all live, all watching files, all consuming memory and CPU. Each agent is doing real, billable work on a different ticket. One is refactoring an API surface. One is writing tests against a feature you shipped yesterday. One is investigating a flaky bug. One is implementing something you sketched out fifteen minutes ago. You spend the day hopping between worktrees, reviewing what each agent has produced, redirecting when they go off course, and occasionally taking the keyboard yourself when a task needs human judgment that the agent hasn’t earned yet.

This is a different shape of work, and it has different infrastructure requirements. Three Next.js dev servers can eat all the memory you have before you’ve opened a browser. Add a fourth and a meaningful fraction of the laptops in active developer use today simply cannot keep up. The pattern is universal: every framework whose dev mode was tuned for one instance starts to wheeze noticeably when you ask it to be N instances on shared hardware.

What’s interesting is that the dev experience for any single developer hasn’t changed. Each individual instance feels exactly the same as it ever did. The system you’re sitting at has changed, and almost nobody is benchmarking the difference. The cost of running one dev server is fine. The cost of running four at once on the same machine is the metric that quietly started to bite us. It doesn’t show up in framework comparison charts. It doesn’t have a name on Twitter yet. But if you talk to anyone running a serious agent-augmented workflow on a real codebase, they will know exactly what you mean.
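A back-of-envelope sketch of that metric, using the roughly 8GB per Next dev server mentioned earlier. The 32GB laptop and the 10GB reserved for the OS, editor, browser, and agent processes are assumptions chosen for illustration, not measurements.

```typescript
// Rough memory budget for N concurrent dev servers on one machine.
interface MemoryBudget {
  totalGb: number;     // physical RAM on the laptop (assumed)
  reservedGb: number;  // OS, editor, browser, agent processes (assumed)
  perServerGb: number; // resident memory of one dev server
}

// How many dev servers fit in what is left after the reserved share.
function maxDevServers(b: MemoryBudget): number {
  const availableGb = b.totalGb - b.reservedGb;
  if (availableGb <= 0 || b.perServerGb <= 0) return 0;
  return Math.floor(availableGb / b.perServerGb);
}

// Total memory needed to run `count` servers alongside everything else.
function requiredGb(b: MemoryBudget, count: number): number {
  return b.reservedGb + count * b.perServerGb;
}

const laptop: MemoryBudget = { totalGb: 32, reservedGb: 10, perServerGb: 8 };

console.log(maxDevServers(laptop)); // 2
console.log(requiredGb(laptop, 4)); // 42
```

Under those assumptions two servers fit and four would need 42GB, which matches the experience of the laptop surviving two and not four. The useful lever is perServerGb: it is the only term a framework choice changes.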

What we tried

We did the obvious things first. We upgraded to Next 16.2, which had just landed with about 87% faster dev startup, meaningfully faster rendering, and over 200 Turbopack fixes. The improvements were real and we felt them. The pain was still there, because none of those fixes were aimed at memory. The release notes are about startup time, render time, and DX, which is the right set of things to optimise for the workload Next is designed around. It is not the workload we have.

We had also been accumulating Next-specific quirks. The auth-gated ISR setup we built to make pages feel fast worked, but it shaped how every new route had to be written. Adding an authenticated page meant consciously building it around the ISR carve-out and keeping that carve-out in mind the whole time it was being written. That is not something I want to be holding in my head during development, and it is definitely not something I want an agent to be expected to remember on a route it has never touched before.

TanStack Start on Cloudflare was the move we settled on. The timing happened to align with Cloudflare’s Agents Week, which felt about right for what we were trying to solve. We had prior experience with Cloudflare’s developer platform and it had been consistently good. The pricing structure is meaningfully better than Vercel’s for a team our shape, mostly because there is no per-seat pricing. We can hand a semi-technical contributor an account when they need one without that decision coming with a per-user bill attached. We use it in a fairly monolithic way: the Worker runs in the same region as our database, and almost none of the edge features that people usually pick Cloudflare for are in play. That is fine. The platform earns its place here on the runtime and the pricing. And, most importantly for our specific situation, TanStack Start gave us a router and runtime that didn’t carry the same RAM tax as the Next dev server.

I want to be honest about the trade-off, because every “we migrated frameworks” post I have ever read soft-pedals the cost. TanStack on Cloudflare is not as pleasant to set up as Next.js. Next is insanely easy to scaffold; the on-ramp is one of the best in the industry, and I don’t think anyone seriously disputes that. TanStack on Cloudflare has more sharp edges, more configuration to think about, and a real amount of “wait, how do I do this in the new world.” The end result is genuinely better. The first week of building anything new in it is worse than the equivalent first week in Next would have been. Anyone considering this move should know that going in rather than discovering it midway through their migration.

What we got out of the migration, once it had settled in: 14-second page loads dropped to roughly 400ms, build times stopped being something I would sigh at when I kicked them off, and I can now comfortably run four dev servers concurrently on a laptop that was crying for help a few months ago.

What else falls out

Here’s what surprised me about all of this. Once you start designing the codebase for the assumption that multiple agents and humans can work on it in parallel, safely, a lot of architecture decisions that used to be matters of personal taste become close to obvious.

Take secrets management. With semi-technical contributors and agents working across four worktrees at a time, you really do not want any of them juggling local .env files. Every additional copy of a credential is another surface for that credential to leak, get out of date, or end up committed to a branch by an agent that doesn’t fully understand which lines it should be staging. One stale credential in one worktree and you have quietly poisoned that branch’s output. We moved to 1Password with a CLI flow that injects runtime environment variables on application start, which gave us a single source of truth and removed the developer-machine attack surface almost entirely.
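The exact flow is not shown here. Assuming it resembles the 1Password CLI's op run, which resolves op:// secret references into environment variables for the process it launches, a startup check along these lines would catch a worktree that was started without injection. The variable names and the loadSecrets helper are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical variable names; the real set is not given in this post.
const REQUIRED = ["DATABASE_URL", "SESSION_SECRET"] as const;
type RequiredKey = (typeof REQUIRED)[number];

// Fail fast if any secret is absent or was never resolved by the CLI.
function loadSecrets(env: Env): Record<RequiredKey, string> {
  const problems: string[] = [];
  const resolved = {} as Record<RequiredKey, string>;

  for (const key of REQUIRED) {
    const value = env[key];
    if (value === undefined || value === "") {
      problems.push(`${key} is missing`);
    } else if (value.startsWith("op://")) {
      // Still a secret reference: the app was started outside the CLI wrapper.
      problems.push(`${key} is an unresolved secret reference`);
    } else {
      resolved[key] = value;
    }
  }

  if (problems.length > 0) {
    throw new Error(`Secrets not injected: ${problems.join("; ")}`);
  }
  return resolved;
}
```

The application would call loadSecrets(process.env) once at startup. An agent that launches the dev server the wrong way then gets an immediate, readable error instead of a branch built against a stale or missing credential.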

Take the monorepo decision. It wasn’t ideological for us, and I’d push back on anyone framing it that way. It was the only realistic way to get shared schemas, shared types, and shared utilities into the hands of multiple frontends without us doing the kind of manual sync work that an agent would inevitably get wrong somewhere along the way.

The same logic drove the core API. We built it on Effect’s HttpApi, which gives us a single typed schema that every client reads from: internal tools today, MCP servers next quarter, agents reading our data after that. Scalar runs on top of it as an interactive console for anyone, human or not, who needs to understand the surface.
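The idea of one typed schema that every client reads from can be sketched in plain TypeScript. This is deliberately not Effect's HttpApi; it is a library-free illustration of the pattern, and the Product resource, the /products/:id path, and every helper name are hypothetical.

```typescript
// One endpoint definition: method, path template, and a response decoder.
interface Endpoint<Res> {
  method: "GET" | "POST";
  path: string;
  decode: (raw: unknown) => Res;
}

// Hypothetical resource, for illustration only.
interface Product {
  id: number;
  name: string;
}

// Runtime validation, so a client never trusts an unchecked payload.
function decodeProduct(raw: unknown): Product {
  if (typeof raw !== "object" || raw === null) throw new Error("expected an object");
  const r = raw as Record<string, unknown>;
  const id = r["id"];
  const name = r["name"];
  if (typeof id !== "number" || typeof name !== "string") throw new Error("malformed Product");
  return { id, name };
}

// The single definition that every client imports.
const getProduct: Endpoint<Product> = {
  method: "GET",
  path: "/products/:id",
  decode: decodeProduct,
};

// Fill ":param" placeholders, refusing to build a URL with a gap in it.
function buildPath(template: string, params: Record<string, string>): string {
  return template.replace(/:(\w+)/g, (_match: string, key: string) => {
    const value = params[key];
    if (value === undefined) throw new Error(`missing path parameter: ${key}`);
    return encodeURIComponent(value);
  });
}

// The transport is injected: fetch for a frontend, a stub in tests,
// or whatever an agent-facing client uses.
type Transport = (method: string, path: string) => Promise<unknown>;

async function call<Res>(
  transport: Transport,
  endpoint: Endpoint<Res>,
  params: Record<string, string> = {},
): Promise<Res> {
  const raw = await transport(endpoint.method, buildPath(endpoint.path, params));
  return endpoint.decode(raw);
}
```

A caller writes call(transport, getProduct, { id: "42" }) and gets a Promise of Product with no hand-written types on the client side. Changing the definition changes every consumer at compile time, which is the property that matters when several of those consumers are agents.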

And take the Neon migration, which actually predated this whole framing but ended up being its cleanest validation. We moved off Supabase onto Neon primarily for one feature: database branching. Neon’s branching lets a non-technical contributor spin up an isolated copy of production data, run their change against it, verify that what they built does what they expected against real data shapes, and throw the branch away when they’re done. All without me being in the loop, all without them risking anything that touches the live system. That single capability shifted what was possible for the people I was trying to enable. Supabase couldn’t give us the same workflow in the shape we needed without significantly more custom plumbing on our end, and even then it would have been a worse fit. We also swapped the Supabase SDK for Kysely as our typed query builder. The Kysely experience has been excellent; in a codebase where the type system had been doing almost no real work, having a properly typed query layer has been a significant quality-of-life improvement on every PR we’ve shipped since.

That last change is also the part that proves the thesis better than anything I can argue for in the abstract. Replacing the SDK meant rewriting essentially every database query in the codebase: hundreds of locations, all subtly different, scattered across services and components and route handlers. It was exactly the kind of large-scale, mechanically tedious code change that, two years ago, you would have blocked out a week of an experienced engineer’s calendar to do carefully, with one eye on the diff and the other on the test suite. We didn’t. Claude ran for about five hours and nearly one-shot it. I cleaned up the genuinely tricky cases at the end. That was the whole human contribution.

We aren’t preparing for some future moment when this becomes possible. We are already doing it, in production, on the code that runs the company.

— The point of the exercise

If you’re wondering what I mean when I talk about “building for agents,” that is the most direct example I can offer.

Every one of these decisions felt like a separate problem at the time we were making it. The secrets one was a security concern. The monorepo one was a code-organisation concern. The Neon one was a database concern. Looking back at them in aggregate, though, they are all answers to the same underlying question: what does a codebase need to look like in order to host parallel work, including work done by entities that aren’t human? We’ve been locking in the choices that point toward “yes, it can host that,” and quietly removing the ones that don’t.

What actually mattered

The stack matters more than I expected. Typed API surfaces, dev servers light enough to run several at once, isolated database branches, no per-seat blockers on contributor accounts: getting these right has done more for our throughput than any single change to how we work with the agents themselves. The framework, the database, and the secrets layer either get out of the way or they become the thing the team spends its energy fighting.

Agents are also remarkably good at the rebuilding work itself. The Kysely migration earlier in this post was a small example. For a much larger one, the Bun project just shipped a rewrite of its entire JavaScript runtime from Zig into Rust: around a million lines of code, 6,755 commits landed in a single merge, performance held flat or improved, every pre-existing test still passing. The Bun team promised a follow-up blog post explaining how it was done. The rewrite took about a week. The blog post still has not shipped. The interesting question for the rest of us is what we are going to choose to do with a capability that suddenly exists.

We’re still early. The next thing on top of the core API is a set of internal MCP servers, so agents can reach the same typed surface we already expose to humans and take on richer pieces of internal work the same way the migrations did.

Written by
Joey Jooste, Software Engineer, Life Scientific

Builds the tools and writes some of the notes.