February 24, 2026

Why I Split My AI Workflow Into Two Layers (And Why You Should Too)

I learned this the hard way. When I first started using Claude Code, I put everything into a single file called CLAUDE.md. Every rule, every preference, every workflow. "Always do X." "Never forget Y." "When this happens, do that."

It worked great — for about one session. Then context got compressed, or a new session started, and Claude forgot half of it. Not because it's bad. Because that's how LLMs work. Every response is a fresh prediction. There's no persistent memory between turns unless you build it yourself.

So I'd re-explain the rules. Re-teach the preferences. Catch mistakes that shouldn't have happened because the instructions were right there. And I realized — I was treating an AI like an employee who can read a manual and remember it forever. That's not what this is.

The Problem With Promises

Here's the thing nobody tells you about AI workflows. You can write the most detailed instructions in the world, and the AI will follow them — until it doesn't. Not out of malice. Out of architecture. LLMs predict the next token based on what's in the context window right now. If your rule got compressed away or the session is fresh, that rule doesn't exist anymore.

I had a rule that said "always commit and push at the end of every task." Sounds simple, right? But Claude Code doesn't have a sticky note on its monitor. It processes the current conversation. If the conversation is long and that rule is 2,000 lines up in CLAUDE.md, it might get summarized away. And then work sits uncommitted on one machine while I switch to another.

Promising "I'll always do X" in an AI instruction file is like promising yourself you'll go to the gym every morning. The intention is real. The follow-through depends on something remembering the intention at the right moment.

The Breakthrough: Automate, Don't Promise

The fix wasn't better prompts. It was code.

If something needs to happen every single time — reliably, without fail — it can't live in an instruction file that an LLM reads. It needs to live in a script that a computer runs.
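Take the commit-and-push rule from earlier. As an instruction it kept getting forgotten; as a script it can't be. Here's a minimal sketch of what that looks like — the repo path, commit message, and the injectable `runner` parameter are my placeholders for illustration, not the actual script:

```python
# Hypothetical sketch: the "always commit and push at the end of every task"
# rule as code instead of as an instruction an LLM has to remember.
import subprocess


def commit_and_push(repo: str, message: str = "checkpoint: end of task",
                    runner=subprocess.run):
    """Stage, commit, and push; return False when the tree is already clean.

    `runner` defaults to subprocess.run but can be swapped out in tests.
    """
    status = runner(["git", "-C", repo, "status", "--porcelain"],
                    capture_output=True, text=True, check=True)
    if not status.stdout.strip():
        return False  # working tree is clean, nothing to commit
    runner(["git", "-C", repo, "add", "-A"], check=True)
    runner(["git", "-C", repo, "commit", "-m", message], check=True)
    runner(["git", "-C", repo, "push"], check=True)
    return True
```

Run it from a session-end hook or an alias and the rule never needs to survive context compression again.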

I started splitting my workflow into two layers:

Layer 1: Deterministic (Python scripts). These handle everything that has a predictable, repeatable answer. Gathering data. Checking the calendar. Pulling git logs. Reading files. Making API calls. Copying files to the right locations. If the answer is the same every time given the same inputs, a Python script does it.

Layer 2: Non-deterministic (Claude). This handles everything that requires judgment, creativity, or conversation. Interpreting the data the script gathered. Deciding what to prioritize. Writing in my voice. Making recommendations. Having a back-and-forth with me about what to work on next.

The scripts don't think. Claude doesn't gather. Each layer does what it's best at.

What This Looks Like in Practice

The Standup System

Every morning when I start a session, I run a standup. The old way was a list of instructions in CLAUDE.md telling Claude to check this file, read that log, pull the calendar, look at the to-do list. Sometimes it worked perfectly. Sometimes it forgot a step. Sometimes it hallucinated a calendar event that didn't exist.

Now there's a Python script called standup.py. It pulls today's calendar events from the Google Calendar API. It reads the to-do list. It checks the last session file for context. It scans recent git commits across all my repos. It gathers the weather. It calculates how much work time I have before 5 PM. All deterministic. All reliable. Every single time.

Then it hands that data to Claude and says: "Here's everything. Brief James." And Claude does what Claude is super good at — it reads the room, picks out what matters, prioritizes, and talks to me like a human. The script gathers. Claude interprets.
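The shape of that handoff is simple: deterministic gather steps produce plain data, and one assembly function turns it into the context block Claude gets briefed with. This is a sketch of the pattern, not the real standup.py — the function names and sections are my assumptions, and real gatherers would call the Calendar API and run git log instead of receiving lists:

```python
# Hypothetical sketch of the standup pattern: scripts gather, Claude interprets.
from datetime import datetime


def hours_until_eod(now: datetime, eod_hour: int = 17) -> float:
    """Deterministic: how much work time is left before 5 PM."""
    eod = now.replace(hour=eod_hour, minute=0, second=0, microsecond=0)
    return max((eod - now).total_seconds() / 3600.0, 0.0)


def build_briefing(events: list[str], todos: list[str],
                   commits: list[str], now: datetime) -> str:
    """Assemble everything the scripts gathered into one block of context.

    This string is what gets handed to Claude: "Here's everything. Brief me."
    """
    def section(title: str, items: list[str]) -> str:
        body = "\n".join(f"- {item}" for item in items) or "- (none)"
        return f"{title}:\n{body}"

    return "\n\n".join([
        f"Work time left before 5 PM: {hours_until_eod(now):.1f} hours",
        section("Today's calendar", events),
        section("To-do list", todos),
        section("Recent commits", commits),
    ])
```

Everything above the final string is repeatable and testable. Only the interpretation of that string is left to the model.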

The Episode Publisher

Same pattern. I have a script called publish-episode.py that handles the full pipeline for publishing a YouTube episode to my blog. It downloads the transcript, cleans it, grabs the thumbnail, creates the episode folder, reads the Google Doc checklist, and stages everything. Deterministic steps that need to happen the same way every time.

Then Claude takes over for the creative work — writing the article from the transcript, choosing which quotes to highlight, deciding how to structure the narrative, propagating links across the site. The judgment calls.
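The deterministic half of that pipeline could be sketched like this. The folder layout and slug rule are my assumptions, and I've left out the API calls (YouTube transcript, thumbnail, Google Docs) to keep the staging logic in focus:

```python
# Hypothetical sketch of the deterministic staging steps in an episode
# publisher: derive a slug, create the episode folder, stage the transcript.
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Deterministic: same title in, same slug out, every time."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def stage_episode(base: Path, title: str, transcript: str) -> Path:
    """Create the episode folder and write the cleaned transcript into it."""
    folder = base / slugify(title)
    folder.mkdir(parents=True, exist_ok=True)
    # A real pipeline would also download the thumbnail and checklist here.
    (folder / "transcript.txt").write_text(transcript.strip() + "\n",
                                           encoding="utf-8")
    return folder
```

Once `transcript.txt` is sitting in a predictable place, Claude's creative pass starts from clean input instead of from whatever it half-remembers about the episode.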

The Auto-Journal

My daily journal used to be something Claude would build by remembering what happened during the session. You can guess how well that worked after context compression. Now a Python build script handles the deterministic parts — it reads the calendar for what I actually had scheduled, pulls git commits for what code I actually touched, and formats the HTML template. Claude fills in the human parts based on our conversation. What did we talk about? What decisions were made? What was the mood of the day?

The build script also copies the output file to Google Drive automatically. No LLM memory needed. No "remember to sync the file." The code just does it.
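A build step like that might look like the following: render the deterministic sections, leave a slot for Claude's part, and copy the result into a Drive-synced folder. The HTML template and directory names here are placeholders, not the actual build script:

```python
# Hypothetical sketch of an auto-journal build: deterministic rendering plus
# an automatic copy to a Drive-synced folder, so nothing relies on LLM memory.
import shutil
from pathlib import Path

TEMPLATE = """<html><body>
<h1>Journal: {date}</h1>
<h2>Scheduled</h2><ul>{events}</ul>
<h2>Code touched</h2><ul>{commits}</ul>
<h2>Notes</h2>
<p>{notes}</p>
</body></html>
"""


def render_journal(date: str, events: list[str], commits: list[str],
                   notes: str = "(Claude fills this in from the session)") -> str:
    """Deterministic parts from calendar and git; one slot left for Claude."""
    def li(items: list[str]) -> str:
        return "".join(f"<li>{item}</li>" for item in items)
    return TEMPLATE.format(date=date, events=li(events),
                           commits=li(commits), notes=notes)


def build_and_sync(out_dir: Path, drive_dir: Path, date: str,
                   events: list[str], commits: list[str]) -> Path:
    """Write the journal and copy it to the synced folder, every time."""
    out_dir.mkdir(parents=True, exist_ok=True)
    drive_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / f"{date}.html"
    out.write_text(render_journal(date, events, commits), encoding="utf-8")
    synced = drive_dir / out.name
    shutil.copy2(out, synced)  # the copy just happens; no "remember to sync"
    return synced
```

The sync lives in the same function as the build, so there is no separate step for anyone — human or model — to forget.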

When to Use Which Layer

The decision is simple. Ask yourself: "Is the answer always the same given the same inputs?"

If yes — script it. API calls, file operations, data gathering, calculations, deployments, file copying, format conversions. These are deterministic. Code handles them better than instructions every time.

If no — let Claude handle it. Writing, prioritizing, interpreting data, making judgment calls, having conversations, creative decisions. These require the kind of flexible reasoning that LLMs are actually good at.

The gray area is small. Most tasks clearly fall into one bucket or the other. And when you're not sure, default to scripting. You can always add Claude's judgment on top of deterministic output. You can't add reliability on top of an LLM promise.

The Compound Effect

Here's where it gets interesting. Every script I write makes Claude smarter — not by training it, but by giving it better inputs. When standup.py gathers accurate calendar data, Claude gives better morning briefings. When publish-episode.py stages a clean transcript, Claude writes better articles. When the build script copies files to the right places, Claude doesn't have to remember where things go.

The scripts accumulate. I have maybe a dozen now across different workflows. Each one removes a failure mode. Each one means fewer instructions in CLAUDE.md that might get forgotten. The system gets more reliable every week, not because the AI is improving, but because I'm moving more work into the layer that doesn't forget.

It's The Checklist Manifesto principle applied to AI. Simple systems prevent costly mistakes. Except instead of a checklist that a human reads, it's a script that a computer runs. Same philosophy — make the right thing happen automatically so you can focus your brainpower on the decisions that actually need a brain.

The Meta Lesson

If you're building with AI tools — Claude Code, Cursor, whatever — and you're frustrated that it "keeps forgetting" things, you're probably putting deterministic work in the non-deterministic layer. You're asking the AI to be reliable at tasks where reliability means "do the exact same thing every time." That's not what AI is for. That's what scripts are for.

Split the layers. Let code be code. Let AI be AI. The results compound.
