The Anthropic line item on your P&L is the part you can see.
It is also the smaller cost.
A 50-engineer team running four AI coding sessions per engineer per day is running 200 AI environments across the workday. The token cost shows up on a bill at the end of the month. The labor cost of 200 unmanaged sessions, the rework they produce, and the sprint velocity that never quite matches the individual output numbers do not show up anywhere. The CFO sees the bill. The CTO sees the velocity gap. Nobody connects the two until the quarter ends and the AI investment has not produced the lift that justified the budget.
The reason the labor cost does not show up has a specific shape. Your engineers are not configuring sessions. They are opening them. There is a difference, and it is the difference between a managed cloud-infrastructure line item and an unmanaged one.
A session is an environment, not a tool
An AI coding session is not a tool your engineer uses. It is a live environment they work inside. It has a fixed capacity. It degrades over time. It forgets. And it starts consuming budget before your engineer types a single character.
A fresh Claude Code session costs about 20,000 tokens on startup. The system prompt, project instructions, and tool definitions load before any work begins. At Sonnet 4.6 pricing, that is roughly six cents per session in baseline overhead. Fifty engineers running four sessions a day means 200 startups per day, or $12 per day in startup overhead alone, before any productive work has happened. Over roughly 250 working days, that is $3,000 per year just to open the empty rooms.
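The startup arithmetic is easy to check. This is a minimal sketch under two assumptions. It uses a flat $3 per million input tokens, which is the Sonnet input rate the six-cent figure implies. It also uses about 250 working days a year. Neither constant comes from a price sheet. The sketch also ignores any prompt-caching discount, which could lower the real number.

```python
# Sketch of the startup-overhead arithmetic. All constants are assumptions
# taken from (or implied by) the article, not from a published price sheet.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, assumed Sonnet input rate
STARTUP_TOKENS = 20_000                # system prompt + instructions + tool definitions
ENGINEERS = 50
SESSIONS_PER_ENGINEER_PER_DAY = 4
WORKING_DAYS_PER_YEAR = 250            # assumption


def input_token_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> float:
    """Dollar cost of `tokens` input tokens at a flat per-million rate (no caching discount)."""
    return tokens / 1_000_000 * price_per_million


def startup_overhead() -> dict:
    """Per-session, per-day, and per-year cost of just opening sessions."""
    per_session = input_token_cost(STARTUP_TOKENS)
    sessions_per_day = ENGINEERS * SESSIONS_PER_ENGINEER_PER_DAY
    per_day = per_session * sessions_per_day
    return {
        "per_session": round(per_session, 2),
        "sessions_per_day": sessions_per_day,
        "per_day": round(per_day, 2),
        "per_year": round(per_day * WORKING_DAYS_PER_YEAR, 2),
    }


if __name__ == "__main__":
    print(startup_overhead())
    # {'per_session': 0.06, 'sessions_per_day': 200, 'per_day': 12.0, 'per_year': 3000.0}
```

Changing any one constant shows how sensitive the figure is. A larger instructions file or more connected tools raises `STARTUP_TOKENS`, and every later number scales linearly with it.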
That is the cheap part of the bill.
Where the line item is actually growing
Quality in an AI coding session does not hold steady and then collapse. It degrades gradually as the context window fills.
Multiple engineering teams have independently arrived at a similar threshold: sessions start producing lower-quality output once 20 to 40 percent of the context window is in use. Most engineers do not restart at 40 percent. They keep going. The output gets worse. They correct it. They keep going. The correction costs tokens.
Auto-compaction kicks in at around 83.5 percent of context capacity. It is lossy. It retains roughly 20 to 30 percent of what was in the session. One engineering team lost three hours of refactoring work when compaction erased all knowledge of migration decisions mid-session. The session kept running. The engineer kept working. They found out at review.
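Those thresholds can be turned into a restart check that runs against a session's context usage. This is a minimal sketch. The 40 percent restart line is the upper end of the reported range, and 83.5 percent is the reported compaction point. The 200,000-token window, the `SessionStatus` enum, and the `session_status` helper are assumptions for illustration. None of them is a Claude Code API.

```python
from enum import Enum


class SessionStatus(Enum):
    OK = "ok"
    RESTART_RECOMMENDED = "restart recommended"  # past the quality threshold
    AT_COMPACTION = "at compaction"              # lossy auto-compaction has started or is due


def session_status(
    tokens_in_context: int,
    context_window: int = 200_000,  # assumption: window size varies by model
    restart_at: float = 0.40,       # upper end of the reported 20-40% degradation range
    compaction_at: float = 0.835,   # reported auto-compaction point
) -> SessionStatus:
    """Classify a session by how full its context window is (hypothetical helper)."""
    if not 0 < restart_at < compaction_at <= 1:
        raise ValueError("expected 0 < restart_at < compaction_at <= 1")
    used = tokens_in_context / context_window
    if used >= compaction_at:
        return SessionStatus.AT_COMPACTION
    if used >= restart_at:
        return SessionStatus.RESTART_RECOMMENDED
    return SessionStatus.OK


if __name__ == "__main__":
    for tokens in (50_000, 90_000, 170_000):
        print(tokens, session_status(tokens).value)
```

The 40 percent default is the permissive choice. A team that wants to catch degradation earlier can pass `restart_at=0.20`.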
A 750,000-token session at Claude Sonnet pricing costs about $2.25 in input tokens. If that session produced unusable output because the context degraded past the quality threshold and nobody caught it, the real cost is one hour of a senior engineer’s time. At a $200,000 annual salary spread over roughly 2,000 working hours, that is $100. The token cost is a rounding error. The labor cost is not.
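The gap between the two numbers is worth making explicit. This is a minimal sketch that assumes the same $3 per million input tokens and about 2,000 working hours a year. Both figures are assumptions. It counts salary only, as the paragraph above does.

```python
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, assumed Sonnet input rate
ANNUAL_SALARY = 200_000                # USD, the example salary
WORKING_HOURS_PER_YEAR = 2_000         # assumption: 50 weeks x 40 hours


def session_token_cost(input_tokens: int) -> float:
    """Input-token cost of one session at a flat per-million rate."""
    return input_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS


def rework_labor_cost(hours: float, annual_salary: float = ANNUAL_SALARY) -> float:
    """Salary cost of `hours` of engineering time (salary only, no benefits or overhead)."""
    return hours * annual_salary / WORKING_HOURS_PER_YEAR


if __name__ == "__main__":
    tokens = session_token_cost(750_000)
    labor = rework_labor_cost(1.0)
    print(f"tokens ${tokens:.2f}, labor ${labor:.2f}, labor/tokens {labor / tokens:.0f}x")
    # tokens $2.25, labor $100.00, labor/tokens 44x
```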
Across a 50-engineer team running four sessions a day, you have 200 sessions per day. Suppose 20 percent degrade past the quality threshold before anyone restarts them. That is 40 degraded sessions, and at one hour of rework each, 40 hours of engineering time per day spent producing output that requires rework. Teams that have measured it report that the real number is higher.
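The team-level estimate follows from a handful of inputs. This is a minimal sketch that makes explicit the paragraph's implicit assumption of one hour of rework per degraded session. The function name is illustrative. The $100 hourly figure is a $200,000 salary spread over an assumed 2,000 working hours.

```python
def daily_rework_hours(
    engineers: int = 50,
    sessions_per_engineer: int = 4,
    degraded_share: float = 0.20,           # share of sessions run past the quality threshold
    rework_hours_per_session: float = 1.0,  # assumption: one hour of senior time per bad session
) -> float:
    """Engineer-hours per day spent reworking output from degraded sessions (hypothetical helper)."""
    return engineers * sessions_per_engineer * degraded_share * rework_hours_per_session


HOURLY_RATE = 200_000 / 2_000  # USD: $200,000 salary over an assumed 2,000 working hours

if __name__ == "__main__":
    hours = daily_rework_hours()
    print(f"{hours:.0f} engineer-hours/day, about ${hours * HOURLY_RATE:,.0f}/day")
    # 40 engineer-hours/day, about $4,000/day
```

The result is linear in every input, so doubling the degraded share or the rework time per session doubles the daily figure.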
That cost shows up nowhere on the P&L. It shows up in sprint velocity that never quite matches the individual output numbers, and in the gap between what the AI line item was supposed to deliver and what the company actually shipped.
What unmanaged sessions cost at scale
Each engineer has their own approach. Some use detailed project instructions. Some use none. Some restart sessions frequently. Some run a single session all day. Some connect every available MCP server. Some connect none.
Inconsistent output quality is the visible problem. Inconsistent cost is the real one. A well-configured session and a poorly configured session running the same task can differ by a factor of 3 to 5 in token consumption. Across a 50-engineer team, that variance is itself a line item, and one that the engineers running cheap sessions subsidize for the engineers running expensive ones.
Boris Cherny built Claude Code. He shared his full workflow in January 2026. He runs 10 to 15 sessions at once. Each one is scoped to a single task. His project instructions file is about 100 lines. His rule: when the agent does something wrong, add it to the file. The file is not documentation. It is a running record of corrections.
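Cherny's rule can be made mechanical. Claude Code reads project instructions from a `CLAUDE.md` file, and this sketch assumes one at the repository root. The `record_correction` helper is hypothetical. It appends a correction as a bullet unless the same bullet is already there, and it warns once the file grows past the roughly 100 lines Cherny's file runs to. The helper name, the bullet format, and the warning threshold are illustrative choices, not part of Claude Code.

```python
from pathlib import Path

SOFT_LINE_LIMIT = 100  # about the size of Cherny's file; a heuristic, not a hard limit


def record_correction(rule: str, path: Path = Path("CLAUDE.md")) -> int:
    """Append `rule` as a bullet to the project instructions file unless it is
    already there, and return the file's line count afterwards (hypothetical helper)."""
    rule = rule.strip()
    if not rule:
        raise ValueError("rule must not be empty")
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    lines = text.splitlines()
    bullet = f"- {rule}"
    if bullet not in lines:
        # Start on a fresh line if the file does not already end with one.
        prefix = "" if not text or text.endswith("\n") else "\n"
        with path.open("a", encoding="utf-8") as f:
            f.write(prefix + bullet + "\n")
        lines.append(bullet)
    if len(lines) > SOFT_LINE_LIMIT:
        print(f"warning: {path} has {len(lines)} lines; consider pruning stale rules")
    return len(lines)
```

Usage might look like `record_correction("Never edit files under generated/; rerun the generator instead.")`, where the rule text is a made-up example. The duplicate check keeps the file a record of distinct corrections rather than a log of repeats.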
That is a managed approach. Most engineering teams do not have one. The Anthropic line item is paying for the unmanaged version.
The four questions that define a session
The teams getting this right answer four questions before a session starts. Not as a ritual. As a policy.
What permissions does this session have? Read-only or read-write? Which directories? Which external tools are connected? An agent with access to everything has access to everything. Scoping permissions per task type is not security theater. It reduces the surface area the agent navigates, which reduces token consumption and improves output focus.
Does it use existing project context? Project instructions tell the agent what the codebase does, what conventions to follow, and what it should never do. A session without them starts from scratch on every question the codebase could have answered. Those are wasted tokens and wasted time. Project instructions get followed about 70 percent of the time. For rules that matter, say them in the session. Not just in the file.
Does it need specific capabilities for this task? A debugging session needs different tools than a feature build. Over-connecting MCP servers adds overhead to every call, whether or not the server is invoked. Discipline about what a session needs for a specific task reduces cost and reduces the chance the agent reaches for a tool it should not use.
What is the session scope? The simplest operating rule the engineering community has converged on: one task per session. Starting a fresh session costs about 20,000 tokens, or roughly six cents. That is nothing compared to the quality loss from a degraded session or the labor cost of reworking output that looked correct at 80 percent context and was not.
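The four questions can be written down as a per-task-type policy that a team reviews like any other standard. This is a minimal sketch, and several pieces of it are hypothetical: the `SessionPolicy` dataclass, its field names, the task types, the MCP server names, and the three-server cap. It is not Claude Code's own settings format, which has its own schema. Mapping these fields onto a real tool's configuration is left open.

```python
from dataclasses import dataclass

MAX_MCP_SERVERS = 3  # assumption: an illustrative cap, not a documented limit


@dataclass(frozen=True)
class SessionPolicy:
    task_type: str
    read_only: bool                 # Q1: read-only or read-write
    allowed_dirs: tuple[str, ...]   # Q1: which directories the agent may touch
    use_project_instructions: bool  # Q2: load the project instructions file
    mcp_servers: tuple[str, ...]    # Q3: only the tools this task type needs
    max_tasks: int = 1              # Q4: one task per session

    def problems(self) -> list[str]:
        """Return the ways this policy breaks the team standard; empty means it passes."""
        issues = []
        if not self.allowed_dirs:
            issues.append("no directories scoped")
        if "/" in self.allowed_dirs:
            issues.append("whole filesystem exposed")
        if not self.use_project_instructions:
            issues.append("project instructions disabled")
        if len(self.mcp_servers) > MAX_MCP_SERVERS:
            issues.append(f"more than {MAX_MCP_SERVERS} MCP servers connected")
        if self.max_tasks > 1:
            issues.append("more than one task per session")
        return issues


# Hypothetical team standard: one policy per task type.
POLICIES = {
    "review": SessionPolicy("review", read_only=True, allowed_dirs=("src/", "tests/"),
                            use_project_instructions=True, mcp_servers=()),
    "debug": SessionPolicy("debug", read_only=False, allowed_dirs=("src/", "tests/"),
                           use_project_instructions=True, mcp_servers=("log-search",)),
    "feature": SessionPolicy("feature", read_only=False, allowed_dirs=("src/", "tests/", "docs/"),
                             use_project_instructions=True, mcp_servers=("issue-tracker",)),
}

if __name__ == "__main__":
    for name, policy in POLICIES.items():
        print(name, policy.problems() or "ok")
```

The design choice is to keep the standard as reviewed data rather than as per-engineer habit. A change to what a debugging session may touch then goes through the same review as any other shared configuration.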
What the CFO should be asking next quarter
The Anthropic line item is the visible cost. The invisible cost is the variance in output quality across sessions, the rework that happens downstream, and the sprint velocity that never quite reflects the individual output numbers.
A 50-engineer team running four sessions a day is running 200 AI environments across the workday. Most companies manage their cloud infrastructure more carefully than they manage those 200 environments. The cloud spend has a FinOps function. The AI spend does not.
The teams getting ahead of this are treating session configuration the way they treat code review standards or deployment checklists. Not every engineer reinventing their own approach. A shared policy for how sessions get configured, scoped, and restarted. The policy does not require new tooling. It requires someone deciding what the standard is and making it visible to the team.
Three CFO-grade questions for the next QBR.
What is our variance in token cost per merged PR across the engineering org? If the answer is “we do not measure that,” the AI line item is unmanaged.
What percentage of AI-assisted PRs require a second pass to fix problems introduced by the AI session? If the answer is “we do not track that,” the labor cost is invisible.
Who owns the session-management policy? If the answer is “every engineer,” there is no policy.
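The first two questions reduce to numbers a team can compute from data it may already have: token spend per engineer, merged PRs, and which AI-assisted PRs needed a follow-up fix. This is a minimal sketch that assumes those inputs can be exported per engineer for a period. The `EngineerMonth` record, its fields, and the sample figures in the usage lines are hypothetical and were invented for illustration. The variance measure is the coefficient of variation (standard deviation divided by mean), which is one reasonable choice among several.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class EngineerMonth:
    engineer: str
    token_cost_usd: float         # AI spend attributed to this engineer for the period
    merged_prs: int
    ai_assisted_prs: int          # merged PRs where an AI session did the work
    prs_needing_second_pass: int  # AI-assisted PRs that needed a follow-up fix


def cost_per_pr_variation(rows: list[EngineerMonth]) -> float:
    """Coefficient of variation of token cost per merged PR; 0.0 means uniform spend."""
    per_pr = [r.token_cost_usd / r.merged_prs for r in rows if r.merged_prs > 0]
    if len(per_pr) < 2:
        raise ValueError("need at least two engineers with merged PRs")
    return pstdev(per_pr) / mean(per_pr)


def second_pass_rate(rows: list[EngineerMonth]) -> float:
    """Share of AI-assisted PRs that needed a second pass."""
    assisted = sum(r.ai_assisted_prs for r in rows)
    if assisted == 0:
        return 0.0
    return sum(r.prs_needing_second_pass for r in rows) / assisted


if __name__ == "__main__":
    # Hypothetical sample data for two engineers, for illustration only.
    rows = [
        EngineerMonth("engineer-a", 120.0, merged_prs=10, ai_assisted_prs=8, prs_needing_second_pass=2),
        EngineerMonth("engineer-b", 480.0, merged_prs=12, ai_assisted_prs=12, prs_needing_second_pass=5),
    ]
    print(f"cost-per-PR variation: {cost_per_pr_variation(rows):.2f}")  # 0.54
    print(f"second-pass rate: {second_pass_rate(rows):.0%}")            # 35%
```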
The AI line item is going to keep growing through 2027. The board is going to keep asking what it is buying. The honest answer this quarter is that it is buying individual output and that the company has not yet built the management discipline that makes that output add up to a business outcome.
Two quarters from now the answer needs to be different. The first step is reading what the unmanaged sessions are actually doing, who is running them, and which ones are producing the rework that does not appear on the bill.
That read is what Zamski does. zamski.com.
