It was 11pm.

I had a Sentry alert, a GitHub issue, and a Claude Code session open. The workflow made sense. Connect the GitHub MCP. Connect the Sentry MCP. Let the agent trace the error back through the codebase, find the root cause, update the issue with the fix.

I went to make coffee. Came back ten minutes later. The session was still running. Context window 68 percent full.

I watched it work. It was pulling commits. Hundreds of them. Every open PR touching the file. Stack traces from Sentry. Each one landing in the context window like furniture being moved into a studio apartment.

By the time the agent finished I had used 750,000 tokens. In one hour. On one incident.

I did not have the fix yet.

At Sonnet 4.6 input pricing of $3 per million tokens, that one session cost roughly $2.25 in tokens. Add the hour of my time at a senior engineering rate of about $100 and the real cost is $102. Not catastrophic on its own. The shape of the cost is the part the CFO needs to see, because that shape is what your engineering org runs thousands of times per month.
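The arithmetic behind that number fits in a few lines. This is a back-of-envelope sketch, not a billing tool: the $3 per million input tokens and the $100 hourly rate are assumptions inferred from the figures above, and it ignores output tokens and caching.

```python
# Assumed rates, inferred from the figures in the text.
INPUT_PRICE_PER_MILLION = 3.00   # USD per million input tokens
ENGINEER_RATE_PER_HOUR = 100.00  # USD, senior engineering rate


def session_cost(tokens: int, hours: float) -> tuple[float, float]:
    """Return (token_cost, total_cost) in USD for one agent session."""
    token_cost = tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    labor_cost = hours * ENGINEER_RATE_PER_HOUR
    return token_cost, token_cost + labor_cost


token_cost, total_cost = session_cost(tokens=750_000, hours=1.0)
print(f"tokens: ${token_cost:.2f}, total: ${total_cost:.2f}")
```

The token line is a rounding error next to the labor line. That is the shape.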

What a token is, what 750,000 of them buys

A token is not a word. It is not a character. It is a chunk of text somewhere between one character and one word. About four characters on average. Three-quarters of a word. Anthropic, OpenAI, and Google each run their own tokenizer, so exact counts vary by model, but the rule of thumb is close enough for budgeting.

At that rate, 750,000 tokens is roughly 560,000 words. A typical novel is 80,000 words. I burned through the equivalent of seven novels of working memory. In one hour. Investigating one incident.
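If you want to sanity-check the conversion yourself, here is a rough sketch. It uses only the rules of thumb above (four characters or three-quarters of a word per token), not a real tokenizer, so treat the output as an estimate. The function names are mine.

```python
CHARS_PER_TOKEN = 4      # rule of thumb, not a real tokenizer
WORDS_PER_TOKEN = 0.75   # rule of thumb
WORDS_PER_NOVEL = 80_000


def estimate_tokens(text: str) -> int:
    """Rough token count from character length."""
    if not text:
        return 0
    return max(1, round(len(text) / CHARS_PER_TOKEN))


def tokens_to_novels(tokens: int) -> float:
    """Convert a token count into 80,000-word novels."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_NOVEL


session_words = 750_000 * WORDS_PER_TOKEN
session_novels = tokens_to_novels(750_000)
print(f"{session_words:,.0f} words, about {session_novels:.1f} novels")
```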

And I still did not have the fix.

If a single engineer running a single MCP session can produce that shape of cost without producing the work the session was opened for, the question is not whether the MCP is the right tool. The question is what the MCP is handing back, why the context fills the way it does, and what would have to change for the same session to cost 50x less and produce the fix on the first attempt.

The thing nobody tells you about MCP

MCP is not the problem.

The idea is right. Give your agent access to your tools and let it work. GitHub. Sentry. Jira. Slack. All of it there when you need it.

The problem is what the tools hand back.

When MCP took off, every major platform rushed to ship an integration. The pressure was real. Customers were asking. The announcement was easy. Wrap the existing API, expose it as MCP tools, ship it.

Nobody stopped to ask what an agent actually needs versus what an API was designed to return.

Those are different questions. APIs were built for programs that parse JSON and make decisions programmatically. Agents read. They reason. Every token in the context window is working memory they are burning through.

A GitHub MCP call on a file returns paginated commits. Thirty per page by default. An agent investigating an incident does not stop at page one. It keeps calling. Page after page until it has enough context to reason about the problem. Each page is 1.5 to 3 kilobytes of raw JSON landing in the context window.

Then it needs separate calls to check which open PRs touch your file. Then it has no idea what your teammates are working on, what the Jira ticket says, or what was discussed in Slack last Tuesday.

That is tens of thousands of tokens. Across dozens of sequential API calls. No coordination context. Just data accumulating in working memory until the session runs out of room.
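To see how the pages add up, here is a rough estimate. The page sizes come from the figures above. The page counts (24 and 48) are my own stand-ins for "dozens of calls", and the four-characters-per-token rule is only an approximation.

```python
CHARS_PER_TOKEN = 4  # rule of thumb, not a real tokenizer


def paginated_tokens(pages: int, bytes_per_page: int) -> int:
    """Approximate tokens added to context by reading every page."""
    return pages * bytes_per_page // CHARS_PER_TOKEN


# Hypothetical page counts standing in for "dozens of calls".
low = paginated_tokens(pages=24, bytes_per_page=1_500)
high = paginated_tokens(pages=48, bytes_per_page=3_000)
print(f"{low:,} to {high:,} tokens from commit history alone")
```

None of those tokens are released until the session ends. Every later call pays rent on them.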

Do that twice in a session. Add Sentry pulling full event payloads.

Two million tokens is not an edge case. It is Tuesday night.

What 200 sessions a day looks like on the P&L

Take the 750,000-token session as the average for an incident triage that uses two or three MCP servers. Not the worst case. The middle of the distribution.

A 50-engineer team running four sessions a day is 200 sessions. Say 30 of them are heavy incident or refactoring sessions in the 500,000 to 1,000,000-token range. The token cost at Sonnet rates is roughly $45 to $90 per day, depending on where in that range the sessions fall. Call it $60, or $15,000 per year over 250 working days. Still not the catastrophe.

The catastrophe is the labor cost behind the same 30 sessions. If 20 percent of them are the shape of the session I ran, producing no resolution on the first pass, that is six engineer-hours per day burned without producing the fix. $600 per day in labor. $150,000 per year, behind a token cost of $15,000.
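Here is the same comparison as a small calculation, so you can plug in your own team. It is a sketch under assumptions: $3 per million input tokens, a $100 hourly rate, one lost hour per unresolved session, and 250 working days. At the ends of the token range it gives $45 to $90 per day, which brackets the daily figure quoted above.

```python
PRICE_PER_MILLION = 3.00  # USD per million input tokens, assumed
HOURLY_RATE = 100.00      # USD, assumed
WORKING_DAYS = 250        # assumed


def daily_token_cost(sessions: int, tokens_per_session: int) -> float:
    """Token spend per day for the heavy sessions."""
    return sessions * tokens_per_session / 1_000_000 * PRICE_PER_MILLION


def daily_labor_cost(sessions: int, unresolved_share: float,
                     hours_each: float = 1.0) -> float:
    """Labor spent per day on sessions that end without a fix."""
    return sessions * unresolved_share * hours_each * HOURLY_RATE


token_low = daily_token_cost(30, 500_000)
token_high = daily_token_cost(30, 1_000_000)
labor_day = daily_labor_cost(30, 0.20)
labor_year = labor_day * WORKING_DAYS

print(f"tokens: ${token_low:.0f} to ${token_high:.0f} per day")
print(f"labor:  ${labor_day:.0f} per day, ${labor_year:,.0f} per year")
```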

That is the line item the CFO is not seeing. The Anthropic bill arrives. The labor cost stays absorbed in the salary line. The connection between the two is invisible until someone draws it.

What I realized standing in my kitchen

A raw API response is built for a program to parse.

An agent context is working memory. You do not hand a surgeon a database export when they need a patient chart. You hand them a chart. Organized. Filtered. Showing what matters right now.

Most MCP tools skip that step. They open a pipe from the API to the context window and leave it running.

The fix is not a smarter agent. It is not a better prompt. It is deciding what goes into the context before it gets there.

Server-side aggregation. One call in. The work happens on the server. A shaped response comes out.

That is what I built into Zamski MCP.

What happens when you call zamski_file_context

One API call leaves the agent session.

On the server, five database queries run in parallel. Open PRs touching your files. Jira tickets linked to your branch. Slack messages from the last seven days. The engineers who know this code best, ranked by commit and review history. Active coordination failures in this area right now.
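The fan-out pattern is simple to sketch. What follows is not the Zamski implementation. It is a minimal illustration of running five lookups concurrently with Python's asyncio, where every function name, key, and return value is hypothetical and the queries are stubs that return canned rows instead of touching a database.

```python
import asyncio


def make_query(label: str):
    """Build a hypothetical async query; a real one would hit a database."""
    async def query(path: str) -> list[str]:
        await asyncio.sleep(0)  # stands in for I/O latency
        return [f"{label} for {path}"]
    return query


# Hypothetical names for the five lookups described above.
QUERIES = {
    "open_prs": make_query("open PR"),
    "jira_tickets": make_query("Jira ticket"),
    "slack_messages": make_query("Slack message (last 7 days)"),
    "experts": make_query("engineer ranked by commit and review history"),
    "coordination_failures": make_query("active coordination failure"),
}


async def gather_file_context(path: str) -> dict[str, list[str]]:
    """Start all five queries at once and wait for them together."""
    results = await asyncio.gather(*(q(path) for q in QUERIES.values()))
    return dict(zip(QUERIES.keys(), results))


context = asyncio.run(gather_file_context("src/app.py"))
for name, rows in context.items():
    print(name, rows)
```

The point of the design is that the waiting happens on the server. The agent makes one call and pays for one response.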

Before any of that reaches your agent, a response shaping layer cuts it to 8 kilobytes. Hard ceiling. Lists truncated to the top three entries. The rest disclosed progressively if the agent needs more.
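A shaping layer of that kind can be sketched in a few lines. This is an assumption-laden illustration, not the actual Zamski code: the function name, the plain-text layout, and the "more on request" wording are all mine. Only the 8 kilobyte ceiling and the top-three truncation come from the description above.

```python
MAX_BYTES = 8 * 1024  # hard ceiling from the text
TOP_N = 3             # keep the top three entries per list


def shape_response(sections: dict[str, list[str]]) -> str:
    """Render sections as short text, truncated to TOP_N items and MAX_BYTES."""
    lines = []
    for name, items in sections.items():
        lines.append(f"{name}: " + "; ".join(items[:TOP_N]))
        hidden = len(items) - TOP_N
        if hidden > 0:
            # Progressive disclosure: say more exists, do not send it.
            lines.append(f"  (+{hidden} more on request)")
    text = "\n".join(lines)
    # Enforce the ceiling on encoded bytes; drop any partial trailing character.
    return text.encode("utf-8")[:MAX_BYTES].decode("utf-8", errors="ignore")


example = shape_response({
    "open_prs": ["pr-a", "pr-b", "pr-c", "pr-d", "pr-e"],
    "experts": ["alice", "bob"],
})
print(example)
```

The ceiling is enforced last, on bytes, so no upstream query can blow the budget no matter what it returns.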

What the agent reads is a paragraph. Fifteen lines. One action recommendation. Who is working here. What to watch out for.

Token cost: 570 to 2,340 per call. Never more.

The ratio against a naive GitHub MCP call for the same file is roughly 100 to 1.

Apply that ratio to the 30 heavy sessions per day in the example above. The token cost drops from $50 to under $1. The labor cost drops because the session no longer ends with the engineer reading raw JSON at 68 percent context. It ends with the engineer reading a chart and acting on it.

The thing even good MCP cannot do

GitHub MCP knows what is in GitHub. Sentry MCP knows what is in Sentry. Zamski MCP knows the coordination context across both.

None of them know what the agent on your teammate’s machine is doing right now.

That session running somewhere else in your organization is touching the same files. Making decisions based on incomplete information. The agents do not know about each other. They were not built to. They act on whatever has been made explicit.

The commits know. The agents do not.

That gap is what I built Zamski to close. Not after the merge conflict surfaces. Before the agent makes the wrong move.

What to do this month

If your MCP sessions are running out of context before they finish the work, the prompt is not the problem. The pipe is.

Check what your tools are returning. If the answer is raw API responses, you have a pipe problem, not a model problem. Two practical moves this month.

Audit one engineer’s heaviest session this month. Pull the token count for the largest single session. Multiply by the team size, the session frequency, and your per-token price. Because it starts from the largest session, the number you get is a ceiling on your current monthly MCP token cost and a fair guide to its order of magnitude. If it surprises you, the line item is unmanaged.
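The audit is one multiplication. A minimal sketch, assuming $3 per million input tokens and 21 working days in a month. Both are my assumptions, and so are the example inputs, which reuse the 50-engineer, four-sessions-a-day team from earlier. Because it treats every session as if it were the heaviest one, read the result as an upper bound.

```python
def monthly_upper_bound(largest_session_tokens: int,
                        engineers: int,
                        sessions_per_engineer_per_day: int,
                        working_days: int = 21,
                        price_per_million: float = 3.00) -> float:
    """Upper-bound monthly token spend in USD if every session were the heaviest."""
    tokens = (largest_session_tokens * engineers
              * sessions_per_engineer_per_day * working_days)
    return tokens / 1_000_000 * price_per_million


estimate = monthly_upper_bound(
    largest_session_tokens=750_000,
    engineers=50,
    sessions_per_engineer_per_day=4,
)
print(f"monthly ceiling: ${estimate:,.0f}")
```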

Ask what your MCP tools know about coordination. Who else is in this file right now. What changed here in the last 48 hours. What is forming in this part of the codebase that nobody has named yet. If the answer is nothing, your agent is working blind. Fast. Correctly. Blind.

zamski.com. Free. Connect your repository and see what your agents are missing, before the next 750,000-token session lands on the P&L.
