MCP Did Not Need More Tool Autonomy. It Needed a Context Boundary.
Your agent just produced a confident answer.
It sounds plausible.
It cites the right kind of context.
It may even have called the right tool.
But something feels off.
The answer is shaped by a viewpoint you did not expect. It treats one angle as important, smooths over another, and turns uncertainty into fluent prose.
So you open the trace.
There are model calls.
There are tool calls.
There is context moving through the system.
But you still cannot easily answer the question that matters:
Which context shaped this answer, and why was that context allowed in?
That was the pain that changed how I looked at MCP.
MCP makes it tempting to give agents more things to do.
Let the agent call tools.
Let it search.
Let it read files.
Let it query APIs.
Let it reach into more systems.
That is useful.
But while building contract-question-agent, I started caring less about how much the agent could do and more about a different question:
Which context is this agent allowed to think with?
I stopped treating MCP only as a way to expand agent capability.
I started treating it as a boundary for shaping context.
This article is not an argument against tool use. It is not an argument that agents should never act. MCP is useful precisely because it can expose capabilities across system boundaries.
But for some agent systems, especially systems where output boundaries matter, the first problem is not autonomy.
The first problem is context control.
In this design, the tradeoff is deliberate:
observability over autonomy
LLMs can answer almost anything
LLMs are good at producing plausible responses to almost any input.
Give one a contract clause, and it can explain the clause.
Give one an error log, and it can infer a possible cause.
Give one a product requirement, and it can suggest improvements.
Give one a vague concern, and it can turn that concern into polished prose.
That is impressive.
It is also a little dangerous when you are building an agent.
The problem is not only that the model may be wrong.
The problem is that the model can also invent the frame through which it answers.
It can decide what matters.
It can decide which angle to take.
It can decide what kind of response the user probably wants.
It can decide which unstated assumptions to smooth over.
When an LLM answers freely, it does not only generate text.
It often generates the viewpoint that makes the text feel natural.
For general chat, that can be helpful.
For bounded agent work, it can blur what the system is actually responsible for.
The issue was not access. It was viewpoint.
The project that made this visible for me was contract-question-agent.
The goal of this project is narrow: turn a vague concern about a contract clause into verification questions a human reviewer can raise before relying on the clause.
The agent is not supposed to decide whether a clause is legal, enforceable, fair, risky, or safe to sign.
It should not return a verdict.
It should help the reviewer ask better questions.
That difference sounds small, but architecturally it matters.
If I let the model freely decide how to look at a clause, it can produce something that sounds useful while quietly crossing the boundary of the task.
For example, it might say:
This clause creates risk.
The reviewer should be cautious.
This condition may be unfavorable.
Those sentences sound reasonable.
But where did that viewpoint come from?
Did the user ask for it?
Did the runtime provide it?
Was it selected from visible candidates?
Is it inside the agent's task boundary?
Can I inspect why that lens was used?
If the answer is unclear, the output becomes hard to trust and hard to debug.
The problem was not that the model lacked access to information.
The problem was that the model could choose its own review frame.
RAG can flatten what should stay separate
This is also why I do not see it as only a tool-use problem.
It connects to a common RAG failure mode.
RAG retrieves relevant material and gives it to the model. For many explanation tasks, that works well enough.
But when the final answer becomes a smooth paragraph, several different things can collapse into the same prose:
A: an issue to check
B: an exception
C: a missing premise
D: a question for the counterparty
E: a reason to ask an expert
The final answer may become:
There are several points to consider, and the clause should be reviewed carefully.
That is not necessarily false.
But it is thin.
The issue, the exception, the missing premise, and the verification question have been averaged into a general caution.
For some domains, that flattening is not just disappointing. It is risky.
Contracts are one of those domains.
The useful artifact is not a general summary of concern. The useful artifact is a set of reviewable questions that preserve uncertainty and point back to specific lenses.
So I did not want MCP to simply give the model more things to call.
I wanted the runtime to control which lenses entered the model's context.
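One way to keep the five kinds of items above from collapsing into a single paragraph is to carry the kind as data. The following is a minimal sketch, not code from the repository: `ItemKind`, `ReviewItem`, `group_by_kind`, and the example lens names are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ItemKind(Enum):
    # The five kinds listed above (A to E), kept as distinct values.
    ISSUE = "issue to check"
    EXCEPTION = "exception"
    MISSING_PREMISE = "missing premise"
    COUNTERPARTY_QUESTION = "question for the counterparty"
    EXPERT_REFERRAL = "reason to ask an expert"


@dataclass(frozen=True)
class ReviewItem:
    kind: ItemKind
    lens: str  # hypothetical identifier of the lens this item points back to
    text: str


def group_by_kind(items: list[ReviewItem]) -> dict[ItemKind, list[ReviewItem]]:
    """Group items by kind so each kind stays separately reviewable."""
    grouped: dict[ItemKind, list[ReviewItem]] = {}
    for item in items:
        grouped.setdefault(item.kind, []).append(item)
    return grouped


items = [
    ReviewItem(ItemKind.ISSUE, "notice_period", "Is the notice period counted in business days?"),
    ReviewItem(ItemKind.EXCEPTION, "notice_period", "Does the force majeure carve-out suspend notice?"),
    ReviewItem(ItemKind.ISSUE, "cure_rights", "Who decides whether a breach has been cured?"),
]
grouped = group_by_kind(items)
```

Because the kind and the lens travel with each item, a reviewer can still tell an exception from a missing premise after generation, which a single paragraph of caution does not allow.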
MCP is often framed as capability expansion
MCP is often discussed as a way to give agents capabilities.
A tool can read a file.
A tool can query a database.
A tool can call an API.
A tool can search external systems.
A tool can give the agent access to something it could not otherwise reach.
That framing is useful.
But it is incomplete.
If MCP only means “the agent can do more things,” then it can make an already broad model even broader.
The model can answer many kinds of questions.
The tools can access many kinds of systems.
Together, they can create a very capable agent.
But capability without shape is not the same as reliability.
The more the system can do, the more important it becomes to ask:
Which capability is allowed here?
Which context should enter the prompt?
Which lens should shape the output?
Which boundary should stop the workflow before generation?
For contract-question-agent, I did not need a freer agent.
I needed a more inspectable one.
Capability expansion vs capability shaping
The shift was this:
MCP as capability expansion:
give the agent more things it can call
MCP as capability shaping:
expose a controlled set of context candidates for this workflow state
That second framing is the one I needed.
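The difference between the two framings can be made concrete with a small lookup table. This is a sketch under assumptions: the workflow states and most capability names are hypothetical, and only `lookup_clause_review_hints` is taken from the shape described later in this article.

```python
from enum import Enum


class WorkflowState(Enum):
    # Hypothetical states; a real workflow may name or split these differently.
    SCOPE_CHECKED = "scope_checked"
    CLAUSE_CLASSIFIED = "clause_classified"
    QUESTIONS_GENERATED = "questions_generated"


# Capability expansion: everything the agent could call (hypothetical names).
ALL_CAPABILITIES = frozenset(
    {"read_file", "query_database", "search_web", "lookup_clause_review_hints"}
)

# Capability shaping: what the runtime exposes at each workflow state.
ALLOWED_BY_STATE = {
    WorkflowState.SCOPE_CHECKED: frozenset(),
    WorkflowState.CLAUSE_CLASSIFIED: frozenset({"lookup_clause_review_hints"}),
    WorkflowState.QUESTIONS_GENERATED: frozenset(),
}


def capabilities_for(state: WorkflowState) -> frozenset:
    """Return the capabilities exposed at this state; unknown states get none."""
    return ALLOWED_BY_STATE.get(state, frozenset())


def is_allowed(state: WorkflowState, capability: str) -> bool:
    """True only if the runtime exposes this capability at this state."""
    return capability in capabilities_for(state)
```

The table is the point of the sketch: the set of things the agent could call stays the same, while the set it may call depends on where the workflow is.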
The architecture I wanted looked less like an autonomous tool hunt and more like a visible context pipeline:
input
↓
scope boundary
↓
MCP context boundary
↓
visible candidate lenses
↓
model execution
↓
reflection
↓
output boundary
↓
verification questions
The important part is not that MCP appears in the diagram.
The important part is where it appears.
It sits at the point where context enters the workflow.
In this project, MCP is not used as a blank check for autonomous tool use.
The application calls an MCP capability deterministically based on runtime state. The model does not decide to go hunting for tools. The runtime retrieves candidate review lenses, injects them into the prompt, and asks the model to select from that visible set.
The shape is closer to this:
clause_type
-> lookup_clause_review_hints
-> candidate_review_lenses
-> prompt rendering
-> selected_review_lenses
-> verification questions
The model still does useful work.
It reads the clause.
It considers the candidate lenses.
It selects relevant lenses.
It generates verification questions.
What changes is that the context entry point is controlled.
The runtime can inspect what candidates were available.
The output can show which lenses were selected.
The trace can show when the lens provider was called.
The workflow can remain bounded even though it uses an external capability.
That is why I started thinking of MCP as a controlled context provider.
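The lookup, render, and select shape above can be sketched in a few functions. This is an illustration under stated assumptions, not the repository's implementation: the in-memory `LENS_CATALOG` stands in for the MCP lens provider, the lens names are invented, and `render_prompt` and `validate_selection` are hypothetical helpers. Only the name `lookup_clause_review_hints` comes from the shape above.

```python
# Hypothetical stand-in for the MCP lens provider; lens names are invented.
LENS_CATALOG = {
    "termination": ["notice_period", "cure_rights", "post_termination_obligations"],
    "liability": ["cap_scope", "carve_outs"],
}


def lookup_clause_review_hints(clause_type: str) -> list[str]:
    """Called by the runtime, not chosen by the model. Unknown types yield no lenses."""
    return list(LENS_CATALOG.get(clause_type, []))


def render_prompt(clause: str, candidates: list[str]) -> str:
    """Render the visible candidate set into the prompt text."""
    lines = [
        "Select relevant review lenses ONLY from the candidates below.",
        "Candidates:",
        *[f"- {name}" for name in candidates],
        "Clause:",
        clause,
    ]
    return "\n".join(lines)


def validate_selection(selected: list[str], candidates: list[str]) -> list[str]:
    """Reject any lens the model returned that was not in the visible set."""
    unknown = [name for name in selected if name not in candidates]
    if unknown:
        raise ValueError(f"lenses outside the visible candidate set: {unknown}")
    return selected


candidates = lookup_clause_review_hints("termination")
prompt = render_prompt("Either party may terminate on 30 days' notice.", candidates)
# The model call is omitted here; assume it returned this selection.
selected = validate_selection(["notice_period"], candidates)
```

The validation step is what makes the candidate set a boundary rather than a suggestion: a lens the model invents is rejected by the runtime instead of quietly shaping the output.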
MCP alone does not create the boundary
It would be too easy to say: “Use MCP and the boundary is solved.”
That is not true.
MCP is only one part of the boundary system.
In contract-question-agent, at least three boundaries matter:
1. Scope boundary
Should this input be handled by this agent at all?
2. Context boundary
Which review lenses may enter the prompt?
3. Output boundary
How far is the answer allowed to go?
These boundaries answer different questions.
Scope boundary:
Are we allowed to answer this input?
Context boundary:
Which viewpoint may shape the answer?
Output boundary:
What artifact are we allowed to return?
If someone asks an unrelated Python question, a general LLM can answer it.
But a contract verification-question agent should not.
That is a scope boundary.
If the input is a contract clause, the agent still should not invent arbitrary legal or business viewpoints.
That is a context boundary.
If the agent has enough context to generate useful questions, it still should not conclude that the clause is safe, risky, enforceable, or fair.
That is an output boundary.
MCP sits in the middle of this system.
It helps provide candidate context.
It does not replace scope checks, schema contracts, reflection, or output constraints.
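To show that the three boundaries answer different questions, here is a minimal sketch with one check per boundary. Everything in it is an assumption for illustration: the keyword list, the verdict pattern, and the function names are hypothetical, and the heuristics are deliberately crude. A real scope or output check would need to be much richer.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryResult:
    boundary: str  # "scope", "context", or "output"
    passed: bool
    reason: str


# Hypothetical keyword heuristic for "does this look like a contract clause?"
CONTRACT_MARKERS = ("shall", "party", "agreement", "terminate", "liability")

# Hypothetical pattern for verdict phrasing such as "is risky" or "are fair".
VERDICT_PATTERN = re.compile(
    r"\b(is|are)\s+(safe|risky|enforceable|unenforceable|fair|unfair)\b",
    re.IGNORECASE,
)


def check_scope(text: str) -> BoundaryResult:
    """Scope: should this input be handled by this agent at all?"""
    hit = any(marker in text.lower() for marker in CONTRACT_MARKERS)
    reason = "looks like a contract clause" if hit else "no contract markers found"
    return BoundaryResult("scope", hit, reason)


def check_context(selected: list[str], allowed: list[str]) -> BoundaryResult:
    """Context: did only visible candidate lenses shape the answer?"""
    extra = sorted(set(selected) - set(allowed))
    reason = f"unexpected lenses: {extra}" if extra else "all lenses were visible candidates"
    return BoundaryResult("context", not extra, reason)


def check_output(questions: list[str]) -> BoundaryResult:
    """Output: is the artifact a set of questions, with no verdict phrasing?"""
    for question in questions:
        if not question.rstrip().endswith("?"):
            return BoundaryResult("output", False, f"not a question: {question!r}")
        if VERDICT_PATTERN.search(question):
            return BoundaryResult("output", False, f"verdict phrasing: {question!r}")
    return BoundaryResult("output", True, "questions only, no verdict phrasing")
```

Keeping the checks separate means a failure names its boundary, so a rejected Python question and a rejected verdict do not look like the same event.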
Answerable is not the same as allowed
This distinction became one of the most useful debugging rules in the project:
Answerable is not the same as allowed.
A general-purpose model may be able to answer almost anything.
That does not mean this agent should answer it.
A model may be able to discuss legal risk.
That does not mean this workflow should produce a legal-risk verdict.
A model may be able to invent a plausible review angle.
That does not mean the runtime should let that angle silently enter the output.
This is where prompt-only boundaries become fragile.
You can write all of these constraints into a prompt:
Do not answer unrelated questions.
Use only these review lenses.
Do not produce verdicts.
Return verification questions only.
That helps.
But if every boundary lives only inside prompt text, the system has fewer places to observe what happened.
Did the input pass scope?
Which candidate lenses were provided?
Which lenses were selected?
Did the output cross the thesis boundary?
Did reflection reject it?
Where did the workflow stop?
Those should not be mysteries hidden inside model behavior.
They should be runtime events.
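A minimal sketch of what "runtime events" could look like follows. It assumes a simple in-memory trace. `Trace`, `TraceEvent`, the event names, and the `passed` convention are hypothetical and not taken from the repository or from any tracing library.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TraceEvent:
    name: str
    payload: dict
    timestamp: float = field(default_factory=time.time)


class Trace:
    """Collects one event per boundary decision, in the order they happened."""

    def __init__(self) -> None:
        self.events: list[TraceEvent] = []

    def emit(self, name: str, **payload) -> None:
        self.events.append(TraceEvent(name, payload))

    def stopped_at(self) -> Optional[str]:
        """Name of the first event whose payload has passed == False, if any."""
        for event in self.events:
            if event.payload.get("passed") is False:
                return event.name
        return None

    def to_jsonl(self) -> str:
        """Serialize the trace as one JSON object per line."""
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self.events)


trace = Trace()
trace.emit("scope_checked", passed=True)
trace.emit("lenses_provided", candidates=["notice_period", "cure_rights"])
trace.emit("lenses_selected", selected=["notice_period"])
trace.emit("reflection_checked", passed=False, reason="verdict phrasing")
```

With events like these, each question in the list above maps to a lookup in the trace rather than a guess about model behavior.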
Context should be visible before it becomes prose
The reason I care about context boundaries is that final prose hides too much.
Once a model turns retrieved material, candidate lenses, assumptions, exceptions, and uncertainty into a fluent answer, debugging becomes harder.
The output may still sound good.
But the structure has disappeared.
For a review-oriented system, I want the important parts to remain visible before they become prose:
input clause
scope decision
clause type
candidate review lenses
selected review lenses
generated verification questions
reflection result
safe output
That shape makes debugging more concrete.
If the questions are weak, I can ask:
Were the candidate lenses weak?
Did the model select the wrong lens?
Did the prompt render the candidates poorly?
Did the output boundary remove too much?
Did reflection fail to catch drift?
That is much better than saying:
The agent gave a bad answer.
A context boundary turns invisible framing into something the system can inspect.
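The visible parts listed above can be kept in one record, and the debugging questions can then be asked of that record in order. This is a sketch under assumptions: the field names mirror the list above, but `ReviewRecord`, `first_suspect`, and the specific checks are hypothetical and only catch structural problems, not weak lens or question quality.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewRecord:
    input_clause: str
    scope_decision: bool
    clause_type: str
    candidate_review_lenses: list[str] = field(default_factory=list)
    selected_review_lenses: list[str] = field(default_factory=list)
    generated_verification_questions: list[str] = field(default_factory=list)
    reflection_passed: bool = True
    safe_output: list[str] = field(default_factory=list)


def first_suspect(record: ReviewRecord) -> str:
    """Walk the stages in order and name the first structural problem found."""
    if not record.scope_decision:
        return "scope: input was rejected"
    if not record.candidate_review_lenses:
        return "context: provider returned no candidate lenses"
    if not record.selected_review_lenses:
        return "selection: model selected no lenses"
    stray = set(record.selected_review_lenses) - set(record.candidate_review_lenses)
    if stray:
        return f"selection: lenses outside the candidate set: {sorted(stray)}"
    if not record.reflection_passed:
        return "reflection: output was rejected"
    if len(record.safe_output) < len(record.generated_verification_questions):
        return "output boundary: some questions were removed"
    return "no structural issue found; inspect lens and question quality"


record = ReviewRecord(
    input_clause="Either party may terminate on 30 days' notice.",
    scope_decision=True,
    clause_type="termination",
    candidate_review_lenses=["notice_period", "cure_rights"],
    selected_review_lenses=["notice_period"],
    generated_verification_questions=["Q1?", "Q2?"],
    safe_output=["Q1?"],
)
```

For the example record, the first structural suspect is the output boundary, because one of two generated questions did not reach the safe output.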
This is observability over autonomy
The design tradeoff is clear.
This system is less autonomous than an agent that freely decides which tools to call.
That is intentional.
In this project, I care more about observability than autonomy.
The agent does not need to surprise me with clever tool use.
It needs to show me which context shaped the output.
The more consequential the boundary, the more I want the runtime to own it.
For some systems, model-controlled tool use is the right design. If the task is exploratory, open-ended, or low-risk, giving the model more autonomy may be useful.
But when the task has a narrow artifact boundary, the runtime should control the context entry points.
MCP still matters there.
It is just not the center of autonomy.
It is part of the boundary.
GitHub repo as implementation proof
This article is part of the same design experiment as my first Boundary Log essay.
The implementation lives here:
https://github.com/mofuteq/contract-question-agent
The relevant part is not “this repo uses MCP.”
The relevant part is where MCP sits in the responsibility map.
Scope check:
decide whether the workflow should continue
MCP review-lens provider:
provide controlled candidate context
Prompt surface:
render the visible candidates
Model execution:
select lenses and generate questions
Reflection:
check whether the output stayed inside the thesis
Output boundary:
return verification questions, not verdicts
That is the shape I care about.
MCP is not the agent runtime.
MCP is not the safety system.
MCP is not the whole architecture.
It is one boundary where controlled context enters the workflow.
Closing
MCP makes it easy to think about what else an agent can do.
That is useful, but it is not always the most important question.
Before asking how many tools an agent can call, I now ask:
Should this input be handled at all?
Which context is allowed to enter?
Which viewpoint may shape the output?
Which artifact is the workflow allowed to return?
Where can I observe those decisions?
For my contract-question-agent, MCP became less about autonomous tool use and more about context boundaries.
Not capability expansion for its own sake.
Capability shaping.
The agent did not need more freedom.
It needed a visible boundary for the context it was allowed to think with.
