Minimizing the Context Window
Web search wasn’t the only source of bloat. The runtime itself was oversharing.
Why Stop at Optimizing Web Results?
After minimizing Tavily output, a more uncomfortable realization surfaced.
Web search wasn’t the biggest source of token inflation.
OpenClaw was.
Each request to Anthropic was carrying:
- Tool schemas
- System instructions
- Prior messages
- Memory injections
- Structured metadata
- Full JSON responses
Individually reasonable.
Collectively expensive.
The First Prompt
… I want to try and minimize information being passed to Anthropic from a web search result. Can we look at implementing a translation layer to minimize JSON responses?
That was the technical starting point.
But the deeper issue wasn’t JSON.
It was context accumulation.
The Translation Layer
The solution began with a translation layer.
Instead of passing raw JSON from tools directly into the model:
- Extract only relevant fields.
- Remove nested structures.
- Flatten the payload.
- Normalize the format.
- Strip redundant keys.
Anthropic didn’t need the entire response schema.
It needed the distilled meaning.
Translation became compression.
In practice, that meant dropping everything that made the payload feel like a raw API dump:
- navigation junk
- extra metadata
- nested objects that added no decision value
- anything the model would only paraphrase back to me anyway
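The distillation step can be sketched in a few lines. This is a minimal illustration, not OpenClaw’s actual code; the field names (`results`, `title`, `url`, `content`) assume a Tavily-style search payload, and the truncation length is arbitrary.

```python
def distill_search_result(raw: dict, max_chars: int = 400) -> str:
    """Flatten a raw web-search JSON payload into compact plain text.

    Keeps only the fields the model needs to reason with; drops
    nested metadata, scores, and navigation junk entirely.
    """
    lines = []
    for item in raw.get("results", []):
        title = item.get("title", "").strip()
        url = item.get("url", "")
        # Truncate rather than paraphrase: cheap, deterministic, lossless up to the cap.
        content = item.get("content", "").strip()[:max_chars]
        if content:
            lines.append(f"{title} ({url}): {content}")
    return "\n".join(lines)
```

The model then sees a handful of `title (url): snippet` lines instead of a multi-kilobyte schema.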
The Second Realization
I hit usage limits.
Not because the system was unstable.
Because it was verbose.
I paused for two hours and asked a different question:
I want to look at how I can minimize context OpenClaw is sending per request.
That’s when the focus shifted from “optimizing tool output” to “budgeting the entire runtime.”
Context Is a Budget, Not a Bucket
Every request includes:
- System prompts
- Agent definitions
- Tool descriptions
- Prior turns
- Memory injections
- Tool outputs
Even well-structured systems can quietly expand.
And expansion increases:
- Cost
- Latency
- Drift
- Hallucination risk
Bigger context is not better context.
Better context is smaller and intentional.
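Treating context as a budget can be made concrete. The sketch below is hypothetical: the 4-characters-per-token heuristic stands in for a real tokenizer, and the priority-ordered packing is one possible policy, not OpenClaw’s.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic; production code would count with the model's tokenizer.
    return len(text) // 4

def pack_context(parts: list[tuple[str, str]], budget: int) -> list[str]:
    """Keep (name, text) parts in priority order until the token budget is spent.

    Parts listed first (system prompt, current turn) survive; low-priority
    parts (old history, optional memory) are the first to be dropped.
    """
    kept, used = [], 0
    for name, text in parts:
        cost = estimate_tokens(text)
        if used + cost > budget:
            break  # budget exhausted; everything after this is dropped
        kept.append(text)
        used += cost
    return kept
```

The point isn’t the heuristic. It’s that every component has to justify its token cost before it ships.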
What Changed
The new design principles:
- Include only the tools needed for the current turn.
- Inject only relevant memory, not full context packs.
- Trim system instructions to what the task requires.
- Avoid carrying forward redundant conversation history.
- Minimize tool schemas where possible.
The goal wasn’t to shrink capability.
It was to shrink waste.
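The first principle — include only the tools needed for the current turn — might look like this. The keyword-routing registry is an assumption for illustration; real routing could use embeddings or an explicit planner.

```python
def select_tools(message: str, registry: dict[str, dict]) -> list[dict]:
    """Return only the tool schemas whose trigger keywords appear in the message.

    Every schema left out of the request is tokens that never get sent.
    """
    words = set(message.lower().split())
    return [
        spec["schema"]
        for spec in registry.values()
        if spec["keywords"] & words  # any trigger word present
    ]
```

With a registry of a dozen tools, a turn like “search the news” ships one schema instead of twelve.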
That also surfaced another problem: Telegram was great for interaction, but not for verbose output.