Token-Saving Context Management
Save up to 80% on AI tokens by using ContextStream for context retrieval instead of including full chat history in every prompt.
The Token Problem, Solved
Traditional AI chat includes your entire conversation history with every prompt. By turn 10, you're sending 20,000+ tokens each time—mostly redundant.
ContextStream stores context externally and retrieves only what's relevant, keeping each prompt under 1,000 tokens regardless of conversation length.
How It Works
❌ Traditional Chat History
Turn 1: 2,000 tokens
Turn 5: 10,000 tokens
Turn 10: 20,000 tokens
Total over 10 turns: ~50,000 tokens
✅ With ContextStream
Turn 1: 500 tokens (summary)
Turn 5: 700 tokens (smart context)
Turn 10: 800 tokens (smart context)
Total over 10 turns: ~8,000 tokens (84% savings)
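As a quick sanity check of the savings figure, the 84% follows directly from the two quoted totals (the totals themselves are the illustrative numbers from the comparison above, not measured values):

```python
# Rough check of the savings figure, using the totals quoted above.
traditional_total = 50_000    # ~tokens over 10 turns with full chat history
contextstream_total = 8_000   # ~tokens over 10 turns with ContextStream

savings = 1 - contextstream_total / traditional_total
print(f"{savings:.0%} savings")  # → 84% savings
```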
The Workflow
1. User sends a message
"How should I implement authentication?"
2. AI calls context_smart
Retrieves only the context relevant to authentication.
3. context_smart returns minified context (~200 tokens)
D:Use JWT|D:No cookies|M:Auth API at /auth/login
4. AI responds with full context awareness
No 10,000 tokens of chat history required.
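The steps above can be sketched as a per-turn loop. `call_tool`, `expand_minified`, and `respond` are hypothetical helpers, and the meaning of the minified prefixes (D:, M:, W:, P:) is an assumption, not a documented format:

```python
def call_tool(name, **kwargs):
    # Hypothetical stand-in for a real ContextStream tool call;
    # here it returns the minified context shown in step 3 above.
    return "D:Use JWT|D:No cookies|M:Auth API at /auth/login"

def expand_minified(ctx):
    # Assumed meaning of the prefixes (not documented here):
    # D: decision, M: memory, W: workspace, P: project.
    labels = {"D": "Decision", "M": "Memory", "W": "Workspace", "P": "Project"}
    lines = []
    for item in ctx.split("|"):
        prefix, _, value = item.partition(":")
        lines.append(f"{labels.get(prefix, prefix)}: {value}")
    return "\n".join(lines)

def respond(user_message):
    # Steps 1-2: retrieve only the context relevant to this message.
    ctx = call_tool("context_smart", user_message=user_message)
    # Step 3: expand the ~200-token minified context into the prompt.
    prompt = expand_minified(ctx) + "\n\nUser: " + user_message
    # Step 4: the model answers from this prompt, no full history needed.
    return prompt

prompt = respond("How should I implement authentication?")
```

The point of the sketch: the prompt sent to the model is built from a small retrieved context plus the current message, never from the accumulated transcript.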
Token-Saving Tools
context_smart: Call this before every response. Analyzes the user message and returns only the relevant context in minified format.
context_smart(user_message="how to implement auth?")
# Returns: W:Maker|P:api|D:Use JWT|D:No cookies|M:Auth endpoint
# ~200 tokens instead of full chat history
session_summary: Get a compact summary of workspace context (~500 tokens). Use at conversation start.
ai_context_budget: Get context that fits within a specified token budget. Useful when you need more detail.
session_compress: Extract key decisions, preferences, and insights from chat history and store them. Use at conversation end to preserve context while clearing history.
session_delta: Get context changes since a specific timestamp. Efficient for incremental sync.
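The remaining tools can be sketched in the same call style as the context_smart example above. The parameter names (`token_budget`, `since`) and the `call_tool` dispatcher are assumptions for illustration, not documented signatures:

```python
import datetime

def call_tool(name, **kwargs):
    # Hypothetical stand-in for a real ContextStream tool call.
    return f"[{name} result for {kwargs}]"

# Conversation start: compact workspace overview (~500 tokens).
overview = call_tool("session_summary")

# When more detail is needed, cap the context size explicitly
# (parameter name assumed).
detailed = call_tool("ai_context_budget", token_budget=2_000)

# Incremental sync: only what changed since the last fetch
# (parameter name assumed).
last_sync = datetime.datetime(2025, 1, 1).isoformat()
delta = call_tool("session_delta", since=last_sync)

# Conversation end: distill and store key context before clearing history.
call_tool("session_compress")
```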
Best Practices
Call context_smart before every response
This ensures the AI always has relevant context without loading the full history.
Compress at the end of long conversations
Use session_compress to extract and store key context before clearing chat
Use session_summary at conversation start
Get workspace overview (~500 tokens) to orient the AI to your project
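Taken together, the three practices above suggest a conversation lifecycle: summarize at the start, retrieve per turn, compress at the end. A minimal sketch, again with a hypothetical `call_tool` helper:

```python
def call_tool(name, **kwargs):
    # Hypothetical stand-in for a real ContextStream tool call.
    return f"<{name} output>"

def run_conversation(user_messages):
    # Start: orient the AI with a ~500-token workspace overview.
    context = [call_tool("session_summary")]
    transcript = []
    for msg in user_messages:
        # Every turn: fetch only what is relevant to this message.
        context.append(call_tool("context_smart", user_message=msg))
        transcript.append(msg)
    # End: distill key decisions and insights, then clear the history.
    call_tool("session_compress")
    transcript.clear()
    return context

ctx = run_conversation(["How should I implement auth?", "Add logout too"])
```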