Token Optimization

Token-Saving Context Management

Save over 80% of AI tokens by using ContextStream for context retrieval instead of resending the full chat history with every prompt.

The Token Problem, Solved

Traditional AI chat includes your entire conversation history with every prompt. By turn 10, you're sending 20,000+ tokens each time, most of them redundant.

ContextStream stores context externally and retrieves only what's relevant, keeping each prompt under 1,000 tokens regardless of conversation length.

How It Works

❌ Traditional Chat History

Turn 1: 2,000 tokens

Turn 5: 10,000 tokens

Turn 10: 20,000 tokens

Total over 10 turns: ~50,000 tokens

✅ With ContextStream

Turn 1: 500 tokens (summary)

Turn 5: 700 tokens (smart context)

Turn 10: 800 tokens (smart context)

Total over 10 turns: ~8,000 tokens (84% savings)
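The savings figure above follows directly from the two quoted totals, a quick check:

```python
# Totals quoted in the comparison above (illustrative figures, not measurements).
traditional_total = 50_000    # ~tokens over 10 turns with full chat history
contextstream_total = 8_000   # ~tokens over 10 turns with ContextStream

savings = 1 - contextstream_total / traditional_total
print(f"savings: {savings:.0%}")
```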

The Workflow

  1. User sends a message

     "How should I implement authentication?"

  2. AI calls context_smart

     Retrieves only context relevant to authentication

  3. context_smart returns minified context (~200 tokens)

     D:Use JWT|D:No cookies|M:Auth API at /auth/login

  4. AI responds with full context awareness

     Without needing 10,000 tokens of chat history
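The four steps above can be sketched as a single response function. `call_tool` and `llm` are hypothetical stand-ins for an MCP-style tool client and a model call; only the `context_smart` tool name and its `user_message` argument come from these docs:

```python
def respond(user_message: str, call_tool, llm) -> str:
    """Answer one user turn using retrieved context instead of chat history."""
    # Step 2: fetch only the context relevant to this message (~200 tokens).
    context = call_tool("context_smart", user_message=user_message)
    # Step 4: the model answers with that compact context, not the full history.
    prompt = f"Context: {context}\n\nUser: {user_message}"
    return llm(prompt)
```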

Token-Saving Tools

context_smart (Essential)

Call this before every response. It analyzes the user's message and returns only the relevant context in a minified format.

context_smart(user_message="how to implement auth?")
# Returns: W:Maker|P:api|D:Use JWT|D:No cookies|M:Auth endpoint
# ~200 tokens instead of full chat history
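On the consuming side, the minified string can be unpacked into a structure. The format here is inferred from the examples above (pipe-separated `PREFIX:value` entries); the prefix meanings are not formally defined in these docs:

```python
from collections import defaultdict

def parse_minified(context: str) -> dict:
    """Split a minified context string like 'W:Maker|P:api|D:Use JWT'
    into a mapping from type prefix to values (format inferred from
    the examples in these docs)."""
    parsed = defaultdict(list)
    for entry in context.split("|"):
        prefix, _, value = entry.partition(":")  # split at first ':' only
        parsed[prefix].append(value)
    return dict(parsed)
```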
session_summary

Get a compact summary of workspace context (~500 tokens). Use at conversation start.

ai_context_budget

Get context that fits within a specified token budget. Useful when you need more detail.
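A budget-bounded retrieval like this can be approximated greedily. This is a toy sketch of the idea, not the tool's actual algorithm; it assumes items arrive pre-sorted by relevance and uses a crude whitespace token estimate:

```python
def fit_to_budget(items: list, budget_tokens: int) -> list:
    """Include context items (most relevant first) until the token
    budget would be exceeded. Whitespace word count stands in for a
    real tokenizer here."""
    selected, used = [], 0
    for item in items:
        cost = len(item.split())  # rough token estimate
        if used + cost > budget_tokens:
            break
        selected.append(item)
        used += cost
    return selected
```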

session_compress

Extract key decisions, preferences, and insights from chat history and store them. Use at conversation end to preserve context while clearing history.
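To make the compression idea concrete, here is a toy keyword filter that keeps only decision- and preference-like lines from a history. The real tool presumably uses the model itself to extract these; the keywords are illustrative assumptions:

```python
def compress_history(messages: list) -> list:
    """Toy illustration of session_compress: keep only lines that look
    like decisions or preferences, drop the rest of the chat."""
    keywords = ("decided", "prefer", "always", "never")
    return [m for m in messages if any(k in m.lower() for k in keywords)]
```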

session_delta

Get context changes since a specific timestamp. Efficient for incremental sync.
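Incremental sync of this kind reduces to filtering entries by timestamp. A minimal sketch, assuming each context entry carries a hypothetical `updated_at` field:

```python
def delta_since(items: list, since: float) -> list:
    """Return only the context entries updated after the given timestamp,
    mirroring the session_delta idea for incremental sync."""
    return [item for item in items if item["updated_at"] > since]
```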

Best Practices

Call context_smart before every response

This ensures AI always has relevant context without loading full history

Compress at the end of long conversations

Use session_compress to extract and store key context before clearing chat

Use session_summary at conversation start

Get workspace overview (~500 tokens) to orient the AI to your project
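The three practices above fit together into one conversation lifecycle. As before, `call_tool` and `llm` are hypothetical stand-ins; the tool names (`session_summary`, `context_smart`, `session_compress`) are the ones documented above:

```python
def run_conversation(turns: list, call_tool, llm) -> list:
    """One conversation: summary at start, smart context every turn,
    compression at the end."""
    replies = []
    overview = call_tool("session_summary")  # start: ~500-token overview
    for message in turns:
        # every turn: retrieve only the relevant context for this message
        context = call_tool("context_smart", user_message=message)
        replies.append(llm(f"{overview}\n{context}\nUser: {message}"))
    call_tool("session_compress")            # end: persist key context
    return replies
```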

Next Steps