Caveman Syntax Slashes LLM Token Usage While Preserving Accuracy 🔗
Claude plugin forces terse prehistoric responses that cut output tokens by an average of 65 percent on coding tasks
Token expenditure has become one of the most tangible costs in AI-assisted software development. Every verbose explanation from Claude consumes context window space and inflates API bills. The caveman project offers a pragmatic solution: a Claude Code skill and Codex plugin that instructs the model to respond in abbreviated, caveman-style English.
The approach rests on a simple observation. Stripping away conversational politeness and redundant phrasing dramatically reduces token counts without discarding technical substance. A standard 69-token explanation of a React re-rendering bug becomes 19 tokens: "New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."
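The effect is easy to illustrate with a rough comparison. The sketch below uses whitespace word count as a crude stand-in for tokens (real tokenizers count subword units, so actual figures differ); the verbose answer is a hypothetical expansion of the caveman response quoted above.

```python
# Crude illustration of the compression effect. Word count is only a
# rough proxy for tokens; real tokenizers split on subword units.
verbose = (
    "The issue is that you are creating a new object reference on every "
    "render. When you pass an inline object as a prop, React sees a new "
    "reference each time, which causes the child component to re-render. "
    "You can fix this by wrapping the object in useMemo so the reference "
    "stays stable across renders."
)
caveman = ("New object ref each render. Inline object prop = new ref "
           "= re-render. Wrap in useMemo.")

def word_count(text: str) -> int:
    """Count whitespace-separated words as a cheap token proxy."""
    return len(text.split())

saving = 1 - word_count(caveman) / word_count(verbose)
print(f"{word_count(verbose)} -> {word_count(caveman)} words "
      f"({saving:.0%} reduction)")
```

Even this naive measure shows the same order of savings the project reports: the technical content survives, only the connective tissue is gone.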
The plugin provides three intensity levels so developers can match output density to their needs. Lite retains some conventional grammar while still cutting words. Full delivers the signature caveman dialect. Ultra compresses further: "Inline obj prop → new ref → re-render. useMemo."
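Conceptually, each level is just a different system-prompt instruction. The mapping below is a hypothetical sketch; the skill's actual wording and configuration format are not documented in this article.

```python
# Hypothetical sketch of how the three intensity levels could map to
# system-prompt instructions. The real skill's wording will differ.
INTENSITY_PROMPTS = {
    "lite": (
        "Answer concisely. Keep basic grammar, but drop filler, "
        "pleasantries, and restatements of the question."
    ),
    "full": (
        "Answer in caveman English: short declarative fragments, "
        "no articles, no politeness. Keep all technical terms intact."
    ),
    "ultra": (
        "Maximum compression: prefer arrows and symbols over sentences, "
        "e.g. 'Inline obj prop -> new ref -> re-render. useMemo.'"
    ),
}

def build_system_prompt(level: str) -> str:
    """Return the brevity instruction for a given intensity level."""
    if level not in INTENSITY_PROMPTS:
        raise ValueError(f"unknown level: {level!r}")
    return INTENSITY_PROMPTS[level]

print(build_system_prompt("full"))
```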
Real measurements matter more than claims. Version 1.1.0 ships with a reproducible benchmark system located in benchmarks/run.py. The script calls the Claude API directly, compares normal versus caveman output, and updates the README table with fresh data. Across ten coding prompts the project records an average 65 percent token reduction, with individual tasks reaching 87 percent savings. One React explanation dropped from 1180 to 159 tokens.
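The shape of such a benchmark is straightforward. The sketch below mirrors the spirit of the project's benchmarks/run.py but is not its code: count_tokens and ask_model are offline stand-ins (the real script calls the Claude API and would use the provider's token counter).

```python
# Sketch of a before/after output-token benchmark. count_tokens and
# ask_model are stand-ins so this runs offline; a real benchmark would
# call the model API and use its reported token usage.

def count_tokens(text: str) -> int:
    # Stand-in: roughly 4 characters per token.
    return max(1, len(text) // 4)

def ask_model(prompt: str, style: str) -> str:
    # Stand-in for an API call; returns canned answers for illustration.
    canned = {
        "normal": "The re-render happens because a new object reference "
                  "is created on every render, so React treats the prop "
                  "as changed. Wrapping the object in useMemo keeps the "
                  "reference stable across renders.",
        "caveman": "New object ref each render. Wrap in useMemo.",
    }
    return canned[style]

def benchmark(prompts):
    """Return (prompt, normal_tokens, caveman_tokens, saving) rows."""
    rows = []
    for p in prompts:
        normal = count_tokens(ask_model(p, "normal"))
        caveman = count_tokens(ask_model(p, "caveman"))
        rows.append((p, normal, caveman, 1 - caveman / normal))
    return rows

for prompt, n, c, saved in benchmark(["Why does my React child re-render?"]):
    print(f"{prompt}: {n} -> {c} tokens ({saved:.0%} saved)")
```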
The latest release also adds official Codex plugin support, a proper contributing guide, and issue templates. Installation remains deliberately simple, reflecting the project's focus on immediate utility rather than complex configuration.
For development teams running dozens of AI coding sessions daily, these savings compound quickly. Reduced output tokens mean lower latency, cheaper API calls, and more room left in context windows for actual code. The technique works because large language models trained on internet text still understand the underlying technical concepts even when forced to express them with minimal vocabulary.
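A back-of-envelope calculation shows how the savings add up. Every number below is an assumption chosen for illustration: 40 sessions per day, 2,000 output tokens per normal session, the article's 65 percent reduction, and an assumed price of $15 per million output tokens.

```python
# Back-of-envelope cost illustration. All inputs are assumptions,
# not measured figures from the project.
SESSIONS_PER_DAY = 40        # assumed team-wide daily sessions
TOKENS_PER_SESSION = 2_000   # assumed output tokens per normal session
REDUCTION = 0.65             # average reduction reported by the project
PRICE_PER_MTOK = 15.00       # assumed USD per million output tokens

daily_tokens_saved = SESSIONS_PER_DAY * TOKENS_PER_SESSION * REDUCTION
monthly_cost_saved = daily_tokens_saved * 30 / 1_000_000 * PRICE_PER_MTOK
print(f"{daily_tokens_saved:,.0f} tokens/day, "
      f"${monthly_cost_saved:.2f}/month saved")
# -> 52,000 tokens/day, $23.40/month saved
```

The dollar amounts are modest at this scale; the larger practical win is usually the reclaimed context-window space and lower latency per response.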
The project demonstrates that prompt engineering does not always require sophisticated algorithms. Sometimes the most effective optimization comes from changing the model's linguistic persona. Builders who spend significant time in Claude or Codex projects now have a practical tool to control their token budget without sacrificing answer quality.
Same fix. 75% less word. Brain still big.
Typical users include:

- Frontend developers diagnosing React re-rendering performance issues
- Backend engineers debugging authentication middleware token validation
- Full-stack teams optimizing daily Claude API expenditure costs
Related tools take different approaches:

- succinct-llm - Applies formal brevity instructions but lacks the caveman persona and reproducible benchmarks
- prompt-compressor - Focuses on input token reduction through algorithmic summarization rather than output style
- claude-verbosity - Offers configurable response length settings without the linguistic compression of prehistoric speech