Proxy Unlocks Free Claude Code for Terminal and VSCode Users 🔗
Lightweight routing layer connects Anthropic's coding agent to free NVIDIA NIM, local Ollama instances, and open providers without changing a line of code.
free-claude-code is a clever drop-in proxy that lets developers run Anthropic's powerful Claude Code CLI, VSCode extension, and Discord bots without ever paying for an Anthropic API key. Instead of sending requests to paid Claude endpoints, the tool silently intercepts them and forwards the work to six different backends: NVIDIA NIM's 40-request-per-minute free tier, OpenRouter's vast model catalog, DeepSeek's direct API, or fully local options including LM Studio, llama.cpp, and Ollama.
free-claude-code is a clever drop-in proxy that lets developers run Anthropic's powerful Claude Code CLI, VSCode extension, and Discord bots without ever paying for an Anthropic API key. Instead of sending requests to paid Claude endpoints, the tool silently intercepts them and forwards the work to six different backends: NVIDIA NIM's 40-request-per-minute free tier, OpenRouter's vast model catalog, DeepSeek's direct API, or fully local options including LM Studio, llama.cpp, and Ollama.
The project solves a frustration that has quietly limited experimentation with state-of-the-art coding agents. Claude 3.5 Sonnet and Opus excel at complex software engineering tasks, yet their official API pricing makes heavy interactive use expensive. Many developers resorted to copy-pasting code between local editors and web interfaces or simply gave up on agentic workflows. free-claude-code removes that barrier by acting as a transparent translation layer. Users set two environment variables and continue using the official Claude Code tools exactly as before.
What makes the project technically interesting is the depth of compatibility it achieves. The proxy supports per-model mapping, so a developer can send Opus-class reasoning to a strong cloud model while routing faster Haiku requests to a local Ollama instance running on the same laptop. It also understands thinking tokens: when backend models return <thinking> tags or reasoning_content fields, the proxy converts them into native Claude thinking blocks that the official client can display correctly.
Tool use receives special attention. Many open models still emit tool calls as plain text rather than structured JSON. A heuristic parser examines these outputs, reconstructs valid tool-use blocks, and returns them to the Claude Code client. This maintains the full agentic loop even when the underlying model was never trained on Anthropic's exact XML format. Five categories of trivial API calls—model listings, simple status checks, and similar housekeeping—are intercepted and answered locally, saving quota and shaving latency.
Rate-limit handling is equally refined. The proxy implements proactive rolling-window throttling, reactive exponential backoff on 429 errors, and an optional concurrency cap. These safeguards prevent sudden lockouts while maximizing throughput on free tiers. For collaborative users, the included Discord and Telegram bot brings autonomous coding sessions into group chats. It features tree-based conversation threading, persistent sessions across restarts, and live progress indicators so teammates can watch an agent work in real time. Subagent control logic forces run_in_background=False, preventing the kind of runaway tool cascades that have plagued other agent frameworks.
The architecture itself invites extension. Clean abstract base classes (BaseProvider and MessagingPlatform) make adding new inference backends or chat platforms straightforward. Configuration lives in a single, well-documented file that supports mixing providers within the same session. A developer could, for example, use NVIDIA NIM for heavy reasoning steps and fall back to a local DeepSeek model when quotas run low.
As AI coding assistants move from novelty to daily infrastructure, projects that democratize access become force multipliers. free-claude-code does more than save money. It lets students, indie hackers, and resource-conscious teams experiment with frontier coding agents in the environments where they already work—terminals, editors, and chat channels—without negotiating budgets or compromising privacy by sending code to distant servers. The result is a more level playing field where the quality of ideas, not the size of an API bill, determines what gets built.
The project arrives at the perfect moment. Local models are reaching surprising competence, cloud free tiers are expanding, and developers have grown tired of context-switching between paid web UIs and their preferred tools. By bridging that gap with surgical compatibility fixes and thoughtful optimizations, free-claude-code turns expensive experimentation into everyday practice.
- Terminal developers running full Claude Code workflows at zero cost
- VSCode users adding free agentic coding assistance to daily editing
- Discord teams deploying persistent autonomous coding agents collaboratively
- LiteLLM - offers broad LLM proxying with OpenAI compatibility but lacks Claude-specific thinking token conversion and heuristic tool parsing.
- LocalAI - emulates API endpoints for local models yet requires more configuration and doesn't provide the seamless Claude Code drop-in experience.
- OpenClaw - delivers Discord-based Claude coding but depends on paid Anthropic keys, whereas free-claude-code adds free routing and local options.