Free Claude Code

Overview

This document provides a detailed description of Free Claude Code, a lightweight proxy that routes Claude Code’s Anthropic API calls to multiple backends. It enables free or low-cost access to a range of large language model providers via a single local proxy and familiar Claude Code interfaces (CLI and VSCode extension).
Visual reference: the project includes a representative image showing Claude Code in action, running via NVIDIA NIM with a free setup. Image:
The core idea is to preserve Claude Code’s request format while transparently delegating execution to one of several providers, enabling hybrid or local deployments without changing user workflows.

Core Capabilities and Design Philosophy

Zero cost operation with broad access
The system offers a free tier on NVIDIA NIM (40 requests per minute) and access to free models on OpenRouter. Other paths include fully local options through LM Studio, Ollama, or llama.cpp.
Drop-in replacement for Claude Code
By configuring two environment variables, you can switch providers without modifying the Claude Code CLI or the VSCode extension, ensuring a smooth transition and minimal integration effort.
Multi-provider support (six built-in options)
NVIDIA NIM
OpenRouter
DeepSeek
LM Studio (local)
llama.cpp (local via a server)
Ollama (local)
Per-model routing and flexible mixing
The system supports mapping specific Claude models (Opus, Sonnet, Haiku) to different providers. You can mix providers per model or use a common fallback model, enabling tailored configurations for different tasks.
Thinking tokens and native Claude thinking blocks
The tool parses special thinking tokens (e.g., tags and reasoning_content) and converts them into native Claude thinking blocks when thinking is enabled for the chosen model.
Tool-use parsing and automation
A heuristic parser detects tool-calling patterns embedded in model output and converts them into structured tool use, enabling smoother automated workflows.
Local request optimization to save quota
Five categories of trivial API calls—such as quota probes and title generation—can be intercepted locally to reduce remote usage and latency.
Smart rate limiting and reliability
Proactive throttling with a rolling window, reactive 429 backoff, and an optional concurrency cap help stabilize performance under load and across providers.
Remote, bot-enabled workflows
A Discord or Telegram bot provides remote autonomous coding with tree-based threading, session persistence, and live progress tracking, broadening access beyond a local terminal.
Subagent control and safety
The system interposes task tools to prevent runaway subagents by configuring subagent behavior, including forcing certain operations to run in the foreground or background as appropriate.
Extensible architecture
Clean abstractions (BaseProvider and MessagingPlatform ABCs) enable the addition of new providers or platforms with minimal effort.
Rich Quick Start and configuration flow
The project includes straightforward prerequisites, a guided installation path for uv, steps to clone and configure, and explicit environment variable examples for each provider.

Quick Start: Prerequisites and Initial Setup

Prerequisites
Obtain an API key for one or more providers, or choose a fully local option.
NVIDIA NIM: obtain an API key from the NVIDIA developer portal.
OpenRouter: obtain an API key from OpenRouter.
DeepSeek: obtain a DeepSeek API key.
LM Studio: local usage with no API key required.
llama.cpp or Ollama: local runtimes with no API key required.
Install Claude Code as described in the repository.
Install uv (the Python tool runner)
On macOS/Linux (recommended):
- curl -LsSf https://astral.sh/uv/install.sh | sh
- uv self update (keep it current)
- uv python install 3.14
On Windows PowerShell:
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- uv self update
- uv python install 3.14
Note: pip install uv can fail on certain Python environments; prefer the official installer.
Clone and configure
git clone https://github.com/Alishahryar1/free-claude-code.git
cd free-claude-code
cp .env.example .env
Edit .env to configure a provider. Examples include:
- NVIDIA NIM: NVIDIANIMAPIKEY, MODELOPUS, MODELSONNET, MODELHAIKU, MODEL plus thinking toggles.
- OpenRouter: OPENROUTERAPIKEY, MODELOPUS, MODELSONNET, MODEL_HAIKU, MODEL.
- DeepSeek: DEEPSEEKAPIKEY, MODELOPUS, MODELSONNET, MODEL_HAIKU, MODEL.
- LM Studio: MODELOPUS, MODELSONNET, MODEL_HAIKU, MODEL with local paths.
- llama.cpp: LLAMACPPBASEURL, MODELOPUS, MODELSONNET, MODEL_HAIKU, MODEL.
- Ollama: OLLAMABASEURL, MODELOPUS, MODELSONNET, MODEL_HAIKU, MODEL.
You can mix providers by setting different MODEL_* variables; MODEL serves as the fallback.
Optional critical security feature
You can enable optional authentication by setting ANTHROPICAUTHTOKEN in .env. Clients must supply the token via the Anthropic-Auth header, providing an extra layer of control on public deployments.
Run it
Start the proxy server:
- uv run uvicorn server:app --host 0.0.0.0 --port 8082
Run Claude Code by pointing ANTHROPICBASEURL at the proxy root URL (not /v1):
- Example for shells:
- bash: ANTHROPICAUTHTOKEN="freecc"; ANTHROPICBASEURL="http://localhost:8082"; claude
- powershell: ANTHROPICAUTHTOKEN="freecc"; ANTHROPICBASEURL="http://localhost:8082"; claude
The setup enables Claude Code to use your chosen provider transparently.
Editor integrations
VSCode Extension: configure environment variables in settings.json under claudeCode.environmentVariables with ANTHROPICBASEURL and ANTHROPICAUTHTOKEN, then reload extensions.
IntelliJ Extension: adjust acp.json to inject ENV variables for ANTHROPICAUTHTOKEN and ANTHROPICBASEURL, then start the proxy and restart the IDE.
Quick-mode and model picker
A built-in model picker (claude-pick) allows selecting models from active providers each time you launch Claude, without editing .env MODEL values.
Optional: install fzf and set up an alias to use claude-pick for quick model selection.
Package-based installation (no clone)
uv tool install git+https://github.com/Alishahryar1/free-claude-code.git fcc-init
This creates ~/.config/free-claude-code/.env from the built-in template; edit that file and run free-claude-code to start the server.
Migration note
The release removed NIMENABLETHINKING and ENABLETHINKING in favor of ENABLEMODEL_THINKING with optional per-model overrides.

How It Works: Architecture and Data Flow

Visual flow
Claude Code (CLI or VSCode) communicates with the local proxy at a standard Anthropic-compatible endpoint.
The proxy translates requests into the appropriate provider-specific format and routes them to the configured backends, returning results back to Claude Code in the expected Anthropic-like format.
The translation layer supports native Anthropic Messages endpoints for OpenRouter, LM Studio, llama.cpp, and Ollama, while NVIDIA NIM and DeepSeek use their own compatible pathways.
Per-model routing and flexible backends
Opus, Sonnet, Haiku can be mapped to: NVIDIA NIM, OpenRouter, DeepSeek, LM Studio, llama.cpp, or Ollama, depending on configuration.
The MODEL value acts as a general fallback and is validated based on whether it conforms to a provider prefix format.
Request optimization and thinking
Five categories of trivial API calls can be intercepted locally to reduce quota usage. These are specifically designed to avoid unnecessary consumption of provider quotas and to speed up routine interactions.
Thinking blocks and thinking tokens in the model’s output are converted to Claude-native thinking blocks when the per-model thinking switches are enabled.
Probing and testing endpoints
The proxy exposes Claude-compatible probe routes, including:
- GET /v1/models
- POST /v1/messages
- POST /v1/messages/count_tokens
- HEAD/OPTIONS support for common probe endpoints.
Security and access control
If ANTHROPICAUTHTOKEN is configured, clients must present the same token via the Anthropic header; otherwise, no authentication is required (backward compatible).
“How it works” summary
Transparent proxy: Claude Code requests are preserved in their original form and diverted to the chosen provider.
Per-model routing: Opus, Sonnet, Haiku can be resolved through provider-specific mappings with a global fallback.
Local optimizations: Local interception saves quota usage and lowers latency for common request patterns.
Flexible deployment: Providers can be mixed across a single configuration, enabling tailored performance and cost management.

Providers and Model Prefixes: What You Can Route

NVIDIA NIM
Prefix: nvidia_nim/…
Key API: NVIDIANIMAPI_KEY
Base URL and routing: integrate.api.nvidia.com/v1
Popular models (examples): minimaxai/minimax-m2.5, qwen3.5-397b-a17b, glm5, kimi-k2.5, stepfun-ai/step-3.5-flash
OpenRouter
Prefix: open_router/…
Key API: OPENROUTERAPIKEY
Base URL: openrouter.ai/api/v1
Free model set includes trinity, deepseek variants, OpenAI models in free tiers
DeepSeek
Prefix: deepseek/…
Key API: DEEPSEEKAPIKEY
Base URL: https://api.deepseek.com/anthropic
Models: deepseek-v4-pro, deepseek-chat, deepseek-reasoner
LM Studio
Prefix: lmstudio/…
Local-only operation (no API key)
Base URL: localhost:1234/v1
Examples: unsloth-MiniMax-M2.5-GGUF, unsloth/Qwen3.5-35B-A3B-GGUF, unsloth/GLM-4.7-Flash-GGUF
llama.cpp
Prefix: llamacpp/…
Local-only operation
Base URL: http://localhost:8080/v1
Ollama
Prefix: ollama/…
Local-only operation
Base URL: http://localhost:11434
Note on model prefix usage
An invalid prefix yields an error. The MODEL variable serves as a robust fallback when a prefix does not apply.

Discord and Telegram Bot Control: Remote Orchestration

Discord bot
Remote task execution with tree-based threading, session persistence, and live progress streams.
Commands and interactions: start/stop tasks, monitor progress, and manage multiple sessions concurrently.
Voice notes support: voice messages can be transcribed and processed as prompts.
Setup steps:
- Create a Discord Bot in the Developer Portal and obtain the token.
- Enable Message Content Intent.
- Configure .env with MESSAGINGPLATFORM=discord, DISCORDBOTTOKEN, and ALLOWEDDISCORD_CHANNELS.
- Start the proxy and invite the bot via an OAuth2 URL with appropriate permissions.
Telegram bot
Telegram setup mirrors Discord with MESSAGING_PLATFORM=telegram, a Telegram bot token, and an allowed user ID.
Voice notes can also be supported through transcription.
Security and deployment
The bot layer extends Claude Code’s reach to chat platforms while keeping the underlying provider logic intact.
Channel and user access are controlled via the ALLOWED_* parameters, ensuring that only approved endpoints can interact with the proxy.

Backend Interfaces and Transport Layers

Local Whisper and NVIDIA voice options
Voice input can be processed via Local Whisper (Hugging Face Whisper) or NVIDIA NIM-based Whisper via gRPC.
Transport and compatibility
OpenRouter and other providers use native Anthropic-like interfaces or OpenAI-style endpoints as appropriate.
The system translates and adapts endpoints to ensure Claude-compatible interactions with all supported backends.
Voice extras and device configuration
Hardware and model selection for Whisper (WHISPERDEVICE: cpu, cuda, or nvidianim) and WHISPER_MODEL (base, small, medium, large variants, or turbo options for larger models) are configurable to balance speed and accuracy.

Configuration: Core, Rate Limits, Messaging, and Advanced Flags

Core settings
MODEL: Fallback model tag in provider_prefix/name format; validation ensures correct routing.
MODELOPUS, MODELSONNET, MODEL_HAIKU: Model-specific routes; if empty, inherit MODEL.
NIM, OPENROUTER, DEEPSEEK, LM_STUDIO, LLAMACPP, OLLAMA keys and base URLs per provider.
ENABLEMODELTHINKING: Global thinking switch; optional per-model overrides exist.
Rate limiting and timeouts
PROVIDERRATELIMIT: Default 40 requests per window
PROVIDERRATEWINDOW: 60 seconds
PROVIDERMAXCONCURRENCY: Default 5 parallel streams
HTTP timeouts: READ (120s), WRITE (10s), CONNECT (10s)
Messaging and voice
MESSAGING_PLATFORM: discord or telegram
DISCORDBOTTOKEN, ALLOWEDDISCORDCHANNELS
TELEGRAMBOTTOKEN, ALLOWEDTELEGRAMUSER_ID
CLAUDEWORKSPACE and ALLOWEDDIR: directories Claude may operate in
MESSAGINGRATELIMIT and MESSAGINGRATEWINDOW: per-session messaging control
VOICENOTEENABLED: enable voice note handling
WHISPERDEVICE and WHISPERMODEL: voice configuration for local or NIM-based Whisper
HF_TOKEN: optional Hugging Face token for faster downloads (local Whisper)
Advanced request optimization
FASTPREFIXDETECTION: enable fast prompt prefix detection
ENABLENETWORKPROBE_MOCK: mock network probe requests
ENABLETITLEGENERATION_SKIP: skip title generation requests
ENABLESUGGESTIONMODE_SKIP: skip suggestion mode requests
ENABLEFILEPATHEXTRACTION_MOCK: mock filepath extraction
Environment and authentication
ANTHROPICAUTHTOKEN: optional token for access control
ANTHROPICBASEURL: base URL for the Anthropic-compatible endpoint

Development and Extensibility

Project structure (high level)
server.py: entry point
api/: API routes and service layer
core/: shared protocol helpers, token counting, and conversion
providers/: provider registry and transports
messaging/: Discord/Telegram bots and session management
config/: settings and model mappings
cli/: command-line utilities
tests/: Pytest test suite
Quick commands for developers
uv run ruff format
uv run ruff check
uv run ty check
uv run pytest
Extending with new providers
Add a new OpenAI-compatible provider by extending an OpenAI transport and registering a descriptor.
Add a native Anthropic provider by extending the AnthropicMessages transport and registering the descriptor.
Add a fully custom provider by extending BaseProvider and implementing a streaming interface.
Extending messaging platforms
Extend the MessagingPlatform base class to support a new platform (e.g., Slack) with start, stop, sendmessage, editmessage, and on_message handlers.

Contributing

Report bugs or request features via Issues on the repository.
Propose new providers (Groq, Together AI, etc.) and additional messaging platforms (Slack, etc.).
Improve test coverage to ensure stability across providers and configurations.
Note: Docker integration PRs are not currently accepted.
Development workflow example
Create a feature branch, format and lint the code, run type checks, and execute the Pytest suite.
Open a pull request to merge feature branches back into main.

License and Credits

The project is released under the MIT License. See the LICENSE file for details.
Built with FastAPI, the OpenAI Python SDK, discord.py, and python-telegram-bot.
Acknowledges integration with NVIDIA NIM, OpenRouter, DeepSeek, LM Studio, llama.cpp, and Ollama as providers.

Quick Reference: Snippets and Commands (Summary)

Starting the proxy server
uv run uvicorn server:app --host 0.0.0.0 --port 8082
Running Claude Code against the proxy
ANTHROPICAUTHTOKEN="freecc"; ANTHROPICBASEURL="http://localhost:8082"; claude
Discord bot setup (summary)
Create bot in Discord, obtain token, configure environment variables, start server, invite bot with appropriate permissions.
Multi-provider model mapping (concept)
Use MODELOPUS, MODELSONNET, MODEL_HAIKU to map Opus/Sonnet/Haiku prompts to the desired provider, with MODEL serving as a fallback.

Visualizing the Experience

The project emphasizes a seamless user experience: Claude Code users can operate as usual, while the underlying proxy handles provider routing, local optimizations, and optional authentication.
The included action image—Free Claude Code in action—illustrates the practical deployment with NIM-backed processing and a fully functional local proxy, embodying the “free and flexible” promise of the system.

Final Notes

Free Claude Code is designed to empower developers and researchers to experiment with multiple backends without changing their workflows.
By combining a transparent proxy, per-model routing, local optimizations, and convenient bot integrations, it offers a practical path to cost control, privacy, and flexibility in modern AI-assisted development.

Free Claude Code

Enjoying this project?

GitHub - Alishahryar1/free-claude-code: Free Claude Code

Stay Updated

Product

Learn

Company

Legal

Stay Updated

Browse by Category