GitHub Repo
MIT
April 28, 2026 at 01:12 PM0 views
Free Claude Code
@Alishahryar1Project Author
- Overview
- This document provides a detailed description of Free Claude Code, a lightweight proxy that routes Claude Code’s Anthropic API calls to multiple backends. It enables free or low-cost access to a range of large language model providers via a single local proxy and familiar Claude Code interfaces (CLI and VSCode extension).
- Visual reference: the project includes a representative image showing Claude Code in action, running via NVIDIA NIM with a free setup. Image:

- The core idea is to preserve Claude Code’s request format while transparently delegating execution to one of several providers, enabling hybrid or local deployments without changing user workflows.
- Core Capabilities and Design Philosophy
- Zero cost operation with broad access
- The system offers a free tier on NVIDIA NIM (40 requests per minute) and access to free models on OpenRouter. Other paths include fully local options through LM Studio, Ollama, or llama.cpp.
- Drop-in replacement for Claude Code
- By configuring two environment variables, you can switch providers without modifying the Claude Code CLI or the VSCode extension, ensuring a smooth transition and minimal integration effort.
- Multi-provider support (six built-in options)
- NVIDIA NIM
- OpenRouter
- DeepSeek
- LM Studio (local)
- llama.cpp (local via a server)
- Ollama (local)
- Per-model routing and flexible mixing
- The system supports mapping specific Claude models (Opus, Sonnet, Haiku) to different providers. You can mix providers per model or use a common fallback model, enabling tailored configurations for different tasks.
- Thinking tokens and native Claude thinking blocks
- The tool parses special thinking tokens (e.g.,
tagsand reasoning_content) and converts them into native Claude thinking blocks when thinking is enabled for the chosen model. - Tool-use parsing and automation
- A heuristic parser detects tool-calling patterns embedded in model output and converts them into structured tool use, enabling smoother automated workflows.
- Local request optimization to save quota
- Five categories of trivial API calls—such as quota probes and title generation—can be intercepted locally to reduce remote usage and latency.
- Smart rate limiting and reliability
- Proactive throttling with a rolling window, reactive 429 backoff, and an optional concurrency cap help stabilize performance under load and across providers.
- Remote, bot-enabled workflows
- A Discord or Telegram bot provides remote autonomous coding with tree-based threading, session persistence, and live progress tracking, broadening access beyond a local terminal.
- Subagent control and safety
- The system interposes task tools to prevent runaway subagents by configuring subagent behavior, including forcing certain operations to run in the foreground or background as appropriate.
- Extensible architecture
- Clean abstractions (BaseProvider and MessagingPlatform ABCs) enable the addition of new providers or platforms with minimal effort.
- Rich Quick Start and configuration flow
- The project includes straightforward prerequisites, a guided installation path for uv, steps to clone and configure, and explicit environment variable examples for each provider.
- Quick Start: Prerequisites and Initial Setup
- Prerequisites
- Obtain an API key for one or more providers, or choose a fully local option.
- NVIDIA NIM: obtain an API key from the NVIDIA developer portal.
- OpenRouter: obtain an API key from OpenRouter.
- DeepSeek: obtain a DeepSeek API key.
- LM Studio: local usage with no API key required.
- llama.cpp or Ollama: local runtimes with no API key required.
- Install Claude Code as described in the repository.
- Install uv (the Python tool runner)
- On macOS/Linux (recommended):
- curl -LsSf https://astral.sh/uv/install.sh | sh
- uv self update (keep it current)
- uv python install 3.14
- On Windows PowerShell:
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- uv self update
- uv python install 3.14
- Note: pip install uv can fail on certain Python environments; prefer the official installer.
- Clone and configure
- git clone https://github.com/Alishahryar1/free-claude-code.git
- cd free-claude-code
- cp .env.example .env
- Edit .env to configure a provider. Examples include:
- NVIDIA NIM: NVIDIANIMAPIKEY, MODELOPUS, MODELSONNET, MODELHAIKU, MODEL plus thinking toggles.
- OpenRouter: OPENROUTERAPIKEY, MODELOPUS, MODELSONNET, MODEL_HAIKU, MODEL.
- DeepSeek: DEEPSEEKAPIKEY, MODELOPUS, MODELSONNET, MODEL_HAIKU, MODEL.
- LM Studio: MODELOPUS, MODELSONNET, MODEL_HAIKU, MODEL with local paths.
- llama.cpp: LLAMACPPBASEURL, MODELOPUS, MODELSONNET, MODEL_HAIKU, MODEL.
- Ollama: OLLAMABASEURL, MODELOPUS, MODELSONNET, MODEL_HAIKU, MODEL.
- You can mix providers by setting different MODEL_* variables; MODEL serves as the fallback.
- Optional critical security feature
- You can enable optional authentication by setting ANTHROPICAUTHTOKEN in .env. Clients must supply the token via the Anthropic-Auth header, providing an extra layer of control on public deployments.
- Run it
- Start the proxy server:
- uv run uvicorn server:app --host 0.0.0.0 --port 8082
- Run Claude Code by pointing ANTHROPICBASEURL at the proxy root URL (not /v1):
- Example for shells:
- bash: ANTHROPICAUTHTOKEN="freecc"; ANTHROPICBASEURL="http://localhost:8082"; claude
- powershell: ANTHROPICAUTHTOKEN="freecc"; ANTHROPICBASEURL="http://localhost:8082"; claude
- The setup enables Claude Code to use your chosen provider transparently.
- Editor integrations
- VSCode Extension: configure environment variables in settings.json under claudeCode.environmentVariables with ANTHROPICBASEURL and ANTHROPICAUTHTOKEN, then reload extensions.
- IntelliJ Extension: adjust acp.json to inject ENV variables for ANTHROPICAUTHTOKEN and ANTHROPICBASEURL, then start the proxy and restart the IDE.
- Quick-mode and model picker
- A built-in model picker (claude-pick) allows selecting models from active providers each time you launch Claude, without editing .env MODEL values.
- Optional: install fzf and set up an alias to use claude-pick for quick model selection.
- Package-based installation (no clone)
- uv tool install git+https://github.com/Alishahryar1/free-claude-code.git fcc-init
- This creates ~/.config/free-claude-code/.env from the built-in template; edit that file and run free-claude-code to start the server.
- Migration note
- The release removed NIMENABLETHINKING and ENABLETHINKING in favor of ENABLEMODEL_THINKING with optional per-model overrides.
- How It Works: Architecture and Data Flow
- Visual flow
- Claude Code (CLI or VSCode) communicates with the local proxy at a standard Anthropic-compatible endpoint.
- The proxy translates requests into the appropriate provider-specific format and routes them to the configured backends, returning results back to Claude Code in the expected Anthropic-like format.
- The translation layer supports native Anthropic Messages endpoints for OpenRouter, LM Studio, llama.cpp, and Ollama, while NVIDIA NIM and DeepSeek use their own compatible pathways.
- Per-model routing and flexible backends
- Opus, Sonnet, Haiku can be mapped to: NVIDIA NIM, OpenRouter, DeepSeek, LM Studio, llama.cpp, or Ollama, depending on configuration.
- The MODEL value acts as a general fallback and is validated based on whether it conforms to a provider prefix format.
- Request optimization and thinking
- Five categories of trivial API calls can be intercepted locally to reduce quota usage. These are specifically designed to avoid unnecessary consumption of provider quotas and to speed up routine interactions.
- Thinking blocks and thinking tokens in the model’s output are converted to Claude-native thinking blocks when the per-model thinking switches are enabled.
- Probing and testing endpoints
- The proxy exposes Claude-compatible probe routes, including:
- GET /v1/models
- POST /v1/messages
- POST /v1/messages/count_tokens
- HEAD/OPTIONS support for common probe endpoints.
- Security and access control
- If ANTHROPICAUTHTOKEN is configured, clients must present the same token via the Anthropic header; otherwise, no authentication is required (backward compatible).
- “How it works” summary
- Transparent proxy: Claude Code requests are preserved in their original form and diverted to the chosen provider.
- Per-model routing: Opus, Sonnet, Haiku can be resolved through provider-specific mappings with a global fallback.
- Local optimizations: Local interception saves quota usage and lowers latency for common request patterns.
- Flexible deployment: Providers can be mixed across a single configuration, enabling tailored performance and cost management.
- Providers and Model Prefixes: What You Can Route
- NVIDIA NIM
- Prefix: nvidia_nim/…
- Key API: NVIDIANIMAPI_KEY
- Base URL and routing: integrate.api.nvidia.com/v1
- Popular models (examples): minimaxai/minimax-m2.5, qwen3.5-397b-a17b, glm5, kimi-k2.5, stepfun-ai/step-3.5-flash
- OpenRouter
- Prefix: open_router/…
- Key API: OPENROUTERAPIKEY
- Base URL: openrouter.ai/api/v1
- Free model set includes trinity, deepseek variants, OpenAI models in free tiers
- DeepSeek
- Prefix: deepseek/…
- Key API: DEEPSEEKAPIKEY
- Base URL: https://api.deepseek.com/anthropic
- Models: deepseek-v4-pro, deepseek-chat, deepseek-reasoner
- LM Studio
- Prefix: lmstudio/…
- Local-only operation (no API key)
- Base URL: localhost:1234/v1
- Examples: unsloth-MiniMax-M2.5-GGUF, unsloth/Qwen3.5-35B-A3B-GGUF, unsloth/GLM-4.7-Flash-GGUF
- llama.cpp
- Prefix: llamacpp/…
- Local-only operation
- Base URL: http://localhost:8080/v1
- Ollama
- Prefix: ollama/…
- Local-only operation
- Base URL: http://localhost:11434
- Note on model prefix usage
- An invalid prefix yields an error. The MODEL variable serves as a robust fallback when a prefix does not apply.
- Discord and Telegram Bot Control: Remote Orchestration
- Discord bot
- Remote task execution with tree-based threading, session persistence, and live progress streams.
- Commands and interactions: start/stop tasks, monitor progress, and manage multiple sessions concurrently.
- Voice notes support: voice messages can be transcribed and processed as prompts.
- Setup steps:
- Create a Discord Bot in the Developer Portal and obtain the token.
- Enable Message Content Intent.
- Configure .env with MESSAGINGPLATFORM=discord, DISCORDBOTTOKEN, and ALLOWEDDISCORD_CHANNELS.
- Start the proxy and invite the bot via an OAuth2 URL with appropriate permissions.
- Telegram bot
- Telegram setup mirrors Discord with MESSAGING_PLATFORM=telegram, a Telegram bot token, and an allowed user ID.
- Voice notes can also be supported through transcription.
- Security and deployment
- The bot layer extends Claude Code’s reach to chat platforms while keeping the underlying provider logic intact.
- Channel and user access are controlled via the ALLOWED_* parameters, ensuring that only approved endpoints can interact with the proxy.
- Backend Interfaces and Transport Layers
- Local Whisper and NVIDIA voice options
- Voice input can be processed via Local Whisper (Hugging Face Whisper) or NVIDIA NIM-based Whisper via gRPC.
- Transport and compatibility
- OpenRouter and other providers use native Anthropic-like interfaces or OpenAI-style endpoints as appropriate.
- The system translates and adapts endpoints to ensure Claude-compatible interactions with all supported backends.
- Voice extras and device configuration
- Hardware and model selection for Whisper (WHISPERDEVICE: cpu, cuda, or nvidianim) and WHISPER_MODEL (base, small, medium, large variants, or turbo options for larger models) are configurable to balance speed and accuracy.
- Configuration: Core, Rate Limits, Messaging, and Advanced Flags
- Core settings
- MODEL: Fallback model tag in provider_prefix/name format; validation ensures correct routing.
- MODELOPUS, MODELSONNET, MODEL_HAIKU: Model-specific routes; if empty, inherit MODEL.
- NIM, OPENROUTER, DEEPSEEK, LM_STUDIO, LLAMACPP, OLLAMA keys and base URLs per provider.
- ENABLEMODELTHINKING: Global thinking switch; optional per-model overrides exist.
- Rate limiting and timeouts
- PROVIDERRATELIMIT: Default 40 requests per window
- PROVIDERRATEWINDOW: 60 seconds
- PROVIDERMAXCONCURRENCY: Default 5 parallel streams
- HTTP timeouts: READ (120s), WRITE (10s), CONNECT (10s)
- Messaging and voice
- MESSAGING_PLATFORM: discord or telegram
- DISCORDBOTTOKEN, ALLOWEDDISCORDCHANNELS
- TELEGRAMBOTTOKEN, ALLOWEDTELEGRAMUSER_ID
- CLAUDEWORKSPACE and ALLOWEDDIR: directories Claude may operate in
- MESSAGINGRATELIMIT and MESSAGINGRATEWINDOW: per-session messaging control
- VOICENOTEENABLED: enable voice note handling
- WHISPERDEVICE and WHISPERMODEL: voice configuration for local or NIM-based Whisper
- HF_TOKEN: optional Hugging Face token for faster downloads (local Whisper)
- Advanced request optimization
- FASTPREFIXDETECTION: enable fast prompt prefix detection
- ENABLENETWORKPROBE_MOCK: mock network probe requests
- ENABLETITLEGENERATION_SKIP: skip title generation requests
- ENABLESUGGESTIONMODE_SKIP: skip suggestion mode requests
- ENABLEFILEPATHEXTRACTION_MOCK: mock filepath extraction
- Environment and authentication
- ANTHROPICAUTHTOKEN: optional token for access control
- ANTHROPICBASEURL: base URL for the Anthropic-compatible endpoint
- Development and Extensibility
- Project structure (high level)
- server.py: entry point
- api/: API routes and service layer
- core/: shared protocol helpers, token counting, and conversion
- providers/: provider registry and transports
- messaging/: Discord/Telegram bots and session management
- config/: settings and model mappings
- cli/: command-line utilities
- tests/: Pytest test suite
- Quick commands for developers
- uv run ruff format
- uv run ruff check
- uv run ty check
- uv run pytest
- Extending with new providers
- Add a new OpenAI-compatible provider by extending an OpenAI transport and registering a descriptor.
- Add a native Anthropic provider by extending the AnthropicMessages transport and registering the descriptor.
- Add a fully custom provider by extending BaseProvider and implementing a streaming interface.
- Extending messaging platforms
- Extend the MessagingPlatform base class to support a new platform (e.g., Slack) with start, stop, sendmessage, editmessage, and on_message handlers.
- Contributing
- Report bugs or request features via Issues on the repository.
- Propose new providers (Groq, Together AI, etc.) and additional messaging platforms (Slack, etc.).
- Improve test coverage to ensure stability across providers and configurations.
- Note: Docker integration PRs are not currently accepted.
- Development workflow example
- Create a feature branch, format and lint the code, run type checks, and execute the Pytest suite.
- Open a pull request to merge feature branches back into main.
- License and Credits
- The project is released under the MIT License. See the LICENSE file for details.
- Built with FastAPI, the OpenAI Python SDK, discord.py, and python-telegram-bot.
- Acknowledges integration with NVIDIA NIM, OpenRouter, DeepSeek, LM Studio, llama.cpp, and Ollama as providers.
- Quick Reference: Snippets and Commands (Summary)
- Starting the proxy server
- uv run uvicorn server:app --host 0.0.0.0 --port 8082
- Running Claude Code against the proxy
- ANTHROPICAUTHTOKEN="freecc"; ANTHROPICBASEURL="http://localhost:8082"; claude
- Discord bot setup (summary)
- Create bot in Discord, obtain token, configure environment variables, start server, invite bot with appropriate permissions.
- Multi-provider model mapping (concept)
- Use MODELOPUS, MODELSONNET, MODEL_HAIKU to map Opus/Sonnet/Haiku prompts to the desired provider, with MODEL serving as a fallback.
- Visualizing the Experience
- The project emphasizes a seamless user experience: Claude Code users can operate as usual, while the underlying proxy handles provider routing, local optimizations, and optional authentication.
- The included action image—Free Claude Code in action—illustrates the practical deployment with NIM-backed processing and a fully functional local proxy, embodying the “free and flexible” promise of the system.
- Final Notes
- Free Claude Code is designed to empower developers and researchers to experiment with multiple backends without changing their workflows.
- By combining a transparent proxy, per-model routing, local optimizations, and convenient bot integrations, it offers a practical path to cost control, privacy, and flexibility in modern AI-assisted development.
Enjoying this project?
Discover more amazing open-source projects on TechLogHub. We curate the best developer tools and projects.
Repository:https://github.com/Alishahryar1/free-claude-code
GitHub - Alishahryar1/free-claude-code: Free Claude Code
Free Claude Code is a lightweight proxy that routes Claude Code’s Anthropic API calls to multiple backends. It enables free or low-cost access to a range of lar...
github - alishahryar1/free-claude-code
Project
free-claude-code
Created
April 28
Last Updated
April 28, 2026 at 01:12 PM