ML Intern

smolagents logo

ML Intern: An Autonomous Research, Writing, and Deployment Agent for the Hugging Face Ecosystem

Introduction In an era where AI agents increasingly shoulder the heavy lifting of research, experimentation, and deployment, ML Intern stands out as a practical embodiment of autonomy in the Hugging Face ecosystem. This is not a mere script or a one-off tool; it is a thoughtfully designed framework that lets an agent autonomously explore documentation, read papers and datasets, and ship high-quality ML-related code. It does so by leveraging a rich map of tools, a modular architecture, and a robust flow that balances exploration with safety and human oversight when needed. The goal is simple to state, yet ambitious in scope: to enable an ML intern that can research, write, and ship ML projects with deep access to docs, datasets, and cloud compute—all while staying anchored to the familiar and trusted Hugging Face ecosystem.

What You’ll Find Inside ML Intern

A turnkey quick-start path that makes installation and first runs almost effortless.
An interactive, iterative agent loop that can run in both interactive and headless modes.
Rich support for both cloud-hosted models and local, OpenAI-compatible endpoints.
A comprehensive tracing and sharing system that uploads sessions to private Hugging Face datasets for visibility and auditing.
A documented architecture that clarifies responsibilities across components, from user input to tool execution and context management.
Extensible tooling and MCP (Master Control Program) support to customize the agent for diverse workflows.
Clear guidance for integrating alerting via Slack and other gateways to keep humans in the loop when needed.
Developer-oriented notes about pre-commit checks, built-in tools, and server configurations that make contributing straightforward.

Quick Start: Installing and Launching ML Intern Getting started is surprisingly smooth. The core goal of the quick-start section is to get you from zero to a working ML Intern in the shortest possible path, so you can begin experimenting with real workflows and tool integrations.

Installation
First, clone the repository, navigate into it, and install the package in editable mode: git clone git@github.com:huggingface/ml-intern.git cd ml-intern uv sync uv tool install -e .
This setup makes ml-intern available from any directory, so you can launch it with a simple command:
```
ml-intern
```
Environment Preparation
Create a .env file at the project root (or export the following in your shell). These tokens enable access to model providers, local endpoints, and GitHub actions: ANTHROPIC_API_KEY= OPENAI_API_KEY= LOCAL_LLM_BASE_URL=http://localhost:8000 LOCAL_LLM_API_KEY= HF_TOKEN= GITHUB_TOKEN=
If HF_TOKEN is not provided, the CLI will prompt you on first launch unless you’re starting with a local model. For a GitHub token, you’ll typically follow a fine-grained PAT creation flow.
Quick Usage Modes
Interactive mode: start a chat session ml-intern
Headless mode: single prompt with auto-approval ml-intern "fine-tune llama on my dataset"
Model selection: you can pass different models to the agent ml-intern --model anthropic/claude-opus-4-7 "your prompt" ml-intern --model openai/gpt-5.5 "your prompt" ml-intern --model ollama/llama3.1:8b "your prompt" ml-intern --model vllm/meta-llama/Llama-3.1-8B-Instruct "your prompt" ml-intern --max-iterations 100 "your prompt" ml-intern --no-stream "your prompt"
After starting, you can discover the full repertoire of model IDs with the /model command inside the session for Claude, GPT, HF-router variants, and local model prefixes.
Local Models and Shared Local Endpoints
Local models are supported via OpenAI-compatible HTTP endpoints through LiteLLM. Rather than loading weights directly, the agent connects to an inference server and selects the provider via a model prefix, e.g.: ml-intern --model ollama/llama3.1:8b "your prompt" ml-intern --model vllm/meta-llama/Llama-3.1-8B-Instruct "your prompt"
Inside interactive mode, you can switch models with the /model command: /model ollama/llama3.1:8b /model lm_studio/google/gemma-3-4b /model llamacpp/llama-3.1-8b-instruct
Supported local prefixes include ollama/, vllm/, lmstudio/, and llamacpp/. You can set a shared LOCALLLMBASEURL and LOCALLLMAPIKEY, or override a specific provider with its own BASEURL and APIKEY (e.g., OLLAMABASEURL, VLLMAPI_KEY). Provider-specific overrides take precedence over the shared settings.
Sharing Traces: Privacy, Repos, and Opt-Out
Every session is auto-uploaded to your private Hugging Face dataset, stored in a Claude Code JSONL format for trace viewing. The Hugging Face Agent Trace Viewer lets you browse turns, tool calls, and model responses on the Hub.
By default, the dataset is named {your-hf-username}/ml-intern-sessions and is private. You can flip to public from the CLI using: /share-traces /share-traces public /share-traces private
Visibility can also be toggled from the dataset’s page on HuggingFace. If you want to opt out entirely, configure your CLI: { "share_traces": false }
To change the destination repository for traces, set: { "personal_trace_repo_template": "{hf_user}/my-custom-traces" }
Note that the shared smolagents/ml-intern-sessions dataset is unrelated; it only receives anonymized telemetry rows used by the backend KPI scheduler.
Gateways: Notifications to External Services
ML Intern can push one-way notifications to gateways such as Slack. These are status updates that help humans stay informed about approvals, errors, or completion.
Slack setup involves creating a Slack app with a bot token and inviting it to the destination channel. Then you configure: SLACK_BOT_TOKEN=xoxb-... SLACK_CHANNEL_ID=C...
The CLI can auto-create a slack.default destination when both values are present. For persistent, user-level configuration, you can add overrides such as:
- MLINTERNSLACK_NOTIFICATIONS=false
- MLINTERNSLACK_DESTINATION=slack.ops
- MLINTERNSLACKAUTOEVENTS=approvalrequired,error,turncomplete
- MLINTERNSLACKALLOWAGENT_TOOL=true
- MLINTERNSLACKALLOWAUTO_EVENTS=true
A sample persistent configuration might look like: { "messaging": { "enabled": true, "auto_event_types": ["approval_required", "error", "turn_complete"], "destinations": { "slack.ops": { "provider": "slack", "token": "${SLACK_BOT_TOKEN}", "channel": "${SLACK_CHANNEL_ID}", "allow_agent_tool": true, "allow_auto_events": true } } } }
This gateway approach helps keep collaborators in the loop without requiring them to monitor the console, enabling a smoother human-in-the-loop workflow.

Architecture: A Clear View of How ML Intern Is Built The backbone of ML Intern is a well-structured architecture designed to separate concerns, manage complexity, and support experimentation. The architecture can be summarized as a layered, modular graph of components that communicate through clearly defined queues and interfaces.

Component Overview
User/CLI: The human operator or automated script that starts the session and issues prompts.
Operations: The primary event source that feeds user input, tool results, and system events into the processing pipeline.
event_queue: Collects events such as user input, tool calls, and status updates for downstream processing.
submissionloop (agentloop.py): The core loop that orchestrates turns, tool calls, and model interactions.
ContextManager: Maintains the conversation history, tool results, and auto-compaction. It stores the state of the session (e.g., a historical list of messages) and handles the pruning of very long histories (170k tokens cap is a practical example of what happens under the hood).
ToolRouter: Routes requests to a suite of tools, including documentation access, research repositories, code search, sandbox environments, and MCP server utilities. The tool router is the gateway that ties external capabilities into the agent’s decision-making loop.
HF docs & Research: The hub for accessing Hugging Face documentation, research papers, datasets, and associated resources.
HF Repos, Datasets, Jobs, Papers: The browsing and retrieval layer for code, datasets, models, and publications.
GitHub Code Search: A tool to search code repositories in support of writing ML code and understanding project structure.
Sandbox & Local Tools: Local, safe environments for testing and prototyping changes without affecting production workflows.
Planning: Systems that help plan tasks, organize steps, and manage expectations about what the agent should attempt to do next.
MCP Server Tools: Utilities to interact with MCP servers, enabling advanced orchestration of model control processes.
Doom Loop Detector: A safeguard mechanism that detects repeated patterns or circular tool usage and injects corrective prompts to keep the interaction productive.
Agentic Loop: A capped iteration loop (e.g., max 300 iterations) that orchestrates the dialogue between the agent and the world, including tool calls, results, and memory updates.
The Agentic Loop Flow
The agent begins with a user message and adds it to the ContextManager.
It enters a structured Iteration Loop (up to 300 iterations) where it:
- Retrieves messages and tool specifications
- Invokes the LLM (litellm.acompletion)
- Checks for tool_calls in the model’s response
- If tool_calls exist, it enters a decision flow: does it require user approval? If yes, it pauses and waits for user confirmation. If no, it executes the calls via ToolRouter and captures the results
- Results feed back into the ContextManager
- The loop repeats until no tool_calls remain or a pre-defined threshold is reached
The Doom Loop Detector monitors for repetitive tool usage and autogenerates prompts to realign the agent’s strategy, ensuring continued progress without degenerating into endless cycles.
Events: What You See Happen Under the Hood
The agent emits a well-defined set of events via the event_queue, including:
- processing: when the agent starts processing user input
- ready: the agent is ready for input
- assistant_chunk: streaming token chunk
- assistant_message: the complete LLM response
- tool_call: the tool being invoked
- tool_output: the tool result
- tool_log: tool-related logs
- toolstatechange: status transitions of tool execution
- approval_required: a request for user approval for sensitive actions
- turn_complete: the agent finishes a turn
- error: processing error
- interrupted: the agent was interrupted
- compacted: the context was compacted
- undo_complete: an undo operation completed
- shutdown: the agent is shutting down
These events are essential for building observability into the agent’s behavior and for integrating with external dashboards or alerting systems.

Shared Data, Privacy, and Telemetry A distinctive feature of ML Intern is its emphasis on traceability and collaboration. Each session’s dialogue, tool calls, and model outputs are uploaded to a private HF dataset for auditing and research. This enables a high level of transparency and reusability: researchers and engineers can replay sessions, inspect decisions, and improve the agent’s behavior over time.

Opt-Out and Customization
If privacy or policy considerations require, you can opt out of trace uploads by adjusting your CLI config. The system respects the user’s preference and provides a straightforward path to disable sharing entirely.
You can customize where traces go by setting a personal trace repository template, letting you tailor the destination to organizational schemes or private workflows.

Gateways and Notifications: Keeping Humans in the Loop ML Intern supports the concept of outbound status updates via simple, dependable gateways. Slack is a core example, but the architecture is designed to accommodate additional gateways with minimal friction.

Slack Integration
To enable Slack notifications, you supply a bot token and target channel ID. The agent uses Slack’s Web API to push messages when approvals are required, errors occur, or a turn completes.
The default destination is slack.default. You can override this and configure more destinations, including enabling or disabling automatic event types as needed.

Development and Extensibility: How to Grow ML Intern ML Intern is designed with a developer-friendly mindset. The repository includes guidance on keeping quality high, extending the toolset, and configuring servers for MCP-based operations.

Pre-Commit Checks
Before committing changes, run Ruff checks to enforce code quality and formatting: uv run ruff check . uv run ruff format --check .
If formatting fails, run: uv run ruff format .
It’s recommended to re-run the checks until they pass to maintain a clean and consistent codebase.
Adding Built-in Tools
Developers can extend the toolkit by editing agent/core/tools.py to introduce new built-in tools. A minimal example shows how to define a tool spec and a corresponding handler: def create_builtin_tools() -> list[ToolSpec]: return [ ToolSpec( name="your_tool", description="What your tool does", parameters={ "type": "object", "properties": { "param": {"type": "string", "description": "Parameter description"} }, "required": ["param"] }, handler=your_async_handler, ), # ... existing tools ]
This pattern makes it straightforward to expand the agent’s capabilities with domain-specific tooling.
MCP Server and CLI Configurations
You can customize the MCP servers that the agent can reach by editing:
- configs/cliagentconfig.json for CLI defaults
- configs/frontendagentconfig.json for web-session defaults
Example MCP configuration: { "model_name": "anthropic/claude-sonnet-4-5-20250929", "mcpServers": { "your-server-name": { "transport": "http", "url": "https://example.com/mcp", "headers": { "Authorization": "Bearer ${YOUR_TOKEN}" } } } }
Environment variable substitution is supported through your .env file, ensuring tokens and secrets can be threaded through secure channels.

Use Cases: Real-World Scenarios Where ML Intern Shines

Rapid Prototyping of ML Pipelines
The agent can search the Hugging Face ecosystem for state-of-the-art models, compare datasets, and prototype a pipeline that fine-tunes a model on your data. The ToolRouter can fetch papers, locate experimental results, and pull in code snippets from relevant repositories.
Research-Oriented Tasks
The agent can read papers, extract experimental setups, and generate initial code scaffolds that implement the reported methods. It can also collect and organize related datasets or metrics, enabling a quick path from literature to a runnable experiment.
Documentation and Reproducibility
By archiving sessions and tool calls, teams gain a reproducible trail of the decisions and actions taken by the agent. This makes audits, reviews, and collaboration more straightforward.
Local and Cloud Hybrid Workflows
ML Intern supports hybrid workflows that combine cloud-based models with local inference servers. This design gives teams flexibility in how they deploy and test models, without being locked into a single environment.

Best Practices for Working with ML Intern

Start with a Well-Defined Prompt
Because the agent’s performance heavily depends on the prompts and tool configurations, begin with a clear objective, a defined success criteria, and a small, testable scope for your first session.
Leverage Local Models for Rapid Iteration
Use local models during prototyping to keep iteration cycles fast and costs manageable. Switch to cloud providers when you need scale or specific capabilities.
Use Traces for Continuous Improvement
Review uploaded traces to understand how the agent makes decisions. Use these insights to refine tools, adjust prompts, and improve the planning logic.
Align Gateways with Team Workflows
Slack and other gateways should reflect your team’s workflows. Configure destinations and events so that humans receive timely, actionable information without being overwhelmed by noise.
Safeguards and Safety
Be mindful of the Doom Loop Detector and approval workflows. Ensure that sensitive or destructive operations require explicit approval to protect resources and data.

Conclusion: A Practical Path to Autonomous ML Workflows ML Intern represents a thoughtful attempt to give an ML practitioner a capable, autonomous assistant built on top of the Hugging Face ecosystem. It blends interactive exploration, robust tool integration, and a clear, auditable trace mechanism to create a workflow that’s both powerful and transparent. It invites researchers and engineers to shift from manual glue code assembly to higher-level, goal-oriented automation where the agent can discover the best routes to a solution, write the necessary code, and ship results—while staying visible to collaborators and governed by practical safety gates.

If you’re exploring ML at scale, ML Intern offers a concrete blueprint for how to structure an AI agent that can do more than just answer questions. It can discover relevant papers, locate datasets, interact with code repositories, run experiments, and push results to a shared space where teammates can review and extend. With its modular architecture, local and cloud model support, trace sharing, gateway notifications, and thoughtful development practices, it provides a compelling path toward practical, productive AI-assisted ML development.

In short, ML Intern is not just a tool—it’s a blueprint for autonomous ML workflows that respects the realities of collaboration, reproducibility, and safety in modern AI research and development. As you experiment with its capabilities, you’ll likely discover new workflows, tooling enhancements, and integration points that can further accelerate your ML projects while preserving the human-in-the-loop oversight that ensures quality and accountability. This is the kind of ecosystem-aware automation that helps teams move from ad hoc experimentation to reliable, repeatable, and scalable ML practice.

ML Intern

Enjoying this project?

GitHub - huggingface/ml-intern: ML Intern

Stay Updated

Product

Learn

Company

Legal

Stay Updated

Browse by Category