GBrain: The Production Brain That Makes AI Agents Smarter While They Sleep
GBrain is more than a project name. It is a production-grade brain for AI agents: an engine that powers OpenClaw and Hermes deployments with self-wiring knowledge, structured memory, and an evolving understanding of people, companies, and ideas. Built by Garry Tan and shaped by his work in the YC ecosystem, it ingests meetings, emails, tweets, voice calls, and original ideas while you rest, so the agent starts each day knowing more than it did the day before. It doesn’t just store facts; it enriches, links, and organizes them into a graph that grows with every interaction. The result is an agent that not only answers questions but reasons over an ever-expanding map of relationships, timelines, and curated truth.
What Makes GBrain Special
- A real production brain: 17,888 pages spanning 4,383 people and 723 companies, with 21 autonomous cron jobs running in the background, all built in 12 days. It is designed to ingest, organize, and synthesize knowledge at scale, while maintaining strict control over citations and memory consolidation.
- Self-wiring memory: Every page write updates typed links (for example, attended, works_at, invested_in, founded, advises) without triggering new LLM calls. Over time, the brain gets better at connecting dots, with no re-training or re-labeling on your part.
- Hybrid search and a knowledge graph: The system blends vector search with structured graph traversal, enabling queries that pure vector models or single-search methods miss. Questions like “who works at Acme AI?” or “what did Bob invest in this quarter?” return precise, graph-grounded results.
- A disciplined knowledge model: The brain uses a compiled truth with an append-only timeline. This creates a reliable, evolving narrative for each page, with backlinks and typed relationships that reinforce accuracy and traceability.
- 29 skills, 30 minutes to a working brain: The architecture ships with a complete set of capabilities, organized into skill families that empower the agent to ingest, reason, and act. The system is designed to be installed quickly and to scale through modular skills and a robust governance framework.
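The "compiled truth plus append-only timeline" model in the list above can be pictured as a small data structure. What follows is a minimal sketch, not GBrain's actual schema: the class names, field names, and the sample entry are all hypothetical, chosen only to show the two layers side by side.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class TimelineEntry:
    """One dated piece of evidence; frozen so it cannot be edited afterwards."""
    when: date
    text: str
    source: str


@dataclass
class BrainPage:
    """Hypothetical page shape: a rewritable summary over append-only evidence."""
    slug: str
    compiled_truth: str = ""
    _timeline: list[TimelineEntry] = field(default_factory=list)

    def append_evidence(self, entry: TimelineEntry) -> None:
        # Append-only: there is deliberately no method to edit or delete entries.
        self._timeline.append(entry)

    def recompile(self, summary: str) -> None:
        # The compiled truth is rewritten as evidence accumulates;
        # the timeline underneath it is left untouched.
        self.compiled_truth = summary

    @property
    def timeline(self) -> tuple[TimelineEntry, ...]:
        # Hand out an immutable copy so callers cannot mutate history.
        return tuple(self._timeline)


# Illustrative usage with made-up data.
page = BrainPage(slug="people/bob")
page.append_evidence(
    TimelineEntry(date(2024, 1, 5), "Joined Acme AI as CTO", "meeting-notes")
)
page.recompile("Bob is CTO at Acme AI.")
```

The point of the split is that the summary can change freely while every claim in it stays traceable to a dated, sourced entry that never changes.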
The Core Architecture: How GBrain Thinks
At its heart, GBrain is a trifecta: a codebase (the brain repository), a retrieval layer (the 29 skills and the graph), and an AI agent that reads and writes through both. The brain repo stores the master content: the compiled truth, the timelines, and the knowledge graph definitions. The retrieval layer—the 29 skills and their deterministic logic—provides the guided workflows that turn raw input into structured pages and reliable answers. The agent reads the brain, updates it, and relies on the graph to relate people, companies, and events.
Key components include:
- A Postgres plus pgvector-backed store for fast retrieval and scalable embeddings.
- Typed links that automatically create meaningful relations on every write, such as attended, works_at, invested_in, founded, and advises.
- A hybrid search stack that blends vector similarity, keyword matching, and a ranking fusion mechanism (Reciprocal Rank Fusion) to deliver precise and comprehensive results.
- A four-layer retrieval and ranking approach that ensures exact phrases aren’t missed by pure vector methods while still capturing conceptual connections.
- A “knowledge graph” page model: every person, company, and concept becomes a node with typed relationships, backlinked to improve search relevance and discovery.
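To make the typed-link idea concrete, here is a minimal sketch of deterministic link extraction, with no LLM in the loop. It assumes pages reference each other with a `[[slug]]` wiki-link syntax and that simple phrase patterns are enough to infer the link type; both are assumptions for illustration, and the post does not say how GBrain actually extracts links.

```python
import re

# Hypothetical phrase patterns: each maps surface wording to a link type.
PATTERNS = [
    (re.compile(r"\bworks at \[\[([^\]]+)\]\]", re.I), "works_at"),
    (re.compile(r"\binvested in \[\[([^\]]+)\]\]", re.I), "invested_in"),
    (re.compile(r"\bfounded \[\[([^\]]+)\]\]", re.I), "founded"),
    (re.compile(r"\badvises \[\[([^\]]+)\]\]", re.I), "advises"),
    (re.compile(r"\battended \[\[([^\]]+)\]\]", re.I), "attended"),
]


def extract_links(source_slug, text):
    """Return (source, link_type, target) triples found in the page text."""
    links = []
    for pattern, link_type in PATTERNS:
        for target in pattern.findall(text):
            links.append((source_slug, link_type, target.strip()))
    return links


class Graph:
    """Tiny in-memory graph indexed by (link_type, target)."""

    def __init__(self):
        self.incoming = {}

    def write_page(self, slug, text):
        # Links are re-derived on every write, purely from the text.
        for source, link_type, target in extract_links(slug, text):
            self.incoming.setdefault((link_type, target), set()).add(source)

    def who(self, link_type, target):
        return sorted(self.incoming.get((link_type, target), set()))


graph = Graph()
graph.write_page(
    "people/alice",
    "Alice works at [[companies/acme-ai]] and advises [[companies/widgetco]].",
)
graph.write_page(
    "people/bob",
    "Bob works at [[companies/acme-ai]]. He invested in [[companies/widgetco]].",
)
employees = graph.who("works_at", "companies/acme-ai")
```

With the index in place, "who works at Acme AI?" becomes a dictionary lookup rather than a similarity search, which is why graph-grounded answers can be exact.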
A Day in the Life: How the Signal Becomes Knowledge
- Signal arrives: meetings, emails, tweets, voice calls, or a new idea. A signal detector captures both ideas and entities in parallel, never blocking the flow.
- Brain-ops first: the brain checks for existing knowledge (gbrain search and gbrain get) before returning a response.
- Respond with context: answers include context drawn from the compiled truth, timelines, and the graph to prevent hallucinations.
- Write and link: new knowledge is written to brain pages, with citations fixed and memory augmented. Typed links are extracted and established on every write, without additional LLM calls.
- Sync and index: changes are propagated so that the next query benefits from fresh connections and updated relationships.
- Overnight improvements: the system evolves through dream-like syntheses—transcripts become reflections and long-range patterns—so the agent wakes with deeper, longer-term insight.
The 29 Skills: What GBrain Can Do
GBrain is designed around 29 skills, grouped into coherent families. The organization emphasizes a “thin harness, fat skills” approach: the intelligence lives in the skills, while the runtime remains lean and deterministic where possible.
- Always-on skills: signal-detector, brain-ops
  - signal-detector: captures signal from every message and seeds an autopilot growth of ideas and entities.
  - brain-ops: performs brain-first lookups before external API calls, powering a smarter response before asking the outside world.
- Content ingestion: turning raw input into brain pages
  - ingest: routes input types to the correct ingestion path
  - idea-ingest, media-ingest, meeting-ingestion: transform ideas, articles, videos, transcripts into connected brain pages with cross-linking and entities.
- Brain operations: enrichment, search, and maintenance
  - enrich: multi-tier enrichment of people and companies with compiled truth and timelines
  - query: three-layer search with synthesis and citations
  - maintain: health checks for stale pages, dead links, and back-link consistency
  - citation-fixer: standardizes missing or malformed citations
  - repo-architecture: where new brain files go
  - publish: share brain pages as password-protected HTML, zero LLM calls
  - data-research: structured data extraction from emails and messages
- Operational: day-to-day reliability and workflows
  - daily-task-manager, daily-task-prep, cron-scheduler, reports, cross-modal-review, webhook-transforms, testing, skill-creator, skillify, skillpack-check, smoke-test, minion-orchestrator
  - These create a durable, auditable, scalable engine for ongoing work
- Identity and setup
  - soul-audit: builds a multi-part identity for the agent
  - setup, migrate, briefing: set up environments, migrate legacy content, and provide ongoing context
- Conventions: cross-cutting rules
  - quality, brain-first, model-routing, test-before-bulk, cross-modal.yaml
- Additional capabilities
  - The architecture supports remote MCP servers, CC-enabled Claude Code workflows, GStack integration for code lookup, and a robust plugin system to extend subagents.
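The maintain skill in the list above is described as checking for dead links and back-link consistency. As a rough sketch of what such a check involves, assume each page is a dictionary holding its outgoing `links` and recorded `backlinks`; this page shape and the function name are hypothetical, not GBrain's own.

```python
def check_links(pages):
    """Find dead links and missing backlinks.

    pages: {slug: {"links": [slug, ...], "backlinks": [slug, ...]}}
    Returns (dead, missing_backlinks):
      dead              -> (source, target) pairs where target has no page
      missing_backlinks -> (target, source) pairs where target exists but
                           does not list source among its backlinks
    """
    dead, missing_backlinks = [], []
    for slug, page in pages.items():
        for target in page["links"]:
            if target not in pages:
                dead.append((slug, target))
            elif slug not in pages[target]["backlinks"]:
                missing_backlinks.append((target, slug))
    return sorted(dead), sorted(missing_backlinks)


# Illustrative data: one link points nowhere, one backlink is missing.
pages = {
    "people/alice": {
        "links": ["companies/acme-ai", "companies/gone"],
        "backlinks": [],
    },
    "companies/acme-ai": {"links": [], "backlinks": []},
}
dead, missing = check_links(pages)
```

Because the check is a pure function over page metadata, it needs no LLM calls and is cheap to run on a schedule.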
The Minions: Durable Sub-Agents That Won’t Drop the Ball
Minions add a durability layer for long-running tasks. They’re a Postgres-native job queue embedded in the brain, designed to survive gateway restarts and to provide progress streams and pause/resume controls. In the production setup described, a single brain hosts 19 cron jobs in one container and handles real-world workloads. Because job state lives in the queue, long tasks don’t disappear when the gateway crashes or restarts, which addresses earlier pains: runaway processes, abandoned tasks, gateway crashes, and inconsistent state.
- Deterministic routing: minion_mode with a pain_triggered default ensures they handle work predictably.
- Hierarchical task orchestration: parent-child DAGs with a central inbox for “child_done” messages, ensuring durability through worker restarts.
- Zero or low token cost: tasks run with minimal or no LLM calls in many cases, delivering high performance at low cost.
- Observability and reliability: a health dashboard, smoke tests, automatic migrations, and a self-healing upgrade path.
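The parent-child pattern with a "child_done" inbox can be sketched in a few lines. This example swaps in SQLite from the Python standard library as a stand-in for Postgres so that it runs anywhere; the table layout, column names, and function names are assumptions for illustration and are not taken from GBrain.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open the queue; pass a file path instead of :memory: to persist it."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS jobs (
            id INTEGER PRIMARY KEY,
            parent_id INTEGER,
            name TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'queued'
        );
        CREATE TABLE IF NOT EXISTS inbox (
            id INTEGER PRIMARY KEY,
            parent_id INTEGER NOT NULL,
            child_id INTEGER NOT NULL,
            kind TEXT NOT NULL
        );
    """)
    return db


def enqueue(db, name, parent_id=None):
    cur = db.execute(
        "INSERT INTO jobs (name, parent_id) VALUES (?, ?)", (name, parent_id)
    )
    db.commit()
    return cur.lastrowid


def run_next_child(db):
    """Finish one queued child job; return its id, or None if none remain."""
    row = db.execute(
        "SELECT id, parent_id FROM jobs "
        "WHERE status = 'queued' AND parent_id IS NOT NULL "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, parent_id = row
    # The status change and the inbox message commit together or not at all.
    with db:
        db.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
        db.execute(
            "INSERT INTO inbox (parent_id, child_id, kind) "
            "VALUES (?, ?, 'child_done')",
            (parent_id, job_id),
        )
    return job_id


def parent_is_complete(db, parent_id):
    total = db.execute(
        "SELECT COUNT(*) FROM jobs WHERE parent_id = ?", (parent_id,)
    ).fetchone()[0]
    done = db.execute(
        "SELECT COUNT(*) FROM inbox WHERE parent_id = ? AND kind = 'child_done'",
        (parent_id,),
    ).fetchone()[0]
    return total > 0 and done == total


db = open_queue()
parent = enqueue(db, "ingest-month-of-posts")
for i in range(3):
    enqueue(db, f"ingest-batch-{i}", parent_id=parent)

run_next_child(db)  # one child finishes; imagine the worker restarts here
complete_midway = parent_is_complete(db, parent)
while run_next_child(db) is not None:
    pass
complete_at_end = parent_is_complete(db, parent)
```

Since all progress is recorded in tables rather than in worker memory, a restarted worker simply resumes from the next queued row. A real Postgres queue with several concurrent workers would also need row locking when claiming jobs (for example `SELECT ... FOR UPDATE SKIP LOCKED`), which this single-worker sketch leaves out.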
New in v0.25.0: BrainBench-Real
BrainBench-Real introduces session capture and contributor opt-in. With GBRAIN_CONTRIBUTOR_MODE=1, every real query and search call across the main MCP, CLI, or subagent bridge is captured (PII-scrubbed) into an eval_candidates table. You can snapshot with gbrain eval export and replay against your code changes with gbrain eval replay. Three metrics come back: mean Jaccard@k between captured and current retrieved slugs, top-1 stability, and latency delta. This feature is off by default for production users to respect privacy and data minimization. Documentation includes eval-bench.md and eval-capture.md. Setup is quick: expect roughly 30 minutes to a working brain, and the database can run on PGLite (no server).
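For readers unfamiliar with the three replay metrics, here is a small sketch of how they could be computed from captured and replayed results. The record layout and the exact definitions (for instance, averaging latency differences in milliseconds) are assumptions; GBrain's own implementation may define them differently.

```python
def jaccard_at_k(captured, current, k):
    """Overlap of the top-k slugs: |A ∩ B| / |A ∪ B|."""
    a, b = set(captured[:k]), set(current[:k])
    if not a and not b:
        return 1.0  # two empty result lists count as identical
    return len(a & b) / len(a | b)


def replay_metrics(pairs, k=5):
    """Summarize a replay.

    pairs: list of dicts with keys
      "captured", "current"       -> ranked lists of slugs
      "captured_ms", "current_ms" -> latencies in milliseconds
    """
    if not pairs:
        raise ValueError("no captured queries to replay")
    n = len(pairs)
    mean_jaccard = sum(
        jaccard_at_k(p["captured"], p["current"], k) for p in pairs
    ) / n
    # Top-1 stability: share of queries whose first result is unchanged.
    top1 = sum(1 for p in pairs if p["captured"][:1] == p["current"][:1]) / n
    # Positive delta means the current code is slower on average.
    latency_delta = sum(p["current_ms"] - p["captured_ms"] for p in pairs) / n
    return {
        "mean_jaccard_at_k": mean_jaccard,
        "top1_stability": top1,
        "latency_delta_ms": latency_delta,
    }


# Made-up replay data for illustration.
pairs = [
    {"captured": ["a", "b", "c"], "current": ["a", "c", "d"],
     "captured_ms": 100, "current_ms": 90},
    {"captured": ["x", "y"], "current": ["y", "x"],
     "captured_ms": 50, "current_ms": 70},
]
metrics = replay_metrics(pairs, k=3)
```

Jaccard@k ignores order within the top k, which is why it is paired with top-1 stability: the second query above keeps a perfect Jaccard score even though its first result changed.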
A Glimpse at the Numbers: Production Magnitude
- The author’s OpenClaw deployment is described as a single Render container with a Supabase Postgres store housing a 45,000-page brain and 19 cron jobs firing on schedule. The scenario includes ingesting a month of social posts into the brain as structured pages, illustrating the scale and practicality of the system.
- Minions are reported to spawn sub-agents in under a second at near-zero token cost, giving a durable pipeline that runs much faster than prior setups.
- The ecosystem includes a full benchmarking repository (gbrain-evals) with scorecards, corpus, and multi-adapter comparisons, demonstrating the rigor behind the claims of improved retrieval quality and reliability.
The Balance of Graph and Vector: Knowledge Graph Meets Vector Space
- The knowledge model follows a two-layer approach: a compiled truth that serves as the anchor for truth and reliability, plus a timeline that records append-only evidence. This ensures that the brain’s understanding isn’t a moving target but a narrative that grows with evidence.
- The graph model makes every page a node with typed relations. As pages are written, relationships are extracted and refined, allowing complex queries about relationships, such as “who works at Acme AI?” or “what is Bob’s investment history this quarter?” to be answered with precision and verifiability.
- Hybrid search and ranking: the system uses a multi-step, multi-query approach with intent classification, expansion, vector search, and keyword search, then applies RRF fusion to balance precision and recall. The result is higher-quality results than vector-only or keyword-only approaches, with deterministic improvements verified across benchmarks.
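Reciprocal Rank Fusion, the fusion step named above, is simple enough to show in full. Each document scores the sum of 1 / (k + rank) across the rankings it appears in. The sketch below uses k = 60, the constant commonly used with RRF; the post does not say which value GBrain uses, and the slugs are invented.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one.

    rankings: list of ranked lists of document ids (best first).
    Returns ids sorted by fused score, ties broken alphabetically.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical results from a vector search and a keyword search.
vector_hits = ["people/bob", "companies/acme-ai", "people/alice"]
keyword_hits = ["companies/acme-ai", "people/carol", "people/bob"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because RRF works on ranks rather than raw scores, it can combine vector similarity and keyword relevance without having to normalize two incompatible scoring scales. Documents that both methods agree on rise to the top, while a strong hit from only one method still survives in the fused list.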
Getting Data In: Recipes and Ingestion
GBrain ships data-in recipes that guide your agent on credentials, validation, and cron scheduling. These include:
- Public Tunnel (ngrok-tunnel)
- Credential Gateway
- Voice-to-Brain
- Email-to-Brain
- X-to-Brain (Twitter)
- Calendar-to-Brain
- Meeting Sync
- Data research recipes for investor updates, expenses, and company metrics
These recipes enable the agent to connect external data sources and convert them into structured brain pages, ready for cross-linking and later reasoning. The kit also includes a robust set of integration dashboards and status checks to monitor health and progress.
GBrain and GStack: A Tight Coupling
GStack is the coding engine that powers brain-level operations. GBrain complements GStack by handling non-code, knowledge-centric tasks. The separation is intentional: GStack handles coding skills—ship, review, QA, investigation—while GBrain handles brain operations, signal detection, ingestion, enrichment, and the knowledge graph. A small bridge at hosts/gbrain.ts ensures that GStack’s coding skills consult the brain first, ensuring safe and intelligent code modifications.
Architecture at a Glance
- The system sits in a triad: Brain Repo (markdown, truth, timelines), GBrain (retrieval and graph), and the AI Agent (the reader and writer that interacts with both).
- The brain acts as the source of truth, the retrieval layer as the engine for reading and writing, and the agent as the orchestration layer that uses both to deliver answers, actions, and updates.
- File storage is tiered: content can live in database-backed storage, or be kept in git-like storage with selective db-only directories. The system can auto-manage .gitignore rules and support exports to restore missing pages when needed.
From Origin to Practice: The Story Behind the Brain
The origin story captures a bold insight: starting with a markdown brain repository (one page for each person and company) and evolving quickly into a scalable system, capable of ingesting years of calendar data, transcripts, and a vast corpus of ideas. The dream cycle—where transcripts become reflections and patterns—creates a personal experience for the agent’s owner while producing a generalizable pattern: the brain’s power grows as more content flows in, and its utility becomes clearer as it ingests more of the user’s world.
Getting Data In: Recipes and Data Excellence
The platform ships a suite of data integration recipes to help the agent gather information. These recipes cover ngrok tunnels, credential gateways, voice-to-brain pipelines, email-to-brain pipelines, Twitter-to-brain, calendar-to-brain, and meeting-sync workflows. The system also includes data-research recipes designed to extract structured metrics from investor updates, expenses, and company dashboards. The recipe approach provides repeatable, auditable methods to bring diverse data into the brain, enabling consistent cross-linking and search results.
Contributing, Conventions, and the Open-Source Ethos
GBrain’s documentation is explicit about the conventions that guide its operation, including:
- Quality conventions for citations, back-links, notability gates, and source attribution
- Brain-first rules that ensure a five-step lookup before external API calls
- Model-routing conventions for choosing the right tool for the right task
- Test-before-bulk principles for safe batch operations
- Cross-modal review and refusal routing to preserve accuracy when models disagree
The project’s open-source spirit is evident in references to the gbrain-evals repository, change logs, and contributor guides. The license is MIT, inviting collaboration and adaptation.
Images and Visual Context
One image in the project’s docs illustrates the voice capability:
- Voice calls are not abstract: the brain can answer on a call, and after the call ends, a brain page appears with the transcript, entity detection, and cross-references. A representative image shows a voice client connected, reinforcing how voice data becomes brain content in real time. See the image titled Voice client connected in the project’s docs.
A Path Forward: Why GBrain Matters for Your AI Agent
- Reliability and scale: The architecture is designed to endure real production workloads, not just academic demonstrations. Durable sub-agents, deterministic routing, and a robust memory graph make the system resilient in everyday use.
- Better questions, better answers: The combination of graph-based reasoning and vector-augmented retrieval helps the agent surface precise answers and meaningful connections that single-method search would miss.
- Incremental improvement: Nightly syntheses, dream-cycle reflections, and structured timeline updates ensure the brain becomes more insightful over time, without user intervention.
- Openness and extensibility: The architecture is modular, with 29 skills and a well-defined skillpack system. The plugin-based approach allows the brain to be extended with new subagents and capabilities as needs grow.
A Final Word
GBrain embodies a bold approach to AI agents: a self-wiring, self-improving brain that sits between raw data and human insight, curating a living map of people, companies, and ideas. It is not merely an information store; it is a reasoning engine that links evidence, preserves provenance, and evolves with usage. The result is a smarter, more capable agent that can answer nuanced questions, draft well-supported conclusions, and carry out complex workflows with a level of reliability that mirrors human decision processes—yet with scalable automation and machine-fast memory.
If you’re building or deploying AI agents today, GBrain offers a compelling blueprint for turning scattered data into a coherent, actionable intelligence layer. It demonstrates how to embody the patterns of expert knowledge—typed links, compiled truth, and a graph-aware retrieval stack—into an agent that learns, remembers, and improves over time. The 29 skills, the durable minions, the hybrid search, and the practical data-in recipes come together to form a compelling ecosystem for next-generation AI assistants. And with a community and an MIT license, it invites collaboration, experimentation, and ongoing refinement to push what AI agents can achieve in the real world.
Images referenced or included in this post:
- Voice client connected (docs/images/voice-client.png)
Notes for readers who want to explore further
- If you’re curious about the measurable outcomes, check the BrainBench scores in the gbrain-evals repository to see how graph-enabled retrieval compares to other configurations.
- For an in-depth view of installation and deployment, peruse the install and MCP sections, including standalone CLI workflows and agent-based deployment guidance.
- For developers, the skill framework offers a modular path to extend or replace capabilities, ensuring your own innovations can slot into the same proven execution loop.
In short, GBrain is not just a clever idea; it’s a practical platform designed to grow smarter with your data—and to do so with a discipline that prioritizes truth, provenance, and reliable operation.
Repository: https://github.com/garrytan/gbrain