LLM Wiki: A Self-Building Personal Knowledge Base
LLM Wiki: A Personal Knowledge Base That Builds Itself
[Logo: LLM Wiki Logo]
Welcome to a new kind of personal knowledge base. LLM Wiki is a cross‑platform desktop application that reads your documents, builds a structured wiki, and keeps it current — incrementally, persistently, and intelligently. It reimagines how you store, connect, and retrieve knowledge by moving away from re-deriving answers on every query and toward a living, self‑improving knowledge network you own and curate.
[Overview Image]
What is LLM Wiki?
LLM Wiki is a desktop application designed to turn your scattered documents into an organized, interlinked knowledge base. It analyzes your sources in two steps, constructs wiki pages with traceable sources, and maintains a persistent index that evolves as new material arrives. This approach, inspired by Karpathy’s LLM Wiki pattern, emphasizes building a wiki once and keeping it up to date, rather than repeatedly assembling answers from scratch at query time.
The architecture and flow are visually represented in the included diagrams, which illustrate how raw sources feed a centralized wiki that is continually refined by the LLM, while being anchored by a stable schema and a navigable knowledge graph.
[LLM Wiki Architecture Image]
Why a wiki instead of a traditional RAG cycle?
- Incremental building: Your wiki grows from real sources and is updated as new material arrives, rather than re-deriving everything from scratch for each question.
- Persistent knowledge: The system maintains state across sessions, restart, and long ingest operations, so your work doesn’t vanish.
- Rich interconnections: The wiki is linked through wikilinks, cross-references, and a graph that reveals relationships between entities, concepts, and sources.
- Multimodal ingestion: Images embedded in PDFs and other documents are extracted, captioned, and surfaced in image-aware searches.
- Human-in-the-loop options: You can review, refine, and guide the knowledge network with asynchronous review and deep research features.
What We Kept from the Original
The project adheres to the core architecture and patterns originally described by Andrej Karpathy’s LLM Wiki concept. The foundational ideas are preserved and extended with a concrete desktop implementation and substantial enhancements.
Key original elements retained:
- Three-layer architecture: Raw Sources (immutable) → Wiki (LLM-generated) → Schema (rules & config)
- Core operations: Ingest, Query, Lint
- index.md as content catalog and LLM navigation entry point
- log.md as a chronological operation record
- Wikilinks syntax for cross-references
- YAML frontmatter on every wiki page
- Obsidian compatibility: the wiki directory behaves like an Obsidian vault
- Human curation with LLM maintenance as a central pattern
[Obsidian Compatibility Image]
What We Changed & Added
1) From CLI to Desktop Application We transformed the abstract pattern into a full cross‑platform desktop app with:
- Three-column layout: Knowledge Tree / File Tree (left) + Chat (center) + Preview (right)
- Icon sidebar for quick switching between Wiki, Sources, Search, Graph, Lint, Review, Deep Research, Settings
- Custom resizable panels with logical min/max constraints
- Activity panel showing real‑time ingest progress
- All state persists across restarts — conversations, settings, review items, and project config
- Scenario templates (Research, Reading, Personal Growth, Business, General) preconfiguring purpose.md and schema.md
2) Purpose.md — The Wiki’s Soul We added a dedicated purpose.md to define goals, key questions, and research scope. The LLM reads it during every ingest and query for context, and can propose updates based on usage patterns. Purpose is directional, complementing the structural role of schema.
3) Two‑Step Chain‑Of‑Thought Ingest The ingest process is split into two sequential LLM calls for higher quality:
- Step 1 (Analysis): LLM reads the source and produces structured analysis — entities, concepts, arguments, existing knowledge connections, contradictions, and recommendations for wiki structure
- Step 2 (Generation): LLM uses the analysis to generate wiki files — source summaries with frontmatter, entity and concept pages with cross-references, updated index.md, log.md, overview.md, and review items for human judgment
Additional ingest enhancements:
- SHA256 incremental cache to skip unchanged sources
- Persistent ingest queue with crash recovery and retry
- Recursive folder import preserving directory structure and folder context as a classification hint
- Queue visualization in the Activity Panel
- Auto-embedding when vector search is enabled
- Source traceability: every generated wiki page links back to the raw sources
- overview.md auto-update to reflect the current wiki state
- Guaranteed source summary, even if the LLM omits it
- Language-aware generation (English or Chinese)
4) Knowledge Graph with Relevance Model A full knowledge graph visualization and a four‑signal relevance model augment the wiki’s connectivity:
- Signals in the 4‑Signal Relevance Model:
- Direct link: pages linked via wikilinks
- Source overlap: pages sharing the same raw source
- Adamic-Adar: pages sharing common neighbors, weighted by neighbor degree
- Type affinity: bonus for same page type (entity↔entity, concept↔concept)
Graph visualization details:
- Node colors indicate page type or community
- Node sizes scale with link count
- Edge thickness and color reflect relevance strength
- Hover highlights neighbors; non-neighbors dim; labels show relevance scores
- Zoom controls and position caching to maintain stability during updates
- A legend that can switch between type counts and community information
[Knowledge Graph Image]
5) Louvain Community Detection This feature introduces automatic clustering of wiki pages based on link topology:
- Auto-clustering discovers natural groupings independent of predefined types
- Color modes: by page type or by discovered knowledge clusters
- Cohesion scoring flags low‑cohesion clusters for review
- A 12‑color palette provides clear visual separation
- Community legend shows top node labels, member counts, and cohesion per cluster
[Louvain Community Detection Image]
6) Graph Insights — Surprising Connections & Knowledge Gaps The system analyzes graph structure to surface actionable insights:
- Surprising connections identify cross‑community edges, cross‑type links, and bridge nodes
- A composite surprise score ranks noteworthy connections
- Dismissable insights can be marked as reviewed
- Knowledge gaps highlight isolated pages, sparse communities, and bridge nodes
- Interactive insights: click to highlight related nodes and edges; deep research can be triggered from insight cards
- A dedicated Deep Research button opens domain‑aware topics, reads overview.md and purpose.md for context, and launches an LLM‑driven research workflow
[Graph Insights Image]
7) Optimized Query Retrieval Pipeline A multi‑phase retrieval pipeline enhances recall, with optional vector search:
- Phase 1: Tokenized search (English) and CJK tokenization (Chinese), including a title match bonus
- Phase 1.5: Optional vector semantic search via an embedding endpoint; stored in LanceDB for fast retrieval and cosine similarity ranking
- Phase 2: Graph expansion using top results as seeds; a 4-signal relevance model guides 2‑hop traversal with decay
- Phase 3: Budget control to balance context window size (4K to 1M tokens) and proportional content allocation (60% wiki, 20% chat history, 5% index, 15% system)
- Phase 4: Context assembly with numbered pages and full content; system prompt includes purpose.md, language rules, and citation format
- Vector search is optional and configurable; when disabled, the pipeline uses tokenized search plus graph expansion
- Benchmark: vector search improves overall recall from 58.2% to 71.4%
8) Multi‑Conversation Chat with Persistence We added full multi‑conversation support:
- Independent chat sessions that can be created, renamed, or deleted
- A conversation sidebar for quick topic switching
- Per‑conversation persistence saved to .llm-wiki/chats/{id}.json
- Configurable history depth to control context size
- Cited references panel on each response, showing which wiki pages were used
- References stored with messages to survive restarts
- Regenerate function to re‑generate the last response
- Save to Wiki: archive valuable answers to wiki/queries/ and auto-ingest to extract entities/concepts
9) Thinking / Reasoning Display For LLMs that emit explicit thinking blocks, the UI presents:
- Streaming thinking with a rolling 5‑line display
- Think blocks collapsed by default and expandable on demand
- Clear visual separation from the main response
10) KaTeX Math Rendering Full LaTeX math support across all views:
- Inline and block math rendered via KaTeX
- Milkdown math plugin enables native rendering in the editor
- Auto-detection wraps LaTeX environments with proper delimiters
- Unicode fallback provides mappings for common symbols outside math blocks
11) Review System (Async Human‑in‑the‑Loop) An asynchronous review queue helps with ingest governance:
- LLM flags items requiring human judgment
- Predefined action types limit actions to page creation, deep research, or skip
- Pre-generated optimized web search queries for review items
- Humans handle reviews without blocking ingestion
12) Deep Research A new Deep Research workflow activates when the graph reveals knowledge gaps:
- Web search via Tavily API to fetch full-content sources
- Multiple topic queries generated at ingest time, domain‑specific rather than generic
- LLM‑synthesized topics and queries, read purpose.md + overview.md for context
- Human confirmation dialog before starting advanced research
- LLM synthesizes findings into a wiki research page with cross‑references
- Thinking blocks appear as collapsible sections during synthesis
- Auto-ingest processes research results to extract entities/concepts
- A practical task queue with three concurrent tasks
- A dedicated Research Panel with dynamic height and real‑time progress
13) Browser Extension (Web Clipper) A dedicated Chrome extension (Manifest V3) for clip-and-ingest:
- Readability.js for accurate article extraction (ads and navigation stripped)
- Turndown.js for HTML → Markdown conversion with table support
- Project picker to clip into one or multiple wiki projects
- Local HTTP API (port 19827) to communicate extension ↔ app
- Auto-ingest triggers for clipped content
- Clip watcher polls for new clips and processes automatically
- Offline preview shows extracted content even when the app isn’t running
[Chrome Extension Web Clipper Image]
14) Multi-format Document Support Beyond plain text and Markdown, LLM Wiki preserves document semantics during extraction:
- PDF: pdf-extract with content caching
- DOCX: docx-rs for headings, formatting, lists, and tables
- PPTX: slide-by-slide extraction with headings and lists
- XLSX/XLS/ODS: calamine for cells and multi‑sheet support (including Markdown tables)
- Images: native preview
- Video/Audio: built‑in player
- Web clips: Readability.js + Turndown.js into clean Markdown
15) File Deletion with Cascade Cleanup Intelligent cascade deletion keeps the wiki clean:
- Deleting a source removes its wiki summary page
- Three-method matching finds related pages: frontmatter sources field, source summary page name, and references
- Shared entity preservation: pages linked to multiple sources survive with the deleted source removed from their sources
- Index cleanup removes deleted pages from index.md
- Wikilinks to deleted pages are cleaned up from remaining pages
16) Configurable Context Window Users control how much context the LLM receives:
- Slider range from 4K to 1M tokens
- Proportional budget allocation gives larger contexts more wiki content
- A fixed 60/20/5/15 split for wiki, chat history, index, and system prompt
17) Cross‑Platform Compatibility Concrete cross‑platform considerations:
- Path normalization for consistent file handling
- Unicode-safe string handling to prevent crashes on CJK filenames
- macOS window behavior that hides the app but keeps it running
- Windows/Linux quit confirmation to prevent data loss
- Tauri v2 for native desktop experiences across macOS, Windows, Linux
- GitHub Actions CI/CD for automated multi‑platform builds
18) Other Additions
- i18n with English and Chinese interfaces
- Settings persistence via a local store
- Obsidian config directory auto-generated with recommended settings
- Markdown rendering improvements (GFM tables with borders, proper code blocks, wikilink processing)
- Multi-provider LLM support (OpenAI, Anthropic, Google, Ollama, Custom) with provider-specific streaming
- 15‑minute ingest timeout to avoid premature failures
- Data version signaling so the graph and UI refresh when wiki content changes
Tech Stack
- Desktop: Tauri v2 (Rust backend)
- Frontend: React 19 + TypeScript + Vite
- UI: shadcn/ui + Tailwind CSS v4
- Editor: Milkdown
- Graph: sigma.js + graphology + ForceAtlas2
- Search: Tokenized search, graph relevance, optional vector search
- Vector DB: LanceDB (Rust, embedded, optional)
- PDF: pdf-extract
- Office formats: docx-rs + calamine
- i18n: react-i18next
- State: Zustand
- LLM: Streaming endpoints (OpenAI, Anthropic, Google, Ollama, Custom)
- Web search: Tavily API
Installation
Pre-built binaries
- macOS: .dmg (Apple Silicon + Intel)
- Windows: .msi
- Linux: .deb / .AppImage
Build from source
- Prereqs: Node.js 20+, Rust 1.70+
- Steps:
- git clone https://github.com/nashsu/llm_wiki.git
- cd llm_wiki
- npm install
- npm run tauri dev
- npm run tauri build
Chrome Extension
- Enable Developer mode in chrome://extensions
- Load unpacked from extension/ folder
Quick Start
- Launch the app and create a new project using a template
- In Settings, configure your LLM provider with API key and model
- In Sources, import documents (PDF, DOCX, MD, etc.)
- Watch the Activity Panel: the LLM incrementally builds wiki pages
- Use Chat to query your knowledge base
- Browse the Knowledge Graph to explore connections
- Check the Review panel for items needing attention
- Run Lint to maintain wiki health
Project Structure
- purpose.md: Goals, key questions, research scope
- schema.md: Wiki structure rules and page types
- raw/
- sources/: Uploaded documents (immutable)
- assets/: Local images
- wiki/
- index.md: Content catalog
- log.md: Operation history
- overview.md: Global summary (auto-updated)
- entities/: People, organizations, products
- concepts/: Theories, methods, techniques
- sources/: Source summaries
- queries/: Saved chat answers and research
- synthesis/: Cross-source analysis
- comparisons/: Side-by-side comparisons
- .obsidian/: Obsidian vault config (auto-generated)
- .llm-wiki/: App config, chat history, review items
Deep Research
Not part of the original concept, but included as a powerful extension:
- Web search returns full content via Tavily API
- Domain‑specific, LLMed topics and queries generated for deep dives
- Human confirmation dialog before initiating deep research
- Synthesis creates wiki research pages with cross references
- Thinking blocks show during synthesis and auto-scroll to the latest content
- Research results auto-ingest into entities/concepts
- A dedicated Research Panel with dynamic height and streaming progress
License
This project is licensed under the GNU General Public License v3.0. See LICENSE for details.
Credits
The foundational methodology comes from Andrej Karpathy’s llm-wiki pattern, which describes incrementally building and maintaining a personal wiki using LLMs. This project implements those ideas as a robust desktop application with many enhancements and extensions.
Images in this post
- LLM Wiki Logo
- Overview
- LLM Wiki Architecture
- Obsidian Compatibility
- Knowledge Graph
- Louvain Community Detection
- Graph Insights
- Deep Research
- Chrome Extension Web Clipper
If you’re looking for a self-contained, persistent, and increasingly capable personal knowledge base, LLM Wiki offers a comprehensive suite of features designed to keep your knowledge structured, connected, and actionable. It blends robust file handling, advanced graph analytics, and flexible LLM integration into a single desktop experience that grows with your information needs.
Enjoying this project?
Discover more amazing open-source projects on TechLogHub. We curate the best developer tools and projects.
Repository:https://github.com/nashsu/llm_wiki
GitHub - nashsu/llm_wiki: LLM Wiki: A Self-Building Personal Knowledge Base
LLM Wiki: A Personal Knowledge Base That Builds Itself...
github - nashsu/llm_wiki