
Run `graphify run .` to instantly transform your project folder into a queryable knowledge graph: `graph.json` for AI, `GRAPH_REPORT.md` for auditing, and an Obsidian wiki for interactive navigation.

When you master how to use Graphify, you solve the classic "Context Window Trap" that plagues every developer working with Large Language Models. How to use Graphify is the specific question developers with massive, messy projects (code, PDFs, LOOM videos, screenshots) are asking right now. The standard RAG approach usually fails here because it treats documents as semantic blobs rather than structured entities. Graphify answers by turning any folder into a queryable knowledge graph that excels at reasoning, not just retrieval.
Think about Andrej Karpathy's workflow: he dumped raw folders into an AI and struggled with the noise. Graphify bridges that gap. It moves beyond simple "File System RAG" by parsing the structure of your data before handing it to the LLM.
The problem: you cannot fit a 10-million-line repository, 50 meetings, and 300 PDF docs into a single prompt. The solution: snap a folder of data and let Graphify build a logical topology you can query.
Graphify is an open-source CLI tool designed to build persistent knowledge graphs from heterogeneous data sources (code, images, audio, PDFs). It acts as a pre-processing pipeline that runs locally, building a graph where nodes represent entities (classes, functions, concepts) and edges represent relationships (dependencies, calls, mentions).
Unlike traditional Vector Databases, which rely on semantic similarity (embeddings), Graphify relies on strict structural relationships (topology). This allows AI assistants to navigate Logic Graphs rather than just searching for similar text.
"Embeddings lie; graphs don't."
The AI industry obsesses over semantic search, but for code and engineering logic, embeddings are often misleading. Embeddings group "pizza" and "pizzeria" together, but Python's import statement is a logical relationship. Graphify's true superpower is separating deterministic fact extraction (what the code actually does) from probabilistic AI inference (what the AI thinks is implied). By forcing the AI to be explicit about its confidence—tagging edges as EXTRACTED (100% sure) vs INFERRED (low confidence)—you force the AI to stop guessing and start engineering.
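The article doesn't document the exact edge schema, but the provenance split can be illustrated with a small Python sketch. The `provenance` and `confidence` field names, and the entity names, are assumptions for illustration only:

```python
# Hypothetical edge records, assuming each edge carries a provenance
# tag (EXTRACTED vs. INFERRED) and a confidence score.
edges = [
    {"source": "OrderService", "target": "PaymentGateway",
     "relation": "imports", "provenance": "EXTRACTED", "confidence": 1.0},
    {"source": "PaymentGateway", "target": "RetryPolicy",
     "relation": "mentions", "provenance": "INFERRED", "confidence": 0.6},
]

# Deterministic facts and AI guesses become trivially separable.
facts = [e for e in edges if e["provenance"] == "EXTRACTED"]
guesses = [e for e in edges if e["provenance"] == "INFERRED"]

print(len(facts), len(guesses))  # prints: 1 1
```

The point of the tagging is exactly this separation: downstream tooling can treat `EXTRACTED` edges as ground truth and route `INFERRED` edges to review.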
To truly understand how to use Graphify effectively, you need to grasp its architecture. It runs in three distinct passes:
1. Pass 1 (deterministic parsing): happens locally, 100% offline, and requires no API key. Every relationship it finds is tagged EXTRACTED (Confidence: 1.0).
2. Pass 2 (media processing): if your folder contains media (LOOM videos, Zoom MP4s), Graphify processes it here; these edges are also tagged EXTRACTED.
3. Pass 3 (AI inference): Graphify uses Claude subagents to process the "fuzzy" unstructured data. The resulting edges are tagged INFERRED (e.g., "User flow suggests this logic").

The workflow is a pipeline designed for production scale:
- Input Layer: heterogeneous sources (code, images, audio, PDFs)
- Processing Layer (The Graph Engine): the three passes above
- Output Layer: `graph.json`, `GRAPH_REPORT.md`, and the Obsidian vault
Here is your actionable workflow to integrate Graphify with your AI coding assistant.
```shell
# Install the CLI
pip install graphify

# Run it on your project root
cd /path/to/your/project
graphify run .
```
After the run finishes, check your project root. You should see three new items:
- `GRAPH_REPORT.md`: scan this to ensure the graph looks sane.
- `graph.json`: contains the full topology.
- `.obsidian/vault/`: a new folder for visual graphs.

Most modern AI-capable editors (like Cursor or Claude Desktop) support plugins or virtual file systems.
Now ask your assistant: "Explain the `createOrder` function's dependency chain."

The assistant reads `graph.json`, highlights the specific nodes, and constructs a 300-token summary accurate to the implementation.

Start small. Trying to graph a 100k-file open-source repo might take a while, but a 5-10 file domain model will produce an almost magical feedback loop where the AI understands your intent instantly.
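The dependency-chain query above is, at its core, a graph traversal. Here is a minimal Python sketch of that idea; the edge schema and function names (`validateCart`, `chargeCard`, `paymentClient`) are invented for illustration, not part of Graphify's documented output:

```python
from collections import deque

# Toy graph in the shape a knowledge-graph export might take
# (schema assumed for illustration).
graph = {
    "edges": [
        {"source": "createOrder", "target": "validateCart", "relation": "calls"},
        {"source": "createOrder", "target": "chargeCard", "relation": "calls"},
        {"source": "chargeCard", "target": "paymentClient", "relation": "calls"},
    ]
}

def dependency_chain(edges, root):
    """Breadth-first walk over 'calls' edges, starting from root."""
    out = {}
    for e in edges:
        if e["relation"] == "calls":
            out.setdefault(e["source"], []).append(e["target"])
    seen, order, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        for dep in out.get(node, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order

print(dependency_chain(graph["edges"], "createOrder"))
# prints: ['validateCart', 'chargeCard', 'paymentClient']
```

This is why the token count stays low: the assistant only needs the subgraph reachable from `createOrder`, not the whole repository.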
| Feature | Graphify (Knowledge Graph) | Standard Vector DB (Pinecone/Weaviate) | Raw File Context |
|---|---|---|---|
| Search Type | Topological / Structural | Semantic / Vector Similarity | File System Directory |
| Context Understanding | High (Logic flows preserved) | Medium (Similar text) | Low (Text dump) |
| Token Usage | Low (Subgraphs) | Medium (Chunks) | High (Full Files) |
| Data Provenance | Explicit Tags (Extracted/Inferred) | Implicit/Probabilistic | None |
| Technical Complexity | Easy (One command) | High (Indexing pipelines) | None |
Verdict: Use Vector DBs for finding "documents that talk about X." Use Graphify to make your "codebase understand X."
The EXTRACTED vs. INFERRED tagging system ensures you know exactly what the AI "hallucinated" versus what the code actually contained.

The roadmap for Graphify implies deeper integration with local model runners (like Ollama). Future updates will likely allow the graph to "reason" in real time, acting as a dynamic state machine that updates the LLM as you type, rather than a static snapshot you query once a day.
Q: Is Graphify local only? A: The extraction phase is local. However, for the LLM-driven inference pass (Pass 3), it requires internet access to call an API (currently Claude).
Q: Does it support all programming languages? A: It uses Tree-sitter and supports over 20 languages (Python, JavaScript, Go, Rust, C#, etc.) out of the box.
Q: Can I query this graph from Python?
A: Yes, the graph.json output is standard and can be parsed by any script.
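As a minimal sketch of that FAQ answer, assuming `graph.json` contains top-level `nodes` and `edges` arrays (that layout is an assumption, not documented in this article); the snippet writes a tiny stand-in file so it runs end to end:

```python
import json
from pathlib import Path

# Stand-in for the file graphify would emit (layout assumed).
sample = {
    "nodes": [{"id": "createOrder"}, {"id": "validateCart"}],
    "edges": [{"source": "createOrder", "target": "validateCart",
               "relation": "calls", "provenance": "EXTRACTED"}],
}
Path("graph.json").write_text(json.dumps(sample))

# Any script can load and inspect the topology with the stdlib alone.
graph = json.loads(Path("graph.json").read_text())
print(f"{len(graph['nodes'])} nodes, {len(graph['edges'])} edges")
```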
Q: What if the AI infers a wrong relationship?
A: You will see these flagged as INFERRED with a lower confidence score in GRAPH_REPORT.md, allowing you to verify them.
To answer your question on how to use Graphify: you install it, point it at a folder, and watch the "Context Window" problem vanish. This tool shifts the fundamental unit of data collection from "File Content" to "Logic Structure." If you are serious about building production-grade AI applications that understand complex codebases rather than just reading them, the future is structural, not just semantic.
Start your workflow today: `pip install graphify && graphify run .`
See you in the next one.