The personal knowledge pipeline: document ingestion via Paperless-NGX, semantic search and a knowledge graph via the Life Archive RAG system, and structured knowledge management in Tana.
Last updated: March 2026. The previous version referenced old infrastructure (TrueNAS, Proxmox at .230, paperless-ai), all of which has since been decommissioned.
Source documents (PDFs, email attachments, Evernote exports, magazines)
↓
Paperless-NGX (Mac Studio :8100)
  OCR, full-text index, tagging, archival
↓
Life Archive RAG Pipeline (Mac Studio :8900)
  gte-Qwen2-7B embeddings → LanceDB vectors → knowledge graph
  Multi-strategy retrieval: dense + SPLADE + QA pairs + KG + HyDE
↓
Query interfaces
  Claude MCP tools (life_archive_search, entity_lookup, etc.)
  HTTP API (:8900) · KG web explorer (:1313/kg/)
↓
Tana: structured knowledge workspace
  394+ contacts, 85+ plant species, homelab docs, tasks
Paperless-NGX runs on the Mac Studio in Docker at http://192.168.8.180:8100.
Start/stop:
cd ~/paperless-ngx/docker
docker compose up -d
docker compose down
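docker compose up -d can return before the web server inside the container is ready to answer requests. A minimal sketch of a readiness wait, assuming curl is installed (it ships with macOS); wait_for_http is a hypothetical helper name, not part of Paperless. Any HTTP status below 400 counts as up, so a redirect (for example to a login page) also passes.

```shell
# Hypothetical helper: poll URL once per second until it returns an HTTP
# status below 400, or give up after TIMEOUT seconds (default 60).
#
# Usage: wait_for_http URL [TIMEOUT_SECONDS]
wait_for_http() {
  local url="$1" timeout="${2:-60}" waited=0
  # -f: treat HTTP >= 400 as failure; -s: quiet; --max-time: cap each attempt
  until curl -fs -o /dev/null --max-time 2 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$url is up"
}

# Example:
#   cd ~/paperless-ngx/docker && docker compose up -d && wait_for_http http://localhost:8100/ 120
```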
Key volumes:
| Host Path | Purpose |
|---|---|
| ~/paperless-ngx/data/ | SQLite DB and search index |
| ~/paperless-ngx/media/ | Stored documents |
| ~/paperless-ngx/consume/ | Drop files here to ingest (polled every 10 s) |
| ~/paperless-ngx/export/ | Bulk export output |
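If one of the host directories in the table above is missing, the bind mount will not point where you expect. A small pre-flight check, sketched under the assumption that the four directories live directly under ~/paperless-ngx as listed; check_paperless_volumes is a hypothetical name.

```shell
# Hypothetical pre-flight check: confirm the four host directories exist.
# BASE_DIR defaults to ~/paperless-ngx. Prints each missing directory to
# stderr and returns 1 if any are absent, 0 otherwise.
#
# Usage: check_paperless_volumes [BASE_DIR]
check_paperless_volumes() {
  local base="${1:-$HOME/paperless-ngx}"
  local missing=0 dir
  for dir in data media consume export; do
    if [ ! -d "$base/$dir" ]; then
      echo "missing: $base/$dir/" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_paperless_volumes && (cd ~/paperless-ngx/docker && docker compose up -d)
```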
Ingest a file: drop it into ~/paperless-ngx/consume/. Paperless picks it up within ~10 seconds (the folder is polled every 10 s), then OCRs, tags, and indexes it automatically.
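Copying a large PDF straight into consume/ means the file is briefly half-written while the poller may be looking at it. A common safeguard, sketched here as an assumption rather than how this setup currently works: copy into a staging directory next to consume/ and then mv the finished file in, since a rename within one filesystem is atomic. CONSUME_DIR and the .consume-staging directory are hypothetical; this assumes both sit on the same filesystem and that only consume/ (not its parent) is mounted into the container.

```shell
# Hypothetical helper: copy files into the consume folder via a staging
# directory on the same filesystem, then rename them into place so the
# poller only ever sees complete files.
#
# Usage: ingest_file FILE [FILE...]
ingest_file() {
  local consume="${CONSUME_DIR:-$HOME/paperless-ngx/consume}"
  local staging src name
  if [ ! -d "$consume" ]; then
    echo "consume folder not found: $consume" >&2
    return 1
  fi
  # Staging lives beside consume/, not inside it, so it is never polled
  staging="$(dirname "$consume")/.consume-staging"
  mkdir -p "$staging" || return 1
  for src in "$@"; do
    if [ ! -f "$src" ]; then
      echo "not a regular file: $src" >&2
      return 1
    fi
    name=$(basename "$src")
    # The slow copy happens in staging; the mv is a single rename
    cp "$src" "$staging/$name" && mv "$staging/$name" "$consume/$name" || return 1
    echo "queued: $name"
  done
}
```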
Paperless API:
curl -H "Authorization: Token 9838fafecb452b514ee0cfcc84ce42df718d4984" \
http://localhost:8100/api/documents/
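For full-text search, the Paperless documents endpoint accepts a query parameter. A minimal sketch that lets curl handle the URL encoding (-G turns each --data-urlencode pair into a query-string parameter on a GET request) and reads the API token from an environment variable instead of the command line. PAPERLESS_TOKEN, PAPERLESS_URL and paperless_search are hypothetical names.

```shell
# Hypothetical wrapper: full-text search via the Paperless REST API.
# Reads the API token from PAPERLESS_TOKEN; PAPERLESS_URL overrides the
# default base URL.
#
# Usage: paperless_search "search terms"
paperless_search() {
  if [ -z "${PAPERLESS_TOKEN:-}" ]; then
    echo "set PAPERLESS_TOKEN first" >&2
    return 1
  fi
  # -G + --data-urlencode: GET request with ?query=<url-encoded terms>
  curl -sS -G \
    -H "Authorization: Token $PAPERLESS_TOKEN" \
    --data-urlencode "query=$1" \
    "${PAPERLESS_URL:-http://localhost:8100}/api/documents/"
}
```

The response is a paginated JSON object with count and results fields, so paperless_search "electric bill" | jq '.count' prints the number of matches if jq is installed.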
Note: the paperless-ai container (AI auto-tagger) is stopped because it was causing memory pressure. Classification is handled by manual tagging or tag rules instead.
See the full Life Archive page for complete documentation. Summary:
| Metric | Value |
|---|---|
| Total documents | ~74K in LanceDB |
| Paragraphs indexed | ~2.69M |
| Knowledge graph entities | ~276K |
| Sources | Evernote, emails, magazines, Tana nodes, Paperless docs |
| Embedding model | gte-Qwen2-7B on Apple MPS (port 1235) |
| Query API | FastAPI at port 8900 |
| MCP server | Streamable HTTP at port 8901 |
Check service status:
launchctl list | grep beedifferent
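Each line of launchctl list output is PID, last exit status, and label, normally tab-separated; a PID of "-" means the agent is loaded but not currently running. A small sketch that turns the filtered lines into a readable summary, assuming that layout; summarize_agents is a hypothetical name, and it reads stdin so it can also be fed saved output.

```shell
# Hypothetical filter: summarise launchctl list lines (PID<TAB>status<TAB>label).
# Exits 1 if any agent is loaded but not running (PID shown as "-").
#
# Usage: launchctl list | grep beedifferent | summarize_agents
summarize_agents() {
  awk -F '\t' '
    $3 == ""    { next }   # skip blank or malformed lines
    $1 == "PID" { next }   # skip the header line if it was not filtered out
    $1 == "-"   { printf "STOPPED  %s (last exit %s)\n", $3, $2; down = 1; next }
                { printf "running  %s (pid %s)\n", $3, $1 }
    END         { exit down }
  '
}
```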
Quick search:
curl -X POST http://localhost:8900/search \
-H "Content-Type: application/json" \
-d '{"query": "your question here"}'
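The quick-search command breaks as soon as the question itself contains a double quote. A minimal sketch that escapes the question before building the request body; it sends only the query field shown above (other fields /search might accept are not assumed). Backslashes and double quotes are escaped, newlines and tabs become spaces, and other control characters are not handled. LIFE_ARCHIVE_URL, json_query_payload and life_archive_search are hypothetical names.

```shell
# Hypothetical helper: build {"query": "..."} with the question JSON-escaped.
json_query_payload() {
  local escaped
  # newlines/tabs -> spaces, then escape backslashes before double quotes
  escaped=$(printf '%s' "$1" | tr '\n\t' '  ' | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"query": "%s"}' "$escaped"
}

# Hypothetical wrapper around POST /search on the Life Archive HTTP API.
# Usage: life_archive_search "your question here"
life_archive_search() {
  curl -sS -X POST "${LIFE_ARCHIVE_URL:-http://localhost:8900}/search" \
    -H "Content-Type: application/json" \
    -d "$(json_query_payload "$1")"
}
```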
Evernote exports (155 notebooks) were converted using Yarle and are staged at ~/Sync/ED/life_archive/:
| Directory | Contents |
|---|---|
| Evernote/ | Source ENEX exports |
| EmailAttachments/ | ~9,490 extracted email attachments |
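To compare the staged files against the counts given here (155 notebooks, ~9,490 attachments), the directories can be recounted. A small sketch, assuming one .enex file per exported notebook, which may not match how these exports were actually made; count_staged is a hypothetical name.

```shell
# Hypothetical helper: count ENEX files and email attachments under the
# staging root (default ~/Sync/ED/life_archive). find recurses into
# subfolders; tr strips the padding that macOS wc adds to its output.
#
# Usage: count_staged [ARCHIVE_DIR]
count_staged() {
  local root="${1:-$HOME/Sync/ED/life_archive}"
  local enex attachments
  enex=$(find "$root/Evernote" -type f -name '*.enex' 2>/dev/null | wc -l | tr -d ' ')
  attachments=$(find "$root/EmailAttachments" -type f 2>/dev/null | wc -l | tr -d ' ')
  echo "ENEX files: $enex"
  echo "Email attachments: $attachments"
}
```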
Yarle configs:
- ~/paperless-ngx/yarle_config_tana.json: Tana Internal Format output
- ~/paperless-ngx/yarle_config_paperless.json: HTML/MD for Paperless ingestion
Both conversion passes are complete: Evernote notes are fully indexed in both Paperless and the Life Archive.
Tana is the structured knowledge layer: it holds everything that needs relationships, fields, and queries rather than just search.
Two workspaces:
- Main / BeeDifferent: contacts (~394), general knowledge, tasks
- Brownsville: 93-acre property management (85+ plant species, 9 habitat zones, 10 custom supertags, ecological tracking)
MCP integration: the tana-local MCP server connects Claude directly to both workspaces (see MCP Servers).
Key supertags: Contact, Place, Thing, Event, Activity, Resource, Concept, Plant (55 fields), Habitat Zone, Observation.
Ingest a document into Paperless:
cp /path/to/document.pdf ~/paperless-ngx/consume/
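Paperless normally removes a file from the consume folder once it has consumed it, so the file disappearing is a reasonable, if indirect, signal that ingestion went through. A sketch of a wait built on that assumption; if the file is still there after the timeout, check the Paperless UI or logs. CONSUME_DIR (default ~/paperless-ngx/consume) and wait_until_consumed are hypothetical names.

```shell
# Hypothetical helper: wait until FILENAME disappears from the consume folder.
# Returns 1 if it is still present after TIMEOUT seconds (default 120).
#
# Usage: wait_until_consumed FILENAME [TIMEOUT_SECONDS]
wait_until_consumed() {
  local target="${CONSUME_DIR:-$HOME/paperless-ngx/consume}/$1"
  local timeout="${2:-120}" waited=0
  while [ -e "$target" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "still waiting after ${timeout}s: $target" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "consumed: $1"
}

# Example:
#   cp /path/to/document.pdf ~/paperless-ngx/consume/ && wait_until_consumed document.pdf
```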
Search the Life Archive:
Ask Claude directly, since the life_archive_search MCP tool is connected, or use the HTTP API; its interactive docs are at http://192.168.8.180:8900/docs.
Check Life Archive services:
launchctl list | grep beedifferent
curl http://localhost:8900/health
curl http://localhost:8901/mcp
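A plain GET to a Streamable HTTP MCP endpoint may legitimately be answered with a 4xx status, because the transport expects specific request methods and Accept headers, so a curl -f style check would report the MCP server as down when it is up. A sketch of a one-shot status check that treats any HTTP response as up and only "no response" (curl reports 000) as down; check_life_archive is a hypothetical name.

```shell
# Hypothetical status check for the Life Archive endpoints on HOST
# (default localhost). Any HTTP status counts as UP; returns 1 if any
# endpoint gave no response at all.
#
# Usage: check_life_archive [HOST]
check_life_archive() {
  local host="${1:-localhost}" down=0 url code
  for url in "http://$host:8900/health" "http://$host:8901/mcp"; do
    # -w prints the status code; curl prints 000 when it cannot connect
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$url")
    if [ -z "$code" ] || [ "$code" = "000" ]; then
      echo "DOWN  $url"
      down=1
    else
      echo "UP    $url (HTTP $code)"
    fi
  done
  return "$down"
}
```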
Restart embed server:
launchctl kickstart -k gui/$(id -u)/com.beedifferent.embed-server
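Loading a 7B embedding model can take a while, so the embed server may not answer immediately after the kickstart. A sketch that restarts it and then polls its port (1235, from the Life Archive table) until anything responds. The server's URL paths are not documented on this page, so any HTTP status, even 404, is treated as "listening"; restart_embed_server is a hypothetical name.

```shell
# Hypothetical helper: restart the embed-server launchd agent, then wait up
# to TIMEOUT seconds (default 180) for an HTTP response on port 1235.
#
# Usage: restart_embed_server [TIMEOUT_SECONDS]
restart_embed_server() {
  local timeout="${1:-180}" waited=0 code
  launchctl kickstart -k "gui/$(id -u)/com.beedifferent.embed-server" || return 1
  while :; do
    # 000 means no connection yet; any real status means the port is serving
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 http://localhost:1235/)
    if [ -n "$code" ] && [ "$code" != "000" ]; then
      echo "embed server answering (HTTP $code) after ${waited}s"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "embed server not answering after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```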
View logs:
tail -f ~/Sync/ED/life_archive/http_api.stdout.log
tail -f ~/Sync/ED/life_archive/http_api.stderr.log
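When following the stderr log is too noisy, the most recent error-looking lines can be pulled out directly. A sketch only: the log format is not documented here, and the patterns (ERROR, Traceback, Exception) are a guess at what a Python/FastAPI service typically writes, so adjust them to the real output. recent_errors is a hypothetical name; grep -n keeps file line numbers so matches can be located in the full log.

```shell
# Hypothetical helper: show the last COUNT (default 20) lines matching common
# Python error markers in LOGFILE (default: the HTTP API stderr log).
#
# Usage: recent_errors [LOGFILE] [COUNT]
recent_errors() {
  local log="${1:-$HOME/Sync/ED/life_archive/http_api.stderr.log}"
  local count="${2:-20}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  grep -nE 'ERROR|Traceback|Exception' "$log" | tail -n "$count"
}
```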