Second Brain Workflow

The personal knowledge pipeline has three stages: document ingestion via Paperless-NGX, semantic search and a knowledge graph via the Life Archive RAG system, and structured knowledge management in Tana.

Last updated: March 2026. The previous version referenced infrastructure that has since been decommissioned (TrueNAS, Proxmox at .230, paperless-ai).

Architecture Overview

Source documents (PDFs, email attachments, Evernote exports, magazines)
    ↓
Paperless-NGX (Mac Studio :8100)
    OCR, full-text index, tagging, archival
    ↓
Life Archive RAG Pipeline (Mac Studio :8900)
    gte-Qwen2-7B embeddings → LanceDB vectors → knowledge graph
    Multi-strategy retrieval: dense + SPLADE + QA pairs + KG + HyDE
    ↓
Query interfaces
    Claude MCP tools (life_archive_search, entity_lookup, etc.)
    HTTP API (:8900)  ·  KG web explorer (:1313/kg/)
    ↓
Tana — structured knowledge workspace
    394+ contacts, 85+ plant species, homelab docs, tasks

Paperless-NGX

Paperless-NGX runs in Docker on the Mac Studio at http://192.168.8.180:8100.

Start/stop:

cd ~/paperless-ngx/docker
docker compose up -d    # start in the background
docker compose down     # stop and remove the containers

Key volumes:

Host Path	Purpose
`~/paperless-ngx/data/`	SQLite DB and search index
`~/paperless-ngx/media/`	Stored documents
`~/paperless-ngx/consume/`	Drop files here to ingest (polls every 10s)
`~/paperless-ngx/export/`	Bulk export output

Ingest a file: Drop it in ~/paperless-ngx/consume/. Paperless picks it up on the next poll (within about 10 seconds), then OCRs, tags, and indexes it automatically.
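The drop-in step can be wrapped in a small shell function that checks the consume folder exists and skips anything that is not a regular file. This is a minimal sketch, not part of the existing setup: the function name ingest_doc and the CONSUME_DIR override are hypothetical, and the default path is the consume folder listed above.

```shell
# Hypothetical helper: copy one or more files into the Paperless consume folder.
# CONSUME_DIR is an assumed override; it defaults to the documented host path.
ingest_doc() {
  local consume_dir="${CONSUME_DIR:-$HOME/paperless-ngx/consume}"
  local src rc=0
  if [ ! -d "$consume_dir" ]; then
    echo "consume folder not found: $consume_dir" >&2
    return 1
  fi
  for src in "$@"; do
    if [ ! -f "$src" ]; then
      echo "skipping (not a regular file): $src" >&2
      rc=1
      continue
    fi
    # Report the file name only after the copy succeeded.
    if cp "$src" "$consume_dir/"; then
      echo "queued: ${src##*/}"
    else
      rc=1
    fi
  done
  return "$rc"
}
```

Calling ingest_doc ~/Downloads/invoice.pdf (an example path) queues the file for the next poll. A non-zero return status means at least one argument was skipped or failed to copy.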

Paperless API:

# Assumes PAPERLESS_TOKEN holds your API token; keep the raw token out of docs and shell history.
curl -H "Authorization: Token $PAPERLESS_TOKEN" \
  http://localhost:8100/api/documents/

Note: the paperless-ai container (AI auto-tagger) is stopped because it was causing memory pressure. Manual tagging and tag rules handle classification instead.

Life Archive

See the full Life Archive page for complete documentation. Summary:

Metric	Value
Total documents	~74K in LanceDB
Paragraphs indexed	~2.69M
Knowledge graph entities	~276K
Sources	Evernote, emails, magazines, Tana nodes, Paperless docs
Embedding model	gte-Qwen2-7B on Apple MPS (port 1235)
Query API	FastAPI at port 8900
MCP server	Streamable HTTP at port 8901

Check service status:

launchctl list | grep beedifferent

Quick search:

curl -X POST http://localhost:8900/search \
  -H "Content-Type: application/json" \
  -d '{"query": "your question here"}'
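Typing the JSON body by hand breaks as soon as the question contains a double quote or a backslash. The sketch below builds the body with those two characters escaped and then posts it to the same /search endpoint. The function names and the LIFE_ARCHIVE_URL override are hypothetical. The response format is not covered here, so the output is printed as returned.

```shell
# Hypothetical helper: build the {"query": ...} body for the /search endpoint.
# Escapes backslashes and double quotes only; newlines and other control
# characters in the query are NOT handled by this sketch.
search_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"query": "%s"}\n' "$escaped"
}

# Hypothetical wrapper around the documented curl call.
# LIFE_ARCHIVE_URL is an assumed override; the default matches the port above.
life_archive_search() {
  curl -sS -X POST "${LIFE_ARCHIVE_URL:-http://localhost:8900}/search" \
    -H "Content-Type: application/json" \
    -d "$(search_payload "$1")"
}
```

For example, life_archive_search 'notes about the "north field" fence' sends a valid body even though the query contains quotes.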

Evernote Source Material

Evernote exports (155 notebooks) were converted using Yarle and are staged at ~/Sync/ED/life_archive/:

Directory	Contents
`Evernote/`	Source ENEX exports
`EmailAttachments/`	~9,490 extracted email attachments

Yarle configs:

~/paperless-ngx/yarle_config_tana.json — Tana Internal Format output
~/paperless-ngx/yarle_config_paperless.json — HTML/MD for Paperless ingestion

Both conversion passes are complete, so the Evernote notes are fully indexed in both Paperless and the Life Archive.

Tana

Tana is the structured knowledge layer — everything that needs relationships, fields, and queries rather than just search.

Two workspaces:

Main / BeeDifferent — contacts (~394), general knowledge, tasks
Brownsville — 93-acre property management: 85+ plant species, 9 habitat zones, 10 custom supertags, ecological tracking

MCP integration: The tana-local MCP server connects Claude directly to both workspaces. See MCP Servers.

Key supertags: Contact, Place, Thing, Event, Activity, Resource, Concept, Plant (55 fields), Habitat Zone, Observation.

Common Tasks

Ingest a document into Paperless:

cp /path/to/document.pdf ~/paperless-ngx/consume/

Search the Life Archive: Ask Claude directly (the life_archive_search MCP tool is connected), or call the HTTP API. Its interactive docs are at http://192.168.8.180:8900/docs.

Check Life Archive services:

launchctl list | grep beedifferent
curl http://localhost:8900/health
curl http://localhost:8901/mcp
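The checks above can be condensed into a one-line status per endpoint. This is a sketch under some assumptions: the function names are hypothetical, and any HTTP response at all is treated as "up". That matters for the MCP endpoint, which may answer a plain GET with a non-2xx status even when the server is running.

```shell
# Hypothetical helper: turn a curl HTTP status code into a short label.
# curl's %{http_code} reports 000 when no HTTP response was received.
classify_status() {
  case "$1" in
    000|"") echo "down" ;;
    2??)    echo "ok" ;;
    *)      echo "up (HTTP $1)" ;;
  esac
}

# Hypothetical helper: probe one URL and print "<name> <label>".
# --max-time keeps a hung or streaming endpoint from blocking the check.
check_endpoint() {
  local code
  code=$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "$2")
  printf '%-12s %s\n' "$1" "$(classify_status "$code")"
}

# Example calls (left commented out so that sourcing this file probes nothing):
# check_endpoint query-api http://localhost:8900/health
# check_endpoint mcp       http://localhost:8901/mcp
```

A "down" label means curl got no HTTP response within three seconds, which usually points at a stopped service. In that case, check the launchctl listing first.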

Restart embed server:

launchctl kickstart -k gui/$(id -u)/com.beedifferent.embed-server
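After a restart the embed server may need some time before it accepts connections again. A small retry loop can wait for that. The sketch below is a generic helper, not part of the existing setup: wait_for and WAIT_INTERVAL are hypothetical names, and the commented example only checks that port 1235 is open, which does not prove the model has finished loading.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <max_attempts> <command> [args...]
# WAIT_INTERVAL (seconds between attempts) is an assumed knob, default 2.
wait_for() {
  local attempts="$1" i=0
  shift
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "${WAIT_INTERVAL:-2}"
  done
  return 1
}

# Example (left commented out so that sourcing this file restarts nothing):
# launchctl kickstart -k gui/$(id -u)/com.beedifferent.embed-server
# wait_for 30 nc -z localhost 1235 && echo "embed server is accepting connections"
```

With the defaults, 30 attempts give the server about a minute before the helper gives up and returns a non-zero status.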

View logs:

# Follow both logs at once; a second tail -f on its own line would never run.
tail -f ~/Sync/ED/life_archive/http_api.stdout.log \
        ~/Sync/ED/life_archive/http_api.stderr.log