Injest

Custom ingestion pipeline for Paperless-ngx. Scans source directories, deduplicates files by content hash, and feeds unique PDFs/images to Paperless for OCR and archival. Also available as Injest.app in /Applications.
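The deduplication step described above can be illustrated with a minimal sketch. It is not Injest's actual implementation: SHA-256 as the hash, the `SUPPORTED_SUFFIXES` set, and the function names are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumption: file types forwarded to Paperless; the real list may differ.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large scans are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(source_dirs):
    """Map each content hash to the files that share it, in scan order."""
    groups = defaultdict(list)
    for source in source_dirs:
        for path in sorted(Path(source).rglob("*")):
            if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
                groups[file_sha256(path)].append(path)
    return dict(groups)


def unique_files(groups):
    """Keep the first file of every hash group; the rest are duplicates."""
    return [paths[0] for paths in groups.values()]
```

Groups with more than one path are roughly what a duplicate report would list, and `unique_files` yields the set that would be handed on for OCR.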

Injest Cheatsheet
Command                         Description
injest                          Run full ingestion pipeline
injest --status                 Show processing dashboard (files done, pending, errors)
injest --dry-run                Preview what would be processed without copying
injest --duplicates             Show duplicate file report with hash groups
injest --add-source ~/path      Register a new source folder for scanning
injest --remove-source ~/path   Remove a source folder
injest --max-queue 10           Run with a larger queue size (default: 5; larger values process faster)
injest --reset                  Clear all processing history (requires confirmation)
injest --help                   Show full help with examples and config paths

Config files:

File                                        Purpose
~/paperless-ngx/docker/docker-compose.yml   Docker Compose stack definition
~/paperless-ngx/docker/.env                 Docker env (API token)
~/paperless-ngx/data/ingest_state.json      Ingestion state (resumable progress)
~/paperless-ngx/data/logs/ingest.log        Ingestion log
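As an illustration of how a state file can make a run resumable, here is a minimal sketch. The real schema of ingest_state.json is not documented here, so the `{"processed": {hash: path}}` layout and all function names below are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(state_path: Path) -> dict:
    """Return the saved state, or an empty one if no run has happened yet."""
    if state_path.exists():
        return json.loads(state_path.read_text(encoding="utf-8"))
    return {"processed": {}}  # hypothetical schema: content hash -> source path


def save_state(state_path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    state_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=state_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, state_path)


def pending(candidates: dict, state: dict) -> dict:
    """Filter out hashes that an earlier run already recorded."""
    return {h: p for h, p in candidates.items() if h not in state["processed"]}
```

Saving after each processed file means an interrupted run can pick up where it stopped, since only hashes missing from the state are processed again.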

Docker services:

Command                                                               Description
docker compose -f ~/paperless-ngx/docker/docker-compose.yml up -d     Start all services
docker compose -f ~/paperless-ngx/docker/docker-compose.yml down      Stop all services
docker compose -f ~/paperless-ngx/docker/docker-compose.yml logs -f   Tail all logs
docker logs paperless --tail 50                                       Check Paperless logs
open http://192.168.8.180:8100                                        Paperless web UI

Stack: Redis 7 + Postgres + Paperless-ngx (port 8100)
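To confirm that the stack is up and the API token works, a small check against the Paperless-ngx REST API can help. This is a sketch using only the standard library. It assumes the web UI address listed in this cheatsheet and takes the token as an argument, because the variable name used in .env is not stated here. The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://192.168.8.180:8100"  # web UI address from this cheatsheet


def build_documents_request(token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an authenticated GET for the Paperless-ngx document list."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/documents/?page_size=1",
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


def document_count(token: str, base_url: str = BASE_URL, timeout: float = 5.0) -> int:
    """Return the total number of documents Paperless reports."""
    request = build_documents_request(token, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["count"]
```

Calling `document_count(token)` before and after an ingestion run gives a rough check that new documents were consumed. An HTTP 401 error usually means the token is wrong or missing.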

Note: paperless-ai (port 3000) is decommissioned. AI document search is now handled by the Life Archive RAG pipeline.