Inferencer
Local model inference runner
Inferencer Cheatsheet
Overview
Inferencer is a local AI model runner for macOS.
Features
| Feature | Description |
|---|---|
| Local inference | Run models on-device |
| Model library | Browse and download models |
| Chat interface | Conversational UI |
| API server | Local API endpoint |
| GPU acceleration | Metal/MPS support |
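Many local runners expose their API server as an OpenAI-compatible HTTP endpoint; as a hedged sketch, here is what building a chat request for such an endpoint looks like. The URL, port, path, and model name below are assumptions for illustration, not confirmed Inferencer defaults — check the app's API server settings for the actual values.

```python
import json

# Hypothetical endpoint: host, port, and path are assumptions,
# not documented Inferencer defaults.
API_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> bytes:
    """Build the JSON body for an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request("llama-3-8b-q4_k_m", "Hello!")  # model name is illustrative
# Sending it (requires the API server to be running):
#   import urllib.request
#   req = urllib.request.Request(API_URL, data=body,
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```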
Supported Formats
| Format | Description |
|---|---|
| GGUF | llama.cpp quantized models |
| MLX | Apple MLX framework |
| CoreML | Apple CoreML models |
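GGUF files are easy to identify programmatically: every GGUF file begins with the four ASCII bytes `GGUF` followed by a little-endian uint32 version. A minimal sketch of a header check (the demo writes a synthetic header; a real downloaded model works the same way):

```python
import os
import struct
import tempfile

GGUF_MAGIC = b"GGUF"  # every GGUF file starts with these four bytes

def read_gguf_version(path: str) -> int:
    """Return the GGUF version, or raise if the file is not GGUF."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))  # little-endian uint32
        return version

# Demo with a synthetic 8-byte header:
path = os.path.join(tempfile.gettempdir(), "fake.gguf")
with open(path, "wb") as f:
    f.write(GGUF_MAGIC + struct.pack("<I", 3))
print(read_gguf_version(path))  # → 3
```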
Performance Tips
| Tip | Description |
|---|---|
| Quantization | Use Q4_K_M for a good size/quality balance |

| GPU layers | Maximize GPU offload |
| Context size | Reduce for faster inference |
| Batch size | Increase for throughput |
| Metal | Ensure GPU acceleration is enabled |
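The quantization tip comes down to simple arithmetic: weight-file size is roughly parameters × bits-per-weight. The bits-per-weight figures below are ballpark approximations (block overhead included), not exact format specs, so treat the estimates as rough guides when sizing a model against available memory.

```python
# Approximate bits per weight for common precisions/quantizations.
# These are ballpark figures, not exact specifications.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}

def weights_gib(n_params: float, quant: str) -> float:
    """Estimated weight-file size in GiB for a model with n_params parameters."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1024**3

for q in ("F16", "Q8_0", "Q4_K_M"):
    print(f"7B @ {q}: {weights_gib(7e9, q):.1f} GiB")
```

For a 7B model this works out to roughly 13 GiB at F16 versus about 4 GiB at Q4_K_M, which is why Q4_K_M is the usual starting point on memory-constrained machines.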
Inferencer Shortcuts
General
| Shortcut | Action |
|---|---|
| ⌘N | New session |
| ⌘K | Quick model switch |
| ⌘, | Settings |
| ⌘Enter | Send/Generate |
| ⌘L | Clear context |
| ⌘/ | Toggle sidebar |
Model Management
| Shortcut | Action |
|---|---|
| ⌘⇧M | Model browser |
| ⌘⇧D | Download model |
| ⌘⇧I | Model info |
| ⌘R | Reload model |