# Inferencer Cheatsheet

## Overview

Inferencer is a local AI model runner for macOS.

## Features

| Feature | Description |
|---|---|
| Local inference | Run models on-device |
| Model library | Browse and download models |
| Chat interface | Conversational UI |
| API server | Local API endpoint |
| GPU acceleration | Metal/MPS support |
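The local API server can be exercised from any HTTP client. The sketch below assumes an OpenAI-compatible chat endpoint; the host, port, and route are assumptions, so check the app's API server settings for the actual values.

```python
import json
import urllib.request

# Hypothetical local endpoint -- the port and route are assumptions,
# not documented Inferencer values.
API_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "local-model") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def send(payload: dict) -> dict:
    """POST the payload to the local server and decode the JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires the API server to be running):
# reply = send(build_chat_request("Hello"))
# print(reply["choices"][0]["message"]["content"])
```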

## Supported Formats

| Format | Description |
|---|---|
| GGUF | llama.cpp quantized models |
| MLX | Apple MLX framework |
| CoreML | Apple CoreML models |
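A quick way to tell these formats apart on disk: GGUF files start with the four-byte magic `GGUF` (per the GGUF specification), while CoreML models use characteristic extensions. The MLX mapping below is a simplification I'm assuming for illustration, since MLX weights commonly ship as safetensors files.

```python
from pathlib import Path

def detect_format(path: str) -> str:
    """Sniff a model file's format from its extension and, for GGUF,
    its magic bytes ("GGUF" as the first four bytes of the file)."""
    p = Path(path)
    if p.suffix == ".gguf":
        with open(p, "rb") as f:
            if f.read(4) == b"GGUF":
                return "GGUF"
        return "unknown"
    if p.suffix in {".mlmodel", ".mlpackage", ".mlmodelc"}:
        return "CoreML"
    if p.suffix == ".safetensors":  # assumption: MLX weights as safetensors
        return "MLX (safetensors weights)"
    return "unknown"
```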
## Performance Tips

| Tip | Description |
|---|---|
| Quantization | Use Q4_K_M for a good quality/size balance |
| GPU layers | Offload as many layers as possible to the GPU |
| Context size | Reduce for faster inference and lower memory use |
| Batch size | Increase for higher throughput |
| Metal | Ensure GPU acceleration is enabled |
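Context size affects memory as well as speed: the KV cache grows linearly with context length. A back-of-the-envelope estimator for a standard transformer follows; the parameter names are generic architecture terms, not Inferencer settings.

```python
def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size: keys plus values (factor of 2) for
    every layer, KV head, and context position, at fp16 (2 bytes)."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Example: an 8B-class model with grouped-query attention
# (32 layers, 8 KV heads, head dim 128) at a 4096-token context:
print(kv_cache_bytes(32, 4096, 8, 128) / 2**20)  # → 512.0 MiB
```

Halving the context to 2048 tokens halves this figure, which is why trimming context is an easy memory win on smaller Macs.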
## Shortcuts

### General

| Shortcut | Action |
|---|---|
| ⌘N | New session |
| ⌘K | Quick model switch |
| ⌘, | Settings |
| ⌘Enter | Send/Generate |
| ⌘L | Clear context |
| ⌘/ | Toggle sidebar |
### Model Management

| Shortcut | Action |
|---|---|
| ⌘⇧M | Model browser |
| ⌘⇧D | Download model |
| ⌘⇧I | Model info |
| ⌘R | Reload model |