Inferencer Cheatsheet
Overview

Inferencer is a local AI model runner for macOS.

Features

- Local inference: Run models on-device
- Model library: Browse and download models
- Chat interface: Conversational UI
- API server: Local API endpoint
- GPU acceleration: Metal/MPS support
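The local API server can be exercised with an ordinary HTTP client. As a sketch only, this assumes an OpenAI-compatible chat endpoint; the base URL, port, path, and model name below are placeholders, not confirmed by this cheatsheet — check Inferencer's own documentation for the actual values.

```python
import json

# Hypothetical endpoint; port and path are assumptions.
BASE_URL = "http://localhost:8080/v1/chat/completions"

# Build an OpenAI-style chat request body.
payload = {
    "model": "local-model",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Hello from Inferencer!"},
    ],
    "temperature": 0.7,
}

body = json.dumps(payload)
print(body)

# To actually send it (requires the server to be running):
# import urllib.request
# req = urllib.request.Request(
#     BASE_URL, data=body.encode(),
#     headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```

Keeping the request shape OpenAI-compatible means existing client libraries can usually be pointed at the local server by changing only the base URL.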

Supported Formats

- GGUF: llama.cpp quantized models
- MLX: Apple MLX framework
- CoreML: Apple Core ML models
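GGUF files are easy to recognize on disk: the format begins with the 4-byte magic `GGUF` followed by a little-endian version number. A minimal sketch of detecting one (the dummy file written here is just a header, not a loadable model):

```python
import struct

def is_gguf(path):
    # A GGUF file starts with the ASCII magic bytes "GGUF".
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Write a dummy header for demonstration: magic + uint32 version.
with open("dummy.gguf", "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 3))

print(is_gguf("dummy.gguf"))  # True
```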

Performance Tips

- Quantization: Use Q4_K_M for a good size/quality balance
- GPU layers: Offload as many layers to the GPU as memory allows
- Context size: Reduce for faster inference
- Batch size: Increase for higher throughput
- Metal: Ensure GPU acceleration is enabled
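The quantization tip comes down to arithmetic: weight memory is roughly parameter count times bits per weight divided by eight. The bits-per-weight figures below are approximate averages for llama.cpp quant types (actual file sizes vary per model), so treat this as an estimation sketch, not exact sizing.

```python
# Approximate average bits per weight for common quant types.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,  # approximate; varies by model architecture
    "Q2_K": 2.6,
}

def weight_gb(params, quant):
    # bytes = params * bits/8; convert to gigabytes (1e9 bytes).
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

# Estimate weight memory for a 7B-parameter model.
for quant in BITS_PER_WEIGHT:
    print(f"7B @ {quant}: {weight_gb(7e9, quant):.1f} GB")
```

For a 7B model this puts Q4_K_M near 4 GB versus 14 GB at F16, which is why Q4_K_M is the usual starting point on memory-constrained machines.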