Troubleshooting Guide
Solutions to common problems and issues with LocalAI
This guide helps you diagnose and fix common issues with LocalAI. If you can’t find a solution here, check the FAQ or ask for help on Discord.
Getting Help
Before asking for help, gather this information:
- LocalAI version: `local-ai --version` or check the container image tag
- System information: OS, CPU, RAM, GPU (if applicable)
- Error messages: Full error output with `DEBUG=true`
- Configuration: Relevant model configuration files
- Logs: Enable debug mode and capture logs
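One way to collect most of this in a single pass (assuming a native binary install; for Docker, run the equivalent commands against the container):

```bash
# Gather basic diagnostics before asking for help
local-ai --version                                  # LocalAI version
uname -a                                            # OS and kernel
nproc && free -h                                    # CPU cores and available RAM
nvidia-smi || true                                  # GPU details, if an NVIDIA GPU is present
DEBUG=true local-ai 2>&1 | tee localai-debug.log    # capture debug logs to a file
```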
Common Issues
Model Not Loading
Symptoms: Model appears in list but fails to load or respond
Solutions:
Check backend installation:
```bash
local-ai backends list
local-ai backends install <backend-name>  # if missing
```
Verify model file:
- Check file exists and is not corrupted
- Verify file format (GGUF recommended)
- Re-download if corrupted
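For example, a quick integrity check (the filename is a placeholder; compare the checksum against the one published by the model provider):

```bash
# Confirm the file exists and has a plausible size
ls -lh models/your-model.gguf
# GGUF files start with the ASCII magic "GGUF"
head -c 4 models/your-model.gguf; echo
# Compare with the provider's published checksum
sha256sum models/your-model.gguf
```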
Check memory:
- Ensure sufficient RAM available
- Try smaller quantization (Q4_K_S instead of Q8_0)
- Reduce `context_size` in configuration
Check logs:
```bash
DEBUG=true local-ai
```
Look for specific error messages.
Verify backend compatibility:
- Check Compatibility Table
- Ensure the correct backend is specified in the model config
Out of Memory Errors
Symptoms: Errors about memory, crashes, or very slow performance
Solutions:
Reduce model size:
- Use smaller quantization (Q2_K, Q4_K_S)
- Use smaller models (1-3B instead of 7B+)
Adjust configuration:
context_size: 1024 # Reduce from default gpu_layers: 20 # Reduce GPU layers if using GPUFree system memory:
- Close other applications
- Reduce number of loaded models
- Use the `--single-active-backend` flag
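For example:

```bash
# Keep only one backend loaded at a time to reduce memory pressure
local-ai --single-active-backend
```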
Check system limits:
```bash
# Linux
free -h
ulimit -a
```
Slow Performance
Symptoms: Very slow responses, low tokens/second
Solutions:
Check hardware:
- Use SSD instead of HDD for model storage
- Ensure adequate CPU cores
- Enable GPU acceleration if available
Optimize configuration:
```yaml
threads: 4       # Match CPU cores
gpu_layers: 35   # Offload to GPU if available
mmap: true       # Enable memory mapping
```
Check for bottlenecks:
```bash
# Monitor CPU
top
# Monitor GPU (NVIDIA)
nvidia-smi
# Monitor disk I/O
iostat
```
Disable unnecessary features:
- Set `mirostat: 0` if not needed
- Reduce context size
- Use smaller models
Check network: If using remote models, check network latency
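To get a rough throughput number, time a small request and check the tokens/second reported in the debug logs (model name and prompt are placeholders):

```bash
time curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}' \
  > /dev/null
```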
GPU Not Working
Symptoms: GPU not detected, no GPU usage, or CUDA errors
Solutions:
Verify GPU drivers:
```bash
# NVIDIA
nvidia-smi
# AMD
rocm-smi
```
Check Docker GPU access:
```bash
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
```
Use correct image:
- NVIDIA: `localai/localai:latest-gpu-nvidia-cuda-12`
- AMD: `localai/localai:latest-gpu-hipblas`
- Intel: `localai/localai:latest-gpu-intel`
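For example, an illustrative run of the NVIDIA image with GPU access (add your usual volume mounts and options):

```bash
# Expose the GPU to the container and map the API port
docker run -d --name local-ai --gpus all -p 8080:8080 \
  localai/localai:latest-gpu-nvidia-cuda-12
```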
Configure GPU layers:
```yaml
gpu_layers: 35   # Adjust based on GPU memory
f16: true
```
Check CUDA version: Ensure the host CUDA version matches the image you are using (11.7 vs 12.0)
Check logs: Enable debug mode to see GPU initialization messages
API Errors
Symptoms: 400, 404, 500, or 503 errors from API
Solutions:
404 - Model Not Found:
- Verify model name is correct
- Check model is installed: `curl http://localhost:8080/v1/models`
- Ensure model file exists in models directory
503 - Service Unavailable:
- Model may not be loaded yet (wait a moment)
- Check if model failed to load (check logs)
- Verify backend is installed
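To check whether the server is ready before retrying, you can poll the readiness endpoint (host and port as configured):

```bash
# Poll until the readiness check succeeds, then retry the original request
until curl -sf http://localhost:8080/readyz; do sleep 2; done
```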
400 - Bad Request:
- Check request format matches API specification
- Verify all required parameters are present
- Check parameter types and values
500 - Internal Server Error:
- Enable debug mode: `DEBUG=true`
- Check logs for specific error
- Verify model configuration is valid
401 - Unauthorized:
- Check if API key is required
- Verify API key is correct
- Include Authorization header if needed
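For example, if you started LocalAI with API keys enabled, pass the key in the Authorization header (the environment variable name is a placeholder for whatever key you configured):

```bash
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer $LOCALAI_API_KEY"
```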
Installation Issues
Symptoms: Installation fails or LocalAI won’t start
Solutions:
Docker issues:
```bash
# Check Docker is running
docker ps
# Check image exists
docker images | grep localai
# Pull latest image
docker pull localai/localai:latest
```
Permission issues:
```bash
# Check file permissions
ls -la models/
# Fix permissions if needed
chmod -R 755 models/
```
Port already in use:
```bash
# Find process using port
lsof -i :8080
# Use different port
docker run -p 8081:8080 ...
```
Binary not found:
- Verify binary is in PATH
- Check binary has execute permissions
- Reinstall if needed
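A quick way to check the binary, for example (the install path is illustrative):

```bash
# Confirm the binary is on PATH and executable
which local-ai
ls -l "$(which local-ai)"
# If it is not executable, fix the permission (path is an example)
chmod +x /usr/local/bin/local-ai
```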
Backend Issues
Symptoms: Backend fails to install or load
Solutions:
Check backend availability:
```bash
local-ai backends list
```
Manual installation:
```bash
local-ai backends install <backend-name>
```
Check network: Backend download requires internet connection
Check disk space: Ensure sufficient space for backend files
Rebuild if needed:
```bash
REBUILD=true local-ai
```
Configuration Issues
Symptoms: Models not working as expected, wrong behavior
Solutions:
Validate YAML syntax:
```bash
# Check YAML is valid
yamllint model.yaml
```
Check configuration reference:
- See Model Configuration
- Verify all parameters are correct
Test with minimal config:
- Start with basic configuration
- Add parameters one at a time
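A minimal configuration to start from might look like this (names, backend, and values are illustrative; see the Model Configuration reference for the full option list):

```yaml
# models/your-model.yaml (illustrative)
name: your-model
backend: llama-cpp        # example; must match an installed backend
parameters:
  model: your-model.gguf  # model file in the models directory
context_size: 2048
threads: 4
```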
Check template files:
- Verify template syntax
- Check template matches model type
Debugging Tips
Enable Debug Mode
```bash
# Environment variable
DEBUG=true local-ai

# Command line flag
local-ai --debug

# Docker
docker run -e DEBUG=true ...
```
Check Logs
```bash
# Docker logs
docker logs local-ai

# Systemd logs
journalctl -u localai -f

# Direct output
local-ai 2>&1 | tee localai.log
```
Test API Endpoints
```bash
# Health check
curl http://localhost:8080/healthz

# Readiness check
curl http://localhost:8080/readyz

# List models
curl http://localhost:8080/v1/models

# Test chat
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "test"}]}'
```
Monitor Resources
```bash
# CPU and memory
htop

# GPU (NVIDIA)
watch -n 1 nvidia-smi

# Disk usage
df -h
du -sh models/

# Network
iftop
```
Performance Issues
Slow Inference
- Check token speed: Look for tokens/second in debug logs
- Optimize threads: Match CPU cores
- Enable GPU: Use GPU acceleration
- Reduce context: Smaller context = faster inference
- Use quantization: Q4_K_M is a good balance
High Memory Usage
- Use smaller models: 1-3B instead of 7B+
- Lower quantization: Q2_K uses less memory
- Reduce context size: Smaller context = less memory
- Disable mmap: Set `mmap: false` (slower but uses less memory)
- Unload unused models: Only load models you’re using
Platform-Specific Issues
macOS
- Quarantine warnings: See FAQ
- Metal not working: Ensure Xcode is installed
- Docker performance: Consider building from source for better performance
Linux
- Permission denied: Check file permissions and SELinux
- Missing libraries: Install required system libraries
- Systemd issues: Check service status and logs
Windows/WSL
- Slow model loading: Ensure model files are stored on the Linux filesystem (for example under your WSL home directory, not on a mounted Windows drive such as /mnt/c)
- GPU access: May require WSL2 with GPU support
- Path issues: Use forward slashes in paths
Getting More Help
If you’ve tried the solutions above and still have issues:
- Check GitHub Issues: Search GitHub Issues
- Ask on Discord: Join Discord
- Create an Issue: Provide all debugging information
- Check Documentation: Review relevant documentation sections
See Also
- FAQ - Common questions
- Performance Tuning - Optimize performance
- VRAM Management - GPU memory management
- Model Configuration - Configuration reference