Troubleshooting Guide
Solutions to common problems and issues with LocalAI
This guide helps you diagnose and fix common issues with LocalAI. If you can’t find a solution here, check the FAQ or ask for help on Discord.
Getting Help
Before asking for help, gather this information:
- LocalAI version: `local-ai --version` or check the container image tag
- System information: OS, CPU, RAM, GPU (if applicable)
- Error messages: Full error output with `DEBUG=true`
- Configuration: Relevant model configuration files
- Logs: Enable debug mode and capture logs
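One way to collect most of this in a single pass (assuming a native binary install; for Docker, run the equivalent commands against the container):

```bash
# Gather basic diagnostics before asking for help
local-ai --version                                  # LocalAI version
uname -a                                            # OS and kernel
nproc && free -h                                    # CPU cores and available RAM
nvidia-smi || true                                  # GPU details, if an NVIDIA GPU is present
DEBUG=true local-ai 2>&1 | tee localai-debug.log    # capture debug logs to a file
```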
Common Issues
Model Not Loading
Symptoms: Model appears in list but fails to load or respond
Solutions:
Check backend installation:
```bash
local-ai backends list
local-ai backends install <backend-name>  # if missing
```
Verify model file:
- Check file exists and is not corrupted
- Verify file format (GGUF recommended)
- Re-download if corrupted
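For example, a quick integrity check (the filename is a placeholder; compare the checksum against the one published by the model provider):

```bash
# Confirm the file exists and has a plausible size
ls -lh models/your-model.gguf
# GGUF files start with the ASCII magic "GGUF"
head -c 4 models/your-model.gguf; echo
# Compare with the provider's published checksum
sha256sum models/your-model.gguf
```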
Check memory:
- Ensure sufficient RAM available
- Try smaller quantization (Q4_K_S instead of Q8_0)
- Reduce `context_size` in configuration
Check logs:
```bash
DEBUG=true local-ai
```
Look for specific error messages.
Verify backend compatibility:
- Check Compatibility Table
- Ensure the correct backend is specified in the model config
Out of Memory Errors
Symptoms: Errors about memory, crashes, or very slow performance
Solutions:
Reduce model size:
- Use smaller quantization (Q2_K, Q4_K_S)
- Use smaller models (1-3B instead of 7B+)
Adjust configuration:
context_size: 1024 # Reduce from default gpu_layers: 20 # Reduce GPU layers if using GPUFree system memory:
- Close other applications
- Reduce number of loaded models
- Use the `--single-active-backend` flag
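For example:

```bash
# Keep only one backend loaded at a time to reduce memory pressure
local-ai --single-active-backend
```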
Check system limits:
```bash
# Linux
free -h
ulimit -a
```
Slow Performance
Symptoms: Very slow responses, low tokens/second
Solutions:
Check hardware:
- Use SSD instead of HDD for model storage
- Ensure adequate CPU cores
- Enable GPU acceleration if available
Optimize configuration:
```yaml
threads: 4       # Match CPU cores
gpu_layers: 35   # Offload to GPU if available
mmap: true       # Enable memory mapping
```
Check for bottlenecks:
```bash
# Monitor CPU
top
# Monitor GPU (NVIDIA)
nvidia-smi
# Monitor disk I/O
iostat
```
Disable unnecessary features:
- Set `mirostat: 0` if not needed
- Reduce context size
- Use smaller models
Check network: If using remote models, check network latency
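To get a rough throughput number, time a small request and check the tokens/second reported in the debug logs (model name and prompt are placeholders):

```bash
time curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}' \
  > /dev/null
```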
GPU Not Working
Symptoms: GPU not detected, no GPU usage, or CUDA errors
Solutions:
Verify GPU drivers:
```bash
# NVIDIA
nvidia-smi
# AMD
rocm-smi
```
Check Docker GPU access:
```bash
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
```
Use correct image:
- NVIDIA: `localai/localai:latest-gpu-nvidia-cuda-12`
- AMD: `localai/localai:latest-gpu-hipblas`
- Intel: `localai/localai:latest-gpu-intel`
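For example, an illustrative run of the NVIDIA image with GPU access (add your usual volume mounts and options):

```bash
# Expose the GPU to the container and map the API port
docker run -d --name local-ai --gpus all -p 8080:8080 \
  localai/localai:latest-gpu-nvidia-cuda-12
```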
Configure GPU layers:
```yaml
gpu_layers: 35   # Adjust based on GPU memory
f16: true
```
Check CUDA version: Ensure the host CUDA version matches the image you are using (11.7 vs 12.0)
Check logs: Enable debug mode to see GPU initialization messages
API Errors
Symptoms: 400, 404, 500, or 503 errors from API
Solutions:
404 - Model Not Found:
- Verify model name is correct
- Check model is installed: `curl http://localhost:8080/v1/models`
- Ensure model file exists in models directory
503 - Service Unavailable:
- Model may not be loaded yet (wait a moment)
- Check if model failed to load (check logs)
- Verify backend is installed
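To check whether the server is ready before retrying, you can poll the readiness endpoint (host and port as configured):

```bash
# Poll until the readiness check succeeds, then retry the original request
until curl -sf http://localhost:8080/readyz; do sleep 2; done
```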
400 - Bad Request:
- Check request format matches API specification
- Verify all required parameters are present
- Check parameter types and values
500 - Internal Server Error:
- Enable debug mode: `DEBUG=true`
- Check logs for specific error
- Verify model configuration is valid
401 - Unauthorized:
- Check if API key is required
- Verify API key is correct
- Include Authorization header if needed
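For example, if you started LocalAI with API keys enabled, pass the key in the Authorization header (the environment variable name is a placeholder for whatever key you configured):

```bash
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer $LOCALAI_API_KEY"
```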
Installation Issues
Symptoms: Installation fails or LocalAI won’t start
Solutions:
Docker issues:
```bash
# Check Docker is running
docker ps
# Check image exists
docker images | grep localai
# Pull latest image
docker pull localai/localai:latest
```
Permission issues:
```bash
# Check file permissions
ls -la models/
# Fix permissions if needed
chmod -R 755 models/
```
Port already in use:
```bash
# Find process using port
lsof -i :8080
# Use different port
docker run -p 8081:8080 ...
```
Binary not found:
- Verify binary is in PATH
- Check binary has execute permissions
- Reinstall if needed
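A quick way to check the binary, for example (the install path is illustrative):

```bash
# Confirm the binary is on PATH and executable
which local-ai
ls -l "$(which local-ai)"
# If it is not executable, fix the permission (path is an example)
chmod +x /usr/local/bin/local-ai
```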
Backend Issues
Symptoms: Backend fails to install or load
Solutions:
Check backend availability:
```bash
local-ai backends list
```
Manual installation:
```bash
local-ai backends install <backend-name>
```
Check network: Backend download requires internet connection
Check disk space: Ensure sufficient space for backend files
Rebuild if needed:
```bash
REBUILD=true local-ai
```
Configuration Issues
Symptoms: Models not working as expected, wrong behavior
Solutions:
Validate YAML syntax:
```bash
# Check YAML is valid
yamllint model.yaml
```
Check configuration reference:
- See Model Configuration
- Verify all parameters are correct
Test with minimal config:
- Start with basic configuration
- Add parameters one at a time
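A minimal configuration to start from might look like this (names, backend, and values are illustrative; see the Model Configuration reference for the full option list):

```yaml
# models/your-model.yaml (illustrative)
name: your-model
backend: llama-cpp        # example; must match an installed backend
parameters:
  model: your-model.gguf  # model file in the models directory
context_size: 2048
threads: 4
```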
Check template files:
- Verify template syntax
- Check template matches model type
Debugging Tips
Enable Debug Mode
```bash
# Environment variable
DEBUG=true local-ai

# Command line flag
local-ai --debug

# Docker
docker run -e DEBUG=true ...
```
Check Logs
```bash
# Docker logs
docker logs local-ai

# Systemd logs
journalctl -u localai -f

# Direct output
local-ai 2>&1 | tee localai.log
```
Test API Endpoints
```bash
# Health check
curl http://localhost:8080/healthz

# Readiness check
curl http://localhost:8080/readyz

# List models
curl http://localhost:8080/v1/models

# Test chat
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "test"}]}'
```
Monitor Resources
```bash
# CPU and memory
htop

# GPU (NVIDIA)
watch -n 1 nvidia-smi

# Disk usage
df -h
du -sh models/

# Network
iftop
```
Performance Issues
Slow Inference
- Check token speed: Look for tokens/second in debug logs
- Optimize threads: Match CPU cores
- Enable GPU: Use GPU acceleration
- Reduce context: Smaller context = faster inference
- Use quantization: Q4_K_M is a good balance
High Memory Usage
- Use smaller models: 1-3B instead of 7B+
- Lower quantization: Q2_K uses less memory
- Reduce context size: Smaller context = less memory
- Disable mmap: Set `mmap: false` (slower but uses less memory)
- Unload unused models: Only load models you’re using
Platform-Specific Issues
macOS
- Quarantine warnings: See FAQ
- Metal not working: Ensure Xcode is installed
- Docker performance: Consider building from source for better performance
Linux
- Permission denied: Check file permissions and SELinux
- Missing libraries: Install required system libraries
- Systemd issues: Check service status and logs
Windows/WSL
- Slow model loading: Ensure model files are stored on the Linux filesystem (for example under your WSL home directory, not on a mounted Windows drive such as /mnt/c)
- GPU access: May require WSL2 with GPU support
- Path issues: Use forward slashes in paths
Getting More Help
If you’ve tried the solutions above and still have issues:
- Check GitHub Issues: Search GitHub Issues
- Ask on Discord: Join Discord
- Create an Issue: Provide all debugging information
- Check Documentation: Review relevant documentation sections
See Also
- FAQ - Common questions
- Performance Tuning - Optimize performance
- VRAM Management - GPU memory management
- Model Configuration - Configuration reference