This tutorial shows you how to integrate LocalAI with popular AI frameworks and tools. Because LocalAI exposes an OpenAI-compatible API, it can be used as a drop-in replacement for the OpenAI endpoints in most of them.

Prerequisites

  • LocalAI running and accessible
  • Basic knowledge of the framework you want to integrate
  • Python, Node.js, or other runtime as needed

Python Integrations

LangChain

LangChain works with LocalAI through its OpenAI-compatible chat and LLM classes; just point the API base at your LocalAI instance:

  from langchain.chat_models import ChatOpenAI

# For chat models
llm = ChatOpenAI(
    openai_api_key="not-needed",                 # LocalAI does not require an API key by default
    openai_api_base="http://localhost:8080/v1",  # your LocalAI endpoint
    model_name="gpt-4"                           # name of a model configured in LocalAI
)

response = llm.predict("Hello, how are you?")
print(response)
  
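Completion-style (non-chat) models can be pointed at LocalAI in the same way through LangChain’s OpenAI LLM wrapper. A minimal sketch, assuming the model name matches one configured in your LocalAI instance:

  from langchain.llms import OpenAI

# For completion-style models
llm = OpenAI(
    openai_api_key="not-needed",
    openai_api_base="http://localhost:8080/v1",
    model_name="gpt-4"  # assumed name of a model configured in LocalAI
)

print(llm("Write a one-line greeting."))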

OpenAI Python SDK

The official OpenAI Python SDK works directly with LocalAI:

  # Requires the openai package v1.0 or later
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # your LocalAI endpoint
    api_key="not-needed"                  # LocalAI does not require an API key by default
)

response = client.chat.completions.create(
    model="gpt-4",  # name of a model configured in LocalAI
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
  

LangChain with LocalAI Functions

  from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI

llm = OpenAI(
    openai_api_key="not-needed",
    openai_api_base="http://localhost:8080/v1",
    model_name="gpt-4"  # name of a model configured in LocalAI
)

tools = [
    Tool(
        name="Calculator",
        # eval() keeps the example short; do not eval untrusted model output in production
        func=lambda x: eval(x),
        description="Useful for mathematical calculations"
    )
]

agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
result = agent.run("What is 25 * 4?")
print(result)
  

JavaScript/TypeScript Integrations

OpenAI Node.js SDK

  import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'not-needed',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }],
  });

  console.log(completion.choices[0].message.content);
}

main();
  

LangChain.js

  import { ChatOpenAI } from "langchain/chat_models/openai";

const model = new ChatOpenAI({
  openAIApiKey: "not-needed",
  configuration: {
    baseURL: "http://localhost:8080/v1",
  },
  modelName: "gpt-4",
});

const response = await model.invoke("Hello, how are you?");
console.log(response.content);
  

Integration with Specific Tools

AutoGPT

AutoGPT can use LocalAI by setting the API base URL:

  export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=not-needed
  

Then run AutoGPT normally.

Flowise

Flowise supports LocalAI out of the box. In the Flowise UI:

  1. Add a ChatOpenAI node
  2. Set the base URL to http://localhost:8080/v1
  3. Set API key to any value (or leave empty)
  4. Select your model

Continue (VS Code Extension)

Configure Continue to use LocalAI:

  {
  "models": [
    {
      "title": "LocalAI",
      "provider": "openai",
      "model": "gpt-4",
      "apiBase": "http://localhost:8080/v1",
      "apiKey": "not-needed"
    }
  ]
}
  

AnythingLLM

AnythingLLM has native LocalAI support:

  1. Go to Settings > LLM Preference
  2. Select “LocalAI”
  3. Enter your LocalAI endpoint: http://localhost:8080
  4. Select your model

REST API Examples

cURL

  # Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# List models
curl http://localhost:8080/v1/models

# Embeddings
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-ada-002",
    "input": "Hello world"
  }'
  

Python Requests

  import requests

response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello!"}]
    }
)

print(response.json())
  

Advanced Integrations

Custom Wrapper

Create a custom wrapper for your application:

  import requests

class LocalAIClient:
    def __init__(self, base_url="http://localhost:8080/v1"):
        self.base_url = base_url
        self.api_key = "not-needed"  # LocalAI does not require an API key by default

    def chat(self, messages, model="gpt-4", **kwargs):
        response = requests.post(
            f"{self.base_url}/chat/completions",
            json={
                "model": model,
                "messages": messages,
                **kwargs
            },
            headers={"Authorization": f"Bearer {self.api_key}"}
        )
        response.raise_for_status()
        return response.json()

    def embeddings(self, text, model="text-embedding-ada-002"):
        response = requests.post(
            f"{self.base_url}/embeddings",
            json={
                "model": model,
                "input": text
            },
            headers={"Authorization": f"Bearer {self.api_key}"}
        )
        response.raise_for_status()
        return response.json()
  
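For example, the wrapper can be used like this (assuming a chat model named gpt-4 and an embedding model named text-embedding-ada-002 are configured in LocalAI):

  client = LocalAIClient()

reply = client.chat([{"role": "user", "content": "Hello!"}])
print(reply["choices"][0]["message"]["content"])

vectors = client.embeddings("Hello world")
print(len(vectors["data"][0]["embedding"]))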

Streaming Responses

  import requests
import json

def stream_chat(messages, model="gpt-4"):
    response = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": model,
            "messages": messages,
            "stream": True
        },
        stream=True
    )

    for line in response.iter_lines():
        if not line:
            continue
        decoded = line.decode("utf-8")
        # Server-sent events are prefixed with "data: "; the stream ends with "data: [DONE]"
        if not decoded.startswith("data: "):
            continue
        payload = decoded[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        data = json.loads(payload)
        if "choices" in data:
            content = data["choices"][0].get("delta", {}).get("content", "")
            if content:
                yield content
  
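The generator yields content deltas as they arrive, so it can be consumed incrementally:

  messages = [{"role": "user", "content": "Tell me a short story."}]

for chunk in stream_chat(messages):
    print(chunk, end="", flush=True)
print()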

Common Integration Patterns

Error Handling

  import time

import requests
from requests.exceptions import RequestException

def safe_chat_request(messages, model="gpt-4", retries=3):
    for attempt in range(retries):
        try:
            response = requests.post(
                "http://localhost:8080/v1/chat/completions",
                json={"model": model, "messages": messages},
                timeout=30
            )
            response.raise_for_status()
            return response.json()
        except RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # Exponential backoff
  

Rate Limiting

  from functools import wraps
import time

def rate_limit(calls_per_second=2):
    min_interval = 1.0 / calls_per_second
    last_called = [0.0]
    
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            left_to_wait = min_interval - elapsed
            if left_to_wait > 0:
                time.sleep(left_to_wait)
            ret = func(*args, **kwargs)
            last_called[0] = time.time()
            return ret
        return wrapper
    return decorator

@rate_limit(calls_per_second=2)
def chat_request(messages):
    # Your chat request here
    pass
  
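One way to fill in the chat_request stub is to delegate to the retrying helper from the error-handling example above, combining both patterns; a minimal sketch:

  # Reuses safe_chat_request() from the error-handling example
@rate_limit(calls_per_second=2)
def chat_request(messages):
    return safe_chat_request(messages)

result = chat_request([{"role": "user", "content": "Hello!"}])
print(result["choices"][0]["message"]["content"])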

Testing Integrations

Unit Tests

  import unittest
from unittest.mock import patch, Mock
import requests

class TestLocalAIIntegration(unittest.TestCase):
    @patch('requests.post')
    def test_chat_completion(self, mock_post):
        # Fake a LocalAI chat-completion response so no running server is needed
        mock_response = Mock()
        mock_response.json.return_value = {
            "choices": [{
                "message": {"content": "Hello!"}
            }]
        }
        mock_post.return_value = mock_response

        # Exercise the integration code under test (a plain request here)
        response = requests.post(
            "http://localhost:8080/v1/chat/completions",
            json={"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}
        )

        # Assert the mocked response is parsed as expected
        self.assertEqual(
            response.json()["choices"][0]["message"]["content"],
            "Hello!"
        )

if __name__ == "__main__":
    unittest.main()
  
