[Local AI with Ollama] Using Python to Call the Ollama REST API

Ollama provides a powerful REST API that allows you to interact with local language models programmatically from any language, including Python. In this guide, you'll learn how to use Python to call the Ollama REST API for text generation and chat, including how to process streaming responses.

Why Use the REST API?

While Ollama's CLI and third-party GUIs are great for quick experimentation, building real applications often requires backend integration. The REST API lets you:

- Send prompts and receive completions from local models

- Stream long responses for real-time interaction

- Integrate with your own Python code, web apps, or data pipelines

Prerequisites

- Ollama installed and running locally. Start it with `ollama serve` or by launching the desktop app.

- A Python environment set up. Create and activate a virtual environment, as shown below.

- Dependencies installed. You'll need the `requests` library (the install command is also shown below).
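The commands below cover the last two steps, assuming a Unix-like shell; on Windows, activate the environment with `venv\Scripts\activate` instead:

```bash
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate

# Install the only dependency this guide needs
pip install requests
```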

Making Basic API Calls with Python

You can use Python's requests library to interact with the REST API. The main endpoint for text generation is:
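```
POST http://localhost:11434/api/generate
```

Here `11434` is Ollama's default port; adjust the host and port if you've configured Ollama to listen elsewhere.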

Example: Generate a Description about the Richness of the Ocean

Create a file (for example, ollama_generate.py) with the following code:
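The script below is a minimal version of this request. It assumes the `llama3` model has been pulled (`ollama pull llama3`); substitute any model available on your machine:

```python
import json

import requests

# Ollama's default local endpoint for text generation.
URL = "http://localhost:11434/api/generate"

# The payload names the model to use and the prompt to send.
# "llama3" is an assumption here; use any model you've pulled.
payload = {
    "model": "llama3",
    "prompt": "Describe the richness and diversity of the ocean.",
}

# stream=True tells requests to yield the body incrementally,
# so we can print text as the model generates it.
response = requests.post(URL, json=payload, stream=True)

if response.status_code == 200:
    # Ollama streams one JSON object per line; each object's
    # "response" field holds the next chunk of generated text.
    for line in response.iter_lines():
        if line:
            chunk = json.loads(line.decode("utf-8"))
            print(chunk.get("response", ""), end="", flush=True)
    print()  # final newline after the stream ends
else:
    print(f"Error: {response.status_code} - {response.text}")
```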

Explanation

- The JSON payload specifies which model to use and what prompt to send.

- Passing stream=True to requests.post lets the script read the response incrementally, so long completions appear as they are generated instead of all at once.

- The code reads each line of the streamed response and prints the generated text as it arrives.

- Each line in the response is a JSON object containing a "response" field with generated text.

- If the status code is not 200, an error message is printed instead.

Running Your Python Script

After saving your file (e.g., as ollama_generate.py), run it in your terminal:
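```bash
python ollama_generate.py
```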

You should see the model's output about the richness and diversity of the ocean streamed live in your console.

Customizing Your Requests

You can customize the payload to adjust model behavior using parameters such as system instructions and options.

Example: Ask about tulip flowers, require a concise and informative answer, and set temperature in options
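Here is a sketch of such a request, again assuming the `llama3` model; the temperature of 0.7 is just an illustrative value:

```python
import json

import requests

URL = "http://localhost:11434/api/generate"

payload = {
    # "llama3" is an assumed model name; use any model you've pulled.
    "model": "llama3",
    # System instructions steer the overall style of the answer.
    "system": "You are a helpful assistant. Keep answers concise and informative.",
    "prompt": "Tell me about tulip flowers.",
    # Generation parameters go in "options"; temperature controls
    # how varied the output is (0.7 is an illustrative choice).
    "options": {"temperature": 0.7},
}

response = requests.post(URL, json=payload, stream=True)

if response.status_code == 200:
    for line in response.iter_lines():
        if line:
            chunk = json.loads(line.decode("utf-8"))
            print(chunk.get("response", ""), end="", flush=True)
    print()
else:
    print(f"Error: {response.status_code} - {response.text}")
```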

Explanation

- The "system" field instructs the model to give a concise and informative answer.

- The "prompt" asks specifically about tulip flowers.

- The "options" dictionary allows you to set generation parameters, such as "temperature", for more creative or focused responses.

In the next article, you'll learn how to use the official Ollama Python library for even more convenient and Pythonic model interaction.