Brief IA

Gemma 4 and Ollama: Turning a Local LLM into a Research Agent

🛠️ AI Tools·Tom Levy·

Gemma 4 and Ollama: Turning a Local LLM into a Research Agent

Gemma 4 and Ollama: Turning a Local LLM into a Research Agent
Key Takeaways
1Gemma 4, Ollama, and OpenAI Agents SDK enable the creation of a local search agent.
2Installing Ollama and Gemma 4 E4B is essential for setting up the agent.
3Tavily MCP is used to integrate web search capabilities into the agent.
💡Why it mattersThe creation of autonomous local agents could revolutionize access to information without relying on cloud infrastructures.
Le brief IA que lisent les pros

Le brief IA que les pros lisent chaque soir

Les 7 actus IA du jour, décryptées en 5 min. Gratuit.

Inclus dès l'inscription : notre sélection des meilleurs guides & comparatifs IA.

Choisis ton rythme

Gratuit · Pas de spam · Désabonnement en 1 clic

📄
Full Analysis

From Local LLM to User Tool Agent

LLM Applications

Using a local LLM is an intriguing prospect, but after a few interactions, one might wonder what additional features can be added. This article explores the transformation of a local LLM into an agent capable of using external tools. We will use:

  • The Gemma 4 model (with device-adapted variants) as the local LLM
  • Ollama to serve the local LLM
  • OpenAI Agents SDK for agent execution
  • Tavily as an example of an external tool

The goal is to build a mini deep research agent capable of searching the web, gathering evidence, and synthesizing a response with citations based on a user question. By the end of this article, you will have a functional local research agent and a reusable implementation model to transform a local model into a local AI agent.

Setting Up the Local Agent Stack

Before coding, four elements need to be prepared: Ollama, Gemma 4 (specifically the Gemma 4 E4B model), OpenAI Agents SDK, and Tavily MCP.

First, install Ollama. On Windows, download the installer from the official Ollama website or use winget in PowerShell:

winget install Ollama.Ollama

On Linux, install Ollama with:

curl -fsSL https://ollama.com/install.sh | sh

Verify the installation with:

ollama --version

On Windows, launch Ollama from the Start menu to activate the local API endpoint.

Next, retrieve the local model. Use the Gemma 4 E4B variant:

ollama pull gemma4:e4b

The E4B model is well-suited for local agent workflows. If your machine is more limited, try the lighter E2B variant:

ollama pull gemma4:e2b

Then, install the agent runtime library with OpenAI Agents SDK:

pip install openai-agents

Also, install the OpenAI-compatible client:

pip install openai

Note that the client will be pointed to the local Ollama endpoint, so we will not be sending requests to OpenAI.

Finally, set up a Tavily MCP endpoint. If you haven't used it before, Tavily is a search API for LLM applications. Create a Tavily account and obtain an API key. Generate an MCP link:

https://mcp.tavily.com/mcp/?tavilyApiKey=<your-api-key>

Tavily is used here as a convenient MCP tool, but other MCP-compatible tools can be integrated.

Configuring the Local Research Agent

With OpenAI Agents SDK, here is the final Agent object to compose:

from agents import Agent
name="Local Research Agent",
instructions=RESEARCH_AGENT_INSTRUCTIONS,
mcp_servers=[tavily_server],
mcp_config={"include_server_in_tool_names": True},

The Model

from openai import AsyncOpenAI
from agents import OpenAIChatCompletionsModel
MODEL_NAME = "gemma4:e4b"
OLLAMA_BASE_URL = "http://localhost:11434/v1"
client = AsyncOpenAI(
api_key="ollama",
base_url=OLLAMA_BASE_URL,
model = OpenAIChatCompletionsModel(
model=MODEL_NAME,
openai_client=client,

Create a client pointing to the local OpenAI-compatible endpoint of Ollama. Use OpenAIChatCompletionsModel to wrap the Gemma model in a model object, allowing the agent SDK to use this model in the agent loop.

The Instruction

Define the instruction for the agent with the desired research behavior:

from datetime import datetime
CURRENT_DATE = datetime.now().strftime("%B %d, %Y")
RESEARCH_AGENT_INSTRUCTIONS = f"""
You are a concise research assistant.
Answer the user's question by turning it into a small web research task.
Use the current date when interpreting time-sensitive questions: {CURRENT_DATE}.
[Research Behavior]
Start with a targeted search query.
For recommendation or comparison questions, complete this search loop before responding:
first identify the main options, then search for comparative context, and then synthesize a recommendation.
Use follow-up searches when initial results are insufficient, contradictory, or only cover part of the question.
Prefer relevant and credible sources, and track which source supports each important claim.
Before responding, check if the gathered evidence is sufficient to support the conclusion.
[Expected Output]
First provide a direct answer, then briefly explain the evidence that supports it.
Include links to sources for key factual claims.
Do not rely on memory for facts that may have changed.
Do not invent missing details.
Keep the response concise.
"""

Equip the agent with the web search tool. Use the Tavily search engine via MCP:

from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHttp
TAVILY_MCP_URL = "YOUR_TAVILY_MCP_URL"
async with MCPServerStreamableHttp(
params={"url": TAVILY_MCP_URL},
) as tavily_server:
tools = await tavily_server.list_tools()
print("Available Tavily tools:")
for tool in tools:
description = (tool.description or "").replace("\n", " ")
print(f"- {tool.name}: {description[:120]}")
name="Local Research Agent",
instructions=RESEARCH_AGENT_INSTRUCTIONS,
mcp_servers=[tavily_server],
mcp_config={"include_server_in_tool_names": True},
result = await Runner.run(agent, RESEARCH_QUESTION, max_turns=MAX_TURNS)

This code performs three actions:

  • It opens a connection to the Tavily MCP server with async with MCPServerStreamableHttp(...) as tavily_server:. Once connected, Tavily exposes its available tools to the agent SDK.

  • Creates the Agent object within the MCP context. Note that mcp_servers=[tavily_server] attaches Tavily's MCP tools to the agent.

  • Executes the agent with result = await Runner.run(agent, RESEARCH_QUESTION, max_turns=MAX_TURNS). The context manager is crucial as the MCP connection is only active within the async with block.

mcp_config={"include_server_in_tool_names": True} is mainly for readability in the trace. Without this, the tool name would appear only as tavily_search. With it, the tool name will appear as mcp_tavily__tavily_search. This makes it clearer that the tool call comes from the Tavily MCP server.

Running a Research Question

Now that the agent is configured, let's test it with a concrete question:

"What World Cup match on June 23, 2026, had the most significant stakes in the group stage, and why?"

To inspect what happened, print a compact trace:

def compact(value: object, limit: int = 220) -> str:
text = str(value).replace("\n", " ")
return text if len(text) <= limit else text[:limit] + "..."
for step, item in enumerate(result.new_items, start=1):
raw_item = getattr(item, "raw_item", None)
raw_type = getattr(raw_item, "type", "")
raw_name = getattr(raw_item, "name", "")
raw_output = getattr(raw_item, "output", "")
f"{step:02d} | {type(item).__name__} | "
f"{raw_type or raw_name} | {compact(raw_output or raw_item)}"

In my run, the trace looked like this:

01 | ToolCallItem | function_call | ResponseFunctionToolCall(arguments='{"query":"World Cup 2026 group stage matches June 23, 2026 stakes"}', name='mcp_tavily__tavily_search', ...)
02 | ToolCallOutputItem |  | {'call_id': ..., 'output': ...}
03 | MessageOutputItem | message | ResponseOutputMessage(... final response ...)

This allows you to see the agentic behavior directly. In this execution, the local Gemma model decided to call the Tavily search tool, the agent SDK executed this tool call, and returned the results to the model. Then, the model produced the final response.

To see the final response, print:

print(result.final_output)

The agent used a single search turn to arrive at this conclusion.

Brief IA — L'actualité IA en français

L'essentiel de l'actualité de l'intelligence artificielle, décrypté et expliqué chaque jour.