Structured Outputs is finally available on Ollama

Let's give Structured Outputs on Ollama a run



Ollama finally supports structured outputs! This has to be one of the most exciting features as open models continue to reach parity with flagship model features. But what are Structured Outputs, you may ask? It is a feature OpenAI first released back in August 2024, and its main purpose is to generate structured data from unstructured inputs - one of the primary use cases when building an AI application.

Before then, there was JSON mode, where you ask the LLM to return JSON - the main problem was that it returned properly formatted JSON only about 40% of the time, which was not reliable. Structured Outputs returns properly formatted JSON 100% of the time.
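To make that reliability gap concrete, here is a tiny sketch (the replies below are invented examples, not real model output) of why JSON-mode replies that wrap the JSON in prose were unusable downstream:

```python
import json

# Hypothetical raw replies, as a JSON-mode model might return them:
# some are clean JSON, some wrap the JSON in chatty prose that breaks parsing.
replies = [
    '{"name": "Australia", "capital": "Canberra"}',
    'Sure! Here is the JSON: {"name": "Australia"}',
    '{"name": "Australia", "capital": "Canberra"}',
]

def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

valid = sum(is_valid_json(r) for r in replies)
print(f"{valid}/{len(replies)} replies parsed cleanly")  # 2/3 replies parsed cleanly
```

With Structured Outputs the runtime constrains generation to the schema, so this parsing lottery goes away.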

AI is now extending to end-user devices, i.e. desktops, mobiles and tablets. The versions that run in the cloud require lots of horsepower, so the current solution for devices is Small Language Models (SLMs). As much as I appreciate out-of-the-box solutions like Apple Intelligence or Microsoft Copilot, I also like to tinker with my own solutions where I have full autonomy and control. As I don't have a fully decked-out MacBook Pro with an M4 Max and infinite performance cores, I usually need to be modest about what I can run on my machine - typically a 3B-6B model on Ollama, using 2GB-4GB of storage. I have currently been running Microsoft's Phi 3.5 and Meta's Llama 3.2.
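If you want to follow along, both models can be pulled with Ollama's CLI (assuming the model tags below still match the Ollama library):

```shell
# Pull the two small models used in this post
ollama pull phi3.5
ollama pull llama3.2
```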

One of the blockers I had when building my local applications was that I could not reliably get JSON. I tried using an agentic approach where a model checks the JSON and reprompts, but it's easy to get stuck in a loop, burning through precious tokens and still not ending up with JSON - so I had no solid solution, and hence my excitement around the announcement of Structured Outputs for Ollama.

I decided to do some basic testing using the example Ollama provided. I have been working with Structured Outputs for a while now and know that it is not a silver bullet. A flagship model will always make it easier to get consistent outputs, but a smaller model needs to be paired with a solid prompt. I don't want this to become a prompt engineering discussion, but you need to provide clear instructions, almost as if you were defining the task for someone who had never done it before.

Below is my code to respond to the question "Tell me about Australia", iterating ten times - I was looking for a response with the structured outputs of country name, capital city and language(s) spoken, in JSON format.

from ollama import chat
from pydantic import BaseModel
 
 
class Country(BaseModel):
    name: str
    capital: str
    languages: list[str]
 
 
prompt1 = "Tell me about Australia"
 
results_file1 = open("ollama_results_prompt1.txt", "w")

# Run the same prompt ten times to check consistency of the output
for i in range(10):
    response = chat(
        messages=[{"role": "user", "content": prompt1}],
        model="phi3.5",
        # Constrain the output to the Pydantic model's JSON schema
        format=Country.model_json_schema(),
    )

    # Validate the reply against the schema before writing it out
    country = Country.model_validate_json(response.message.content)

    results_file1.write(country.model_dump_json() + "\n")

results_file1.close()
 

And these are the results I got:

{"name":"Commonwealth of Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia (Aboriginal names for various sites are also recognized by government agencies.)","capital":"Australia is a sovereign country comprising the mainland, various offshore islands and territories. It'sin geographical position as both a continent (the smallest on Earth) surrounded by oceans makes it one of only two nations—alongside New Zealand—to occupy an entire oceanic landmass.","languages":["English is the de facto national language and holds official status at federal level.","Indigenous Australian English includes a number of accents, with regional variations across different parts of Australia."]}
{"name":"Australia is often referred to as 'The Land Down Under,' symbolizing its position in the Southern Hemisphere, specifically below other countries on maps when oriented correctly. With a diverse range of environments from urban centers like Sydney and Melbourne with iconic skyline views, national parks such as Kakadu National Park featuring ancient rock formations and rich biodiversity, to vast outback landscapes where the rugged terrain meets indigenous cultures preserved through millennia.  'Uniquely Australian' experiences are offered in its varied cuisine that fuses influences from Europe, Asia, and Indigenous ingredienses like kangaroo or emu meat; arts including contemporary Aboriginal art which often tells ancestral stories using symbols and dot painting technique on canvases. The country is known for having a high standard of living with health care provided by Medicare—a publicly funded universal insurance scheme covering all citizens regardless their employment status nor income level.","capital":"Australia is a sovereign country comprising the mainland of the Australian continent, the island of Tasmania (an enclave in the Tasman Sea), and numerous smaller islands. It's located between the Indian Ocean to its north and Pacific Ocean westward.","languages":["English"]}
{"name":"Consequently named after the continent Australia itself – originally described by Dutch explorers in 1606 as 'New Holland'. In indigenous languages, there are many different names for various places across its expansive territory.","capital":"Australia is a sovereign country comprising the mainland of the Australian continent, the island of Tasmania (including Ocean Island of Cocos Islands), and ongoing discussions regarding its external territories. Situated in Oceania between Indonesia to the northwest and New Zealand southeast across 16,050 kilometers of ocean, it is a vast nation with diverse landscapes ranging from dense urban centers like Sydney and Melbourne to remote outback areas.","languages":["English","Australian Aboriginal English"]}
{"name":"Commonwealth of Australia","capital":"Australia is a sovereign country comprising the mainland, various offshore islands and territories located in Oceania'thy on the continent of Australia. It shares its land borders with Indonesian island Papua to the northwest as well as New Guinea.","languages":["English","Australian English"]}
{"name":"The name 'Australia' comes from Latin and means ‘the opposite’ or anti-something, commonly referring to Asia; it was coined by early European explorers who saw the continent as being on another side of their world compared with mainland Europe.","capital":"Australia, often referred to as 'The Land Down Under,' is a sovereign country and continent surrounded by the Indian Ocean. It's famous for its diverse range of climates from tropical in the north with rainforests along the northeast coastline near Papua New Guinea down through central savannah regions into temperate areas further south where cities like Sydney, Melbourne, Brisbendo (the capital), and Adelaide are located. Home to iconic wildlife such as kangaroos, koalas, platypuses, echidnas, the Tasmanian devil amongst others; Australia is unique among other countries for its wide array of native species that cannot be found anywhere else in the world.","languages":["English","Aboriginal languages (approximately 250)","Australian English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Commonwealth of Australia","capital":"Canberra","languages":["English"]}
{"name":"Commonwealth of Australia, officially named 'The Commonwealth of Australia', is a federation comprising six states (New South Wales, Queensland, South Australia, Tasmania, Victoria and Western Australia) plus two territories ('Northern Territory' & the Australian Capital Territory). It was established as an independent country on January 1, 1dictional insofar that it is not a colony of any other nation. The national anthem 'Advance Australia Fair', adopted since Federation day (July), remains popular to this date and symbolizes unity amongst its diverse population which speaks over fifty different languages including English.","capital":"Australia is a sovereign country comprising the mainland of the Australian continent, the island of Tasmania (an enclave state), as well as numerous smaller islands. It's located in Oceania between the Indian and Pacific oceans.","languages":["English","Aboriginal English varieties"]}
{"name":"Australia is the world's sixth-largest country by total area, with a diverse landscape that includes deserts like the Great Victoria and Simpson basins; tropical rainforests in its northern regions; extensive agricultural lands along rivers such as Murray–Darling Basin; mountain ranges including the Australian Alps (Mount Kosciuszko being the highest peak); coral reef systems around Queensland, most notably The Great Barrier Reef – one of Earth's largest and oldest living structures. Politically speaking,","capital":"Australia is a sovereign country comprising the mainland of the Australian continent, the island of Tasmania (a self-governing state), and numerous smaller islands. With a population of over 25 million people as of my knowledge cutoff in early 2 extrentay,","languages":["English"]}

This exposes a weakness of Structured Outputs when used with a "weak" prompt - while the data is properly formatted JSON, the content is not what was expected. If this were part of a larger application and I was passing these as variables, with downstream code expecting only a capital city, you can see how the fields above would break the application, e.g. if the next task was to purchase an airline ticket for the city in the capital field.
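One mitigation is to sanity-check field values before handing them to a downstream task. This is a minimal sketch of a hypothetical guard (the heuristic and its threshold are my own, not part of the Ollama example): a real capital-city name is short, so paragraph-length values get rejected.

```python
# Hypothetical guard: reject paragraph-length values that a "weak" prompt
# can produce, before passing the field to a downstream task.
def looks_like_capital(value: str, max_words: int = 4) -> bool:
    return 0 < len(value.split()) <= max_words

print(looks_like_capital("Canberra"))  # True
print(looks_like_capital(
    "Australia is a sovereign country comprising the mainland"))  # False
```

Schema validation guarantees the shape of the data; checks like this guard the plausibility of its content.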

The above results were from Microsoft's Phi 3.5, which is a 3.8B parameter model. Let's see how this compares with Meta's Llama 3.2, which is a 3B parameter model (both running on Ollama on my local machine):

{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English","Mandarin","Cantonese","Vietnamese"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English","Indigenous Australian languages"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}

Now that is getting better, but we are still seeing inconsistencies in the "languages" field. How does this compare with a flagship model running in the cloud? I ran this using gpt-4o on Azure and got the results below:

{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English","Other Indigenous languages"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}

Almost perfect, but it goes to show how a poor prompt can limit your results. So I tightened up the prompt to be more in line with what a proper prompt should look like:

prompt2 = """I am looking to get the following information about Australia: 
                The Name of the Country
                The capital city of the country and
                The national language or languages.
            You should return the information in JSON format with an example provided below:
                {"name":"name of the country", "capital":"name of the capital city", "language":"national language spoken in the country"}
            DO NOT make up any information. If you do not know, say "I do not know"
            It is very important that all data should be in JSON format as any other format will not be acceptable"""

With the prompt tightened up, let's see how the LLMs now fared. Let's start with gpt-4o running in Azure:

{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}

No surprise there - it would be disappointing if it were otherwise, but well done gpt-4o. Let's see how the local models fared on my machine, starting with Microsoft's Phi 3.5:

{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Australia","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Australia City","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}

Close, but no cigar - it even included some hallucination, with the capital given as "Australia City". Finally, let's see how Llama 3.2 fared:

{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}
{"name":"Australia","capital":"Canberra","languages":["English"]}

Now look at that - brilliantly done. But this is not really a "production ready" prompt - there are prompt engineering principles such as using XML tags, evaluation criteria, dividing complex tasks, task alignment, word choice etc. which can tighten your prompt further.

Additionally, I would run this more than 10 times and use an evaluation and monitoring framework to ensure the prompt keeps delivering the expected results in deployment.
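A minimal sketch of what such an evaluation could look like - compare each run's JSON against the expected answer and report a pass rate (the runs below are illustrative samples, not fresh model output):

```python
import json

# The answer we expect every run to produce
expected = {"name": "Australia", "capital": "Canberra", "languages": ["English"]}

# Illustrative sample runs, in the same format as the results files above
runs = [
    '{"name":"Australia","capital":"Canberra","languages":["English"]}',
    '{"name":"Australia","capital":"Australia City","languages":["English"]}',
    '{"name":"Australia","capital":"Canberra","languages":["English"]}',
]

passes = sum(json.loads(r) == expected for r in runs)
print(f"pass rate: {passes}/{len(runs)}")  # pass rate: 2/3
```

In a real deployment you would feed in many more runs and track the pass rate over time, alerting when it drops.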

To summarize, I am very excited that Structured Outputs is now in Ollama, but always remember that it needs to be paired with a well-engineered prompt and the selection of the correct model for the task.