Not long ago, when we wanted to get OpenAI to return JSON data, we had to use various prompt engineering techniques to get the desired JSON data in the output. Then we had to parse the AI response text and typically extract the JSON data that was enclosed within ```. This was burdensome and prone to errors. But most importantly, there was no certainty that the necessary JSON elements would be there.
Luckily for us, OpenAI introduced a feature called “structured output” that offers two main benefits:
The JSON output from the model is coherent with a JSON schema we supply at runtime
The output contains only JSON
There are some key benefits:
Processing the results is easier and more predictable: We can now parse the results programmatically with certainty, as the model will not invent arbitrary JSON properties.
Prompt design is simpler: It’s no longer necessary to do reinforcement in the prompt to achieve the desired JSON output.
In this blog, we will explore a simple example of how to use structured output. We will implement a simple text translator from any language to English, and provide CURL commands to execute the OpenAI request in a command prompt (you will only need to specify your API key).
Let’s start with a simple example where the output is plain text (unstructured—no JSON response). This request has:
System prompt: “Your role is to translate to English.”
User prompt (the text to translate): “bonjour, comment ca va!”
Here is a CURL command:
curl https://api.openai.com/v1/chat/completions ^
-H "Content-Type: application/json" ^
-H "Authorization: Bearer YOUR API KEY HERE" ^
-d "{\"model\": \"gpt-4o-mini\", \"messages\": [{\"role\": \"system\",\"content\": \"Your role is to translate to English.\" },{\"role\": \"user\",\"content\": \"bonjour, comment ca va!\"}]}"
Response:
We will get the following assistant message: “content”: “Hello, how are you!”
Now, let’s instruct OpenAI to return the data in a structured way by using JSON and applying a specific schema. We want the answer to contain the following elements:
Here is the corresponding JSON schema:
{
"type": "object",
"properties": {
"original_text": {
"type": "string",
"description": "The original text that needs to be translated."
},
"translated_text": {
"type": "string",
"description": "The translated text in the target language."
}
},
"required": [
"original_text",
"translated_text"
],
"additionalProperties": false
}
The JSON schema is inserted in the OpenAI request in the following JSON fragment:
"response_format": {“type”: “json_schema”, “json_schema”: “Insert the schema here” }
Here is the full CURL command:
curl https://api.openai.com/v1/chat/completions ^
-H "Content-Type: application/json" ^
-H "Authorization: Bearer YOUR API KEY" ^
-d "{\"model\": \"gpt-4o-mini\",\"response_format\": {\"type\": \"json_schema\", \"json_schema\": {\"strict\": true, \"name\": \"example1\", \"schema\": {\"type\": \"object\",\"properties\": {\"original_text\": {\"type\": \"string\",\"description\": \"The original text that needs to be translated.\"},\"translated_text\": {\"type\": \"string\",\"description\": \"The translated text in the target language.\"}},\"required\": [\"original_text\",\"translated_text\"],\"additionalProperties\": false}} }, \"messages\": [{\"role\": \"system\",\"content\": \"Your role is to translate to English. Provide the original text as well as the translated text in the response.\" },{\"role\": \"user\",\"content\": \"json: bonjour, comment ca va!\"}]}"
When we test the same query as we did earlier, we get the following content:
{
"original_text": "bonjour, comment ca va!",
"translated_text": "hello, how are you!"
}
That’s it. We now have a response with pure JSON data that can be parsed by a program in a predictable way.
There is another mode in the OpenAI API to get JSON output. It’s called JSON mode. It requires the prompt to include the keyword “JSON,” but in this mode, there is no certainty that the output JSON will follow a schema. As the doc puts it: “JSON mode is a more basic version of the Structured Outputs feature. While JSON mode ensures that model output is valid JSON, Structured Outputs reliably match the model’s output to the schema you specify. We recommend you use Structured Outputs if it is supported for your use case. You can read more about it on OpenAI’s Structured Outputs page.
In my experience, determining the proper JSON schema file that follows all the constraints can be challenging. It’s easy to miss adding the additional required properties fields in all the needed locations or forget to add a required field. Fortunately, using the schema generator tool offers an easier path. You provide a description of everything you will need, and the tool will generate a working schema!
Initially, OpenAI results were intended to be directly consumed by users as text or other media, but now structured output provides a formal mechanism to enable software programs to use results in a consistent and predictable way. This is not only useful when calling an LLM directly, but it is also the basis for the agentic world.
To learn how Progress enables customers to develop, deploy and manage responsible AI-powered applications and digital experiences, visit our AI solutions page.
Thierry Ciot is a Software Architect on the Corticon Business Rule Management System. Ciot has gained broad experience in the development of products ranging from development tools to production monitoring systems. He is now focusing on bringing Business Rule Management to Javascript and in particular to the serverless world where Corticon will shine. He holds two patents in the memory management space.
Subscribe to get all the news, info and tutorials you need to build better business apps and sites