Structured Output

In many use cases, you may want the LLM to output a specific structure, such as a list or a dictionary with predefined keys.

There are several approaches to achieve a structured output:

  • Prompting the LLM to strictly return a defined structure.

  • Using LLMs that natively support schema enforcement.

  • Post-processing the LLM's response to extract structured content.

In practice, prompting is simple and reliable enough for modern LLMs.

Example Use Cases

  • Extracting Key Information

product:
  name: Widget Pro
  price: 199.99
  description: |
    A high-quality widget designed for professionals.
    Recommended for advanced users.

  • Summarizing Documents into Bullet Points

summary:
  - This product is easy to use.
  - It is cost-effective.
  - Suitable for all skill levels.

  • Generating Configuration Files

server:
  host: 127.0.0.1
  port: 8080
  ssl: true

Prompt Engineering

When prompting the LLM to produce structured output:

  1. Wrap the structure in code fences (e.g., a ```yaml block).

  2. Validate that all required fields exist, and raise an error when they don't so the Node's retry mechanism can handle it (see the sketch below).
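
For the second point, one option is to let validation failures raise inside exec, so the Node's built-in retry re-attempts the LLM call (assuming retry parameters such as max_retries are configured as described elsewhere in these docs). A minimal sketch, assuming a call_llm helper like the one used in the full example below:

import yaml
from brainyflow import Node

# call_llm is assumed to be defined elsewhere, as in the full example below
class StrictSummarizeNode(Node):
    async def prep(self, memory):
        return memory.text or ""

    async def exec(self, text: str):
        response = await call_llm(f"Summarize the following as a YAML 'summary' list:\n{text}")
        data = yaml.safe_load(response.split("```yaml")[1].split("```")[0])
        # Raise on any structural problem so the retry mechanism re-runs exec
        # instead of silently accepting a malformed response.
        if not isinstance(data, dict) or not isinstance(data.get("summary"), list):
            raise ValueError("Missing or malformed 'summary' field")
        return data

# Hypothetical construction; retry parameters are covered in the fault-tolerance docs.
# node = StrictSummarizeNode(max_retries=3, wait=1)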

Example Text Summarization

import yaml
from brainyflow import Node, Memory

# Assume call_llm is defined elsewhere
# async def call_llm(prompt: str) -> str: ...

class SummarizeNode(Node):
    async def prep(self, memory):
        # Assuming the text to summarize is in memory.text
        return memory.text or ""

    async def exec(self, text_to_summarize: str):
        if not text_to_summarize:
             return {"summary": ["No text provided"]}

        prompt = f"""
Please summarize the following text as YAML, with exactly 3 bullet points:

{text_to_summarize}

Now, output ONLY the YAML structure:
```yaml
summary:
  - bullet 1
  - bullet 2
  - bullet 3
```"""
        response = await call_llm(prompt)
        structured_result: dict

        try:
            # Extract YAML block
            yaml_str = response.split("```yaml")[1].split("```")[0].strip()
            structured_result = yaml.safe_load(yaml_str)

            # Basic validation
            if not isinstance(structured_result, dict) or "summary" not in structured_result or not isinstance(structured_result["summary"], list):
                raise ValueError("Invalid YAML structure")

        except (IndexError, ValueError, yaml.YAMLError) as e:
            print(f"Failed to parse structured output: {e}")
            # Handle error, maybe return a default structure or re-throw
            return {"summary": [f"Error parsing summary: {e}"]}

        return structured_result # e.g., {"summary": ["Point 1", "Point 2", "Point 3"]}

    async def post(self, memory, prep_res, exec_res: dict):
        # Store the structured result in memory
        memory.structured_summary = exec_res
        print("Stored structured summary:", exec_res)
        # No trigger needed if this is the end of the flow/branch

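Wiring the node into a flow might look roughly like this (a hypothetical sketch; check the core abstraction docs for the exact Flow and memory API, and note the sample text is made up):

import asyncio
from brainyflow import Flow

async def main():
    flow = Flow(start=SummarizeNode())
    memory = {"text": "Widgets are modular components. They are inexpensive and easy to install."}
    await flow.run(memory)
    # post() stored the parsed result, so the structured summary should now be in memory

asyncio.run(main())
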
Besides hand-rolled validation (assert statements or isinstance checks like the ones above), another popular way to validate schemas is Pydantic.
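
For instance, the isinstance checks in exec could be swapped for a small Pydantic model (an illustrative sketch; the model name and helper are made up):

from typing import List

from pydantic import BaseModel, ValidationError

class SummarySchema(BaseModel):
    summary: List[str]

def validate_summary(raw: dict) -> dict:
    # Raises ValidationError if 'summary' is missing or not a list of strings
    return SummarySchema(**raw).model_dump()  # Pydantic v2; use .dict() on v1

try:
    validate_summary({"summary": ["Point 1", "Point 2", "Point 3"]})
except ValidationError as e:
    print(f"Schema validation failed: {e}")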

Why YAML instead of JSON?

Current LLMs often struggle with escaping special characters. YAML is more forgiving with strings, since they don't always need to be quoted.

In JSON

{
  "dialogue": "Alice said: \"Hello Bob.\nHow are you?\nI am good.\""
}
  • Every double quote inside the string must be escaped with \".

  • Each newline in the dialogue must be represented as \n.

In YAML

dialogue: |
  Alice said: "Hello Bob.
  How are you?
  I am good."
  • No need to escape interior quotes; just place the entire text under a block literal (|).

  • Newlines are preserved naturally, without needing \n.
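
A quick check with PyYAML shows that the block literal parses to the same multi-line string the JSON version has to encode with escapes:

import yaml

doc = '''
dialogue: |
  Alice said: "Hello Bob.
  How are you?
  I am good."
'''

parsed = yaml.safe_load(doc)
print(parsed["dialogue"])
# Alice said: "Hello Bob.
# How are you?
# I am good.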
