How to Fix LangChain with_structured_output Malformed JSON


Malformed JSON from LangChain's with_structured_output is most commonly caused by using the default method="function_calling" instead of method="json_schema", which uses OpenAI's native Structured Outputs to guarantee valid JSON. The second most common cause is a Pydantic model that uses features unsupported by OpenAI's strict schema, such as Union types without discriminators, Optional fields missing explicit defaults, or nested models without additionalProperties: false. Switch to method="json_schema", pin every optional field with a default, and the problem disappears.

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class ExtractedEntity(BaseModel):
    name: str = Field(description="Entity name.")
    entity_type: str = Field(description="One of: person, org, location.")
    # Always provide a default for optional fields.
    confidence: float = Field(default=0.0, description="Confidence score 0-1.")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Use json_schema method for guaranteed valid JSON.
structured_llm = llm.with_structured_output(
    ExtractedEntity,
    method="json_schema",
    strict=True,
)

result = structured_llm.invoke("Satya Nadella is the CEO of Microsoft.")
print(result)

Why method="json_schema" Fixes Malformed JSON

LangChain's with_structured_output supports three methods under the hood, and they behave very differently.

function_calling (the default) sends your schema as a tool definition. The model usually returns valid JSON, but it's not guaranteed. The model can still hallucinate trailing commas, unescaped characters, or truncated output. LangChain then tries to parse the raw string and throws OutputParserException or json.JSONDecodeError.

json_schema uses OpenAI's native Structured Outputs feature (the response_format parameter with type: "json_schema"). This constrains token sampling at decode time so that the output is always valid JSON matching your schema. It can't produce malformed JSON because the constraint is structural, not behavioral.

json_mode guarantees valid JSON but does not guarantee that your schema is followed. Fields can be missing or renamed.

Use json_schema with strict=True unless you have a specific reason not to.

# Compare the three methods side by side.

# Method 1: function_calling (default) — no JSON guarantee.
llm.with_structured_output(ExtractedEntity, method="function_calling")

# Method 2: json_schema — guaranteed valid JSON matching schema.
llm.with_structured_output(ExtractedEntity, method="json_schema", strict=True)

# Method 3: json_mode — guaranteed valid JSON, schema NOT enforced.
llm.with_structured_output(ExtractedEntity, method="json_mode")

Pydantic Schema Gotchas That Break Strict Mode

OpenAI's strict schema mode has specific requirements that Pydantic doesn't enforce. If your schema violates these, you get a 400 error from the API or a silent fallback to non-strict parsing. A few rules catch people off guard.

Every Optional field needs an explicit default. OpenAI's strict mode requires that all properties appear in required, and optional fields must use a Union[type, None] pattern with default=None. If you write confidence: Optional[float] without = None, LangChain's schema conversion might produce an invalid spec.

No unsupported types. Decimal, datetime, set, and custom validators that alter the JSON schema aren't supported. Stick to str, int, float, bool, list, dict, nested BaseModel, Literal, and Enum.

All nested objects need additionalProperties: false. LangChain handles this automatically when you pass strict=True, but if you're building the schema dict manually, you must set it on every nested object.

from typing import Optional, Literal
from pydantic import BaseModel, Field

class Address(BaseModel):
    street: str = Field(description="Street address.")
    city: str = Field(description="City name.")
    # Use Literal for constrained values.
    country_code: Literal["US", "UK", "DE", "FR"] = Field(description="ISO country code.")

class Person(BaseModel):
    name: str = Field(description="Full name.")
    age: int = Field(description="Age in years.")
    # Optional with explicit default — required for strict mode.
    nickname: Optional[str] = Field(default=None, description="Nickname if known.")
    address: Address = Field(description="Primary address.")
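When you pass strict=True, LangChain sets additionalProperties: false on nested objects for you. If you build the schema dict by hand, a small recursive helper (a sketch, not a LangChain API) does the same:

```python
def enforce_no_additional_props(schema: dict) -> dict:
    """Recursively set additionalProperties: false on every object node,
    mirroring what strict=True does for you automatically."""
    if isinstance(schema, dict):
        if schema.get("type") == "object":
            schema["additionalProperties"] = False
        for value in schema.values():
            if isinstance(value, dict):
                enforce_no_additional_props(value)
            elif isinstance(value, list):
                for item in value:
                    if isinstance(item, dict):
                        enforce_no_additional_props(item)
    return schema

# Example: a hand-built schema with a nested object.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}
enforce_no_additional_props(schema)
# Both the root and the nested address object now forbid extra keys.
```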

Debugging Structured Output Errors in LangChain

If you've switched to json_schema and still see failures, the issue is almost always in the schema conversion layer. LangChain converts your Pydantic model to a JSON Schema dict and sends it to OpenAI. You can inspect exactly what gets sent.

import json

# Inspect the schema LangChain actually sends to OpenAI.
schema = Person.model_json_schema()
print(json.dumps(schema, indent=2))

# An "anyOf" produced by a bare Union is a common strict-mode rejection.
# Fix: simplify the type, add a discriminator, or use Optional[...] = None.
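To locate every anyOf in a converted schema so you can check it against OpenAI's strict-mode rules, a small walker helps. This is a sketch, not a LangChain utility:

```python
def find_anyof_fields(schema: dict, path: str = "$") -> list[str]:
    """Walk a JSON Schema dict and report every location using anyOf."""
    hits = []
    if isinstance(schema, dict):
        if "anyOf" in schema:
            hits.append(path)
        for key, value in schema.items():
            if isinstance(value, dict):
                hits.extend(find_anyof_fields(value, f"{path}.{key}"))
            elif isinstance(value, list):
                for i, item in enumerate(value):
                    if isinstance(item, dict):
                        hits.extend(find_anyof_fields(item, f"{path}.{key}[{i}]"))
    return hits

# A bare Union[str, int] field shows up as anyOf in the generated schema.
schema = {
    "type": "object",
    "properties": {
        "value": {"anyOf": [{"type": "string"}, {"type": "integer"}]},
    },
}
print(find_anyof_fields(schema))  # ['$.properties.value']
```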

Handling Streaming with Structured Output

Here's a subtle gotcha: if you use .stream() with with_structured_output, LangChain accumulates partial JSON chunks and only parses at the end. With method="function_calling", a network interruption mid-stream produces a truncated string that fails to parse. method="json_schema" constrains every token the model emits, but it cannot protect you from truncation: if the response hits the token limit (finish_reason: "length"), the accumulated output is an incomplete JSON prefix, and hydrating your Pydantic model will still fail because fields are missing. Always check finish_reason in production.

structured_llm = llm.with_structured_output(
    ExtractedEntity,
    method="json_schema",
    strict=True,
    # Return raw output alongside parsed for debugging.
    include_raw=True,
)

response = structured_llm.invoke("Extract: Apple Inc. is based in Cupertino.")

# Access the parsed result, raw API response, and any parsing error.
parsed = response["parsed"]
raw_message = response["raw"]
parse_error = response["parsing_error"]  # None when parsing succeeded
print(f"Finish reason: {raw_message.response_metadata.get('finish_reason')}")
print(f"Parsed: {parsed}")

Version Compatibility for LangChain Structured Output

The method="json_schema" option requires langchain-openai>=0.1.19 and works with gpt-4o-mini, gpt-4o, gpt-4.1, and newer models. Older models like gpt-3.5-turbo don't support Structured Outputs, so you must use function_calling and add retry logic for parse failures. If you're on an older langchain-openai, upgrade first.

# Upgrade to get json_schema support.
# pip install --upgrade langchain-openai langchain-core

import langchain_openai
import langchain_core
print(f"langchain-openai: {langchain_openai.__version__}")
print(f"langchain-core: {langchain_core.__version__}")
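If you are stuck on function_calling with an older model, wrap the call in retries. LangChain's OutputParserException subclasses ValueError, so a generic wrapper (a sketch; invoke_with_retry is not a LangChain helper) can stay dependency-free:

```python
import random
import time

def invoke_with_retry(invoke, prompt, max_attempts=4, base_delay=0.5):
    """Retry a structured-output call on parse failures, backing off
    exponentially with a little jitter between attempts."""
    for attempt in range(max_attempts):
        try:
            return invoke(prompt)
        except ValueError:  # covers OutputParserException and json.JSONDecodeError
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Use it as invoke_with_retry(structured_llm.invoke, prompt); any callable that raises on malformed output works.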

Summary

Set method="json_schema" with strict=True. This is the single most impactful fix. Give every Optional field an explicit default of None. Avoid unsupported types like datetime or Decimal. Use include_raw=True during debugging to see exactly what the API returned before parsing. If you're on a model that doesn't support Structured Outputs, wrap function_calling in a retry with exponential backoff, because malformed JSON is a matter of when, not if.
