OpenAI API

https://openai.com · v2.3.0 · 242 operations · 959 schemas

OpenAI scores 32 out of 100, an F. The company building the LLMs ships an OpenAPI spec that LLMs struggle with. Examples score 1 out of 100. Error handling scores 5. These are the two dimensions that most directly determine whether an agent can call the API. Parameter documentation (97) and pagination (98) are excellent, but an agent benefits from them only after it has worked out what to send and how to recover when a call fails.

Category breakdown

Examples: 1
Semantics: 39
Intent: 75
Error Handling: 5
Parameters: 97
Pagination: 98

Key findings

241 of 242 operations lack request and response examples
223 of 242 operations have missing or very short descriptions
Most operations omit error response documentation
951 of 959 schemas lack examples
286 of 959 schemas lack descriptions
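
Counts like these come from walking every operation in the spec and checking what is present. The following is a minimal sketch of such an audit, assuming the spec has already been parsed into a Python dict. The function name `audit_operations`, the 40-character cutoff for a "very short" description, and the toy spec are all assumptions made for illustration; they are not AgenticScore's actual rules or OpenAI's actual endpoints.

```python
# Hypothetical audit sketch. Names and thresholds are assumptions,
# not AgenticScore's real scoring rules.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}
MIN_DESCRIPTION_LENGTH = 40  # assumed cutoff for a "very short" description


def has_example(container):
    """True if any media type under `content` carries `example` or `examples`."""
    content = container.get("content", {})
    return any("example" in media or "examples" in media for media in content.values())


def audit_operations(spec):
    """Count operations that lack examples, descriptions, or error responses."""
    counts = {"operations": 0, "no_examples": 0,
              "short_description": 0, "no_error_responses": 0}
    for path_item in spec.get("paths", {}).values():
        for method, op in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as `parameters`
            counts["operations"] += 1
            responses = op.get("responses", {})

            # Flag the operation if neither the request nor any response has an example.
            documented = has_example(op.get("requestBody", {})) or any(
                has_example(response) for response in responses.values())
            if not documented:
                counts["no_examples"] += 1

            if len(op.get("description", "").strip()) < MIN_DESCRIPTION_LENGTH:
                counts["short_description"] += 1

            # An error response is any 4xx/5xx status code or the `default` response.
            if not any(str(code)[0] in "45" or str(code) == "default" for code in responses):
                counts["no_error_responses"] += 1
    return counts


# Toy spec with made-up paths: one poorly documented operation, one well documented.
toy_spec = {
    "paths": {
        "/widgets": {
            "post": {
                "description": "Creates a widget",
                "requestBody": {"content": {"application/json": {"schema": {"type": "object"}}}},
                "responses": {"200": {"description": "OK"}},
            }
        },
        "/widgets/{id}": {
            "parameters": [{"name": "id", "in": "path", "required": True}],
            "get": {
                "description": "Fetches one widget by ID. Use this after creating a widget to read its state.",
                "responses": {
                    "200": {"description": "The widget",
                            "content": {"application/json": {"example": {"id": "w_1"}}}},
                    "404": {"description": "No widget with this ID"},
                },
            },
        },
    }
}

report = audit_operations(toy_spec)
print(report)
```

In the toy spec only the `POST /widgets` operation is flagged, and it is flagged on all three checks. A real scorer would likely also inspect schema-level examples and weight the categories, which this sketch does not attempt.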

What OpenAI should fix

Add examples to the 241 operations and 951 schemas missing them. This is the highest-leverage fix.
Document error responses on every operation, especially the rate limiting and content moderation errors agents will hit constantly.
Expand the 223 operations with missing or very short descriptions. "Creates a chat completion" does not tell an agent when to use it versus /completions.
Replace weak operationIds with descriptive, verb-led names.
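
To make the first three fixes concrete, the sketch below patches a single operation, represented as a Python dict in OpenAPI 3 shape. Everything specific in it is a hypothetical placeholder: the helper name `document_operation`, the widget operation, the description and error texts, the example payload, and the `ErrorResponse` schema reference. None of it is taken from OpenAI's spec.

```python
import copy


def document_operation(operation, description, request_example, error_descriptions):
    """Return a copy of `operation` with a description, a request example,
    and one documented response per entry in `error_descriptions`."""
    patched = copy.deepcopy(operation)  # leave the caller's dict untouched
    patched["description"] = description

    # Attach the example to every request media type that does not have one yet.
    for media in patched.get("requestBody", {}).get("content", {}).values():
        media.setdefault("example", request_example)

    # Add error responses without overwriting any that are already documented.
    responses = patched.setdefault("responses", {})
    for status, text in error_descriptions.items():
        responses.setdefault(status, {
            "description": text,
            "content": {
                "application/json": {
                    # Assumed shared error schema; the name is a placeholder.
                    "schema": {"$ref": "#/components/schemas/ErrorResponse"}
                }
            },
        })
    return patched


# Hypothetical operation before the fix: terse description, no example, no errors.
before = {
    "operationId": "createWidget",
    "description": "Creates a widget",
    "requestBody": {"content": {"application/json": {"schema": {"type": "object"}}}},
    "responses": {"200": {"description": "OK"}},
}

after = document_operation(
    before,
    description=("Creates a new widget from a name and size. Use this for new widgets; "
                 "use the update operation to change an existing one."),
    request_example={"name": "demo", "size": 3},
    error_descriptions={
        "400": "The request body failed validation.",
        "429": "Rate limit exceeded. Retry after the delay given in the response.",
    },
)
```

The helper copies the operation instead of mutating it so the original spec can be diffed against the patched one, and it uses `setdefault` so hand-written examples and error responses are never overwritten.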

How does your API score?

Run AgenticScore on your own OpenAPI spec to find issues before LLMs do.

npx agenticscore score ./openapi.yaml
