OpenAI

OpenAI API

https://openai.com · v2.3.0 · 242 operations · 959 schemas
Score: 43 · Grade: D

A particularly ironic result: the company building the LLMs publishes an OpenAPI spec that LLMs struggle with. Examples score 1% and Error Handling 4%, the two dimensions that most help agents call the API confidently. Parameters (97%) and Pagination (98%) are excellent, but agents can't benefit from those wins without first knowing what to send and how to recover when something breaks.


Category breakdown

  • Examples: 1
  • Semantics: 51
  • Intent: 74
  • Error Handling: 4
  • Parameters: 97
  • Pagination: 98

Key findings

  • 241 of 242 operations lack request/response examples
  • 223 of 242 operations have missing or very short descriptions
  • 228 of 242 operations omit error response documentation
  • 951 of 959 schemas lack examples
  • 81 operations have missing or poor operationIds
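
Counts like the first and third findings can be approximated with a short script. The sketch below is not AgenticScore's actual scoring logic. It assumes an OpenAPI 3 spec already parsed into a Python dict. It treats an operation as lacking examples when none of its request or response media types carries `example` or `examples`, and as omitting error documentation when it declares no 4xx, 5xx, or `default` response. The function names (`iter_operations`, `has_example`, `documents_errors`, `audit`) are hypothetical.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def iter_operations(spec):
    """Yield (path, method, operation) for every operation in a parsed spec."""
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            # Path items also hold non-operation keys such as "parameters".
            if method in HTTP_METHODS and isinstance(op, dict):
                yield path, method, op


def has_example(op):
    """True if any request or response media type carries an example."""
    bodies = [op.get("requestBody", {})] + list(op.get("responses", {}).values())
    for body in bodies:
        for media in body.get("content", {}).values():
            if "example" in media or "examples" in media:
                return True
    return False


def documents_errors(op):
    """True if the operation declares a 4xx, 5xx, or default response."""
    codes = [str(code) for code in op.get("responses", {})]
    return any(c == "default" or c[:1] in ("4", "5") for c in codes)


def audit(spec):
    """Summarize how many operations lack examples or error responses."""
    ops = list(iter_operations(spec))
    return {
        "operations": len(ops),
        "missing_examples": sum(not has_example(op) for _, _, op in ops),
        "missing_errors": sum(not documents_errors(op) for _, _, op in ops),
    }
```

This simplified check does not follow `$ref` pointers, so examples or error responses defined under `components` would be counted as missing.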

What OpenAI should fix

  1. Add examples to the 241 operations and 951 schemas missing them. This is the highest-leverage fix.
  2. Document error responses on the 228 operations omitting them — especially the rate limiting and content moderation errors agents will hit constantly.
  3. Expand the 223 operations with missing or very short descriptions. "Creates a chat completion" doesn't tell an agent when to use it vs. /completions.
  4. Replace the 81 missing or poor operationIds with descriptive, verb-led names.
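
As an illustration of what these four fixes could look like on a single operation, here is a minimal sketch written as a Python dict mirroring the OpenAPI structure. Everything in it is hypothetical: the widget endpoint, the `createWidget` and `updateWidget` operationIds, the schema references, and the error body shape are invented for illustration and are not taken from the OpenAI spec.

```python
import json

# Hypothetical operation for illustration only; not part of the OpenAI spec.
documented_operation = {
    # Fix 4: a verb-led operationId instead of something like "post_widgets".
    "operationId": "createWidget",
    "summary": "Create a widget",
    # Fix 3: a description that says when to use this operation.
    "description": (
        "Creates a new widget and returns it. Use this when no widget with "
        "the given name exists yet; use updateWidget to change an existing one."
    ),
    "requestBody": {
        "content": {
            "application/json": {
                "schema": {"$ref": "#/components/schemas/WidgetInput"},
                # Fix 1: a concrete request example.
                "example": {"name": "sprocket", "size": 3},
            }
        }
    },
    "responses": {
        "201": {
            "description": "The created widget.",
            "content": {
                "application/json": {
                    "schema": {"$ref": "#/components/schemas/Widget"},
                    # Fix 1: a concrete response example.
                    "example": {"id": "w_123", "name": "sprocket", "size": 3},
                }
            },
        },
        # Fix 2: a documented error response with recovery guidance.
        "429": {
            "description": (
                "Rate limit exceeded. Wait for the number of seconds given in "
                "the Retry-After header before retrying."
            ),
            "content": {
                "application/json": {
                    "example": {
                        "error": {
                            "type": "rate_limit_exceeded",
                            "message": "Too many requests.",
                        }
                    }
                }
            },
        },
    },
}

# Print the operation as JSON, the form it would take inside a spec file.
print(json.dumps(documented_operation, indent=2))
```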


How does your API score?

Run AgenticScore on your own OpenAPI spec — find issues before LLMs do.

npx agenticscore score ./openapi.yaml