The technical term is constrained decoding. OpenAI has had this for almost a yea...

		svachalek 8 months ago \| parent \| context \| favorite \| on: MCP Specification – version 2025-06-18 changes The technical term is constrained decoding. OpenAI has had this for almost a year now. They say it requires generating some artifacts to do efficiently, which slows down the first response but can be cached.