Cool post. We did a similar evaluation for document segmentation using IBM's DocLayNet benchmark (https://ds4sd.github.io/icdar23-doclaynet/task/), but on modern document OCR models like Mistral, OpenAI, and Gemini. And what do you know, we found similar results -- DETR-based segmentation models are about 2x better.
Disclosure: I work for https://aryn.ai/