Are you using that approach in production for grounding when PDFs don't include embedded text, like in the case of scanned documents? I did some experiments for that use case, and it wasn't really reaching the bar I was hoping for.
Yes, this was completely image-based. We haven't quite reached the point of using it in production, since I agree it can be flaky at times. Although I do think there are viable workarounds, like sending the same prompt multiple times and seeing whether the returned results overlap.
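The multi-run overlap idea can be sketched roughly like this. This is just a minimal illustration, not anyone's actual implementation: the `consensus` helper and the sample data are made up, and the model calls themselves are left out of scope.

```python
from collections import Counter

def consensus(responses, min_votes=2):
    """Keep only extracted items that appear in at least min_votes runs.

    `responses` is a list of per-run result sets, e.g. the grounded text
    spans a vision model returned for the same scanned page on each call.
    """
    counts = Counter()
    for run in responses:
        counts.update(set(run))  # count each distinct item once per run
    return {item for item, n in counts.items() if n >= min_votes}

# Hypothetical example: three runs over the same scanned page.
runs = [
    {"Invoice #1042", "Total: $310.00", "Date: 2023-06-01"},
    {"Invoice #1042", "Total: $310.00"},
    {"Invoice #1042", "Total: $318.00"},  # one-off misread, filtered out
]
print(consensus(runs, min_votes=2))
```

Raising `min_votes` trades recall for precision: a flaky extraction has to repeat across runs before you trust it.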
It really feels like we're maybe half a model generation away from this being a solved problem.