Hacker News

It's an issue you run into whenever the model is forced to commit to a yes/no answer first. Forward-only (autoregressive) LLMs have this problem and diffusion models don't, and standard block diffusion behaves closer to a forward-only LLM than to a full diffusion model.

You could increase the block size to act more like a full diffusion model, but you would lose some of the benefits of block diffusion.
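A toy sketch of the difference (all names and scores here are made up for illustration): an autoregressive decoder commits to the first token before seeing the rest of the answer, while block-diffusion-style decoding scores all positions in a block jointly, so the first token can still flip once later positions are considered. The `SCORES` table is a hypothetical stand-in for a model's joint plausibility over two-token answers.

```python
import itertools

# Hypothetical toy scorer: joint plausibility of a full two-token answer.
# Deliberately constructed so the globally best sequence starts with the
# token that looks worse when chosen first.
SCORES = {
    ("yes", "trivially"): 0.20,
    ("yes", "because"):   0.35,
    ("no",  "because"):   0.45,  # best overall, but "no" loses step one
    ("no",  "trivially"): 0.00,
}
FIRST_TOKENS = ["yes", "no"]
SECOND_TOKENS = ["because", "trivially"]

def autoregressive_decode():
    """Commit to each position left to right, never revisiting."""
    # First token chosen by its marginal score (sum over continuations):
    # "yes" -> 0.55, "no" -> 0.45, so "yes" wins and is locked in.
    first = max(FIRST_TOKENS,
                key=lambda t: sum(SCORES[(t, s)] for s in SECOND_TOKENS))
    second = max(SECOND_TOKENS, key=lambda s: SCORES[(first, s)])
    return (first, second)

def block_decode():
    """Score the whole block jointly, as a diffusion-style denoiser
    over the block would: the first token is free to change."""
    return max(itertools.product(FIRST_TOKENS, SECOND_TOKENS),
               key=SCORES.get)
```

Here `autoregressive_decode()` returns `("yes", "because")` with joint score 0.35, while `block_decode()` recovers the better `("no", "because")` at 0.45. A block size of 2 covers this whole answer; with smaller blocks the yes/no token would again be committed before the rest, which is the trade-off the comment above describes.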



Interesting. Makes me want to play around with an open diffusion LM. Do you have any recommendations?



