Sounds interesting. Care to publish your results? (Yes, I know about caniuse, but I'm more after a core set of browser APIs/features defining a "real" long-term standard.) Btw, I've also put quite some effort into distilling an HTML DTD grammar from WHATWG/W3C materials [1], defending it at conferences, etc.
[1]: http://sgmljs.net/docs/html52.html