Vector graphics rendered at very low resolutions, such as the 16×16 typically used for favicons, rarely look as good as hand-crafted bitmaps. Is it possible to make them look good?
Yes, but it requires some graphics techniques that browsers mostly don’t implement yet, or don’t implement in all environments. I can’t remember the full range of techniques needed to do it properly, but conceptually it starts with supersampling: not rendering at 16×16, since that causes real aliasing problems when you have overlapping objects, but rather rendering at a higher resolution (where the aliasing effects are reduced or eliminated) and scaling down. For perfect results in 8-bit colour, I think you’d need 16×16 supersampling (rendering 16× as large in each direction, so at 256×256, and scaling down to 16×16) so that each output pixel is the average of 256 samples; but except for very fine detail, 4×4 supersampling (render at 64×64 and scale down to 16×16) should generally be good enough.
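To make the idea concrete, here’s a minimal sketch of 4×4 supersampling’s final step, assuming a greyscale high-resolution render held as a plain nested list (a real pipeline would operate on RGBA and might use a fancier filter than a box average):

```python
# Minimal sketch of 4x4 box-filter downsampling: average each 4x4 block of a
# 64x64 high-resolution render into one pixel of the final 16x16 image.

def downsample(hi, factor=4):
    """Average factor x factor blocks of a 2D greyscale grid of ints."""
    size = len(hi) // factor
    lo = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            total = 0
            for dy in range(factor):
                for dx in range(factor):
                    total += hi[y * factor + dy][x * factor + dx]
            # Each output pixel is the average of factor*factor samples
            # (256 samples for 16x16 supersampling, 16 for 4x4).
            lo[y][x] = round(total / (factor * factor))
    return lo

# A toy 64x64 "render": left half white (255), right half black (0).
hires = [[255 if x < 32 else 0 for x in range(64)] for _ in range(64)]
lores = downsample(hires)
print(len(lores), len(lores[0]))  # 16 16
```

Because the edge in this toy image falls exactly on a block boundary, the downsampled result stays perfectly crisp; an edge at, say, x=30 would instead produce a column of grey pixels, which is exactly the softness being discussed.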
I think there are fancier ways of doing this, and of getting GPU support, but I can’t think of the terms now. Hopefully someone else will chip in.
But if you’re aware of the ways in which most renderers do an inferior job, you can work around them and craft your SVG so that it won’t be affected: things like making sure that objects align with pixel boundaries, and that no pixel is ever partially covered by more than one object.
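As an illustration of the pixel-alignment point (a hypothetical sketch, with made-up helper names), you can generate a 16×16 icon whose coordinates are all snapped to whole pixels, so no shape edge forces the rasteriser to anti-alias:

```python
# Hypothetical sketch: build a 16x16 SVG icon whose rectangles sit exactly on
# integer pixel boundaries, so every shape covers whole pixels only.

def aligned_rect(x, y, w, h, fill):
    # Snap everything to whole pixels; fractional coordinates would make the
    # rasteriser blend the shape's edges with whatever is underneath.
    assert all(isinstance(v, int) for v in (x, y, w, h)), "align to pixel grid"
    return f'<rect x="{x}" y="{y}" width="{w}" height="{h}" fill="{fill}"/>'

icon = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16">'
    + aligned_rect(0, 0, 16, 16, "#fff")  # background fills the whole canvas
    + aligned_rect(2, 2, 12, 3, "#036")   # a crisp horizontal bar
    + "</svg>"
)
print(icon)
```

The same discipline applies to strokes: a 1px stroke centred on an integer coordinate straddles two pixels, so you’d offset it by half a pixel or use fills instead.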
I wonder if it is possible to have an ML filter that downsamples in a way that is aesthetically pleasing, even when things are not aligned to pixel boundaries.
What you’re describing sounds like automated hinting. It’s a nice idea, but in practice it’s going to mess up too much stuff to be worth it, even if performance weren’t a problem.
Machine learning is singularly well-suited to superresolution, because you can trivially downsample any image and train on the inverse operation, so generating training data at scale is really easy.
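The training setup really is that simple; here’s a hypothetical sketch of the data-generation side, producing (input, target) pairs where the input is mechanically downsampled and the target is the original:

```python
# Hypothetical sketch of superresolution training data: take any
# high-resolution image, mechanically downsample it, and the
# (downsampled, original) pair is a free training example -- no labelling.
import random

def box_downsample(img, factor=2):
    size = len(img) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor))
             // (factor * factor)
             for x in range(size)]
            for y in range(size)]

def make_training_pair(hires):
    # Model input is the degraded image; the target is the original.
    return (box_downsample(hires), hires)

random.seed(0)
hires = [[random.randrange(256) for _ in range(32)] for _ in range(32)]
inp, target = make_training_pair(hires)
print(len(inp), len(target))  # 16 32
```

Hinting has no such free supervision signal: there is no mechanical operation whose inverse is “the aesthetically pleasing 16×16 version”, which is the asymmetry the next point turns on.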
But pleasing hinting—even operating on raster rather than vector sources—can’t be trained in the same way, because it’s inherently more subjective; it may be possible to come up with an alternative approach to training, but I suspect it’ll still be much more prone to inducing significant errors.
And of course, performance in all these things is such that they’re not going to be shipped in browsers; from your link, that model is thousands of times more expensive than the (admittedly-inferior) alternatives. (What’s their disk space like? I’m not familiar with how big such ML models end up.)