> Perhaps a better approach would be building an open source www index or even a full current cache - as an enabler for people to build their own search engines?
That's an excellent idea! It's in the spirit of open data, and people can do with it what they want.
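To make the "enabler" point concrete: once the crawling and caching problem is solved for you, a toy search engine is mostly an inverted index plus ranking. A rough sketch, where `cache.jsonl` and its fields are just placeholders for whatever format such an open cache might publish:

```python
# Toy sketch, not production code: build a minimal AND-search engine
# over an open cache of pages given as (url, text) pairs.
import json
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages:
        for term in set(tokenize(text)):
            index[term].add(url)
    return index

def search(index, query):
    """Return URLs containing every query term (no ranking here)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

if __name__ == "__main__":
    # "cache.jsonl" is hypothetical: one {"url": ..., "text": ...} record per line.
    with open("cache.jsonl") as f:
        pages = [(rec["url"], rec["text"]) for rec in map(json.loads, f)]
    idx = build_index(pages)
    print(search(idx, "how to tie a tie"))
```

The hard, expensive part is producing and refreshing the cache itself, which is exactly what an open index project would take off people's hands.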
I think this is a great idea. How does this work with copyright? Search engines seem to be able to download and reproduce content from scraped pages (and wrap it in ads, and derive content from it); it's called "indexing" when they do it, but "scraping" when everyone else does it.
E.g. on Google, if you search for "how to tie a tie", a little info box may pop up with step-by-step instructions. That content is taken from some website, but the website gets no page hits or ad revenue. Instead, Google gets to serve ads on the search results page.
(I don't know if this happens for this specific example, but Google does this for some searches)
Part of why sites participate in the infobox program is that in practice you do get quite a lot of hits from it: many people click through to see the answer in context.
I think they're referring to how Google "extracts" answers from your website and shows it on the search results page. Effectively meaning that the user doesn't even need to go to your site to get the answer, because Google extracted it and gave it to them directly.
It seems to me that what they usually extract is some junk only vaguely related to the query and often cut apart and reassembled in a way that's just wrong.