> Perhaps a better approach would be building an open source www index or even a full current cache - as an enabler for people to build their own search engines?
That's an excellent idea! It's in the spirit of open data, and people can do with it what they want.
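To make the "enabler" point concrete: once the crawling and caching problem is solved for you, a toy search engine is mostly an inverted index plus ranking. A rough sketch, where `cache.jsonl` and its fields are just placeholders for whatever format such an open cache might publish:

```python
# Toy sketch, not production code: build a minimal AND-search engine
# over an open cache of pages given as (url, text) pairs.
import json
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages:
        for term in set(tokenize(text)):
            index[term].add(url)
    return index

def search(index, query):
    """Return URLs containing every query term (no ranking here)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

if __name__ == "__main__":
    # "cache.jsonl" is hypothetical: one {"url": ..., "text": ...} record per line.
    with open("cache.jsonl") as f:
        pages = [(rec["url"], rec["text"]) for rec in map(json.loads, f)]
    idx = build_index(pages)
    print(search(idx, "how to tie a tie"))
```

The hard, expensive part is producing and refreshing the cache itself, which is exactly what an open index project would take off people's hands.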
I think this is a great idea. How does this work with copyright? Search engines seem to be able to download and reproduce content from scraped pages (and wrap it in ads, and derive content from it); it's called "indexing" when they do it, but "scraping" when everyone else does it.
E.g. on Google, if you search for "how to tie a tie", a little info box may pop up with step-by-step instructions. That content is taken from some website, but the website gets no page hits or ad revenue. Instead, Google gets to serve ads on the search results page.
(I don't know if this happens for this specific example, but Google does this for some searches)
Part of why sites participate in the infobox program is that in practice you do get quite a lot of hits from it: many people click through to see the answer in context.
I think they're referring to how Google "extracts" answers from your website and shows it on the search results page. Effectively meaning that the user doesn't even need to go to your site to get the answer, because Google extracted it and gave it to them directly.
It seems to me that what they usually extract is some junk only vaguely related to the query and often cut apart and reassembled in a way that's just wrong.