Fair enough but there is now zero need to load them from the AMP cache at all - ...

jsnell · on April 17, 2019

If you're loading the content from the originating site, surely there's no benefit at all to signing. If you're loading the content directly from the site, the browser just needs TLS to verify the integrity of the content.

And you're also back to the situation where you can't preload the content in a controlled manner or privacy-preserving manner, nor have the page-speed guarantees since the version being served to the user is not the version that Google crawled.

It's kind of the opposite. The cache is where the actual benefits come from. That's not the part you want to get rid of. The AMP spec was just a vehicle for making the caching possible in a secure manner.

This model would theoretically allow the validation, caching and prefetching to be done for all (signed, so opt-in by the publisher) HTML pages. Which is another one of the historical top complaints about AMP: why can't light, fast-loading, mobile-friendly HTML get the same treatment in search results.

> Can you maybe see that people feel the browser is now lying to them about where the content is coming from?

I can see that they are feeling like that, I just don't understand how they arrived there.

How is this different from a e.g. company X's website being behind Cloudflare? The browser didn't contact the actual server that company X hosted the content on. Instead the browser contacted a server run by Cloudflare that could prove cryptographically (via TLS) that it was authorized to serve content on behalf of the actual site.

AlexandrB · on April 17, 2019

> And you're also back to the situation where you can't preload the content in a controlled manner or privacy-preserving manner...

A few people have pointed out the privacy-preserving aspect of AMP. I'm not sure I get how that's the case. Is this referring to the fact that the page is not being pre-loaded from the content owner's own webserver? The main privacy violators on the internet are Google and Facebook. How is loading something from Google cache protecting my privacy?

Worse still, if someone posts an amp link on Twitter or a chat client Google now gets to know when I access a specific website even though they are an unrelated third party[1].

Edit: [1] In practice this was probably already the case since Google Analytics is so popular. But still.

gregable · on April 17, 2019

Good question.

If you make a search query, but have not clicked on any results, you have a privacy expectation that the web servers of the search results you have not clicked on will not know you performed this query, your ip address, cookie, etc. For example, if you search for [headache] and then close the window, mayoclinic.com knowing that you made this query would probably be a surprising result.

With naive preloading, you would preload a search result from that origin. Your browser would make an HTTP request to the site and that site (sending an ip address, the URL you are preloading, and any cookies you may have set on that origin). So, this approach would violate your expectation of privacy.

Instead, if the page is delivered from Google's own cache, the HTTP request goes to Google instead of the publisher. Google already knows that you have made this query, and are going to preload it (the search results page instructed your browser to do so in the first place). The request will not have any cookies in it except for Google's origin cookies, which Google already knows as well. Therefore this type of preload does not reveal anything new about you to any party, even Google.

AMP has been doing this for a long time in order to preload results before you click them. However, until Signed Exchanges the only way to do this was that on click the page would need to be from a Google owned cache URL (google.com/amp/...). With Signed Exchanges, that can be fixed. The network events are essentially the same.

Note that once the page has been clicked on, the expectation of privacy from the publisher is no longer there. The page itself can then load resources directly from the publishers origin, etc.

To your last point, if someone posts a link on twitter to an AMP page on a publisher domain, and then you click it, your browser will make a network request to the publisher's origin. Google will not be involved in this transaction in any way. If someone explicitly posts a link to an Google AMP Cache Signed Exchange, then yes this will trigger a request to Google but this will be far less likely going forward as these URLs will never be shown in a browser. For example, try loading https://amppackageexample-com.cdn.ampproject.org/wp/s/amppac... using Chrome 73 or later. This is a signed exchange from one domain being delivered from another. You'll never see that URL in the URL bar for more than a moment, so it's unlikely to ever be shared, like I'm doing now.

AlexandrB · on April 17, 2019

Thanks, this was very informative. I'm not a fan of AMP at all, but this helps me understand the reasoning a little bit better and why Google hosting the AMP cache is necessary for preserving privacy.

At its root, I think my objections to AMP boil down to a few things:

On a technical level:

1. It's buggy and weird on iOS.

2. I'm not convinced I care about a few seconds of loading time enough to justify the added complexity of making this kind of prefetching possible. Additionally, this seems like a stop-gap that will be rendered unnecessary by increasingly wide pipes for data.

On a philosophical level:

3. It gives Google way too much power over content.

4. I want the option to turn it off completely because of points [1] and [3], and because I fundamentally want to feel in control of my internet experience.

Edit: The point about SXG making AMP URLs less likely to get copy/pasted to other mediums is a key benefit I hadn't considered and will likely make avoiding AMP outside of Google search easier.

londons_explore · on April 17, 2019

2. How many URL's do you load in a day? My browsing history over the last 10 years averages to 417 pages per day. 2 seconds per URL is 35 days of my life...

I totally want that time saved if possible.

andy_ppp · on April 17, 2019

Making everything faster won't give you more time.

themacguffinman · on April 17, 2019

That's literally not true.

It looks like you were trying to make some deeper philosophical point, but you'll have to be clearer because your statement makes no sense.

gregable · on April 17, 2019

Bandwidth increases do not fix latency. If a document has to round trip from the other side of the planet, that adds about 200 milliseconds until we break the speed of light. If that same document must make several round trips to be able to initially load (very common!) this adds up rather quickly. The only solutions are localized caching and prefetching.

andy_ppp · on April 17, 2019

Yes, exactly why people think this is creepy. I also expect you not to start using battery rendering shit I haven’t asked you to in the background or data that again you don’t have permission to use. Just because the majority of users don’t care doesn’t mean you gregable are not corrupting the foundation of a free web. I still feel you’re making the web super creepy, grabbing extra data and the whole project focused could be accomplished without this embrace and extend - derank slow pages more aggressively doesn’t lead to a two tier web and doesn’t tie everyone further into Google’s brain washing algorithms. But this “solution” to the problem at least for now Chrome doesn’t visit Google for these new style links from elsewhere so at least that is some improvement. The fact this whole project should not exist and adds zero value and I can’t opt out is a massive problem for me.

jsnell · on April 17, 2019

If the browser were to prefetch search results, it would leak information to all the result pages about the user having done that search. (I once had a blog post accidentally rank on the first page for "XXX". I really don't want to know who is searching for that particular term.)

Google has to know what you're searching for to compute and show the results. So there are few additional privacy implications from the preload.

And your last case is exactly what will no longer happen. People will now copy-paste the original URL rather than the cache URL. Click on the link, and you're taken to the original site.

andy_ppp · on April 17, 2019

Then don’t cache stuff that you haven’t told the user you are caching from third party sites.

andy_ppp · on April 17, 2019

> If you're loading the content from the originating site, surely there's no benefit at all to signing. If you're loading the content directly from the site, the browser just needs TLS to verify the integrity of the content.

The browser security model stops them from doing this, but presumably in this new world they could allow this to work and not host the content in the carousel themselves.

I think the argument about content suddenly becoming "slow" and no longer AMP validated if it's not served from the AMP cache is a poor one.

Finally I'm willing to postpone judgement but I did just explain why people feel that Google is embracing and extending the web if you can't understand why people are worried about this that's not something I can help you with ;-)

Cloudflare does not have the same scope, power, monopoly or scale that Google have - I can change CDN provider if they start doing weird stuff, no problem, but I can never really get away from Google.