Correct me if I'm wrong, but this shows that of the 5.06 billion visits in the last 90 days, 31.3% came from Windows and 1.1% from GNU/Linux. If we assume that both groups visit government websites equally often, then for every GNU/Linux user there are (31.3/1.1) = 28.5 Windows users. Scary stuff.
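For anyone who wants to poke at that assumption, here is a rough sketch of the arithmetic. The function name and the visits-per-user parameters are made up for illustration; only the 31.3% and 1.1% figures come from the comment above.

```python
def user_ratio(share_a, share_b, visits_per_user_a=1.0, visits_per_user_b=1.0):
    """Estimate users of group A per user of group B from visit shares.

    Assumption: visits = users * visits_per_user, so users are
    proportional to share / visits_per_user. The defaults encode the
    "both groups visit equally often" assumption.
    """
    users_a = share_a / visits_per_user_a
    users_b = share_b / visits_per_user_b
    return users_a / users_b


# Equal visit frequency: the 28.5 figure from the comment.
print(round(user_ratio(31.3, 1.1), 1))  # 28.5

# Hypothetical: if Linux users visited twice as often, the same visit
# shares would imply about twice as many Windows users per Linux user.
print(round(user_ratio(31.3, 1.1, visits_per_user_b=2.0), 1))  # 56.9
```

The point is just that the 28.5 number moves linearly with whatever you assume about relative visit frequency.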
1.1% is way too high. I bet they didn't filter out all the scrapers that poll gov't websites.
Hmm I wonder if just parsing the HTML still works like it did 8 years ago when I had to scrape the USPS: https://github.com/NavinF/USPS-scraper/blob/master/USPS_scra...
As long as the USPS only allows API requests from browsers (rather than supporting the much more common case where you need to update the status of every tracking number in a database), people will still have to scrape their website while pretending to be a browser.
Oh they had an API 8 years ago too. It’s just that they only let you use that API from JavaScript running on your users’ browsers.
The tiny, undocumented rate limits and the threat of bans for server-side API users (while no such rate limits applied to the HTML pages) forced pretty much every app to scrape their HTML server-side.
From my README which quotes their old docs: “Note: The United States Postal Service expressly prohibits the use of Web Tools "scripting" without prior approval. Web Tools scripting can be defined as a technique to generate large volumes of Web Tools XML request transactions that are database- or batch-driven under program control, instead of being driven by individual user requests from a web site or a client software package. The USPS reserves the right to suspend server access without notification by any offending party that does not have prior approval for Web Tools scripting. Registered Web Tools customers that believe they have a legitimate requirement for Web Tools scripting should contact the ICCC to request approval.”
For what it's worth, Linux is overrepresented in the (fake) User-Agent strings of the bots that attack my web servers. Most of them probably are indeed on Linux, since they are predominantly scripts running on cloud providers. :)
That site shows 113.1 million visits in the last 90 days, with 47.2% from Windows and 1.2% from Linux. (47.2/1.2) = 39.3 Windows users for every Linux user.
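To put the two sources side by side, here is a small sketch that turns the percentages quoted in this thread into approximate visit counts. The dataset labels and the helper name are placeholders of mine; the totals and shares are the ones quoted above, and none of this corrects for bots or scrapers.

```python
def visits_for_share(total_visits, share_percent):
    """Approximate number of visits for an OS given its percentage share."""
    return total_visits * share_percent / 100.0


# (total visits in 90 days, Windows %, Linux %) as quoted in the thread.
datasets = {
    "first source": (5.06e9, 31.3, 1.1),
    "second source": (113.1e6, 47.2, 1.2),
}

for name, (total, windows_pct, linux_pct) in datasets.items():
    windows = visits_for_share(total, windows_pct)
    linux = visits_for_share(total, linux_pct)
    print(
        f"{name}: ~{windows / 1e6:.1f}M Windows visits, "
        f"~{linux / 1e6:.1f}M Linux visits, "
        f"ratio {windows / linux:.1f}"
    )
```

The ratios come out the same as dividing the percentages directly; the absolute counts are only there to show how different the sample sizes are.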
I agree that it is skewed in a number of ways. I just wanted to estimate a lower bound.
Thank you for pointing out a better source of data for my use case.