
How do you distinguish crawlers from regular visitors using a whitelist? As stated in the article, the crawlers show up with seemingly unique IP addresses and seemingly real user agents. It's a cat-and-mouse game.

Only if you operate at the scale of Cloudflare etc. can you see which IP addresses are hitting a large number of servers in a short time span.
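
For what it's worth, once you do have the combined logs, that correlation itself is simple. A minimal sketch in Python (WINDOW, MAX_HOSTS, and record are made-up names and thresholds, not anything Cloudflare actually uses):

    from collections import defaultdict, deque
    import time

    WINDOW = 60      # seconds to look back (assumed)
    MAX_HOSTS = 20   # distinct servers per IP before flagging (assumed)

    hits = defaultdict(deque)  # ip -> deque of (timestamp, host)

    def record(ip, host, now=None):
        """Record one request; return True if the IP looks like a crawler."""
        now = time.time() if now is None else now
        q = hits[ip]
        q.append((now, host))
        # Drop entries that have aged out of the window.
        while q and now - q[0][0] > WINDOW:
            q.popleft()
        # One IP touching many unrelated servers in a minute is suspect.
        return len({h for _, h in q}) > MAX_HOSTS

The hard part isn't the code, it's being in a position to see requests across many unrelated sites in the first place.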

(I am pretty sure the next step will be handing out N free LLM requests per month in exchange for user machines doing the scraping, if blocking gets more successful.)

I fear the only solutions in the end are CDNs, making visits expensive using challenges, or requiring users to log in.
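
To illustrate the "making visits expensive" option: a hashcash-style proof-of-work challenge costs the visitor a few hundred thousand hashes per page while the server pays only one hash to verify. A rough sketch (Python; DIFFICULTY is an arbitrary placeholder, and the function names are mine, not from any particular challenge system):

    import hashlib, itertools, os

    DIFFICULTY = 20  # leading zero bits required (assumed; tune per load)

    def make_challenge():
        # Server: issue a random nonce with the page request.
        return os.urandom(16).hex()

    def leading_zero_bits(digest):
        bits = bin(int.from_bytes(digest, "big"))[2:].zfill(len(digest) * 8)
        return len(bits) - len(bits.lstrip("0"))

    def solve(challenge):
        # Client: brute-force a nonce; cheap once, expensive at crawl scale.
        for nonce in itertools.count():
            digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
            if leading_zero_bits(digest) >= DIFFICULTY:
                return nonce

    def verify(challenge, nonce):
        # Server: a single hash to check the submitted solution.
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        return leading_zero_bits(digest) >= DIFFICULTY

The asymmetry is the point: a human loading a handful of pages never notices, while a crawler fetching millions of pages pays for every one.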


