
Nah, no official API, just parsing the site. It's possible that they'd choose to block it, though I've gone out of my way to give them no reason to. PadMapper's CL crawler is built to be as light as possible on their servers, and the crawling is constrained to a very small subset of the site. If it gets blocked, PM has other sources, but it would definitely suck.

If you tried to make something that crawled the entire site, I would imagine that they would block you pretty quickly, though, if only for the load it would cause.
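For anyone wondering what "as light as possible" and "a very small subset of the site" might look like in practice, here is a minimal sketch. It is not PadMapper's actual code. The class name `PoliteCrawler`, the host, the path prefix, and the delay value are all hypothetical, and the `fetch` function is injected so the scheduling logic can be tried without touching a real server.

```python
import time
from urllib.parse import urlparse


class PoliteCrawler:
    """Fetch only allow-listed paths on one host, spacing requests apart.

    Hypothetical illustration: names and defaults are assumptions,
    not taken from any real crawler.
    """

    def __init__(self, host, allowed_prefixes, min_delay_s, fetch,
                 clock=time.monotonic, sleep=time.sleep):
        self.host = host
        self.allowed_prefixes = tuple(allowed_prefixes)
        self.min_delay_s = min_delay_s
        self.fetch = fetch          # callable(url) -> response body
        self.clock = clock
        self.sleep = sleep
        self._last_request = None   # time of the previous request, if any
        self._seen = set()          # URLs already fetched, never re-requested

    def in_scope(self, url):
        """True only for URLs on our host under an allowed path prefix."""
        parts = urlparse(url)
        return (parts.netloc == self.host
                and parts.path.startswith(self.allowed_prefixes))

    def get(self, url):
        """Fetch url if it is in scope and new; otherwise return None."""
        if not self.in_scope(url) or url in self._seen:
            return None
        if self._last_request is not None:
            # Wait out whatever remains of the minimum delay.
            wait = self.min_delay_s - (self.clock() - self._last_request)
            if wait > 0:
                self.sleep(wait)
        self._last_request = self.clock()
        self._seen.add(url)
        return self.fetch(url)
```

The two ideas are the allow-list (anything outside the chosen section is never requested) and the minimum delay between requests, which caps the load no matter how many URLs are queued.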



Any particular libraries or techniques you used for the crawling / parsing, or did you just code up something from scratch for each site you crawl? (I think I noticed you even do Kijiji.)


All of it is from scratch. Yep, I do Kijiji, but only in Canada, because I was told it's more important than CL in some parts there.



