
I disagree. The deduping enables low-cost snapshot semantics which has drastically simplified my life.

Every 24 hours, I dump a database and let tarsnap work out what to actually send to S3. And it does such a good job, for such a low price, that it boggles the mind.
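
For concreteness, a minimal sketch of that kind of nightly job in Python (the database name, paths, and archive naming are illustrative, and it assumes pg_dump and an already-configured tarsnap key):

    #!/usr/bin/env python3
    """Nightly dump-and-archive job (illustrative; run it from cron)."""
    import datetime
    import subprocess

    DB_NAME = "mydb"                     # hypothetical PostgreSQL database
    DUMP_PATH = "/var/backups/mydb.sql"  # same path every night on purpose:
                                         # tarsnap dedups against prior archives

    def main():
        # Dump to a stable path so unchanged rows produce mostly-identical
        # bytes from night to night, which is what deduplication feeds on.
        with open(DUMP_PATH, "w") as f:
            subprocess.run(["pg_dump", DB_NAME], stdout=f, check=True)

        # Create a new archive; tarsnap only uploads blocks it has not
        # already stored in any previous archive.
        archive = "mydb-{}".format(datetime.date.today().isoformat())
        subprocess.run(["tarsnap", "-c", "-f", archive, DUMP_PATH], check=True)

    if __name__ == "__main__":
        main()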



Fair enough, but I'm reacting to the unfounded claim that you can just divide it by "orders of magnitude" when the original post doesn't contain enough information to support that, and indeed contains enough information to suggest it's probably not true. If you have 100GB of server-type service data, you're not looking at "orders [plural] of magnitude" less space taken up on the backup service. The huge multipliers that dedupe is sometimes credited with apply to certain datasets; you do not get "orders of magnitude" shrinking in the general case.


The claim is based on personal experience.

With Mixpanel I'd expect even better results, since their data is append-only by nature. For instance, think about how deduplication behaves when backing up a large append-only log file on a daily basis.
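
To make the append-only point concrete, here is a toy sketch using fixed-size chunks and SHA-256 hashes (much cruder than tarsnap's actual variable-size, content-dependent blocking, but it shows why a growing log costs almost nothing in new backup data):

    import hashlib
    import os

    CHUNK = 64 * 1024  # toy fixed-size chunks

    def chunk_hashes(data):
        """Return the set of hashes of CHUNK-sized pieces of data."""
        return {hashlib.sha256(data[i:i + CHUNK]).hexdigest()
                for i in range(0, len(data), CHUNK)}

    # Day 1: back up a 10 MB append-only log; every chunk is new.
    log = os.urandom(10 * 1024 * 1024)
    stored = chunk_hashes(log)
    print("day 1 chunks uploaded:", len(stored))                      # 160

    # Day 2: 256 KB of entries were appended; only those chunks are
    # missing from the store, so only they would be uploaded.
    log += os.urandom(256 * 1024)
    print("day 2 chunks uploaded:", len(chunk_hashes(log) - stored))  # 4

Fixed-size chunks only get away with this because the data is append-only; an insertion in the middle would shift every later boundary, which is roughly why real deduplicating backup tools (tarsnap included) split data into variable-size, content-dependent blocks.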

Tarsnap has weaknesses, but the cost of storage for a Mixpanel-type workload is not one of them.


Note that Mixpanel is talking about full dumps. Tarsnap does cross-backup deduplication, so while your first 100GB dump may take 80GB, the next may take only 100MB.
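
If you want to check this on your own data, tarsnap's --print-stats reports, after each run, how much new (deduplicated, compressed) data the archive actually added. A rough sketch, with a hypothetical dump directory:

    import subprocess

    DUMP_DIR = "/var/backups/dumps"  # hypothetical path to the full dumps

    # Archive the same dump directory twice in a row: the stats for the
    # second run show that almost no new data had to be stored, since
    # every block already exists from the first archive.
    for name in ("dump-run1", "dump-run2"):
        subprocess.run(
            ["tarsnap", "-c", "-f", name, "--print-stats", DUMP_DIR],
            check=True,
        )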



