
We stored hundreds of petabytes on cheap SATA drives with random fragment placement, using Reed-Solomon 6+3 coding (half the space of three replicas but the same durability). Never lost a byte.
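If you want to sanity-check the "half the space, same durability" claim, here's a back-of-the-envelope sketch. The per-fragment failure probability p is purely illustrative (not the original system's measured numbers), and fragments/replicas are assumed to fail independently before repair:

    # Illustrative comparison of 6+3 Reed-Solomon vs. 3x replication,
    # assuming each fragment/replica independently fails with probability p
    # before it can be repaired. p is a made-up number for illustration.
    from math import comb

    def stripe_loss(p, data=6, parity=3):
        # A 6+3 stripe is lost only if more than `parity` of its 9 fragments fail.
        n = data + parity
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(parity + 1, n + 1))

    def replica_loss(p, copies=3):
        # A 3-replica object is lost only if all copies fail.
        return p**copies

    p = 1e-3
    print(stripe_loss(p))    # ~1.3e-10 (need 4 of 9 fragments gone)
    print(replica_loss(p))   # 1e-9
    # Storage overhead: 9/6 = 1.5x for the stripe vs. 3x for replication.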

Speed of recovery is crucial, because that's your window of vulnerability to multiple failures. For example, try RAID 5 on giant drives: losing a second drive during recovery becomes very likely.
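To put rough numbers on that window (mine, purely illustrative): an outright second whole-drive failure during a two-day rebuild is rarer than it feels, but on 10 TB-class drives the rebuild has to read so many bits that a single unrecoverable read error, which also kills a RAID 5 rebuild, is almost certain at the commonly quoted 1-in-10^14 spec:

    # Rough sketch of the RAID 5 rebuild-window risk, assuming independent
    # failures and typical published specs (my numbers, not the poster's).
    from math import expm1, log1p

    def second_drive_failure_prob(surviving_drives, rebuild_hours, mtbf_hours=1_000_000):
        # P(at least one surviving drive fails outright before the rebuild
        # finishes), with exponentially distributed drive lifetimes.
        return -expm1(-surviving_drives * rebuild_hours / mtbf_hours)

    def ure_during_rebuild_prob(surviving_drives, drive_tb, ure_per_bit=1e-14):
        # A rebuild reads every surviving drive end to end, so one
        # unrecoverable read error (URE) anywhere fails it.
        bits_read = surviving_drives * drive_tb * 1e12 * 8
        return -expm1(bits_read * log1p(-ure_per_bit))

    # 8-drive RAID 5 of 10 TB drives, ~48 hour rebuild:
    print(second_drive_failure_prob(7, rebuild_hours=48))  # ~3e-4
    print(ure_during_rebuild_prob(7, drive_tb=10))         # ~0.996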



No need to be rude. EDIT: The offensive part was removed

What was the probability of failure of your drives? My guess is you just didn't hit the threshold for your failure rate. The maths checks out (PhD here). Seriously, do the calculation.


We lost drives all the time. In fact we moved so much data that we needed checksums to catch the ~1e-13 undetected data errors.

We seriously did do the calculations (done by serious PhDs) and we seriously did not lose data.

I’m sure you are imagining a system that doesn’t work. But that doesn’t mean only a RAID-like setup can work.

And by the way, could you explain how to calculate the chance of data loss without taking recovery time into account?


To clarify, the assumptions I'm making for the calculation are:

1) a fixed probability of a server failing

2) a fixed erasure coding scheme used for all files

3) uncorrelated server failures

4) each erasure fragment is stored on a random server


It boils down to the following:

You can calculate a probability L of losing a given file.

Because we've assumed totally uncorrelated failures, this is the same for all files, and the probability of losing NO files out of T files is (1 - L)^T.

As you can see, this approaches 0, meaning Pr(losing at least one file) approaches 1 as T increases.

Using the probability of file loss quoted for Sia, which I would say is too low, but let's ignore that: they get L = 10^-19.

This leads to T = ~10^19 files before you expect to lose data. If you're erasure coding at the byte level, that's 10 exabytes.
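To make that concrete, here's a quick numerical sketch of 1 - (1 - L)^T at the quoted L. The only thing beyond the assumptions above is using log1p/expm1, since 1 - 1e-19 rounds to 1.0 in ordinary floating point:

    # Sketch of the threshold argument, assuming independent per-file loss
    # with the quoted L = 1e-19.
    from math import expm1, log1p

    def prob_any_loss(L, T):
        # P(lose at least one of T files) = 1 - (1 - L)^T,
        # computed via log1p/expm1 because 1 - L underflows to 1.0.
        return -expm1(T * log1p(-L))

    L = 1e-19
    for T in (1e12, 1e15, 1e18, 1e19, 1e20):
        print(f"T = {T:.0e} files: P(any loss) = {prob_any_loss(L, T):.3g}")
    # ~1e-7 at 10^12 files, ~0.1 at 10^18, ~0.63 at 10^19, ~1.0 at 10^20:
    # negligible well below the threshold, near-certain above it.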

I expect your probability of failure is much lower than that of random nodes on a distributed global network of volunteers. So yes, ~petabyte scale is below the threshold, but there is a threshold.



