Who the hell is going to be running a multi-petabyte data array? NO ONE. They are not using that much space.
I was off by a bit. Okay, so let's do the math on the largest enterprise disk you can get (300GB Hitachi Ultrastar). With RAID 5 (which offers the cheapest redundancy; many places skip it in favor of RAID 10, which requires double the disk space for full mirroring), that's 6,640,000 MB for a 4 TB array, which would hold 16,000 users at 250MB, assuming there are no 100MB users at all (unlikely), or 40,000 non-paying users at 100MB. Sure, neither number is currently even close, but over the next year or so it very well could be. (And there WILL be users who use more than 250MB, especially as time goes on and they don't feel like archiving it somewhere and taking it offline.)
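For anyone checking the user counts, here is a rough sketch of the division. It assumes decimal units (1 TB = 1,000,000 MB) and that every user fills their entire quota; the function name is made up for illustration.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units, as drive vendors count them


def users_supported(array_tb: int, quota_mb: int) -> int:
    """How many users fit if each one fills their whole quota."""
    return (array_tb * MB_PER_TB) // quota_mb


paid_users = users_supported(4, 250)  # everyone on the 250MB paid quota
free_users = users_supported(4, 100)  # everyone on the 100MB free quota

print(f"4 TB at 250MB each: {paid_users:,} users")
print(f"4 TB at 100MB each: {free_users:,} users")
```

Real usage would land somewhere between the two, since most users never fill their quota and the mix of paid and free accounts is not given here.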
Okay, so let's do the math on the drives. I found a steal on them, free second-day shipping, $925.00 apiece. That's assuming they have U320 SCSI rather than Fibre Channel, which is more common in enterprise storage.
That's 23 drives, for a total cost of $21,275.
They would have to sell 2128 accounts to pay for the drives alone, which will be mostly occupied by users who aren't paying extra for it.
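The drive math can be sketched like this. The $10 account price is an assumption inferred from the break-even figure, and the RAID 5 rule used (one drive's worth of space lost to parity) is the usual one; all names are hypothetical.

```python
import math

DRIVE_COUNT = 23
DRIVE_PRICE = 925.00   # dollars per drive, as quoted
DRIVE_SIZE_GB = 300
ACCOUNT_PRICE = 10.00  # assumption: dollars per paid account


def raid5_usable_gb(drives: int, size_gb: int) -> int:
    """RAID 5 keeps one drive's worth of capacity for parity."""
    return (drives - 1) * size_gb


drive_cost = DRIVE_COUNT * DRIVE_PRICE
usable_gb = raid5_usable_gb(DRIVE_COUNT, DRIVE_SIZE_GB)
# Round up: a fraction of an account doesn't pay for anything.
accounts_needed = math.ceil(drive_cost / ACCOUNT_PRICE)

print(f"Drive cost: ${drive_cost:,.2f}")
print(f"Usable space in one RAID 5 set: {usable_gb:,} GB")
print(f"Accounts to cover the drives: {accounts_needed:,}")
```

This treats all 23 drives as a single RAID 5 set with no hot spares, which a real deployment probably wouldn't do.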
> The controllers of these drives far outprice the drives themselves.
Another one time cost which really isn't that high.
A used system on eBay is $38,975, and that includes a paltry 384GB of usable disk space, and I don't even think it's capable of taking the drives I quoted above.
So now we're up to $60,250, on the mythical idea that they'd want to buy a system without support and try to cram drives into it that it wasn't designed to take. I'm shooting low here on purpose (mostly because I've never seen NetApp's quotes for a 4TB filer).
That's now 6025 accounts to subsidize this machine at paid-user-only levels.
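Putting the drives and the used filer together, a rough sketch of the running total looks like this. The $10 account price is again an assumption inferred from the figures, and the variable names are just for illustration.

```python
import math

DRIVE_COST = 23 * 925.00  # 23 drives at the quoted price
FILER_COST = 38_975.00    # used system, as quoted
ACCOUNT_PRICE = 10.00     # assumption: dollars per paid account

hardware_total = DRIVE_COST + FILER_COST
# Round up so a partial account still counts as one more sale needed.
accounts_needed = math.ceil(hardware_total / ACCOUNT_PRICE)

print(f"Hardware total: ${hardware_total:,.2f}")
print(f"Accounts to cover it: {accounts_needed:,}")
```

This covers one-time hardware only: no support contract, no power, no rack space.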
Furthermore, the costs of transfer are not included in your equation.
Costs of transfer, you mean spending a few minutes (at MOST) transferring data over gigabit ethernet? Costs practically nothing.
Real datacenters charge by your 95th percentile. An average website comes in under 1MB at the 95th percentile. I would put LiveJournal somewhere around the 50MB/sec mark, 95th percentile. LiveJournal apparently gets around 100,000 queries per second, according to their presentation, estimating 1k of transfer per hit, which is a conservative number.
That's $2,500/mo for the line, as it stands, with no further traffic, estimating a really cheap price of $50/mb for 50MB commit on a 100MB burstable connection.
That places a lowball figure of $30,000/year for bandwidth, which would go up. Yes, LJ currently pays for this. We'll estimate that it brings up traffic another 10 MB/sec, which is another $500/mo. Not much, you're right, but it adds up, as they must add this into their infrastructure costs as well.
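The bandwidth figures work out as in the sketch below. It assumes a flat rate of $50 per committed unit per month and ignores burst overage charges; the units are whatever the quoted price is in, and the function name is hypothetical.

```python
PRICE_PER_UNIT = 50  # dollars per committed unit per month, the quoted rate


def monthly_cost(commit_units: int) -> int:
    """Flat commit pricing: pay per unit committed, no overage modeled."""
    return commit_units * PRICE_PER_UNIT


current_monthly = monthly_cost(50)  # estimated current 95th percentile
current_yearly = current_monthly * 12
extra_monthly = monthly_cost(10)    # guessed extra traffic from more storage
extra_yearly = extra_monthly * 12

print(f"Current line: ${current_monthly:,}/mo, ${current_yearly:,}/yr")
print(f"Extra traffic: ${extra_monthly:,}/mo, ${extra_yearly:,}/yr")
```

Real contracts usually price overage above the commit differently, so treat this as a floor rather than a quote.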
Add in the costs of a credit card authorization (for me it's $0.20 plus 2.20%, so there's about $0.50 of the $10 right there).
Wow, a tiny percentage for CC charges, oh my!
For our 6000 users, that totals up to be $3000 you can't count on.
(Which drags the minimum users back up a bit)
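A rough sketch of the card-fee math, assuming a $10 charge and the merchant rates quoted ($0.20 flat plus 2.20%). It works in cents to avoid floating-point noise; the names are made up for illustration.

```python
CHARGE_CENTS = 1000    # the $10 charge
FLAT_FEE_CENTS = 20    # $0.20 per authorization
PERCENT_FEE = 0.022    # 2.20% of the charge
ACCOUNTS = 6000

fee_cents = FLAT_FEE_CENTS + round(CHARGE_CENTS * PERCENT_FEE)
exact_total_dollars = ACCOUNTS * fee_cents // 100
# The post rounds the fee up to about $0.50 per charge.
rounded_total_dollars = ACCOUNTS * 50 // 100

print(f"Fee per charge: ${fee_cents / 100:.2f}")
print(f"Total at the exact fee: ${exact_total_dollars:,}")
print(f"Total at ~$0.50 each: ${rounded_total_dollars:,}")
```

The exact fee comes in a little under the rounded $0.50, so the $3000 figure is a slightly pessimistic estimate; other merchant accounts will have different rates.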
Then someone who has skills and knowledge and experience dealing with large arrays,[...], which adds to the costs as well.
LJ has no such experts. See: last LJ blackout.
No, you're just an idiot. They just didn't protect against both power supplies at their datacenter failing, along with the battery plant, the generator, and the transfer switch. That's a freak thing, and if they only have $2.5M/year with no outside investment I'd not expect them to replicate their site in a different datacenter.
They manage several hundred servers to provide an overall uptime of 99.9% over a 3-year period, for a free service where only a few percent actually pay to use it. Yeah, you're right, they're dumbasses.
I presume you could do better?