
I immediately thought about this as well; however, I seem to recall reading somewhere (and I could be entirely wrong here) that NASA has a requirement to give away its science data freely.


If there's a marginal cost for each copy of the data that's transferred to a user, I don't think asking the user to cover that cost conflicts with a requirement to "give away the data".

(If they distributed their science data in printed form, surely they'd be allowed to charge people for the cost of printing & mailing the paper copies; that's quite different from charging for the data itself.)


Why the downvotes? This isn't uncommon or unreasonable if you're downloading TBs of data. Also, the data would be freely redistributable if someone took it and put up a torrent. Still, I'd rather see NASA host their own data. Put up an FTP server and a torrent server, and save a lot of money on hosting fees.


While proxying through a torrent system is a good idea, I doubt it would get well seeded outside a few popular datasets; the agency would end up the sole seeder of the long tail.

I’m willing to bet NASA saves a ton of money by going to a cloud provider; US government storage setups are insanely expensive. I remember a project I was on got a quote of over $10,000/TB in 2014, and there is no way egress is actually free right now: they are paying for a government-regulation-compliant internet connection one way or another.

I do worry about vendor lock-in to a degree, but I’m confident the agency and taxpayers would save money going to any major cloud provider.


Sounds like there is a bigger story there and it's probably a managed SAN.

I've operated pretty significant government shared infrastructures like this in the past... we were offering fast, flash-cached disk in 2010 for about $5,000/TB. $10k/TB is not unreasonable for highly available Tier-1 storage for something like SAP, especially in that era, when you couldn't use all-flash in most cases.

Today, cost structures can be very different. You can land high-IOPS storage for a fraction of the cost without the overhead of a big SAN. If you need capacity-focused storage, that is also much cheaper.

An agency like NASA gets hosed on services, and cloud is no different. AWS is probably a net savings for operational workloads whose characteristics are known. Backup is a no-brainer. But for a high-volume, operationally highly variable thing like a public archive of data, AWS is a square peg in a round hole because of the metered access.


I’m sure that $10k/terabyte quote was complete overkill for what we needed, but that’s what the stovepiped storage org was offering, and it killed the project we were working on.


I hope you can correct my numbers, but I am pretty sure this is within the same decimal order of magnitude:

If 1-2TB drives were handily $1k in 2010 (in 2005, $1K got you a 128GB 15K RPM drive)

and your array set is at least R10,

already raw storage is approaching half of ten thousand dollars.

And this ignores controllers, cabling and chassis.

And this is before we look at our storage software licenses.

Are backup, point-in-time SLA, replication and availability in this budget?


I wasn't really sure what they pitched us technically, but your pitch sounds reasonable. It was also complete overkill: we were hosting read-only static images (map tiles). Azure and AWS were less than $300/TB/year at the time, and their triple replication was more than what we needed availability-wise.
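To put the two figures quoted in this thread side by side, here is a minimal sketch of the comparison. It assumes the $10,000/TB quote was a one-time charge, which the thread does not state, and it takes "less than $300/TB/year" as exactly $300. The function name is made up for illustration.

```python
def break_even_years(one_time_cost_per_tb: float, yearly_cost_per_tb: float) -> float:
    """Years of cloud storage that the one-time quote would pay for.

    Assumes the quote is a single upfront charge (not stated in the thread)
    and ignores egress fees, price changes, and hardware refresh cycles.
    """
    return one_time_cost_per_tb / yearly_cost_per_tb


# Figures quoted in the thread: $10,000/TB quote, $300/TB/year cloud storage.
years = break_even_years(10_000, 300)
print(f"{years:.1f} years of cloud storage per TB for the price of the quote")
```

If the quote was actually a recurring charge, the gap is larger still; the sketch only covers the most favorable reading for the in-house option.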


Maybe I'm missing vital context info here: Why didn't you go with an alternative?


Because the storage group refused to sign off on a cheaper solution with lower specs (I don't know why), and government acquisitions are a mess, so going outside would have tied up one of our primary constraints (the tech lead) more than it was worth.

The overall system ended up with worse capabilities than it should have had, but it did ship.


Wow! That's good to know, if a bit disheartening. I guess I was thinking of small-startup costs, with some cheap-ish Linux RAID setups and the massive fiber taps NASA must surely already have, not government/big-business costs.


What causes a cost of $10000/TB? Even with multiple redundant failsafes I just cannot see how the cost could run up to that.


In 2014?

You'd be buying something like an EMC VMAX that can sustain 1M+ IOPS on lots of 15K spinning drives, with caching tiers on crazy expensive flash.

To support that, you need a fibre channel network layer and a bunch of FTEs to attend to it. Usually compliance requirements require segmentation of roles, which increases cost. If you're a federal government entity, those FTEs are most likely contractors billed out at $125-300/hr. Figure $3-5M/year on labor costs alone, although that may be divided out over multiple systems.
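As a rough check on how the hourly rates relate to the annual labor figure above, here is a minimal sketch. It assumes each contractor bills 2,080 hours per year (40 hours x 52 weeks), which is an assumption and not a number from the thread; the function names are invented for illustration.

```python
HOURS_PER_YEAR = 2080  # assumption: 40 h/week * 52 weeks, fully billed


def annual_cost(rate_per_hour: float) -> float:
    """Yearly cost of one contractor at the given hourly rate."""
    return rate_per_hour * HOURS_PER_YEAR


def implied_headcount(annual_budget: float, rate_per_hour: float) -> float:
    """How many full-time contractors a labor budget pays for at that rate."""
    return annual_budget / annual_cost(rate_per_hour)


# Rates and budget range quoted in the comment: $125-300/hr, $3-5M/year.
low = implied_headcount(3_000_000, 300)   # smallest budget, highest rate
high = implied_headcount(5_000_000, 125)  # largest budget, lowest rate
print(f"roughly {low:.1f} to {high:.1f} FTEs")
```

Under those assumptions the quoted budget corresponds to somewhere between about 5 and 19 full-time contractors, which fits the comment's point that role segmentation multiplies headcount.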

This happens in commercial business too. I had a buddy who was making about $150k in NYC to zone LUNs on a SAN. Basically he kept a spreadsheet and updated a specific configuration setting 2-3x a day and spent about 60-90 minutes/day doing that. The rest was waiting or studying for his MBA.

It's pretty wacky to compare S3 to this type of storage.


At a technical level, yes, it’s wacky. At a “this is what government departments actually do” level, it’s perfectly reasonable. I’m sure NASA's current system is actually pretty efficient as the US government goes, but having spent a career running into the sort of institutional pathologies that lead to an interdepartmental quote for $10k/terabyte, I’m willing to bet AWS is very competitive.


A million iops from spinning rust?

200 iops per drive from a 2.5" 15K RPM disk is good going....

Edit: fixed autocorrected spellings of iops
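The arithmetic behind that objection can be sketched as follows. It assumes IOPS simply add up across spindles and ignores the flash caching tiers mentioned in the parent comment as well as any RAID write penalty, so it is a lower bound on drive count rather than a sizing method. The function name is hypothetical.

```python
import math


def drives_needed(target_iops: int, iops_per_drive: int) -> int:
    """Minimum drive count if per-drive IOPS add up linearly (no cache, no RAID penalty)."""
    return math.ceil(target_iops / iops_per_drive)


# Figures from the thread: 1M IOPS target, about 200 IOPS per 15K RPM drive.
print(drives_needed(1_000_000, 200))  # 5000
```

So without caching, the quoted figure would take on the order of five thousand spindles, which is why the flash tier matters to the original claim.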


By the way, depending on where it's hosted, S3 can seed torrents automatically: https://docs.aws.amazon.com/AmazonS3/latest/dev/S3TorrentRet...


Records departments always charge for copies, and that is the use I thought of immediately when I learned of Requester Pays. I’d be surprised if NASA couldn’t use it.


Why FTP? Torrent it all the way, perhaps with AWS as seed nodes...



> If there's a marginal cost for each copy of the data that's transferred to a user, I don't think asking the user to cover that cost conflicts with a requirement to "give away the data".

Charging the user for data, even if it is on a marginal cost basis, conflicts with a mandate to give data away freely, because “at the marginal cost of delivery” is not “free”.

(It's true that it is common for mandates to specify something like at marginal cost of delivery rather than free—sunshine laws providing copies of public records often work that way—but that's not the applicable mandate here; in fact, since without the separate mandate here the data would be available on a marginal cost basis under FOIA, the main reason for a separate mandate is to negate that cost.)


Do you have a citation for the "mandate to give data away freely"?

I found https://nodis3.gsfc.nasa.gov/displayDir.cfm?t=NPD&c=2230&s=1, which mentions things like "Ensure public access...", but I don't see anything there mandating such public access to necessarily be at zero cost.


Also, public access can mean that once someone gets a copy of the data they can host it for free as well. It's not as if it's under a commercial license.


While the data is free, the cost of getting the data to you can be charged. Originally, it was to cover the expense of someone pulling the data, making copies, and then mailing that data out to you. If it was photographic, you'd be charged for the prints. I'd see using Requester Pays in the same vein. They are not charging you for the data, but any fees incurred to obtain the data would be at your expense.
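For readers who have not used Requester Pays, here is a minimal sketch of what a download looks like from the requester's side with boto3, the AWS SDK for Python. The bucket and key in the usage note are hypothetical placeholders, not real NASA locations, and the helper names are invented for illustration. The `RequestPayer="requester"` argument is how the caller acknowledges that the request and transfer charges go to their own AWS account.

```python
def requester_pays_args(bucket: str, key: str) -> dict:
    """Arguments for an S3 GetObject call against a Requester Pays bucket."""
    return {
        "Bucket": bucket,
        "Key": key,
        # Acknowledges that the caller, not the bucket owner, is billed.
        "RequestPayer": "requester",
    }


def fetch_object(bucket: str, key: str) -> bytes:
    """Download one object; the caller's AWS credentials are charged for it."""
    import boto3  # third-party: pip install boto3; needs configured AWS credentials

    s3 = boto3.client("s3")
    response = s3.get_object(**requester_pays_args(bucket, key))
    return response["Body"].read()
```

Usage would look like `fetch_object("example-public-archive", "tiles/0/0/0.png")`, with both names being placeholders. Because the request has to be authenticated so that AWS knows whom to bill, anonymous downloads are not possible, which is the account requirement debated further down the thread.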


Isn't Requester Pays just like paying for gas to drive to my local library (when I can't bike because I want to borrow so many books), while the books themselves are free to borrow?


It's more like we both have a library, the books are free, but if I want to take some of your books I have to pay for shipping.


I'm pretty sure it's like when I buy a book, and then I pay for it.


It's required to be public domain. IMO it's comparable to FOIA requests still requiring the requester to attach a stamp to the envelope their request goes in. Or at most, include a self-addressed stamped envelope too.

Requiring you to pay S3 is little different than requiring you to have Internet access, and thus pay whichever company includes you in THAT monopoly, IMO.


To me it feels very different.

Imagine for a moment that in order to access NASA data sets you had to have a Fastmail email account. Gmail won't work, Outlook won't work, it has to be Fastmail alone.

That would be very objectionable (as much as I adore Fastmail).

Ability to pay one specific cloud provider should not be a gate for public domain government data.


I don't think this analogy works. For Fastmail, there is a cost regardless of whether you want to access government data. You have to pay for the account itself. For most cloud providers, there is zero cost for having an account. Even if they hosted this themselves, they could just as likely charge for data transfer costs...and get to choose how to collect that. They could choose PayPal and you have to create an account. Or they take credit cards...and you must have a card belonging to one of the networks they support. The barrier to entry doesn't change regardless of how many cloud providers there are, all it does is increase infrastructure costs unnecessarily.


The alternative here, though, is to get comparable distribution, durability, etc. by spending way more of the public's money upfront, regardless of who wanted the data. I get the purist / idealistic argument here, but it feels a bit like cutting off one's nose to spite one's face.


I'm not an expert, but most government agencies are allowed to charge reasonable fees for access to their data. I don't know if this qualifies, but it at least seems like a possibility, especially if it's transparently just passing along their costs in the form of AWS's own cost structure.



