Wow! I worked on EOSDIS in '93-'96. We estimated 16 petabytes, which at the time would have made it one of the world's largest databases. We changed horses midstream, moving our user interfaces from X Windows/Motif to the WWW, and built a very early Oracle DB accessible via the web. There was no cloud then, except for the missions studying atmospheric water vapor.
When this was originally designed there were to be several (6-7) DAACs - Distributed Active Archive Centers (https://earthdata.nasa.gov/eosdis/daacs) to store data near where it was needed or captured. Now they have 12 and are storing on AWS. Amazon didn't exist when this was originally built.
1. Using the AWS cost calculator is pointless; naturally, an entity the size of NASA would get heavily discounted rates.
2. As data volume grows, the complexity of working with that data expands. NASA appears to be embracing cloud computing via a paradigm where scientists push computation to where the data rests rather than downloading it [1], [2], [3], thereby paying egress only on the higher-order data products.
3. The report notes that NASA has tooling to rate limit and throttle access to data. This, in itself, proves that NASA didn't "[forget] about eye-watering cloudy egress costs before lift-off".
People may scream about vendor lock-in, which is a fair complaint, but acting like NASA just didn't think about egress is misleading.
NASA is ultimately a science institution; I think diverting effort away from infrastructure management and towards studying data is likely a wise decision.
Datapoint: When $company hit very high six figures (closing in on seven) in monthly spend I found AWS was incredibly willing to cut our egress rates, often by a significant amount.
Our account team explained it this way: bandwidth has some of the best margins for AWS, but they're willing to sacrifice that for their enterprise customers (read: to pull us in closer to non-commodity services).
$75k/mo is tiny in the enterprise world. At Oracle they’d give a 22 year old fresh out of school ~30 accounts that size, for reference. I worked on a team of 9ish on a ~$5MM/mo account. (Not cloud, but a comparable business unit)
The big players have market caps measured in billions, so there aren't a huge number of them. IMO cloud is weird, since for most products/services you can go buy from a smaller company to get better customer service, but that's obviously not the case for AWS et al.
I've been on teams spending half of that, and managed to get great discounts.
My question whenever I hear that people didn't get a discount is: who did they ask? AWS doesn't just jump in to give people free service, but if you reach out and tell them what you need, they tend to work with companies.
>NASA is ultimately a science institution; I think diverting effort away from infrastructure management and towards studying data is likely a wise decision.
Indeed. I am glad to see them leveraging the power of an already proven infrastructure provider rather than spending X billions of dollars trying to build and maintain their own.
> the power of an already proven infrastructure provider
Every major cloud provider is using Linux network drivers written by NASA employees.
In 1994 the NASA Beowulf project pioneered the idea of clustering together cheap commodity hardware to replace mainframes (this concept was later used to bootstrap hardware for Google).
NASA helped start the OpenStack project which powers a number of cloud providers.
Heck, NASA helped invent the GRiD Compass, the first laptop computer.
No, that is just ridiculous. NASA is more than capable of running their own server infrastructure. They've got expertise, they've got DCs, and they don't need 99.999% uptime for most of their services. Cloud providers can turn out to be insanely expensive. I am not against cloud - mostly I would recommend it for businesses - but once you reach a certain size you have to consider running your own cloud infrastructure.
Yeah. It seems like everyone is hopping aboard that bandwagon and doesn't remember a world before 2009. For $1M, you can get yourself a very beefy server farm.
AWS accounts still take management and a team of people that need to maintain a whole lot of different aspects of it, so you're not really saving on headcount. You're just moving that capex to opex.
It's important to be flexible enough to be able to deploy onto a cloud provider if the situation demands (e.g., new client demands infrastructure run in $FOREIGN_REGION_X where you don't already have a DC), but everyone's insistence on going 1000% AWS is absurd and IMO totally unjustifiable.
I agree with you... but no one wants the responsibility of securing the servers and keeping the hardware up to date. I assume NASA has an abundance of outdated hardware and no one to sell these outdated systems to. My fear is that we will lose the ability to build our own servers in the future. My degree program had nothing dealing with cloud; I would tell the professors that IBM Bluemix and Microsoft Azure were the future. My outlook was close: who knew the bookstore would become the biggest cloud provider? Nearly every company I worked for had a big initiative to go to the cloud. One infrastructure guy told me the final decision came when they realized they had no way to train the next generation, the upfront cost of cloud is less risky, and it's nice to have someone else to point fingers at when things don't work as expected.
Sure, they are certainly capable of doing it themselves, but why should they?
For what AWS provides, the DIY approach would be insanely expensive and wasteful. Not to mention, it would take years to build a basic MVP. They'd have to scope out the project, hire people just to design it, pay several contractors just to stand up the first iteration of a working system (which still wouldn't compare to AWS in terms of resiliency, redundancy, and accessibility), and then maintain it... forever. Also, many people already know how to interact with AWS. NASA would also need to design and maintain user-access methods to the data, on top of just plugging in thousands of hard drives and making them all work nicely together.
Why reinvent the wheel when there is a perfectly good wheel manufacturer that has already proven extremely successful at what they do?
>NASA is ultimately a science institution; I think diverting effort away from infrastructure management and towards studying data is likely a wise decision.
True, but once you're at a certain scale, outsourcing everything just because it's not your competency isn't a good excuse. You can afford to hire enough people for it to become your competency.
There is probably even good competition between the cloud providers, because hosting the data means you can sell a lot of compute time to all the users of that data. NASA choosing AWS means that any I/O-intensive analysis on that data will run faster/better/cheaper on AWS.
Yeah, this looks like a FUD hit job, possibly by entities made obsolete by a move to AWS.
There are just too many egress-cost mitigations to mention (CDN edge caching, rate limiting, throttling, tiered discounts, multi-year agreements).
No gov procurement deal at this scale gets sticker shock from retail prices.
Totally and utterly not a FUD hit job. Just a reporter finding a document that told a story. I am that reporter. And FWIW, when people push a dirty story that's clearly in their interest to have in print, I either don't write it because I won't be their cat's-paw, or I mention it in the story. This was in plain sight, but I guess it hadn't been read by anyone who understands cloud.
> Using the AWS cost calculator is pointless; naturally, an entity the size of NASA would get heavily discounted rates.
I strongly doubt this.
Amazon seems to operate heavily on the principle of charging their cost plus a small margin, which means they can't discount heavily without going below their actual costs.
> “However, when end users download data from Earthdata Cloud, the agency, not the user, will be charged every time data is egressed.”
Not necessarily, depending on how the users access the data. If users access the data through their own AWS accounts, NASA could leverage S3's "Requester Pays" feature [1] to let the user pay for downloading the data.
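For anyone who hasn't used it, here's a minimal sketch of what that looks like with boto3 (bucket and key names below are made up; this is illustrative, not NASA's actual setup):

```python
import boto3

s3 = boto3.client("s3")

# Bucket owner opts in: downloads are billed to the downloader's AWS account,
# not the bucket owner's. (Bucket/key names here are hypothetical.)
s3.put_bucket_request_payment(
    Bucket="example-earthdata-archive",
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# The downloader must explicitly acknowledge they'll pay for the transfer;
# without this parameter the request is rejected.
obj = s3.get_object(
    Bucket="example-earthdata-archive",
    Key="granules/example-granule.h5",
    RequestPayer="requester",
)
data = obj["Body"].read()
```

The catch, as noted elsewhere in the thread, is that downloaders then need an AWS account and signed requests rather than plain HTTP links.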
I immediately thought about this as well; however, I seem to recall reading somewhere (and I could be entirely wrong here) that NASA has a requirement to give its science data away freely.
If there's a marginal cost for each copy of the data that's transferred to a user, I don't think asking the user to cover that cost conflicts with a requirement to "give away the data".
(If they distributed their science data in printed form, surely they'd be allowed to charge people for the cost of printing & mailing the paper copies; that's quite different from charging for the data itself.)
Why the downvotes? This isn't uncommon or unreasonable if you're downloading TBs of data. Also, the data would be freely redistributable if someone took it and put up a torrent. Still, I'd rather see NASA host their own data: put up an FTP server and a torrent server and save a lot of money on hosting fees.
While proxying through a torrent system is a good idea, I doubt it would get well seeded outside a few popular datasets; the agency would end up the sole seeder of the long tail.
I'm willing to bet NASA saves a ton of money by going to a cloud provider; US government storage setups are insanely expensive. I remember a project I was on got a quote of over $10,000/TB in 2014, and there is no way egress is actually free right now - they are paying for a government-regulation-compliant internet connection one way or another.
I do worry about vendor lock in to a degree, but I’m confident the agency and tax payers would save money going to any major cloud provider.
Sounds like there is a bigger story there and it's probably a managed SAN.
I've operated pretty significant government shared infrastructures like this in the past... we were offering fast, flash-cached disk in 2010 for about $5,000/TB. $10k/TB is not unreasonable for highly available Tier-1 storage for something like SAP, especially in that era when you couldn't use all-flash in most cases.
Today, cost structures can be very different. You can land high-IOPS storage for a fraction of the cost without the overhead of a big SAN. If you need capacity-focused storage, that is also much cheaper.
An agency like NASA gets hosed on services, and cloud is no different. AWS is probably a net savings for operational workloads whose characteristics are known. Backup is a no-brainer. But for a high-volume, operationally highly variable thing like a public data archive, AWS is a square peg in a round hole because of the metered access.
I'm sure that $10k/terabyte quote was complete overkill for what we needed, but that's what the stove-piped storage org was offering, and it killed the project we were working on.
I wasn't really sure what they pitched us technically, but your description sounds reasonable. It was also complete overkill: we were hosting read-only static images (map tiles). Azure and AWS were less than $300/TB/year at the time, and their triple replication was more than we needed availability-wise.
Because the storage group refused to sign off on a cheaper solution with lower specs (I don't know why), and acquisitions in the government are a mess, so going outside would have tied up one of our primary constraints (the tech lead) more than it was worth.
The overall system ended up with worse capabilities than it should have had, but it did ship.
Wow! That's good to know, if a bit disheartening. I guess I was thinking of small-startup costs, with some cheap-ish Linux RAID setups and the massive fiber taps NASA must surely already have - not government/big-business costs.
You'd be buying something like an EMC vMax that can sustain 1M+ IOPS on lots of 15K spinning drives, with caching tiers on crazy expensive flash.
To support that, you need a fibre channel network layer and a bunch of FTEs to attend to it. Usually compliance requirements require segmentation of roles, which increases cost. If you're a federal government entity, those FTEs are most likely contractors billed out at $125-300/hr. Figure $3-5M/year on labor costs alone, although that may be divided out over multiple systems.
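Rough math on how that figure pencils out (the headcount and blended rate below are my own assumptions, not actuals):

```python
# Illustrative only: reconstructing a $3-5M/year labor figure from contractor bill rates.
ftes = 10            # storage, SAN, network, backup, and compliance roles
rate = 200           # assumed blended bill rate, USD/hour (quoted range was $125-300)
hours = 2_000        # billable hours per FTE per year
print(f"${ftes * rate * hours / 1e6:.1f}M/year")  # -> $4.0M/year
```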
This happens in commercial business too. I had a buddy who was making about $150k in NYC to zone LUNs on a SAN. Basically he kept a spreadsheet and updated a specific configuration setting 2-3x a day, spending about 60-90 minutes/day doing that. The rest was waiting, or studying for his MBA.
It's pretty wacky to compare S3 to this type of storage.
At a technical level, yes, it's wacky. At a "this is what government departments actually do" level, it's perfectly reasonable. I'm sure NASA's current system is actually pretty efficient as the US government goes, but having spent a career running into the sort of institutional pathologies that lead to an interdepartmental quote of $10k/terabyte, I'm willing to bet AWS is very competitive.
Records departments always charge for copies, and that is the use I thought of immediately when I learned of Requester Pays. I’d be surprised if NASA couldn’t use it.
> If there's a marginal cost for each copy of the data that's transferred to a user, I don't think asking the user to cover that cost conflicts with a requirement to "give away the data".
Charging the user for data, even if it is on a marginal-cost basis, conflicts with a mandate to give data away freely, because "at the marginal cost of delivery" is not "free".
(It's true that it is common for mandates to specify something like at marginal cost of delivery rather than free—sunshine laws providing copies of public records often work that way—but that's not the applicable mandate here; in fact, since without the separate mandate here the data would be available on a marginal cost basis under FOIA, the main reason for a separate mandate is to negate that cost.)
Also, public access can mean that once someone gets a copy of the data they can host it for free as well. It's not as if it's under a commercial license.
While the data is free, the cost of getting the data to you can be charged. Originally, it was to cover the expense of someone pulling the data, making copies, and then mailing that data out to you. If it was photographic, you'd be charged for the prints. I'd see using Requester Pays in the same vein. They are not charging you for the data, but any fees incurred to obtain the data would be at your expense.
Isn't requester pays just like paying for gas to drive to my local library when I can't bike because I want to borrow so many books? The books are still free to borrow.
It's required to be public domain. IMO it's comparable to FOIA requests still requiring the requester to attach a stamp to the envelope their request goes in. Or at most, include a self-addressed stamped envelope too.
Requiring you to pay S3 is little different than requiring you to have Internet access, and thus pay whichever company includes you in THAT monopoly, IMO.
Imagine for a moment that in order to access NASA data sets you had to have a Fastmail email account. Gmail won't work, Outlook won't work, it has to be Fastmail alone.
That would be very objectionable (as much as I adore Fastmail).
Ability to pay one specific cloud provider should not be a gate for public domain government data.
I don't think this analogy works. For Fastmail, there is a cost regardless of whether you want to access government data: you have to pay for the account itself. For most cloud providers, there is zero cost for having an account. Even if they hosted this themselves, they could just as easily charge for data transfer costs... and get to choose how to collect that. They could choose PayPal, and you'd have to create an account. Or they take credit cards, and you must have a card belonging to one of the networks they support. The barrier to entry doesn't change regardless of how many cloud providers there are; all it does is increase infrastructure costs unnecessarily.
The alternative here, though, is to get comparable distribution/durability by spending way more of the public's money upfront, regardless of who wanted it. I get the purist/idealistic argument, but it feels a bit like cutting off one's nose to spite one's face.
I'm not an expert, but most government agencies are allowed to charge reasonable fees for access to their data. I don't know if this qualifies, but it at least seems like a possibility, especially if it's transparently just passing along their costs in the form of AWS' own cost structure
I wonder if there is a problem with this because it requires you to have an Amazon account and such to do it. There is now a much higher barrier to entry for random people accessing small amounts of data. And there are no longer direct HTTP links; you have to use the CLI/SDKs once Requester Pays is turned on.
Because it allows the agency to escape from its bad design problems by pushing the (huge) cost onto its clients -- and those clients are other parts of the US Govt or funded by the US Govt.
You're asserting the design is flawed when that's in dispute.
It's useful for those agencies' budgets to reflect a portion of the cost of performing that research.
The USG needs insight into what taxpayer dollars are being spent on. Lawmakers have to explain to constituents why that money is being spent.
NASA is the first tier of information, collecting the data. Its budget ought to reflect that cost.
The consuming agencies are the second tier, processing that information. Their budgets reflect the cost of gathering their information and of processing it.
NASA doesn't know which information will be useful, so it's not helpful for them to pay the cost of egress. We want them to collect as much as possible.
It's much like a music store: 90% of their sales come from the top 10, but there's a lot of value in stocking obscure stuff.
If they have to pay to store it all rather than pay for egress, they'd have to justify the cost of storing data about which they can only say "it might be useful some time."
If the agencies working with the data pay for the egress instead, they can justify the cost by pointing to the specific work they do.
I'm a huge fan of requester pays, and I frankly don't understand why we haven't switched more of the internet to it.
I'm also a liberal, so then I also think government should give everyone a monthly quota of internet usage allowance. Universal Basic Internet Income, or something.
I'm not saying this won't be a financial cluster - it likely will cost many times more than planned - but the headline here is just a flat-out lie.
TFA says:
"a March audit report [PDF] from NASA's Inspector General noticed EOSDIS hadn’t properly modeled what data egress charges would do to its cloudy plan."
'Hadn't properly modeled' is very different from 'forgot about'. And if you actually read the linked report, it says things like:
"ESDIS officials said they plan to educate end users on accessing data stored in the cloud, including providing tools to enable them to process the data in the cloud to avoid egress charges."
and
"To mitigate the challenges associated with potential high egress costs when end-users access data, ESDIS plans to monitor such access and “throttle” back access to the data"
Neither of those statements would be in the audit if the entire topic had been a surprise.
"In addition, ESDIS has yet to determine which data sets will transition to the cloud nor has it developed cost models with the benefit of operational experience and metrics for usage and egress."
YOU ARE NOT AFRAID?
'Not yet. But, er...which way to the egress, please?'
There was a pause. Then Death said, in a puzzled voice: ISN'T THAT A FEMALE EAGLE?
I've been reading A Hat Full of Sky to my daughter these days, and there's a running joke that "supposedly intelligent people" don't know the meaning of the word "egress", mixing it up with things like egret, ogress or eagles.
It's The Register, people. Don't take it seriously. It's practically The Onion of the IT industry, especially the comments sections.
I've written two articles for them and the comments are a joke. They're all anti-cloud, anti-progress. Try telling them Kubernetes has a solution to their problems and they'll think you've come to steal their children. I know, I've tried.
In short: this never happened. NASA didn't forget anything. It does, however, make for a great eye catching headline!
Sorry to be bitter about this, but publications like The Register serve little purpose these days. It caters to a specific kind of IT personality that can't let go of their physical tin and thinks the public cloud has no place or use at all. Again, I know; I've tried convincing these people otherwise.
Smartest comment so far in the thread. The issue of cloud egress has been known and worked on at NASA for a decade now, and the article treats it like an OMG moment.
Historically, data have been stored and processed on-premise but NASA has been migrating data and processing to the cloud where it makes sense. For instance, it makes a lot of sense to burst out to the cloud for near-real-time processing during and just after natural disasters like earthquakes and forest fires.
The large missions they mention (SWOT, NISAR - big radars in Earth orbit) are drivers of the shift of more processing + data to the cloud, because they will generate an unprecedented amount of data. They are pathfinders. By percentage, very little of that data will ever egress - it's low-level and uncalibrated - so a caching strategy could be valuable.
Here are some slides giving background on the SWOT/NISAR data system. They are from 2017, so more has happened in the meantime, but they touch on some of these issues:
I know. And I'm saying if that was the rate they've historically added data to their dataset, it would've taken them 200,000+ years to get here. Which is why 100GB/mo is virtually nothing for NASA -- it doesn't match with their historical throughput.
That's the default value on the calculator, but all that does is prorate the storage demand over the period. Backblaze is entirely usage-based (speaking from experience; I'm a customer).
They've owed me money (just a few $) for several years now, in my Amazon Seller account.
And they've promised to pay me, something like 20 odd times. With a specific date each time.
Which is never paid.
And every time I ask what happened, the customer service person says they'll look into it, and never gets back to me.
Most recently, they've sent an email saying they're closing my account due to lack of activity. No word on what'll happen to the funds, my expectation is they'll just steal the money for themselves.
"Lack of activity"... yeah, no kidding.. People tend not to use a service when the other party is obviously full of shit and repeatedly lies. :(
As for "AWS never forgets", sure. That goes both ways.
How much would a launch pad that will be used four times normally cost for what they're planning to launch? Without knowing that I can't say if they overpaid 10x, 2x, got it exactly right or got an amazing bargain.
In 1965, the Vertical Assembly Building, which was at that time the largest enclosed volume in the world, cost $117M (on a $23.5M original construction contract). That would be about a billion dollars in 2020, but it was completed in 3 years and was used to stack 13 Saturn Vs. It was later used for the 100+ Shuttle missions as well, but there were additional costs to modify the building for this purpose. The VAB is still planned for use for future missions.
I picked the VAB because its current-dollar cost was roughly a billion dollars.
Total cost for constructing Launch Complex 39, which includes the VAB and the crawler-transporter launchers, was estimated at $500M in 1962 for 2 pads. A total of 153 launches have occurred from LC-39. This number is greater than 4.
I assume the data accessed follows a heavily skewed Pareto distribution.
Given that, it may still be cheaper to build their own serving/caching layer in front to save egress costs than it would have been to build the whole storage solution themselves.
Putting a caching layer in front of AWS is often very cost effective even without much skew in the access pattern. It tends to take a very low hit rate before it pays for itself.
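Back-of-envelope for the skeptical (every number below is an illustrative assumption, not NASA's actual traffic or pricing):

```python
# When does a cache in front of S3 pay for itself?
egress_price_per_gb = 0.09        # rough on-demand S3-to-internet rate, USD/GB
monthly_downloads_gb = 2_000_000  # assume ~2 PB/month served to end users
cache_monthly_cost = 30_000       # colo + flat-rate transit + amortized hardware, USD/month

# Every GB served from the cache is a GB of AWS egress you don't pay for.
break_even_hit_rate = cache_monthly_cost / (monthly_downloads_gb * egress_price_per_gb)
print(f"cache pays for itself above a {break_even_hit_rate:.1%} hit rate")
# -> cache pays for itself above a 16.7% hit rate
```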
This surely was entirely known to AWS, who were no doubt rubbing their hands at the fact that every user of this data has to process it using EC2 on their platform.
They could use Direct Connect from each of their data centres, essentially turning AWS into a giant NAS. However, this gives up the idea of using AWS compute to provide value-added analysis.
Cue the cloud apologists saying "it's better to use the cloud than to build and manage your own infra".
This is why you build and run your own storage, similar to Backblaze (who is almost entirely bootstrapped except for one reasonable round of investment).
To cloud or not to cloud is the same as any outsourcing decision.
For many operations, you may get to a point where it makes sense to build your own cloud.
If you're a seller, you might also get to a point where you want to sell goods directly.
It partly depends on your core expertise - meaning, is this part of how your business creates value? If NASA doesn't want to be a datacenter provider, they should continue to outsource it.
It also depends on whether their business model aligns with yours. AWS's egress rules specifically work when you are getting revenue from the data being downloaded. If you're selling software or other media, and you can factor the cost of downloads into the price of it, pay-for-egress is very sustainable.
Other models like pay-for-capacity don't align as well if you want to maintain a large library of media and people are attracted by the variety, but only download the popular stuff.
For NASA, pay-for-egress may be entirely justified if their budget is based on usage of the data. Or if they can simply use "requester pays" to mitigate the cost.
Cue the cloud detractors saying "a failure to do due diligence (in this case: 15 minutes on the pricing calculator) on your computing platform should be held against the whole platform".
Snark aside, it entirely depends on what you're doing. AWS probably has better engineers, better processes, and more of them than your company.
Due diligence only somewhat mitigates the damage done by having a generation of engineers who believe going straight to AWS or another expensive cloud provider is the first and/or best course of action, and who scoff at building a cheaper, more efficient solution better fit for purpose. Backblaze proves it can be done, and I'd argue they are just as competent as Amazon, if not more so. They've provided an object storage system similar to S3 at a drastically lower cost.
In most scenarios, it’s not my money, and I don’t care if it’s not my money. In this case, as a taxpayer, it’s my money (our money to be specific) and I care. I intend to contact my representatives about this failure, and have already fired off a FOIA request for AWS NASA contract details.
None of which will really help you, since AWS's priority is AWS, not the uptime of your business. And no number of those better engineers or processes has prevented downtime and service interruptions on AWS.
Better run your own Internet, after all, you care more about connectivity to your friends than your ISP does!
Dogmatism is passé. There are good uses for cloud, and good times for on-premises, depending on what you need, what your skillsets are as an organization, the kinds of workloads, and the length of time each workload has to run.
AWS and others have absolutely outstanding amounts of infrastructure and tooling. Their reliability is off the charts in the past few years, and (once it actually gets figured out by your engineers) the cloud concept of IAM is incredibly secure.
There are pitfalls - cost, up-front complexity and several other things - but I no longer rag on "the cloud".
Amazon has outages all the time, hidden on their status board with a green triangle, and you still lose S3 objects once you’re operating at a large enough scale.
A quick google search for “amazon outages” lists the numerous extended outages they’ve experienced.
How many of those outages were multi-region and would have taken down a properly distributed application? How many outages and instances of lost data would the average enterprise, likely without their own datacenters, redundant power, hardware staff, etc have taken in the same period?
Most applications will never be architected to be "properly distributed" because of cost. Many popular web properties (Reddit) still have outages on AWS even when architected properly. Netflix still distributes content from its own CDN with its Open Connect appliances, and only uses AWS for non-streaming use cases (jedberg will correct me on both the Netflix and Reddit points if I'm missing something and he comes across this comment).
If my app is architected for reliability, I’ll run it on bare metal and keep the costs savings. Why pay twice by building it for cloud durability and running it on expensive cloud resources? Clearly the AWS marketing is working (“you’re just building it wrong”).
We’ll see what happens when CFOs take the reins from CTOs and CIOs and start putting cost controls in place during this recession (“why exactly are we paying so much in opex when this could be capex we can depreciate?”).
Ok, so we replace a lot of opex with a little capex and a lot more opex. You only need devops types if your business runs on a cloud provider; now you need to employ facilities, sysadmins, security, etc. It's not just the cost of the hardware we're talking about; your labor budget will necessarily increase as well.
On a tangent from the sibling comments (which are spot on), colocation does exist. They handle the network drops, power, security, cooling, and you just have to ship them servers. Before AWS, this is how most businesses ran (including Amazon).
Few businesses ever get to the point where they need to run their own datacenter. And when they do, the costs would be roughly even with or lower than AWS, due to AWS's markup (for handling those DC-related things for you, plus profit).
Devops types are sysadmins that cost more for mostly the same skillset (you know cloud primitives, you know infra as code, you know some Python/Bash or PowerShell depending on the underlying OS). Facilities, security, etc. are usually covered by your hardware hosting provider or colocation provider. Still a lower cost than cloud. You are paying similar labor costs regardless of whether you're in the cloud or on your own metal.
Disclaimer: Previously a devops/infra guy, before that ops/networking/sysadmin, built out colo facilities/datacenters/hosting companies before cloud. Have done a lot of cost models for storage and compute, still do on the side.
So who takes care of the non-development tasks that AWS (or any cloud provider, really) is handling on the backend? Schlepping the hardware around, swapping failing drives, hardware monitoring, actually speccing out and running a datacenter, physical security, and so forth?
It's generally not the same people who are going to be at their computers running awscli (or if it is, now we get to figure in how much time they're spending on tasks that are not their primary job and how many extra of them we get to hire to maintain the same velocity, not to mention the occasional bit of firefighting you get to do when you manage your own infra)
I thought the ultimate argument was that if you're big enough AWS will make you a deal. But maybe now AWS is just so big and already growing so fast, they don't want to make exceptions and lower their profitability.
"At least NASA seems to have bagged a good deal from AWS: The Register used Amazon’s cloudy cost calculator to tot up the cost of storing 247PB in the cloud giant’s S3 service. The promised pay-as-you-go price for us on the street was a staggering $5,439,526.92 per month, not taking into account the free tier discount of 12 cents. The audit, meanwhile, suggests an increased cloud spend of around $30m a year by 2025, on top of NASA’s $65m-per-year deal with AWS."
$5.4m/mo * 12 mo/yr = $65m/yr. My guess is the "$65m/year deal with AWS" is actually the S3 cost and the extra $30m/year of 'increased cloud spend' is the egress costs found by the audit. Otherwise it's a coincidence of the numbers.
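For anyone who wants to check the arithmetic, the article's monthly figure drops straight out of S3's published standard-tier list prices (the tiering below is from the public price page; exact historical rates may differ slightly):

```python
# Reconstructing the ~$5.4M/month storage figure for 247 PB in S3 Standard.
gb = 247 * 1024 * 1024                            # 247 PB expressed in GB

tier1 = min(gb, 50 * 1024) * 0.023                # first 50 TB at $0.023/GB-month
tier2 = min(gb - 50 * 1024, 450 * 1024) * 0.022   # next 450 TB at $0.022/GB-month
tier3 = max(gb - 500 * 1024, 0) * 0.021           # everything over 500 TB at $0.021/GB-month

monthly = tier1 + tier2 + tier3
print(f"~${monthly / 1e6:.2f}M/month, ~${monthly * 12 / 1e6:.0f}M/year")
# -> ~$5.44M/month, ~$65M/year, matching the figures quoted above
```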
> You don't need to be a rocket scientist to learn about and understand data egress costs. Which left The Register wondering how an agency capable of sending stuff into orbit or making marvelously long-lived Mars rovers could also make such a dumb mistake.
I used to work very closely with this department at NASA. Without saying too much, the short answer is "tenured government employees more concerned about job security than the success of the project" is how an agency could make such dumb mistakes.
Using AWS for this type of use case is dumb for an org as large as NASA, if cost savings is a goal. It's cheaper to just land capacity at a datacenter.
I guess they have additional legal constraints that don’t allow them to just “land space” here or there - the vendor must probably be security-vetted, compliant to a hundred government-produced checklists, and willing to go through extra-long sales and support cycles. It will inevitably push up prices significantly.
In fact, I can imagine ops teams at NASA licking their lips at the idea of doing away with a lot of that bureaucracy once they switch to AWS... note how the report mentions that some of the controllers are actual sponsors of the move: it's obviously a conflict of interest, but it might well arise when the org as a whole is a bit too happy to steer away from a suboptimal situation.
This said, AWS will rob them blind, simply because they can. Like all outsourcers (which is effectively what they are), they get in with the simplicity argument, then boil the frog with extra charges. It's good that somebody pointed out one of those charges, but I doubt anything will change substantially; Amazon will probably cut them a discount and that will be it. And once you're invested in a cloud environment to the tune of hundreds of petabytes, you'll likely not switch away for decades.
That implies a level of dishonesty or nontransparency that AWS doesn't have. Their pricing is disclosed, up front, and they offer a calculator to model your costs out. Knowing how much data egress you're going to have is not some arcane art, NASA just plain forgot to do it.
It may be complicated, but so is any workload at this size. Figuring out the cost is part of due diligence, and they've made it as straightforward as possible.
> That implies a level of dishonesty or nontransparency that AWS doesn't have.
Have you ever been party to an enterprise-level sales cycle? Things like the official calculator are waved away, since the customer is on a special deal, so "of course it's not as much as that!". The customer asks for a quote with a certain degree of detail; the vendor provides an answer with the degree of accuracy required to get them in the door. If it turns out after a year that the customer ended up paying 2x, well, too bad - clearly they must have had higher requirements than forecasted! "Did you record all your traffic? No? Well, we did, and the result is this bill, sorry. Alright, alright, I hear your complaint, I tell you what - I'll give you a big discount on your next order, what about that?" Rinse, repeat. This is not dishonesty, and I'm not alleging malfeasance or anything like that; it's just how that world works in my experience.
In order to figure out the real cost of outsourcing, you need an adversarial attitude that most shops simply lack, because they've fundamentally made the choice to abandon the previous solution even before they've entered the sales cycle. This is particularly clear in a case where a controller is also part of the group promoting the switch. It's surprising it was flagged up; there must be a competing group somewhere that is desperately trying to fight on - maybe some Oracle-friendly "Japanese soldier in the jungle" or something. Or maybe bureaucratic procedures to safeguard the institution are actually working as they should, for once, but that would be pretty exceptional in itself.
All of the cloud vendors de-emphasize network egress costs. It's similar to products that depend on Microsoft licensing, where those costs are always omitted. (Oh, so you needed to spend another $500k on SQL Server Enterprise?)
Many organizations lack the operational metrics that would allow them to effectively measure their egress needs. And AWS/GCP/MS salesmen aren't in the business of slowing down deals with awkward questions.
This is especially true when an org like NASA probably contracts out things like network services. Going from a model where you make fixed capital investments to paying by the byte is difficult to measure.
Here's the official pricing calculator[1] - note that ingress and egress costs are included in all relevant services. Also note that for something like S3 (which is probably what the article mentions the "earthdata cloud" is based on), the pricing details are right there on the description page[2].
There is no evidence of any malfeasance by AWS here, just lots of casting aspersions. What specifically do you want that was not provided?
This article is misleading. The entire point is to not move data out of the cloud. Instead bring your computing (analysis, visualization) to the data and pay for compute cycles on AWS. If your workflows are short/bursty, you will come out ahead. Moreover, you will be able to do big data-style computations that you cannot do in a local computing environment. This is bad journalism, IMO.
If you are facing similar problems, you should know that traffic from B2 via Cloudflare is free. I'm not 100% sure CF would be happy if NASA picked the CF free tier, but their quote would probably be orders of magnitude lower than Amazon's.
Exactly. Move your computation to the data instead of the other way around. At that point, there are many ways to keep costs down such as using spot instances and tearing down VMs when your analysis is over.
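A minimal sketch of that workflow with boto3 (the AMI, instance profile, and region below are placeholders I made up; real pipelines are obviously more involved):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")    # assume the data's home region

# Launch a cheap, interruptible instance next to the data.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",                  # hypothetical analysis AMI
    InstanceType="r5.4xlarge",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={"MarketType": "spot"},     # spot pricing
    IamInstanceProfile={"Name": "earthdata-reader"},  # hypothetical role with S3 read access
)
instance_id = resp["Instances"][0]["InstanceId"]

# ... run the analysis on the instance; S3 reads within the same region incur
# no data-transfer charge, so only the small final products ever egress ...

# Tear the VM down as soon as the job finishes, so you pay only for compute used.
ec2.terminate_instances(InstanceIds=[instance_id])
```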
Wasabi does not charge for egress, but our pricing model is not suitable for use cases like video hosting where the volume of egress downloads exceeds the amount of data stored.
This is exactly why the costs are set up that way. The first time I saw AWS pricing I chuckled and thought "roach motel": data goes in but it doesn't come out. It's one of many soft lock-in mechanisms cloud hosts use.
Just build your own storage and save an incredible amount.
You might think it's hard, but it's not. croit.io provides all you need to deploy a scalable cluster, even across multiple geographic regions.
The price for a 1 PB cluster, including everything from rack to hardware to license to labor, is below €3/TB/month - roughly the Amazon Glacier price tag, but with S3-IA-like access.
Do you want to add your contact details to your post so NASA can get in touch, or what is going on here? Add a little disclaimer that you work for/are croit.io, so people can instantly see why you would argue for the U.S. space agency to run its own data storage.
Seems like a poor choice. If they're getting an incredible deal with AWS, then fine, but I would be utterly shocked if most seasoned and competent IT professionals couldn't design and build a multi-region storage array for far less than Amazon will charge them.
Torrents are only helpful when there's a large number of people who download the data and are willing to share it. There's not a large userbase for the vast majority of NASA data. It wouldn't be distributed in any meaningful way.
Your numbers are way off, as you didn't account for redundancy of the drives (any failure or bit flips of 1 of those 2,470 drives will cause corruption of likely the entire data set).
> Network cards, bandwidth, electricity cost > I can't guess.
This is where a huge amount of cost is.
> And that won't be recurring cost.
Maintenance, humans, cooling, drive replacements, property, building, land tax, payroll tax are all recurring costs.
> Your numbers are way off, as you didn't account for redundancy of the drives (any failure or bit flips of 1 of those 2,470 drives will cause corruption of likely the entire data set).
Let's take another setup of the same drive count as a backup, then another setup as a backup of the backup. ~$150K.
> This is where a huge amount of cost is.
Maintenance, humans, cooling, and drive-replacement costs can't be greater than the first-time setup cost.
> property, building, land tax, payroll tax
NASA runs on a government budget; I am sure they can claim some tax break there.
The point I am trying to make is that, with the level of engineering talent they have, it may be cheaper to do this in-house.
Have 3 sites around the US that build the pods. Each new pod gets preloaded with a smattering of rarely requested, low-replication-count objects (as a redundant backup), then shipped to the site where it will be used. Local writes go directly to pods, which are then kept in sync with the rest of the cluster.
edit, from the TFA
```
And to put a cherry on top, the report found the project's organizers didn't consult widely enough, didn't follow NIST data integrity standards, and didn't look for savings properly during internal reviews, in part because half of the review team worked on the project itself.
```
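For what it's worth, here's roughly how the raw-drive side of that back-of-envelope looks; every figure below (drive size, price, replication factor) is my own assumption, not the parent's numbers:

```python
# Raw disks only -- servers, racks, networking, power, space, and people come on top.
archive_pb = 247          # size cited in the article
replicas = 3              # three full copies, as suggested upthread
drive_tb = 16             # commodity drive size
drive_cost_usd = 350      # rough street price per drive
usable_fraction = 0.85    # filesystem overhead / headroom

drives = archive_pb * 1000 * replicas / (drive_tb * usable_fraction)
print(f"~{drives:,.0f} drives, ~${drives * drive_cost_usd / 1e6:.1f}M in raw disks")
# -> ~54,485 drives, ~$19.1M in raw disks
```

Whether the recurring costs on top of that beat AWS's bill is exactly what's being argued in this subthread.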
You realize that the entire OpenStack project came from the open-sourcing of NASA's Nebula project, right? They've got one of the biggest InfiniBand networks in the world underpinning it.
In addition to saving money, they will also make the US more resilient by helping avoid a concentration of expertise and an infrastructure mono-culture.
I suspect that ideas like this will become more popular as the US asks itself "what happened to our resilience?"