S3 != a data lake. If you're able to solve the problems that you were previously...

threeseed · on Feb 24, 2020

Not sure what you are talking about.

If you have S3 you can use Athena, Redshift Spectrum or Spark as query layer. It's not 22 technologies.

You don't need ElasticSearch, Ranger, Knox, Zookeeper etc as they have nothing to do with querying.

dx034 · on Feb 24, 2020

But then it's far from basically free. Even overpriced Oracle databases can end up cheaper than locking into AWS in these cases (my experience).

derefr · on Feb 24, 2020

I think the presumption that's differing here is query workload.

An OLAP database is, in the default case, an always-online instance or cluster, costing fixed monthly OpEx.

Whereas, if your goal in having that database is to do one query once a month based on a huge amount of data, then it will certainly be cheaper to have an analytical pipeline that is "offline" except when that query is running, with only the OLTP stage (something ingesting into S3; maybe even customers writing directly to your S3 bucket at their own Requester-Pays expense) online.

billman · on Feb 24, 2020

My biggest problem with Oracle is not the database itself. There is no doubt that Oracle is a fine piece of software, and is bullet proof, and has decades of experience built into it.

My problem is the scalability and elasticity of it's licensing model. It doesn't meet the needs of today's analytics without spending enormous amounts of money up front.

cjalmeida · on Feb 24, 2020

Nope. One can start easily with Airflow+Spark(ERM)+Presto+S3 and get about 80% what'd get from your run of the mill Oracle database. At a fraction of the price, without half the headache in procurement, licensing or performance tweaking. And better scalability.

You'd be looking at $M in licenses for anything half-serious based in Oracle tech. Becoming good at replacing Oracle stuff probably has been one of the best paying jobs for a while.