I don't like the way postgres doesn't have the ability to switch query plans mid-query.
Frequently a query plan turns out to perform nowhere near as well as the planner expects (for example, because the data distribution is poor, or a key being filtered for doesn't exist). In those cases, flipping the query plan around could turn a 1 hour query into a 1 millisecond query. Yet postgres doesn't have the ability to do that.
Sure, a human can sometimes use domain knowledge to rearrange the query and force a plan that works well, but in the general case the database shouldn't have its performance drop by a factor of a million because of semi-arbitrary planner decisions. Being able to try multiple possible plans would be a massive start on solving the issue.
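For illustration, one common way a human forces a plan today is to fence off part of the query behind a materialized CTE, so the planner can't flatten it into a bad join order (table and column names here are made up):

```sql
-- Sketch only: 'events', 'tenant_id', and 'payload' are hypothetical.
-- AS MATERIALIZED (PostgreSQL 12+) stops the planner from inlining the
-- CTE into the outer query, effectively pinning that part of the plan.
WITH hot_rows AS MATERIALIZED (
    SELECT * FROM events WHERE tenant_id = 42
)
SELECT * FROM hot_rows WHERE payload LIKE '%timeout%';
```

This works, but it is exactly the kind of manual, domain-knowledge-dependent intervention the comment is complaining about.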
For anyone interested in implementing this, here is what I found last time...
Postgres queries are streamed to the client - i.e. some results are delivered before the query is done running. That functionality is necessary for really big result sets.
That makes it difficult to change plans mid-query: your new plan might return results in a different order, so you need to filter out any already-returned rows, but you can't afford to keep all of those rows in RAM. Even if that weren't an issue, I'm not sure it's always valid to do this.
To add even more complexity... the Postgres protocol allows the client to reverse the query (i.e. midway through fetching the results, the client can say "yo, go back and return earlier results again"), and the server must return the same results in reverse order. That means if you do switch query plans, the server needs to switch back to the old plan whenever the client goes backwards.
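For reference, this is the scrollable-cursor behaviour being described (a sketch; the table name is made up):

```sql
BEGIN;
-- SCROLL guarantees the cursor can be read in both directions.
DECLARE c SCROLL CURSOR FOR SELECT id FROM big_table ORDER BY id;
FETCH FORWARD 100 FROM c;   -- stream the first 100 rows
FETCH BACKWARD 50 FROM c;   -- client walks back over rows already returned
CLOSE c;
COMMIT;
```

Any mid-query plan switch would have to reproduce those 50 rows in exact reverse order, even though they were produced by the old plan.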
These issues are not insurmountable... But they certainly stopped me implementing it in the day I had set aside for the task...
I would love to be able to SET multiple_planning; on a cursor to be able to say “yes I know that reversing would be UB, and I have set a limit so I don’t care if results aren’t streamed, so please just try everything you can.” In general I wish there were better ways to tell these systems you know what you’re doing.
That effectively means the client cannot actually start processing the results before everything is received, because it might have to undo everything it did with the rows delivered earlier. So basically this would just be a non-streaming version.
The decisions a database makes aren't semi-arbitrary: they are based entirely on your statistics. Wrong decisions are usually caused by outdated statistics.
The statistics can't accurately represent most data distributions, and errors stack multiplicatively in deep query plans. Typical query plans, even with fresh statistics, are lucky to be within a factor of 10, or even 100, of the true cost of running the query.
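To see why the errors compound: if each join level misestimates selectivity by only a modest factor, the top-level row estimate is off by that factor raised to the plan depth. A toy calculation (not a measurement), runnable in psql:

```sql
-- Toy model: a 3x per-node selectivity misestimate stacking up a join tree.
SELECT pow(3, 1) AS after_1_join,   -- 3x off
       pow(3, 2) AS after_2_joins,  -- 9x off
       pow(3, 4) AS after_4_joins;  -- 81x off
```

Four stacked 3x misestimates already put the estimate nearly two orders of magnitude from reality, which is enough to flip a hash join into a nested loop or vice versa.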
Unfortunately the benchmarks I ran to get that knowledge aren't public. They were run on a ~1TB database of production data from a mid-size web company, and spanned thousands of different queries. They were trying to answer the question "is our business better off with out-of-date statistics or fresh statistics?", and our conclusion was that fresh statistics were too risky: they can cause a query plan to change unexpectedly in the middle of the night, and the whole database can fall over from resource exhaustion as some common query suddenly takes hours to complete. The company's most costly downtime had already been caused by exactly that once.
We instead generated statistics on a backup copy, load-tested to confirm those statistics performed acceptably, and then wrote them to the production database. We did that every 3 months, because we found that outdated statistics didn't really have any appreciable performance impact, and the degradation was gradual rather than a cliff.
A tool like this would be a wonderful addition to the PostgreSQL ecosystem!
That being said, this tool won't work well for anything but trivial queries. PostgreSQL query plans have many quirks around CTEs, loops, etc. that cause problems when trying to determine the true inclusive/exclusive time for each node without forgetting anything or counting it twice. The only tool that handles them correctly in my experience is the good old https://explain.depesz.com/ using its own Pg::Explain library [1].
I'm currently working on my own version of a tool like the one presented by OP (called FlameExplain), and hope to release it soon.
I love this tool! But generally speaking, the user experience of the PostgreSQL admin workflow could be improved so much.
The psql client binary should include visualizations like this even in text mode, so admins don't have to follow a multitude of steps as described in the repo:
1) run query, store results in .json file
2) scp .json file to your dev machine
3) run visualization tool
If I already have a psql client shell open, why can't this all be done in the background? I think there is still a lot of potential for improvement.
If you run the psql client from your dev machine, there’s no need for step 2. Also, in practice I don’t save the output JSON to a file, I just pipe it directly to pg_flame. The README breaks it up into multiple steps, but maybe I could make it clear that it’s not necessary.
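For example, something along these lines skips the intermediate file entirely (the exact flags and query are illustrative and may differ slightly from the README):

```shell
# -qAt: quiet, unaligned, tuples-only, so psql emits just the raw JSON,
# which pg_flame reads from stdin and turns into an HTML flamegraph.
psql -h localhost -d mydb -qAt \
  -c 'EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM users' \
  | pg_flame > flame.html
```

Run from your dev machine, this is a single step: no JSON file on disk and no scp.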
But in general I do agree that simplifying this type of tooling is a good thing and something to strive for.
Had this been implemented as a web service, half of the crowd would be asking for it to be an open-source program, so they could run it locally, for speed and flexibility and security. It was implemented as an open-source program, so half of the crowd are asking for it to be a web service for usability and convenience and aesthetics.
What I'm seeing is we've got two big platforms (web, CLI) and they both have some distinct advantages and some distinct disadvantages, and it's not easy for either one to cross the chasm and compete with the other one directly. We desperately need a new platform which combines what we like about both of these, and discards what we don't.
Until then, we're just going to keep implementing all end-user functionality twice, because by historical accident developers do most of their work in a DEC VT100 emulator, and end-users won't tolerate that.
> It was implemented as an open-source program, so half of the crowd are asking for it to be a web service for usability and convenience and aesthetics
I’ll prefer that any time of day. At least, this way someone can easily turn it into a web service. It generates HTML afaics.
A good number of github projects I find say "Install with npm install foo, or try it out online [here]"
The online version can host the exact same code from git master. For many projects, no hosting is even required, because the whole thing can run in a codepen-like playground.
I built it as a CLI because of the reasons you mention, but also because I find that workflow most convenient for me and I didn’t want to need an internet connection.
Yes, some tools such as the query analyzer by depesz [1] work by copy-pasting the EXPLAIN ANALYZE output into your browser.
But IMHO the overall friction is still too large for such a central use case as query optimization done by a database admin.
The PostgreSQL team has been innovating and improving steadily, so I am confident these kinds of workflows will be heavily optimized within the next couple of years.
That tool is a must if you plan to improve your queries. depesz (who is here on HN) also has a good blog about Postgres, where he describes new and upcoming features.
He also wrote a blog series about understanding explain plans: https://www.depesz.com/tag/unexplainable/ his explain tool + that information is really good starting point when trying to optimize queries.
I never understood the obsession with the graphical display of execution plans (neither in Postgres nor in Oracle).
I find the text output much more useful and detailed than any graphical display, especially when generated with "explain (analyze, buffers, timing) ..." and "track_io_timing" turned on.
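For reference, the invocation being described (the query itself is a placeholder; track_io_timing must be on for I/O timings to appear, and changing it requires elevated privileges on older versions):

```sql
SET track_io_timing = on;  -- per-session; superuser-only on older versions
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT * FROM orders WHERE created_at > now() - interval '1 day';
```

With BUFFERS and I/O timing enabled, each plan node reports shared/local/temp buffer hits and reads alongside the per-node timings.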
I think the text display is great too! I built pg_flame mostly to help understand the relative timing of each step. Your brain can compare the size of each bar in a flamegraph virtually instantly, while comparing a bunch of actual time numbers scattered throughout the text output takes some time.
A really great feature of the Oracle database is the ability to provide hints to the planner. Is there anything that prevents PostgreSQL from adding this feature?
edit: Googled it, looks like PostgreSQL maintainers just have an opinion that hints are bad and the planner is good enough:
While it's easy to nod along with their justification for having no hints, the practical reality is that when the query planner gets it wrong, queries can be an order of magnitude (or more) slower. This is particularly obvious when using gin indexes with `like`.
On the application I'm currently working on the difference between the two indexes is night and day, the gin index will respond in 100ms, whilst the btree index can be 15secs+. We've resorted to having two columns with the same content, one with a btree index and one with a gin index so that we can explicitly choose which index to hit.
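A hypothetical sketch of that two-column workaround (names and schema are invented; the real application's schema will differ):

```sql
-- pg_trgm provides the trigram operator class that lets a gin index
-- accelerate LIKE '%...%' searches.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE docs (
    id        bigserial PRIMARY KEY,
    body      text,   -- btree-indexed copy for equality/range lookups
    body_gin  text    -- duplicate content, gin-indexed, kept in sync by the app
);
CREATE INDEX docs_body_btree ON docs (body);
CREATE INDEX docs_body_trgm  ON docs USING gin (body_gin gin_trgm_ops);

-- The application picks the column, and therefore the index, explicitly:
SELECT id FROM docs WHERE body_gin LIKE '%needle%';  -- trigram gin index
SELECT id FROM docs WHERE body = 'exact value';      -- btree index
```

In effect the column name becomes a manual planner hint, at the cost of storing and maintaining the text twice.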
It's unfortunate that it comes off as if the maintainers think the planner is good enough, because that is demonstrably not true, and from my discussions there is wide agreement that the planner will never be perfect. But agreeing that something needs work is not enough; someone actually has to do the work. The reality is that implementing any feature to PostgreSQL quality standards is hard work.
I’m reminded of the MySQL developers in the '90s decrying foreign keys, transactions, etc. as unnecessary overhead, only to belatedly add them later. Hints are one thing I miss coming from Oracle - even just being able to hint whether you want the first row fastest or the entire result set.
I ported this to JavaScript (apart from the InitPlan shenanigans), so that it doesn't require the server-side component - it runs directly in the browser.