Hey, I read this article while doing my course exercises. Coming from Metropolis-Hastings, HMC seemed a bit like magic. Here are some notes I wrote about it; could somebody tell me whether I was even remotely close?
>In other words, as my TA explained it: when you have a really complex, high-dimensional distribution, plain random-walk MCMC (e.g. Metropolis-Hastings) will take forever to explore it. HMC, on the other hand, adds momentum that helps the Markov chain explore the areas where the probability is high. Imagine the probability translated into a 3D landscape where high probability corresponds to deep valleys and low probability to high ground. A ball rolling under gravity will follow those curves and won't needlessly jump over the walls into regions where the probability is low.
>Also, HMC is the current state-of-the-art MCMC algorithm if you have a very high-dimensional parameter space. Regular MCMC can also be applied if the distribution is much simpler. However, instead of MCMC, variational inference (VI) is commonly used, since it can give really good results with little work. Sure, you have to choose your approximating distribution, but after that it's dead simple. Only if you need really high accuracy might you choose something like HMC.
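To make the rolling-ball picture in these notes concrete, here is a minimal HMC sketch in plain Python. It assumes a standard 2-D Gaussian target (chosen only because its gradient is trivial), unit-mass momentum, and a hand-picked step size and trajectory length; `leapfrog`, `hmc_step`, and `sample` are illustrative names, not any library's API. Each iteration gives the ball a random momentum kick, simulates its motion on the landscape U(x) = -log p(x) with the leapfrog integrator, and then applies a Metropolis accept/reject step to correct for the integrator's error.

```python
import math
import random


def neg_log_density(x):
    """Potential energy U(x) = -log p(x) for a standard 2-D Gaussian, up to a constant."""
    return 0.5 * sum(xi * xi for xi in x)


def grad_neg_log_density(x):
    """Gradient of U(x); for a standard Gaussian this is simply x."""
    return list(x)


def leapfrog(x, p, step_size, n_steps):
    """Simulate the 'rolling ball' on the landscape U with the leapfrog integrator."""
    x, p = list(x), list(p)
    g = grad_neg_log_density(x)
    # Half step for momentum
    p = [pi - 0.5 * step_size * gi for pi, gi in zip(p, g)]
    for step in range(n_steps):
        # Full step for position
        x = [xi + step_size * pi for xi, pi in zip(x, p)]
        g = grad_neg_log_density(x)
        # Full step for momentum, except after the last position update
        if step < n_steps - 1:
            p = [pi - step_size * gi for pi, gi in zip(p, g)]
    # Final half step for momentum
    p = [pi - 0.5 * step_size * gi for pi, gi in zip(p, g)]
    return x, p


def hmc_step(x, step_size=0.2, n_steps=10, rng=random):
    # Draw a fresh random momentum: the "kick" given to the ball
    p0 = [rng.gauss(0.0, 1.0) for _ in x]
    x_new, p_new = leapfrog(x, p0, step_size, n_steps)
    # Total energy H = potential + kinetic, before and after the trajectory
    h_old = neg_log_density(x) + 0.5 * sum(pi * pi for pi in p0)
    h_new = neg_log_density(x_new) + 0.5 * sum(pi * pi for pi in p_new)
    # Metropolis correction for the discretization error of the integrator
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new, True
    return x, False


def sample(n_samples, x0, seed=0):
    """Run the chain from x0 and return the draws plus the acceptance rate."""
    rng = random.Random(seed)
    x = list(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        x, acc = hmc_step(x, rng=rng)
        accepted += acc
        samples.append(x)
    return samples, accepted / n_samples


samples, rate = sample(5000, [3.0, -3.0], seed=1)
```

Because the ball carries momentum, one iteration can travel a long way across the distribution instead of taking a small random-walk step, which is the intuition behind the TA's explanation. Practical samplers such as Stan's default NUTS tune the step size and trajectory length automatically rather than hard-coding them as done here.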
Excellent! I always believed that being able to "play" with the Hamiltonian particle would be a great motivator for understanding physics ;)
This is why I get so excited about probabilistic programming in general: likelihood estimation on million-dimensional data sets in reasonable time, right on your laptop. Real-world samples are usually sparse and heterogeneous. Abstracting out your analysis not only reduces the chance of human error but also allows ingestion of even more disparate archival and de novo data sources. I have little doubt this will lead to more nuanced theories and higher reproducibility of results.
A recent example of the state of the art: predicting rare events in pediatric transplant surgeries.
Note that many people use HMC without deriving a closed form for the gradient by hand. In fact, Stan (http://mc-stan.org/) computes the gradient automatically using automatic differentiation, so you never have to supply one.
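A sketch of two ways to get the gradient HMC needs without deriving it by hand: central finite differences (a numerical approximation) and automatic differentiation (exact up to floating-point rounding). Stan uses reverse-mode automatic differentiation; for brevity this sketch uses the simpler forward mode with dual numbers, which needs one pass per parameter. The `Dual` class and the `log_density` function are hypothetical illustrations, not part of any library.

```python
import math


class Dual:
    """Forward-mode autodiff number: a value plus its derivative (tangent)."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    @staticmethod
    def wrap(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.wrap(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.wrap(other)
        # Product rule
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def d_log(x):
    return Dual(math.log(x.val), x.dot / x.val)


def d_exp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.dot)


def log_density(theta):
    # Hypothetical unnormalized log density: Gaussian prior plus a softplus term
    a, b = theta
    return -0.5 * (a * a + b * b) + d_log(1.0 + d_exp(a * b))


def grad_autodiff(f, theta):
    """Exact gradient via forward mode: seed one coordinate's tangent per pass."""
    grads = []
    for i in range(len(theta)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(theta)]
        grads.append(f(args).dot)
    return grads


def grad_finite_diff(f, theta, h=1e-5):
    """Approximate gradient via central differences."""
    grads = []
    for i in range(len(theta)):
        up = [Dual(v + (h if j == i else 0.0)) for j, v in enumerate(theta)]
        dn = [Dual(v - (h if j == i else 0.0)) for j, v in enumerate(theta)]
        grads.append((f(up).val - f(dn).val) / (2 * h))
    return grads


theta = [0.5, -1.2]
print(grad_autodiff(log_density, theta))
print(grad_finite_diff(log_density, theta))  # should agree closely with the line above
```

The finite-difference version needs a step size `h` and picks up truncation and rounding error, while autodiff propagates exact derivatives through each arithmetic operation, which is why it is the usual choice inside HMC.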
I don't follow the literature on sampling-based inference very closely. Could anyone tell me what the state of the art is for confirming and debugging convergence problems?
I'm not an expert, but I think R-hat is pretty commonly used. There are a whole bunch of diagnostic plots in the Bayesian community in the R ecosystem (bayesplot and tidybayes are useful here). Not sure if that's state of the art (R-hat definitely isn't, as it dates back to Gelman's PhD thesis).
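For reference, a sketch of the classic split-R-hat statistic as described in Gelman et al., "Bayesian Data Analysis" (3rd ed.): each chain is split in half, and the variance between the halves is compared with the variance within them. Values near 1 suggest the chains agree; values noticeably above 1 flag a convergence problem. More recent rank-normalized variants exist, and this sketch implements only the classic form; the simulated chains are made up for illustration.

```python
import math
import random


def split_rhat(chains):
    """Classic split-R-hat for one scalar parameter.

    `chains` is a list of equal-length lists of draws.
    """
    # Split every chain in half so drift within a single chain also shows up
    halves = []
    for c in chains:
        n = len(c) // 2
        halves.append(c[:n])
        halves.append(c[n:2 * n])
    m = len(halves)
    n = len(halves[0])
    means = [sum(h) / n for h in halves]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in h) / (n - 1) for h, mu in zip(halves, means)) / m
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


rng = random.Random(0)
# Four well-mixed chains drawn from the same distribution
good = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Four chains stuck around two different locations
bad = [[rng.gauss(mu, 1) for _ in range(1000)] for mu in (0, 0, 3, 3)]
print(split_rhat(good))  # close to 1
print(split_rhat(bad))   # well above 1
```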
The folk theorem of computational statistics suggests that if the model has convergence problems, it's a bad model ;)
OK I somewhat understand Markov Chains and I somewhat understand Monte Carlo simulations (at least in the context of financial modeling), but this is quite over my head!
Why is the title of the site "Brilliantly Wrong"? They post a variety of such methods. Is this an example of a sophisticated method that is incorrect somehow?