Hacker News

> AD guarantees analytically correct logic (in infinite precision, for example) if you use it right

The entire point of the video is that this isn't true. It is true for static algorithms, but for algorithms that iterate to convergence, AD will ensure that the primal has converged but will not ensure that the dual has.
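A minimal forward-mode sketch of that failure mode (the toy dual-number class and the iteration are my own illustration, not from the video): stopping a fixed-point iteration when the primal stalls leaves the derivative noticeably less converged.

```python
# Toy forward-mode AD: each Dual carries a value and a derivative seed.
from dataclasses import dataclass

@dataclass
class Dual:
    val: float   # primal value
    dot: float   # derivative w.r.t. the seeded input

    def __mul__(self, other):
        if isinstance(other, Dual):
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)
        return Dual(self.val * other, self.dot * other)
    __rmul__ = __mul__

    def __add__(self, other):
        if isinstance(other, Dual):
            return Dual(self.val + other.val, self.dot + other.dot)
        return Dual(self.val + other, self.dot)
    __radd__ = __add__

def fixed_point(p, tol=1e-8):
    # Solve x = p*x + 1 by iteration; stop when the PRIMAL stalls.
    x = Dual(0.0, 0.0)
    while True:
        x_new = p * x + 1
        if abs(x_new.val - x.val) < tol:
            return x_new
        x = x_new

p = Dual(0.9, 1.0)            # seed dp/dp = 1
x = fixed_point(p)
# Exact fixed point: x* = 1/(1-p) = 10, dx*/dp = 1/(1-p)^2 = 100.
print(abs(x.val - 10.0))      # primal error: small, as requested by tol
print(abs(x.dot - 100.0))     # dual error: orders of magnitude larger
```

The dual recurrence here is `d ← p*d + x`, which converges more slowly than the primal (a defective eigenvalue adds a factor of the iteration count), so a tolerance checked only on the primal says nothing about the derivative's error.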



I don't think you understood what I wrote there. The rules for algebraically computing a derivative are simple and deterministic, and they are essentially what AD algorithms capture. They MUST therefore be correct in an analytical sense, given infinite precision. The video starts off by saying they are dealing with how computers actually work, which kind of implies finite precision.

Like I said, concerns about the stability of these methods are not new. Your original function might not be differentiable at a given point, for example. You have to know about that stuff rather than blindly applying "automatic" techniques. There is a lot of literature about how to use AD and what can go wrong. Here is a survey of known pitfalls I found in a basic search: https://wires.onlinelibrary.wiley.com/doi/full/10.1002/widm....
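To illustrate the "analytically correct rules" point, here is a tiny hand-rolled forward-mode sketch (the names `d_sin` and `d_mul` are made up for this example): each primitive carries its textbook derivative rule, the chain rule composes them, and for a static composition the only error left is floating point.

```python
# Forward-mode AD as rule composition: (value, derivative) pairs.
import math

def d_sin(x, dx):          # derivative rule for sin
    return math.sin(x), math.cos(x) * dx

def d_mul(a, da, b, db):   # product rule
    return a * b, da * b + a * db

def f_and_grad(x):
    # f(x) = x * sin(x), built by composing the rules above
    s, ds = d_sin(x, 1.0)
    return d_mul(x, 1.0, s, ds)

x = 1.3
val, grad = f_and_grad(x)
analytic = math.sin(x) + x * math.cos(x)   # hand-derived f'(x)
print(abs(grad - analytic))                # agrees to machine precision
```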

The entire point of the video is mired in an hour or so of details about the trouble they had using it to solve ODEs. I am familiar with forward and reverse mode, but to appreciate the video I would have to get up to speed on their exact problem and terminology. Anyway, my point is that AD requires you to know what you are doing. This video seems like a valuable contribution to the state of the art, but you have to recognize that the potential for problems has been known to numerical analysis experts for decades, so this is not as groundbreaking as it appears. The title should read "Automatic differentiation can be tricky to use," to establish that it is in fact a skill issue. Mitigating these corner cases is valuable; it makes the techniques more versatile and foolproof. But the algorithms are not incorrect just because you didn't get them to solve your problem.


Not to spam you, but this is probably a function that would not work for AD: https://math.stackexchange.com/questions/2383397/differentia...

That is, it is a series that converges, but taking the derivative as a sum of individual terms results in divergence. I learned a lot of this type of stuff ages ago, but in 2025 I just searched for an example lol... I am long overdue for a review of numerical analysis and real analysis.

ChatGPT also mentions an example involving Fourier series, maybe related to this: https://en.m.wikipedia.org/wiki/Convergence_of_Fourier_serie... You can ask it all about this stuff. It seems pretty decent, although I have not gone too far into it.


Don't worry, this is interesting! AD should work on this example (at all points where the derivative converges); see this Desmos graph for a very informal argument that the series converges: https://www.desmos.com/calculator/djf8qtilok.

Where I think we're talking past each other is this: in infinite precision, AD perfectly differentiates your algorithm. But even for an algorithm using arbitrary (or even infinite) precision math, one that controls the error of a differentiable problem to high accuracy, AD can still do weird things.


Try that with `g(x, i) = sin(ix) / i`. I think that is one ChatGPT said wouldn't work, in the sense that you can't get the derivative of `f(x)` term by term. Another issue that could come up: the original sequence converges and the derivative sequence converges, but at different rates. Then code that calculates the function to sufficient precision would not automatically get the derivative to any particular error threshold.
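As a quick numeric check (not a proof), here is a sketch comparing partial sums at x = 1: the series itself settles down toward its known limit, while the term-by-term derivative partial sums keep oscillating with O(1) swings instead of converging.

```python
# Partial sums of sum sin(i*x)/i versus its term-by-term derivative sum cos(i*x).
import math

def primal(N, x=1.0):
    return sum(math.sin(i * x) / i for i in range(1, N + 1))

def term_by_term_derivative(N, x=1.0):
    return sum(math.cos(i * x) for i in range(1, N + 1))

# The series converges to (pi - x)/2 on (0, 2*pi).
target = (math.pi - 1.0) / 2
print(abs(primal(10_000) - target))       # small: the primal settles

# The differentiated partial sums never settle:
d_vals = [term_by_term_derivative(N) for N in range(1000, 1011)]
print(max(d_vals) - min(d_vals))          # O(1) swings, no convergence
```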


> g(x, i) = sin(ix) / i

That's an example where the term-by-term derivative series does not converge.

> I guess another issue that could happen is that the original sequence converges, and the derivative sequence converges, but they converge at different rates.

This is a lot closer to what's happening in the video. For a potentially simpler example than an ODE solver: if you had a series evaluator that, given a series, evaluated it at a point, AD would need a similar fix to make sure the convergence test also includes the convergence of the derivative.
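A sketch of that fix (the evaluator and its names are hypothetical, and I pass the derivative of each term in by hand rather than via an AD tool): keep summing until BOTH the value term and the derivative term are below tolerance, not just the value term.

```python
# Series evaluator whose stopping test covers the derivative too.
def eval_series_with_derivative(term, dterm, x, tol=1e-8, max_terms=1_000_000):
    """Sum f(x) = sum_i term(i, x) and f'(x) = sum_i dterm(i, x),
    stopping only when both running sums have stalled."""
    f, df = 0.0, 0.0
    for i in range(1, max_terms):
        t, dt = term(i, x), dterm(i, x)
        f += t
        df += dt
        if abs(t) < tol and abs(dt) < tol:   # joint convergence test
            return f, df, i
    raise RuntimeError("series did not converge")

# Example: f(x) = sum x^i / i^2, with f'(x) = sum x^(i-1) / i.
term  = lambda i, x: x**i / (i * i)
dterm = lambda i, x: x**(i - 1) / i

f, df, n_joint = eval_series_with_derivative(term, dterm, 0.99)
# Stopping on the value term alone would quit much earlier, while the
# derivative terms (decaying like 1/i instead of 1/i^2) are still large.
```

Here the joint test keeps iterating well past the point where the value term alone drops below tolerance, which is exactly the lag between primal and derivative convergence being discussed.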



