Sorry, it's late here and I guess I didn't word it well. Dialback (these days) always runs over a TLS-encrypted connection, as all servers enforce TLS.
The next question is how to authenticate the peer, and that can be done a few ways: usually via the certificate PKI, via dialback, or via something else (e.g. DNSSEC/DANE).
My comment about "combining dialback with TLS" was to say that we can use information from the TLS channel to help make the dialback authentication more secure (by adding extra constraints to the basic "present this magic string" that raw dialback authentication is based on).
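To make the "extra constraints" idea concrete, here is a rough Python sketch. The first function loosely follows the HMAC construction recommended in XEP-0185 for the dialback "magic string" (the exact encoding of the hashed secret is an assumption here). The second function and the verifier are purely hypothetical: they mix a value taken from the TLS channel (for example an exporter value or a certificate fingerprint) into the key, so a key relayed over a different TLS session would not verify. The function names and the `channel_binding` parameter are made up for illustration and do not describe any deployed protocol.

```python
import hashlib
import hmac


def dialback_key(secret: str, receiving: str, originating: str, stream_id: str) -> str:
    """Plain dialback key, loosely in the style recommended by XEP-0185."""
    # Assumption: the hex-encoded SHA-256 of the secret is used as the HMAC key.
    hashed_secret = hashlib.sha256(secret.encode()).hexdigest().encode()
    message = f"{receiving} {originating} {stream_id}".encode()
    return hmac.new(hashed_secret, message, hashlib.sha256).hexdigest()


def bound_dialback_key(secret: str, receiving: str, originating: str,
                       stream_id: str, channel_binding: bytes) -> str:
    """Hypothetical variant: also bind the key to a value from the TLS channel."""
    hashed_secret = hashlib.sha256(secret.encode()).hexdigest().encode()
    message = f"{receiving} {originating} {stream_id} ".encode()
    message += channel_binding.hex().encode()
    return hmac.new(hashed_secret, message, hashlib.sha256).hexdigest()


def verify_bound_key(secret: str, receiving: str, originating: str,
                     stream_id: str, key: str, observed_binding: bytes) -> bool:
    """Recompute the key from the binding the verifier was told about and compare."""
    expected = bound_dialback_key(secret, receiving, originating,
                                  stream_id, observed_binding)
    return hmac.compare_digest(expected, key)
```

In this sketch, an intercepting proxy that terminates TLS on both sides would end up with two different channel-binding values, so a key generated for one session would fail verification when checked against the other.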
How would dialback-over-TLS be "more vulnerable to MITM" though? I think that claim was what led to the confusion. I don't see how TLS-with-client-EKU is more secure than TLS-with-dialback.
Firstly, nobody is actually calling for authentication using client certificates. We use "normal" server certificates and validate the usual way; the only difference is that such a certificate may be presented on the "client" side of a connection when the connection is between two servers.
The statement that dialback is generally more susceptible to MITM is based on the premise that it is easier to MITM a single victim XMPP server (e.g. hijack its DNS queries or install an intercepting proxy somewhere on the path between the two servers) than it is to mount the same attack against Let's Encrypt, which has various additional protections such as performing verification from multiple vantage points, always using DNSSEC, etc.
If an attacker gets a misissued cert not through BGP or DNS hijacks, but by exploiting a domain validation flaw in a CA (e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=2011713), then it's trivial for them to use it as a client certificate, even if you're requiring the serverAuth EKU. On the other hand, dialback over TLS would require the attacker to also MITM the connection between XMPP servers, which is a higher bar.
The good news is that since Prosody requires the serverAuth EKU, the misissued cert would be in-scope of Mozilla's root program, so if it's discovered, Mozilla would require an incident report and potentially distrust the CA. But that's reactive, not proactive.
You're not wrong. PKI has better protections against MITM; dialback has better protections against certificate leaks/misissuance.
I think the ideal approach would be combining both (as mentioned, there have been some experiments with that), except when e.g. DANE can be used ( https://prosody.im/doc/modules/mod_s2s_auth_dane_in ). But if DANE can be used, the whole CA thing is irrelevant anyway :)
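As a rough illustration of that combined policy, here is a minimal Python sketch. `PeerEvidence` and `is_authenticated` are hypothetical names, and the rule "if DANE is usable, its result decides on its own" is just one reading of the above, not how any particular server implements it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeerEvidence:
    pki_valid: bool                    # certificate chains to a trusted CA and matches the domain
    dialback_valid: bool               # dialback key was verified by the claimed domain
    dane_valid: Optional[bool] = None  # None means no usable DANE records were found


def is_authenticated(evidence: PeerEvidence) -> bool:
    """Hypothetical policy: DANE decides alone when usable, else require PKI and dialback."""
    if evidence.dane_valid is not None:
        return evidence.dane_valid
    return evidence.pki_valid and evidence.dialback_valid
```

Under this policy, a misissued certificate alone (PKI passes, dialback fails) is rejected, and so is a hijacked dialback without a valid certificate.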
Firstly, nobody is actually calling for authentication using client certificates. We use "normal" server certificates and validate the usual way
I'm not sure I understand this point. You authenticate the data you receive using the client's certificate. How is that "nobody is calling for authentication using client certificates"? Maybe there's some nuance I'm missing here but if you're authenticating the data you're receiving based on the client's certificate, then how is that "validating the usual way"?
There is a lot of confusion caused by overlapping terminology in this issue.
By "client certificates" I mean (and generally take most others in this thread to mean) certificates which have been issued with the clientAuth key purpose defined in RFC 5280. This is the key purpose that Let's Encrypt will no longer be including in their certificates, and what this whole change is about.
However, when one server connects to another server, all of TCP, TLS and the application code see the initiating party as a "client", which is distinct from, say, an "XMPP client", which is an end-user application running on e.g. some laptop or phone.
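The two meanings of "client" can be separated in a small Python sketch. The OIDs are the serverAuth and clientAuth key purposes from RFC 5280; the function names, the `tls_role` parameter, and the assumption that the certificate carries an extended key usage extension at all are hypothetical and only there to illustrate the distinction.

```python
from typing import FrozenSet

# Key purpose OIDs defined in RFC 5280.
SERVER_AUTH = "1.3.6.1.5.5.7.3.1"  # id-kp-serverAuth
CLIENT_AUTH = "1.3.6.1.5.5.7.3.2"  # id-kp-clientAuth


def acceptable_for_s2s(eku_oids: FrozenSet[str], tls_role: str) -> bool:
    """Policy as described in this thread: require serverAuth on either side.

    tls_role is "client" or "server" (which side of the TLS handshake the
    peer was on). It is deliberately ignored: a server certificate is
    accepted even when presented by the initiating party.
    """
    return SERVER_AUTH in eku_oids


def conventional_tls_check(eku_oids: FrozenSet[str], tls_role: str) -> bool:
    """Stricter reading for contrast: the key purpose must match the TLS role."""
    required = CLIENT_AUTH if tls_role == "client" else SERVER_AUTH
    return required in eku_oids
```

A certificate carrying only serverAuth passes the first check when presented by the connecting server, but would fail the second, which is the situation created when clientAuth is dropped from issued certificates.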
The comment I was responding to clearly specified "I don't see how TLS-with-client-EKU [...]", which was more specific. However, I used the vaguer term "client certificates" to refer to the same thing in my response for brevity (thinking it would be clear from the context). Hope that clarifies things!