Hacker News
DeepSeek's Multi-Head Latent Attention (liorsinai.github.io)
4 points by the_origami_fox on March 3, 2025 | hide | past | favorite | 1 comment


Matrix absorption is unnecessary. What is needed is for the order of multiplication to associate toward the direction of the absorption. This, together with the modified RoPE, is what makes the caching work.
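A toy NumPy sketch of the point (all names and shapes here are illustrative, not DeepSeek's actual dimensions): by associativity of matrix multiplication, computing attention scores from the latent cache gives the same result whether the projection matrices are pre-multiplied ("absorbed") or applied in sequence, so only the association order matters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, d_head, seq = 16, 4, 8, 5   # toy sizes (assumed)

x = rng.standard_normal((1, d_model))           # current query token
c = rng.standard_normal((seq, d_latent))        # cached latent KV (only cache needed)
W_q = rng.standard_normal((d_model, d_head))    # query projection
W_uk = rng.standard_normal((d_latent, d_head))  # key up-projection

# "Absorbed" route: precompute W_q @ W_uk.T once, apply it to the latent cache.
W_absorbed = W_q @ W_uk.T                       # (d_model, d_latent)
scores_absorbed = (x @ W_absorbed) @ c.T        # (1, seq)

# Un-absorbed route: the same product, just associated in a different order;
# full per-token keys are never materialized, so the latent cache suffices.
q = x @ W_q                                     # (1, d_head)
scores_assoc = (q @ W_uk.T) @ c.T               # (1, seq)

print(np.allclose(scores_absorbed, scores_assoc))  # True: both orderings agree
```

Either way, the cache stores only the low-dimensional latents `c`, which is the memory saving; absorption is just one valid association order, not a requirement.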



