
O(n^2) describes scaling behavior: what happens as n -> infinity.

An optimal arrangement for n==4 already exists. Both AMD MI100 and NVIDIA Volta / Ampere perform 4x4 FP16 matrix multiplication in a single instruction.

An 8x8 matrix is just an arrangement of four 4x4 matrices (!!!!). As such, you can define an 8x8 matrix multiplication as a 2x2 matrix multiplication over 4x4 matrices. This recursive relationship is often key to how the fast algorithms get faster and faster: getting to O(n^2.8), O(n^2.6), and so on.
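
To make the recursion concrete, here is a rough sketch of Strassen's algorithm (the O(n^2.8) one) in plain Python. It treats each matrix as a 2x2 grid of blocks and needs 7 block products per level instead of 8. The names (`strassen`, `naive_mul`, `base`) are mine, the 4x4 base case is only a stand-in for the hardware instruction, and it assumes n is `base` times a power of two.

```python
def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def naive_mul(a, b):
    # Schoolbook O(n^3) product; stands in for the small fixed-size base case.
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def split(m):
    # View an n x n matrix as a 2x2 grid of (n/2) x (n/2) blocks.
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])

def join(c11, c12, c21, c22):
    top = [l + r for l, r in zip(c11, c12)]
    bottom = [l + r for l, r in zip(c21, c22)]
    return top + bottom

def strassen(a, b, base=4):
    # Assumes square inputs whose size is base * 2^k.
    if len(a) <= base:
        return naive_mul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven recursive block products instead of the obvious eight.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), base)
    m2 = strassen(mat_add(a21, a22), b11, base)
    m3 = strassen(a11, mat_sub(b12, b22), base)
    m4 = strassen(a22, mat_sub(b21, b11), base)
    m5 = strassen(mat_add(a11, a12), b22, base)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), base)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), base)
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_sub(mat_add(m1, m3), m2), m6)
    return join(c11, c12, c21, c22)
```

For an 8x8 input with `base=4` this does one level of recursion: seven 4x4 products plus some block additions. Saving one product per level is where the exponent log2(7) ~ 2.807 comes from.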


