this post was submitted on 01 Apr 2024

My kernels run 2x faster than MKL for matrices that fit in L2 cache. They are still a work in progress, since the speedup works best for prompts with fewer than 1,000 tokens.
