A comment from a Reddit user (Fuzzlewhumper) regarding these changes:
"What would take me 2-3 minutes of wait time for a GGML 30B model takes 6-8 seconds pause followed by super fast text from the model - 6-8 tokens a second at least. Faster than I normally type. Yup, had it describe the characters, big old paragraph, 7.41 tokens on my 2015 machine with 32gb memory, I7-6700, and a couple cheap 3060 RTX cards. SCORE."
I am curious whether the speedup is really that drastic. I will do my best to include my own findings in the larger model benchmark post I am piecing together.