I'm in the same boat. Markov chains are a lot of fun, but LLMs are way too formulaic. It's one of those things where AI bros will go, "Look, it's so good at poetry!!" but they have no taste and can't even tell that it sucks; LLMs just generate ABAB poems, and getting anything else is like pulling teeth. The output from a Markov chain generator is a little more garbled and broken, but it's a lot more interesting in my experience. Interesting content that's a little rough around the edges always wins over smooth, featureless AI slop in my book.
Slight tangent: I was interested in seeing how LLMs would work for open-ended text adventures a few years ago (back around GPT-2, when AI Dungeon launched), but the mystique did not last very long. Their output is awfully formulaic, and that has not changed at all in the years since. (Of course, the tech-optimist goodthink take on this is "small LLMs are really good at creative writing for their size!")
I don't think most people can even tell the difference between a lot of these models. There was a snake oil LLM (more snake oil than usual) called Reflection 70B, and people could not tell it was a placebo. They thought it was higher quality and invented reasons why that had to be true:
Like other comments, I was also initially surprised. But I think the gains are both real and easy to understand where the improvements are coming from. [ . . . ]
I had a similar idea, interesting to see that it actually works. [ . . . ]
I think that's cool, if you use a regular system prompt it behaves like regular llama-70b. (??!!!)
It's the first time I've used a local model and did [not] just say wow this is neat, or that was impressive, but rather, wow, this is finally good enough for business settings (at least for my needs). I'm very excited to keep pushing on it. Llama 3.1 failed miserably, as did any other model I tried.
For storytelling or creative writing, I would rather have the more interesting broken-English output of a Markov chain generator, or maybe a tarot deck or a D100 table. Markov chains are also genuinely great for random name generators. I've actually laughed at Markov chains with friends when we threw a group chat into one to see what came out. I can't imagine ever getting something like that from an LLM.
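For anyone who hasn't played with one, a word-level Markov chain generator is only a few lines of Python. This is a minimal sketch of the general technique, not any particular tool: the names `build_chain` and `generate` and the `order` parameter are made up for illustration, and it assumes plain whitespace-separated text as input.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        # Duplicates are kept on purpose: frequent followers get picked more often.
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, rng=None):
    """Random-walk the chain from a random starting state, up to `length` words."""
    rng = rng or random.Random()
    state = rng.choice(list(chain))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:
            # Dead end: this state only appeared at the very end of the input.
            break
        nxt = rng.choice(followers)
        out.append(nxt)
        # Slide the window forward by one word.
        state = state[1:] + (nxt,)
    return " ".join(out)
```

Paste a group chat export in as `text` and call `generate(build_chain(text))`. Bumping `order` to 2 makes the output more coherent but closer to the source, so `order=1` is usually where the funny stuff comes from.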
This stuff is getting pushed all the time in plugins for Obsidian (note-taking/personal knowledge management software). That kind of drives me crazy, because the whole appeal of the app is that your notes are just plain text you could easily read in Notepad, but some people are chunking up their notes into tiny, confusing bite-sized pieces so they're better formatted for RAG (wow, that sounds familiar).
Even without RAG, using LLMs for searching is sketchy. I was digging through a lot of obscure Stack Overflow posts yesterday and was thinking, how could an LLM possibly help with this? It takes less than a second to type in the search terms, and you just have to look at the titles and snippets of the results to tell if you're on the right track. You have the exact same bottleneck of typing and reading, except with ChatGPT or Copilot you also have to pad your query with a bunch of filler and read all the filler slop in the answer as it streams in a couple thousand times slower than dial-up. Maybe they're more equal with simpler questions you don't have to interrogate, but then why even bother? I've seen some people say ChatGPT is faster, easier, and more accurate than Stack Overflow, and even two crazy ones who said Stack Overflow is completely obsolete. Trying to understand that perspective just causes me psychic damage.