this post was submitted on 11 Apr 2024
Technology
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related content.
- Be excellent to each other!
- Mod-approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts; OK to post as comments.
- Only approved bots from the list below; to ask if your bot can be added, please contact us.
- Check for duplicates before posting; duplicates may be removed.
Approved Bots
founded 1 year ago
Besides novelty, the majority of AI I have used has just added extra steps to my work process instead of making it easier. Can we just stop already? It's not a tool for literally everything and I'm tired of companies thinking it is.
Just give it a couple of years for the hype/boom/bust cycle to complete, then it’ll settle down and people will start using the tech appropriately.
Yep, in the exact same way as blockchain: nowhere.
Unlike blockchain, there is a solid chunk of new use cases to be conquered with AI. These might be very technical in nature, but for example, text suggestions on smartphones might already be done with AI, depending on your OS.
We already have text prediction that works more efficiently (from a power and computing point of view) by using things like trees.
There are very few use cases I've seen where AI is more efficient than an algorithm, and they're mostly areas where it can run through a bunch of tests/research/simulation inputs really fast, throwing random shit at the wall that users wouldn't normally try.
AI is basically useless when you're doing something that's easily repeatable, because it's easier to actually implement tools that use algorithms to do that kind of thing.
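To illustrate the kind of tree-based text prediction mentioned above, here is a minimal Python sketch of a trie (prefix tree) that suggests completions ranked by how often each word has been seen. The class names and the frequency-ranking rule are illustrative assumptions, not how any particular keyboard app works; real keyboards use their own, more elaborate models.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.count = 0  # how many times a word ending at this node was inserted


class Trie:
    """Hypothetical frequency-ranked prefix tree for word completion."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def suggest(self, prefix: str, k: int = 3) -> list[str]:
        # Walk down to the node for the prefix; no node means no suggestions.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first walk of the subtree, collecting every complete word.
        found: list[tuple[int, str]] = []
        stack = [(node, prefix)]
        while stack:
            cur, word = stack.pop()
            if cur.count:
                found.append((cur.count, word))
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        # Most frequent first; ties broken alphabetically.
        found.sort(key=lambda cw: (-cw[0], cw[1]))
        return [w for _, w in found[:k]]
```

For example, after inserting the words of "the then there they the the then", `suggest("th")` returns `['the', 'then', 'there']`. This sketch walks the whole subtree on every query; a tuned implementation would typically cache the top-k words at each node instead.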
My brother in Christ, an LLM is a tree
Neural network tools seem really powerful for image filtering and video compression.
That could explain why SwiftKey sucks now
Google and partners have been showing off some pretty cool use cases for Gemini, mostly related to GCP, at Next '24.
Depends on your work, what you're trying to do, and how you use it.
As a developer I run my own local copy of Dolphin Mixtral 8x7B (an LLM), and it's great for boosting my productivity. I'm not asking it to do everything all at once, just small snippets here and there to see if there's a better or more efficient way.
I, for one, am looking forward to hardware improvements that can help us run larger models, so news like this is very welcome.
But you are correct: a large number of companies misunderstand how to use this technology, when they should really be treating it like an intern.
It's great to give small and simple (especially repetitive) tasks, but you'll still need to verify everything.
Hey, I might give Dolphin Mixtral a try. Do you know where I could install it from?
Also, are you a web dev?
Well, that's a loaded question.
There are probably some websites that let you try out the model while they run it on their own equipment (or on hardware rented through Amazon, etc.). But the biggest advantage of these models is being able to run them locally if you have the hardware to handle it (a beefy GPU for quicker responses and a lot of RAM).
To quickly answer your question, you can download the model from here:
https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF
I would recommend Q5_K_M.
But you'll also need some software to run it.
A lot of users run "Text-Generation-WebUI" https://github.com/oobabooga/text-generation-webui
There's also "LM Studio" https://lmstudio.ai/
Ollama https://github.com/ollama/ollama
And more.
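If you go the Ollama route, it serves a local HTTP API (by default on port 11434) that you can call from your own scripts. The sketch below uses only Python's standard library to send a prompt to its `/api/generate` endpoint with streaming turned off. It assumes Ollama is already running and that you've pulled a model under the tag `dolphin-mixtral`; that tag and the helper names are assumptions, so swap in whatever model you actually have.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "dolphin-mixtral") -> urllib.request.Request:
    """Build a POST request for Ollama; "dolphin-mixtral" is an assumed model tag."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "dolphin-mixtral", timeout: float = 300.0) -> str:
    """Send the prompt and return the model's full reply as a string."""
    req = build_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With streaming off, the whole reply arrives in a single "response" field.
    return body["response"]
```

Calling `generate("Is there a more efficient way to write this loop? ...")` returns the reply as one string. If `stream` were left at Ollama's default of true, the server would instead send back a series of JSON lines that you'd have to stitch together yourself.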
I know that LM Studio supports both NVIDIA and AMD GPUs.
Text-Generation-WebUI can support AMD GPUs as well; it just requires some additional setup to get it working.
Some things to keep in mind...
Hardware requirements:
- RAM is the biggest limiting factor in which model you can run, while your GPU/CPU decides how quickly the LLM can respond.
- If you can fit the entire model in your GPU's VRAM, you'll get the most speed. In this case I would suggest using a GPTQ model instead of GGUF: https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GPTQ
- Even the highest-end consumer GPUs only have 24GB of VRAM right now (RTX 4090, RTX 3090, and RX 7900 XTX). The next generation of consumer GPUs also looks likely to be capped at 24GB, unless AMD decides that more VRAM is its way of competing with NVIDIA.
GGUF models let you compensate for VRAM limitations: as much of the model as fits is loaded into VRAM first, and whatever is left over is loaded into system RAM.
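A rough way to see how a GGUF model splits between VRAM and system RAM is to estimate the size of the weights from the parameter count and the quantization's bits per weight, then count how many of the model's layers fit in the VRAM you have. Every number here is an assumption for illustration: about 46.7B total parameters for Mixtral 8x7B, roughly 5.5 bits per weight for Q5_K_M, 32 equally sized layers, and 2GB of VRAM held back for the context and other overhead. Real loaders (llama.cpp's `--n-gpu-layers` option, for example) also have to place embedding and output tensors and allocate buffers, so treat the result as a ballpark.

```python
GB = 1e9  # decimal gigabytes, to keep the arithmetic simple


def estimate_weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def split_layers(total_bytes: float, n_layers: int,
                 vram_bytes: float, reserve_bytes: float) -> tuple[int, float]:
    """Return (layers that fit in VRAM, bytes left over for system RAM).

    Simplifying assumption: every layer is the same size, and embedding/output
    tensors and runtime buffers are ignored apart from the fixed reserve.
    """
    per_layer = total_bytes / n_layers
    usable = max(0.0, vram_bytes - reserve_bytes)
    on_gpu = min(n_layers, int(usable // per_layer))
    in_ram = (n_layers - on_gpu) * per_layer
    return on_gpu, in_ram
```

Under these assumptions the weights come to about 32.1GB, and on a 24GB card with 2GB reserved, `split_layers` puts 21 of the 32 layers in VRAM and leaves roughly 11.0GB for system RAM. That is why a lot of system RAM still matters even with a top-end consumer GPU.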
Context length: Think of an LLM as something that only has a fixed amount of short-term memory. The bigger you set the context length, the more short-term memory you give it (the maximum you can set depends on the model you're using, and setting it to the max also requires more RAM). Mixtral 8x7B models have a maximum context length of 32k tokens.
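Why a longer context costs more RAM: besides the weights, the model keeps a key/value (KV) cache for every token in the context, and that cache grows linearly with the context length. A back-of-the-envelope sketch, assuming the architecture values from Mixtral 8x7B's published configuration as I understand them (32 layers, 8 key/value heads, head dimension 128) and an fp16 cache at 2 bytes per value; loaders that quantize the cache will use less.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for n_ctx tokens.

    Defaults are assumed Mixtral 8x7B values with an fp16 cache.
    """
    # Factor of 2: each layer stores one tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem
```

With these assumptions, `kv_cache_bytes(32768) / GIB` is 4.0, versus 0.5 at a 4,096-token context, and that cache sits on top of the memory already used by the weights.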
This always happens when something new and novel has “potential”. VC money has been funding loss-leaders for two decades and they wanna cash in on the next gold rush. Just like blockchain, expect to see this beaten to death and shoehorned into places where it has no real use. There’ll be a few really solid things found for it to do, though, and it will excel in those places. Then we’ll all laugh about it: “Remember when they thought LLMs were the next big thing? What a bubble that turned out to be, like pets.com all over again.”