this post was submitted on 04 Feb 2025
155 points (94.3% liked)

"Scarcity is the mother of invention" …looks like optimisation is back on the menu, boys. It's something the Americans have forgotten, whether for games, software, apps, etc.

[–] [email protected] 60 points 2 weeks ago (1 children)

Per the article, they just developed a faster algorithm for a specific type of material simulation.

[–] Isoprenoid 35 points 2 weeks ago (2 children)

Per the article, they ~~just~~ developed a faster algorithm for a specific type of material simulation.

You're underplaying it.

As per the article:

This has broad implications for industries that require detailed material analysis, including:

Aerospace and Defense: Improved modeling of material stress and failure in aircraft structures.

Engineering and Manufacturing: More efficient testing of materials for construction and industrial applications.

Military Research: Faster development of impact-resistant materials for defense systems.

[–] [email protected] 7 points 2 weeks ago* (last edited 2 weeks ago) (2 children)

Yes, exactly… For another example, the DeepSeek team developed their own replacement for CUDA with PTX (Parallel Thread Execution), a lower-level, assembly-like language that allows more granular optimisation of GPU performance, offering a 10X efficiency improvement, recently, as GPU sanctions were levied on China. This innovative approach not only challenges the dominance of CUDA in the AI landscape but also opens up new possibilities for optimising GPU performance in various applications. This is what is missing not only from those relying on Nvidia but also from its competitors, whether AMD or Apple, which prefer their own proprietary solutions.

[–] [email protected] 17 points 2 weeks ago (1 children)

Nvidia developed PTX; DeepSeek leveraged it to do some load-balancing work they couldn't do in CUDA. They still use CUDA as well.

[–] [email protected] 1 points 2 weeks ago

DeepSeek team developed their own replacement to CUDA called PTX (Parallel Thread Execution) a lower-level assembly-like language that allows for more granular optimisations of GPU performance offering 10X efficiency improvement recently as GPU sanctions were levied on China

No, they didn't. NVIDIA created PTX.

[–] [email protected] 1 points 2 weeks ago

I write HPC code and this really isn't that crazy. It's 800x vs. a serial program run on a CPU; it is only 100x vs. an OpenMP (parallel CPU) version.

While the gains are very impressive and show clear deep understanding of HPC programming, it is really nothing that out of the ordinary.

With proper memory and algorithm optimisations, going from a single thread to a dedicated GPU model, a 100-1000x speedup is pretty normal.
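
To make the arithmetic behind those two figures explicit, here is a minimal sketch. It assumes both speedups were measured on the same workload, and the function name is made up for illustration. Under that assumption, the OpenMP baseline is itself roughly 8x faster than the serial one.

```python
def implied_baseline_speedup(gpu_vs_serial: float, gpu_vs_parallel: float) -> float:
    """Speedup of the parallel CPU baseline over the serial baseline,
    implied by two GPU speedups measured on the same workload (assumption)."""
    if gpu_vs_parallel <= 0:
        raise ValueError("speedups must be positive")
    return gpu_vs_serial / gpu_vs_parallel


# Figures quoted in the comment: 800x vs. serial, 100x vs. OpenMP.
ratio = implied_baseline_speedup(800.0, 100.0)
print(f"OpenMP vs. serial: {ratio:g}x")  # prints "OpenMP vs. serial: 8x"
```

So most of the headline number depends on which baseline you compare against.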