this post was submitted on 22 Nov 2024

Comic Strips

submitted 3 weeks ago* (last edited 3 weeks ago) by [email protected] to c/[email protected]
[email protected] 3 weeks ago* (last edited 3 weeks ago)

> How useful would the training data be?

Open datasets are getting much better (Tulu, an instruction-tuning dataset and training recipe, is a great example), but it's clear the giants still have some "secret sauce" that gives them at least a small edge over open datasets.

There also seems to be some vindication for massively multilingual datasets, since the hybrid Chinese/English models are turning out very well.