this post was submitted on 16 Sep 2024
71 points (100.0% liked)

TechTakes

1436 readers
127 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 1 year ago
MODERATORS
 

None of what I write in this newsletter is about sowing doubt or "hating," but a sober evaluation of where we are today and where we may end up on the current path. I believe that the artificial intelligence boom — which would be better described as a generative AI boom — is (as I've said before) unsustainable, and will ultimately collapse. I also fear that said collapse could be ruinous to big tech, deeply damaging to the startup ecosystem, and will further sour public support for the tech industry.

Can't blame Zitron for being pretty downbeat in this - given the AI bubble's size and side-effects, its easy to see how its bursting can have some cataclysmic effects.

(Shameless self-promo: I ended up writing a bit about the potential aftermath as well)

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 21 points 2 months ago* (last edited 2 months ago) (2 children)

On each step, one part of the model applies reinforcement learning, with the other one (the model outputting stuff) “rewarded” or “punished” based on the perceived correctness of their progress (the steps in its “reasoning”), and altering its strategies when punished. This is different to how other Large Language Models work in the sense that the model is generating outputs then looking back at them, then ignoring or approving “good” steps to get to an answer, rather than just generating one and saying “here ya go.”

Every time I've read how chain-of-thought works in o1 it's been completely different, and I'm still not sure I understand what's supposed to be going on. Apparently you get a strike notice if you try too hard to find out how the chain-of-thinking process goes, so one might be tempted to assume it's something that's readily replicable by the competition (and they need to prevent that as long as they can) instead of any sort of notably important breakthrough.

From the detailed o1 system card pdf linked in the article:

According to these evaluations, o1-preview hallucinates less frequently than GPT-4o, and o1-mini hallucinates less frequently than GPT-4o-mini. However, we have received anecdotal feedback that o1-preview and o1-mini tend to hallucinate more than GPT-4o and GPT-4o-mini. More work is needed to understand hallucinations holistically, particularly in domains not covered by our evaluations (e.g., chemistry). Additionally, red teamers have noted that o1-preview is more convincing in certain domains than GPT-4o given that it generates more detailed answers. This potentially increases the risk of people trusting and relying more on hallucinated generation.

Ballsy to just admit your hallucination benchmarks might be worthless.

The newsletter also mentions that the price for output tokens has quadrupled compared to the previous newest model, but the awesome part is, remember all that behind-the-scenes self-prompting that's going on while it arrives to an answer? Even though you're not allowed to see them, according to Ed Zitron you sure as hell are paying for them (i.e. they spend output tokens) which is hilarious if true.

[–] [email protected] 20 points 2 months ago* (last edited 2 months ago)

From the documentation:

While reasoning tokens are not visible via the API, they still occupy space in the model's context window and are billed as output tokens.

Huh.

load more comments (1 replies)