this post was submitted on 31 May 2025
12 points (77.3% liked)

Security

[–] [email protected] 8 points 1 day ago* (last edited 1 day ago) (1 children)

From the researcher's blog post (https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-2025-37899-a-remote-zeroday-vulnerability-in-the-linux-kernels-smb-implementation/):

My experiment harness executes this N times (N=100 for this particular experiment) and saves the results. [...]

o3 finds the kerberos authentication vulnerability in the benchmark in 8 of the 100 runs. In another 66 of the runs o3 concludes there is no bug present in the code (false negatives), and the remaining 28 reports are false positives.
...
Combining the code for all of the handlers with the connection setup and teardown code, as well as the command handler dispatch routines, ends up at about 12k LoC (~100k input tokens), and as before I ran the experiment 100 times.

o3 finds the kerberos authentication vulnerability in 1 out of 100 runs with this larger number of input tokens, so a clear drop in performance, but it does still find it. More interestingly however, in the output from the other runs I found a report for a similar, but novel, vulnerability that I did not previously know about.

A practical demonstration of "even a stopped clock is right twice a day."
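
For a sense of how far repeated runs can carry a low per-run hit rate, here is a minimal sketch. It assumes each run is an independent trial with a fixed success probability, which the blog post does not claim; the function name is made up for illustration. The two per-run rates are the ones quoted above (8 in 100 and 1 in 100).

```python
def p_at_least_one_hit(per_run_rate: float, runs: int) -> float:
    """Chance that at least one of `runs` independent runs finds the bug.

    Assumption (not from the source): runs are independent and share the
    same per-run success probability.
    """
    return 1.0 - (1.0 - per_run_rate) ** runs


if __name__ == "__main__":
    # Per-run rates taken from the quoted results: 8/100 for the smaller
    # input, 1/100 for the ~12k LoC input.
    for rate in (0.08, 0.01):
        chance = p_at_least_one_hit(rate, runs=100)
        print(f"per-run rate {rate:.0%}: at least one hit in 100 runs = {chance:.1%}")
```

Under that assumption, a 1% per-run rate still gives better-than-even odds of at least one hit across 100 runs, but every one of those runs also has to be triaged.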

[–] Kissaki 1 points 1 day ago

“even a stopped clock is right twice a day.”

Code analysis is a bit more complex than a clock.

[–] onlinepersona 3 points 2 days ago

Initially embarking on a manual audit of ksmbd to benchmark o3’s potential, Heelan quickly realized that the model was able to autonomously identify a complex use-after-free vulnerability in the handler for the SMB ‘logoff’ command—an issue Heelan himself had not previously detected.

[–] [email protected] 1 points 1 day ago (1 children)

Uh oh, that means AI will be used to find countless zero-days for hacking purposes.

[–] [email protected] 4 points 1 day ago

If by countless you mean 8 valid identifications of the same single issue in 100 runs, with false-positive reports in almost 30% of those runs, then sure.

I'm far more worried about the false positives drowning out the real findings.
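
To put a rough number on that, here is a small sketch using only the counts quoted from the blog post for the 100-run benchmark. The function names are illustrative, and the quoted counts add up to 102 rather than 100, so treat the result as approximate.

```python
# Counts as quoted from the blog post (they sum to 102, not 100;
# used here unchanged).
TRUE_POSITIVES = 8     # runs that reported the real vulnerability
FALSE_NEGATIVES = 66   # runs that concluded there was no bug
FALSE_POSITIVES = 28   # runs that reported a bug that is not real


def precision(tp: int, fp: int) -> float:
    """Fraction of bug reports that point at the real vulnerability."""
    return tp / (tp + fp)


def reports_per_true_finding(tp: int, fp: int) -> float:
    """Average number of reports a reviewer reads per genuine finding."""
    return (tp + fp) / tp


if __name__ == "__main__":
    print(f"precision: {precision(TRUE_POSITIVES, FALSE_POSITIVES):.1%}")  # 22.2%
    print(f"reports per true finding: "
          f"{reports_per_true_finding(TRUE_POSITIVES, FALSE_POSITIVES):.1f}")  # 4.5
```

So on these figures a reviewer would read about four or five reports for each one that is real, before counting the runs that found nothing.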