Stack Overflow Just Announced Their Own AI OverflowAI (stackoverflow.blog)

submitted 2 years ago by [email protected] to c/programming

[–] [email protected] 6 points 2 years ago (3 children)

Check out the article and feature video. It does appear to link to answers it pulled from. Bing and Bard do the same. Posters saying it's impossible are mistaken.

[–] [email protected] 4 points 2 years ago

Thanks for the TLDW - I could ogle a bit of the article but since I was at work, I couldn't just play the video out loud.

[–] [email protected] 4 points 2 years ago

Posters aren't saying that it's impossible to put search results through an LLM and ask it to cite the sources it reads. They're saying that the neural networks used in today's LLMs do not store token attribution in the vocabulary or per node. You can build a system around the network that gives it the proper input (search results) and prodding (a prompt that biases it toward citation), but a single LLM cannot come up with that attribution on its own.
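To make the "system around the network" idea concrete, here is a minimal sketch in Python. It is not how OverflowAI, Bing, or Bard actually work. The `SearchResult` type, the function names, the prompt wording, and the example URLs are all made up for illustration, and no real LLM is called. The point is only that the citation comes from the wrapper, which numbers the search results going in and maps the numbers back to URLs coming out.

```python
import re
from dataclasses import dataclass


@dataclass
class SearchResult:
    url: str   # where the snippet came from (hypothetical example URLs only)
    text: str  # the snippet handed to the model


def build_prompt(question: str, results: list[SearchResult]) -> str:
    """Number each search result and ask the model to cite by number."""
    sources = "\n".join(
        f"[{i}] {r.url}\n{r.text}" for i, r in enumerate(results, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "After each claim, cite the source number in square brackets.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


def cited_sources(answer: str, results: list[SearchResult]) -> list[str]:
    """Map [n] markers in the model's answer back to the URLs we supplied."""
    urls = []
    for match in re.findall(r"\[(\d+)\]", answer):
        index = int(match) - 1
        # Ignore numbers that do not correspond to a supplied source.
        if 0 <= index < len(results) and results[index].url not in urls:
            urls.append(results[index].url)
    return urls
```

The model only ever sees the numbered snippets, so anything it learned in training still goes uncredited. The wrapper can credit only what it put into the context itself.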

[–] MagicShel 2 points 2 years ago* (last edited 2 years ago) (1 children)

If it's doing a search for the code, pulling it into the context, and then spitting it back out in slightly modified form, then it can attribute the source it pulled in. That's very different from the AI attributing what it learned in training, because code that is pulled into context by a search has a strong, direct influence on the output. The output is still generated the same way, but it would be reasonable to credit the author of the code that was pulled in. The code in the training data, however, cannot be credited. How you would pull in just the right piece of code in the first place, though, is a bit of a mystery to me.

[–] [email protected] 2 points 2 years ago* (last edited 2 years ago)

There are a few ways of finding which code is relevant, but one way is to use some sort of vector database to perform the search, using embeddings generated from the questions, the answers, and the query.

Embeddings are essentially numeric vectors that represent the meaning of a piece of text, so two texts can be compared for semantic similarity by comparing their vectors.
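Here is a rough sketch of that comparison in plain Python. The three-dimensional vectors and the document IDs are invented for illustration. A real system would get much longer vectors from an embedding model and would normally let a vector database do the ranking. Cosine similarity is one common way to compare embeddings, though not the only one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Made-up embeddings standing in for stored questions and answers.
index = {
    "answer-sorting": [0.9, 0.1, 0.0],
    "answer-regex": [0.1, 0.9, 0.1],
    "answer-threads": [0.0, 0.2, 0.9],
}

# Made-up embedding of the user's query.
query = [0.8, 0.2, 0.1]

print(top_k(query, index, k=2))  # ['answer-sorting', 'answer-regex']
```

The IDs that come back are what you would look up and paste into the model's context, which is also what lets you link to them afterwards.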