Just curious because I was discussing this with someone else on here. Do you think it's possible to create a tldw bot with chatgpt for YouTube videos as well?
AI
Artificial intelligence (AI) is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals, which involves consciousness and emotionality. The distinction between the former and the latter categories is often revealed by the acronym chosen.
It is definitely possible, at least for videos that have a transcript. There are tools to download the transcript which can be fed into an LLM to be summarized.
I tried it here with excellent results: https://programming.dev/post/158037 - see the post description!
See also the conversation: https://chat.openai.com/share/b7d6ac4f-0756-4944-802e-7c63fbd7493f
I used GPT-4 for this post, which is miles ahead of GPT-3.5, but it would be prohibitively expensive (for me) to use it for a publicly available bot. I also asked it to generate a longer summary with subheadings instead of a TLDR.
The real question is if it is legal to programmatically download video transcripts this way. But theoretically it is entire possible, even easy.
Ah yeah looks good. I mean V4 is definitely better, but if someone could write a bot for the V3.5 version it would still be better than nothing I guess.
For videos without transcriptions you could also use whisper to transcribe the videos first. But that would probably be way to expensive and all.
Still nice to see it would be possible.
Yes, I also realized Whisper could be used for this (see sibling comments, I guess they didn’t show up in your inbox).
Oh, I’ve just realized that it’s also possible if the video doesn’t have a transcript. You can download the audio and feed it into OpenAI Whisper (which is currently the best available audio transcription model), and pass the transcript to the LLM. And Whisper isn’t even too expensive.
Not sure about the legality of it though.
If you want it to be really good, you can transcribe the audio even if the YouTube video already has a transcript. Whisper is much better than whatever YouTube uses for the subtitles. Of course it will be more expensive this way.
Will it work on all instances?
And how do you manage costs? Querying GPT isn't that expensive but when many people use the bot costs might accumulate substantially
I hope it does not get too expensive. You might want to have a look at locally hosted models.
Unfortunately the locally hosted models I've seen so far are way behind GPT-3.5. I would love to use one (though the compute costs might get pretty expensive), but the only realistic way to implement it currently is via the OpenAI API.
EDIT: there is also a 100 summaries / day limit I built into it to prevent becoming homeless because of a bot
By the way, in case it helps, I read that OpenAI does not use content submitted via API for training. Please look it up to verify, but maybe that can ease the concerns of some users.
Also, have a look at these hosted models, they should be way cheaper than OpenAI. I think that this company is related to StabilityAI and the guys from StableDiffusion and also openassistant.
There is also openassistant, but they don't have an API yet. https://projects.laion.ai/Open-Assistant
Yes, they have promised explicitly not to use API data for training.
Thank you, I’ll take a look at these models, I hope I can find something a bit cheaper but still high-quality.
I looked into using locally hosted models for some personal projects, but they're absolutely awful. ChatGPT 3.5, and especially 4 are miles above what local models can do at the moment. It's not just a different game, the local model is in the corner not playing at all. Even the Falcon instruct, which is the current best rated model on HuggingFace.
Can you do this with GPT? Is it free?
ChatGPT 3.5 is free, with login, but there it’s no free API access, for bots. It’s about $0.002/1k tokens. So something like 600 tldr per $.
fyi someone else launched one a day ago; please see my suggestions to them in their 'how the bot works' thread in [email protected].
tldr: i am really not a fan of automated posting of automated plausibly-not-BS-but-actually-BS, which is what LLMs tend to produce, but I realize some people are. if you really want to try it, please make it very clear to every reader that it is machine-generated before they read the rest of the comment.
Thank you, that’s a reasonable suggestion, I added it to the comment template:
TL;DR: (AI-generated 🤖)
Will it work on all instances?
And how do you manage costs? Querying GPT isn't that expensive but when many people use the bot costs might accumulate substantially
- Only on programming.dev, at least in the beginning, but it will be open source so anyone will be able to host it for themselves.
- I set up a hard limit of 100 summaries per day to limit costs. This way it won’t go over $20/month. I hope I will be able to increase it later.
Thanks for your reply!
I am very happy to hear that you will open source it!
I am curious - have you tested how well it handles a direct link to a scientific article in PDF format?
It only handles HTML currently, but I like your idea, thank you! I’ll look into implementing reading PDFs as well. One problem with scientific articles however is that they are often quite long, and they don’t fit into the model’s context. I would need to do recursive summarization, which would use much more tokens, and could become pretty expensive. (Of course, the same problem occurs if a web page is too long; I just truncate it currently which is a rather barbaric solution.)
Thanks for your response!
I imagined that this would be harder to pull off. There is also the added complexity that the layout contains figures and references... Sill, it's pretty cool, I'll keep an eye on this project, and might give self-hosting it a try once it's ready!
LLMs can do a surprisingly good job even if the text extracted from the PDF isn't in the right reading order.
Another thing I've noticed is that figures are explained thoroughly most of the time in the text so there is no need for the model to see them in order to generate a good summary. Human communication is very redundant and we don't realize it.
@TLDRBot
It doesn't work yet, the screenshots are from a private test instance.
Oo looks pretty good mate, keen to see how it turns out