A web-based application enabling users to interact with and extract insights from YouTube video transcripts and website content. This solution aims to enhance user engagement, streamline content exploration, and provide actionable insights efficiently.
This project was implemented for LLM Zoomcamp - a free course about LLMs and RAG.
The dataset used for this project is dynamic, as it depends on the user's interests.
The following datasets can be considered the foundations of this project:
The data used to validate and test the LLM can be found here and here (public URLs accessible to anyone).
Objective:
Develop a web-based application that enables users to interact with and extract insights from YouTube video transcripts and website content, enhancing user engagement, streamlining content exploration, and delivering actionable insights efficiently.
Enhanced Content Accessibility:
- Challenge: Users often face difficulties finding and accessing relevant information from video content and websites.
- Solution: This tool allows users to input YouTube video URLs and website links, process them, and interact with the content through a chat interface. This makes it easier for users to find specific information and gain insights without manually sifting through lengthy videos or web pages.
Improved User Engagement:
- Challenge: Traditional methods of content consumption can be passive and less engaging, leading to lower user interaction and satisfaction.
- Solution: By providing a chatbot interface, users can engage in a conversational manner with the content, asking questions and receiving tailored responses. This interactive approach increases user engagement and makes content exploration more dynamic and user-friendly.
Streamlined Information Retrieval:
- Challenge: Retrieving specific information from videos and websites can be time-consuming and inefficient.
- Solution: The application processes video transcripts and website content, allowing users to instantly query and receive relevant information. This speeds up information retrieval and improves overall efficiency.
Accessibility for Non-Technical Users:
- Challenge: Many users lack the technical expertise to manually analyze or process content from various sources.
- Solution: The user-friendly interface simplifies the process of content analysis and interaction, making it accessible to users with varying levels of technical knowledge.
Competitive Advantage:
- Challenge: Businesses and content creators need innovative tools to stand out and provide value to their audiences.
- Solution: This tool positions your business as a forward-thinking leader in content interaction and analysis. It demonstrates a commitment to enhancing user experience and leveraging advanced technologies to provide valuable insights.
Multiple retrieval approaches were evaluated (six in total), as implemented here. The metric used was the hit rate. The results are presented as follows:
The different retrieval approaches are the following:
As shown in the figure above, the best retrieval approach was Embedded Vector and Keyword Search with Rerank, outperforming all of its competitors.
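For reference, hit rate measures the fraction of ground-truth queries for which the expected document appears anywhere in the retrieved results. A minimal sketch with toy data (not the project's actual evaluation code):

```python
def hit_rate(relevance_total):
    """Fraction of queries for which the expected document
    appears anywhere in the returned results."""
    return sum(True in row for row in relevance_total) / len(relevance_total)

# Toy data: one boolean list per ground-truth query, marking whether the
# document retrieved at each rank matches the expected document id.
relevance_total = [
    [False, True, False],   # hit at rank 2
    [True, False, False],   # hit at rank 1
    [False, False, False],  # miss
]
print(hit_rate(relevance_total))  # 2/3 ≈ 0.667
```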
Multiple RAG approaches were evaluated based on their RAG configurations.
Cosine similarity against a ground-truth database was the metric used to evaluate the RAGs, as implemented here.
The average of this metric across the results is presented in the next figure:
The best RAG performance based on cosine similarity was Embedded Vector and Keyword Search with Rerank, although the other competitors were very close.
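For clarity, a minimal sketch of the metric itself, with toy vectors standing in for real answer embeddings:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for the embedding of a RAG answer and the
# embedding of the corresponding ground-truth answer.
answer_vec = np.array([0.1, 0.9, 0.3])
truth_vec = np.array([0.2, 0.8, 0.4])
print(cosine_similarity(answer_vec, truth_vec))  # close to 1.0 = very similar

# The evaluation averages this score over every question/answer pair
# in the ground-truth database.
```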
The UI, implemented in Gradio, is presented here.
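As an illustration only (not the project's actual UI code), a minimal Gradio chat app looks like this:

```python
import gradio as gr

def answer(message, history):
    # Placeholder: in the real app this would call the RAG pipeline
    # over the ingested video transcripts / website content.
    return f"You asked: {message}"

demo = gr.ChatInterface(fn=answer, title="URL/Video RAG Chat")
demo.launch(server_name="0.0.0.0", server_port=7860)
```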
The ingestion pipeline is fully automated and part of the core functionality of this project; the related code can be found here and here, implemented in different ways (Gradio app, test 1, test 2, or test 3).
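A rough sketch of what such an ingestion step can look like, assuming the `youtube-transcript-api`, `requests`, and `beautifulsoup4` packages (the project's actual pipeline lives in the linked code):

```python
import requests
from bs4 import BeautifulSoup
from youtube_transcript_api import YouTubeTranscriptApi

def ingest_youtube(video_id: str) -> str:
    """Fetch a YouTube video's transcript as plain text."""
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(seg["text"] for seg in segments)

def ingest_website(url: str) -> str:
    """Download a web page and strip it down to its visible text."""
    html = requests.get(url, timeout=30).text
    return BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks ready for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```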
Not implemented, but user questions and answers could easily be saved to a database.
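If this were added, one minimal way to do it with Python's built-in `sqlite3` (the table name and schema here are hypothetical):

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("feedback.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS qa_log (ts TEXT, question TEXT, answer TEXT)"
)

def log_interaction(question: str, answer: str) -> None:
    """Persist one chat turn so it can be reviewed later."""
    conn.execute(
        "INSERT INTO qa_log VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), question, answer),
    )
    conn.commit()

log_interaction("What is the video about?", "It covers RAG evaluation.")
```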
A Dockerfile was implemented to load the Gradio app with all its requirements. With this Dockerfile, it is possible to run the app consuming a local Ollama service.
When using a local Ollama service, make sure you serve the models:

```bash
ollama serve
```

and pull the required ones:

```bash
ollama pull mxbai-embed-large
ollama pull gemma2
ollama pull phi3.5
```
To run the Dockerfile, follow these steps:

```bash
docker build -t llm-url_video-rag .
docker run --network="host" -p 7860:7860 llm-url_video-rag
```
A fully implemented docker-compose setup was also developed to manage the full app (including its local Ollama service and models).
To run the app with Docker Compose, first start the services:

```bash
docker compose up
```

Then, once the Ollama service is running, pull the required models:

```bash
docker compose exec ollama ollama pull mxbai-embed-large
docker compose exec ollama ollama pull gemma2
docker compose exec ollama ollama pull phi3.5
```
The reproducibility of this project is high, as instructions are available to:
Implemented as mentioned above in the retrieval evaluation (vector search combined with keyword search); the code can be found here and was evaluated as shown before. A conceptual sketch follows below.
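The general idea behind combining the two retrievers can be sketched as a weighted blend of their normalized scores (the exact combination used in the project lives in the linked implementation):

```python
def hybrid_score(vec_score: float, kw_score: float, alpha: float = 0.5) -> float:
    """Blend a normalized vector-similarity score with a normalized
    keyword (e.g. BM25) score; alpha weights the vector side."""
    return alpha * vec_score + (1 - alpha) * kw_score

# Toy example: the same document scored by both retrievers.
print(hybrid_score(vec_score=0.82, kw_score=0.40))  # 0.61
```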
The documents are re-ranked in multiple cases, as described above. The implementation can be found here.
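Conceptually, re-ranking takes the first-pass candidates and re-orders them by a finer-grained relevance score; a minimal sketch with toy data:

```python
def rerank(candidates, score_fn, top_k=5):
    """Re-order first-pass candidates by a finer-grained relevance
    score and keep only the best top_k."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]

# Toy data: documents with a second-stage relevance score attached.
docs = [
    {"id": 1, "rerank_score": 0.2},
    {"id": 2, "rerank_score": 0.9},
    {"id": 3, "rerank_score": 0.5},
]
print(rerank(docs, score_fn=lambda d: d["rerank_score"], top_k=2))
# -> documents 2 and 3, in that order
```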
Query rewriting is implemented here, in order to improve the user input.
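A minimal sketch of query rewriting against a local Ollama model, using the `ollama` Python client (the prompt wording and model choice are assumptions, not the project's exact ones):

```python
import ollama

REWRITE_PROMPT = (
    "Rewrite the following user question so it is clear, specific, and "
    "self-contained. Return only the rewritten question.\n\n"
    "Question: {question}"
)

def rewrite_query(question: str, model: str = "gemma2") -> str:
    """Ask a local Ollama model to clean up the raw user input
    before it is sent to the retriever."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": REWRITE_PROMPT.format(question=question)}],
    )
    return response["message"]["content"].strip()

print(rewrite_query("that video, what it say about docker?"))
```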