aldern00b
- Mar 9
- 7 min read

How Can Python and Multiple APIs Enhance Video Analysis and Transcription with OpenAI's GPT?

Updated: Mar 13

Here's the problem I wanted to solve:

Fishing videos usually have a lot of fluff. I'm watching because I want to know what, when and where fish were caught, but I usually have to sit through 10-20 minute videos and take notes as I watch to get the information I want.

The answer:

OpenAI's API. We're going to use the power of ChatGPT via OpenAI's paid API to dissect the video's transcript.

If you haven't already, check out the primer article here: https://www.aldern00b.com/post/unlock-the-power-of-ai-how-to-create-custom-chat-bots-using-the-chatgpt-api. It will show you the basics of how this all fits together.

TL;DR

from openai import OpenAI

from pytube import YouTube

client = OpenAI(api_key="API_KEY_HERE")

# Library used to fetch the transcript for a given video ID

from youtube_transcript_api import YouTubeTranscriptApi

user_input = input("Enter a YouTube video URL: ")

# Build a YouTube object from the user input so we can read the video title

video_url = user_input

yt = YouTube(video_url)

# Determine the YouTube video ID

last_11_chars = video_url[-11:]

video_id = last_11_chars

# Get the transcript for the video

transcript = YouTubeTranscriptApi.get_transcript(video_id)

# Collect the transcript text into a single variable

fulltranscript=""

for entry in transcript:

    #We're adding each line to the single variable

    fulltranscript += (entry['text']) + " "

messages=[{"role": "system",

            "content": "You are a funny host of an exciting fishing news channel. Your main goal is to keep peoples attention."},

          {"role": "user",

           "content": "This video transcript is from a freshwater fishing video. I need to know what time of year the fish were caught, the lures were being used, how those lures were being presented to the fish and at what depth they were fishing in. Capture the exciting moments and outline what you think those moments were: " + fulltranscript}]

# Create a request to the chat completions endpoint

response = client.chat.completions.create(

  model="gpt-3.5-turbo",

  messages=messages,

)

print("Video: " + yt.title)

print(response.choices[0].message.content)

OK, if you wanna know how I put it together, here's the long way. First off, we're going to need some pip installs done. Here's what I used:

pip install youtube-transcript-api

pip install openai

pip install pytube

Import all those into your Python script like this:

from openai import OpenAI

from pytube import YouTube

from youtube_transcript_api import YouTubeTranscriptApi

The first thing we need is, of course, a YouTube video to pull the information from. We're going to prompt the user for the URL and store it:

user_input = input("Enter a YouTube video URL: ")

video_url = user_input

We can pull all the information we're going to need right from this input. First we'll pass the entire URL to pytube's YouTube class and keep the result in the yt variable so we can get the title and such:

yt = YouTube(video_url)

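We're also going to need the video ID so we can get the transcript. The video ID is 11 characters long and, for a standard watch URL, sits at the very end, so we can slice it right off the end of the input we received. Note that this only works when nothing follows the ID; a URL with extra parameters, such as a timestamp, would give the wrong result.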

last_11_chars = video_url[-11:]

video_id = last_11_chars
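If you want something a little more forgiving than slicing the last 11 characters, here's a rough sketch that parses the URL instead. The function name extract_video_id and the list of URL shapes it handles are my own assumptions, not part of the original script, and the ID in the example is a made-up placeholder.

```python
from urllib.parse import urlparse, parse_qs

# Hosts we treat as YouTube watch pages (an assumption; extend as needed)
YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url):
    """Return the 11-character video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # Standard links: https://www.youtube.com/watch?v=<id>&t=42s
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif host in YOUTUBE_HOSTS and parsed.path.startswith(("/shorts/", "/embed/")):
        # Shorts and embeds: /shorts/<id> or /embed/<id>
        candidate = parsed.path.split("/")[2]
    else:
        candidate = ""

    # Anything that is not exactly 11 characters is treated as "not found"
    return candidate if len(candidate) == 11 else None


# "abcdefghijk" is a placeholder ID, not a real video
print(extract_video_id("https://www.youtube.com/watch?v=abcdefghijk&t=42s"))
```

With this in place you could write video_id = extract_video_id(video_url) and bail out with a friendly message when it returns None.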

Next, we'll get the transcript for the video using that ID.

transcript = YouTubeTranscriptApi.get_transcript(video_id)

The transcript comes back as a list of entries carrying data we don't want right now, like timestamps. To strip that out, we start with an empty string that we can append to as we iterate through the lines of the transcript.

fulltranscript=""

Let's dive into the for loop as we go through each line of the transcript.

for entry in transcript:

As we make our way through we're going to only capture the text fields for the transcript and concatenate them to the variable we've created. You'll notice I've added a space on the end. This is because if we don't, the output will have the ending of the line run into the beginning of the next line - mashing the words up.

    fulltranscript += (entry['text']) + " "
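For what it's worth, the same loop can be written as a one-line join. This is only a sketch: join_transcript is a name I made up, and the sample entries below are invented to mimic the text, start and duration fields the transcript entries carry.

```python
def join_transcript(entries):
    """Join the 'text' field of each transcript entry into one string."""
    # Joining with a single space keeps words from running together
    return " ".join(entry["text"].strip() for entry in entries)


# Made-up sample entries shaped like the transcript data
sample = [
    {"text": "we're fishing about", "start": 0.0, "duration": 2.1},
    {"text": "five feet down", "start": 2.1, "duration": 1.8},
]

print(join_transcript(sample))  # prints: we're fishing about five feet down
```

Unlike the += version, this doesn't leave a trailing space on the end, though the model won't care either way.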

Now that we have a full transcript, it's time to start the AI business. We're going to be using the chat completions endpoint to do this. We'll outline what the system should be like and then ask our question from the user perspective. This video dissection is about a fishing video and I wanted to know more about how they were catching fish. You'll notice that at the end of my question I'm appending the full transcript to feed it into the AI.

messages=[{"role": "system",

            "content": "You are a funny host of an exciting fishing news channel. Your main goal is to keep peoples attention."},

          {"role": "user",

           "content": "This video transcript is from a freshwater fishing video. I need to know what time of year the fish were caught, the lures were being used, how those lures were being presented to the fish and at what depth they were fishing in. Capture the exciting moments and outline what you think those moments were: " + fulltranscript}]
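If you plan to reuse this for other kinds of videos, it can help to keep the prompts separate from the transcript. Here's a minimal sketch of that idea. The build_messages helper and the two constant names are hypothetical, and the prompt text is shortened here for readability.

```python
# Shortened versions of the prompts used above (swap in your own wording)
SYSTEM_PROMPT = "You are a funny host of an exciting fishing news channel."
QUESTION = (
    "This video transcript is from a freshwater fishing video. "
    "Tell me the time of year, the lures, the presentation and the depth: "
)


def build_messages(transcript, system_prompt=SYSTEM_PROMPT, question=QUESTION):
    """Return the system and user messages with the transcript appended to the question."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question + transcript},
    ]


messages = build_messages("today we're throwing beads for steelhead")
print(messages[1]["content"])
```

Changing the topic then only means passing a different system_prompt and question, without touching the rest of the script.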

We'll now create the client with our API key, send the request and print out the OpenAI response.

client = OpenAI(api_key="API_KEY_HERE")

response = client.chat.completions.create(

  model="gpt-3.5-turbo",

  messages=messages,

)

print("Video: " + yt.title)

print(response.choices[0].message.content)
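Since every real request costs a little money, it can be handy to wrap the call in a function and try the plumbing with a stand-in client first. The sketch below assumes that approach. ask_model and make_fake_client are names I invented, and the fake client only imitates the chat.completions.create call and the choices[0].message.content path used above.

```python
from types import SimpleNamespace


def ask_model(client, messages, model="gpt-3.5-turbo"):
    """Send messages to the chat completions endpoint and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


def make_fake_client(reply):
    """Hypothetical stand-in that mimics the parts of the OpenAI client used above."""
    def create(model, messages):
        message = SimpleNamespace(content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))


# Dry run: no network call and no API charge
fake = make_fake_client("Steelhead love beads!")
print(ask_model(fake, [{"role": "user", "content": "test"}]))
```

Once that works, passing the real client object to ask_model in place of the fake one should behave the same way, just with an actual answer.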

For the video I dissected, here's the response the model gave back:

Video: TOP 3 Most PROVEN EFFECTIVE Ways To Catch STEELHEAD

The fishing video seems to have been recorded during the steelhead season, which typically occurs in the fall and winter months. The angler was using three different setups - bead, jig, and spinner - to target steelhead in a river. Let's break down the information based on the different setups:

1. Bead Setup: The angler was using a bead setup, specifically a glass bead in line with a size two addicted bead hook. The presentation was natural and uninvasive, allowing the bead to float down the river in front of the fish. The angler was fishing at a depth of about 5 feet and was focusing on line management to ensure a proper presentation. The angler was using a 10.5-foot Okuma guide select rod with 12- to 6-pound test line.

2. Jig Setup: The angler switched to a jig setup in the fast water conditions. They were using a quarter-ounce jig with a chunk of prawn as bait. The rod used was a 10.5-foot Okuma X Rod with a 4000 series reel and a 65-pound test line. The angler emphasized the importance of keeping the jig off the bottom and presenting it properly in the fast water.

3. Spinner Setup: In the last setup, the angler used a spinner in fast water conditions targeting steelhead. The spinner had a black body and a sidewash hook. The angler was making close, middle, and far casts, focusing on a proper presentation around rocks and structure. The rod used was a 9.3-foot Okuma X Rod with a 8- to 15-pound rating.

Exciting moments in the video included the angler hooking a fish on the bead setup, almost landing a fish on the jig, and successfully catching fish on the spinner setup in fast water. There was also a moment of reflection on a missed opportunity due to a hook set mistake, emphasizing the importance of technique in landing fish.

Overall, the video provided valuable insights into fishing for steelhead using different setups and techniques while showcasing the thrill of catching these elusive fish in challenging river conditions.

Let's take it up a notch. I've modified this so it processes an entire playlist of similar videos for bulk research:

###################################################################

# Application that takes a YouTube transcript and summarizes it.

# For this we're using it to learn quickly from a fishing video

# Here's what we're after:

# - What time of year this was done

# - What lures were being used

# - What areas were being fished and how they're being presented

# - What depth they're fishing

###################################################################

# Before we begin we need a Python API to get the transcript

# pip install youtube-transcript-api

# Make sure you also install the openai API

# pip install openai

# for video data we also need to install pytube

# pip install pytube

# For pretty text we use colorama

# pip install colorama

from openai import OpenAI

from pytube import YouTube

from pytube import Playlist

import colorama

colorama.init()

client = OpenAI(api_key="API-KEY-HERE")

# Get the transcript.

from youtube_transcript_api import YouTubeTranscriptApi

print(f"""{colorama.Fore.GREEN}

    ############################################################################

    # Please note, this is for use with PLAYLISTS. It's recommended that you   #

    # create a playlist of like videos so it can batch process all the data    #

    # in all the videos. Doing this gives you the best chance of good info.    #

    #                                                                          #

    # - Playlists MUST not be private (Public or Unlisted)                     #

    # - Make sure you're copying the Playlist URL, not a video URL             #

    #                                                                          #

    # Created by Jason DeValadares (Orillia Fishing)                           #

    # Last updated: 3/13/2024                                                  #

    ############################################################################

{colorama.Style.RESET_ALL}""")

user_input = input("Enter a YouTube PLAYLIST URL: ")

#Assign the playlist URL to a new variable

playlist = Playlist(user_input)

# Count how many videos are in the playlist

print(f"Dissecting {len(playlist.video_urls)} videos...")

# Set up our variables for use

fulltranscript=""

ytTitles = []

# Loop through each video URL in the playlist

for video_url in playlist:

    # Create a YouTube object for the current video URL

    yt = YouTube(video_url)

    # Get the title of the video and save it to the ytTitles array list

    ytTitles.append(yt.title)

    # Determine the YouTube video ID by grabbing the last 11 chars of the URL

    last_11_chars = video_url[-11:]

    video_id = last_11_chars

    #print(f"Video ID: {video_id}")  ## USE THIS FOR TROUBLESHOOTING

    # Get the transcript for the video; prefer English ('en') and fall back to British English ('en-GB')

    transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en', 'en-GB'])

    #print(transcript)  ## USE THIS FOR TROUBLESHOOTING

    # Append this video's transcript text to the single variable

    for entry in transcript:

        #We're adding each line to the single variable making sure to only capture the text portions

        fulltranscript += (entry['text']) + " "

# Create the AI roles and outline what we're looking for. Add the concatenated transcripts to the user role

messages=[

    {"role": "system", "content": "You are a funny host of an exciting fishing news channel. Your main goal is to keep peoples attention and share the things you learn."},

    {"role": "user", "content": "These are video transcripts from multiple freshwater fishing videos. Examine all of them and determine what season fish were caught, the most common lures that were being used - including their weights, how those lures were being presented to the fish and at what depth and structure they were fishing in. Outline everything found after examination and create a shopping list of the most common equipment mentioned in the transcripts. Here are the transcripts: " + fulltranscript}

]

#print(messages)  ## USE THIS FOR TROUBLESHOOTING

# Create a request to the CHAT completions endpoint and have it use the messages variable to create content about our video transcripts

response = client.chat.completions.create(

  model="gpt-3.5-turbo",

  messages=messages,

)

# List out all the video titles in the playlist

print("\n Videos Examined: ")

for titles in ytTitles:

    print(titles)

# Print out the AI response content.

print("\n")

print(response.choices[0].message.content)
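Two things can trip up the playlist version: a single video without captions raises an error and stops the whole run, and a long playlist can produce more text than the model's context window accepts. Here's a rough sketch of one way to soften both. Everything in it is an assumption on my part: the helper names, the idea of passing the fetch function in as a parameter, and the 12000-character default, which is an arbitrary number rather than a real model limit.

```python
def collect_transcripts(video_ids, fetch):
    """Fetch each transcript with `fetch`, skipping videos that fail."""
    texts, skipped = [], []
    for video_id in video_ids:
        try:
            entries = fetch(video_id)
        except Exception as error:  # broad on purpose; narrow this in real use
            skipped.append((video_id, str(error)))
            continue
        texts.append(" ".join(entry["text"] for entry in entries))
    return texts, skipped


def split_into_chunks(text, max_chars=12000):
    """Split text on word boundaries into pieces of at most max_chars characters.

    A single word longer than max_chars becomes its own oversized chunk.
    """
    chunks, current, length = [], [], 0
    for word in text.split():
        # The +1 accounts for the space joining this word to the previous one
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
            extra = len(word)
        current.append(word)
        length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def fake_fetch(video_id):
    """Made-up fetcher for a dry run; 'bad' simulates a video without captions."""
    if video_id == "bad":
        raise ValueError("no transcript available")
    return [{"text": "jig off"}, {"text": "the bottom"}]


texts, skipped = collect_transcripts(["good", "bad"], fake_fetch)
print(texts)
print(skipped)
print(split_into_chunks("aa bb cc dd", max_chars=5))
```

In the real script you would pass in a function that wraps the transcript call used in the loop above, then send each chunk to the model separately and combine the answers afterwards.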
