
Deep Learning

YOLO (version 8) video detection

We will use YOLO (You Only Look Once), a very popular object detector trained using deep learning. Read this article to learn what YOLOv8 is.

As our first step, we install the `ultralytics` package and import the libraries we will use.

In [ ]:
!pip install ultralytics
In [ ]:
import cv2
from ultralytics import YOLO
from ultralytics.utils.plotting import Annotator, colors
import os

In this exercise, we use a pre-trained model. Ultralytics trained this model on Microsoft's COCO 2017 dataset. It detects 80 classes of objects, from "apple" to "wine glass", including "person", "bicycle", and "car". Ultralytics makes several sizes of the object detection model available (and has other models for tasks beyond object detection). We will use the smallest (and fastest) of these, the "nano" version, yolov8n.

In [ ]:
model = YOLO("yolov8n.pt")
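
To see exactly which classes the model can detect, you can inspect `model.names`, a dictionary in the ultralytics API that maps each class index to its name:

In [ ]:
# `model.names` maps class index -> class name for the 80 COCO classes.
print(len(model.names))  # 80
print(model.names[0])    # 'person'
print(sorted(model.names.values())[:5])  # first few names, alphabetically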

Q1

Upload a video which you would like to run the YOLO object detector on. If you want to use a video of ours, you can try this video: https://strawlab-cdn.com/assets/samples-for-object-detection/BiologieI-bike.mp4 . Upload the video to the folder where this notebook lives by dragging and dropping it into the file browser in JupyterLab.

Put the name of your video in the variable movie_filename.
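
For example, if you use the sample video linked above, this would be `movie_filename = "BiologieI-bike.mp4"` (the name of the file you uploaded).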

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
assert os.path.exists(movie_filename), 'No file with the name in `movie_filename`'
assert os.path.getsize(movie_filename) >= 10, 'Not a video file: fewer than 10 bytes'

Now we run YOLO and create an output video where each object detected by YOLO is drawn with a bounding box.

In [ ]:
output_movie_filename = os.path.splitext(movie_filename)[0] + "-output.mp4"

print(f'"{movie_filename}" -> "{output_movie_filename}"')

# Open the input video and read its width, height, and frame rate.
cap = cv2.VideoCapture(movie_filename)
w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))

# Create an output video with the same dimensions and frame rate.
out = cv2.VideoWriter(output_movie_filename, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))

# Process at most the first 100 frames to keep the runtime short.
for _ in range(100):
    ret, im0 = cap.read()
    if not ret:
        print("Video frame is empty or video processing has been successfully completed.")
        break

    # Run the detector on this frame and draw a labeled bounding box
    # for each detected object.
    results = model.predict(im0, verbose=False)
    annotator = Annotator(im0, line_width=2)

    for box in results[0].boxes:
        cls = int(box.cls)
        annotator.box_label(box.xyxy[0], model.names[cls], color=colors(cls))
    out.write(im0)

out.release()
cap.release()
print("done")

In practice, one can improve the performance of a pre-trained model by fine-tuning it for a particular task using labelled data from that task. To limit the computational resources (and time) required, we did not perform any training in this exercise. Even a modest amount of fine-tuning on task-specific data can noticeably improve detection performance.
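
For reference only (we do not run this here), fine-tuning with ultralytics looks roughly like the sketch below. `my_dataset.yaml` is a hypothetical YOLO-format dataset description file listing your labelled training and validation images:

In [ ]:
# A sketch of fine-tuning (not run in this exercise).
# `my_dataset.yaml` is a hypothetical YOLO-format dataset description
# pointing at your labelled training and validation images.
model = YOLO("yolov8n.pt")  # start from the pre-trained weights
model.train(data="my_dataset.yaml", epochs=10, imgsz=640)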

Q2

Now describe some aspects of the object detection that worked well, but also describe some aspects which did not work well. (Two or three sentences.)

YOUR ANSWER HERE

LLMs

For the next questions, sign up on ILIAS for "AI Chat" as shown in the video during the lecture. Answer the questions to enable "Pilotierung: AI Chat für das Studium".

Now, go to "AI Chat (Modell: llama3.3:70b, Betrieb: RZ)" as shown in the video during the lecture.

Q3

Using the AI Chat, ask the model to explain something which you have learned about in one of your biology courses. Copy and paste your prompt and the model's answer into the next cell:

YOUR ANSWER HERE

Q4

Now, comment on the quality of the response from the AI Chat. (Two or three sentences.)

YOUR ANSWER HERE
