When working with OpenCV, the primary focus is often on images. However, creating animations from multiple images can provide a different perspective or help visualize your work more effectively by introducing a time dimension.
In this tutorial, you will learn how to create a video clip using OpenCV, accompanied by basic image manipulation techniques to generate the individual frames. Specifically, you will learn:
- How to manipulate images as NumPy arrays.
- How to use OpenCV functions for image manipulation.
- How to create a video file using OpenCV.
Let’s get started!
Overview
This tutorial is divided into two main sections:
- Creating the Ken Burns Effect
- Writing the Video File
Creating the Ken Burns Effect
You’ll create multiple images to visualize progress in your machine learning project or to demonstrate how a computer vision technique manipulates an image. The first task is to implement the Ken Burns effect, a technique named after documentary filmmaker Ken Burns that animates a still photograph by panning and zooming:
“Instead of presenting a large, static image, the Ken Burns effect crops to show a detail and then pans across the image.”
To illustrate this, we’ll start with a photo of a hooded mountain tanager that you can download from Wikipedia. The image has a resolution of 4563×3042 pixels. Here’s how to load and display it with OpenCV:
import cv2
imgfile = "Hooded_mountain_tanager_(Buthraupis_montana_cucullata)_Caldas.jpg"
img = cv2.imread(imgfile, cv2.IMREAD_COLOR)
cv2.imshow("Bird", img)
cv2.waitKey(0)
The image read into OpenCV, img, is represented as a NumPy array of shape (3042, 4563, 3): a color image with pixels in BGR channel order and an 8-bit unsigned integer (uint8) data type.
The Ken Burns effect involves zooming and panning across the image. Each frame of the video is generated by cropping the original image (and then resizing to fill the screen). The cropping of the image using NumPy array slicing is straightforward:
cropped = img[y0:y1, x0:x1]
The image is a three-dimensional NumPy array where the first two dimensions represent height and width. Therefore, you can slice the array to extract pixels from the specified coordinates.
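As a quick sanity check of the slicing order (pure NumPy, no OpenCV required, using a synthetic array in place of the photo):

```python
import numpy as np

# Synthetic stand-in for a loaded BGR image: 300 rows (height) x 400 columns (width)
img = np.zeros((300, 400, 3), dtype=np.uint8)

# Crop a window with top-left corner (x=50, y=20) and bottom-right corner (x=200, y=120)
x0, y0, x1, y1 = 50, 20, 200, 120
cropped = img[y0:y1, x0:x1]

print(cropped.shape)  # (100, 150, 3): rows (height) come first, then columns (width)
```

Keeping the y-then-x ordering straight here avoids a common source of transposed crops.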
To maintain a constant video frame size and avoid distortion, each cropped image must be resized to a fixed aspect ratio. OpenCV’s resize function makes it easy to adjust the images:
resized = cv2.resize(cropped, dsize=target_dim, interpolation=cv2.INTER_LINEAR)
This function takes an image and a target dimension tuple (width, height), returning a new NumPy array. You can specify the resizing algorithm; using linear interpolation usually yields satisfactory results.
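Note the ordering pitfall here: a NumPy image shape is (height, width, channels), but dsize is (width, height). A minimal sketch of converting between the two, assuming a hypothetical quarter-size target:

```python
import numpy as np

# Same shape as the tanager photo: (height, width, channels)
img = np.zeros((3042, 4563, 3), dtype=np.uint8)

# dsize for cv2.resize uses the opposite order: (width, height)
target_dim = (img.shape[1] // 4, img.shape[0] // 4)
print(target_dim)  # (1140, 760)
```

Passing the shape tuple directly as dsize would swap the axes and distort the result.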
To implement the Ken Burns effect, you’ll follow these steps:
- Use a high-resolution image, defining the pan by specifying starting and ending focal points.
- Set the zoom levels for the animation.
- Determine the video duration and the frame rate (FPS). Multiply these values to calculate the total number of frames.
- For each frame, compute the crop coordinates and resize the cropped image to fit the target resolution.
- Write all frames to a video file.
Here are the defined constants for creating a two-second 720p video (1280×720) at 25 FPS, with specified pan and zoom settings:
imgfile = "Hooded_mountain_tanager_(Buthraupis_montana_cucullata)_Caldas.jpg"
video_dim = (1280, 720)
fps = 25
duration = 2.0
start_center = (0.4, 0.6)
end_center = (0.5, 0.5)
start_scale = 0.7
end_scale = 1.0
Since you will crop the image for every frame, define a helper function:
def crop(img, x, y, w, h):
    x0, y0 = max(0, x - w // 2), max(0, y - h // 2)
    x1, y1 = x0 + w, y0 + h
    return img[y0:y1, x0:x1]
This function takes the image, the crop center coordinates, and the crop width and height. The max() calls keep the window from extending past the top and left edges; if the window runs past the bottom or right edge, NumPy slicing silently truncates it.
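A quick sanity check of this helper near an image border (the function is repeated here so the snippet is self-contained):

```python
import numpy as np

def crop(img, x, y, w, h):
    x0, y0 = max(0, x - w // 2), max(0, y - h // 2)
    x1, y1 = x0 + w, y0 + h
    return img[y0:y1, x0:x1]

img = np.zeros((100, 200, 3), dtype=np.uint8)

# Center well inside the image: the full 40x60 window is returned
print(crop(img, 100, 50, 60, 40).shape)  # (40, 60, 3)

# Center near the top-left corner: the window is anchored at (0, 0) instead
print(crop(img, 5, 5, 60, 40).shape)  # still (40, 60, 3)
```

Only the top and left edges are guarded; a center too close to the bottom-right corner yields a smaller crop, so keep the pan endpoints within range.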
You’ll determine the zoom and pan parameters at each point of the timeline by linear interpolation. With a parameter alpha running from 0 to 1 over the clip, the pan center blends from the start position to the end position:
rx = end_center[0] * alpha + start_center[0] * (1 - alpha)
ry = end_center[1] * alpha + start_center[1] * (1 - alpha)
The scale parameter is defined the same way, blending the starting and ending zoom levels:
scale = end_scale * alpha + start_scale * (1 - alpha)
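To see the interpolation in action, here is a small trace using the constants defined earlier (five sample values of alpha rather than all 50 frames):

```python
import numpy as np

start_center = (0.4, 0.6)
end_center = (0.5, 0.5)
start_scale, end_scale = 0.7, 1.0

# alpha = 0 reproduces the start values; alpha = 1 reproduces the end values
for alpha in np.linspace(0, 1, 5):
    rx = end_center[0] * alpha + start_center[0] * (1 - alpha)
    ry = end_center[1] * alpha + start_center[1] * (1 - alpha)
    scale = end_scale * alpha + start_scale * (1 - alpha)
    print(f"alpha={alpha:.2f}  center=({rx:.3f}, {ry:.3f})  scale={scale:.3f}")
```

The motion is linear in alpha; an ease-in/ease-out curve could be substituted by remapping alpha before the blend.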
With the necessary transformations and settings established, the next block of code allows you to generate each frame based on the defined parameters, creating a series of frames to compile into a video.
import cv2
import numpy as np

img = cv2.imread(imgfile, cv2.IMREAD_COLOR)
orig_shape = img.shape[:2]  # (height, width) of the source image

num_frames = int(fps * duration)
frames = []
for alpha in np.linspace(0, 1, num_frames):
    # Interpolate the pan center and zoom level for this frame
    rx = end_center[0] * alpha + start_center[0] * (1 - alpha)
    ry = end_center[1] * alpha + start_center[1] * (1 - alpha)
    x = int(orig_shape[1] * rx)
    y = int(orig_shape[0] * ry)
    scale = end_scale * alpha + start_scale * (1 - alpha)

    # Choose a crop size that matches the video aspect ratio
    if orig_shape[1] / orig_shape[0] > video_dim[0] / video_dim[1]:
        h = int(orig_shape[0] * scale)
        w = int(h * video_dim[0] / video_dim[1])
    else:
        w = int(orig_shape[1] * scale)
        h = int(w * video_dim[1] / video_dim[0])

    # Crop, then resize to the target frame size
    cropped = crop(img, x, y, w, h)
    scaled = cv2.resize(cropped, dsize=video_dim, interpolation=cv2.INTER_LINEAR)
    frames.append(scaled)

# Write to MP4 file
vidwriter = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, video_dim)
for frame in frames:
    vidwriter.write(frame)
vidwriter.release()
The VideoWriter object at the end compiles the individual frames into a continuous video stream.
Writing the Video File
In the previous section, you initialized the VideoWriter object as follows:
vidwriter = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, video_dim)
Unlike image file formats (like JPEG or PNG), the video format must be explicitly specified with a four-character code (FourCC). Lists of FourCC codes are available online, such as at fourcc.org, but not every code is usable on every platform.
To check whether a specific FourCC code is usable, you can run a quick test:
try:
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter("temp.mkv", fourcc, 30, (640, 480))
    assert writer.isOpened()
    print("Supported")
    writer.release()
except (cv2.error, AssertionError):
    print("Not supported")
Conclusion
In this tutorial, you learned how to create a video using OpenCV by compiling a series of images with the Ken Burns effect. You explored techniques for:
- Cropping images using NumPy array slicing.
- Resizing images with OpenCV functions.
- Calculating zoom and pan parameters via linear interpolation.
This method provides a creative way to visualize data and results effectively.
Further Reading
For additional insights and resources, consider the following:
Books:
- Mastering OpenCV 4 with Python, 2019