Fast Window Capture - OpenCV Object Detection in Games #4
Learn how to capture window data in real-time as a video stream for processing with OpenCV. We try several different methods searching for the fastest way possible. In this tutorial I also discuss the importance of good Google search skills as a programmer, and we revisit some basic object-oriented programming concepts.
Full tutorial playlist:
Grab the code on GitHub:
Research discussed:
OpenCV getting started with videos:
Fastest way to take a screenshot with Python on Windows:
Convert PyAutoGUI image to OpenCV:
Convert SaveBitmapFile to an OpenCV image instead:
Best Numpy reference:
List all your windows:
0:47 Main capture loop
3:16 Using PyAutoGUI screenshot()
6:24 Measure FPS
8:27 Using Pillow ImageGrab
9:31 Using Pywin32 for screenshots
13:48 Converting the screenshot for OpenCV
17:23 Confining screenshots to a specific window
18:42 Creating a WindowCapture class
25:05 Trimming the window capture
28:15 Image to screen position conversion
29:35 Wrap up
Read the full written tutorial with code samples here:
Up to this point, we’ve been using OpenCV to detect objects in static images. Now we’re ready to apply those same techniques to video games in real time.
Remember that video is just a series of still images shown in rapid succession. In this tutorial our goal is to capture screenshots as fast as possible and display them in an OpenCV window, so that we get a real-time video stream of the game we’re interested in. This will set us up to begin processing image data with OpenCV in real time.
OpenCV has a tutorial on “Getting Started with Videos” that will serve as the basis for our code. Our starting point differs from the official tutorial only in that we are preparing to work with screenshot data instead of frames from a camera.
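As a rough sketch of that starting point, the loop below shows each frame in an OpenCV window and prints the frame rate until you press q. It assumes get_screenshot() is a function you supply that returns a BGR NumPy array; measure_fps() and run_capture_loop() are names I made up for this example.

```python
from time import perf_counter


def measure_fps(previous_time, current_time):
    """Return the frames per second implied by the time between two frames."""
    elapsed = current_time - previous_time
    return 1.0 / elapsed if elapsed > 0 else float("inf")


def run_capture_loop(get_screenshot, window_title="Computer Vision"):
    """Display frames from get_screenshot() until the 'q' key is pressed."""
    # OpenCV is imported here so measure_fps() can be reused without it.
    import cv2 as cv

    loop_time = perf_counter()
    while True:
        frame = get_screenshot()  # assumed to return a BGR NumPy array
        cv.imshow(window_title, frame)

        now = perf_counter()
        print(f"FPS {measure_fps(loop_time, now):.1f}")
        loop_time = now

        # waitKey(1) gives OpenCV 1 ms to process window events.
        if cv.waitKey(1) & 0xFF == ord("q"):
            cv.destroyAllWindows()
            break
```

Printing the FPS on every pass makes it easy to compare the different screenshot methods against each other.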
When defining get_screenshot() you could simply use pyautogui.screenshot() from the PyAutoGUI library, or ImageGrab.grab() from the Python Imaging Library (Pillow).
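A minimal sketch of the library-based approach might look like this. It assumes PyAutoGUI and NumPy are installed and that the screenshot comes back as an RGB image; rgb_to_bgr() is a helper name invented for this example. OpenCV expects BGR channel order, so the channels get reversed.

```python
import numpy as np


def rgb_to_bgr(image):
    """Convert an RGB image (PIL image or array) to OpenCV's BGR channel order."""
    frame = np.array(image)  # a PIL image becomes a (height, width, 3) array
    # Reverse the channel axis and copy so the result is contiguous in memory.
    return frame[:, :, ::-1].copy()


def get_screenshot():
    """Grab the full screen with PyAutoGUI and return it as a BGR array."""
    # Imported at call time so rgb_to_bgr() works without PyAutoGUI installed.
    import pyautogui

    return rgb_to_bgr(pyautogui.screenshot())
```

Swapping in Pillow should only mean replacing the capture call with ImageGrab.grab() (from PIL import ImageGrab), since it also returns a PIL image.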
And this would work, but there are several benefits to calling the Windows API directly instead. Firstly, by dealing directly with the operating system, we approach the theoretical limit for how fast we can take these screenshots. Secondly, the Windows API has methods that will allow us to grab the screen data for only the window we’re interested in, even when it’s hidden behind other windows or off screen.
To do this, we must first pip install pywin32 to get access to the Win32 API in Python.
Let’s start with some code to capture a single screenshot of our entire desktop and save that to a file. This will confirm for us that the Windows API calls are working. By calling this function, you should end up with a screenshot file.
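Here is a sketch of what that function could look like with pywin32. The function name, the default file name debug.bmp and the 1920 by 1080 size are assumptions for this example, so set the width and height to match your own monitor.

```python
def window_capture(path="debug.bmp", w=1920, h=1080):
    """Save a w-by-h screenshot of the desktop to a bitmap file."""
    # pywin32 modules are Windows only, so they are imported at call time.
    import win32con
    import win32gui
    import win32ui

    hwnd = None  # no window handle: use the device context of the whole screen

    wDC = win32gui.GetWindowDC(hwnd)
    dcObj = win32ui.CreateDCFromHandle(wDC)
    cDC = dcObj.CreateCompatibleDC()
    bitmap = win32ui.CreateBitmap()
    bitmap.CreateCompatibleBitmap(dcObj, w, h)
    cDC.SelectObject(bitmap)
    # Copy the pixels from the screen into our in-memory bitmap.
    cDC.BitBlt((0, 0), (w, h), dcObj, (0, 0), win32con.SRCCOPY)

    # Write the bitmap to disk so we can check the result by eye.
    bitmap.SaveBitmapFile(cDC, path)

    # Free the GDI resources we allocated.
    dcObj.DeleteDC()
    cDC.DeleteDC()
    win32gui.ReleaseDC(hwnd, wDC)
    win32gui.DeleteObject(bitmap.GetHandle())
    return path
```

Call window_capture() once on a Windows machine and open the resulting file to confirm the capture worked.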
The next step is to modify this function so that, rather than saving an image file, it returns the image data formatted to work with OpenCV.
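The conversion step can be sketched on its own. In a pywin32 capture function you can read the raw pixels with the bitmap’s GetBitmapBits(True) instead of saving a file, then reshape them with NumPy. This sketch assumes the bitmap is 32 bits per pixel in BGRA order; bitmap_bits_to_bgr() is a helper name made up for the example.

```python
import numpy as np


def bitmap_bits_to_bgr(bits, width, height):
    """Turn raw 32-bit BGRA bitmap bytes into a (height, width, 3) BGR array."""
    # Assumes 4 bytes per pixel; frombuffer reads the bytes without copying.
    img = np.frombuffer(bits, dtype=np.uint8).reshape(height, width, 4)
    # Drop the alpha channel. copy() yields a contiguous, writable array,
    # which OpenCV functions generally expect.
    return img[..., :3].copy()
```

Inside the capture function, the call would be something like bitmap_bits_to_bgr(bitmap.GetBitmapBits(True), w, h), with the result returned after the GDI resources are freed.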
Now we can call this function from our original infinite loop and get a real-time stream of our desktop.
To improve upon this, we can use win32gui.FindWindow(None, window_name) to capture just the window we’re interested in. Replace window_name with a string that contains the name found in the title bar of the window you want to capture. Doing so will allow you to capture the frames from that window even when it’s hidden behind other windows.
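A small sketch of the window lookup is below. find_window() is a hypothetical wrapper, and its gui parameter exists only so the function can be tried with a stand-in object. By default it uses pywin32’s win32gui, where I expect FindWindow to return 0 when no window title matches.

```python
def find_window(window_name, gui=None):
    """Return the handle of the window whose title bar text is window_name."""
    if gui is None:
        import win32gui as gui  # Windows only; requires pywin32

    # The first argument is the window class; None matches any class.
    hwnd = gui.FindWindow(None, window_name)
    if not hwnd:
        raise LookupError(f"Window not found: {window_name}")
    return hwnd
```

The returned handle then takes the place of None in calls such as win32gui.GetWindowDC(hwnd) and win32gui.ReleaseDC(hwnd, wDC).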
If you’re having trouble figuring out the name of the window you want, you can use this code to list the names of all your existing windows:
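One way to write that listing is sketched here. It returns the handle and title of every visible window so you can print them. As in a typical pywin32 script it relies on win32gui.EnumWindows; the optional gui parameter is my own addition so the function can be exercised with a stand-in object.

```python
def list_window_names(gui=None):
    """Return (handle, title) pairs for every visible top-level window."""
    if gui is None:
        import win32gui as gui  # Windows only; requires pywin32

    names = []

    def collect(hwnd, _ctx):
        # Skip hidden windows; they are rarely the one you are looking for.
        if gui.IsWindowVisible(hwnd):
            names.append((hex(hwnd), gui.GetWindowText(hwnd)))
        return True  # keep enumerating

    gui.EnumWindows(collect, None)
    return names
```

On Windows, something like `for handle, title in list_window_names(): print(handle, title)` prints the list, and the title is the string to pass as the window name.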
We can improve our code further by trimming off the excess around the window we’re interested in. When you run the above code, you will notice black space to the right and below the window image, as well as the window borders and title bar. Removing these will not only clean things up, it will also improve our frame rate. We can also get improvements by not calling win32gui.FindWindow() on every call to get_screenshot(), so let’s turn this into a class.
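The class could be sketched along these lines. The border and title bar sizes are assumed values that depend on your Windows version and theme, and trimmed_geometry() is a helper name invented here to keep the arithmetic separate from the Windows API calls. The capture also assumes a 32-bit bitmap.

```python
# Assumed window chrome sizes in pixels; adjust them for your own setup.
BORDER_PIXELS = 8
TITLEBAR_PIXELS = 30


def trimmed_geometry(window_rect, border=BORDER_PIXELS, titlebar=TITLEBAR_PIXELS):
    """Return (w, h, cropped_x, cropped_y, offset_x, offset_y).

    window_rect is (left, top, right, bottom) in screen coordinates.
    """
    left, top, right, bottom = window_rect
    w = (right - left) - border * 2         # drop the left and right borders
    h = (bottom - top) - titlebar - border  # drop title bar and bottom border
    return w, h, border, titlebar, left + border, top + titlebar


class WindowCapture:
    def __init__(self, window_name):
        import win32gui  # Windows only; requires pywin32

        # Look the window up once instead of on every screenshot.
        self.hwnd = win32gui.FindWindow(None, window_name)
        if not self.hwnd:
            raise LookupError(f"Window not found: {window_name}")
        rect = win32gui.GetWindowRect(self.hwnd)
        (self.w, self.h, self.cropped_x, self.cropped_y,
         self.offset_x, self.offset_y) = trimmed_geometry(rect)

    def get_screenshot(self):
        import numpy as np
        import win32con
        import win32gui
        import win32ui

        wDC = win32gui.GetWindowDC(self.hwnd)
        dcObj = win32ui.CreateDCFromHandle(wDC)
        cDC = dcObj.CreateCompatibleDC()
        bitmap = win32ui.CreateBitmap()
        bitmap.CreateCompatibleBitmap(dcObj, self.w, self.h)
        cDC.SelectObject(bitmap)
        # Start copying at (cropped_x, cropped_y) to skip border and title bar.
        cDC.BitBlt((0, 0), (self.w, self.h), dcObj,
                   (self.cropped_x, self.cropped_y), win32con.SRCCOPY)
        bits = bitmap.GetBitmapBits(True)
        img = np.frombuffer(bits, dtype=np.uint8).reshape(self.h, self.w, 4)

        # Free the GDI resources we allocated.
        dcObj.DeleteDC()
        cDC.DeleteDC()
        win32gui.ReleaseDC(self.hwnd, wDC)
        win32gui.DeleteObject(bitmap.GetHandle())

        # Drop the alpha channel and copy into a contiguous, writable array.
        return img[..., :3].copy()
```

Because the bitmap is now only as large as the trimmed window, each frame copies fewer pixels, which is where the frame rate gain comes from.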
Finally, we’ll need a way to convert positions we detect in our screenshots back to pixel positions on our actual monitor. In the WindowCapture class constructor, I’ve already included code to calculate the window offset using the window position data from win32gui.GetWindowRect(). Let’s add a method to our class that uses this offset to return that converted screen position.
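The conversion itself is just an addition, sketched below. WindowOffset is a stand-in class invented for this example; it holds only the two offset values, which in the real class would be the offset_x and offset_y computed in the constructor.

```python
class WindowOffset:
    """Stand-in for the part of WindowCapture that stores the window offset."""

    def __init__(self, offset_x, offset_y):
        # Screen coordinates of the top-left pixel of the captured image.
        self.offset_x = offset_x
        self.offset_y = offset_y

    def get_screen_position(self, pos):
        """Translate an (x, y) position in the screenshot to screen coordinates."""
        return (pos[0] + self.offset_x, pos[1] + self.offset_y)
```

Keep in mind that the offset is calculated once, so if you move the window after the object is created, the converted positions will be wrong until the offset is recalculated.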
Continue with the written tutorial here: