Computer Vision for Beginners: Part 1
Image processing is performing some operations on images to get an intended manipulation. Think about what we do when we start a new data analysis. We do some data preprocessing and feature engineering. It’s the same with image processing.
By Jiwon Jeong, Data Science Researcher at Yonsei University and Project Instructor at DataCamp
Computer Vision is one of the hottest topics in artificial intelligence. It is making tremendous advances in self-driving cars, robotics as well as in various photo correction apps. Steady progress in object detection is being made every day. GANs is also a thing researchers are putting their eyes on these days. Vision is showing us the future of technology and we can’t even imagine what will be the end of its possibilities.
So do you want to take your first step in Computer Vision and participate in this latest movement? Welcome you are at the right place. From this article, we’re going to have a series of tutorials on the basics of image processing and object detection. This is the first part of OpenCV tutorial for beginners and the complete set of the series is as follows:
- Understanding color models and drawing figures on images
- The basics of image processing with filtering
- From feature detection to face detection
- Contour detection and having a little bit of fun
The first story of this series will be about installing OpenCV, explaining color models and drawing figures on images. The complete code for this tutorial is also available on Github. Now let’s get it started.
Introduction to OpenCV
Image processing is performing some operations on images to get an intended manipulation. Think about what we do when we start a new data analysis. We do some data preprocessing and feature engineering. It’s the same with image processing. We do image processing to manipulate the pictures for extracting some useful information from them. We can reduce noises, control the brightness and color contrast. To learn detailed image processing fundamentals, visit this video.
OpenCV stands for Open Source Computer Vision library and it’s invented by Intel in 1999. It’s first written in C/C++ so you may see tutorials more in C languages than Python. But now it’s also getting commonly used in Python for computer vision as well. First things first, let’s set up a proper environment for using OpenCV. The installation can be processed as follows but you can also find the detailed description here.
After you finish the installation, try importing the package to see if it works well. If you get the return without any errors, then you’re now ready to go!
The first step we’re going to do with OpenCV is importing an image and it can be done as follows.
Have you ever been to Burano? It’s one of the most beautiful islands in Italy. If you haven’t been there, you should definitely check this place for your next holidays. But if you already know this island, you’d probably notice there’s something different in this picture. It’s a little bit different from the pictures we usually see from Burano. It should be more delightful than this!
This is because the default setting of the color mode in OpenCV comes in the order of BGR, which is different from that of Matplotlib. Therefore to see the image in RGB mode, we need to convert it from BGR to RGB as follows.
Now, this is Burano! Such a lovely island in Italy!
More than just RGB
Let’s talk about color modes a little bit more. A color model is a system for creating a full range of colors using the primary colors. There are two different color models here: additive color models and subtractive color models. Additive models use light to represent colors in computer screens while subtractive models use inks to print those digital images on papers. The primary colors are red, green and blue (RGB) for the first one and cyan, magenta, yellow and black (CMYK) for the latter one. All the other colors we see on images are made by combining or mixing these primary colors. So the pictures can be depicted a little bit differently when they are represented in RGB and CMYK.
You would be pretty accustomed to these two kinds of models. In the world of color models, however, there are more than two kinds of models. Among them, grayscale, HSV and HLS are the ones you’re going to see quite often in computer vision.
A grayscale is simple. It represents images and morphologies by the intensity of black and white, which means it has only one channel. To see images in grayscale, we need to convert the color mode into gray just as what we did with the BGR image earlier.
Actually, RGB images are made up by stacking three channels: R, G, and B. So if we take each channel and depict them one by one, we can comprehend how the color channels are structured.
Take a look at the images above. The three images show you how each channel is composed of. In the R channel picture, the part with the high saturation of red colors looks white. Why is that? This is because the values in the red color parts will be near 255. And in grayscale mode, the higher the value is, the whiter the color becomes. You can also check this with G or B channels and compare how certain parts differ one from another.
HSV and HLS take a bit different aspect. As you can see above, they have a three-dimensional representation, and it’s more similar to the way of human perception. HSV stands for hue, saturation and value. HSL stands for hue, saturation and lightness. The center axis for HSV is the value of colors while that for HSL is the amount of light. Along the angles from the center axis, there is hue, the actual colors. And the distance from the center axis belongs to saturation. Transforming the color mode can be done as follows.
But why do we have to transform the colors? What are these for? One example that can give the answer is lane detection. Please take a look at the picture below. See how the lanes are detected in different color modes. During the computer vision task, we do multiple color mode transformation along with masking. If you’d like to find more about how image processing is applied in the lane detection task, feel free to check out this post by nachiket tanksale.
Now I believe you get the idea. Image processing is ‘data preprocessing.’ It’s reducing noises and extracting useful patterns to make classification and detection tasks easier. Therefore all these techniques including the ones we’ll discuss later, are for helping the model to detect the patterns easier.
Drawing on images
Let’s bring some figures on the image. Now, we’re going to Paris. Have you ever heard of the wall of love? It’s a wall which is filled with the words “I love you” in all kinds of international languages. What we’re going to do is finding the words in our language and marking them with a rectangle. As I’m from South Korea, I’ll look up for ‘I love you’ in Korean. First, I’ll make a copy of the original image and then draw a rectangle with
cv2.rectangle()We need to give the coordinates values for the upper left point and the lower right point.
Great! I think I caught the right position. Let’s try again. I can see one more Korean word from the image so I’ll make a circle this time. With
cv2.circle() , we need to specify the point of its center and the length of its radius.
We can also put text data on the image. Why don’t we write the name of this wall this time? With
cv2.putText() , we can designate the position and the font style and size of the text.
This is really a “lovely” wall, isn’t it? Try this yourself and find “I love you” in your language! ????
More than images
Now we’ve been to Italy and France. Where would you like to go next? Why don’t we put a map and mark the places? We’re going to create a window and draw figures not by designating the points but by clicking directly on the window. Let’s try a circle first. We first create a function which will draw a circle with the data for the position and clicking of the mouse.
cv2.EVENT_RBUTTONDOWN , we can bring the data for the position when we press the buttons of the mouse. The position of the mouse will be
(x, y) and we’ll draw a circle whose center is at that point.
We’ll set a map as the background of the window and name the window as my_drawing. The name of the window can be anything, but it should be the same because this acts like the id of the window. Using the
cv2.setMouseCallback() , we make a connection between the window and the function
draw_circle we made at step 1.
Now we execute the window using while loop. Don’t forget to set the break unless you are making an infinite loop. The condition of the if clause is setting the window to be shut down when we press ESC on the keyboard. Save this as a file and import it on your terminal. If you’re to use jupyter lab, put the codes in one cell and execute. Now, tell me! Where do you want to go?
Let’s try a rectangle. As a rectangle requires two points for pt1 and pt2 in
cv2.rectangle() , we need an additional step to set the first click point as pt1 and the last point as pt2. And we’re going to detect the movement of the mouse with
We first define
drawing = False as a default. When the left button is pressed,
drawing becomes true and we give that first position as pt1. If drawing is on, it’ll take the current point as pt2 and keep drawing rectangles while we move the mouse. It’s like overlapping the figures. When the left button is up,
drawing becomes false and it takes the last position of the mouse as its final point of pt2.
draw_circle function to
draw_rectangle in step 1. Please don’t forget to make a change inside the callback function,
cv2.setMouseCallback() as well. So the whole code script will be as follows. Save this script file and run it on the terminal or the jupyter notebook.
Did you enjoy the first time with OpenCV? You can also try other functions such as drawing a line or a polygon. Feel free to check the documentation for it, which can be found here. Next time, we’re going to talk about more advanced technologies such as attaching two different images, image contour and object detection.
Are there errors you would love to correct? Please share your insight with us. I’m always open to talk, so feel free to leave comments below and share your thoughts. I also share interesting and useful resources on LinkedIn so feel free to follow or reach out to me. I’ll be back again with another interesting story next time!
Bio: Jiwon Jeong, is a Data Scientist currently undertaking a Master's degree in Industrial Engineering and is a Project Instructor for DataCamp.
Original. Reposted with permission.
- End-to-End Machine Learning: Making videos from images
- Preprocessing for Deep Learning: From covariance matrix to image whitening
- Basic Image Data Analysis Using Numpy and OpenCV – Part 1