Have you ever wondered how self-driving cars recognize objects on the road and decide which direction to go in? Or how an electronic device made up of circuits and processors understands what’s in front of its camera?
It’s called computer vision: it detects and identifies objects such as road signs and traffic lights. This field of AI has played a critical role in the development of autonomous vehicles.
The question to ask, however, is: how can a computer “see” things?
Let’s understand it here…
In very simple terms, computer vision is the method by which machines interpret information that is available in the form of images. We learned to associate things with words from the time we could crawl. It is easy for us to do so because of our cognitive capabilities as well as our ability to communicate effectively. But this isn’t the case with computers. They can neither speak nor think for themselves (of course, that was until artificial intelligence came trotting along). So how do computers read and understand images? For starters, they don’t look at images like we do. They see an image as a grid of pixels, with a specific numeric value stored for each pixel.
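To make the "grid of pixel values" idea concrete, here is a minimal sketch in plain Python. The 5x5 image, its brightness values, and the helper name `render` are all made up for illustration; real images are far larger and usually carry three colour values per pixel.

```python
# A tiny 5x5 grayscale "image": each number is one pixel's brightness,
# from 0 (black) to 255 (white). The values are invented for this example.
image = [
    [255,   0,   0,   0, 255],
    [  0, 255,   0, 255,   0],
    [  0,   0, 255,   0,   0],
    [  0, 255,   0, 255,   0],
    [255,   0,   0,   0, 255],
]

height = len(image)
width = len(image[0])


def render(img, threshold=128):
    """Return a text picture: '#' for bright pixels, '.' for dark ones."""
    return "\n".join(
        "".join("#" if px >= threshold else "." for px in row) for row in img
    )


print(f"{width}x{height} image, pixel at row 2, column 2 = {image[2][2]}")
print(render(image))
```

To the computer, the 'X' that we see in the printed picture is nothing more than this table of numbers.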
Shapes ‘X’ and ‘O’ represented using pixels (Source: Code.org)
As an example, consider two shapes, ‘X’ and ‘O’. Put these characters into an image matrix that consists of a finite number of pixels. In traditional programming, we tell the computer that if the pixels at the corners and the center of the matrix are coloured, the matrix contains the shape ‘X’. If the corners and the center are not coloured, the matrix contains ‘O’. But what happens when the shapes do not occupy the whole of the image matrix? The fixed rule breaks down, and this is where machine learning plays an important role. A machine learning-based system is fed thousands of images of different shapes. The system then makes a guess about each of these images and is corrected when the guess is wrong. With every pass, the system gets better at distinguishing between different shapes. Voila! We now have a system that can differentiate between different shapes!
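The weakness of the hand-written rule can be sketched in a few lines of Python. This is one simplified reading of the rule described above (all four corners plus the center), and the function name `classify_by_rule` and the three sample grids are hypothetical.

```python
def classify_by_rule(img):
    """Hand-written rule: corners and center coloured means 'X', otherwise 'O'."""
    last = len(img) - 1
    mid = len(img) // 2
    corners = [img[0][0], img[0][last], img[last][0], img[last][last]]
    if all(corners) and img[mid][mid]:
        return "X"
    return "O"


# 1 = coloured pixel, 0 = empty pixel.
full_x = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
full_o = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]
# The same 'X', but smaller and tucked into the top-left corner.
small_x = [
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(classify_by_rule(full_x))   # X
print(classify_by_rule(full_o))   # O
print(classify_by_rule(small_x))  # O, which is wrong: the rule only fits full-size shapes
```

The rule works only as long as the shape fills the whole grid. Covering every size and position would need an ever-growing pile of special cases, which is why learning from examples is attractive.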
The same technique is applied to any kind of computer vision. Be it recognizing roads, patterns, animals, or even faces, the machine learning algorithm learns from trial and error during the training phase and creates a “statistical model” (or a guesser, if you will). This is where things get a little technical. When we say that the machine learning algorithm helps to identify patterns from images, what we mean is that it typically employs “neural networks”, often with dozens of layers, that handle input, pattern detection, and output.
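The trial-and-error loop can be illustrated with a single artificial neuron (a perceptron), which is far simpler than the multi-layer networks mentioned above and is used here only as a stand-in. Everything in this sketch is an assumption made for illustration: the 3x3 shapes, the 5x5 grid, and the names `draw`, `predict`, and `train`. The model guesses, and its weights are nudged only when the guess is wrong.

```python
SIZE = 5
X_CELLS = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)]  # a 3x3 'X'
O_CELLS = [(0, 1), (1, 0), (1, 2), (2, 1)]          # a 3x3 diamond-like 'O'


def draw(cells, top, left):
    """Return a flattened SIZE x SIZE image with the shape placed at (top, left)."""
    pixels = [0] * (SIZE * SIZE)
    for r, c in cells:
        pixels[(top + r) * SIZE + (left + c)] = 1
    return pixels


# Every position of each shape inside the grid. Label 1 means 'X', 0 means 'O'.
dataset = []
for top in range(SIZE - 2):
    for left in range(SIZE - 2):
        dataset.append((draw(X_CELLS, top, left), 1))
        dataset.append((draw(O_CELLS, top, left), 0))


def predict(weights, bias, pixels):
    """Guess 1 ('X') if the weighted sum of the pixels is positive, else 0 ('O')."""
    score = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if score > 0 else 0


def train(data, max_epochs=2000):
    """Repeat passes over the data, adjusting the weights after each wrong guess."""
    weights = [0] * len(data[0][0])
    bias = 0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for pixels, label in data:
            error = label - predict(weights, bias, pixels)  # -1, 0 or +1
            if error:
                mistakes += 1
                weights = [w + error * p for w, p in zip(weights, pixels)]
                bias += error
        if mistakes == 0:  # a full pass with no wrong guesses: stop
            return weights, bias, epoch
    return weights, bias, None


weights, bias, epochs_needed = train(dataset)
accuracy = sum(predict(weights, bias, p) == y for p, y in dataset) / len(dataset)
print(f"stopped after {epochs_needed} passes, training accuracy {accuracy:.0%}")
```

The learned weights are the "statistical model". Note that this toy is scored only on the images it was trained on; a real system would be evaluated on images it has never seen, and would use many layers rather than one neuron.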
There is another area that has been gaining interest over the years – image processing. Sometimes, image processing and computer vision are used interchangeably. But are they the same thing? The short answer is, no. Image processing, well, processes images where the input and the output are both images. As we saw earlier, computer vision tries to make sense out of an input which is an image! However, they can co-exist within the same framework. Computer vision may employ image processing techniques to obtain clearer and crisper images for recognition.
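The difference can be sketched with two small functions: one that returns an image, and one that returns a description of an image. Both `binarize` and `looks_like_x`, as well as the sample pixel values, are hypothetical and written in plain Python purely for illustration. The second function is a toy rule, not a real recognition model.

```python
def binarize(img, threshold=128):
    """Image processing: image in, image out (a cleaned-up black-and-white copy)."""
    return [[1 if px >= threshold else 0 for px in row] for row in img]


def looks_like_x(binary_img):
    """Stand-in for computer vision: image in, label out (toy rule for this sketch)."""
    n = len(binary_img)
    diagonals = {(i, i) for i in range(n)} | {(i, n - 1 - i) for i in range(n)}
    lit = {
        (r, c)
        for r, row in enumerate(binary_img)
        for c, value in enumerate(row)
        if value
    }
    return "X" if lit == diagonals else "not X"


# A noisy grayscale 'X': bright pixels vary in brightness, dark ones are not quite black.
noisy = [
    [200,  30,  10,  40, 220],
    [ 20, 210,  35, 190,  15],
    [ 25,  40, 230,  30,  20],
    [ 10, 180,  45, 205,  35],
    [240,  20,  30,  25, 215],
]

clean = binarize(noisy)      # still an image
label = looks_like_x(clean)  # now a description of the image
print(clean[0], label)
```

Here the image processing step produces the crisper image, and the recognition step then makes sense of it, which is how the two can co-exist in one pipeline.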
With the field’s market size expected to grow to around USD 43 billion within the year, computer vision is not showing any signs of slowing down. Computer vision has been rapidly changing the way computers learn and has simplified thousands of processes over the years. From having to program workflows by hand to artificial intelligence-based techniques that develop their own algorithms for precise decision making, computing has come a long way. Computer vision has helped machines perform tasks including identifying objects, classifying them, matching different features, detecting patterns, and recognizing faces.
When it comes to the application of computer vision, one of the most prominent examples is its use in autonomous vehicles. The plethora of functions computer vision takes care of every second while an autonomous vehicle is on the road is simply astounding. It has also been transforming the way machines are used in areas such as manufacturing, facial recognition services, financial services (analyzing bills, delivering verdicts based on graphs), agriculture, content moderation on social media (where potentially offensive content may be removed or flagged without the need for manual verification), healthcare (detection of diseases in the human body), surveillance (by governments or other agencies to identify people engaged in antisocial activities), and e-commerce (categorizing products without the need for human intervention). With more and more research being carried out in the field, we might be looking at a future where even the blind could depend on computer vision to identify things.