How computer vision works

Computer vision is a form of machine learning used in self-driving cars, facial recognition systems, and sustainable farming. Find out how a computer learns to classify images, how it can build from simple shapes to more complex figures, and why it’s so difficult for a computer to tell the difference between a chihuahua and a muffin.

Featuring
Alejandro Carrillo (Farmwise) who builds next-generation farming robots that use computer vision to grow crops more efficiently.
Kate Park (Tesla) who works on Tesla Autopilot's self-driving cars.

Start learning at http://code.org/

Stay in touch with us!
• on Twitter https://twitter.com/codeorg
• on Facebook https://www.facebook.com/Code.org
• on Instagram https://instagram.com/codeorg
• on Tumblr https://blog.code.org
• on LinkedIn https://www.linkedin.com/company/code-org
• on Google+ https://google.com/+codeorg

Produced and Directed by Jael Burrows
Co-produced by Kristin Neibert
Written by Hadi Partovi, Mike Harvey, Winter Dong, Erin Bond, Dan Schneider and Jael Burrows
Camera by Bow Jones.
Created by Code.org.

Video transcript

Hi! My name is Alejandro Carrillo, and I'm a robotics engineer at an agricultural company. Specifically, my team uses machine learning, robotics, and computer vision to identify the difference between the crops that we eat and the weeds that take nutrients away from them. We're able to remove those weeds without any chemicals.
My name is Kate Park, and I work at Tesla Autopilot. I build self-driving cars.
Any place where resources can be used more efficiently is a place where technology can play a role. But of course, one of the most impactful uses of AI is in self-driving cars.
Have you ever wondered how a computer can recognize a face, or drive a car? Or maybe you've wondered why it's so hard for a computer to tell the difference between a dog and a bagel? Well, it all has to do with something called computer vision: the way machines interpret images.
Let's take a look at a simple example of how computers learn to see. Here are two shapes: an X and an O. At some point you learned the names for these shapes, but a computer looking at these images for the first time just sees a bunch of little squares, called pixels. Each pixel has a numerical value, and the computer needs to make sense of these numbers to figure out what is in the picture.
In traditional programming, you could tell the computer to check which pixels are filled to decide what shape it sees. If the center and corner pixels are full, then it's an X. If the center and corner pixels are empty, then it's an O. Traditional programming works great for this kind of thing, but what about asking the computer to recognize these images? What might the computer think these are? We gave the computer a strict definition of what an X looks like, but these images don't fill all the necessary pixels to fit the definition.
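
A hand-written rule like the one above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the video: each image is a 5x5 grid written as strings (`#` for a filled pixel, `.` for an empty one), the third shape is a hypothetical smaller X drawn near the top of the grid, and the names `to_pixels` and `classify_by_rule` are made up for this example.

```python
# Each picture is a 5x5 grid: "#" = filled pixel, "." = empty pixel (assumed encoding).
FULL_X = ["#...#", ".#.#.", "..#..", ".#.#.", "#...#"]
FULL_O = [".###.", "#...#", "#...#", "#...#", ".###."]
SMALL_X = [".#.#.", "..#..", ".#.#.", ".....", "....."]  # hypothetical smaller X near the top


def to_pixels(rows: list[str]) -> list[list[int]]:
    """Turn the picture into the numbers the computer sees: 1 = filled, 0 = empty."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def classify_by_rule(pixels: list[list[int]]) -> str:
    """Hand-written rule that only checks the four corner pixels and the center pixel."""
    last, mid = len(pixels) - 1, len(pixels) // 2
    key_pixels = [
        pixels[0][0], pixels[0][last],        # top corners
        pixels[last][0], pixels[last][last],  # bottom corners
        pixels[mid][mid],                     # center
    ]
    if all(key_pixels):
        return "X"  # center and corners are all filled
    if not any(key_pixels):
        return "O"  # center and corners are all empty
    return "unknown"  # the rule has no answer for anything in between


print(classify_by_rule(to_pixels(FULL_X)))   # → X
print(classify_by_rule(to_pixels(FULL_O)))   # → O
print(classify_by_rule(to_pixels(SMALL_X)))  # → O
```

Because the rule inspects only five fixed pixels, any X that is smaller or shifted slips past it, however obvious it looks to a person.
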
So the computer doesn't think these are X's at all. In fact, it thinks these are O's, because the corner and center pixels are blank, and that fits the definition of an O that we gave it. In this example, traditional programming only works some of the time, but with machine learning, we can teach the computer to recognize shapes no matter their size, symmetry, or rotation.
Teaching a computer requires thousands or even millions of examples of training data, and a whole lot of trial and error. So let's start training! Here are some simple shapes we can use to train the computer to see. At first the computer is completely clueless: it makes a totally random guess from a preset group of options, and it guesses wrong. But that's okay, because this is where the computer learns. After it makes a guess, the computer is shown the correct answer. It's like learning with flashcards: sometimes you have to get it wrong before you get it right.
With every guess, the computer looks at each pixel and the surrounding pixels. It tries to recognize patterns and make rules to help it guess. For example, if it sees a row of orange pixels next to a row of white pixels, there's an edge. If the computer sees two edges oriented a certain way, say at a 90-degree angle, then it's likely to guess that it's looking at a square. It won't get it right every time, but with more trial and error, it will slowly build a more confident guessing algorithm.
Whether it's trying to guess shapes, animals, or any other category, machine learning finds patterns by learning from its mistakes. The training data is used to build a statistical model, which is just a fancy way of saying a guessing machine. When we give it training data, the guessing machine is tuned and optimized to recognize the pictures we gave it, with the hope that it will then be able to recognize new pictures with the same accuracy.
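
The flashcard-style loop described above (guess, get shown the right answer, adjust) can be sketched with a single perceptron, one of the simplest trainable classifiers. This is a swapped-in stand-in, not the system from the video: it reads raw pixels rather than edges, starts from all-zero weights instead of a random guess, and trains on a hypothetical four-picture set of 5x5 grids written as strings (`#` filled, `.` empty). The names `to_vector`, `guess`, `train`, and `FLASHCARDS` are illustrative only.

```python
# Each picture is a 5x5 grid: "#" = filled pixel, "." = empty pixel (assumed encoding).
FULL_X = ["#...#", ".#.#.", "..#..", ".#.#.", "#...#"]
FULL_O = [".###.", "#...#", "#...#", "#...#", ".###."]
SMALL_X = [".#.#.", "..#..", ".#.#.", ".....", "....."]
SMALL_O = [".###.", ".#.#.", ".###.", ".....", "....."]


def to_vector(rows: list[str]) -> list[int]:
    """Flatten a picture into one list of pixel values (1 = filled, 0 = empty)."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]


# Hypothetical labelled flashcards: +1 means "X", -1 means "O".
FLASHCARDS = [
    (to_vector(FULL_X), +1),
    (to_vector(FULL_O), -1),
    (to_vector(SMALL_X), +1),
    (to_vector(SMALL_O), -1),
]


def guess(weights: list[int], bias: int, pixels: list[int]) -> int:
    """Weighted vote of all pixels: a positive score means "X", otherwise "O"."""
    score = bias + sum(w * p for w, p in zip(weights, pixels))
    return +1 if score > 0 else -1


def train(cards, max_passes: int = 100):
    """Perceptron training: after each wrong guess, nudge the weights toward the right answer."""
    weights = [0] * len(cards[0][0])
    bias = 0
    mistakes_per_pass = []
    for _ in range(max_passes):
        mistakes = 0
        for pixels, answer in cards:
            if guess(weights, bias, pixels) != answer:
                # Shown the correct answer: raise (for X) or lower (for O)
                # the weight of every pixel that was filled in this picture.
                weights = [w + answer * p for w, p in zip(weights, pixels)]
                bias += answer
                mistakes += 1
        mistakes_per_pass.append(mistakes)
        if mistakes == 0:  # a full pass through the flashcards with no wrong guesses
            break
    return weights, bias, mistakes_per_pass


weights, bias, history = train(FLASHCARDS)
print("wrong guesses per pass:", history)  # → wrong guesses per pass: [4, 2, 1, 0]
```

On this tiny set the wrong guesses drop to zero within a few passes. A real vision model needs far more data and a far richer model, and fitting the training pictures does not guarantee it will recognize new ones.
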
It may seem easy to tell the difference between an X and an O, or even to categorize basic shapes, but most images aren't that simple. Let's take a look at how computer vision can learn to recognize complex images, such as scenes in the real world. Most complex images can be broken down into small, simple patterns. For example, an eye is made up of two arcs with some circles inside. A wheel is made up of concentric circles and some radial lines.
The way a computer recognizes the patterns in all these pixels is by using a neural network made of many layers. The first layer of neurons takes pixel values as numerical inputs to identify edges. The next few layers of neurons take those edges and try to detect simple shapes, until finally the computer puts it all together to understand what it's seeing.
It can take hundreds of thousands, or even millions, of labeled images to train a computer vision system. But sometimes even that's not enough! Some face recognition systems have trouble even seeing people of color, because the system was primarily trained with photos of white people. Sometimes problems in computer vision are silly, like when a computer gets confused trying to tell the difference between these dogs. Oh wait, that's not a dog! But it does kind of look like a dog. At least this dog.
But as society relies on computer vision for real problems, like detecting diseases in medical imagery or helping a self-driving car identify pedestrians, it becomes increasingly important that we all understand how these systems work and what types of problems they're appropriate for. Computer vision can open up a miraculous world of possibilities, but a machine is ultimately only as good as the data used to train it.
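
The layered idea from the transcript (pixels, then edges, then shapes) can be illustrated with a toy, hand-wired network. This is a sketch under loud assumptions: the weights are picked by hand rather than learned from labeled images, each "neuron" uses a simple on/off step activation, it looks at 2x2 windows of a black-and-white picture, and the last layer just counts corners. The names `step`, `picture`, `edge_layer`, `corner_layer`, and `shape_layer` are hypothetical.

```python
def step(x: float) -> int:
    """On/off activation: the neuron fires (1) only if its weighted input is positive."""
    return 1 if x > 0 else 0


def picture(rows: list[str]) -> list[list[int]]:
    """Turn '#'/'.' strings into pixel values (1 = filled, 0 = empty)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def edge_layer(pixels: list[list[int]]) -> list[list[tuple[int, int]]]:
    """Layer 1: in every 2x2 window, detect a left/right edge and a top/bottom edge."""
    out = []
    for r in range(len(pixels) - 1):
        row_out = []
        for c in range(len(pixels[0]) - 1):
            tl, tr = pixels[r][c], pixels[r][c + 1]
            bl, br = pixels[r + 1][c], pixels[r + 1][c + 1]
            vertical = (tr + br) - (tl + bl)    # right column minus left column
            horizontal = (bl + br) - (tl + tr)  # bottom row minus top row
            # A pair of neurons per direction, so the edge is found whichever side is filled.
            v_edge = step(vertical) + step(-vertical)
            h_edge = step(horizontal) + step(-horizontal)
            row_out.append((v_edge, h_edge))
        out.append(row_out)
    return out


def corner_layer(edges: list[list[tuple[int, int]]]) -> list[list[int]]:
    """Layer 2: fires only where a left/right edge and a top/bottom edge meet (an AND)."""
    return [[step(v + h - 1.5) for v, h in row] for row in edges]


def shape_layer(corners: list[list[int]]) -> str:
    """Layer 3: count the corners and make a (very rough) guess."""
    count = sum(sum(row) for row in corners)
    if count == 0:
        return "no corners found"
    if count == 4:
        return "square or rectangle"
    return f"{count} corners: some other shape"


SQUARE = picture([
    "......",
    ".####.",
    ".####.",
    ".####.",
    ".####.",
    "......",
])
print(shape_layer(corner_layer(edge_layer(SQUARE))))  # → square or rectangle
```

In a real convolutional network, the filter weights in each layer are learned during training, and there are many more filters and layers than in this toy.
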