What is computer vision? In simple terms, it’s an attempt to get computers to process visual input the way humans do. In a sense, the principles driving computer vision are the same ones that drive our own visual perception (i.e., the process by which our minds construct meaning from raw retinal information).
Let me elaborate on that. At any given moment, the input that each of us receives from our eyes is a flat image. It’s entirely up to the human mind, with all of its experience, to make sense of that data. And if humans can do it, why can’t computers? (Almost every computer vision researcher has wondered this at some point.) We can give the same flat images to a program, so why can’t we impart visual perception to computers as well? One might imagine an input/output machine that takes a photograph and lets a computer infer depth and texture from it.
Unfortunately, it’s not as easy as it sounds. As of 2017, we don’t understand exactly how humans can interpret and embellish images the way they do.
So computer vision is about figuring that out and, ultimately, imbuing computers with a sense of visual intelligence.