I've started refreshing my knowledge of predictive modelling and am reminded of something that has always struck me about the way we teach students about machine learning: we, not the machines, do most of the thinking.
How can this be so? Don't machines learn by themselves?
I'm reading about "supervised learning"(1), where a model observes variables and searches for a relationship between those variables and some known outcome. The goal is to find a way to predict the outcome from the observed variables. Well, I guess the machine is learning, but my experience with learning has a lot more to do with figuring out HOW to look at something -- how to aggregate, weight, and disregard in the presence of an enormous quantity of variation, finding patterns buried in noise.
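To make that concrete, here is a minimal sketch of supervised learning in its simplest form: fit a straight line y ≈ a*x + b to observed (x, y) pairs by least squares, then predict an outcome the model never saw. The data are invented for illustration.

```python
# A minimal sketch of supervised learning: one observed variable,
# one known outcome, least-squares fit of y = a*x + b.
# The training data below are made up for illustration.

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

# "Training" observations: the outcome happens to be exactly 2*x + 1,
# so the fit should recover a = 2, b = 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)

def predict(x):
    return a * x + b

print(predict(10))  # prediction for an unseen observation
```

The "learning" here is just estimating two numbers; everything else -- which variable to observe, which outcome to target, which family of models to fit -- was decided by us in advance.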
But in most cases, we limit what the machine can learn because we limit what it can observe. WE choose what the machine pays attention to, and then tell it to do the best it can. It is like putting a blindfold over my eyes, then telling me to learn to predict where a ball will land based on wind speed, sound, and air temperature instead of giving me eyes to see the ball.
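The blindfold analogy can be demonstrated with a toy experiment (all data here are invented): the landing spot is fully determined by the ball's position, but one learner is only shown an unrelated variable. The same simple algorithm (1-nearest-neighbour) succeeds or stalls depending entirely on which variable we let it observe.

```python
import random

random.seed(0)

# Hypothetical setup: the outcome depends only on the ball's position,
# but the "blindfolded" learner is given an unrelated variable instead.
pos = [random.uniform(0, 10) for _ in range(200)]
noise = [random.uniform(0, 10) for _ in range(200)]
landing = [2 * p for p in pos]  # outcome fully determined by position

def nn_predict(train_x, train_y, query):
    """1-nearest-neighbour prediction on a single observed variable."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[i]

def mean_abs_error(feature):
    """Leave-one-out error: predict each outcome from the other examples."""
    errs = []
    for i in range(len(feature)):
        xs = feature[:i] + feature[i + 1:]
        ys = landing[:i] + landing[i + 1:]
        errs.append(abs(nn_predict(xs, ys, feature[i]) - landing[i]))
    return sum(errs) / len(errs)

print(mean_abs_error(pos))    # small: this variable carries the signal
print(mean_abs_error(noise))  # large: blindfolded, learning stalls
```

The algorithm never changed; only what it was allowed to observe did.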
For the first time since machines were invented, we are taking off the blindfolds and giving machines more to look at by using the internet as an input. The internet has the requisite variety(2) to become a sense organ. And hence the mad dash to make sense of all this internet data. "Data-miners" are pounding larger and larger data sets through machine learning algorithms.
But who is writing the algorithms to specify the models? How are we teaching machines to learn on their own? What are we asking them to learn? Are we giving them the eyes to see? Or are we limiting their learning?
And what happens when the machines learn faster and better than we do?