On January 24, I attended a 1-day data science symposium at Harvard University with the fun title ‘Weathering the Data Storm‘. I imagine being in a tiny boat on the endless beautiful sea of data, and then a big data storm comes up! Numbers and pieces of text fly through the air… they hit me hard in the face like hail and pile up in my boat… and I’m in dire need of some clever algorithms to take care of all that data, so that I won’t get hurt and my boat won’t sink!
In line with the fun title, there were lots of fun talks. The funniest quote of the day clearly goes to Ryan Adams from Harvard University, when he introduced a new name for a common machine learning ‘method’: grad student descent. He talked about a ‘meta-problem’ of machine learning: most machine learning algorithms are sufficiently complex to give great results – if they are run with parameters that are adapted to the problem at hand. For example, to work with a neural network you have to choose the number of layers, the weight regularization, the layer size, the non-linearity, the batch size, the learning rate schedule, the stopping conditions… How do people choose these parameters? Mostly with ad hoc, black-magic methods. One method, common in academia, is ‘grad student descent’ (a pun on gradient descent), in which a graduate student fiddles around with the parameters until it works. It’s kind of sad, but it’s so true! Of course, Ryan Adams then went on to discuss better solutions (‘meta-algorithms’ that automatically find the parameters), but it was ‘grad student descent’ that stuck in everyone’s mind.
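To make the idea of such a ‘meta-algorithm’ concrete: the simplest way to automate grad student descent is random search, where a program (instead of a grad student) samples hyperparameter settings and keeps whichever one gives the lowest validation error. This is just an illustrative sketch, not what Ryan Adams presented; the `validation_error` function below is a made-up toy objective standing in for actually training a model, and all parameter names are hypothetical.

```python
import random

# Toy stand-in for "train a model with these hyperparameters and
# report its validation error". In reality this would train, say,
# a neural network; here it is just a smooth made-up function
# whose minimum is at learning_rate=0.01, num_layers=3, layer_size=128.
def validation_error(learning_rate, num_layers, layer_size):
    return ((learning_rate - 0.01) ** 2 * 1e4
            + (num_layers - 3) ** 2
            + (layer_size - 128) ** 2 / 1e3)

def random_search(trials=200, seed=0):
    """Sample random hyperparameter settings; keep the best one."""
    rng = random.Random(seed)
    best_err, best_params = float("inf"), None
    for _ in range(trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform scale
            "num_layers": rng.randint(1, 8),
            "layer_size": rng.choice([32, 64, 128, 256, 512]),
        }
        err = validation_error(**params)
        if err < best_err:
            best_err, best_params = err, params
    return best_err, best_params

error, params = random_search()
```

Fancier meta-algorithms (e.g. Bayesian optimization) use the results of past trials to decide which setting to try next, rather than sampling blindly, but the loop above already captures the basic idea of replacing the grad student with a program.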
Rachel Schutt from News Corp mused on the perennial question ‘What is a data scientist?’ She cited the well-known definition by Josh Wills from Cloudera, which I really like: