I’m thrilled that former students and postdocs of mine won both of this year’s NeurIPS Test of Time Paper Awards. This award recognizes papers published 10 years ago that have significantly shaped the research field. The recipients were Ian Goodfellow (who, as an undergraduate, built my first GPU server for deep learning in his dorm room) and his collaborators for their work on generative adversarial networks, and my former postdoc Ilya Sutskever and PhD student Quoc Le (with Oriol Vinyals) for their work on sequence-to-sequence learning. Congratulations to all these winners!
By nature, I tend to focus on the future rather than the past. Steve Jobs famously declined to build a corporate museum, instead donating Apple's archives to Stanford University, because he wanted to keep the company forward-looking. Jeff Bezos encourages teams to approach every day as if it were “Day 1,” a mindset that emphasizes staying in the early, innovative stage of a company or industry. These philosophies resonate with me.
But taking a brief look at the past can help us reflect on lessons for the future. One takeaway from looking at what worked 10 to 15 years ago is that many of the teams I led bet heavily on scaling to drive AI progress, a bet that laid a foundation for building larger and larger AI systems. At the time, the idea of scaling up neural networks was controversial, and I was on the fringe. I distinctly recall that, around 2008, Yoshua Bengio advised me not to bet on scaling and to focus on inventing algorithms instead!