Statistical algorithms have long been widely used for making forecasts with time series data. These classical algorithms, like Exponential Smoothing, and ARIMA models, prescribe the data generation process and require manual selections to account for factors like the trend, seasonality, and auto-correlation. However, modern data applications often deal with hundreds or millions of related time series. For example, a demand forecasting algorithm at Amazon may have to consider sales data from millions of products, and an engagement forecasting algorithm at Instagram may have to model metrics from millions of posts. Traditional forecasting methods learn characteristics of individual time series, and hence do not scale well because they fit a model for each time series and do not share parameters among them.
Deep learning provides a data-driven approach that makes a minimal set of assumptions to learn from multiple related time series. In the previous article, I did a detailed literature review on the state of statistical vs machine learning vs deep learning approaches for time series forecasting1. It is important to note that deep learning methods are not necessarily free of inductive biases. While DL models make very few assumptions about the features, inductive biases creep into the modeling process in the form of the architectural design of the DL model. This is why we see certain models perform better than others on certain tasks. For example, Convolutional Neural Networks work well with images due to their spatial inductive biases and translational equivariance. Hence a careful design of the DL model based on the application domain is critical. Over the past few years, several new DL models have been developed for forecasting applications. In this article, we will go over some of the most popular DL models, understand their inductive biases, implement them in PyTorch and compare their results on a dataset with multiple time series.