# How can we quantify similarity between time series?

submitted by
Style Pass
2021-06-10 17:30:04

Time never stops: everything in our world is in constant motion. Without getting too deep into the physics or philosophy of this statement, a direct consequence is that nearly everything we know can be described as a series of events or, for the more data-minded among us, a series of measurements. This is what we call a time series. Time series can capture many different aspects of life: think of daily temperature curves, currency exchange rates and stock ratings, the speed and location of an airplane, ocean tide levels…

Here at Gorilla we work a lot with electricity consumption data. We need to understand the electricity needs of each meter as well as possible in order to produce accurate forecasts, which can then be used to calculate quotes and charges. This is relatively easy if historic consumption data are available, but much harder if they are not; the latter scenario is known as Synthetic Load Profiling. For the UK market, an in-depth procedure has been developed by Elexon, which manages electricity balancing and settlement there. Outside of the UK, however, such centrally agreed-upon methods are usually not available, so we are developing a classification scheme ourselves. The goal is to analyze historic usage data of different consumers, determine which consumers follow similar usage patterns, and decide on a set of parameters that are most useful for separating consumers into groups with similar usage. Examples of such parameters are the type of business (bakeries will have different consumption patterns than gas stations) or the consumer’s location (a household’s consumption in Scotland will follow a different seasonal pattern than one in London).

In this blog post we will look at how we decide which time series are similar and which are not, a basic but important issue to settle before we can even start on the overarching problem. In the figure below you can see a set of time series which we will compare to one another using different (dis)similarity metrics. These series are all manipulations of the same consumption curve (an averaged winter week of one meter’s consumption), which was extracted from the SmartMeter Energy Consumption Data in London Households. However, the considerations below are quite general and apply to problems outside of the utility market as well.
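To make the setup concrete, here is a minimal sketch of the idea of comparing manipulated copies of one base curve. It is not the data or the method used in this post: the base curve is a synthetic sine-shaped stand-in for an hourly weekly profile, the manipulations (scaling, offset, time shift, noise) are illustrative guesses, and plain Euclidean distance is used only as one possible dissimilarity metric. All function and variable names are hypothetical.

```python
import numpy as np


def make_base_week(n_hours: int = 168) -> np.ndarray:
    """Synthetic stand-in for an averaged weekly consumption curve.

    Assumption: hourly resolution and a simple 24-hour cycle. The real
    curve in the post comes from London smart meter data instead.
    """
    t = np.arange(n_hours)
    return 0.5 + 0.3 * np.sin(2 * np.pi * (t - 6) / 24)


def euclidean_distance(a, b) -> float:
    """Point-by-point Euclidean distance between two equally long series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("series must have the same length")
    return float(np.sqrt(np.sum((a - b) ** 2)))


base = make_base_week()
rng = np.random.default_rng(0)  # fixed seed so the noisy variant is reproducible

# Illustrative manipulations of the same base curve (assumed, not from the post).
variants = {
    "identical": base.copy(),
    "scaled": 1.5 * base,           # same shape, higher amplitude
    "offset": base + 0.2,           # same shape, constant extra load
    "shifted": np.roll(base, 3),    # same shape, moved by 3 hours (wraps around)
    "noisy": base + rng.normal(0.0, 0.05, size=base.shape),
}

# Distance of every variant to the base curve.
distances = {name: euclidean_distance(base, s) for name, s in variants.items()}

for name, d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name:>10}: {d:.3f}")
```

A metric like this treats a pure time shift or a constant offset as a real difference, even though the shape of the curve is unchanged. Whether that is desirable depends on the question being asked, which is exactly why the choice of metric deserves attention.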