Unlocking the power of time-series data with multimodal models

submitted by
Style Pass
2024-12-02 20:00:18

We strive to create an environment conducive to many different types of research across many different time scales and levels of risk.

We compare how well multimodal models understand time-series data when it is presented visually as plots versus as raw numerical values. We find significant performance improvements when the models are given plots on tasks like fall detection.
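To make the fall-detection task concrete, here is a minimal sketch of the kind of signal involved: a 3-axis accelerometer trace whose magnitude spikes on impact. The threshold value and the data are hypothetical illustrations, not the method evaluated in the post; real detectors are far subtler.

```python
import math

def magnitude(sample):
    """Euclidean norm of a 3-axis accelerometer reading (x, y, z), in g."""
    return math.sqrt(sum(v * v for v in sample))

def detect_fall(samples, threshold_g=2.5):
    """Toy rule-based detector: flag a fall if any sample's magnitude
    exceeds the threshold (2.5 g is an illustrative choice)."""
    return any(magnitude(s) > threshold_g for s in samples)

# Synthetic trace: quiet standing (~1 g) with a brief impact spike.
quiet = [(0.0, 0.0, 1.0)] * 20
impact = [(1.8, 0.5, 2.6)]          # magnitude ≈ 3.2 g
trace = quiet + impact + quiet

detect_fall(trace)   # True: the impact sample exceeds 2.5 g
detect_fall(quiet)   # False: no spike
```

A multimodal model, by contrast, can be shown the same trace rendered as a plot and asked the question in natural language, which is the comparison this post explores.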

The successful application of machine learning to understand the behavior of complex real-world systems, from healthcare to climate, requires robust methods for processing time-series data. This type of data consists of streams of values that change over time and can represent subjects as varied as a patient’s ECG signal in the ICU or a storm system moving across the Earth.

Highly capable multimodal foundation models, such as Gemini Pro, have recently burst onto the scene. They can reason not only about text, like the large language models (LLMs) that preceded them, but also about other input modalities, including images. These models are powerful at consuming and understanding different kinds of data for real-world use cases, such as demonstrating expert medical knowledge or answering physics questions, but they haven’t yet been leveraged to make sense of time-series data at scale, despite the clear importance of this type of data. As chat interfaces mature across industries and data modalities, products will need the ability to interrogate time-series data via natural language to meet user needs. Previous attempts to improve LLM performance on time-series data have relied on sophisticated prompt tuning and engineering, or on training a domain-specific encoder.
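The "numerical values" presentation mentioned above can be sketched as a simple text serialization of the series into a prompt; the visual alternative would instead render the same values as a plot image (e.g. with a plotting library) and pass that image to the multimodal model. The function name and prompt format here are illustrative assumptions, not an API from the post.

```python
def series_to_prompt(values, question, precision=2):
    """Serialize a numeric time series into a plain-text prompt — one
    hypothetical way to present such data to a text-only LLM. The
    comma-separated, fixed-precision format is an illustrative choice."""
    rendered = ", ".join(f"{v:.{precision}f}" for v in values)
    return f"Here is a time series: [{rendered}]\n{question}"

prompt = series_to_prompt(
    [1.00, 1.02, 0.98, 3.20, 1.01],
    "Does this signal contain a sudden spike?",
)
# prompt begins: "Here is a time series: [1.00, 1.02, 0.98, 3.20, 1.01]"
```

One practical drawback of this text path, which the plot path avoids, is that long series consume many tokens and fine temporal structure is easy for a text model to miss.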
