Relationships are complicated! An analysis of relationships between datasets on the Web

submitted by
Style Pass
2024-11-08 04:30:02

We strive to create an environment conducive to many different types of research across many different time scales and levels of risk.

We develop a series of methods to automatically identify relationships between datasets on the Web and compare their performance on a large corpus of datasets generated from Web pages with schema.org markup.

The Web has millions of datasets, and that number continues to grow rapidly. Many of these datasets are intricately connected through complex relationships. Google Dataset Search helps users navigate this landscape by indexing metadata from diverse sources (e.g., government, academic, and institutional repositories) and allowing users to search for datasets based on topics, formats, publication dates, and more. Understanding the relationships between datasets, particularly from the perspective of data practitioners, is critical for research and decision-making.

Consider a few examples. When a scientist works on reproducing experimental results from a publication, she must identify which specific dataset snapshot a publication has used. When evaluating the trustworthiness of a dataset available on multiple platforms, users may want to choose the repository that they trust the most. If a researcher wants to compare slices of a large dataset, she must ascertain that these slices come from the same snapshot of the larger dataset. All these tasks require understanding the semantics of relationships between datasets.
