SatBird: a Dataset for Bird Species Distribution Modeling using Remote Sensing and Citizen Science Data

Submitted by
Style Pass
2024-04-30 17:00:09

SatBird is a dataset and benchmark for the task of jointly predicting the encounter rates of bird species at a given location from remote sensing data. The dataset was built from publicly available eBird bird observation records, Sentinel-2 satellite data, and WorldClim and SoilGrids environmental data. SatBird comprises three sub-datasets: (i) a USA summer dataset, generally corresponding to the breeding season; (ii) a USA winter dataset, corresponding to the nonbreeding season; and (iii) a Kenya dataset, as an example of a low-data regime. The USA and Kenya datasets cover 670 and 1054 bird species, respectively.

SatBird is designed with the goal of completing species distribution maps in places that have not yet been surveyed, leveraging the presence-absence nature of complete checklists in the eBird database.
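To make the role of complete checklists concrete, here is a minimal sketch of how an encounter rate could be computed per location. It assumes the encounter rate of a species at a hotspot is the fraction of complete checklists at that hotspot that report the species. The function name `encounter_rates`, the input layout, and the example records are illustrative assumptions, not part of the SatBird codebase.

```python
from collections import defaultdict


def encounter_rates(checklists):
    """Compute per-hotspot encounter rates from complete checklists.

    `checklists` is an iterable of (hotspot_id, species_seen) pairs, one per
    complete checklist. Because a complete checklist reports everything the
    observer detected, a species missing from it counts as a non-detection.
    Returns {hotspot_id: {species: fraction of checklists reporting it}}.
    """
    n_checklists = defaultdict(int)
    n_detections = defaultdict(lambda: defaultdict(int))
    for hotspot, species_seen in checklists:
        n_checklists[hotspot] += 1
        for species in set(species_seen):
            n_detections[hotspot][species] += 1
    return {
        hotspot: {
            species: count / n_checklists[hotspot]
            for species, count in n_detections[hotspot].items()
        }
        for hotspot in n_checklists
    }


# Hypothetical records: four checklists at one hotspot, one at another.
example = [
    ("hotspot_A", {"American Robin", "Blue Jay"}),
    ("hotspot_A", {"American Robin"}),
    ("hotspot_A", set()),
    ("hotspot_A", {"American Robin", "Blue Jay"}),
    ("hotspot_B", {"Blue Jay"}),
]
rates = encounter_rates(example)
print(rates["hotspot_A"])
```

The empty checklist at `hotspot_A` still counts toward the denominator, which is what distinguishes presence-absence data from presence-only records.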

Overview of the data streams for SatBird, together with the inputs and outputs for the task of predicting species encounter rates. Sentinel-2 satellite data at 10 m resolution can be used alongside lower-resolution environmental data as model input once their resolutions have been matched. Labels are derived from eBird complete checklists. Observations of vagrants (migrating birds) in the labels are corrected with range maps from eBird, which can also be incorporated into the model to make it geography-aware.
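The two preprocessing ideas in this overview can be sketched with NumPy. This is an illustration under stated assumptions rather than the SatBird pipeline: resolutions are matched here by nearest-neighbour upsampling with an integer factor, and the range-map correction is assumed to set encounter rates to zero for species whose range does not cover the location. The array shapes, values, and function names are hypothetical.

```python
import numpy as np


def upsample_nearest(coarse, factor):
    """Nearest-neighbour upsampling of a (channels, H, W) raster.

    Each coarse cell is repeated `factor` times along both spatial axes.
    """
    return np.repeat(np.repeat(coarse, factor, axis=-2), factor, axis=-1)


def mask_with_range(rates, in_range):
    """Zero out encounter rates for species outside their range (assumed rule)."""
    return np.where(in_range, rates, 0.0)


# Hypothetical inputs: a 3-band 4x4 satellite patch and a 2-variable
# environmental raster at half the spatial resolution.
satellite = np.ones((3, 4, 4))
environment = np.arange(8, dtype=float).reshape(2, 2, 2)

environment_up = upsample_nearest(environment, factor=2)
model_input = np.concatenate([satellite, environment_up], axis=0)

# Hypothetical labels for three species; the second is out of range here.
rates = np.array([0.75, 0.10, 0.50])
in_range = np.array([True, False, True])
corrected = mask_with_range(rates, in_range)

print(model_input.shape)
print(corrected)
```

Nearest-neighbour repetition keeps the example short; bilinear interpolation or reprojection with a geospatial library would be reasonable alternatives when the grids are not aligned by an integer factor.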
