A journey through optimizing cloud-based geospatial data processing. We introduce a new approach to raster data access that combines STAC GeoParquet with cloud-native workflows to push the boundaries of satellite imagery analysis.
Earth observation data in cloud storage is growing rapidly, and with launch prices falling thanks to companies like SpaceX, that growth is expected to remain exponential. This has pushed us to rethink how we access and analyze satellite imagery. With major space agencies like ESA and NASA adopting Cloud-Optimized GeoTIFFs (COGs) as their standard format, unprecedented volumes of data are becoming available through public cloud buckets. This accessibility brings new challenges around efficient data access patterns and resource utilization. In this article, we introduce an alternative approach to cloud-based raster data access, building upon the foundational work of GDAL and Rasterio while exploring optimizations specifically for cloud-native workflows. We hope that, by trying out different approaches, this article contributes to the geospatial community's collective effort to tackle the challenges of big data in the cloud era.
Traditional GeoTIFF files weren't designed with cloud storage in mind. Reading them often required downloading the entire file, even when only a small portion was needed. The introduction of COGs marked a significant shift: because a COG stores its pixels in internal tiles and keeps its metadata at the start of the file, a client can read the header first and then fetch only the tiles it needs through HTTP range requests.
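To make the idea of a partial read concrete, here is a minimal sketch of an HTTP range request in Python, using only the standard library. It is an illustration of the mechanism, not the approach this article introduces. The function names (`range_header`, `fetch_range`, `parse_tiff_header`) and the URL in the usage comment are hypothetical, and the sketch assumes the server honors range requests.

```python
import struct
import urllib.request


def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def fetch_range(url: str, offset: int, length: int) -> bytes:
    """Fetch only the requested bytes instead of the whole file."""
    request = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honored the range.
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()


def parse_tiff_header(data: bytes) -> dict:
    """Decode byte order, version, and first IFD offset from a TIFF header."""
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if byte_order is None:
        raise ValueError("not a TIFF file")
    (version,) = struct.unpack(byte_order + "H", data[2:4])
    if version == 42:  # classic TIFF: 4-byte offset at byte 4
        (ifd_offset,) = struct.unpack(byte_order + "I", data[4:8])
    elif version == 43:  # BigTIFF: 8-byte offset at byte 8
        (ifd_offset,) = struct.unpack(byte_order + "Q", data[8:16])
    else:
        raise ValueError(f"unexpected TIFF version {version}")
    return {
        "little_endian": byte_order == "<",
        "version": version,
        "ifd_offset": ifd_offset,
    }


# Usage (hypothetical URL, requires network access):
# header = fetch_range("https://example.com/scene.tif", 0, 16)
# print(parse_tiff_header(header))
```

The first 16 bytes are enough to tell where the image metadata starts. A real COG reader would follow that offset, read the tile offsets and byte counts, and then issue further range requests for only the tiles that cover the area of interest.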