☁⛈️

No Two Clouds Are Alike!

Storage Challenges for Cloud-Optimized Geodata

Brandon Liu, Protomaps LLC

  • Protomaps is an end-to-end system for packaging and styling maps using static files
  • Cloud-native from the ground up (PMTiles)
  • Open-source components: github.com/protomaps
  • Mantle: vector basemap product on your own cloud
  • AWS Lambda@Edge / Cloudflare Workers
  • Vendor neutrality is a key part of the value proposition

PMTiles spec is built on cloud-native principles

  • HTTP 206 Partial Content
  • Spatial index
  • Commodity cloud storage
  • What I learned from real-world deployments of vector PMTiles on different clouds
  • Applies to other cloud-optimized formats as well: FlatGeobuf, COG, COPC...
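To make the "spatial index" bullet concrete, here is a minimal sketch of how a z/x/y tile address can be collapsed into one integer so that tiles close together on the map also sit close together in the file. It assumes the Hilbert-curve ordering described in the PMTiles v3 spec (all lower zoom levels first, then the Hilbert index within the zoom). `zxy_to_tile_id` is an illustrative name, not the official library API.

```python
def zxy_to_tile_id(z: int, x: int, y: int) -> int:
    """Map a z/x/y tile address to a Hilbert-ordered integer ID (sketch)."""
    n = 1 << z  # tiles per side at this zoom level
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("tile out of range for zoom level")
    # IDs for every lower zoom come first: 4^0 + 4^1 + ... + 4^(z-1).
    base = ((1 << (2 * z)) - 1) // 3
    # Classic Hilbert curve xy -> d conversion.
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return base + d


# The four zoom-1 tiles, visited in Hilbert order:
print([zxy_to_tile_id(1, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])  # [1, 2, 3, 4]
```

Because neighbors get nearby IDs, a client reading one area of interest tends to touch a few contiguous byte ranges instead of many scattered ones.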

Challenge Areas

  1. Performance
  2. Security
  3. Freshness

Never underestimate the bandwidth of a station wagon full of tapes, er, tiles, hurtling down the highway...

"Cloud-optimized": optimized for latency, not bandwidth

  • Enables slippy-map user experience with overviews
  • Enables interactive queries without full scans
  • Enables extracting area of interest
  • HTTP/2 multiplexing is vital for concurrent HTTP requests from the browser
  • Most vendors support only HTTP/1.1 on their core storage platform; HTTP/2 is a feature of a separate CDN product
  • Exception: Google Cloud Storage supports HTTP/2
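A back-of-the-envelope model shows why multiplexing matters when loading is latency-bound. This sketch assumes a fixed round-trip time per request, ignores transfer time entirely, and uses six parallel connections per origin as a typical browser limit for HTTP/1.1; the numbers are illustrative, not measurements.

```python
import math
from typing import Optional


def load_time_ms(num_requests: int, rtt_ms: float, max_parallel: Optional[int]) -> float:
    """Estimate wall-clock time to fetch `num_requests` small byte ranges.

    max_parallel=None models HTTP/2 multiplexing (all requests in flight at
    once on one connection); an integer models a per-origin connection cap.
    """
    if num_requests == 0:
        return 0.0
    if max_parallel is None:
        batches = 1
    else:
        batches = math.ceil(num_requests / max_parallel)
    return float(batches * rtt_ms)


# 24 tile range requests at 100 ms round trip each:
print(load_time_ms(24, 100, 6))     # HTTP/1.1, 6 connections -> 400.0
print(load_time_ms(24, 100, None))  # HTTP/2, multiplexed     -> 100.0
```

Real browsers and networks are messier, but the shape holds: with small ranges, the number of sequential round trips dominates.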

Byte Serving + Compression

  • Vector formats don't have a de facto domain-specific compression like PNG or JPEG for rasters
  • 206 Partial Content is incompatible with HTTP Content-Encoding
  • Browser DecompressionStream API is not widely implemented
  • Multipart ranges (Range: bytes=0-100, 100-150) are generally not supported by storage
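Since a 206 response can't lean on Content-Encoding, one workaround is to compress each tile individually before writing it into the archive, then decompress on the client after the byte-range read. This sketch uses Python's `gzip` module, with an in-memory byte string standing in for the object in storage; `build_archive`, `range_header` and `read_tile` are hypothetical helper names.

```python
import gzip
from typing import List, Tuple


def build_archive(tiles: List[bytes]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Concatenate individually gzipped tiles; return blob and (offset, length) index."""
    blob = bytearray()
    index = []
    for tile in tiles:
        compressed = gzip.compress(tile)
        index.append((len(blob), len(compressed)))
        blob += compressed
    return bytes(blob), index


def range_header(offset: int, length: int) -> str:
    """HTTP Range header value for one contiguous span (the end byte is inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


def read_tile(blob: bytes, offset: int, length: int) -> bytes:
    """Simulate a single-range 206 read, then decompress client-side."""
    body = blob[offset:offset + length]  # what the server would send back
    return gzip.decompress(body)


tiles = [b"tile-a" * 50, b"tile-b" * 80]
blob, index = build_archive(tiles)
offset, length = index[1]
print(range_header(offset, length))
assert read_tile(blob, offset, length) == tiles[1]
```

Asking for one contiguous span per tile also sidesteps the lack of multipart Range support.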

Security

  • Serving files directly from storage to the browser = good!
  • The general use case involves cross-origin requests
  • Example: S3 CORS configuration allows multiple origins
  • Other storage providers allow only one origin or the wildcard *
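When a provider accepts only a single origin or `*`, a small edge function in front of the bucket could emulate an allow-list by echoing the request's Origin back only when it matches. This is a framework-agnostic sketch: `cors_headers` and the listed origins are hypothetical, and a real deployment would wire it into whatever worker or proxy API it runs on.

```python
from typing import Dict, Optional

# Hypothetical allow-list of sites permitted to read the tiles.
ALLOWED_ORIGINS = {"https://maps.example.com", "https://staging.example.com"}


def cors_headers(request_origin: Optional[str]) -> Dict[str, str]:
    # Vary: Origin keeps shared caches from reusing one origin's response for another.
    headers = {"Vary": "Origin"}
    if request_origin in ALLOWED_ORIGINS:
        headers["Access-Control-Allow-Origin"] = request_origin
        # ETag and Content-Range are not CORS-safelisted, so expose them to browser JS.
        headers["Access-Control-Expose-Headers"] = "ETag, Content-Range"
    return headers


print(cors_headers("https://maps.example.com"))
print(cors_headers("https://elsewhere.example.org"))  # no Allow-Origin header
```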

The Mapster challenge

  • 206 Partial Content also allows downloading the entire dataset: no DRM!
  • Public hosting on cloud storage risks runaway costs
  • No cloud implements a "206-only" access policy
  • Requester Pays is a vendor-specific solution
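No provider offers a "206-only" policy, but a proxy in front of storage could approximate one by refusing any request that lacks a single, bounded Range header. This is a hypothetical policy check, not a feature of any cloud, and the 1 MiB cap is an arbitrary assumption. It only limits the size of each request, so a determined client can still page through the whole file; it stops one-shot full downloads, not scraping.

```python
import re
from typing import Optional

MAX_RANGE_BYTES = 1 << 20  # hypothetical cap: 1 MiB per request

# Exactly one explicit "start-end" range; no lists, no open or suffix ranges.
_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d+)$")


def allow_request(range_header: Optional[str]) -> bool:
    """Accept only a single, explicit, bounded byte range."""
    if range_header is None:
        return False  # a plain GET would return the whole file
    match = _RANGE_RE.match(range_header.strip())
    if match is None:
        return False  # rejects multipart, "bytes=100-" and "bytes=-500"
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        return False
    return end - start + 1 <= MAX_RANGE_BYTES


print(allow_request("bytes=0-16383"))  # True
print(allow_request(None))             # False
```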

Freshness

  • Even read-optimized designs need updates
  • Replace the whole file on cloud storage and use its opaque ETag to detect changes
  • Implementations and caches can use the ETag for staleness checks
  • Spec designs can account for this: no sidecar files!
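One way an implementation might use the ETag is to remember it from the first response and send it as `If-Match` on later range requests. Then, where the provider honors `If-Match`, a replaced file shows up as `412 Precondition Failed` instead of silently mixing old and new bytes. The in-memory `Storage` class stands in for a real bucket and the cache layout is an assumption; this sketches the idea rather than any specific library's behavior.

```python
import hashlib
from typing import Dict, Optional, Tuple


class Storage:
    """In-memory stand-in for a bucket that serves byte ranges with an ETag."""

    def __init__(self, data: bytes):
        self.put(data)

    def put(self, data: bytes) -> None:
        # Replacing the whole object yields a new opaque ETag (any unique string works).
        self.data = data
        self.etag = '"' + hashlib.md5(data).hexdigest() + '"'

    def get_range(self, start: int, end: int, if_match: Optional[str]) -> Tuple[int, bytes, str]:
        if if_match is not None and if_match != self.etag:
            return 412, b"", self.etag
        return 206, self.data[start:end + 1], self.etag


class Client:
    def __init__(self, storage: Storage):
        self.storage = storage
        self.etag: Optional[str] = None
        self.cache: Dict[Tuple[int, int], bytes] = {}

    def read(self, start: int, end: int) -> bytes:
        key = (start, end)
        if key in self.cache:
            return self.cache[key]  # no network, so no staleness check here
        status, body, etag = self.storage.get_range(start, end, self.etag)
        if status == 412:
            # File was replaced: drop everything cached under the old ETag, then retry.
            self.cache.clear()
            status, body, etag = self.storage.get_range(start, end, None)
        self.etag = etag
        self.cache[key] = body
        return body


store = Storage(b"version-1 bytes")
client = Client(store)
print(client.read(0, 8))  # b'version-1'
store.put(b"version-2 bytes")  # whole file replaced, new ETag
print(client.read(8, 8))  # b'2' (412 detected, cache cleared, refetched)
print(client.read(0, 8))  # b'version-2'
```

Note the trade-off: cached ranges are served without a round trip, so the change is only noticed on the next network read. Keeping the ETag on the single archive file, rather than coordinating a data file with sidecars, is what makes this check possible.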

Conclusions

  • Performance, Security, Freshness should influence spec and implementation design
  • The cloud storage market is competitive and evolving
  • Please don't build for AWS S3 only 🙂

Thank you!