Introduction

Cloud

Cloud computing refers to on-demand access to a system of loosely connected compute and storage nodes hosted in remote data centers. There are many cloud providers; the three largest are Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure (Azure). Each of these providers offers free access to a hosted Python notebook environment (a JupyterHub).

Cloud data centers are distributed around the world to provide fast access to computing resources globally. Each geographic area that hosts these data centers is called a region.

Storing data in the cloud is usually the most expensive part of cloud-based geoscience research. A typical workflow is therefore to 1) locate where the data are stored or archived, then 2) choose the cloud provider that hosts them. For optimal I/O performance, run the computation in the same region where the data are stored.
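The rule of thumb above can be sketched in Python. The inventory below (data set names, providers, regions, and sizes) is entirely hypothetical; the function picks the provider and region that already hold the most gigabytes of the data an analysis needs, so that as little data as possible has to be moved between regions.

```python
from collections import defaultdict

# Hypothetical inventory: data set -> (provider, region, size in GB).
# Names, regions, and sizes are illustrative placeholders only.
DATASETS = {
    "seismic-waveforms": ("aws", "us-west-2", 500.0),
    "station-metadata": ("aws", "us-west-2", 2.0),
    "satellite-imagery": ("azure", "westeurope", 120.0),
}


def choose_compute_region(names):
    """Pick the (provider, region) that already stores the most data.

    Computing next to the bulk of the data minimizes how many gigabytes
    must cross regions. Returns the chosen location and the list of
    data sets that would still need to be transferred to it.
    """
    volume = defaultdict(float)
    for name in names:
        provider, region, size_gb = DATASETS[name]
        volume[(provider, region)] += size_gb
    # Location holding the largest total volume of the requested data.
    best = max(volume, key=volume.get)
    # Data sets stored anywhere else would have to be copied over.
    to_move = [n for n in names if DATASETS[n][:2] != best]
    return best, to_move
```

With this hypothetical inventory, `choose_compute_region(["seismic-waveforms", "station-metadata", "satellite-imagery"])` returns `(("aws", "us-west-2"), ["satellite-imagery"])`: compute in AWS `us-west-2` and move only the imagery. Weighting by data volume alone is a simplifying assumption.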

Watch a presentation on cloud computing by UW eScience Institute cloud experts Naomi Alterman and Rob Fatland: Presentation, along with their video tutorial.

Google Colab

If you have a Google account, you can use a free-tier Colab notebook, hosted on Google Cloud, that runs on a CPU, a GPU, or a TPU (Tensor Processing Unit).
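A quick way to check which runtime a notebook landed on is sketched below. It relies on two heuristics that are assumptions rather than an official Colab API: Colab runtimes include the `google.colab` package, and NVIDIA GPU runtimes expose the `nvidia-smi` command. The sketch does not try to detect TPUs.

```python
import importlib.util
import shutil


def describe_runtime():
    """Report whether we seem to be in Colab and whether a GPU looks available.

    Heuristics (assumptions, not an official Colab API):
    - Colab ships a `google.colab` package, so finding it suggests Colab.
    - NVIDIA GPU runtimes put the `nvidia-smi` command on the PATH.
    """
    try:
        in_colab = importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # find_spec imports the parent package `google`; if that is
        # missing it raises ModuleNotFoundError (a subclass of ImportError).
        in_colab = False
    has_gpu = shutil.which("nvidia-smi") is not None
    return {"in_colab": in_colab, "gpu": has_gpu}


print(describe_runtime())
```

After switching the Colab runtime type to a GPU, rerunning the cell should flip `gpu` to `True` if the heuristic holds.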

Here is an example of a Google Colab notebook: Open in Colab

AWS

AWS is Amazon's cloud platform and the largest cloud provider by market share. Chapter 7 details how to access and use these resources.

Its JupyterHub for machine learning runs in SageMaker Studio. The first 250 hours of use (within the first two months) are free.

Why use AWS in the geosciences? It already hosts a large amount of open-access data, and it also collects SageMaker notebooks that use these open data for machine-learning purposes. See the notebook catalog.

Geoscience data sets stored on S3 (Simple Storage Service), the AWS storage service, are listed here. Radiant MLHub, for example, stores its data on S3.
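Public S3 buckets can often be listed without AWS credentials through the S3 REST API. The sketch below uses only the Python standard library to build an anonymous ListObjectsV2 request and parse the XML it returns. Bucket names, regions, and prefixes are placeholders you must supply. It assumes the bucket allows anonymous listing, and it reads only the first page of results (at most 1,000 keys, no pagination).

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used in S3 ListObjectsV2 responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(bucket, region, prefix="", max_keys=10):
    """Build an anonymous ListObjectsV2 URL for a public S3 bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_keys(xml_bytes):
    """Extract object keys from a ListObjectsV2 XML response body."""
    root = ET.fromstring(xml_bytes)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


def list_keys(bucket, region, prefix="", max_keys=10):
    """Fetch up to `max_keys` keys under `prefix` (requires network access)."""
    with urllib.request.urlopen(list_url(bucket, region, prefix, max_keys)) as resp:
        return parse_keys(resp.read())
```

As an assumption to verify against the SCEDC documentation, the Southern California data listed below are served from the `scedc-pds` bucket in `us-west-2`, which would give `list_keys("scedc-pds", "us-west-2", max_keys=5)`. In day-to-day work, `boto3` or `s3fs` configured for anonymous access are the more common choices.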

Some specific data sets that could be used in this book:

  • Seismic Data

    • Southern California Seismic Network. Here.

    • Northern California Earthquake Data Center. Here.

    • Distributed Acoustic Sensing (DAS) PoroTomo experiment. Here.

    • OpenEEW: low-cost seismometers deployed in populated areas. Here.

Azure

Azure is Microsoft's cloud platform. Chapter 7 details how to access and use these resources.

Azure's free JupyterHub access is provided through the Microsoft Planetary Computer.
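The Planetary Computer catalogs its data through a STAC (SpatioTemporal Asset Catalog) API. The standard-library-only sketch below builds a STAC item-search request body and posts it to the public endpoint; the collection ID, bounding box, and time range are inputs you choose. Reading the returned assets usually also requires signed URLs, which the `planetary-computer` Python package provides (`planetary_computer.sign`); this sketch stops at item IDs.

```python
import json
import urllib.request

# Public STAC API root of the Microsoft Planetary Computer.
STAC_API = "https://planetarycomputer.microsoft.com/api/stac/v1"


def search_body(collection, bbox, datetime_range, limit=5):
    """Build a STAC API item-search request body.

    bbox is (min_lon, min_lat, max_lon, max_lat) in degrees;
    datetime_range is an RFC 3339 interval "start/end".
    """
    return {
        "collections": [collection],
        "bbox": list(bbox),
        "datetime": datetime_range,
        "limit": limit,
    }


def search_items(collection, bbox, datetime_range, limit=5):
    """POST the search and return matching item IDs (requires network access)."""
    body = json.dumps(search_body(collection, bbox, datetime_range, limit)).encode()
    req = urllib.request.Request(
        f"{STAC_API}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        features = json.load(resp)["features"]
    return [feature["id"] for feature in features]
```

For example, `search_items("sentinel-2-l2a", (-122.5, 47.4, -122.2, 47.7), "2021-06-01T00:00:00Z/2021-06-30T23:59:59Z")` would request up to five Sentinel-2 Level-2A items over the Seattle area for June 2021, assuming that collection ID is still published.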

Azure hosts data sets that can be accessed directly and that focus on the oceans, the atmosphere, the land surface, and demographics. See the example below: