Python Script Tutorial: Download VIIRS Level 2 Aerosol Data Files from Amazon Web Services (AWS)

VIIRS Level 2 aerosol detection product (ADP) and aerosol optical depth (AOD) granule data files are freely available from NOAA's JPSS archive on AWS.

Reprocessed AOD and ADP data are available from SNPP for 2012-2020 and NOAA-20 for 2018-2020. Operational AOD and ADP data are available in near-real time with a latency of several hours; the data archives for SNPP & NOAA-20 begin on 29 October 2022.

Files on the AWS JPSS VIIRS archive are organized by date. There is no interface for subsetting data files by geographic domain (e.g., by entering latitude/longitude corners). As a workaround, users can search for files using VIIRS observation start/end times. In this way, the full set of VIIRS files available each day (global coverage) can be narrowed to a specific geographic region.

Using Jupyter Notebook format, this tutorial demonstrates how to:

  1. Enter parameters to search for VIIRS ADP or AOD Level 2 granule data files on AWS for an observation date and time period of interest, including:
    • Satellite (SNPP or NOAA-20)
    • Product (ADP or AOD)
    • Data Processing (operational or reprocessed)
    • Observation Year
    • Observation Month
    • Observation Day
    • Observation Start Time (UTC)
    • Observation End Time (UTC)
  2. Query AWS for available files matching the user-entered search parameters
  3. Download the available data files to the user's local computer/server

Please acknowledge the NOAA/NESDIS/STAR Aerosols and Atmospheric Composition Science Team if using any of this code in your work/research!


Import Python packages

The first step is to import all of the Python packages we need for the entire script. We will use the S3Fs library to access AWS S3 anonymously (no credentials required).

# Import Python packages

# Library to perform array operations
import numpy as np

# Module to interface with Amazon Simple Storage Service (S3)
import s3fs

# Module for manipulating dates and times
import datetime

# Library to create progress bars for loops/functions
from tqdm import tqdm

# Module for accessing system-specific parameters and functions
import sys

# Library to access core utilities for Python packages
from packaging.version import parse

# Module to set filesystem paths appropriate for user's operating system
from pathlib import Path

# Modules to create interactive menus in Jupyter Notebook
from IPython.display import display
import ipywidgets as widgets

Enter search parameters using Jupyter Widgets menus

We need a way for the user to enter the parameters for their AWS search. In Jupyter Notebook, we can use Jupyter widgets to make user-friendly GUI pull-down menus for entering search variables.

First, run this code block to generate the interactive menus. Then, use the menus to select the satellite, aerosol product, and observation year/month/day and start/end time for the AWS search. The search parameters are input to the rest of the script from the menus via the main function, by reading the ".value" of each menu variable.

To find the VIIRS observation times that correspond to a specific geographic region, go to the JSTAR Mapper website; use the menus under "Layer 1" to find the date/satellite of interest, click on VIIRS "granules" in the "Other layers" section, and then hover your cursor over the granule(s) of interest on the map to see the observation time. This screenshot illustrates how to access the VIIRS granules layer. If you want to download all of the data (global coverage) for the day of interest, set the start time as "00:00" and end time as "23:59" in the widget menus, but be warned that there are ~550 VIIRS granules each day and individual aerosol files can be as large as 100-200 MB!

It is only necessary to run the widgets code block once to generate the menus. After that, selections in the menus can be changed, and multiple consecutive AWS searches can be run via the main function, without re-running the widgets code. If this code block is re-run, it will reset all the menus to their default values, and the user will need to re-select the search parameters of interest before running the main function.

# Enter satellite, VIIRS aerosol product, observation date & start/end times for AWS search
# Selections are made using interactive Jupyter Notebook widgets
# Run this block *once* to generate menus
# When main function is run, it reads ".value" of each menu selection
# Do NOT re-run block if you change menu selections (re-running block resets menus to defaults)!

# Formatting settings for drop-down menus
style = {'description_width':'120px'}
layout = widgets.Layout(width='325px')

# Create drop-down menus using widgets
satellite = widgets.Dropdown(options=[('S-NPP', 'SNPP'), ('NOAA-20', 'NOAA20')], description='Satellite:', style=style, layout=layout)
product = widgets.Dropdown(options=[('Aerosol Detection Product', 'ADP'), ('Aerosol Optical Depth', 'AOD'), ], description='Product:', style=style, layout=layout)
processing = widgets.Dropdown(options=[('operational'), ('reprocessed'), ], description='Data Processing:', style=style, layout=layout)
year = widgets.Dropdown(options=[('2012'), ('2013'), ('2014'), ('2015'), ('2016'), ('2017'), ('2018'), ('2019'), ('2020'), ('2021'), ('2022'), ('2023')], description='Year:', style=style, layout=layout)
month = widgets.Dropdown(options=[('Jan', '01'), ('Feb', '02'), ('Mar', '03'), ('Apr', '04'), ('May', '05'), ('Jun', '06'), ('Jul', '07'), ('Aug', '08'), ('Sep', '09'), ('Oct', '10'), ('Nov', '11'), ('Dec', '12')], description='Month:', style=style, layout=layout)
day = widgets.Dropdown(options=[('01'), ('02'), ('03'), ('04'), ('05'), ('06'), ('07'), ('08'), ('09'), ('10'), ('11'), ('12'), ('13'), ('14'), ('15'), ('16'), ('17'), ('18'), ('19'), ('20'), ('21'), ('22'), ('23'), ('24'), ('25'), ('26'), ('27'), ('28'), ('29'), ('30'), ('31')], description='Day:', style=style, layout=layout)
shour = widgets.Dropdown(options=[('00'), ('01'), ('02'), ('03'), ('04'), ('05'), ('06'), ('07'), ('08'), ('09'), ('10'), ('11'), ('12'), ('13'), ('14'), ('15'), ('16'), ('17'), ('18'), ('19'), ('20'), ('21'), ('22'), ('23')], description='Start Hour (UTC):', style=style, layout=layout)
smin = widgets.Dropdown(options=[('00'), ('01'), ('02'), ('03'), ('04'), ('05'), ('06'), ('07'), ('08'), ('09'), ('10'), ('11'), ('12'), ('13'), ('14'), ('15'), ('16'), ('17'), ('18'), ('19'), ('20'), ('21'), ('22'), ('23'), ('24'), ('25'), ('26'), ('27'), ('28'), ('29'), ('30'), ('31'), ('32'), ('33'), ('34'), ('35'), ('36'), ('37'), ('38'), ('39'), ('40'), ('41'), ('42'), ('43'), ('44'), ('45'), ('46'), ('47'), ('48'), ('49'), ('50'), ('51'), ('52'), ('53'), ('54'), ('55'), ('56'), ('57'), ('58'), ('59')], description='Start Minutes (UTC):', style=style, layout=layout)
ehour = widgets.Dropdown(options=[('00'), ('01'), ('02'), ('03'), ('04'), ('05'), ('06'), ('07'), ('08'), ('09'), ('10'), ('11'), ('12'), ('13'), ('14'), ('15'), ('16'), ('17'), ('18'), ('19'), ('20'), ('21'), ('22'), ('23')], description='End Hour (UTC):', style=style, layout=layout)
emin = widgets.Dropdown(options=[('00'), ('01'), ('02'), ('03'), ('04'), ('05'), ('06'), ('07'), ('08'), ('09'), ('10'), ('11'), ('12'), ('13'), ('14'), ('15'), ('16'), ('17'), ('18'), ('19'), ('20'), ('21'), ('22'), ('23'), ('24'), ('25'), ('26'), ('27'), ('28'), ('29'), ('30'), ('31'), ('32'), ('33'), ('34'), ('35'), ('36'), ('37'), ('38'), ('39'), ('40'), ('41'), ('42'), ('43'), ('44'), ('45'), ('46'), ('47'), ('48'), ('49'), ('50'), ('51'), ('52'), ('53'), ('54'), ('55'), ('56'), ('57'), ('58'), ('59')], description='End Minutes (UTC):', style=style, layout=layout)

# Format observation start/end time hour and minutes menus to display side-by-side
start_time = widgets.HBox([shour, smin])
end_time = widgets.HBox([ehour, emin])

# Display drop-down menus
print('If you change menu selections (e.g., to run another search), do NOT re-run this block!\nRe-running will re-set all menus to their defaults!')
display(satellite, product, processing, year, month, day)
display(start_time, end_time)

The image below shows a screenshot of the output GUI, with the pull-down menus set to search for NOAA-20 VIIRS ADP data files on 14 November 2022 at 08:34-08:41 UTC, which corresponds to South Asia.

Example of Jupyter widgets pull-down menus

Function to construct VIIRS product path on AWS

VIIRS data files on AWS are organized using the satellite, sensor (VIIRS), data product, and type of data processing. We create a function, called "get_product_path( )", that returns the product path corresponding to the user-entered satellite, product, and processing type. We will use this product path when we query AWS.

# Construct VIIRS aerosol product path for AWS
# "satellite", "product", "processing": parameter variables from widget menus, set in main function

def get_product_path(product, processing, satellite):
    
    # Set AWS NODD product name
    if product == 'AOD':
        product_name = 'Aerosol_Optical_Depth'
    elif product == 'ADP':
        product_name = 'Aerosol_Detection'
        
    # Set AWS NODD processing suffix
    if processing == 'reprocessed':
        suffix = '_EDR_Reprocessed'
    elif processing == 'operational':
        suffix = '_EDR'
    
    # Create AWS NODD product path name
    product_path = satellite + '_VIIRS_' + product_name + suffix

    return product_path
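
As a quick check of the naming scheme, the sketch below rebuilds the same product path with dictionary lookups instead of if/elif branches. The function name "build_product_path" and the ValueError raised for unrecognized inputs are illustrative assumptions, not part of the original script; the path components are the ones used in "get_product_path( )" above.

```python
# Hypothetical dictionary-based variant of get_product_path( ) (illustration only)
PRODUCT_NAMES = {'AOD': 'Aerosol_Optical_Depth', 'ADP': 'Aerosol_Detection'}
PROCESSING_SUFFIXES = {'operational': '_EDR', 'reprocessed': '_EDR_Reprocessed'}

def build_product_path(product, processing, satellite):
    # Fail loudly on an unrecognized product or processing type
    if product not in PRODUCT_NAMES:
        raise ValueError('Unknown product: ' + product)
    if processing not in PROCESSING_SUFFIXES:
        raise ValueError('Unknown processing type: ' + processing)
    return satellite + '_VIIRS_' + PRODUCT_NAMES[product] + PROCESSING_SUFFIXES[processing]

# Prints NOAA20_VIIRS_Aerosol_Detection_EDR
print(build_product_path('ADP', 'operational', 'NOAA20'))
# Prints SNPP_VIIRS_Aerosol_Optical_Depth_EDR_Reprocessed
print(build_product_path('AOD', 'reprocessed', 'SNPP'))
```

With the widget menus this check is not strictly needed, since the menus only offer valid values, but it may help if the function is reused outside the notebook.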

Function to create list of available VIIRS Level 2 aerosol data file names

We create a function, called "aws_viirs_list( )", that returns a list of the available VIIRS aerosol data file names matching the user-entered satellite/product/processing and observation date/time period. There are multiple steps in this function.

First, we access AWS anonymously using the S3Fs library. No login ID or password is required!

Next, we make a list called "day_files" that contains all the available VIIRS data file names for the user-entered satellite/product/processing/observation date, encompassing the entire day (00:00 to 23:59 UTC).

To populate "day_files", we find the AWS VIIRS product path using the "get_product_path( )" function we created, and use it with the other user-specified search parameters in the S3Fs "ls( )" command to "list" the file names for the entire observation day.

Then we loop through the file names in "day_files", extract the file names that fall within the exact time period entered by the user, and put them in a new list called "data". We do this by comparing the start time in each VIIRS file name in "day_files" to the user-entered observation time period. We use reverse indexing, counting from the end of the VIIRS file names, because the length of the beginning of VIIRS Level 2 file names varies depending on the data product.

# Create list of available VIIRS aerosol data file names for user-specified satellite/product/processing & observation date/time
# "year", "month", "day", "start_hour", "start_min", "end_hour", "end_min", "satellite", "product", "processing": parameter 
    # variables from widget menus, set in main function

def aws_viirs_list(year, month, day, start_hour, start_min, end_hour, end_min, satellite, product, processing):

    # Construct aerosol product path for AWS NODD
    product_path = get_product_path(product, processing, satellite)

    # Access AWS using anonymous credentials
    aws = s3fs.S3FileSystem(anon=True)
    
    # Create list of file names for entire day (~550)
    day_files = aws.ls('noaa-jpss/' + satellite + '/VIIRS/' + product_path + '/' + year + '/' + month + '/' + day + '/', refresh=True)  
    
    # Create list of subsetted file names that fall within specified time period(s)
    data = []
    for file in day_files:
        file_time = file.split('_')[-3][9:13]
        if file_time >= (start_hour + start_min) and file_time <= (end_hour + end_min):
            data.append(file)
        
    return data
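
To make the reverse indexing concrete, the sketch below applies the same slicing to two file names taken from the example output at the end of this tutorial. It runs offline on a hard-coded list; the helper names "start_hhmm" and "filter_by_time" are hypothetical and not part of the original script.

```python
# Offline illustration of the time filter used in aws_viirs_list( )
# Helper names are hypothetical; the slicing mirrors the function above

def start_hhmm(file_name):
    # Third field from the end is the start-time field, e.g. 's202211140834350'
    # After the 's' and the 8-digit date, characters 9-12 hold the start hour and minutes
    return file_name.split('_')[-3][9:13]

def filter_by_time(file_names, start, end):
    # Zero-padded 'HHMM' strings sort chronologically, so string comparison works
    return [name for name in file_names if start <= start_hhmm(name) <= end]

files = [
    'JRR-ADP_v2r3_j01_s202211140834350_e202211140835596_c202211140904060.nc',
    'JRR-ADP_v2r3_j01_s202211140841420_e202211140843066_c202211140903590.nc',
]

# Prints 0834
print(start_hhmm(files[0]))
# Keeps only the first file, because the second one starts at 0841
print(filter_by_time(files, '0834', '0840'))
```

Because the comparison uses only each granule's start time, a granule that begins during the end minute is included even though its observations extend past it. Counting fields from the end also means underscores in the AWS directory path do not affect the result.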

We create a function, called "get_viirs_files( )", that first prints the AWS search results and the name of the directory where the files will be saved. We also print the sizes of the available files, because VIIRS aerosol files can be quite large (100-200 MB). Then we ask the user if they want to download the files ("yes/no"). This allows the user to review the results of the search, as well as the destination directory, before initiating the download. If there are any problems with the search results, for example if the wrong satellite was selected by mistake, the user can answer "no" to terminate the script, and then adjust the search parameters in the GUI pull-down menus and re-run the script via the main function.

If the user enters "yes" to download the files, we access AWS anonymously again using the S3Fs library, and then we loop through the file names in the "data" list and use the S3Fs "get( )" command to copy (download) the corresponding files to the designated directory on the user's local computer/server.

We use the tqdm library to display a progress bar for the file download. It shows the percent complete of the total download, the number of files downloaded and the total number of files, and the total time elapsed/remaining in the download. Prior to the download loop, we flush the buffer for users running Python v3.8 or earlier, in order to avoid a glitch in the "tqdm" library.

# Print available VIIRS aerosol data files that match user specifications, with option to download files
# "save_path": parameter variable assigned in main function

def get_viirs_files(year, month, day, start_hour, start_min, end_hour, end_min, satellite, product, processing, save_path):
    
    # Query AWS VIIRS archive and print names/sizes of available aerosol files
    data = aws_viirs_list(year, month, day, start_hour, start_min, end_hour, end_min, satellite, product, processing)
    
    if len(data) > 0:   
        # Access AWS using anonymous credentials
        aws = s3fs.S3FileSystem(anon=True)
        
        # Print list of available data files
        print('Available data files (approximate file size):')
        for file in data:
            file_size = aws.size(file)
            # sep='' removes extra spaces b/w print elements
            print(file.split('/')[-1], ' (', int(file_size/1.0E6), ' MB)', sep='')
        
        # Print directory where files will be saved
        print('\nData files will be saved to: ' + str(save_path))
        
        # Ask user if they want to download the available data files
        # If yes, download files to specified directory
        download_question = 'Would you like to download the ' + str(len(data)) + ' files?\nType "yes" or "no" and hit "Enter"\n'
        download_files = input(download_question)
        if download_files in ['yes', 'YES', 'Yes', 'y', 'Y']:
            
            # Display progress bar using tqdm library
            # Flush buffer if Python version < v3.9 to avoid glitch in tqdm library
            if parse(sys.version.split(' ')[0]) < parse('3.9'):
                sys.stdout.flush()
            for name in tqdm(data, unit='files', bar_format="{desc}Downloading:{percentage:3.0f}%|{bar}|{n_fmt}/{total_fmt} [{elapsed}<{remaining}]"):
                # Set save_path + file_name as pathlib.Path object and convert to string
                full_path = str(save_path / name.split('/')[-1])
                # Download file from AWS archive
                aws.get(name, full_path)
            print('\nDownload complete!')
        else:
            print('Files are not being downloaded.')
    else:
        print('No files retrieved. Check settings and try again.')
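
Since a large search can add up to a sizable download, it may help to print the combined size before answering the prompt. A minimal sketch, assuming the per-file sizes in bytes have already been collected (for example, from the "aws.size( )" calls above); the function name "total_size_mb" and the sample sizes are made up for illustration.

```python
# Hypothetical helper: combined size of a list of files, in whole megabytes
def total_size_mb(sizes_in_bytes):
    # Same approximate bytes-to-MB conversion as the print statement above
    return int(sum(sizes_in_bytes) / 1.0E6)

# Sample sizes in bytes (illustrative values, not real file sizes)
sample_sizes = [9400000, 12100000, 10300000]
print('Total download size: ' + str(total_size_mb(sample_sizes)) + ' MB')
```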

Main function: execute script

The main function executes the script by calling the "get_viirs_files( )" function we created. Prior to that, we enter the directory where downloaded files will be saved ("save_path"). For simplicity, we set the directory as the current working directory, but this could be replaced by a user-entered directory path; we recommend using the pathlib module to set filesystem paths.
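
If you prefer a dedicated download folder over the current working directory, a sketch along these lines could replace the "save_path" assignment. The helper name "make_save_path" and the folder names are assumptions chosen for illustration; the demonstration uses a temporary directory so that it leaves nothing behind.

```python
# Hypothetical helper to build (and create, if needed) a download directory
import tempfile
from pathlib import Path

def make_save_path(base_dir, *subfolders):
    # Joining with pathlib keeps the path valid on Windows, macOS, and Linux
    save_path = Path(base_dir).joinpath(*subfolders)
    # parents=True creates intermediate folders; exist_ok=True avoids an error on re-runs
    save_path.mkdir(parents=True, exist_ok=True)
    return save_path

# Demonstration inside a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = make_save_path(tmp_dir, 'viirs_data', 'adp')
    print(demo_path.is_dir())
```

In the main function, the same helper could be called with a real location, for example make_save_path(Path.cwd(), 'viirs_data').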

The parameter variables for the "get_viirs_files( )" function include the AWS search parameters, entered in the widget menus. We obtain these variables by reading ".value" for each of the widget menu variables.

Note that, for simplicity, we didn't include any error checks in this script. Users may want to add checks to ensure that the entered observation date is not in the future and that the entered observation end time is not earlier than the start time.
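
One possible form for such checks is sketched below. It is not part of the original script: the function name "check_search_inputs" and the optional "today" argument are hypothetical, and the inputs are assumed to be the zero-padded strings returned by the widget menus.

```python
# Hypothetical pre-search checks (not part of the original script)
import datetime

def check_search_inputs(year, month, day, start_hour, start_min, end_hour, end_min, today=None):
    # "today" can be passed in for testing; it defaults to the current UTC date
    if today is None:
        today = datetime.datetime.now(datetime.timezone.utc).date()
    # datetime.date raises ValueError for impossible dates such as 31 February
    obs_date = datetime.date(int(year), int(month), int(day))
    if obs_date > today:
        raise ValueError('Observation date is in the future')
    # Zero-padded 'HHMM' strings compare in chronological order
    if (end_hour + end_min) < (start_hour + start_min):
        raise ValueError('End time is earlier than start time')
    return obs_date

# Example using the search from this tutorial (14 November 2022, 08:34-08:41 UTC)
print(check_search_inputs('2022', '11', '14', '08', '34', '08', '41'))
```

A check like this could be called at the top of the main function, before "get_viirs_files( )", so that a bad selection is reported before AWS is queried.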

# Execute search of AWS to find VIIRS aerosol data files, with option to download files
# Get values from widget menus (AWS search parameters) using ".value"

# Main function
if __name__ == "__main__":
    
    # Set directory to save downloaded VIIRS files (as pathlib.Path object)
    # Use current working directory for simplicity
    save_path = Path.cwd()
 
    # List/download files
    get_viirs_files(year.value, month.value, day.value, shour.value, smin.value, ehour.value, emin.value, satellite.value, 
                  product.value, processing.value, save_path)

Example of output from search for NOAA-20 VIIRS ADP data files on 14 November 2022 at 08:34-08:41 UTC:

Available data files (approximate file size):
JRR-ADP_v2r3_j01_s202211140834350_e202211140835596_c202211140904060.nc (9 MB)
JRR-ADP_v2r3_j01_s202211140836008_e202211140837235_c202211140904200.nc (12 MB)
JRR-ADP_v2r3_j01_s202211140837248_e202211140838493_c202211140904180.nc (10 MB)
JRR-ADP_v2r3_j01_s202211140838505_e202211140840150_c202211140904320.nc (12 MB)
JRR-ADP_v2r3_j01_s202211140840163_e202211140841408_c202211140904030.nc (10 MB)
JRR-ADP_v2r3_j01_s202211140841420_e202211140843066_c202211140903590.nc (10 MB)

Data files will be saved to: C:\Users\Trainings\Website
Would you like to download the 6 files?
Type "yes" or "no" and hit "Enter"
yes
Downloading:100%|█████████████████████████|6/6 [00:03<00:00]

Download complete!