The grab_and_go Module
The grab_and_go module provides a streamlined interface for downloading and processing satellite data in a single operation. It handles the entire workflow from retrieving data from remote servers to extracting fields of interest and saving them to disk.
Functions
- grab(aios_ds, t0, t1, verbose=True, skip_download=False)
Retrieves files from a data source within a given time range.
- Parameters:
- Returns:
List of local file paths if files are downloaded, otherwise None
- Return type:
list or None
- Raises:
ValueError – If the data source is not supported
- extract(aios_ds, local_files, exdict, n_cores, debug=False, single=False, verbose=True)
Extracts data from local files using specified extraction parameters.
- Parameters:
aios_ds (AIOS_DataSet) – Dataset object containing field information
local_files (list) – Paths to the local files to process
exdict (dict) – Dictionary of extraction parameters
n_cores (int) – Number of cores to use for multiprocessing
debug (bool) – Enable debugging mode
single (bool) – Enable single process mode
verbose (bool) – Enable verbose output
- Returns:
Tuple of fields, inpainted masks, metadata, and times
- Return type:
tuple
- Raises:
ValueError – If the dataset field is not supported
- run(dataset, tstart, tend, eoption_file, ex_file, tbl_file, n_cores, tdelta={'days': 1}, verbose=True, debug=False, debug_noasync=False, save_local_files=False)
Complete end-to-end pipeline to grab and extract data from a dataset.
- Parameters:
dataset (str) – Name of the dataset (e.g., ‘VIIRS_NPP’)
tstart (str) – Start time in ISO format (e.g., ‘2020-01-01’)
tend (str) – End time in ISO format
eoption_file (str) – Filename of extraction options
ex_file (str) – Output HDF5 filename for extracted data
tbl_file (str) – Output parquet filename for metadata
n_cores (int) – Number of cores to use
tdelta (dict) – Time delta for processing chunks
verbose (bool) – Enable verbose output
debug (bool) – Enable debug mode
debug_noasync (bool) – Debug without async
save_local_files (bool) – Keep downloaded files after processing
- Returns:
None
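The tdelta argument is a dict such as {'days': 1}. One plausible reading, which is an assumption and not confirmed by this page, is that its keys are keyword arguments for datetime.timedelta and that run walks the interval from tstart to tend in steps of that size. The sketch below illustrates that reading only; iter_chunks is a hypothetical helper and is not part of grab_and_go.

```python
from datetime import datetime, timedelta


def iter_chunks(tstart, tend, tdelta):
    """Yield (chunk_start, chunk_end) pairs covering [tstart, tend).

    Hypothetical helper: assumes tdelta holds datetime.timedelta keywords.
    """
    step = timedelta(**tdelta)  # e.g. {'days': 1} -> timedelta(days=1)
    if step <= timedelta(0):
        raise ValueError("tdelta must describe a positive duration")

    t0 = datetime.fromisoformat(tstart)
    t_end = datetime.fromisoformat(tend)
    while t0 < t_end:
        # Clip the last chunk so it never runs past the end time
        t1 = min(t0 + step, t_end)
        yield t0, t1
        t0 = t1


chunks = list(iter_chunks('2020-01-01', '2020-01-03', {'days': 1}))
```

Under this reading, a smaller tdelta means more, shorter chunks, which bounds how many downloaded files are held on disk at once when save_local_files is False.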
Extraction Parameters
The extraction options file (eoption_file) should be a JSON file with the following parameters:
field_size (int): Size of the field to extract in pixels
clear_threshold (float): Percentage threshold for clear conditions
nadir_offset (int): Offset from nadir in pixels
temp_bounds (list): Temperature bounds [min, max] in degrees Celsius
nrepeat (int): Number of repetitions for extraction
sub_grid_step (int): Step size for sub-grid extraction
grow_mask (bool): Whether to grow the cloud mask
inpaint (bool): Whether to perform inpainting on masked regions
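As an illustration, an options file with the parameters above could be written from Python as follows. Every value here is a placeholder chosen for the example, not a recommended or default setting, and write_options is a hypothetical helper rather than part of the module. The file goes to a temporary directory so the sketch can run anywhere.

```python
import json
import os
import tempfile

# Placeholder values for illustration only; choose real ones for your dataset.
extract_options = {
    "field_size": 192,            # pixels
    "clear_threshold": 95.0,      # percent
    "nadir_offset": 480,          # pixels
    "temp_bounds": [-3.0, 34.0],  # [min, max] in degrees Celsius
    "nrepeat": 1,
    "sub_grid_step": 2,
    "grow_mask": False,
    "inpaint": True,
}


def write_options(path, options):
    """Write the extraction options to a JSON file (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(options, f, indent=4)


options_path = os.path.join(tempfile.mkdtemp(), "extract_viirs_std.json")
write_options(options_path, extract_options)

# Read the file back to confirm it is valid JSON with the expected keys
with open(options_path) as f:
    loaded = json.load(f)
```

The resulting path is what would be passed to run as eoption_file.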
Example Usage
Basic usage with VIIRS NPP data:
from wrangler.grab_and_go import run
# Define extraction options file
extract_file = 'extract_viirs_std.json'
# Run the pipeline to download and process data
run(
    dataset='VIIRS_NPP',          # Dataset name
    tstart='2024-01-01',          # Start date
    tend='2024-01-02',            # End date
    eoption_file=extract_file,    # Extraction options
    ex_file='output.h5',          # Output data file
    tbl_file='metadata.parquet',  # Output metadata file
    n_cores=4                     # Number of processing cores
)
Handling Larger Time Periods
For processing larger time periods efficiently:
import pandas as pd
from datetime import timedelta
from wrangler.grab_and_go import run
# Process one week at a time
start_date = pd.to_datetime('2024-01-01')
end_date = pd.to_datetime('2024-01-31')
current_date = start_date
while current_date < end_date:
    next_date = current_date + timedelta(days=7)
    # Ensure we don't go past the end date
    if next_date > end_date:
        next_date = end_date
    # Process this time chunk
    run(
        dataset='VIIRS_NPP',
        tstart=current_date.isoformat(),
        tend=next_date.isoformat(),
        eoption_file='extract_viirs_std.json',
        ex_file=f'viirs_{current_date.strftime("%Y%m%d")}.h5',
        tbl_file=f'viirs_meta_{current_date.strftime("%Y%m%d")}.parquet',
        n_cores=4
    )
    current_date = next_date
Output Structure
The extraction process produces two main outputs:
HDF5 File (ex_file)
- fields: Extracted field data (n_fields × field_size × field_size)
- inpainted_masks: Inpainted mask data
- metadata: Array of metadata for each field
Parquet File (tbl_file) - Contains all metadata in tabular format:
- filename: Original source file
- row, col: Position in the original granule
- lat, lon: Geographic coordinates
- clear_fraction: Fraction of clear pixels
- field_size: Size of the extracted field
- datetime: Timestamp of the data
- ex_filename: Path to the extraction file
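A quick sanity check on a metadata table is to confirm that it carries the columns described above. The sketch below uses only the column names given in this section; missing_columns is a hypothetical helper, and the module's own validation may check more than this.

```python
# Column names as listed for the Parquet metadata table
EXPECTED_COLUMNS = (
    "filename", "row", "col", "lat", "lon",
    "clear_fraction", "field_size", "datetime", "ex_filename",
)


def missing_columns(columns):
    """Return the expected column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Example with a deliberately incomplete column list
absent = missing_columns(["filename", "row", "col", "lat", "lon"])
```

With pandas installed, the same check could be applied to a real table via missing_columns(pandas.read_parquet('metadata.parquet').columns), where the filename is whatever was passed as tbl_file.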
Notes
Currently supports PODAAC data sources
Only SST (Sea Surface Temperature) fields are supported
Uses multiprocessing for parallel extraction of fields
Automatically validates the metadata table before saving