Quick Start Guide
This guide will help you get started with wrangler, demonstrating the basic workflow for downloading and processing satellite data.
Basic Usage
Loading a Dataset
Start by importing and loading your desired dataset:
from wrangler.datasets.loader import load_dataset
# Load VIIRS NPP dataset
viirs_npp = load_dataset('VIIRS_NPP')
Download and Process Data
The main workflow combines downloading and processing using the grab_and_go module:
from wrangler.grab_and_go import run
# Define your extraction options file
extract_file = 'extract_viirs_std.json'
# Run the pipeline
run(
    dataset='VIIRS_NPP',           # Dataset name
    tstart='2024-01-01',           # Start date
    tend='2024-01-02',             # End date
    eoption_file=extract_file,     # Extraction options
    ex_file='output.h5',           # Output HDF5 file
    tbl_file='metadata.parquet',   # Output metadata file
    n_cores=4                      # Number of processing cores
)
Extraction Configuration
Create an extraction options JSON file (e.g., ‘extract_viirs_std.json’):
{
    "field_size": 192,
    "clear_threshold": 5,
    "nadir_offset": 0,
    "temp_bounds": [-3, 34],
    "nrepeat": 1,
    "sub_grid_step": 4,
    "grow_mask": false,
    "inpaint": true
}
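If you prefer to generate the options file from Python rather than edit it by hand, a minimal sketch using only the standard library might look like the following. The option names and values are copied from the example above; the write_options helper is hypothetical and not part of wrangler.

```python
import json
from pathlib import Path

# Option names and values copied from the example file above.
options = {
    "field_size": 192,
    "clear_threshold": 5,
    "nadir_offset": 0,
    "temp_bounds": [-3, 34],
    "nrepeat": 1,
    "sub_grid_step": 4,
    "grow_mask": False,
    "inpaint": True,
}


def write_options(path, opts):
    """Write the options to a JSON file and return them as read back from disk.

    Hypothetical helper for illustration; it is not part of wrangler.
    """
    path = Path(path)
    path.write_text(json.dumps(opts, indent=4))
    return json.loads(path.read_text())


saved = write_options("extract_viirs_std.json", options)
print(saved["field_size"], saved["temp_bounds"])
```

Reading the file back right after writing it is a cheap check that the JSON is valid before handing the path to run().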
Working with Processed Data
Reading the Output
After processing, you can work with the output files:
import h5py
import pandas as pd
# Read the HDF5 file
with h5py.File('output.h5', 'r') as f:
    # Access the fields
    fields = f['fields'][:]
    masks = f['inpainted_masks'][:]
# Read the metadata
metadata = pd.read_parquet('metadata.parquet')
Visualizing Fields
Use the cutout module to visualize processed fields:
from wrangler.cutout import show_image
# Display a single field
show_image(fields[0], cbar=True, clbl='Temperature (°C)')
Advanced Usage
Manual Download and Processing
If you need more control over the pipeline, you can separate the download and processing steps:
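import json
from wrangler.grab_and_go import grab, extract
# viirs_npp is the dataset object returned by load_dataset('VIIRS_NPP') above
# Load the extraction options
# (assumption: extract() accepts the parsed JSON options as a dict)
with open('extract_viirs_std.json') as f:
    extract_options = json.load(f)
# First, download the files
local_files = grab(viirs_npp, '2024-01-01', '2024-01-02')
# Then process them
fields, masks, metadata, times = extract(
    viirs_npp,
    local_files,
    extract_options,
    n_cores=4
)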
Field Preprocessing
For custom preprocessing of fields:
from wrangler.preproc.field import main as process_field
# Process a single field
# (field and mask must already be defined: the input image and its mask array)
processed_field, meta = process_field(
    field,
    mask,
    inpaint=True,
    median=True,
    med_size=(3,1),
    downscale=True,
    dscale_size=(2,2)
)
Common Patterns
1. Quality Control
Filter data based on quality thresholds:
# Filter by clear fraction
good_data = metadata[metadata['clear_fraction'] > 0.95]
2. Geographic Selection
Select data from specific regions:
# Filter by latitude/longitude
region_data = metadata[
    (metadata['lat'].between(32, 40)) &
    (metadata['lon'].between(-128, -118))
]
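The quality and geographic filters above can be combined into one selection. The sketch below runs on a small synthetic table whose values are made up for illustration; only the column names (lat, lon, clear_fraction) come from the examples above, and select_cutouts is a hypothetical helper, not a wrangler function.

```python
import pandas as pd

# Synthetic stand-in for the metadata table; the values are made up.
metadata = pd.DataFrame({
    "lat": [35.0, 36.5, 10.0, 38.0],
    "lon": [-120.0, -125.0, -120.0, -110.0],
    "clear_fraction": [0.99, 0.90, 0.98, 0.97],
})


def select_cutouts(table, lat_range, lon_range, min_clear=0.95):
    """Return rows inside the lat/lon box whose clear fraction exceeds min_clear."""
    in_box = table["lat"].between(*lat_range) & table["lon"].between(*lon_range)
    is_clear = table["clear_fraction"] > min_clear
    return table[in_box & is_clear]


selected = select_cutouts(metadata, (32, 40), (-128, -118))
print(selected)
```

Note that Series.between is inclusive at both ends by default, so rows exactly on the box boundary are kept.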
3. Batch Processing
Process multiple time periods:
from datetime import timedelta
import pandas as pd
from wrangler.grab_and_go import run
start_date = pd.to_datetime('2024-01-01')
end_date = pd.to_datetime('2024-01-31')
# Process one day at a time
current_date = start_date
while current_date <= end_date:
    next_date = current_date + timedelta(days=1)
    run(
        dataset='VIIRS_NPP',
        tstart=current_date.isoformat(),
        tend=next_date.isoformat(),
        eoption_file='extract_viirs_std.json',
        ex_file=f'output_{current_date.strftime("%Y%m%d")}.h5',
        tbl_file=f'metadata_{current_date.strftime("%Y%m%d")}.parquet',
        n_cores=4
    )
    current_date = next_date
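Before launching a long batch, it can help to preview the time windows and output names the loop will produce. The sketch below uses only the standard library and does not call wrangler; daily_windows is a hypothetical helper. It emits date-only strings such as '2024-01-01', matching the format used in the first run() example; whether run() also accepts full timestamps is not something this sketch assumes.

```python
from datetime import date, timedelta


def daily_windows(start, end):
    """Yield (tstart, tend, tag) for each day from start to end, inclusive.

    Hypothetical helper for illustration; it is not part of wrangler.
    """
    current = start
    while current <= end:
        next_day = current + timedelta(days=1)
        yield current.isoformat(), next_day.isoformat(), current.strftime("%Y%m%d")
        current = next_day


windows = list(daily_windows(date(2024, 1, 1), date(2024, 1, 31)))

# Preview the first two windows and the files they would write.
for tstart, tend, tag in windows[:2]:
    print(tstart, tend, f"output_{tag}.h5", f"metadata_{tag}.parquet")
```

Each tuple can then be passed to run() as tstart and tend, with the tag used to build ex_file and tbl_file as in the loop above.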
Tips and Best Practices
Memory Management
- Process large datasets in smaller time chunks
- Set the n_cores parameter to match the resources of your system
- Downloaded files are cleaned up by default (save_local_files=False)
Quality Control
- Always check the clear_fraction column in the metadata
- Verify that the temperature bounds are appropriate for your region
- Inspect the inpainted masks to assess data quality
Performance
- Use multiple cores for processing when available
- Consider downscaling for large datasets
- Choose batch sizes that fit within your memory constraints
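One way to keep memory bounded when working through a large output file is to read it in fixed-size batches rather than all at once. The sketch below only computes the index ranges; batch_slices is a hypothetical helper, not part of wrangler. With h5py, indexing a dataset with one of these slices reads just that portion from disk.

```python
def batch_slices(n_items, batch_size):
    """Yield slice objects that cover range(n_items) in chunks of batch_size.

    Hypothetical helper for illustration; it is not part of wrangler.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_items, batch_size):
        yield slice(start, min(start + batch_size, n_items))


slices = list(batch_slices(10, 4))
print(slices)

# Usage idea with an open h5py file f (not executed here):
#     for s in batch_slices(f['fields'].shape[0], 256):
#         chunk = f['fields'][s]   # reads only this batch into memory
```

The batch size of 256 in the comment is arbitrary; pick a value so that one batch of fields fits comfortably in memory.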
Next Steps
- Explore the API documentation for more detailed information
- Check out the example notebooks in the repository
- Join the community and contribute to the project