Quick Start Guide

This guide walks through the basic wrangler workflow for downloading and processing satellite data.

Basic Usage

Loading a Dataset

Start by importing and loading your desired dataset:

from wrangler.datasets.loader import load_dataset

# Load VIIRS NPP dataset
viirs_npp = load_dataset('VIIRS_NPP')

Download and Process Data

The main workflow combines downloading and processing using the grab_and_go module:

from wrangler.grab_and_go import run

# Define your extraction options file
extract_file = 'extract_viirs_std.json'

# Run the pipeline
run(
    dataset='VIIRS_NPP',          # Dataset name
    tstart='2024-01-01',          # Start date
    tend='2024-01-02',            # End date
    eoption_file=extract_file,    # Extraction options
    ex_file='output.h5',          # Output HDF5 file
    tbl_file='metadata.parquet',  # Output metadata file
    n_cores=4                     # Number of processing cores
)

Extraction Configuration

Create an extraction options JSON file (e.g., ‘extract_viirs_std.json’):

{
    "field_size": 192,
    "clear_threshold": 5,
    "nadir_offset": 0,
    "temp_bounds": [-3, 34],
    "nrepeat": 1,
    "sub_grid_step": 4,
    "grow_mask": false,
    "inpaint": true
}
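
The options file can also be generated from Python, which keeps the values in one place and guarantees well-formed JSON. The following is a minimal sketch using only the standard library. The keys and values are copied from the example above; `write_extract_options` is a hypothetical helper, not part of wrangler, and its check on `temp_bounds` assumes the pair is ordered as lower bound, then upper bound.

```python
import json
from pathlib import Path

# Values copied from the example options file above.
EXTRACT_OPTIONS = {
    "field_size": 192,
    "clear_threshold": 5,
    "nadir_offset": 0,
    "temp_bounds": [-3, 34],
    "nrepeat": 1,
    "sub_grid_step": 4,
    "grow_mask": False,
    "inpaint": True,
}


def write_extract_options(path, options):
    """Hypothetical helper (not part of wrangler): write options as JSON."""
    low, high = options["temp_bounds"]
    # Assumption: temp_bounds is a [lower, upper] pair.
    if low >= high:
        raise ValueError("temp_bounds must be [lower, upper] with lower < upper")
    path = Path(path)
    path.write_text(json.dumps(options, indent=4) + "\n", encoding="utf-8")
    return path
```

Calling `write_extract_options('extract_viirs_std.json', EXTRACT_OPTIONS)` would produce a file equivalent to the one shown above.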

Working with Processed Data

Reading the Output

After processing, you can work with the output files:

import h5py
import pandas as pd

# Read the HDF5 file
with h5py.File('output.h5', 'r') as f:
    # Access the fields
    fields = f['fields'][:]
    masks = f['inpainted_masks'][:]

# Read the metadata
metadata = pd.read_parquet('metadata.parquet')

Visualizing Fields

Use the cutout module to visualize processed fields:

from wrangler.cutout import show_image

# Display a single field
show_image(fields[0], cbar=True, clbl='Temperature (°C)')

Advanced Usage

Manual Download and Processing

If you need more control over the pipeline, you can separate the download and processing steps:

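import json

from wrangler.datasets.loader import load_dataset
from wrangler.grab_and_go import grab, extract

# Dataset object, as in "Loading a Dataset"
viirs_npp = load_dataset('VIIRS_NPP')

# Extraction options from the JSON file described earlier
# (assumes extract() accepts the options as a dict)
with open('extract_viirs_std.json') as f:
    extract_options = json.load(f)

# First, download the files
local_files = grab(viirs_npp, '2024-01-01', '2024-01-02')

# Then process them
fields, masks, metadata, times = extract(
    viirs_npp,
    local_files,
    extract_options,
    n_cores=4
)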

Field Preprocessing

For custom preprocessing of fields:

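from wrangler.preproc.field import main as process_field

# field and mask are 2D arrays for a single cutout, for example
# the first entries returned by extract() (an illustrative choice)
field, mask = fields[0], masks[0]

# Process a single field
processed_field, meta = process_field(
    field,
    mask,
    inpaint=True,
    median=True,
    med_size=(3, 1),
    downscale=True,
    dscale_size=(2, 2)
)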

Common Patterns

1. Quality Control

Filter data based on quality thresholds:

# Filter by clear fraction
good_data = metadata[metadata['clear_fraction'] > 0.95]

2. Geographic Selection

Select data from specific regions:

# Filter by latitude/longitude
region_data = metadata[
    (metadata['lat'].between(32, 40)) &
    (metadata['lon'].between(-128, -118))
]

3. Batch Processing

Process multiple time periods:

from datetime import timedelta
import pandas as pd
from wrangler.grab_and_go import run

start_date = pd.to_datetime('2024-01-01')
end_date = pd.to_datetime('2024-01-31')

# Process one day at a time
current_date = start_date
while current_date <= end_date:
    next_date = current_date + timedelta(days=1)
    run(
        dataset='VIIRS_NPP',
        tstart=current_date.isoformat(),
        tend=next_date.isoformat(),
        eoption_file='extract_viirs_std.json',
        ex_file=f'output_{current_date.strftime("%Y%m%d")}.h5',
        tbl_file=f'metadata_{current_date.strftime("%Y%m%d")}.parquet',
        n_cores=4
    )
    current_date = next_date
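
The day-by-day stepping in the loop above can be factored into a small generator, which makes the windows easy to inspect before any download starts. This is a sketch using only the standard library; `daily_windows` is a hypothetical helper, not part of wrangler. It emits plain `YYYY-MM-DD` strings, the date format used in the first `run` example.

```python
from datetime import date, timedelta


def daily_windows(start, end):
    """Yield (tstart, tend, tag) for each day from start to end inclusive.

    Hypothetical helper, not part of wrangler. tstart and tend are ISO
    date strings one day apart; tag is YYYYMMDD for building file names.
    """
    current = start
    while current <= end:
        following = current + timedelta(days=1)
        yield current.isoformat(), following.isoformat(), current.strftime("%Y%m%d")
        current = following


# One window per day of January 2024
windows = list(daily_windows(date(2024, 1, 1), date(2024, 1, 31)))
```

Each tuple can then be passed to `run` as `tstart` and `tend`, with the tag used to build the `ex_file` and `tbl_file` names.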

Tips and Best Practices

Memory Management
- Process data in smaller time chunks for large datasets
- Use the n_cores parameter appropriately for your system
- Clean up downloaded files by setting save_local_files=False (default)

Quality Control
- Always check the clear_fraction in the metadata
- Verify temperature bounds are appropriate for your region
- Inspect inpainted masks for data quality

Performance
- Use multiple cores for processing when available
- Consider downscaling for large datasets
- Use appropriate batch sizes for your memory constraints
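
One way to choose n_cores for the machine at hand is to derive it from the number of CPUs the system reports. The sketch below uses only the standard library; `pick_n_cores` is a hypothetical helper, not part of wrangler, and leaving one core free is an illustrative default rather than a project recommendation.

```python
import os


def pick_n_cores(reserve=1, limit=None):
    """Hypothetical helper: suggest a value for n_cores.

    Leaves `reserve` cores free for the rest of the system and never
    returns less than 1. `limit` optionally caps the result.
    """
    available = os.cpu_count() or 1  # cpu_count() may return None
    n = max(1, available - reserve)
    if limit is not None:
        n = max(1, min(n, limit))
    return n
```

The result can be passed straight through, for example `n_cores=pick_n_cores(limit=4)`.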

Next Steps

- Explore the API documentation for more detailed information
- Check out the example notebooks in the repository
- Join the community and contribute to the project