Quick Start Guide
=================

This guide walks through the basic wrangler workflow for downloading and processing satellite data.

Basic Usage
-----------

Loading a Dataset
^^^^^^^^^^^^^^^^^

Start by importing the loader and loading the dataset you want:

.. code-block:: python

    from wrangler.datasets.loader import load_dataset

    # Load VIIRS NPP dataset
    viirs_npp = load_dataset('VIIRS_NPP')

Download and Process Data
^^^^^^^^^^^^^^^^^^^^^^^^^

The main workflow combines downloading and processing in a single call to ``run`` from the ``grab_and_go`` module:

.. code-block:: python

    from wrangler.grab_and_go import run

    # Define your extraction options file
    extract_file = 'extract_viirs_std.json'

    # Run the pipeline
    run(
        dataset='VIIRS_NPP',           # Dataset name
        tstart='2024-01-01',           # Start date
        tend='2024-01-02',             # End date
        eoption_file=extract_file,     # Extraction options
        ex_file='output.h5',           # Output HDF5 file
        tbl_file='metadata.parquet',   # Output metadata file
        n_cores=4                      # Number of processing cores
    )

Extraction Configuration
^^^^^^^^^^^^^^^^^^^^^^^^

Create an extraction options JSON file (e.g., ``extract_viirs_std.json``):

.. code-block:: json

    {
        "field_size": 192,
        "clear_threshold": 5,
        "nadir_offset": 0,
        "temp_bounds": [-3, 34],
        "nrepeat": 1,
        "sub_grid_step": 4,
        "grow_mask": false,
        "inpaint": true
    }

Working with Processed Data
---------------------------

Reading the Output
^^^^^^^^^^^^^^^^^^

After processing, read the fields from the HDF5 file and the metadata from the Parquet file:

.. code-block:: python

    import h5py
    import pandas as pd

    # Read the HDF5 file
    with h5py.File('output.h5', 'r') as f:
        # Load the fields and inpainted masks into memory
        fields = f['fields'][:]
        masks = f['inpainted_masks'][:]

    # Read the metadata
    metadata = pd.read_parquet('metadata.parquet')

Visualizing Fields
^^^^^^^^^^^^^^^^^^

Use the ``cutout`` module to visualize processed fields:

.. code-block:: python

    from wrangler.cutout import show_image

    # Display a single field (fields was read from output.h5 above)
    show_image(fields[0], cbar=True, clbl='Temperature (°C)')

Advanced Usage
--------------

Manual Download and Processing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you need more control over the pipeline, you can run the download and processing steps separately:

.. code-block:: python

    from wrangler.grab_and_go import grab, extract

    # viirs_npp is the dataset loaded in "Loading a Dataset";
    # extract_options holds your extraction settings and must be
    # defined before this call (see "Extraction Configuration")

    # First, download the files
    local_files = grab(viirs_npp, '2024-01-01', '2024-01-02')

    # Then process them
    fields, masks, metadata, times = extract(
        viirs_npp,
        local_files,
        extract_options,
        n_cores=4
    )

Field Preprocessing
^^^^^^^^^^^^^^^^^^^

For custom preprocessing of individual fields:

.. code-block:: python

    from wrangler.preproc.field import main as process_field

    # Process a single field; field and mask are the arrays for
    # one field and its mask, defined beforehand
    processed_field, meta = process_field(
        field,
        mask,
        inpaint=True,
        median=True,
        med_size=(3, 1),
        downscale=True,
        dscale_size=(2, 2)
    )

Common Patterns
---------------

1. Quality Control
^^^^^^^^^^^^^^^^^^

Filter data based on quality thresholds:

.. code-block:: python

    # Keep only fields that are more than 95% clear
    good_data = metadata[metadata['clear_fraction'] > 0.95]

2. Geographic Selection
^^^^^^^^^^^^^^^^^^^^^^^

Select data from specific regions:

.. code-block:: python

    # Filter by latitude/longitude
    region_data = metadata[
        (metadata['lat'].between(32, 40)) &
        (metadata['lon'].between(-128, -118))
    ]

3. Batch Processing
^^^^^^^^^^^^^^^^^^^

Process multiple time periods:

.. code-block:: python

    from datetime import timedelta

    import pandas as pd

    from wrangler.grab_and_go import run

    start_date = pd.to_datetime('2024-01-01')
    end_date = pd.to_datetime('2024-01-31')

    # Process one day at a time
    current_date = start_date
    while current_date <= end_date:
        next_date = current_date + timedelta(days=1)
        stamp = current_date.strftime("%Y%m%d")  # e.g. 20240101

        run(
            dataset='VIIRS_NPP',
            tstart=current_date.isoformat(),
            tend=next_date.isoformat(),
            eoption_file='extract_viirs_std.json',
            ex_file=f'output_{stamp}.h5',
            tbl_file=f'metadata_{stamp}.parquet',
            n_cores=4
        )

        current_date = next_date

Tips and Best Practices
-----------------------

1. Memory Management

   - Process data in smaller time chunks for large datasets
   - Set the ``n_cores`` parameter to match the cores and memory available on your system
   - Clean up downloaded files by setting ``save_local_files=False`` (the default)

2. Quality Control

   - Always check ``clear_fraction`` in the metadata
   - Verify that the temperature bounds are appropriate for your region
   - Inspect the inpainted masks to assess data quality

3. Performance

   - Use multiple cores for processing when available
   - Consider downscaling for large datasets
   - Choose batch sizes that fit within your memory constraints

Next Steps
----------

- Explore the API documentation for more detailed information
- Check out the example notebooks in the repository
- Join the community and contribute to the project
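
As a companion to the batch-processing pattern and the tip about processing data in smaller time chunks, the sketch below splits a date range into fixed-size windows using only the standard library. The helper name ``date_chunks``, the half-open ``[start, end)`` convention, and the output file naming are illustrative assumptions and not part of wrangler; each pair of strings could be passed as ``tstart`` and ``tend`` to ``run``.

```python
from datetime import date, timedelta


def date_chunks(start, end, days_per_chunk=1):
    """Yield (tstart, tend) date strings covering the half-open range [start, end).

    Hypothetical helper for illustration; it is not part of wrangler.
    """
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    step = timedelta(days=days_per_chunk)
    current = start
    while current < end:
        # Clip the final window so it never runs past `end`
        nxt = min(current + step, end)
        yield current.isoformat(), nxt.isoformat()
        current = nxt


# Split January 2024 into one-week windows
windows = list(date_chunks(date(2024, 1, 1), date(2024, 2, 1), days_per_chunk=7))

for tstart, tend in windows:
    # Assumed naming scheme, mirroring the batch-processing example
    stamp = tstart.replace("-", "")
    print(tstart, tend, f"output_{stamp}.h5")
```

Larger windows mean fewer pipeline runs but more data held per run, so the window size is a trade-off to tune against available memory.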