
SmartCane Processing Pipeline - Complete Script Overview

Pipeline Execution Order

Complete Pipeline Mermaid Diagram

```mermaid
%% Complete Pipeline
graph TD
    %% ===== INPUTS =====
    API["🔑 Planet API<br/>Credentials"]
    GeoJSON["🗺️ pivot.geojson<br/>(Field Boundaries)"]
    HarvestIn["📊 harvest.xlsx<br/>(from Stage 23)"]
    
    %% ===== STAGE 00: DOWNLOAD =====
    Stage00["<b>Stage 00: Python</b><br/>00_download_8band_pu_optimized.py"]
    Out00["📦 merged_tif/<br/>YYYY-MM-DD.tif<br/>(4-band or 8-band)<br/>(configurable)"]
    
    %% ===== STAGE 10: OPTIONAL TILING =====
    Stage10["<b>Stage 10: R</b><br/>10_create_per_field_tiffs.R<br/>(Per-field extraction)"]
    Out10["📦 daily_tiles_split/per_field/<br/>YYYY-MM-DD/*.tif<br/>(one per field)"]
    
    %% ===== STAGE 20: CI EXTRACTION =====
    Stage20["<b>Stage 20: R</b><br/>20_ci_extraction.R"]
    Out20a["📦 combined_CI_data.rds<br/>(wide: fields × dates)"]
    Out20b["📦 daily RDS files<br/>(per-date stats)"]
    
    %% ===== STAGE 21: RDS → CSV =====
    Stage21["<b>Stage 21: R</b><br/>21_convert_ci_rds_to_csv.R"]
    Out21["📦 ci_data_for_python.csv<br/>(long format + DOY)"]
    
    %% ===== STAGE 22: BASELINE HARVEST =====
    Stage22["<b>Stage 22: Python</b><br/>22_harvest_baseline_prediction.py<br/>(RUN ONCE)"]
    Out22["📦 harvest_production_export.xlsx<br/>(baseline predictions)"]
    
    %% ===== STAGE 23: HARVEST FORMAT =====
    Stage23["<b>Stage 23: Python</b><br/>23_convert_harvest_format.py"]
    Out23["📦 harvest.xlsx<br/>(standard format)<br/>→ Feeds back to Stage 80"]
    
    %% ===== STAGE 30: GROWTH MODEL =====
    Stage30["<b>Stage 30: R</b><br/>30_interpolate_growth_model.R"]
    Out30["📦 All_pivots_Cumulative_CI...<br/>_quadrant_year_v2.rds<br/>(interpolated daily)"]
    
    %% ===== STAGE 31: WEEKLY HARVEST =====
    Stage31["<b>Stage 31: Python</b><br/>31_harvest_imminent_weekly.py<br/>(Weekly)"]
    Out31["📦 harvest_imminent_weekly.csv<br/>(probabilities)"]
    
    %% ===== STAGE 40: MOSAIC =====
    Stage40["<b>Stage 40: R</b><br/>40_mosaic_creation.R"]
    Out40["📦 weekly_mosaic/<br/>week_WW_YYYY.tif<br/>(5-band composite)"]
    
    %% ===== STAGE 80: KPI =====
    Stage80["<b>Stage 80: R</b><br/>80_calculate_kpis.R"]
    Out80a["📦 field_analysis_week{WW}.xlsx"]
    Out80b["📦 kpi_summary_tables_week{WW}.rds"]
    
    %% ===== STAGE 90: REPORT =====
    Stage90["<b>Stage 90: R/RMarkdown</b><br/>90_CI_report_with_kpis_simple.Rmd"]
    Out90["📦 SmartCane_Report_week{WW}_{YYYY}.docx<br/>(FINAL OUTPUT)"]
    
    %% ===== CONNECTIONS: INPUTS TO STAGE 00 =====
    API --> Stage00
    GeoJSON --> Stage00
    
    %% ===== STAGE 00 → 10 OR 20 =====
    Stage00 --> Out00
    Out00 --> Stage10
    Out00 --> Stage20
    
    %% ===== STAGE 10 → 20 =====
    Stage10 --> Out10
    Out10 --> Stage20
    
    %% ===== STAGE 20 → 21, 30 (STAGE 40 READS THE DAILY TIFS FROM STAGE 00) =====
    GeoJSON --> Stage20
    Stage20 --> Out20a
    Stage20 --> Out20b
    Out20a --> Stage21
    Out20a --> Stage30
    Out00 --> Stage40
    
    %% ===== STAGE 21 → 22, 31 =====
    Stage21 --> Out21
    Out21 --> Stage22
    Out21 --> Stage31
    
    %% ===== STAGE 22 → 23 =====
    Stage22 --> Out22
    Out22 --> Stage23
    
    %% ===== STAGE 23 → 80 & FEEDBACK =====
    Stage23 --> Out23
    Out23 -.->|"Feeds back<br/>(Season context)"| Stage80
    
    %% ===== STAGE 30 → 80 =====
    Stage30 --> Out30
    Out30 --> Stage80
    
    %% ===== STAGE 31 (PARALLEL) =====
    Stage31 --> Out31
    
    %% ===== STAGE 40 → 80, 90 =====
    Stage40 --> Out40
    Out40 --> Stage80
    Out40 --> Stage90
    
    %% ===== STAGE 80 → 90 =====
    Stage80 --> Out80a
    Stage80 --> Out80b
    Out80a --> Stage90
    Out80b --> Stage90
    
    %% ===== STAGE 90 FINAL =====
    Stage90 --> Out90
    
    %% ===== ADDITIONAL INPUTS =====
    HarvestIn --> Stage30
    HarvestIn --> Stage80
    GeoJSON --> Stage30
    GeoJSON --> Stage40
    GeoJSON --> Stage80
    
    %% ===== STYLING =====
    classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef pyStage fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    classDef rStage fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef output fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    classDef finalOutput fill:#ffebee,stroke:#c62828,stroke-width:3px
    
    class API,GeoJSON,HarvestIn input
    class Stage00,Stage22,Stage23,Stage31 pyStage
    class Stage10,Stage20,Stage21,Stage30,Stage40,Stage80,Stage90 rStage
    class Out00,Out10,Out20a,Out20b,Out21,Out22,Out30,Out31,Out40,Out80a,Out80b output
    class Out23,Out90 finalOutput
```

Detailed Stage Descriptions

Stage 00: PYTHON - Download Satellite Data
          └─ 00_download_8band_pu_optimized.py
             INPUT: Planet API credentials, field boundaries (pivot.geojson), date range
             OUTPUT: laravel_app/storage/app/{project}/merged_tif/{YYYY-MM-DD}.tif (4-band or 8-band)
             RUN FREQUENCY: Daily or as-needed
             NOTES: Download script configures band count; consolidates to single merged_tif/ folder

Stage 10: R - Create Per-Field Daily Tiles
          └─ 10_create_per_field_tiffs.R
             INPUT: Daily GeoTIFFs from merged_tif/
                    Field boundaries (pivot.geojson)
             OUTPUT: laravel_app/storage/app/{project}/daily_tiles_split/per_field/{YYYY-MM-DD}/*.tif
             RUN FREQUENCY: Optional; run after Stage 00 when per-field extraction is needed to reduce memory use
             NOTES: Creates one GeoTIFF per field per day

Stage 20: R - Extract Canopy Index (CI) from Daily Imagery
          └─ 20_ci_extraction_per_field.R
             INPUT: Daily GeoTIFFs (merged_tif/ or daily_tiles_split/per_field/)
                    Field boundaries (pivot.geojson)
             OUTPUT: RDS files:
                     - laravel_app/storage/app/{project}/Data/extracted_ci/daily_vals/extracted_{YYYY-MM-DD}_{suffix}.rds
                     - laravel_app/storage/app/{project}/Data/extracted_ci/cumulative_vals/combined_CI_data.rds (wide format)
             RUN FREQUENCY: Daily or on-demand
             COMMAND: Rscript 20_ci_extraction_per_field.R [end_date] [offset] [project_dir] [data_source]
             EXAMPLE: Rscript 20_ci_extraction_per_field.R 2026-01-02 7 angata merged_tif
             NOTES: Auto-detects per-field tiles if daily_tiles_split/per_field/ exists; outputs cumulative CI (fields × dates)

Stage 21: R - Convert CI RDS to CSV for Python Harvest Detection
          └─ 21_convert_ci_rds_to_csv.R
             INPUT: combined_CI_data.rds (from Stage 20)
             OUTPUT: laravel_app/storage/app/{project}/Data/extracted_ci/ci_data_for_python/ci_data_for_python.csv
             RUN FREQUENCY: After Stage 20
             COMMAND: Rscript 21_convert_ci_rds_to_csv.R [project_dir]
             EXAMPLE: Rscript 21_convert_ci_rds_to_csv.R angata
             NOTES: Converts wide RDS (fields × dates) to long CSV; interpolates missing dates; adds DOY column
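The reshaping step in Stage 21 can be illustrated with a short Python sketch (the real stage is an R script). It assumes the wide data is a mapping of field to {date: CI}, that gaps are stored as None, and that DOY means calendar day of year; the actual script may define DOY differently (for example, days since season start). `wide_to_long` is a hypothetical helper, and the interpolation of missing dates is not shown here: gaps are simply dropped.

```python
from datetime import date


def wide_to_long(wide):
    """Reshape {field: {date: ci}} (wide: fields x dates) into long rows.

    Assumptions (hypothetical, not the Stage 21 code): None marks a missing
    observation, and DOY is the calendar day of year.
    """
    rows = []
    for field, series in wide.items():
        for day in sorted(series):
            value = series[day]
            if value is None:
                continue  # gap: left for a separate interpolation step
            rows.append({
                "field": field,
                "Date": day.isoformat(),
                "DOY": day.timetuple().tm_yday,  # 1..366
                "value": value,
            })
    return rows


wide = {"F01": {date(2026, 1, 1): 0.42, date(2026, 1, 2): None, date(2026, 1, 3): 0.47}}
rows = wide_to_long(wide)
```

Long format (one row per field and date) is what row-oriented Python tooling expects, which is why the CSV handed to Stages 22 and 31 uses it.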

Stage 22: PYTHON - Baseline Harvest Prediction (LSTM Model 307)
          └─ 22_harvest_baseline_prediction.py
             INPUT: ci_data_for_python.csv (complete historical CI data)
             OUTPUT: laravel_app/storage/app/{project}/Data/HarvestData/harvest_production_export.xlsx
             RUN FREQUENCY: ONCE - establishes the reference baseline of harvest dates for all fields
             COMMAND: python 22_harvest_baseline_prediction.py [project_name]
             EXAMPLE: python 22_harvest_baseline_prediction.py angata
             NOTES: Two-step detection (Phase 1: growing window, Phase 2: ±40 day argmax refinement)
                    Tuned parameters: threshold=0.3, consecutive_days=2
                    Uses LSTM Model 307 dual output heads (imminent + detected)

Stage 23: PYTHON - Convert Harvest Format to Standard Structure
          └─ 23_convert_harvest_format.py
             INPUT: harvest_production_export.xlsx (from Stage 22)
                    CI data date range (determines season_start for first season)
             OUTPUT: laravel_app/storage/app/{project}/Data/harvest.xlsx (standard format)
             RUN FREQUENCY: After Stage 22
             COMMAND: python 23_convert_harvest_format.py [project_name]
             EXAMPLE: python 23_convert_harvest_format.py angata
             NOTES: Converts to standard harvest.xlsx format with columns:
                    field, sub_field, year, season, season_start, season_end, age, sub_area, tonnage_ha
                    Season format: "Data{year} : {field}"
                    Only includes completed seasons (with season_end filled)
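The season-building logic can be sketched in Python as follows. The assumptions here are hypothetical and may not match `23_convert_harvest_format.py`: the first season starts at the first CI date, each later season starts on the previous harvest date, `year` is the harvest year, and columns the conversion cannot infer (`sub_field`, `age`, `sub_area`, `tonnage_ha`) are left empty. `to_standard_rows` is an illustrative name.

```python
from datetime import date

# Column order of the standard harvest.xlsx format described above.
COLUMNS = ["field", "sub_field", "year", "season", "season_start",
           "season_end", "age", "sub_area", "tonnage_ha"]


def to_standard_rows(field, harvest_dates, ci_start):
    """Build one row per completed season for a single field.

    Hypothetical sketch: seasons run from the previous harvest (or from the
    first CI date) up to each detected harvest date.
    """
    rows = []
    start = ci_start
    for end in sorted(harvest_dates):
        row = dict.fromkeys(COLUMNS)  # every column defaults to None
        row.update(
            field=field,
            year=end.year,
            season=f"Data{end.year} : {field}",
            season_start=start,
            season_end=end,
        )
        rows.append(row)
        start = end  # the next season begins at this harvest
    # The ongoing season after the last harvest has no season_end, so it is not emitted.
    return rows


rows = to_standard_rows("F01", [date(2025, 3, 10), date(2024, 2, 1)], date(2023, 6, 1))
```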

Stage 30: R - Growth Model Interpolation (Smooth CI Time Series)
          └─ 30_interpolate_growth_model.R
             INPUT: combined_CI_data.rds (from Stage 20)
                    harvest.xlsx (optional, for seasonal context)
             OUTPUT: laravel_app/storage/app/{project}/Data/extracted_ci/cumulative_vals/
                     All_pivots_Cumulative_CI_quadrant_year_v2.rds
             RUN FREQUENCY: Weekly or after CI extraction updates
             COMMAND: Rscript 30_interpolate_growth_model.R [project_dir]
             EXAMPLE: Rscript 30_interpolate_growth_model.R angata
             NOTES: Linear interpolation across gaps; calculates daily change and cumulative CI
                    Outputs long-format data (Date, DOY, field, value, season, etc.)

Stage 31: PYTHON - Weekly Harvest Monitoring (Real-Time Alerts)
          └─ 31_harvest_imminent_weekly.py
             INPUT: ci_data_for_python.csv (recent CI data, last ~300 days)
                    harvest_production_export.xlsx (optional baseline reference)
             OUTPUT: laravel_app/storage/app/{project}/Data/HarvestData/harvest_imminent_weekly.csv
             RUN FREQUENCY: Weekly or daily for operational alerts
             COMMAND: python 31_harvest_imminent_weekly.py [project_name]
             EXAMPLE: python 31_harvest_imminent_weekly.py angata
             NOTES: Single-run inference on recent data; outputs probabilities (imminent_prob, detected_prob)
                    Used for real-time decision support; compared against baseline from Stage 22

Stage 40: R - Create Weekly 5-Band Mosaics
          └─ 40_mosaic_creation_per_field.R
             INPUT: Daily GeoTIFFs (merged_tif/ or daily_tiles_split/per_field/)
                    Field boundaries (pivot.geojson)
             OUTPUT: laravel_app/storage/app/{project}/weekly_mosaic/week_{WW}_{YYYY}.tif
             RUN FREQUENCY: Weekly
             COMMAND: Rscript 40_mosaic_creation_per_field.R [end_date] [offset] [project_dir]
             EXAMPLE: Rscript 40_mosaic_creation_per_field.R 2026-01-14 7 angata
             NOTES: Composites daily images using MAX function; 5 bands (R, G, B, NIR, CI)
                    Automatically selects images with acceptable cloud coverage
                    Output uses ISO week numbering (week_WW_YYYY)
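A minimal Python sketch of MAX compositing and the ISO week file name follows. It assumes each daily image is a list of bands holding flattened pixel values of identical shape, with None for cloud-masked pixels, and that "MAX" means an independent per-band maximum; the R script may instead keep all bands from the date with the highest CI. It also assumes `YYYY` is the ISO year. Both helper names are hypothetical.

```python
from datetime import date


def max_composite(daily_images):
    """Per-pixel, per-band maximum across daily images.

    daily_images: list of images; each image is a list of bands, each band a
    list of pixel values (same shape everywhere). None = masked pixel.
    """
    n_bands = len(daily_images[0])
    n_pixels = len(daily_images[0][0])
    composite = []
    for b in range(n_bands):
        band = []
        for p in range(n_pixels):
            values = [img[b][p] for img in daily_images if img[b][p] is not None]
            band.append(max(values) if values else None)  # None if masked every day
        composite.append(band)
    return composite


def mosaic_name(end_date):
    """week_WW_YYYY.tif from ISO week numbering (assumes ISO year for YYYY)."""
    iso_year, iso_week, _ = end_date.isocalendar()
    return f"week_{iso_week:02d}_{iso_year}.tif"
```

Taking each band's maximum independently can combine bands from different dates in one pixel; selecting the whole pixel from the date with the highest CI avoids that, at the cost of a second pass.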

Stage 80: R - Calculate KPIs & Per-Field Analysis
          └─ 80_calculate_kpis.R
             INPUT: Weekly mosaic (from Stage 40)
                    Growth model data (from Stage 30)
                    Field boundaries (pivot.geojson)
                    Harvest data (harvest.xlsx)
             OUTPUT: laravel_app/storage/app/{project}/reports/
                     - {project}_field_analysis_week{WW}.xlsx
                     - {project}_kpi_summary_tables_week{WW}.rds
             RUN FREQUENCY: Weekly
             COMMAND: Rscript 80_calculate_kpis.R [end_date] [project_dir] [offset_days]
             EXAMPLE: Rscript 80_calculate_kpis.R 2026-01-14 angata 7
             NOTES: Parallel processing for 1000+ fields; calculates:
                    - Per-field uniformity (CV), phase assignment, growth trends
                    - Status triggers (germination, rapid growth, disease, harvest imminence)
                    - Farm-level KPI metrics (6 high-level indicators)
                    TEST_MODE=TRUE uses only recent weeks for development

Stage 90: R (RMarkdown) - Generate Executive Report (Word Document)
          └─ 90_CI_report_with_kpis_simple.Rmd
             INPUT: Weekly mosaic (from Stage 40)
                    KPI summary data (from Stage 80)
                    Field analysis (from Stage 80)
                    Field boundaries & harvest data (for context)
             OUTPUT: laravel_app/storage/app/{project}/reports/
                     SmartCane_Report_week{WW}_{YYYY}.docx (PRIMARY OUTPUT)
                     SmartCane_Report_week{WW}_{YYYY}.html (optional)
             RUN FREQUENCY: Weekly
             RENDERING: R/RMarkdown with officer + flextable packages
             NOTES: Executive summary with KPI overview, phase distribution, status triggers
                    Field-by-field detail pages with CI metrics and interpretation guides
                    Automatic unit conversion (hectares ↔ acres)
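The area conversion uses a fixed factor: one international acre is exactly 4046.8564224 m², so one hectare is about 2.4711 acres. A minimal Python sketch (the report itself does this in R; the function names are illustrative):

```python
# 1 international acre = 4046.8564224 m^2 exactly; 1 ha = 10 000 m^2.
ACRES_PER_HECTARE = 10_000 / 4_046.856_422_4  # about 2.4710538


def ha_to_acres(hectares):
    return hectares * ACRES_PER_HECTARE


def acres_to_ha(acres):
    return acres / ACRES_PER_HECTARE
```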

Data Storage & Persistence

All data persists to the file system. No database writes occur during analysis; the database is read only for metadata.

```
laravel_app/storage/app/{project}/
├── Data/
│   ├── pivot.geojson                    # Field boundaries (read-only input)
│   ├── harvest.xlsx                     # Season dates & yield (standard format from Stage 23)
│   ├── vrt/                             # Virtual raster files (daily VRTs from Stage 20)
│   │   └── YYYY-MM-DD.vrt
│   ├── extracted_ci/
│   │   ├── ci_data_for_python/
│   │   │   └── ci_data_for_python.csv            # CSV for Python (from Stage 21)
│   │   ├── daily_vals/
│   │   │   └── extracted_YYYY-MM-DD_{suffix}.rds    # Daily field CI stats (from Stage 20)
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds                  # Cumulative CI, wide format (from Stage 20)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds  # Interpolated daily (from Stage 30)
│   └── HarvestData/
│       ├── harvest_production_export.xlsx      # Baseline harvest predictions (from Stage 22)
│       └── harvest_imminent_weekly.csv         # Weekly monitoring output (from Stage 31)
│
├── merged_tif/                          # Raw satellite imagery (Stage 00 output)
│   └── YYYY-MM-DD.tif                   # 4-band or 8-band (configurable via download script)
│
├── daily_tiles_split/                   # (Optional) Per-field tile processing (Stage 10 output)
│   └── per_field/
│       └── YYYY-MM-DD/                  # Date-specific folder
│           └── {FIELD}_YYYY-MM-DD.tif   # One per-field GeoTIFF per day
│
├── weekly_mosaic/                       # Weekly composite mosaics (Stage 40 output)
│   └── week_WW_YYYY.tif                 # 5 bands: R, G, B, NIR, CI (composite)
│
└── reports/                             # Analysis outputs & reports (Stage 80, 90 outputs)
    ├── SmartCane_Report_week{WW}_{YYYY}.docx    # FINAL REPORT (Stage 90)
    ├── SmartCane_Report_week{WW}_{YYYY}.html    # Alternative format
    ├── {project}_field_analysis_week{WW}.xlsx   # Field-by-field data (Stage 80)
    ├── {project}_kpi_summary_tables_week{WW}.rds    # Summary RDS (Stage 80)
    └── kpis/
        └── week_WW_YYYY/                        # Week-specific KPI folder
```
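Scripts in different languages have to agree on this layout. The Python sketch below builds the main paths from the tree above; `pipeline_paths` is a hypothetical helper (not part of the repository), and it assumes paths are resolved relative to the repository root and that week numbers are zero-padded, as in `SmartCane_Report_week02_2026.docx`.

```python
from pathlib import PurePosixPath

# Assumption: paths are relative to the repository root.
STORAGE_ROOT = PurePosixPath("laravel_app/storage/app")


def pipeline_paths(project, day, week, year):
    """Return key pipeline paths for one project (hypothetical helper).

    day: "YYYY-MM-DD" string; week: ISO week number; year: four-digit year.
    """
    base = STORAGE_ROOT / project
    ci = base / "Data" / "extracted_ci"
    return {
        "merged_tif": base / "merged_tif" / f"{day}.tif",
        "per_field_dir": base / "daily_tiles_split" / "per_field" / day,
        "combined_ci": ci / "cumulative_vals" / "combined_CI_data.rds",
        "ci_csv": ci / "ci_data_for_python" / "ci_data_for_python.csv",
        "harvest": base / "Data" / "harvest.xlsx",
        "weekly_mosaic": base / "weekly_mosaic" / f"week_{week:02d}_{year}.tif",
        "report": base / "reports" / f"SmartCane_Report_week{week:02d}_{year}.docx",
    }


paths = pipeline_paths("angata", "2026-01-14", 2, 2026)
```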

Key File Formats

| Format | Stage(s) | Purpose | Example |
|---|---|---|---|
| .tif (GeoTIFF) | 00, 10, 40 | Geospatial raster imagery | 2026-01-14.tif (4-band), week_02_2026.tif (5-band) |
| .vrt (Virtual Raster) | 20 | Virtual pointer to TIFFs | 2026-01-14.vrt |
| .rds (R binary) | 20, 21, 30, 80 | Serialized R data objects | combined_CI_data.rds, All_pivots_Cumulative_CI_quadrant_year_v2.rds |
| .csv (comma-separated) | 21, 31 | Tabular data for Python | ci_data_for_python.csv, harvest_imminent_weekly.csv |
| .xlsx (Excel) | 22, 23, 80 | Tabular reports and harvest data | harvest.xlsx, harvest_production_export.xlsx, field analysis |
| .docx (Word) | 90 | Executive report (final output) | SmartCane_Report_week02_2026.docx |
| .json | 10 | Tiling metadata | tiling_config.json |
| .geojson | Input | Field boundaries (read-only) | pivot.geojson |

Script Dependencies & Utility Files

parameters_project.R
    ├─ Loaded by: 20_ci_extraction.R, 30_interpolate_growth_model.R, 
    │             40_mosaic_creation.R, 80_calculate_kpis.R, 90_CI_report_with_kpis_simple.Rmd
    └─ Purpose: Initializes project config (paths, field boundaries, harvest data)

harvest_date_pred_utils.py
    ├─ Used by: 22_harvest_baseline_prediction.py, 23_convert_harvest_format.py, 31_harvest_imminent_weekly.py
    └─ Purpose: LSTM model loading, feature extraction, two-step harvest detection

20_ci_extraction_utils.R
    ├─ Used by: 20_ci_extraction.R
    └─ Purpose: CI calculation, field masking, RDS I/O, tile detection

30_growth_model_utils.R
    ├─ Used by: 30_interpolate_growth_model.R
    └─ Purpose: Linear interpolation, daily metrics, seasonal grouping

40_mosaic_creation_utils.R, 40_mosaic_creation_tile_utils.R
    ├─ Used by: 40_mosaic_creation.R
    └─ Purpose: Weekly composite creation, cloud assessment, raster masking

kpi_utils.R
    ├─ Used by: 80_calculate_kpis.R
    └─ Purpose: Per-field statistics, phase assignment, trigger detection

report_utils.R
    ├─ Used by: 90_CI_report_with_kpis_simple.Rmd
    └─ Purpose: Report building, table formatting, Word document generation

Command-Line Execution Examples

Daily/Weekly Workflow

```bash
# Stage 00: Download today's satellite data
cd python_app
python 00_download_8band_pu_optimized.py angata --cleanup

# Stage 20: Extract CI from daily imagery (last 7 days)
cd ../r_app
Rscript 20_ci_extraction_per_field.R 2026-01-14 7 angata merged_tif

# Stage 21: Convert CI to CSV for harvest detection
Rscript 21_convert_ci_rds_to_csv.R angata

# Stage 30: Refresh the interpolated growth model used by Stage 80
Rscript 30_interpolate_growth_model.R angata

# Stage 31: Weekly harvest monitoring (real-time alerts)
cd ../python_app
python 31_harvest_imminent_weekly.py angata

# Stages 40 and 80: Weekly mosaic and KPIs (back in R)
cd ../r_app
Rscript 40_mosaic_creation.R 2026-01-14 7 angata
Rscript 80_calculate_kpis.R 2026-01-14 angata 7

# Stage 90: Generate report
Rscript -e "rmarkdown::render('90_CI_report_with_kpis_simple.Rmd')"
```

One-Time Setup (Baseline Harvest Detection)

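```bash
# Run only ONCE to establish the baseline
# (requires ci_data_for_python.csv from Stages 20 and 21)
cd python_app
python 22_harvest_baseline_prediction.py angata

# Convert to standard format (writes Data/harvest.xlsx)
python 23_convert_harvest_format.py angata
```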

Processing Notes

CI Extraction (Stage 20)

  • Calculates CI = (NIR - Green) / (NIR + Green)
  • Supports both 4-band and 8-band imagery with auto-detection
  • Handles cloud masking via UDM band (8-band) or manual thresholding (4-band)
  • Outputs cumulative RDS in wide format (fields × dates) for fast lookups
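The CI formula above can be applied per pixel as in this Python sketch (Stage 20 itself works on rasters in R). It assumes the NIR and Green bands arrive as equal-length flattened sequences of reflectance values and that a cloud mask is already available as booleans; deriving that mask from the UDM band or a threshold is not shown. Both function names are hypothetical.

```python
def canopy_index(nir, green, clear=None):
    """Per-pixel CI = (NIR - Green) / (NIR + Green).

    nir, green: equal-length sequences of reflectance values.
    clear: optional booleans; False marks a cloud-masked pixel (assumed input).
    Returns None for masked pixels or where NIR + Green == 0.
    """
    if clear is None:
        clear = [True] * len(nir)
    ci = []
    for n, g, ok in zip(nir, green, clear):
        if not ok or n + g == 0:
            ci.append(None)
        else:
            ci.append((n - g) / (n + g))
    return ci


def field_mean_ci(ci_values):
    """Mean CI over the unmasked pixels of one field (None if all are masked)."""
    valid = [v for v in ci_values if v is not None]
    return sum(valid) / len(valid) if valid else None


ci = canopy_index([0.5, 0.3, 0.0], [0.1, 0.3, 0.0])
```

Because the index is a normalized difference, it stays within [-1, 1] for non-negative reflectances, which keeps values comparable across dates with different illumination.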

Growth Model (Stage 30)

  • Linear interpolation across missing dates
  • Maintains seasonal context for agricultural lifecycle tracking
  • Outputs long-format data for trend analysis
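The interpolation step can be sketched in Python for a single field and season. The sketch assumes observations arrive as {date: CI} with gaps, and that "cumulative CI" is the running sum of daily CI values; the actual definition in `30_growth_model_utils.R` may differ. `interpolate_daily` is a hypothetical name.

```python
from datetime import date, timedelta


def interpolate_daily(observations):
    """Linearly fill missing days; return (date, ci, daily_change, cumulative_ci) rows.

    Assumption: cumulative CI is the running sum of the daily CI values.
    """
    if not observations:
        return []
    days = sorted(observations)
    filled = {}
    for left, right in zip(days, days[1:]):
        span = (right - left).days
        for k in range(span):
            t = k / span  # fraction of the way from left to right
            filled[left + timedelta(days=k)] = (1 - t) * observations[left] + t * observations[right]
    filled[days[-1]] = observations[days[-1]]

    rows, cumulative, previous = [], 0.0, None
    for day in sorted(filled):
        ci = filled[day]
        change = 0.0 if previous is None else ci - previous
        cumulative += ci
        rows.append((day, ci, change, cumulative))
        previous = ci
    return rows


rows = interpolate_daily({date(2026, 1, 1): 1.0, date(2026, 1, 4): 2.5})
```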

Harvest Detection (Stages 22 & 31)

  • Model 307: Unidirectional LSTM with dual output heads
    • Imminent Head: Probability that the field will be harvestable within the next 28 days
    • Detected Head: Probability of immediate harvest event
  • Stage 22 (Baseline): Two-step detection on complete historical data
    • Phase 1: Growing window expansion (real-time simulation)
    • Phase 2: ±40 day refinement (argmax harvest signal)
  • Stage 31 (Weekly): Single-run inference on recent data (~300 days)
    • Compares against baseline for anomaly detection
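The two-step detection can be sketched in Python, using the tuned parameters from Stage 22 (threshold=0.3, consecutive_days=2, ±40 day refinement). Model 307 is not reproduced: the sketch assumes a precomputed list of daily detected-head probabilities, one per day index, and uses the same series for both phases. Restarting the scan after each refinement window is also an assumption. `detect_harvests` is a hypothetical name.

```python
def detect_harvests(probs, threshold=0.3, consecutive_days=2, refine_radius=40):
    """Return day indices of detected harvests.

    Phase 1: scan forward (as in real time) until the probability stays at or
    above `threshold` for `consecutive_days` days in a row.
    Phase 2: refine to the argmax of the probability within +/- refine_radius days.
    """
    harvests = []
    run = 0
    i = 0
    while i < len(probs):
        run = run + 1 if probs[i] >= threshold else 0
        if run >= consecutive_days:
            lo = max(0, i - refine_radius)
            hi = min(len(probs), i + refine_radius + 1)
            peak = max(range(lo, hi), key=lambda j: probs[j])  # first max wins ties
            if not harvests or peak > harvests[-1]:
                harvests.append(peak)
            i, run = hi, 0  # assumption: resume scanning after the refinement window
            continue
        i += 1
    return harvests


probs = [0.0] * 10 + [0.35, 0.6, 0.9, 0.5] + [0.0] * 50 + [0.4, 0.45, 0.2] + [0.0] * 10
```

Requiring consecutive days above the threshold suppresses one-day spikes, while the argmax refinement moves the trigger date (which lags the event) to the day the signal peaks.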

KPI Calculation (Stage 80)

  • Per-field metrics: Uniformity (CV), phase, growth trends, 4-week trends
  • Status triggers: Germination, rapid growth, slow growth, non-uniform, weed pressure, harvest imminence
  • Farm-level KPIs: 6 high-level indicators for executive summary
  • Parallel processing: 1000+ fields processed in under 5 minutes
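Field uniformity is expressed as a coefficient of variation (CV): the standard deviation of a field's pixel CI values divided by their mean, so lower values mean a more uniform canopy. A minimal Python sketch, assuming the population standard deviation and that masked pixels are None; whether `kpi_utils.R` uses the sample SD instead is not stated here. `uniformity_cv` is a hypothetical name.

```python
from statistics import mean, pstdev


def uniformity_cv(pixel_ci):
    """CV = standard deviation / mean of unmasked pixel CI values.

    Assumes population SD; returns None with fewer than two valid pixels
    or a zero mean.
    """
    values = [v for v in pixel_ci if v is not None]
    if len(values) < 2:
        return None
    m = mean(values)
    if m == 0:
        return None
    return pstdev(values) / m


cv = uniformity_cv([2.0, 4.0, None, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Dividing by the mean makes the spread comparable between young fields with low CI and mature fields with high CI.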

Future Enhancements

  • Real-Time Monitoring: Daily harvest probability updates integrated into web dashboard
  • SAR Integration: Radar satellite data (Sentinel-1) for all-weather monitoring
  • IoT Sensors: Ground-based soil moisture and weather integration
  • Advanced Yield Models: Enhanced harvest forecasting with satellite + ground truth
  • Automated Alerts: WhatsApp/email dispatch of critical agricultural advice