SmartCane Data Flow Architecture

This diagram shows the complete pipeline from satellite imagery download through final report delivery, highlighting where Python and R interact and how data transforms at each stage.

High-Level Data Flow

%% High-Level Data Flow
flowchart TD
    A["🛰️ External Data Sources<br/>Planet API • GeoJSON • Harvest Data"]
    
    B["🐍 Python Stage 00<br/>00_download_8band_pu_optimized.py"]
    C["💾 4-Band TIFF<br/>merged_tif/{DATE}.tif<br/>RGB+NIR uint16"]
    
    D["🔴 R Stage 10<br/>10_create_per_field_tiffs.R"]
    E["💾 Per-Field Tiles<br/>field_tiles/{FIELD}/{DATE}.tif"]
    
    F["🟢 R Stage 20<br/>20_ci_extraction_per_field.R"]
    G["💾 CI Data<br/>field_tiles_CI/{FIELD}/{DATE}.tif<br/>+ combined_CI_data.rds"]
    
    H["🟡 R Stage 30<br/>30_interpolate_growth_model.R"]
    I["💾 Interpolated Model<br/>All_pivots_Cumulative_CI_quadrant_year_v2.rds"]
    
    J["🟣 R Stage 40<br/>40_mosaic_creation_per_field.R"]
    K["💾 Weekly Mosaics<br/>weekly_mosaic/{FIELD}/week_WW_YYYY.tif"]
    
    L["🟠 R Stage 80<br/>80_calculate_kpis.R"]
    M["💾 KPI Outputs<br/>Excel + RDS Summary"]
    
    N["📄 R Stage 90/91<br/>RMarkdown Reporting"]
    O["✅ Final Outputs<br/>Word Reports • Excel Tables • GeoTIFFs"]
    
    A -->|Download| B
    B -->|Save| C
    C -->|Split| D
    D -->|Save| E
    E -->|Extract CI| F
    F -->|Save| G
    G -->|Interpolate| H
    H -->|Save| I
    I -->|Create Mosaic| J
    J -->|Save| K
    K -->|Calculate KPIs| L
    L -->|Save| M
    M -->|Render Report| N
    N -->|Generate| O

Stage-by-Stage Transformation

Entry Point: External Data Sources

| Source | Format | Key File | Purpose |
|---|---|---|---|
| Planet Labs API | 4-band GeoTIFF (RGB+NIR) | Satellite imagery | Raw canopy reflectance |
| Project GeoJSON | GeoJSON polygons | pivot.geojson | Field boundary masks |
| Harvest Records | Excel spreadsheet | harvest.xlsx | Season date markers (optional for agronomic_support, required for cane_supply) |

Storage Path: laravel_app/storage/app/{PROJECT}/Data/


Stage 00: Download (Python)

Script: python_app/00_download_8band_pu_optimized.py

Inputs:

  • Planet API credentials (SentinelHub)
  • Date range (YYYY-MM-DD format)
  • Project ID (determines bounding box)
  • Cloud masking threshold

Key Processing:

  • Authenticates via SentinelHub SDK
  • Downloads 4 bands (R, G, B, NIR) at 3m resolution
  • Applies UDM1 cloud masking
  • Merges all tiles for the day into single GeoTIFF

Output Format: 4-band uint16 GeoTIFF, ~150-300MB per date

laravel_app/storage/app/{PROJECT}/merged_tif/{YYYY-MM-DD}.tif

Execution Context:

  • SOBIT: Triggered via Laravel ProjectDownloadTiffJob queue
  • Dev Laptop: Manual PowerShell command
    cd python_app
    python 00_download_8band_pu_optimized.py angata --date 2026-02-19
    

Stage 10: Per-Field Tile Creation (R)

Script: r_app/10_create_per_field_tiffs.R

Inputs:

  • Merged 4-band TIFF: merged_tif/{DATE}.tif
  • Field boundaries: pivot.geojson

Key Processing:

  • Reads polygon geometries from GeoJSON
  • Clips merged TIFF to each field boundary
  • Preserves 4 bands (R, G, B, NIR) as uint16
  • Handles edge pixels and overlaps
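
The clipping step above can be pictured with a small pure-Python sketch. This is not the code in 10_create_per_field_tiffs.R, which presumably relies on an R raster library; the function names, the toy grid, and the choice of a ray-casting test on pixel centres are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Band = List[List[Optional[int]]]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_band(band: Band, polygon: Sequence[Point], nodata=None) -> Band:
    """Keep pixels whose centre falls inside the polygon; mask the rest."""
    clipped = []
    for row_idx, row in enumerate(band):
        out_row = []
        for col_idx, value in enumerate(row):
            cx, cy = col_idx + 0.5, row_idx + 0.5  # pixel centre in grid units
            out_row.append(value if point_in_polygon(cx, cy, polygon) else nodata)
        clipped.append(out_row)
    return clipped


# Toy 4x4 single-band raster; the hypothetical field covers the middle 2x2 pixels.
band = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]
field = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
clipped = clip_band(band, field)
```

In the real pipeline the same mask would apply to all four bands, and the polygon coordinates would be in the raster's coordinate reference system rather than grid units.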

Output Format: Per-field 4-band TIFFs

laravel_app/storage/app/{PROJECT}/field_tiles/{FIELD}/{DATE}.tif

Execution Context:

  • SOBIT: Via shell wrapper 10_planet_download.sh
  • Dev Laptop:
    & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/10_create_per_field_tiffs.R angata 2026-02-19 7
    

Stage 20: CI Extraction (R)

Script: r_app/20_ci_extraction_per_field.R

Inputs:

  • Per-field 4-band TIFFs: field_tiles/{FIELD}/{DATE}.tif
  • Field boundaries: pivot.geojson

Key Processing:

  • Calculates Canopy Index (CI) = (NIR / Green) - 1 for each pixel
  • Extracts field-level statistics (mean, sd, min, max, pixel count)
  • Handles clouds: CI is set to 0 or NA where the green band has no valid value
  • Creates 5-band output: R, G, B, NIR, CI (float32 for CI band)
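
A minimal sketch of the CI formula and the field-level statistics listed above, in plain Python. The function names are hypothetical, the pixel values are made up, and two choices are assumptions rather than confirmed behaviour of the R script: missing or zero green values map to NaN (the text allows 0 or NA), and the standard deviation is the sample standard deviation.

```python
import math
import statistics
from typing import Dict, List, Optional


def canopy_index(nir: Optional[float], green: Optional[float]) -> float:
    """CI = (NIR / Green) - 1; NaN when a value is missing or green is zero."""
    if nir is None or green is None or green == 0:
        return math.nan
    return nir / green - 1.0


def field_stats(ci_values: List[float]) -> Dict[str, float]:
    """Summarise the valid (non-NaN) CI pixels of one field."""
    valid = [v for v in ci_values if not math.isnan(v)]
    if not valid:
        return {"mean": math.nan, "sd": math.nan, "min": math.nan,
                "max": math.nan, "pixel_count": 0}
    return {
        "mean": statistics.fmean(valid),
        # Sample standard deviation (assumption); undefined for a single pixel.
        "sd": statistics.stdev(valid) if len(valid) > 1 else math.nan,
        "min": min(valid),
        "max": max(valid),
        "pixel_count": len(valid),
    }


# Toy (NIR, Green) pairs; the last pixel mimics a cloud-masked value.
pixels = [(3000, 1000), (2000, 1000), (4000, 1000), (1200, 0)]
ci = [canopy_index(nir, green) for nir, green in pixels]
stats = field_stats(ci)
```

Here the three valid pixels give CI values of 2.0, 1.0 and 3.0, so the masked pixel is excluded from both the statistics and the pixel count.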

Outputs:

field_tiles_CI/{FIELD}/{DATE}.tif         # 5-band daily per-field
Data/extracted_ci/daily_vals/{FIELD}/{DATE}.rds   # Field stats RDS
Data/extracted_ci/cumulative_vals/combined_CI_data.rds  # Wide RDS (fields × dates)

Data Format (combined_CI_data.rds):

  • Rows: Field names
  • Columns: Dates (YYYY-MM-DD)
  • Values: Mean CI per field on that date
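
To make the wide layout concrete, the sketch below pivots per-field daily means into a fields × dates table. It only illustrates the shape described above; the real object is an R data.frame stored as RDS, and the helper name and sample values here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str, float]  # (field name, YYYY-MM-DD date, mean CI)


def to_wide(records: List[Record]) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Pivot long records into rows = fields, columns = sorted dates."""
    dates = sorted({day for _, day, _ in records})  # ISO dates sort chronologically
    fields = sorted({field for field, _, _ in records})
    lookup = {(field, day): value for field, day, value in records}
    # Missing field/date combinations stay None (NA in the R data.frame).
    table = {field: [lookup.get((field, day)) for day in dates] for field in fields}
    return dates, table


records = [
    ("Field_001", "2026-02-12", 1.8),
    ("Field_001", "2026-02-19", 2.1),
    ("Field_002", "2026-02-19", 1.4),
]
dates, table = to_wide(records)
```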

Execution Context:

  • SOBIT: Via 20_ci_extraction.sh
  • Dev Laptop:
    & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/20_ci_extraction_per_field.R angata 2026-02-19 7
    

Stage 30: Growth Model Interpolation (R)

Script: r_app/30_interpolate_growth_model.R

Inputs:

  • Cumulative CI data: combined_CI_data.rds (from Stage 20)
  • Harvest dates: harvest.xlsx (groups data into seasons)

Key Processing:

  • Applies LOESS smoothing (span=0.3) to CI time series
  • Interpolates missing dates (cloud handling: a date is skipped when the entire field is cloudy)
  • Calculates daily CI changes and cumulative CI sums per season
  • Groups by harvest season (defined in harvest.xlsx)
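
The gap filling and cumulative sums can be sketched as follows. This is a simplification under stated assumptions: plain linear interpolation stands in for the LOESS smoothing, season grouping is left out, "daily change" is taken to be the difference from the previous day, and "cumulative CI" is taken to be the running sum of daily CI. None of these definitions is confirmed by the R script, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def interpolate_daily(observations: Dict[date, float]) -> List[Tuple[date, float]]:
    """Linearly fill every calendar day between the first and last observation."""
    known = sorted(observations.items())
    if not known:
        return []
    series = []
    for (d0, v0), (d1, v1) in zip(known, known[1:]):
        gap = (d1 - d0).days
        for step in range(gap):
            series.append((d0 + timedelta(days=step), v0 + (v1 - v0) * step / gap))
    series.append(known[-1])
    return series


def growth_table(series: List[Tuple[date, float]]) -> List[dict]:
    """Add daily change and running cumulative CI to an interpolated series."""
    rows = []
    cumulative = 0.0
    previous = None
    for day, ci in series:
        change = 0.0 if previous is None else ci - previous
        cumulative += ci
        rows.append({"date": day, "interpolated_ci": ci,
                     "daily_change": change, "cumulative_ci": cumulative})
        previous = ci
    return rows


# Toy observations with one cloudy (missing) day on 2026-02-13.
obs = {date(2026, 2, 12): 1.0, date(2026, 2, 14): 2.0, date(2026, 2, 15): 2.5}
rows = growth_table(interpolate_daily(obs))
```

The missing day is filled with 1.5, and the cumulative CI after four days is 7.0.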

Output Format: Interpolated growth model (long format RDS)

Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds

Data Structure:

  • Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
  • Used by: Stage 80 (trend analysis), harvest forecasting

Execution Context:

  • SOBIT: Via 30_growth_model.sh
  • Dev Laptop:
    & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/30_interpolate_growth_model.R angata
    

Stage 40: Weekly Mosaic Creation (R)

Script: r_app/40_mosaic_creation_per_field.R

Inputs:

  • Daily per-field CI TIFFs: field_tiles_CI/{FIELD}/{DATE1,2,3...}.tif (week's dates)
  • Week number and year

Key Processing:

  • Reads all daily TIFFs for a given ISO week (Monday to Sunday)
  • Applies MAX function per pixel across the week
    • The MAX function handles clouds by picking the highest (best) CI value visible during the week
  • Outputs 5-band composite: R, G, B, NIR, CI (float32)
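
The weekly compositing logic can be sketched in plain Python as below. It groups daily rasters by ISO week and takes the per-pixel maximum while ignoring NaN (cloud-masked) pixels. Only a single CI band is shown, the rasters are toy 2x2 grids, and the helper names are hypothetical; the R script works on 5-band GeoTIFFs.

```python
import math
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Raster = List[List[float]]


def iso_week_key(day: date) -> Tuple[int, int]:
    """Return (ISO year, ISO week); ISO weeks run Monday to Sunday."""
    iso = day.isocalendar()
    return iso[0], iso[1]


def max_composite(rasters: List[Raster]) -> Raster:
    """Per-pixel maximum across rasters of equal shape, skipping NaN values."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            valid = [ras[r][c] for ras in rasters if not math.isnan(ras[r][c])]
            out_row.append(max(valid) if valid else math.nan)
        out.append(out_row)
    return out


# Two toy daily CI rasters from the same ISO week, each with one cloudy pixel.
daily_ci: Dict[date, Raster] = {
    date(2026, 2, 12): [[1.2, math.nan], [0.8, 2.0]],
    date(2026, 2, 13): [[1.0, 1.5], [math.nan, 2.4]],
}

by_week = defaultdict(list)
for day, raster in sorted(daily_ci.items()):
    by_week[iso_week_key(day)].append(raster)

# File names follow the week_WW_YYYY.tif pattern used for the mosaics.
mosaics = {
    f"week_{week:02d}_{year}.tif": max_composite(rasters)
    for (year, week), rasters in by_week.items()
}
```

Because each pixel is cloud-free on at least one of the two days, the composite has no gaps.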

Output Format: Per-field weekly mosaics

weekly_mosaic/{FIELD}/week_WW_YYYY.tif

Execution Context:

  • SOBIT: Via 40_mosaic_creation.sh
  • Dev Laptop:
    & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/40_mosaic_creation_per_field.R 2026-02-19 7 angata
    

Stage 80: KPI Calculation (R)

Script: r_app/80_calculate_kpis.R

Inputs:

  • Current week mosaic: weekly_mosaic/{FIELD}/week_WW_2026.tif
  • Previous weeks' mosaics (for trend analysis)
  • Growth model data: All_pivots_Cumulative_CI_quadrant_year_v2.rds
  • Field boundaries: pivot.geojson
  • Harvest data: harvest.xlsx

Key Processing:

  • Client-type branching (determined from project name):
    • agronomic_support → Sources 80_utils_agronomic_support.R

      • Field uniformity KPI (CV + Moran's I)
      • Area change KPI
      • TCH forecast KPI
      • Growth decline KPI
      • Weed presence KPI
      • Gap filling KPI
    • cane_supply → Sources 80_utils_cane_supply.R

      • Per-field analysis (acreage, phase)
      • Phase assignment (age-based: germination, tillering, grand growth, maturation)
      • Harvest prediction (integrates Python 31 imminent_prob if available)
      • Status triggers
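
As an illustration of the field uniformity KPI, the sketch below computes the coefficient of variation (CV = standard deviation / mean) of a field's CI pixels. It covers only the CV part, not Moran's I, and it does not reproduce any thresholds from 80_utils_agronomic_support.R. The function name, the sample values, and the use of the sample standard deviation are assumptions.

```python
import math
import statistics
from typing import List, Optional


def coefficient_of_variation(ci_values: List[Optional[float]]) -> float:
    """CV of the valid CI pixels; NaN when it cannot be computed."""
    valid = [v for v in ci_values if v is not None and not math.isnan(v)]
    if len(valid) < 2:
        return math.nan
    mean = statistics.fmean(valid)
    if mean == 0:
        return math.nan
    return statistics.stdev(valid) / mean


# A lower CV means a more uniform canopy.
uniform_field = [2.0, 2.1, 1.9, 2.0]
patchy_field = [0.5, 3.0, 1.0, 3.5]
cv_uniform = coefficient_of_variation(uniform_field)
cv_patchy = coefficient_of_variation(patchy_field)
```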

Outputs:

reports/{PROJECT}_field_analysis_week{WW}_{YYYY}.xlsx  # Excel - 21 columns, per-field
reports/kpis/{PROJECT}_kpi_summary_tables_week{WW}.rds  # RDS - Summary for rendering

Execution Context:

  • SOBIT: Via 80_calculate_kpis.sh
  • Dev Laptop:
    & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/80_calculate_kpis.R 2026-02-19 angata 7
    

Stages 90/91: Report Rendering (R Markdown)

Scripts:

  • r_app/90_CI_report_with_kpis_agronomic_support.Rmd (agronomic_support client type)
  • r_app/91_CI_report_with_kpis_cane_supply.Rmd (cane_supply client type)
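
The choice between the two templates amounts to a lookup keyed by client type. The sketch below shows that mapping in Python for illustration only; how the pipeline itself derives the client type and invokes the render is not shown in this document, and the function name is hypothetical.

```python
REPORT_TEMPLATES = {
    "agronomic_support": "r_app/90_CI_report_with_kpis_agronomic_support.Rmd",
    "cane_supply": "r_app/91_CI_report_with_kpis_cane_supply.Rmd",
}


def report_template(client_type: str) -> str:
    """Return the R Markdown template path for a client type."""
    try:
        return REPORT_TEMPLATES[client_type]
    except KeyError:
        raise ValueError(f"Unknown client type: {client_type!r}") from None


template = report_template("cane_supply")
```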

Inputs:

  • Weekly mosaics: weekly_mosaic/{FIELD}/week_*.tif
  • KPI summary: kpi_summary_tables_week{WW}.rds
  • Field boundaries: pivot.geojson
  • CI time series: combined_CI_data.rds
  • Growth model predictions (Script 91 only)

Key Processing:

Script 90 (Agronomic Support):

  • Field uniformity trend plots (CV over 8 weeks)
  • Spatial autocorrelation maps (Moran's I)
  • Interactive field boundary map (tmap)
  • Farm-level KPI averages
  • Colorblind-friendly palette

Script 91 (Cane Supply):

  • Per-field status alerts (harvest readiness, stress)
  • Phase assignment table
  • Tonnage forecasts (CI curves × historical harvest)
  • Age-based harvest window predictions
  • Urgent/warning/opportunity alerts

Output Format: Microsoft Word (.docx) with embedded tables, images, charts

reports/SmartCane_Report_week{WW}_{YYYY}.docx

Execution Context:

  • SOBIT: Via 90_kpi_report.sh (calls rmarkdown::render)
  • Dev Laptop:
    & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" -e `
      "rmarkdown::render('r_app/90_CI_report_with_kpis_agronomic_support.Rmd', `
       params=list(data_dir='angata', report_date=as.Date('2026-02-19')), `
       output_file='SmartCane_Report_week07_2026.docx', `
       output_dir='laravel_app/storage/app/angata/reports')"
    

Exit Points: User-Facing Outputs

| Output Type | Format | Location | Audience |
|---|---|---|---|
| Reports | Word (.docx) | reports/SmartCane_Report_*.docx | Agronomist / Farm manager |
| Field Analysis | Excel (.xlsx) | reports/{PROJECT}_field_analysis_week*.xlsx | Data analyst / Operations |
| GeoTIFFs | 5-band raster | weekly_mosaic/{FIELD}/week_*.tif | GIS systems |
| Predictions | CSV | harvest_imminent_weekly.csv (Python 31 output) | Harvest scheduling |

File Storage Architecture

laravel_app/storage/app/{PROJECT}/
├── merged_tif/
│   ├── 2026-02-12.tif          ← Stage 00 output (Python download)
│   ├── 2026-02-13.tif
│   └── 2026-02-19.tif
│
├── field_tiles/               ← Stage 10 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif
│   │   └── 2026-02-19.tif
│   ├── Field_002/
│   │   └── ...
│   └── ...
│
├── field_tiles_CI/            ← Stage 20 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif (5-band with CI)
│   │   └── 2026-02-19.tif
│   └── ...
│
├── Data/
│   ├── pivot.geojson          ← Input: field boundaries
│   ├── harvest.xlsx           ← Input: harvest dates (Stage 30 requirement)
│   ├── extracted_ci/
│   │   ├── daily_vals/
│   │   │   └── Field_001/2026-02-19.rds  ← Stage 20 output
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds      ← Stage 20 output (wide format)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds  ← Stage 30 output
│   └── growth_model_interpolated/        ← Stage 30 output
│
├── weekly_mosaic/             ← Stage 40 output
│   ├── Field_001/
│   │   ├── week_07_2026.tif   (5-band, MAX-aggregated)
│   │   └── week_06_2026.tif
│   └── ...
│
└── reports/                   ← Stages 80/90/91 output
    ├── SmartCane_Report_week07_2026.docx
    ├── angata_field_analysis_week07_2026.xlsx
    └── kpis/
        └── angata_kpi_summary_tables_week07.rds
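
The layout above can be captured in a small path helper, sketched here with Python's pathlib. The function name, dictionary keys, and arguments are hypothetical. The week number is passed in explicitly rather than derived from the date, since the week convention is decided by the R scripts.

```python
from pathlib import PurePosixPath
from typing import Dict

STORAGE_ROOT = PurePosixPath("laravel_app/storage/app")


def stage_paths(project: str, field: str, day: str, week: int, year: int) -> Dict[str, PurePosixPath]:
    """Build the per-stage file paths for one project, field, date and week."""
    base = STORAGE_ROOT / project
    ci_dir = base / "Data" / "extracted_ci"
    return {
        "merged_tif": base / "merged_tif" / f"{day}.tif",                 # Stage 00
        "field_tile": base / "field_tiles" / field / f"{day}.tif",        # Stage 10
        "field_tile_ci": base / "field_tiles_CI" / field / f"{day}.tif",  # Stage 20
        "daily_stats": ci_dir / "daily_vals" / field / f"{day}.rds",      # Stage 20
        "combined_ci": ci_dir / "cumulative_vals" / "combined_CI_data.rds",
        "weekly_mosaic": base / "weekly_mosaic" / field / f"week_{week:02d}_{year}.tif",
        "report": base / "reports" / f"SmartCane_Report_week{week:02d}_{year}.docx",
    }


paths = stage_paths("angata", "Field_001", "2026-02-12", 7, 2026)
```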

Data Format Reference

RDS Files (R Serialized Objects)

combined_CI_data.rds (Stage 20 output)

  • Type: data.frame
  • Rows: Field names
  • Cols: ISO dates (YYYY-MM-DD)
  • Values: Mean Canopy Index per field-date

All_pivots_Cumulative_CI_quadrant_year_v2.rds (Stage 30 output)

  • Type: data.frame
  • Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
  • Used by: Scripts 80, 90/91, harvest prediction

kpi_summary_tables_week{WW}.rds (Stage 80 output)

  • Type: list of data.frames
  • Contains: Weekly KPI summaries for all fields
  • Used by: Scripts 90/91 rendering

GeoTIFF Bands

merged_tif/{DATE}.tif (Stage 00, 4-band)

  • Band 1: Red
  • Band 2: Green
  • Band 3: Blue
  • Band 4: NIR

field_tiles_CI/{FIELD}/{DATE}.tif (Stage 20, 5-band)

  • Bands 1-4: R, G, B, NIR (uint16)
  • Band 5: Canopy Index (float32)

weekly_mosaic/{FIELD}/week_WW_YYYY.tif (Stage 40, 5-band)

  • Bands 1-4: R, G, B, NIR (uint16, MAX of week)
  • Band 5: CI (float32, MAX of week)

Next Steps