# SmartCane Data Flow Architecture

This diagram shows the complete pipeline from satellite imagery download through final report delivery, highlighting where Python and R interact and how data transforms at each stage.

## High-Level Data Flow

```mermaid
%% High-Level Data Flow
flowchart TD
    A["🛰️ External Data Sources<br/>Planet API • GeoJSON • Harvest Data"]
    B["🐍 Python Stage 00<br/>00_download_8band_pu_optimized.py"]
    C["💾 4-Band TIFF<br/>merged_tif/{DATE}.tif<br/>RGB+NIR uint16"]
    D["🔴 R Stage 10<br/>10_create_per_field_tiffs.R"]
    E["💾 Per-Field Tiles<br/>field_tiles/{FIELD}/{DATE}.tif"]
    F["🟢 R Stage 20<br/>20_ci_extraction_per_field.R"]
    G["💾 CI Data<br/>field_tiles_CI/{FIELD}/{DATE}.tif<br/>+ combined_CI_data.rds"]
    H["🟡 R Stage 30<br/>30_interpolate_growth_model.R"]
    I["💾 Interpolated Model<br/>All_pivots_Cumulative_CI_quadrant_year_v2.rds"]
    J["🟣 R Stage 40<br/>40_mosaic_creation_per_field.R"]
    K["💾 Weekly Mosaics<br/>weekly_mosaic/{FIELD}/week_WW_YYYY.tif"]
    L["🟠 R Stage 80<br/>80_calculate_kpis.R"]
    M["💾 KPI Outputs<br/>Excel + RDS Summary"]
    N["📄 R Stage 90/91<br/>RMarkdown Reporting"]
    O["✅ Final Outputs<br/>Word Reports • Excel Tables • GeoTIFFs"]

    A -->|Download| B
    B -->|Save| C
    C -->|Split| D
    D -->|Save| E
    E -->|Extract CI| F
    F -->|Save| G
    G -->|Interpolate| H
    H -->|Save| I
    I -->|Create Mosaic| J
    J -->|Save| K
    K -->|Calculate KPIs| L
    L -->|Save| M
    M -->|Render Report| N
    N -->|Generate| O
```

## Stage-by-Stage Transformation

### Entry Point: External Data Sources

| Source | Format | Key File | Purpose |
|--------|--------|----------|---------|
| **Planet Labs API** | 4-band GeoTIFF (RGB+NIR) | Satellite imagery | Raw canopy reflectance |
| **Project GeoJSON** | GeoJSON polygons | `pivot.geojson` | Field boundary masks |
| **Harvest Records** | Excel spreadsheet | `harvest.xlsx` | Season date markers (optional for agronomic_support, required for cane_supply) |

**Storage Path**: `laravel_app/storage/app/{PROJECT}/Data/`

---

### Stage 00: Download (Python)

**Script**: `python_app/00_download_8band_pu_optimized.py`

**Inputs**:

- Planet API credentials (SentinelHub)
- Date range (YYYY-MM-DD format)
- Project ID (determines bounding box)
- Cloud masking threshold

**Key Processing**:

- Authenticates via SentinelHub SDK
- Downloads 4 bands (R, G, B, NIR) at 3m resolution
- Applies UDM1 cloud masking
- Merges all tiles for the day into single GeoTIFF

**Output Format**: 4-band uint16 GeoTIFF, ~150-300MB per date

```
laravel_app/storage/app/{PROJECT}/merged_tif/{YYYY-MM-DD}.tif
```

**Execution Context**:

- **SOBIT**: Triggered via Laravel `ProjectDownloadTiffJob` queue
- **Dev Laptop**: Manual PowerShell command
  ```powershell
  cd python_app
  python 00_download_8band_pu_optimized.py angata --date 2026-02-19
  ```

---

### Stage 10: Per-Field Tile Creation (R)

**Script**: `r_app/10_create_per_field_tiffs.R`

**Inputs**:

- Merged 4-band TIFF: `merged_tif/{DATE}.tif`
- Field boundaries: `pivot.geojson`

**Key Processing**:

- Reads polygon geometries from GeoJSON
- Clips merged TIFF to each field boundary
- Preserves 4 bands (R, G, B, NIR) as uint16
- Handles edge pixels and overlaps

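
The clipping step in Stage 10 can be illustrated with a dependency-free toy. The real script runs in R on georeferenced rasters; the sketch below works in plain pixel coordinates on a single band and only shows the idea of keeping pixels whose centre falls inside a field polygon. The names `point_in_polygon` and `clip_band`, the 4×4 band, and the square field ring are all hypothetical and chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Even-odd ray-casting test against a single polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_band(
    band: List[List[int]],
    ring: Sequence[Point],
    nodata: Optional[int] = None,
) -> List[List[Optional[int]]]:
    """Keep pixels whose centre lies inside the ring; set the rest to nodata."""
    clipped = []
    for row_idx, row in enumerate(band):
        out_row = []
        for col_idx, value in enumerate(row):
            cx, cy = col_idx + 0.5, row_idx + 0.5  # pixel centre
            out_row.append(value if point_in_polygon(cx, cy, ring) else nodata)
        clipped.append(out_row)
    return clipped


# Hypothetical 4x4 single-band raster and a square "field" covering its middle.
band = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]
field_ring = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
clipped = clip_band(band, field_ring)
```

Only the four central pixels keep their values here; everything outside the ring becomes `None`, which stands in for the raster's nodata value.
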
**Output Format**: Per-field 4-band TIFFs

```
laravel_app/storage/app/{PROJECT}/field_tiles/{FIELD}/{DATE}.tif
```

**Execution Context**:

- **SOBIT**: Via shell wrapper `10_planet_download.sh`
- **Dev Laptop**:
  ```powershell
  & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/10_create_per_field_tiffs.R angata 2026-02-19 7
  ```

---

### Stage 20: CI Extraction (R)

**Script**: `r_app/20_ci_extraction_per_field.R`

**Inputs**:

- Per-field 4-band TIFFs: `field_tiles/{FIELD}/{DATE}.tif`
- Field boundaries: `pivot.geojson`

**Key Processing**:

- Calculates Canopy Index (CI) = (NIR / Green) - 1 for each pixel
- Extracts field-level statistics (mean, sd, min, max, pixel count)
- Handles clouds: CI=0 or NA when green band is absent
- Creates 5-band output: R, G, B, NIR, CI (float32 for CI band)

**Outputs**:

```
field_tiles_CI/{FIELD}/{DATE}.tif                        # 5-band daily per-field
Data/extracted_ci/daily_vals/{FIELD}/{DATE}.rds          # Field stats RDS
Data/extracted_ci/cumulative_vals/combined_CI_data.rds   # Wide RDS (fields × dates)
```

**Data Format** (combined_CI_data.rds):

- Rows: Field names
- Columns: Dates (YYYY-MM-DD)
- Values: Mean CI per field on that date

**Execution Context**:

- **SOBIT**: Via `20_ci_extraction.sh`
- **Dev Laptop**:
  ```powershell
  & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/20_ci_extraction_per_field.R angata 2026-02-19 7
  ```

---

### Stage 30: Growth Model Interpolation (R)

**Script**: `r_app/30_interpolate_growth_model.R`

**Inputs**:

- Cumulative CI data: `combined_CI_data.rds` (from Stage 20)
- Harvest dates: `harvest.xlsx` (groups data into seasons)

**Key Processing**:

- Applies LOESS smoothing (span=0.3) to CI time series
- Interpolates missing dates (handles clouds: if entire field cloudy, skips date)
- Calculates daily CI changes and cumulative CI sums per season
- Groups by harvest season (defined in harvest.xlsx)

**Output Format**: Interpolated growth model (long format RDS)

```
Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds
```

**Data Structure**:

- Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
- Used by: Stage 80 (trend analysis), harvest forecasting

**Execution Context**:

- **SOBIT**: Via `30_growth_model.sh`
- **Dev Laptop**:
  ```powershell
  & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/30_interpolate_growth_model.R angata
  ```

---

### Stage 40: Weekly Mosaic Creation (R)

**Script**: `r_app/40_mosaic_creation_per_field.R`

**Inputs**:

- Daily per-field CI TIFFs: `field_tiles_CI/{FIELD}/{DATE1,2,3...}.tif` (week's dates)
- Week number and year

**Key Processing**:

- Reads all daily TIFFs for a given ISO week (Monday–Sunday)
- Applies MAX function per pixel across the week
- Max function handles clouds: picks highest (best) CI value visible during week
- Outputs 5-band composite: R, G, B, NIR, CI (float32)

**Output Format**: Per-field weekly mosaics

```
weekly_mosaic/{FIELD}/week_WW_YYYY.tif
```

**Execution Context**:

- **SOBIT**: Via `40_mosaic_creation.sh`
- **Dev Laptop**:
  ```powershell
  & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/40_mosaic_creation_per_field.R 2026-02-19 7 angata
  ```

---

### Stage 80: KPI Calculation (R)

**Script**: `r_app/80_calculate_kpis.R`

**Inputs**:

- Current week mosaic: `weekly_mosaic/{FIELD}/week_WW_2026.tif`
- Previous weeks' mosaics (for trend analysis)
- Growth model data: `All_pivots_Cumulative_CI_quadrant_year_v2.rds`
- Field boundaries: `pivot.geojson`
- Harvest data: `harvest.xlsx`

**Key Processing**:

- **Client-type branching** (determined from project name):
  - **agronomic_support** → Sources `80_utils_agronomic_support.R`
    - Field uniformity KPI (CV + Moran's I)
    - Area change KPI
    - TCH forecast KPI
    - Growth decline KPI
    - Weed presence KPI
    - Gap filling KPI
  - **cane_supply** → Sources `80_utils_cane_supply.R`
    - Per-field analysis (acreage, phase)
    - Phase assignment (age-based: germination, tillering, grand growth, maturation)
    - Harvest prediction (integrates Python 31 imminent_prob if available)
    - Status triggers

**Outputs**:

```
reports/{PROJECT}_field_analysis_week{WW}_{YYYY}.xlsx   # Excel - 21 columns, per-field
reports/kpis/{PROJECT}_kpi_summary_tables_week{WW}.rds  # RDS - Summary for rendering
```

**Execution Context**:

- **SOBIT**: Via `80_calculate_kpis.sh`
- **Dev Laptop**:
  ```powershell
  & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/80_calculate_kpis.R 2026-02-19 angata 7
  ```

---

### Stages 90/91: Report Rendering (R Markdown)

**Scripts**:

- `r_app/90_CI_report_with_kpis_agronomic_support.Rmd` (agronomic_support client type)
- `r_app/91_CI_report_with_kpis_cane_supply.Rmd` (cane_supply client type)

**Inputs**:

- Weekly mosaics: `weekly_mosaic/{FIELD}/week_*.tif`
- KPI summary: `kpi_summary_tables_week{WW}.rds`
- Field boundaries: `pivot.geojson`
- CI time series: `combined_CI_data.rds`
- Growth model predictions (Script 91 only)

**Key Processing**:

**Script 90 (Agronomic Support)**:

- Field uniformity trend plots (CV over 8 weeks)
- Spatial autocorrelation maps (Moran's I)
- Interactive field boundary map (tmap)
- Farm-level KPI averages
- Colorblind-friendly palette

**Script 91 (Cane Supply)**:

- Per-field status alerts (harvest readiness, stress)
- Phase assignment table
- Tonnage forecasts (CI curves × historical harvest)
- Age-based harvest window predictions
- Urgent/warning/opportunity alerts

**Output Format**: Microsoft Word (.docx) with embedded tables, images, charts

```
reports/SmartCane_Report_week{WW}_{YYYY}.docx
```

**Execution Context**:

- **SOBIT**: Via `90_kpi_report.sh` (calls rmarkdown::render)
- **Dev Laptop**:
  ```powershell
  & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" -e `
    "rmarkdown::render('r_app/90_CI_report_with_kpis_agronomic_support.Rmd', `
     params=list(data_dir='angata', report_date=as.Date('2026-02-19')), `
     output_file='SmartCane_Report_week07_2026.docx', `
     output_dir='laravel_app/storage/app/angata/reports')"
  ```
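
The client-type branching for report rendering, together with the week-based file naming used by Stages 80 and 90/91, can be sketched as a small lookup. This is only an illustration of the conventions listed above, not code from the pipeline: the helper names `report_template`, `report_filename` and `kpi_summary_path` are hypothetical, and the week number is taken as an input rather than derived from a date.

```python
# Template paths as listed under "Scripts" for Stages 90/91.
REPORT_TEMPLATES = {
    "agronomic_support": "r_app/90_CI_report_with_kpis_agronomic_support.Rmd",
    "cane_supply": "r_app/91_CI_report_with_kpis_cane_supply.Rmd",
}


def report_template(client_type: str) -> str:
    """Return the R Markdown template for a client type (hypothetical helper)."""
    try:
        return REPORT_TEMPLATES[client_type]
    except KeyError:
        raise ValueError(f"Unknown client type: {client_type!r}") from None


def report_filename(week: int, year: int) -> str:
    """Build SmartCane_Report_week{WW}_{YYYY}.docx with a zero-padded week."""
    return f"SmartCane_Report_week{week:02d}_{year}.docx"


def kpi_summary_path(project: str, week: int) -> str:
    """Build the Stage 80 RDS path that the report templates read."""
    return f"reports/kpis/{project}_kpi_summary_tables_week{week:02d}.rds"


template = report_template("cane_supply")
docx_name = report_filename(7, 2026)
kpi_path = kpi_summary_path("angata", 7)
```

With the values used elsewhere in this document, `docx_name` is `SmartCane_Report_week07_2026.docx` and `kpi_path` is `reports/kpis/angata_kpi_summary_tables_week07.rds`.
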

---

## Exit Points: User-Facing Outputs

| Output Type | Format | Location | Audience |
|-------------|--------|----------|----------|
| **Reports** | Word (.docx) | `reports/SmartCane_Report_*.docx` | Agronomist / Farm manager |
| **Field Analysis** | Excel (.xlsx) | `reports/{PROJECT}_field_analysis_week*.xlsx` | Data analyst / Operations |
| **GeoTIFFs** | 5-band raster | `weekly_mosaic/{FIELD}/week_*.tif` | GIS systems |
| **Predictions** | CSV | `harvest_imminent_weekly.csv` (Python 31 output) | Harvest scheduling |

---

## File Storage Architecture

```
laravel_app/storage/app/{PROJECT}/
├── merged_tif/
│   ├── 2026-02-12.tif               ← Stage 00 output (Python download)
│   ├── 2026-02-13.tif
│   └── 2026-02-19.tif
│
├── field_tiles/                     ← Stage 10 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif
│   │   └── 2026-02-19.tif
│   ├── Field_002/
│   │   └── ...
│   └── ...
│
├── field_tiles_CI/                  ← Stage 20 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif           (5-band with CI)
│   │   └── 2026-02-19.tif
│   └── ...
│
├── Data/
│   ├── pivot.geojson                ← Input: field boundaries
│   ├── harvest.xlsx                 ← Input: harvest dates (Stage 30 requirement)
│   ├── extracted_ci/
│   │   ├── daily_vals/
│   │   │   └── Field_001/2026-02-19.rds    ← Stage 20 output
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds        ← Stage 20 output (wide format)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds  ← Stage 30 output
│   └── growth_model_interpolated/   ← Stage 30 output
│
├── weekly_mosaic/                   ← Stage 40 output
│   ├── Field_001/
│   │   ├── week_07_2026.tif         (5-band, MAX-aggregated)
│   │   └── week_06_2026.tif
│   └── ...
│
└── reports/                         ← Stages 80/90/91 output
    ├── SmartCane_Report_week07_2026.docx
    ├── angata_field_analysis_week07_2026.xlsx
    └── kpis/
        └── angata_kpi_summary_tables_week07.rds
```

---

## Data Format Reference

### RDS Files (R Serialized Objects)

**combined_CI_data.rds** (Stage 20 output)

- Type: data.frame
- Rows: Field names
- Cols: ISO dates (YYYY-MM-DD)
- Values: Mean Canopy Index per field-date

**All_pivots_Cumulative_CI_quadrant_year_v2.rds** (Stage 30 output)

- Type: data.frame
- Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
- Used by: Scripts 80, 90/91, harvest prediction

**kpi_summary_tables_week{WW}.rds** (Stage 80 output)

- Type: list of data.frames
- Contains: Weekly KPI summaries for all fields
- Used by: Scripts 90/91 rendering

### GeoTIFF Bands

**merged_tif/{DATE}.tif** (Stage 00, 4-band)

- Band 1: Red
- Band 2: Green
- Band 3: Blue
- Band 4: NIR

**field_tiles_CI/{FIELD}/{DATE}.tif** (Stage 20, 5-band)

- Bands 1-4: R, G, B, NIR (uint16)
- Band 5: Canopy Index (float32)

**weekly_mosaic/{FIELD}/week_WW_YYYY.tif** (Stage 40, 5-band)

- Bands 1-4: R, G, B, NIR (uint16, MAX of week)
- Band 5: CI (float32, MAX of week)

---

## Next Steps

- See [CLIENT_TYPE_ARCHITECTURE.md](CLIENT_TYPE_ARCHITECTURE.md) for how agronomic_support and cane_supply types branch
- See [SOBIT_DEPLOYMENT.md](SOBIT_DEPLOYMENT.md) for Laravel queue orchestration
- See [DEV_LAPTOP_EXECUTION.md](DEV_LAPTOP_EXECUTION.md) for manual execution workflow
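
As a worked illustration of the CI band described in this document, the Stage 20 formula CI = (NIR / Green) - 1 and the Stage 40 per-pixel weekly MAX can be traced for a single pixel. This is a minimal sketch in plain Python, not the R implementation: the function names and sample reflectance values are hypothetical, and `None` is assumed here to stand in for NA when the green band is absent (the pipeline may instead write CI=0).

```python
from typing import List, Optional


def canopy_index(nir: float, green: float) -> Optional[float]:
    """CI = (NIR / Green) - 1; returns None (NA) when green is absent."""
    if green == 0:
        return None
    return nir / green - 1.0


def weekly_max(daily_values: List[Optional[float]]) -> Optional[float]:
    """Per-pixel MAX across the week, skipping missing observations."""
    valid = [v for v in daily_values if v is not None]
    return max(valid) if valid else None


# One pixel observed on three days as (NIR, Green); the second day is masked.
observations = [(3000, 1000), (0, 0), (2400, 1200)]
daily_ci = [canopy_index(nir, green) for nir, green in observations]
weekly_ci = weekly_max(daily_ci)
```

Here `daily_ci` is `[2.0, None, 1.0]` and `weekly_ci` is `2.0`: the masked day is ignored and the highest visible CI of the week is kept.
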