SmartCane Data Flow Architecture
This diagram shows the complete pipeline from satellite imagery download through final report delivery, highlighting where Python and R interact and how data transforms at each stage.
High-Level Data Flow
%% High-Level Data Flow
flowchart TD
A["🛰️ External Data Sources<br/>Planet API • GeoJSON • Harvest Data"]
B["🐍 Python Stage 00<br/>00_download_8band_pu_optimized.py"]
C["💾 4-Band TIFF<br/>merged_tif/{DATE}.tif<br/>RGB+NIR uint16"]
D["🔴 R Stage 10<br/>10_create_per_field_tiffs.R"]
E["💾 Per-Field Tiles<br/>field_tiles/{FIELD}/{DATE}.tif"]
F["🟢 R Stage 20<br/>20_ci_extraction_per_field.R"]
G["💾 CI Data<br/>field_tiles_CI/{FIELD}/{DATE}.tif<br/>+ combined_CI_data.rds"]
H["🟡 R Stage 30<br/>30_interpolate_growth_model.R"]
I["💾 Interpolated Model<br/>All_pivots_Cumulative_CI_quadrant_year_v2.rds"]
J["🟣 R Stage 40<br/>40_mosaic_creation_per_field.R"]
K["💾 Weekly Mosaics<br/>weekly_mosaic/{FIELD}/week_WW_YYYY.tif"]
L["🟠 R Stage 80<br/>80_calculate_kpis.R"]
M["💾 KPI Outputs<br/>Excel + RDS Summary"]
N["📄 R Stage 90/91<br/>RMarkdown Reporting"]
O["✅ Final Outputs<br/>Word Reports • Excel Tables • GeoTIFFs"]
A -->|Download| B
B -->|Save| C
C -->|Split| D
D -->|Save| E
E -->|Extract CI| F
F -->|Save| G
G -->|Interpolate| H
H -->|Save| I
I -->|Create Mosaic| J
J -->|Save| K
K -->|Calculate KPIs| L
L -->|Save| M
M -->|Render Report| N
N -->|Generate| O
Stage-by-Stage Transformation
Entry Point: External Data Sources
| Source | Format | Key File | Purpose |
|---|---|---|---|
| Planet Labs API | 4-band GeoTIFF (RGB+NIR) | Satellite imagery | Raw canopy reflectance |
| Project GeoJSON | GeoJSON polygons | pivot.geojson | Field boundary masks |
| Harvest Records | Excel spreadsheet | harvest.xlsx | Season date markers (optional for agronomic_support, required for cane_supply) |
Storage Path: laravel_app/storage/app/{PROJECT}/Data/
Stage 00: Download (Python)
Script: python_app/00_download_8band_pu_optimized.py
Inputs:
- Planet API credentials (SentinelHub)
- Date range (YYYY-MM-DD format)
- Project ID (determines bounding box)
- Cloud masking threshold
Key Processing:
- Authenticates via SentinelHub SDK
- Downloads 4 bands (R, G, B, NIR) at 3m resolution
- Applies UDM1 cloud masking
- Merges all tiles for the day into single GeoTIFF
Output Format: 4-band uint16 GeoTIFF, ~150-300MB per date
laravel_app/storage/app/{PROJECT}/merged_tif/{YYYY-MM-DD}.tif
Execution Context:
- SOBIT: Triggered via Laravel ProjectDownloadTiffJob queue
- Dev Laptop: Manual PowerShell command
cd python_app
python 00_download_8band_pu_optimized.py angata --date 2026-02-19
Stage 10: Per-Field Tile Creation (R)
Script: r_app/10_create_per_field_tiffs.R
Inputs:
- Merged 4-band TIFF: merged_tif/{DATE}.tif
- Field boundaries: pivot.geojson
Key Processing:
- Reads polygon geometries from GeoJSON
- Clips merged TIFF to each field boundary
- Preserves 4 bands (R, G, B, NIR) as uint16
- Handles edge pixels and overlaps
Output Format: Per-field 4-band TIFFs
laravel_app/storage/app/{PROJECT}/field_tiles/{FIELD}/{DATE}.tif
Execution Context:
- SOBIT: Via shell wrapper 10_planet_download.sh
- Dev Laptop:
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/10_create_per_field_tiffs.R angata 2026-02-19 7
Stage 20: CI Extraction (R)
Script: r_app/20_ci_extraction_per_field.R
Inputs:
- Per-field 4-band TIFFs: field_tiles/{FIELD}/{DATE}.tif
- Field boundaries: pivot.geojson
Key Processing:
- Calculates Canopy Index (CI) = (NIR / Green) - 1 for each pixel
- Extracts field-level statistics (mean, sd, min, max, pixel count)
- Handles clouds: CI=0 or NA when green band is absent
- Creates 5-band output: R, G, B, NIR, CI (float32 for CI band)
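The CI formula and the cloud rule above can be sketched in a few lines. The pipeline computes this in R; the Python below is only an illustration. The names `canopy_index` and `field_stats` are hypothetical, the pixel values are made up, and returning NaN (rather than 0) for a missing green value is an assumption. Only a subset of the field statistics is shown.

```python
import math

def canopy_index(nir, green):
    """Return CI = (NIR / Green) - 1 for one pixel.

    Assumption: a missing or zero green value (for example a cloud-masked
    pixel) yields NaN; the stage description says 0 may also be used.
    """
    if nir is None or green is None or green == 0:
        return math.nan
    return nir / green - 1.0


def field_stats(ci_values):
    """Summarise the valid (non-NaN) CI pixels of one field."""
    valid = [v for v in ci_values if not math.isnan(v)]
    if not valid:
        return {"mean": math.nan, "min": math.nan, "max": math.nan, "pixel_count": 0}
    return {
        "mean": sum(valid) / len(valid),
        "min": min(valid),
        "max": max(valid),
        "pixel_count": len(valid),
    }


# Hypothetical (NIR, Green) digital numbers for four pixels; the last is masked.
pixels = [(3000, 1000), (2400, 1200), (1500, 1500), (2000, 0)]
ci = [canopy_index(nir, green) for nir, green in pixels]
stats = field_stats(ci)
```

Masked pixels are excluded from the statistics rather than counted as zero, so a partly cloudy field is not biased toward low CI.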
Outputs:
field_tiles_CI/{FIELD}/{DATE}.tif # 5-band daily per-field
Data/extracted_ci/daily_vals/{FIELD}/{DATE}.rds # Field stats RDS
Data/extracted_ci/cumulative_vals/combined_CI_data.rds # Wide RDS (fields × dates)
Data Format (combined_CI_data.rds):
- Rows: Field names
- Columns: Dates (YYYY-MM-DD)
- Values: Mean CI per field on that date
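The wide layout described above (one row per field, one column per date) can be pictured as a nested mapping. The sketch below uses Python rather than an R data frame, with hypothetical field names and values, and reshapes the table into one record per field-date, which is the long shape the growth model output uses. The function name `wide_to_long` is an assumption.

```python
# Hypothetical wide table: field -> {date -> mean CI}; None marks a missing date.
wide = {
    "Field_001": {"2026-02-12": 1.8, "2026-02-13": None, "2026-02-19": 2.1},
    "Field_002": {"2026-02-12": 1.2, "2026-02-13": 1.3, "2026-02-19": None},
}

def wide_to_long(table):
    """Flatten {field: {date: value}} into (field, date, value) records.

    Records are ordered by field, then date; missing values are dropped.
    """
    records = []
    for field in sorted(table):
        # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
        for day in sorted(table[field]):
            value = table[field][day]
            if value is not None:
                records.append((field, day, value))
    return records

long_records = wide_to_long(wide)
```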
Execution Context:
- SOBIT: Via 20_ci_extraction.sh
- Dev Laptop:
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/20_ci_extraction_per_field.R angata 2026-02-19 7
Stage 30: Growth Model Interpolation (R)
Script: r_app/30_interpolate_growth_model.R
Inputs:
- Cumulative CI data: combined_CI_data.rds (from Stage 20)
- Harvest dates: harvest.xlsx (groups data into seasons)
Key Processing:
- Applies LOESS smoothing (span=0.3) to CI time series
- Interpolates missing dates (handles clouds: if entire field cloudy, skips date)
- Calculates daily CI changes and cumulative CI sums per season
- Groups by harvest season (defined in harvest.xlsx)
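To make the gap-filling and cumulative steps concrete, here is a minimal sketch. It uses plain linear interpolation between observed dates as a stand-in for the LOESS smoothing, which it does not implement, and it ignores season grouping. The function names and sample values are hypothetical; the output keys follow the column names listed for the Stage 30 output.

```python
from datetime import date, timedelta

def interpolate_daily(observations):
    """Fill every calendar day between the first and last observation.

    `observations` maps datetime.date -> CI. Gaps (for example fully cloudy
    dates that were skipped) are filled linearly. This is a simple stand-in
    for the LOESS step, not an implementation of it.
    """
    known = sorted(observations.items())
    if not known:
        return []
    series = []
    for (d0, v0), (d1, v1) in zip(known, known[1:]):
        span = (d1 - d0).days
        for offset in range(span):
            value = v0 + (v1 - v0) * offset / span
            series.append((d0 + timedelta(days=offset), value))
    series.append(known[-1])
    return series

def add_change_and_cumulative(series):
    """Attach daily change and running cumulative CI to each (date, ci) pair."""
    rows = []
    previous = None
    cumulative = 0.0
    for day, ci in series:
        change = 0.0 if previous is None else ci - previous
        cumulative += ci
        rows.append({"date": day, "interpolated_ci": ci,
                     "daily_change": change, "cumulative_ci": cumulative})
        previous = ci
    return rows

# Hypothetical observations with a two-day gap (14 and 15 February).
obs = {date(2026, 2, 12): 1.0, date(2026, 2, 13): 1.5, date(2026, 2, 16): 3.0}
rows = add_change_and_cumulative(interpolate_daily(obs))
```

In the real stage the cumulative sum would restart at each harvest season boundary taken from harvest.xlsx; the sketch keeps a single running total for brevity.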
Output Format: Interpolated growth model (long format RDS)
Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds
Data Structure:
- Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
- Used by: Stage 80 (trend analysis), harvest forecasting
Execution Context:
- SOBIT: Via 30_growth_model.sh
- Dev Laptop:
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/30_interpolate_growth_model.R angata
Stage 40: Weekly Mosaic Creation (R)
Script: r_app/40_mosaic_creation_per_field.R
Inputs:
- Daily per-field CI TIFFs: field_tiles_CI/{FIELD}/{DATE1,2,3...}.tif (week's dates)
- Week number and year
Key Processing:
- Reads all daily TIFFs for a given ISO week (Monday–Sunday)
- Applies MAX function per pixel across the week
- Max function handles clouds: picks highest (best) CI value visible during week
- Outputs 5-band composite: R, G, B, NIR, CI (float32)
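The per-pixel MAX rule can be illustrated on tiny in-memory grids. This is a sketch for a single band (CI) using plain Python lists; the real stage works on GeoTIFFs in R. The function name `max_composite`, the day labels, and the sample values are hypothetical, and treating cloud-masked pixels as NaN is an assumption.

```python
import math

def max_composite(daily_rasters):
    """Per-pixel maximum across daily rasters, ignoring NaN pixels.

    Each raster is a list of rows of floats and all rasters share one shape.
    A pixel that is NaN on every day stays NaN in the composite.
    """
    if not daily_rasters:
        raise ValueError("need at least one daily raster")
    n_rows = len(daily_rasters[0])
    n_cols = len(daily_rasters[0][0])
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            valid = [ras[r][c] for ras in daily_rasters
                     if not math.isnan(ras[r][c])]
            row.append(max(valid) if valid else math.nan)
        composite.append(row)
    return composite

nan = math.nan
# Two hypothetical 2x2 CI rasters from the same week; NaN marks cloud.
day_one = [[1.0, nan], [0.5, nan]]
day_two = [[1.4, 0.9], [nan, nan]]
weekly = max_composite([day_one, day_two])
```

Because clouds only remove or depress values, taking the maximum prefers the clearest observation of each pixel within the week.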
Output Format: Per-field weekly mosaics
weekly_mosaic/{FIELD}/week_WW_YYYY.tif
Execution Context:
- SOBIT: Via 40_mosaic_creation.sh
- Dev Laptop:
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/40_mosaic_creation_per_field.R 2026-02-19 7 angata
Stage 80: KPI Calculation (R)
Script: r_app/80_calculate_kpis.R
Inputs:
- Current week mosaic: weekly_mosaic/{FIELD}/week_WW_2026.tif
- Previous weeks' mosaics (for trend analysis)
- Growth model data: All_pivots_Cumulative_CI_quadrant_year_v2.rds
- Field boundaries: pivot.geojson
- Harvest data: harvest.xlsx
Key Processing:
- Client-type branching (determined from project name):
  - agronomic_support → Sources 80_utils_agronomic_support.R
    - Field uniformity KPI (CV + Moran's I)
    - Area change KPI
    - TCH forecast KPI
    - Growth decline KPI
    - Weed presence KPI
    - Gap filling KPI
  - cane_supply → Sources 80_utils_cane_supply.R
    - Per-field analysis (acreage, phase)
    - Phase assignment (age-based: germination, tillering, grand growth, maturation)
    - Harvest prediction (integrates Python 31 imminent_prob if available)
    - Status triggers
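As an illustration of the field uniformity KPI listed under agronomic_support, the coefficient of variation (CV) half of it can be sketched as below. Moran's I is not covered. The function name and pixel values are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say which variant 80_utils_agronomic_support.R applies.

```python
import statistics

def coefficient_of_variation(ci_values):
    """CV = standard deviation / mean of a field's CI pixels.

    Assumption: sample standard deviation (n - 1 denominator).
    Lower CV means a more uniform canopy.
    """
    if len(ci_values) < 2:
        raise ValueError("need at least two pixels")
    mean = statistics.mean(ci_values)
    if mean == 0:
        raise ValueError("mean CI is zero; CV is undefined")
    return statistics.stdev(ci_values) / mean

# Hypothetical CI pixels for an even field and a patchy one, same mean.
uniform_field = [2.0, 2.1, 1.9, 2.0]
patchy_field = [0.5, 3.0, 1.0, 3.5]
cv_uniform = coefficient_of_variation(uniform_field)
cv_patchy = coefficient_of_variation(patchy_field)
```

Both sample fields have the same mean CI, so the difference in CV comes entirely from the spread of pixel values.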
Outputs:
reports/{PROJECT}_field_analysis_week{WW}_{YYYY}.xlsx # Excel - 21 columns, per-field
reports/kpis/{PROJECT}_kpi_summary_tables_week{WW}.rds # RDS - Summary for rendering
Execution Context:
- SOBIT: Via 80_calculate_kpis.sh
- Dev Laptop:
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/80_calculate_kpis.R 2026-02-19 angata 7
Stages 90/91: Report Rendering (R Markdown)
Scripts:
- r_app/90_CI_report_with_kpis_agronomic_support.Rmd (agronomic_support client type)
- r_app/91_CI_report_with_kpis_cane_supply.Rmd (cane_supply client type)
Inputs:
- Weekly mosaics: weekly_mosaic/{FIELD}/week_*.tif
- KPI summary: kpi_summary_tables_week{WW}.rds
- Field boundaries: pivot.geojson
- CI time series: combined_CI_data.rds
- Growth model predictions (Script 91 only)
Key Processing:
Script 90 (Agronomic Support):
- Field uniformity trend plots (CV over 8 weeks)
- Spatial autocorrelation maps (Moran's I)
- Interactive field boundary map (tmap)
- Farm-level KPI averages
- Colorblind-friendly palette
Script 91 (Cane Supply):
- Per-field status alerts (harvest readiness, stress)
- Phase assignment table
- Tonnage forecasts (CI curves × historical harvest)
- Age-based harvest window predictions
- Urgent/warning/opportunity alerts
Output Format: Microsoft Word (.docx) with embedded tables, images, charts
reports/SmartCane_Report_week{WW}_{YYYY}.docx
Execution Context:
- SOBIT: Via 90_kpi_report.sh (calls rmarkdown::render)
- Dev Laptop:
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" -e `
  "rmarkdown::render('r_app/90_CI_report_with_kpis_agronomic_support.Rmd', `
  params=list(data_dir='angata', report_date=as.Date('2026-02-19')), `
  output_file='SmartCane_Report_week07_2026.docx', `
  output_dir='laravel_app/storage/app/angata/reports')"
Exit Points: User-Facing Outputs
| Output Type | Format | Location | Audience |
|---|---|---|---|
| Reports | Word (.docx) | reports/SmartCane_Report_*.docx | Agronomist / Farm manager |
| Field Analysis | Excel (.xlsx) | reports/{PROJECT}_field_analysis_week*.xlsx | Data analyst / Operations |
| GeoTIFFs | 5-band raster | weekly_mosaic/{FIELD}/week_*.tif | GIS systems |
| Predictions | CSV | harvest_imminent_weekly.csv (Python 31 output) | Harvest scheduling |
File Storage Architecture
laravel_app/storage/app/{PROJECT}/
├── merged_tif/
│   ├── 2026-02-12.tif          ← Stage 00 output (Python download)
│   ├── 2026-02-13.tif
│   └── 2026-02-19.tif
│
├── field_tiles/                ← Stage 10 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif
│   │   └── 2026-02-19.tif
│   ├── Field_002/
│   │   └── ...
│   └── ...
│
├── field_tiles_CI/             ← Stage 20 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif      (5-band with CI)
│   │   └── 2026-02-19.tif
│   └── ...
│
├── Data/
│   ├── pivot.geojson           ← Input: field boundaries
│   ├── harvest.xlsx            ← Input: harvest dates (Stage 30 requirement)
│   ├── extracted_ci/
│   │   ├── daily_vals/
│   │   │   └── Field_001/2026-02-19.rds    ← Stage 20 output
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds        ← Stage 20 output (wide format)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds    ← Stage 30 output
│   └── growth_model_interpolated/          ← Stage 30 output
│
├── weekly_mosaic/              ← Stage 40 output
│   ├── Field_001/
│   │   ├── week_07_2026.tif    (5-band, MAX-aggregated)
│   │   └── week_06_2026.tif
│   └── ...
│
└── reports/                    ← Stages 80/90/91 output
    ├── SmartCane_Report_week07_2026.docx
    ├── angata_field_analysis_week07_2026.xlsx
    └── kpis/
        └── angata_kpi_summary_tables_week07.rds
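The naming conventions in this tree can be captured in a few path helpers. This is a sketch, not code from the repository: the helper names are hypothetical, and only the folder and file patterns are taken from the layout above.

```python
from pathlib import PurePosixPath

# Root taken from the storage layout; paths are built relative to the repo.
STORAGE_ROOT = PurePosixPath("laravel_app/storage/app")

def merged_tif_path(project, date_str):
    """Stage 00 output: one merged GeoTIFF per date."""
    return STORAGE_ROOT / project / "merged_tif" / f"{date_str}.tif"

def field_tile_path(project, field, date_str, with_ci=False):
    """Stage 10 output (4-band) or Stage 20 output (5-band with CI)."""
    folder = "field_tiles_CI" if with_ci else "field_tiles"
    return STORAGE_ROOT / project / folder / field / f"{date_str}.tif"

def weekly_mosaic_path(project, field, week, year):
    """Stage 40 output: week number is zero-padded to two digits."""
    name = f"week_{week:02d}_{year}.tif"
    return STORAGE_ROOT / project / "weekly_mosaic" / field / name

example = weekly_mosaic_path("angata", "Field_001", 7, 2026)
```

`PurePosixPath` keeps forward slashes regardless of the host OS, which matches how the paths are written in this document.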
Data Format Reference
RDS Files (R Serialized Objects)
combined_CI_data.rds (Stage 20 output)
- Type: data.frame
- Rows: Field names
- Cols: ISO dates (YYYY-MM-DD)
- Values: Mean Canopy Index per field-date
All_pivots_Cumulative_CI_quadrant_year_v2.rds (Stage 30 output)
- Type: data.frame
- Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
- Used by: Scripts 80, 90/91, harvest prediction
kpi_summary_tables_week{WW}.rds (Stage 80 output)
- Type: list of data.frames
- Contains: Weekly KPI summaries for all fields
- Used by: Scripts 90/91 rendering
GeoTIFF Bands
merged_tif/{DATE}.tif (Stage 00, 4-band)
- Band 1: Red
- Band 2: Green
- Band 3: Blue
- Band 4: NIR
field_tiles_CI/{FIELD}/{DATE}.tif (Stage 20, 5-band)
- Bands 1-4: R, G, B, NIR (uint16)
- Band 5: Canopy Index (float32)
weekly_mosaic/{FIELD}/week_WW_YYYY.tif (Stage 40, 5-band)
- Bands 1-4: R, G, B, NIR (uint16, MAX of week)
- Band 5: CI (float32, MAX of week)
Next Steps
- See CLIENT_TYPE_ARCHITECTURE.md for how agronomic_support and cane_supply types branch
- See SOBIT_DEPLOYMENT.md for Laravel queue orchestration
- See DEV_LAPTOP_EXECUTION.md for manual execution workflow