# SmartCane Data Flow Architecture
This document traces the complete pipeline from satellite imagery download through final report delivery, highlighting where Python and R interact and how the data is transformed at each stage.
## High-Level Data Flow
```mermaid
%% High-Level Data Flow
flowchart TD
A["🛰️ External Data Sources
Planet API • GeoJSON • Harvest Data"]
B["Python Stage 00
00_download_8band_pu_optimized.py"]
C["💾 4-Band TIFF
merged_tif/{DATE}.tif
RGB+NIR uint16"]
D["🔴 R Stage 10
10_create_per_field_tiffs.R"]
E["💾 Per-Field Tiles
field_tiles/{FIELD}/{DATE}.tif"]
F["🟢 R Stage 20
20_ci_extraction_per_field.R"]
G["💾 CI Data
field_tiles_CI/{FIELD}/{DATE}.tif
+ combined_CI_data.rds"]
H["🟡 R Stage 30
30_interpolate_growth_model.R"]
I["💾 Interpolated Model
All_pivots_Cumulative_CI_quadrant_year_v2.rds"]
J["🟣 R Stage 40
40_mosaic_creation_per_field.R"]
K["💾 Weekly Mosaics
weekly_mosaic/{FIELD}/week_WW_YYYY.tif"]
L["R Stage 80
80_calculate_kpis.R"]
M["💾 KPI Outputs
Excel + RDS Summary"]
N["R Stage 90/91
RMarkdown Reporting"]
O["✅ Final Outputs
Word Reports • Excel Tables • GeoTIFFs"]
A -->|Download| B
B -->|Save| C
C -->|Split| D
D -->|Save| E
E -->|Extract CI| F
F -->|Save| G
G -->|Interpolate| H
H -->|Save| I
I -->|Create Mosaic| J
J -->|Save| K
K -->|Calculate KPIs| L
L -->|Save| M
M -->|Render Report| N
N -->|Generate| O
```
## Stage-by-Stage Transformation
### Entry Point: External Data Sources
| Source | Format | Key File | Purpose |
|--------|--------|----------|---------|
| **Planet Labs API** | 4-band GeoTIFF (RGB+NIR) | Satellite imagery | Raw canopy reflectance |
| **Project GeoJSON** | GeoJSON polygons | `pivot.geojson` | Field boundary masks |
| **Harvest Records** | Excel spreadsheet | `harvest.xlsx` | Season date markers (optional for agronomic_support, required for cane_supply) |
**Storage Path**: `laravel_app/storage/app/{PROJECT}/Data/`
---
### Stage 00: Download (Python)
**Script**: `python_app/00_download_8band_pu_optimized.py`
**Inputs**:
- Planet API credentials (SentinelHub)
- Date range (YYYY-MM-DD format)
- Project ID (determines bounding box)
- Cloud masking threshold
**Key Processing**:
- Authenticates via SentinelHub SDK
- Downloads 4 bands (R, G, B, NIR) at 3m resolution
- Applies UDM1 cloud masking
- Merges all tiles for the day into single GeoTIFF
**Output Format**: 4-band uint16 GeoTIFF, ~150-300MB per date
```
laravel_app/storage/app/{PROJECT}/merged_tif/{YYYY-MM-DD}.tif
```
**Execution Context**:
- **SOBIT**: Triggered via Laravel `ProjectDownloadTiffJob` queue
- **Dev Laptop**: Manual PowerShell command
```powershell
cd python_app
python 00_download_8band_pu_optimized.py angata --date 2026-02-19
```
---
### Stage 10: Per-Field Tile Creation (R)
**Script**: `r_app/10_create_per_field_tiffs.R`
**Inputs**:
- Merged 4-band TIFF: `merged_tif/{DATE}.tif`
- Field boundaries: `pivot.geojson`
**Key Processing**:
- Reads polygon geometries from GeoJSON
- Clips merged TIFF to each field boundary
- Preserves 4 bands (R, G, B, NIR) as uint16
- Handles edge pixels and overlaps
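
The clipping step above can be pictured as building one boolean mask per field and keeping only the pixels that fall inside the polygon. The following is a minimal, dependency-free sketch of that idea using a ray-casting test on pixel centres. The function names, the pixel-coordinate convention, and the toy raster are assumptions made for illustration; this is not the code in `10_create_per_field_tiffs.R`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the closed ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def field_mask(width: int, height: int, ring: Sequence[Point]) -> List[List[bool]]:
    """Mask of pixels whose centres fall inside the ring (pixel coordinates)."""
    return [
        [point_in_polygon(col + 0.5, row + 0.5, ring) for col in range(width)]
        for row in range(height)
    ]


# Hypothetical 4x4 raster with a square field covering the central 2x2 block.
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
mask = field_mask(4, 4, square)
```

Testing pixel centres is one possible way to decide edge pixels; a production clip may instead keep every pixel the boundary touches.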
**Output Format**: Per-field 4-band TIFFs
```
laravel_app/storage/app/{PROJECT}/field_tiles/{FIELD}/{DATE}.tif
```
**Execution Context**:
- **SOBIT**: Via shell wrapper `10_planet_download.sh`
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/10_create_per_field_tiffs.R angata 2026-02-19 7
```
---
### Stage 20: CI Extraction (R)
**Script**: `r_app/20_ci_extraction_per_field.R`
**Inputs**:
- Per-field 4-band TIFFs: `field_tiles/{FIELD}/{DATE}.tif`
- Field boundaries: `pivot.geojson`
**Key Processing**:
- Calculates Canopy Index (CI) = (NIR / Green) - 1 for each pixel
- Extracts field-level statistics (mean, sd, min, max, pixel count)
- Handles clouds: sets CI to 0 or NA where the green band has no valid data
- Creates 5-band output: R, G, B, NIR, CI (float32 for CI band)
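
To make the formula and the field-level statistics above concrete, here is a small Python sketch. It is an illustration only: the stage itself runs in R, the function names and sample reflectance values are hypothetical, and masked pixels are represented as `None` (the NA case) rather than 0.

```python
import math
import statistics
from typing import Dict, List, Optional


def canopy_index(nir: float, green: float) -> Optional[float]:
    """CI = (NIR / Green) - 1; None (NA) when the green band has no signal."""
    if green <= 0:
        return None
    return nir / green - 1.0


def field_stats(ci_values: List[Optional[float]]) -> Dict[str, float]:
    """Mean, sd, min, max and pixel count over the valid (non-NA) pixels."""
    valid = [v for v in ci_values if v is not None]
    if not valid:
        return {"mean": math.nan, "sd": math.nan,
                "min": math.nan, "max": math.nan, "count": 0}
    return {
        "mean": statistics.fmean(valid),
        # Sample standard deviation; a single pixel has no spread.
        "sd": statistics.stdev(valid) if len(valid) > 1 else 0.0,
        "min": min(valid),
        "max": max(valid),
        "count": len(valid),
    }


# Hypothetical reflectance values for four pixels; the last one is masked.
nir = [3000, 4000, 5000, 2000]
green = [1000, 1000, 1000, 0]
ci = [canopy_index(n, g) for n, g in zip(nir, green)]
stats = field_stats(ci)
```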
**Outputs**:
```
field_tiles_CI/{FIELD}/{DATE}.tif # 5-band daily per-field
Data/extracted_ci/daily_vals/{FIELD}/{DATE}.rds # Field stats RDS
Data/extracted_ci/cumulative_vals/combined_CI_data.rds # Wide RDS (fields × dates)
```
**Data Format** (combined_CI_data.rds):
- Rows: Field names
- Columns: Dates (YYYY-MM-DD)
- Values: Mean CI per field on that date
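
The wide layout described above (one row per field, one column per date) can be sketched in Python as a nested mapping, together with a flattening step that produces one row per field and date. The sample values, the use of `None` for a missing observation, and the helper name `wide_to_long` are assumptions for illustration; the real object is an R data frame stored as RDS.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical wide table: field name -> {date -> mean CI}.
wide: Dict[str, Dict[str, Optional[float]]] = {
    "Field_001": {"2026-02-12": 1.8, "2026-02-19": 2.1},
    "Field_002": {"2026-02-12": None, "2026-02-19": 1.4},
}


def wide_to_long(
    table: Dict[str, Dict[str, Optional[float]]]
) -> List[Tuple[str, str, float]]:
    """Flatten to (field, date, mean_ci) rows, dropping missing observations."""
    rows = []
    for field in sorted(table):
        for day in sorted(table[field]):
            value = table[field][day]
            if value is not None:
                rows.append((field, day, value))
    return rows


long_rows = wide_to_long(wide)
```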
**Execution Context**:
- **SOBIT**: Via `20_ci_extraction.sh`
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/20_ci_extraction_per_field.R angata 2026-02-19 7
```
---
### Stage 30: Growth Model Interpolation (R)
**Script**: `r_app/30_interpolate_growth_model.R`
**Inputs**:
- Cumulative CI data: `combined_CI_data.rds` (from Stage 20)
- Harvest dates: `harvest.xlsx` (groups data into seasons)
**Key Processing**:
- Applies LOESS smoothing (span=0.3) to CI time series
- Interpolates missing dates (a date is skipped when the entire field is cloudy)
- Calculates daily CI changes and cumulative CI sums per season
- Groups by harvest season (defined in harvest.xlsx)
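
The gap-filling and accumulation steps above can be sketched as follows. Note the substitution: plain linear interpolation stands in for the LOESS smoothing purely to keep the example short, so the numbers it produces are not what Stage 30 would output. Function names and sample observations are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def interpolate_daily(obs: Dict[date, float]) -> List[Tuple[date, float]]:
    """Fill every calendar day between the first and last observation.

    Linear interpolation is a stand-in for LOESS here, not a reimplementation.
    """
    days = sorted(obs)
    if not days:
        return []
    out: List[Tuple[date, float]] = []
    for start, end in zip(days, days[1:]):
        gap = (end - start).days
        for step in range(gap):
            frac = step / gap
            value = obs[start] + frac * (obs[end] - obs[start])
            out.append((start + timedelta(days=step), value))
    out.append((days[-1], obs[days[-1]]))
    return out


def add_change_and_cumulative(
    series: List[Tuple[date, float]]
) -> List[Tuple[date, float, float, float]]:
    """Return (date, ci, daily_change, cumulative_ci) rows for one season."""
    rows = []
    previous = None
    running = 0.0
    for day, ci in series:
        change = 0.0 if previous is None else ci - previous
        running += ci
        rows.append((day, ci, change, running))
        previous = ci
    return rows


# Hypothetical observations for one field in one season (cloudy days missing).
observed = {date(2026, 2, 12): 1.0, date(2026, 2, 16): 3.0}
rows = add_change_and_cumulative(interpolate_daily(observed))
```

In this sketch the cumulative sum restarts whenever the function is called on a new season's series, which mirrors the per-season grouping described above.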
**Output Format**: Interpolated growth model (long format RDS)
```
Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds
```
**Data Structure**:
- Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
- Used by: Stage 80 (trend analysis), harvest forecasting
**Execution Context**:
- **SOBIT**: Via `30_growth_model.sh`
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/30_interpolate_growth_model.R angata
```
---
### Stage 40: Weekly Mosaic Creation (R)
**Script**: `r_app/40_mosaic_creation_per_field.R`
**Inputs**:
- Daily per-field CI TIFFs: `field_tiles_CI/{FIELD}/{DATE1,2,3...}.tif` (week's dates)
- Week number and year
**Key Processing**:
- Reads all daily TIFFs for a given ISO week (Monday–Sunday)
- Applies a MAX function per pixel across the week
- MAX handles clouds by picking the highest (best) CI value visible during the week
- Outputs 5-band composite: R, G, B, NIR, CI (float32)
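
A per-pixel MAX composite over one band can be sketched in a few lines of Python. This is an illustration of the aggregation idea only: rasters are modelled as nested lists, cloud-masked pixels as `None`, and the names `Raster` and `max_composite` are hypothetical rather than taken from `40_mosaic_creation_per_field.R`.

```python
from typing import List, Optional

Raster = List[List[Optional[float]]]  # rows of CI values; None = cloud / NA


def max_composite(daily: List[Raster]) -> Raster:
    """Per-pixel maximum across daily rasters, ignoring NA pixels."""
    height, width = len(daily[0]), len(daily[0][0])
    out: Raster = []
    for r in range(height):
        row: List[Optional[float]] = []
        for c in range(width):
            valid = [img[r][c] for img in daily if img[r][c] is not None]
            # A pixel that is cloudy on every day of the week stays NA.
            row.append(max(valid) if valid else None)
        out.append(row)
    return out


# Hypothetical 1x3 CI rasters for three days of the same week.
monday = [[1.2, None, 0.4]]
wednesday = [[None, None, 0.9]]
friday = [[1.5, None, 0.7]]
weekly = max_composite([monday, wednesday, friday])
```

The middle pixel is masked on all three days, so it remains `None` in the weekly result instead of being filled with a misleading value.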
**Output Format**: Per-field weekly mosaics
```
weekly_mosaic/{FIELD}/week_WW_YYYY.tif
```
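
As a sketch of how a date could map onto this naming pattern, the helper below derives `WW` and `YYYY` from Python's `date.isocalendar()`. It assumes the ISO week definition stated above and uses the ISO year, which can differ from the calendar year in the first days of January; whether the R script makes the same choice is not confirmed here. The function name is hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def weekly_mosaic_path(field: str, day: date) -> PurePosixPath:
    """Build weekly_mosaic/{FIELD}/week_WW_YYYY.tif from the ISO week of `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    filename = f"week_{iso_week:02d}_{iso_year}.tif"
    return PurePosixPath("weekly_mosaic") / field / filename


path = weekly_mosaic_path("Field_001", date(2026, 2, 12))
```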
**Execution Context**:
- **SOBIT**: Via `40_mosaic_creation.sh`
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/40_mosaic_creation_per_field.R 2026-02-19 7 angata
```
---
### Stage 80: KPI Calculation (R)
**Script**: `r_app/80_calculate_kpis.R`
**Inputs**:
- Current week mosaic: `weekly_mosaic/{FIELD}/week_WW_YYYY.tif`
- Previous weeks' mosaics (for trend analysis)
- Growth model data: `All_pivots_Cumulative_CI_quadrant_year_v2.rds`
- Field boundaries: `pivot.geojson`
- Harvest data: `harvest.xlsx`
**Key Processing**:
- **Client-type branching** (determined from project name):
  - **agronomic_support** → sources `80_utils_agronomic_support.R`
    - Field uniformity KPI (CV + Moran's I)
    - Area change KPI
    - TCH forecast KPI
    - Growth decline KPI
    - Weed presence KPI
    - Gap filling KPI
  - **cane_supply** → sources `80_utils_cane_supply.R`
    - Per-field analysis (acreage, phase)
    - Phase assignment (age-based: germination, tillering, grand growth, maturation)
    - Harvest prediction (integrates Python 31 imminent_prob if available)
    - Status triggers
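
The CV component of the field uniformity KPI is the standard coefficient of variation, standard deviation divided by mean. The sketch below shows that calculation on two hypothetical fields; the function name and pixel values are illustrative assumptions, no classification thresholds are implied, and the Moran's I component is not covered.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(ci_pixels: Sequence[float]) -> float:
    """CV = sample standard deviation / mean of the CI pixels in a field."""
    mean = statistics.fmean(ci_pixels)
    if mean == 0:
        raise ValueError("CV is undefined when the mean CI is zero")
    return statistics.stdev(ci_pixels) / mean


# Hypothetical CI pixels: both fields have the same mean, different spread.
uniform_field = [2.0, 2.1, 1.9, 2.0]
patchy_field = [0.5, 3.5, 1.0, 3.0]

cv_uniform = coefficient_of_variation(uniform_field)
cv_patchy = coefficient_of_variation(patchy_field)
```

Because CV is normalised by the mean, it allows fields at different growth stages to be compared on spread alone; a lower value indicates a more uniform canopy.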
**Outputs**:
```
reports/{PROJECT}_field_analysis_week{WW}_{YYYY}.xlsx # Excel - 21 columns, per-field
reports/kpis/{PROJECT}_kpi_summary_tables_week{WW}.rds # RDS - Summary for rendering
```
**Execution Context**:
- **SOBIT**: Via `80_calculate_kpis.sh`
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/80_calculate_kpis.R 2026-02-19 angata 7
```
---
### Stages 90/91: Report Rendering (R Markdown)
**Scripts**:
- `r_app/90_CI_report_with_kpis_agronomic_support.Rmd` (agronomic_support client type)
- `r_app/91_CI_report_with_kpis_cane_supply.Rmd` (cane_supply client type)
**Inputs**:
- Weekly mosaics: `weekly_mosaic/{FIELD}/week_*.tif`
- KPI summary: `kpi_summary_tables_week{WW}.rds`
- Field boundaries: `pivot.geojson`
- CI time series: `combined_CI_data.rds`
- Growth model predictions (Script 91 only)
**Key Processing**:
**Script 90 (Agronomic Support)**:
- Field uniformity trend plots (CV over 8 weeks)
- Spatial autocorrelation maps (Moran's I)
- Interactive field boundary map (tmap)
- Farm-level KPI averages
- Colorblind-friendly palette
**Script 91 (Cane Supply)**:
- Per-field status alerts (harvest readiness, stress)
- Phase assignment table
- Tonnage forecasts (CI curves × historical harvest)
- Age-based harvest window predictions
- Urgent/warning/opportunity alerts
**Output Format**: Microsoft Word (.docx) with embedded tables, images, charts
```
reports/SmartCane_Report_week{WW}_{YYYY}.docx
```
**Execution Context**:
- **SOBIT**: Via `90_kpi_report.sh` (calls rmarkdown::render)
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" -e `
"rmarkdown::render('r_app/90_CI_report_with_kpis_agronomic_support.Rmd', `
params=list(data_dir='angata', report_date=as.Date('2026-02-19')), `
output_file='SmartCane_Report_week07_2026.docx', `
output_dir='laravel_app/storage/app/angata/reports')"
```
---
## Exit Points: User-Facing Outputs
| Output Type | Format | Location | Audience |
|-------------|--------|----------|----------|
| **Reports** | Word (.docx) | `reports/SmartCane_Report_*.docx` | Agronomist / Farm manager |
| **Field Analysis** | Excel (.xlsx) | `reports/{PROJECT}_field_analysis_week*.xlsx` | Data analyst / Operations |
| **GeoTIFFs** | 5-band raster | `weekly_mosaic/{FIELD}/week_*.tif` | GIS systems |
| **Predictions** | CSV | `harvest_imminent_weekly.csv` (Python 31 output) | Harvest scheduling |
---
## File Storage Architecture
```
laravel_app/storage/app/{PROJECT}/
├── merged_tif/
│   ├── 2026-02-12.tif          ← Stage 00 output (Python download)
│   ├── 2026-02-13.tif
│   └── 2026-02-19.tif
│
├── field_tiles/                ← Stage 10 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif
│   │   └── 2026-02-19.tif
│   ├── Field_002/
│   │   └── ...
│   └── ...
│
├── field_tiles_CI/             ← Stage 20 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif      (5-band with CI)
│   │   └── 2026-02-19.tif
│   └── ...
│
├── Data/
│   ├── pivot.geojson           ← Input: field boundaries
│   ├── harvest.xlsx            ← Input: harvest dates (Stage 30 requirement)
│   ├── extracted_ci/
│   │   ├── daily_vals/
│   │   │   └── Field_001/2026-02-19.rds    ← Stage 20 output
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds        ← Stage 20 output (wide format)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds    ← Stage 30 output
│   └── growth_model_interpolated/          ← Stage 30 output
│
├── weekly_mosaic/              ← Stage 40 output
│   ├── Field_001/
│   │   ├── week_07_2026.tif    (5-band, MAX-aggregated)
│   │   └── week_06_2026.tif
│   └── ...
│
└── reports/                    ← Stages 80/90/91 output
    ├── SmartCane_Report_week07_2026.docx
    ├── angata_field_analysis_week07_2026.xlsx
    └── kpis/
        └── angata_kpi_summary_tables_week07.rds
```
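
For scripts that need to locate these artefacts, the layout above can be captured in a small path helper. The sketch below is an illustration, not part of the pipeline: `STORAGE_ROOT`, `stage_paths`, and the dictionary keys are hypothetical names, and only the per-date and cumulative files shown in the tree are covered.

```python
from pathlib import PurePosixPath
from typing import Dict

STORAGE_ROOT = PurePosixPath("laravel_app/storage/app")


def stage_paths(project: str, field: str, day: str) -> Dict[str, PurePosixPath]:
    """Map each artefact for one field and date to its location on disk."""
    root = STORAGE_ROOT / project
    ci_dir = root / "Data" / "extracted_ci"
    return {
        "merged_tif": root / "merged_tif" / f"{day}.tif",              # Stage 00
        "field_tile": root / "field_tiles" / field / f"{day}.tif",     # Stage 10
        "field_tile_ci": root / "field_tiles_CI" / field / f"{day}.tif",  # Stage 20
        "daily_stats": ci_dir / "daily_vals" / field / f"{day}.rds",   # Stage 20
        "combined_ci": ci_dir / "cumulative_vals" / "combined_CI_data.rds",  # Stage 20
    }


paths = stage_paths("angata", "Field_001", "2026-02-19")
```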
---
## Data Format Reference
### RDS Files (R Serialized Objects)
**combined_CI_data.rds** (Stage 20 output)
- Type: data.frame
- Rows: Field names
- Cols: ISO dates (YYYY-MM-DD)
- Values: Mean Canopy Index per field-date
**All_pivots_Cumulative_CI_quadrant_year_v2.rds** (Stage 30 output)
- Type: data.frame
- Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
- Used by: Scripts 80, 90/91, harvest prediction
**kpi_summary_tables_week{WW}.rds** (Stage 80 output)
- Type: list of data.frames
- Contains: Weekly KPI summaries for all fields
- Used by: Scripts 90/91 rendering
### GeoTIFF Bands
**merged_tif/{DATE}.tif** (Stage 00, 4-band)
- Band 1: Red
- Band 2: Green
- Band 3: Blue
- Band 4: NIR
**field_tiles_CI/{FIELD}/{DATE}.tif** (Stage 20, 5-band)
- Bands 1-4: R, G, B, NIR (uint16)
- Band 5: Canopy Index (float32)
**weekly_mosaic/{FIELD}/week_WW_YYYY.tif** (Stage 40, 5-band)
- Bands 1-4: R, G, B, NIR (uint16, MAX of week)
- Band 5: CI (float32, MAX of week)
---
## Next Steps
- See [CLIENT_TYPE_ARCHITECTURE.md](CLIENT_TYPE_ARCHITECTURE.md) for how agronomic_support and cane_supply types branch
- See [SOBIT_DEPLOYMENT.md](SOBIT_DEPLOYMENT.md) for Laravel queue orchestration
- See [DEV_LAPTOP_EXECUTION.md](DEV_LAPTOP_EXECUTION.md) for manual execution workflow