# SmartCane Data Flow Architecture
This document traces the complete pipeline from satellite imagery download to final report delivery, showing where Python and R hand off data and how the data is transformed at each stage.
## High-Level Data Flow
```mermaid
%% High-Level Data Flow
flowchart TD
A["🛰️ External Data Sources<br/>Planet API • GeoJSON • Harvest Data"]
B["🐍 Python Stage 00<br/>00_download_8band_pu_optimized.py"]
C["💾 4-Band TIFF<br/>merged_tif/{DATE}.tif<br/>RGB+NIR uint16"]
D["🔴 R Stage 10<br/>10_create_per_field_tiffs.R"]
E["💾 Per-Field Tiles<br/>field_tiles/{FIELD}/{DATE}.tif"]
F["🟢 R Stage 20<br/>20_ci_extraction_per_field.R"]
G["💾 CI Data<br/>field_tiles_CI/{FIELD}/{DATE}.tif<br/>+ combined_CI_data.rds"]
H["🟡 R Stage 30<br/>30_interpolate_growth_model.R"]
I["💾 Interpolated Model<br/>All_pivots_Cumulative_CI_quadrant_year_v2.rds"]
J["🟣 R Stage 40<br/>40_mosaic_creation_per_field.R"]
K["💾 Weekly Mosaics<br/>weekly_mosaic/{FIELD}/week_WW_YYYY.tif"]
L["🟠 R Stage 80<br/>80_calculate_kpis.R"]
M["💾 KPI Outputs<br/>Excel + RDS Summary"]
N["📄 R Stage 90/91<br/>RMarkdown Reporting"]
O["✅ Final Outputs<br/>Word Reports • Excel Tables • GeoTIFFs"]
A -->|Download| B
B -->|Save| C
C -->|Split| D
D -->|Save| E
E -->|Extract CI| F
F -->|Save| G
G -->|Interpolate| H
H -->|Save| I
I -->|Create Mosaic| J
J -->|Save| K
K -->|Calculate KPIs| L
L -->|Save| M
M -->|Render Report| N
N -->|Generate| O
```
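On a dev laptop the stages run in the order shown above, and each one is a separate process. The sketch below assembles the per-stage commands using the argument orders from the dev-laptop examples in this document. It is a hypothetical helper, not part of the repository: `build_stage_commands`, `run_pipeline` and the parameter name `n` (the trailing `7` in the documented commands, whose meaning is not spelled out here) are illustrative assumptions, and Stage 90/91 rendering is left out.

```python
import subprocess
from typing import List, Tuple

# Assumption: local R install path, copied from the examples in this document.
RSCRIPT = r"C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe"


def build_stage_commands(project: str, date: str, n: int) -> List[Tuple[str, List[str]]]:
    """Return (working_dir, argv) pairs for Stages 00-80 in execution order.

    Argument orders follow the per-stage dev-laptop examples; Stages 40 and 80
    place the date before the project name.
    """
    n_str = str(n)
    return [
        ("python_app", ["python", "00_download_8band_pu_optimized.py", project, "--date", date]),
        (".", [RSCRIPT, "r_app/10_create_per_field_tiffs.R", project, date, n_str]),
        (".", [RSCRIPT, "r_app/20_ci_extraction_per_field.R", project, date, n_str]),
        (".", [RSCRIPT, "r_app/30_interpolate_growth_model.R", project]),
        (".", [RSCRIPT, "r_app/40_mosaic_creation_per_field.R", date, n_str, project]),
        (".", [RSCRIPT, "r_app/80_calculate_kpis.R", date, project, n_str]),
    ]


def run_pipeline(project: str, date: str, n: int = 7) -> None:
    """Run each stage in order, stopping at the first non-zero exit code."""
    for cwd, argv in build_stage_commands(project, date, n):
        subprocess.run(argv, cwd=cwd, check=True)
```

Running from the repository root, `run_pipeline("angata", "2026-02-19")` would execute Stages 00 to 80 sequentially; `check=True` halts the chain as soon as one stage fails, so later stages never read stale inputs.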
## Stage-by-Stage Transformation
### Entry Point: External Data Sources
| Source | Format | Key File | Purpose |
|--------|--------|----------|---------|
| **Planet Labs API** | 4-band GeoTIFF (RGB+NIR) | Satellite imagery | Raw canopy reflectance |
| **Project GeoJSON** | GeoJSON polygons | `pivot.geojson` | Field boundary masks |
| **Harvest Records** | Excel spreadsheet | `harvest.xlsx` | Season date markers (optional for agronomic_support, required for cane_supply) |
**Storage Path**: `laravel_app/storage/app/{PROJECT}/Data/`
---
### Stage 00: Download (Python)
**Script**: `python_app/00_download_8band_pu_optimized.py`
**Inputs**:
- Planet API credentials (SentinelHub)
- Date range (YYYY-MM-DD format)
- Project ID (determines bounding box)
- Cloud masking threshold
**Key Processing**:
- Authenticates via SentinelHub SDK
- Downloads 4 bands (R, G, B, NIR) at 3m resolution
- Applies UDM1 cloud masking
- Merges all tiles for the day into a single GeoTIFF
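The masking and merge steps can be pictured per pixel, as in the minimal sketch below. It assumes that any non-zero UDM1 value marks an unusable pixel, that masked pixels are written as `0`, and that tiles are already on a common grid where the first valid value wins; none of these conventions is confirmed by the download script, and `apply_udm_mask` and `merge_tiles` are hypothetical names.

```python
from typing import List

NODATA = 0  # hypothetical fill value for masked pixels


def apply_udm_mask(band: List[int], udm: List[int]) -> List[int]:
    """Set pixels to NODATA wherever the UDM value is non-zero (assumed unusable)."""
    return [NODATA if flag != 0 else value for value, flag in zip(band, udm)]


def merge_tiles(tiles: List[List[int]]) -> List[int]:
    """Merge co-registered tiles: the first valid (non-NODATA) value wins per pixel."""
    merged = [NODATA] * len(tiles[0])
    for tile in tiles:
        for i, value in enumerate(tile):
            if merged[i] == NODATA and value != NODATA:
                merged[i] = value
    return merged
```

Using `0` as the fill value is convenient for uint16 bands but makes a genuine zero reflectance indistinguishable from a masked pixel, which is one reason downstream stages treat such pixels as missing.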
**Output Format**: 4-band uint16 GeoTIFF, ~150-300MB per date
```
laravel_app/storage/app/{PROJECT}/merged_tif/{YYYY-MM-DD}.tif
```
**Execution Context**:
- **SOBIT**: Triggered via Laravel `ProjectDownloadTiffJob` queue
- **Dev Laptop**: Manual PowerShell command
```powershell
cd python_app
python 00_download_8band_pu_optimized.py angata --date 2026-02-19
```
---
### Stage 10: Per-Field Tile Creation (R)
**Script**: `r_app/10_create_per_field_tiffs.R`
**Inputs**:
- Merged 4-band TIFF: `merged_tif/{DATE}.tif`
- Field boundaries: `pivot.geojson`
**Key Processing**:
- Reads polygon geometries from GeoJSON
- Clips merged TIFF to each field boundary
- Preserves 4 bands (R, G, B, NIR) as uint16
- Handles edge pixels and overlaps
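One common way to decide which pixels belong to a field is to test each pixel centre against the field polygon. The sketch below does that with an even-odd ray-casting test on a north-up grid; it is an illustration of the idea under those assumptions, not the rule the R script uses for edge pixels, and it ignores polygon holes and multipolygons. `point_in_polygon` and `field_mask` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting test against a single polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def field_mask(origin_x: float, origin_y: float, pixel_size: float,
               n_rows: int, n_cols: int, ring: Sequence[Point]) -> List[List[bool]]:
    """True where the pixel centre falls inside the field ring (north-up grid assumed)."""
    mask = []
    for r in range(n_rows):
        y = origin_y - (r + 0.5) * pixel_size
        mask.append([point_in_polygon(origin_x + (c + 0.5) * pixel_size, y, ring)
                     for c in range(n_cols)])
    return mask
```

With a centre-based rule, a pixel that is only partly covered by the field is kept or dropped depending on where its centre lies, which is why edge handling needs an explicit decision.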
**Output Format**: Per-field 4-band TIFFs
```
laravel_app/storage/app/{PROJECT}/field_tiles/{FIELD}/{DATE}.tif
```
**Execution Context**:
- **SOBIT**: Via shell wrapper `10_planet_download.sh`
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/10_create_per_field_tiffs.R angata 2026-02-19 7
```
---
### Stage 20: CI Extraction (R)
**Script**: `r_app/20_ci_extraction_per_field.R`
**Inputs**:
- Per-field 4-band TIFFs: `field_tiles/{FIELD}/{DATE}.tif`
- Field boundaries: `pivot.geojson`
**Key Processing**:
- Calculates Canopy Index (CI) = (NIR / Green) - 1 for each pixel
- Extracts field-level statistics (mean, sd, min, max, pixel count)
- Handles clouds: CI is set to 0 or NA for pixels where the green band has no valid value
- Creates 5-band output: R, G, B, NIR, CI (float32 for CI band)
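The per-pixel CI formula and the field-level statistics can be sketched as below. The sketch marks invalid pixels as NaN (the pipeline may use 0 instead) and uses the sample standard deviation, which is what R's `sd()` returns; `canopy_index` and `field_stats` are hypothetical names.

```python
import math
import statistics
from typing import Dict, List, Optional


def canopy_index(nir: float, green: Optional[float]) -> float:
    """CI = NIR / Green - 1; NaN where green is missing or zero (e.g. masked cloud)."""
    if green is None or green == 0:
        return math.nan
    return nir / green - 1.0


def field_stats(nir: List[float], green: List[Optional[float]]) -> Dict[str, float]:
    """Mean, sample sd, min, max and valid pixel count of CI over one field."""
    values = [canopy_index(n, g) for n, g in zip(nir, green)]
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        return {"mean": math.nan, "sd": math.nan, "min": math.nan, "max": math.nan, "count": 0}
    return {
        "mean": statistics.fmean(valid),
        "sd": statistics.stdev(valid) if len(valid) > 1 else math.nan,
        "min": min(valid),
        "max": max(valid),
        "count": len(valid),
    }
```

For example, NIR = 300 and Green = 100 give CI = 300 / 100 - 1 = 2.0.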
**Outputs**:
```
field_tiles_CI/{FIELD}/{DATE}.tif # 5-band daily per-field
Data/extracted_ci/daily_vals/{FIELD}/{DATE}.rds # Field stats RDS
Data/extracted_ci/cumulative_vals/combined_CI_data.rds # Wide RDS (fields × dates)
```
**Data Format** (combined_CI_data.rds):
- Rows: Field names
- Columns: Dates (YYYY-MM-DD)
- Values: Mean CI per field on that date
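The wide layout is a pivot of the per-field daily statistics. The minimal sketch below assumes the input is a list of `(field, date, mean_ci)` records and fills missing field-date combinations with `None` (NA in R); `to_wide` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def to_wide(records: List[Tuple[str, str, float]]) -> Tuple[List[str], Dict[str, Row]]:
    """Pivot (field, date, mean_ci) records into one row per field, one column per date."""
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    dates = sorted({d for _, d, _ in records})
    rows: Dict[str, Row] = {}
    for field, day, ci in records:
        rows.setdefault(field, {d: None for d in dates})[day] = ci
    return dates, rows
```

A field with no clear observation on a date keeps `None` in that column, so the table stays rectangular even when fields are cloudy on different days.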
**Execution Context**:
- **SOBIT**: Via `20_ci_extraction.sh`
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/20_ci_extraction_per_field.R angata 2026-02-19 7
```
---
### Stage 30: Growth Model Interpolation (R)
**Script**: `r_app/30_interpolate_growth_model.R`
**Inputs**:
- Cumulative CI data: `combined_CI_data.rds` (from Stage 20)
- Harvest dates: `harvest.xlsx` (groups data into seasons)
**Key Processing**:
- Applies LOESS smoothing (span=0.3) to CI time series
- Interpolates missing dates (cloud handling: a date on which the entire field is cloudy is skipped and its value is filled by interpolation)
- Calculates daily CI changes and cumulative CI sums per season
- Groups by harvest season (defined in harvest.xlsx)
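The interpolation and season bookkeeping can be sketched as follows. This sketch swaps in plain linear interpolation for the LOESS smoothing the script applies, and it assumes a harvest date opens the new season; `interpolate_daily` and `season_metrics` are hypothetical names, and the output keys mirror the documented columns.

```python
from datetime import date, timedelta
from typing import Dict, List, Sequence, Tuple


def interpolate_daily(obs: Dict[date, float]) -> List[Tuple[date, float]]:
    """Fill every calendar day between clear observations by linear interpolation.

    `obs` holds only usable dates; fully cloudy dates are simply absent.
    """
    days = sorted(obs)
    if not days:
        return []
    out: List[Tuple[date, float]] = []
    for d0, d1 in zip(days, days[1:]):
        span = (d1 - d0).days
        for k in range(span):
            out.append((d0 + timedelta(days=k), obs[d0] + (obs[d1] - obs[d0]) * k / span))
    out.append((days[-1], obs[days[-1]]))
    return out


def season_metrics(series: Sequence[Tuple[date, float]],
                   harvest_dates: Sequence[date]) -> List[dict]:
    """Attach season index, daily change and running CI sum (reset at each harvest)."""
    rows: List[dict] = []
    prev_season, prev_ci, cumulative = None, None, 0.0
    for day, ci in series:
        # Assumption: a harvest date belongs to the season it opens.
        season = sum(1 for h in harvest_dates if h <= day)
        if season != prev_season:
            cumulative, prev_ci = 0.0, None
        change = 0.0 if prev_ci is None else ci - prev_ci
        cumulative += ci
        rows.append({"date": day, "interpolated_ci": ci, "daily_change": change,
                     "cumulative_ci": cumulative, "season": season})
        prev_season, prev_ci = season, ci
    return rows
```

Resetting the running sum at each harvest keeps cumulative CI comparable across seasons of different start dates.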
**Output Format**: Interpolated growth model (long format RDS)
```
Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds
```
**Data Structure**:
- Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
- Used by: Stage 80 (trend analysis), harvest forecasting
**Execution Context**:
- **SOBIT**: Via `30_growth_model.sh`
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/30_interpolate_growth_model.R angata
```
---
### Stage 40: Weekly Mosaic Creation (R)
**Script**: `r_app/40_mosaic_creation_per_field.R`
**Inputs**:
- Daily per-field CI TIFFs: `field_tiles_CI/{FIELD}/{DATE1,2,3...}.tif` (week's dates)
- Week number and year
**Key Processing**:
- Reads all daily TIFFs for a given ISO week (Monday to Sunday)
- Applies a per-pixel MAX across the week's daily TIFFs
- The MAX suppresses clouds: each pixel keeps the highest (best) CI value observed during the week
- Outputs 5-band composite: R, G, B, NIR, CI (float32)
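The week selection and the MAX compositing can be sketched with the standard library. The sketch assumes the `YYYY` in `week_WW_YYYY` is the ISO week-numbering year and that cloudy pixels arrive as NaN; `week_label`, `week_dates` and `max_composite` are hypothetical names.

```python
import math
from datetime import date, timedelta
from typing import List


def week_label(d: date) -> str:
    """File stem for the ISO week containing d, e.g. 'week_07_2026'."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"week_{iso_week:02d}_{iso_year}"


def week_dates(d: date) -> List[date]:
    """The seven dates (Monday to Sunday) of the ISO week containing d."""
    monday = d - timedelta(days=d.isoweekday() - 1)
    return [monday + timedelta(days=i) for i in range(7)]


def max_composite(daily: List[List[float]]) -> List[float]:
    """Per-pixel maximum across days, ignoring NaN (cloud); NaN if never clear."""
    out = []
    for pixel_values in zip(*daily):
        valid = [v for v in pixel_values if not math.isnan(v)]
        out.append(max(valid) if valid else math.nan)
    return out
```

Using the ISO year rather than the calendar year matters around New Year: 1 January 2021 falls in ISO week 53 of 2020, so a calendar-year label would point at the wrong week.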
**Output Format**: Per-field weekly mosaics
```
weekly_mosaic/{FIELD}/week_WW_YYYY.tif
```
**Execution Context**:
- **SOBIT**: Via `40_mosaic_creation.sh`
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/40_mosaic_creation_per_field.R 2026-02-19 7 angata
```
---
### Stage 80: KPI Calculation (R)
**Script**: `r_app/80_calculate_kpis.R`
**Inputs**:
- Current week mosaic: `weekly_mosaic/{FIELD}/week_WW_YYYY.tif`
- Previous weeks' mosaics (for trend analysis)
- Growth model data: `All_pivots_Cumulative_CI_quadrant_year_v2.rds`
- Field boundaries: `pivot.geojson`
- Harvest data: `harvest.xlsx`
**Key Processing**:
- **Client-type branching** (determined from project name):
  - **agronomic_support** → sources `80_utils_agronomic_support.R`:
    - Field uniformity KPI (CV + Moran's I)
    - Area change KPI
    - TCH forecast KPI
    - Growth decline KPI
    - Weed presence KPI
    - Gap filling KPI
  - **cane_supply** → sources `80_utils_cane_supply.R`:
    - Per-field analysis (acreage, phase)
    - Phase assignment (age-based: germination, tillering, grand growth, maturation)
    - Harvest prediction (integrates `imminent_prob` from Python script 31, if available)
    - Status triggers
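The two statistics behind the field uniformity KPI can be sketched on a small CI grid. The sketch computes CV as sample standard deviation over mean and global Moran's I with binary rook (4-neighbour) weights, skipping `None` cells; the weighting scheme, and whether the R utilities use a library such as `spdep`, are assumptions, and `coefficient_of_variation` and `morans_i` are hypothetical names.

```python
import math
import statistics
from typing import List, Optional

Grid = List[List[Optional[float]]]


def coefficient_of_variation(values: List[float]) -> float:
    """CV = sample standard deviation / mean; lower means a more uniform field."""
    return statistics.stdev(values) / statistics.fmean(values)


def morans_i(grid: Grid) -> float:
    """Global Moran's I with binary rook weights; None cells (outside field) are skipped."""
    cells = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v is not None]
    n = len(cells)
    mean = sum(grid[r][c] for r, c in cells) / n
    dev = {(r, c): grid[r][c] - mean for r, c in cells}
    num, w_sum = 0.0, 0
    for (r, c), d in dev.items():
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb in dev:
                num += d * dev[nb]
                w_sum += 1
    den = sum(d * d for d in dev.values())
    if w_sum == 0 or den == 0:
        return math.nan  # undefined for isolated or perfectly uniform cells
    # I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    return (n / w_sum) * (num / den)
```

A checkerboard pattern gives I = -1 (neighbours always differ), while a field split into a high half and a low half gives a positive I, which is the kind of spatial clustering the uniformity KPI is meant to flag.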
**Outputs**:
```
reports/{PROJECT}_field_analysis_week{WW}_{YYYY}.xlsx # Excel - 21 columns, per-field
reports/kpis/{PROJECT}_kpi_summary_tables_week{WW}.rds # RDS - Summary for rendering
```
**Execution Context**:
- **SOBIT**: Via `80_calculate_kpis.sh`
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/80_calculate_kpis.R 2026-02-19 angata 7
```
---
### Stages 90/91: Report Rendering (R Markdown)
**Scripts**:
- `r_app/90_CI_report_with_kpis_agronomic_support.Rmd` (agronomic_support client type)
- `r_app/91_CI_report_with_kpis_cane_supply.Rmd` (cane_supply client type)
**Inputs**:
- Weekly mosaics: `weekly_mosaic/{FIELD}/week_*.tif`
- KPI summary: `kpi_summary_tables_week{WW}.rds`
- Field boundaries: `pivot.geojson`
- CI time series: `combined_CI_data.rds`
- Growth model predictions (Script 91 only)
**Key Processing**:
**Script 90 (Agronomic Support)**:
- Field uniformity trend plots (CV over 8 weeks)
- Spatial autocorrelation maps (Moran's I)
- Interactive field boundary map (tmap)
- Farm-level KPI averages
- Colorblind-friendly palette
**Script 91 (Cane Supply)**:
- Per-field status alerts (harvest readiness, stress)
- Phase assignment table
- Tonnage forecasts (CI curves × historical harvest)
- Age-based harvest window predictions
- Urgent/warning/opportunity alerts
**Output Format**: Microsoft Word (.docx) with embedded tables, images, charts
```
reports/SmartCane_Report_week{WW}_{YYYY}.docx
```
**Execution Context**:
- **SOBIT**: Via `90_kpi_report.sh` (calls `rmarkdown::render`)
- **Dev Laptop**:
```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" -e `
"rmarkdown::render('r_app/90_CI_report_with_kpis_agronomic_support.Rmd', `
params=list(data_dir='angata', report_date=as.Date('2026-02-19')), `
output_file='SmartCane_Report_week07_2026.docx', `
output_dir='laravel_app/storage/app/angata/reports')"
```
---
## Exit Points: User-Facing Outputs
| Output Type | Format | Location | Audience |
|-------------|--------|----------|----------|
| **Reports** | Word (.docx) | `reports/SmartCane_Report_*.docx` | Agronomist / Farm manager |
| **Field Analysis** | Excel (.xlsx) | `reports/{PROJECT}_field_analysis_week*.xlsx` | Data analyst / Operations |
| **GeoTIFFs** | 5-band raster | `weekly_mosaic/{FIELD}/week_*.tif` | GIS systems |
| **Predictions** | CSV | `harvest_imminent_weekly.csv` (Python 31 output) | Harvest scheduling |
---
## File Storage Architecture
```
laravel_app/storage/app/{PROJECT}/
├── merged_tif/
│   ├── 2026-02-12.tif        ← Stage 00 output (Python download)
│   ├── 2026-02-13.tif
│   └── 2026-02-19.tif
├── field_tiles/              ← Stage 10 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif
│   │   └── 2026-02-19.tif
│   ├── Field_002/
│   │   └── ...
│   └── ...
├── field_tiles_CI/           ← Stage 20 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif    (5-band with CI)
│   │   └── 2026-02-19.tif
│   └── ...
├── Data/
│   ├── pivot.geojson         ← Input: field boundaries
│   ├── harvest.xlsx          ← Input: harvest dates (Stage 30 requirement)
│   ├── extracted_ci/
│   │   ├── daily_vals/
│   │   │   └── Field_001/2026-02-19.rds                        ← Stage 20 output
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds                            ← Stage 20 output (wide format)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds   ← Stage 30 output
│   └── growth_model_interpolated/                              ← Stage 30 output
├── weekly_mosaic/            ← Stage 40 output
│   ├── Field_001/
│   │   ├── week_07_2026.tif  (5-band, MAX-aggregated)
│   │   └── week_06_2026.tif
│   └── ...
└── reports/                  ← Stages 80/90/91 output
    ├── SmartCane_Report_week07_2026.docx
    ├── angata_field_analysis_week07_2026.xlsx
    └── kpis/
        └── angata_kpi_summary_tables_week07.rds
```
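The layout above can be captured in one path helper so that scripts agree on file names. The sketch below is hypothetical (`stage_paths` is not part of the codebase) and assumes zero-padded two-digit week numbers, as in the example file names.

```python
from pathlib import PurePosixPath
from typing import Dict

BASE = PurePosixPath("laravel_app/storage/app")


def stage_paths(project: str, field: str, day: str, week: int, year: int) -> Dict[str, PurePosixPath]:
    """Expected locations of each stage's artefacts for one field, date and week."""
    root = BASE / project
    cumulative = root / "Data" / "extracted_ci" / "cumulative_vals"
    return {
        "merged_tif": root / "merged_tif" / f"{day}.tif",                      # Stage 00
        "field_tile": root / "field_tiles" / field / f"{day}.tif",             # Stage 10
        "field_tile_ci": root / "field_tiles_CI" / field / f"{day}.tif",       # Stage 20
        "daily_ci_rds": root / "Data" / "extracted_ci" / "daily_vals" / field / f"{day}.rds",
        "combined_ci": cumulative / "combined_CI_data.rds",                    # Stage 20
        "growth_model": cumulative / "All_pivots_Cumulative_CI_quadrant_year_v2.rds",  # Stage 30
        "weekly_mosaic": root / "weekly_mosaic" / field / f"week_{week:02d}_{year}.tif",  # Stage 40
        "field_analysis": root / "reports" / f"{project}_field_analysis_week{week:02d}_{year}.xlsx",
        "kpi_summary": root / "reports" / "kpis" / f"{project}_kpi_summary_tables_week{week:02d}.rds",
    }
```

`PurePosixPath` keeps forward slashes regardless of the host OS, matching the paths written in this document.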
---
## Data Format Reference
### RDS Files (R Serialized Objects)
**combined_CI_data.rds** (Stage 20 output)
- Type: data.frame
- Rows: Field names
- Cols: ISO dates (YYYY-MM-DD)
- Values: Mean Canopy Index per field-date
**All_pivots_Cumulative_CI_quadrant_year_v2.rds** (Stage 30 output)
- Type: data.frame
- Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
- Used by: Scripts 80, 90/91, harvest prediction
**kpi_summary_tables_week{WW}.rds** (Stage 80 output)
- Type: list of data.frames
- Contains: Weekly KPI summaries for all fields
- Used by: Scripts 90/91 rendering
### GeoTIFF Bands
**merged_tif/{DATE}.tif** (Stage 00, 4-band)
- Band 1: Red
- Band 2: Green
- Band 3: Blue
- Band 4: NIR
**field_tiles_CI/{FIELD}/{DATE}.tif** (Stage 20, 5-band)
- Bands 1-4: R, G, B, NIR (uint16)
- Band 5: Canopy Index (float32)
**weekly_mosaic/{FIELD}/week_WW_YYYY.tif** (Stage 40, 5-band)
- Bands 1-4: R, G, B, NIR (uint16, MAX of week)
- Band 5: CI (float32, MAX of week)
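If the MAX is applied independently to every band, as the band tables state, band 5 of a weekly mosaic is the week's highest CI and need not equal NIR / Green - 1 recomputed from bands 2 and 4 of the same mosaic. The sketch below illustrates this with two hypothetical daily pixel values; `ci_from_bands` and `weekly_max_pixel` are illustrative names.

```python
import math
from typing import List, Sequence

# 0-based band indices for the 5-band layout listed above.
RED, GREEN, BLUE, NIR, CI = range(5)


def ci_from_bands(pixel: Sequence[float]) -> float:
    """Recompute CI = NIR / Green - 1 from bands 2 and 4 of one pixel."""
    green = pixel[GREEN]
    return math.nan if green == 0 else pixel[NIR] / green - 1.0


def weekly_max_pixel(daily_pixels: Sequence[Sequence[float]]) -> List[float]:
    """Band-wise maximum across the week's daily 5-band values for one pixel."""
    return [max(values) for values in zip(*daily_pixels)]
```

With day A = (500, 1000, 400, 3000, CI 2.0) and day B = (600, 1500, 450, 3300, CI 1.2), the weekly pixel keeps CI 2.0 in band 5, while bands 2 and 4 give 3300 / 1500 - 1 = 1.2. Consumers of weekly mosaics should read CI from band 5 rather than recompute it.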
---
## Next Steps
- See [CLIENT_TYPE_ARCHITECTURE.md](CLIENT_TYPE_ARCHITECTURE.md) for how agronomic_support and cane_supply types branch
- See [SOBIT_DEPLOYMENT.md](SOBIT_DEPLOYMENT.md) for Laravel queue orchestration
- See [DEV_LAPTOP_EXECUTION.md](DEV_LAPTOP_EXECUTION.md) for manual execution workflow