# SmartCane Data Flow Architecture

This diagram shows the complete pipeline from satellite imagery download through final report delivery, highlighting where Python and R interact and how data transforms at each stage.

## High-Level Data Flow


```mermaid
%% High-Level Data Flow
flowchart TD
    A["🛰️ External Data Sources<br/>Planet API • GeoJSON • Harvest Data"]

    B["🐍 Python Stage 00<br/>00_download_8band_pu_optimized.py"]
    C["💾 4-Band TIFF<br/>merged_tif/{DATE}.tif<br/>RGB+NIR uint16"]

    D["🔴 R Stage 10<br/>10_create_per_field_tiffs.R"]
    E["💾 Per-Field Tiles<br/>field_tiles/{FIELD}/{DATE}.tif"]

    F["🟢 R Stage 20<br/>20_ci_extraction_per_field.R"]
    G["💾 CI Data<br/>field_tiles_CI/{FIELD}/{DATE}.tif<br/>+ combined_CI_data.rds"]

    H["🟡 R Stage 30<br/>30_interpolate_growth_model.R"]
    I["💾 Interpolated Model<br/>All_pivots_Cumulative_CI_quadrant_year_v2.rds"]

    J["🟣 R Stage 40<br/>40_mosaic_creation_per_field.R"]
    K["💾 Weekly Mosaics<br/>weekly_mosaic/{FIELD}/week_WW_YYYY.tif"]

    L["🟠 R Stage 80<br/>80_calculate_kpis.R"]
    M["💾 KPI Outputs<br/>Excel + RDS Summary"]

    N["📄 R Stage 90/91<br/>RMarkdown Reporting"]
    O["✅ Final Outputs<br/>Word Reports • Excel Tables • GeoTIFFs"]

    A -->|Download| B
    B -->|Save| C
    C -->|Split| D
    D -->|Save| E
    E -->|Extract CI| F
    F -->|Save| G
    G -->|Interpolate| H
    H -->|Save| I
    I -->|Create Mosaic| J
    J -->|Save| K
    K -->|Calculate KPIs| L
    L -->|Save| M
    M -->|Render Report| N
    N -->|Generate| O
```


## Stage-by-Stage Transformation

### Entry Point: External Data Sources

| Source | Format | Key File | Purpose |
|--------|--------|----------|---------|
| **Planet Labs API** | 4-band GeoTIFF (RGB+NIR) | Satellite imagery | Raw canopy reflectance |
| **Project GeoJSON** | GeoJSON polygons | `pivot.geojson` | Field boundary masks |
| **Harvest Records** | Excel spreadsheet | `harvest.xlsx` | Season date markers (optional for agronomic_support, required for cane_supply) |

**Storage Path** (field boundaries and harvest records): `laravel_app/storage/app/{PROJECT}/Data/`


---

### Stage 00: Download (Python)

**Script**: `python_app/00_download_8band_pu_optimized.py`

**Inputs**:

- Planet API credentials (SentinelHub)
- Date range (YYYY-MM-DD format)
- Project ID (determines bounding box)
- Cloud masking threshold

**Key Processing**:

- Authenticates via the SentinelHub SDK
- Downloads 4 bands (R, G, B, NIR) at 3 m resolution
- Applies UDM1 cloud masking
- Merges all tiles for the day into a single GeoTIFF

**Output Format**: 4-band uint16 GeoTIFF, ~150-300 MB per date

```
laravel_app/storage/app/{PROJECT}/merged_tif/{YYYY-MM-DD}.tif
```

**Execution Context**:

- **SOBIT**: Triggered via the Laravel `ProjectDownloadTiffJob` queue job
- **Dev Laptop**: Manual PowerShell command

```powershell
cd python_app
python 00_download_8band_pu_optimized.py angata --date 2026-02-19
```


---

### Stage 10: Per-Field Tile Creation (R)

**Script**: `r_app/10_create_per_field_tiffs.R`

**Inputs**:

- Merged 4-band TIFF: `merged_tif/{DATE}.tif`
- Field boundaries: `pivot.geojson`

**Key Processing**:

- Reads polygon geometries from GeoJSON
- Clips merged TIFF to each field boundary
- Preserves 4 bands (R, G, B, NIR) as uint16
- Handles edge pixels and overlaps

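As a rough illustration of the clipping step, the Python sketch below converts a field's bounding box into the pixel window to read from `merged_tif/{DATE}.tif`. It assumes a north-up raster described by its top-left origin and pixel size (3 m here, matching the Stage 00 imagery). `bbox_to_window` and `Window` are hypothetical names that do not come from `10_create_per_field_tiffs.R`, which may simply rely on a raster package's own crop and mask functions. Masking pixels that fall inside the box but outside the polygon is a separate step and is not shown.

```python
import math
from typing import NamedTuple


class Window(NamedTuple):
    """Pixel window: column/row offset of the top-left cell plus its size."""
    col_off: int
    row_off: int
    width: int
    height: int


def bbox_to_window(bbox, origin_x, origin_y, pixel_size, n_cols, n_rows):
    """Map a field's (min_x, min_y, max_x, max_y) bounding box to a pixel window.

    Assumes a north-up raster: x increases eastward from origin_x, and rows
    run southward from origin_y (the top-left corner of the raster).
    """
    min_x, min_y, max_x, max_y = bbox
    # Round outward to whole pixels so partially covered edge pixels are kept.
    col_start = math.floor((min_x - origin_x) / pixel_size)
    col_end = math.ceil((max_x - origin_x) / pixel_size)
    row_start = math.floor((origin_y - max_y) / pixel_size)
    row_end = math.ceil((origin_y - min_y) / pixel_size)
    # Clamp to the raster extent so fields at the scene edge cannot overflow.
    col_start, row_start = max(col_start, 0), max(row_start, 0)
    col_end, row_end = min(col_end, n_cols), min(row_end, n_rows)
    if col_end <= col_start or row_end <= row_start:
        raise ValueError("field bounding box does not overlap the raster")
    return Window(col_start, row_start, col_end - col_start, row_end - row_start)


# Hypothetical 3 m raster whose top-left corner sits at (500000, 9000000).
window = bbox_to_window((500010.0, 8999950.0, 500040.0, 8999990.0),
                        500000.0, 9000000.0, 3.0, 1000, 1000)
print(window)  # Window(col_off=3, row_off=3, width=11, height=14)
```

Rounding outward (floor on the start edge, ceil on the end edge) keeps partially covered edge pixels. This document does not say whether the real script keeps or drops them.
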

**Output Format**: Per-field 4-band TIFFs

```
laravel_app/storage/app/{PROJECT}/field_tiles/{FIELD}/{DATE}.tif
```

**Execution Context**:

- **SOBIT**: Via shell wrapper `10_planet_download.sh`
- **Dev Laptop**:

```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/10_create_per_field_tiffs.R angata 2026-02-19 7
```


---

### Stage 20: CI Extraction (R)

**Script**: `r_app/20_ci_extraction_per_field.R`

**Inputs**:

- Per-field 4-band TIFFs: `field_tiles/{FIELD}/{DATE}.tif`
- Field boundaries: `pivot.geojson`

**Key Processing**:

- Calculates the Canopy Index for each pixel: CI = (NIR / Green) - 1
- Extracts field-level statistics (mean, sd, min, max, pixel count)
- Handles clouds: CI is set to 0 or NA where the green band is missing
- Creates a 5-band output: R, G, B, NIR, CI (float32 for the CI band)

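The per-pixel formula and the field statistics listed above can be sketched in plain Python. Two choices here are assumptions: a missing or zero green value is treated as an unusable pixel, and such pixels return `None` (the NA case) rather than 0. The function names are hypothetical. The standard deviation is the sample standard deviation, which is what R's `sd()` computes.

```python
import statistics


def canopy_index(nir, green):
    """CI = (NIR / Green) - 1 for one pixel, or None (NA) if the pixel is unusable."""
    # Assumption: a missing (None) or zero green value marks a cloudy/no-data pixel.
    if nir is None or green is None or green == 0:
        return None
    return nir / green - 1.0


def field_stats(nir_band, green_band):
    """Field-level summary over valid pixels: mean, sd, min, max, pixel count."""
    values = [ci for ci in map(canopy_index, nir_band, green_band) if ci is not None]
    if not values:
        return None  # every pixel in the field was masked on this date
    return {
        "mean": statistics.fmean(values),
        # Sample standard deviation (n - 1), like R's sd(); undefined for one pixel.
        "sd": statistics.stdev(values) if len(values) > 1 else float("nan"),
        "min": min(values),
        "max": max(values),
        "count": len(values),
    }


# Flattened NIR and green pixels for one field; the third pixel has no green data.
stats = field_stats([3000, 2000, 1800], [1000, 1000, 0])
print(stats["mean"], stats["count"])  # 1.5 2
```
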

**Outputs**:

```
field_tiles_CI/{FIELD}/{DATE}.tif                       # 5-band daily per-field
Data/extracted_ci/daily_vals/{FIELD}/{DATE}.rds         # Field stats RDS
Data/extracted_ci/cumulative_vals/combined_CI_data.rds  # Wide RDS (fields × dates)
```

**Data Format** (`combined_CI_data.rds`):

- Rows: Field names
- Columns: Dates (YYYY-MM-DD)
- Values: Mean CI per field on that date

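The wide layout of `combined_CI_data.rds` is a pivot of the per-field daily statistics. The Python sketch below is illustrative only. The `(field, date, mean_ci)` record shape and the `to_wide` name are assumptions, and the R script presumably builds a data.frame rather than a dict.

```python
def to_wide(records):
    """Pivot (field, date, mean_ci) records into a fields x dates table.

    Returns (dates, rows): `dates` is the sorted column header and `rows`
    maps each field name to mean CI values aligned with `dates`, with None
    where a field has no observation for that date (e.g. fully cloudy).
    """
    # ISO YYYY-MM-DD strings sort chronologically, so a plain sort is enough.
    dates = sorted({day for _, day, _ in records})
    column = {day: i for i, day in enumerate(dates)}
    rows = {}
    for field, day, mean_ci in records:
        rows.setdefault(field, [None] * len(dates))[column[day]] = mean_ci
    return dates, dict(sorted(rows.items()))


records = [
    ("Field_002", "2026-02-19", 1.2),
    ("Field_001", "2026-02-12", 0.9),
    ("Field_001", "2026-02-19", 1.1),
]
dates, rows = to_wide(records)
print(dates)  # ['2026-02-12', '2026-02-19']
print(rows)   # {'Field_001': [0.9, 1.1], 'Field_002': [None, 1.2]}
```
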

**Execution Context**:

- **SOBIT**: Via `20_ci_extraction.sh`
- **Dev Laptop**:

```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/20_ci_extraction_per_field.R angata 2026-02-19 7
```


---

### Stage 30: Growth Model Interpolation (R)

**Script**: `r_app/30_interpolate_growth_model.R`

**Inputs**:

- Cumulative CI data: `combined_CI_data.rds` (from Stage 20)
- Harvest dates: `harvest.xlsx` (groups data into seasons)

**Key Processing**:

- Applies LOESS smoothing (span = 0.3) to the CI time series
- Interpolates missing dates (cloud handling: if the entire field is cloudy, that date is skipped)
- Calculates daily CI changes and cumulative CI sums per season
- Groups by harvest season (defined in `harvest.xlsx`)

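Stage 30 smooths with LOESS, which is not reproduced here. The sketch below swaps in plain linear interpolation to show the surrounding bookkeeping. It fills skipped (cloudy) dates day by day, then derives `daily_change` and `cumulative_ci` as in the Stage 30 output columns. The function names are hypothetical. Treating cumulative CI as a running sum of the daily interpolated CI is an assumption about how the script defines it.

```python
from datetime import date, timedelta


def interpolate_daily(observations):
    """Fill every calendar day between the first and last observation.

    `observations` maps datetime.date -> CI; cloudy dates are simply absent.
    Linear interpolation stands in here for the LOESS smoothing of Stage 30.
    """
    known = sorted(observations.items())
    if not known:
        return []
    series = []
    for (d0, v0), (d1, v1) in zip(known, known[1:]):
        span = (d1 - d0).days
        for step in range(span):
            series.append((d0 + timedelta(days=step), v0 + (v1 - v0) * step / span))
    series.append(known[-1])
    return series


def add_growth_columns(series):
    """Attach daily_change and cumulative_ci (running sum of daily CI)."""
    rows, previous, cumulative = [], None, 0.0
    for day, ci in series:
        cumulative += ci
        rows.append({
            "date": day,
            "interpolated_ci": ci,
            "daily_change": 0.0 if previous is None else ci - previous,
            "cumulative_ci": cumulative,
        })
        previous = ci
    return rows


# Feb 13 and 14 were cloudy, so only Feb 12 and Feb 15 were observed.
rows = add_growth_columns(interpolate_daily({date(2026, 2, 12): 1.0,
                                             date(2026, 2, 15): 1.6}))
for row in rows:
    print(row["date"], round(row["interpolated_ci"], 2), round(row["cumulative_ci"], 2))
```

In the real pipeline these sums are kept per season, so the running total would restart at each harvest boundary.
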

**Output Format**: Interpolated growth model (long format RDS)

```
Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds
```

**Data Structure**:

- Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
- Used by: Stage 80 (trend analysis), harvest forecasting

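Season grouping can be expressed as a lookup against the sorted harvest dates from `harvest.xlsx`. In this hypothetical sketch, a harvest date closes its season and the following day opens the next one. Which season the harvest day itself belongs to is an assumption, as are the function names. The running CI total restarts at each season boundary, matching the per-season cumulative sums described above.

```python
import bisect
from datetime import date


def assign_season(day, harvest_dates):
    """Return a 0-based season index for `day`.

    Season k ends on (and includes) the k-th harvest date; days after the last
    harvest fall into the current, still-growing season.
    """
    return bisect.bisect_left(sorted(harvest_dates), day)


def cumulative_by_season(daily_ci, harvest_dates):
    """Running CI total that restarts at the start of every season."""
    cutoffs = sorted(harvest_dates)
    totals, out = {}, []
    for day, ci in sorted(daily_ci):
        season = assign_season(day, cutoffs)
        totals[season] = totals.get(season, 0.0) + ci
        out.append((day, season, totals[season]))
    return out


harvests = [date(2025, 3, 1)]
daily = [(date(2025, 2, 28), 1.0), (date(2025, 3, 1), 2.0),
         (date(2025, 3, 2), 0.5), (date(2025, 3, 3), 0.5)]
for day, season, total in cumulative_by_season(daily, harvests):
    print(day, season, total)
# 2025-02-28 0 1.0
# 2025-03-01 0 3.0
# 2025-03-02 1 0.5
# 2025-03-03 1 1.0
```
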

**Execution Context**:

- **SOBIT**: Via `30_growth_model.sh`
- **Dev Laptop**:

```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/30_interpolate_growth_model.R angata
```


---

### Stage 40: Weekly Mosaic Creation (R)

**Script**: `r_app/40_mosaic_creation_per_field.R`

**Inputs**:

- Daily per-field CI TIFFs: `field_tiles_CI/{FIELD}/{DATE1,2,3...}.tif` (all dates in the week)
- Week number and year

**Key Processing**:

- Reads all daily TIFFs for a given ISO week (Monday–Sunday)
- Applies a per-pixel MAX across the week
- The MAX handles clouds: it keeps the highest (best) CI value observed during the week
- Outputs a 5-band composite: R, G, B, NIR, CI (float32)

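The weekly grouping and the per-pixel MAX can be sketched as below, with flattened CI arrays (one list per day) standing in for raster bands. Labeling weeks by ISO week number and ISO year is an assumption about how the `week_WW_YYYY` names are built. The two year conventions differ only for days near New Year, as the second example shows. The function names are hypothetical.

```python
from datetime import date


def week_label(day):
    """ISO-week (Monday to Sunday) label in the week_WW_YYYY style."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"week_{iso_week:02d}_{iso_year}"


def max_composite(daily_pixels):
    """Per-pixel MAX over a week of CI arrays, ignoring cloudy (None) pixels."""
    composite = []
    for values in zip(*daily_pixels):
        valid = [v for v in values if v is not None]
        composite.append(max(valid) if valid else None)  # None: cloudy all week
    return composite


print(week_label(date(2026, 2, 12)))   # week_07_2026
print(week_label(date(2025, 12, 30)))  # week_01_2026 (ISO year, not calendar year)
print(max_composite([[1.0, None, 0.5],
                     [1.4, 0.8, None],
                     [None, 0.6, 0.7]]))  # [1.4, 0.8, 0.7]
```
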

**Output Format**: Per-field weekly mosaics

```
weekly_mosaic/{FIELD}/week_WW_YYYY.tif
```

**Execution Context**:

- **SOBIT**: Via `40_mosaic_creation.sh`
- **Dev Laptop**:

```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/40_mosaic_creation_per_field.R 2026-02-19 7 angata
```


---

### Stage 80: KPI Calculation (R)

**Script**: `r_app/80_calculate_kpis.R`

**Inputs**:

- Current week mosaic: `weekly_mosaic/{FIELD}/week_WW_YYYY.tif`
- Previous weeks' mosaics (for trend analysis)
- Growth model data: `All_pivots_Cumulative_CI_quadrant_year_v2.rds`
- Field boundaries: `pivot.geojson`
- Harvest data: `harvest.xlsx`

**Key Processing**:

- **Client-type branching** (determined from the project name):
  - **agronomic_support** → sources `80_utils_agronomic_support.R`
    - Field uniformity KPI (CV + Moran's I)
    - Area change KPI
    - TCH forecast KPI
    - Growth decline KPI
    - Weed presence KPI
    - Gap filling KPI
  - **cane_supply** → sources `80_utils_cane_supply.R`
    - Per-field analysis (acreage, phase)
    - Phase assignment (age-based: germination, tillering, grand growth, maturation)
    - Harvest prediction (integrates `imminent_prob` from Python script 31, if available)
    - Status triggers

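The field uniformity KPI combines two standard statistics. The first is the coefficient of variation (CV = sd / mean). The second is global Moran's I, which is positive when similar CI values cluster spatially and negative when they alternate. The sketch computes both on a small CI grid using binary rook (4-neighbour) weights. The weighting scheme, the handling of masked cells, and how `80_utils_agronomic_support.R` turns the two numbers into a KPI are all assumptions here, as are the function names.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean over valid (non-None) pixels."""
    valid = [v for v in values if v is not None]
    return statistics.stdev(valid) / statistics.fmean(valid)


def morans_i(grid):
    """Global Moran's I on a 2-D CI grid with binary rook (4-neighbour) weights.

    I = (N / W) * sum_ij w_ij * z_i * z_j / sum_i z_i**2, where z are deviations
    from the mean and W is the number of neighbour pairs (counted both ways).
    Cells holding None (cloud, or outside the field) are skipped.
    """
    cells = {(r, c): v for r, row in enumerate(grid)
             for c, v in enumerate(row) if v is not None}
    n = len(cells)
    mean = sum(cells.values()) / n
    z = {key: v - mean for key, v in cells.items()}
    numerator, weight_sum = 0.0, 0
    for (r, c), zi in z.items():
        for neighbour in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if neighbour in z:
                numerator += zi * z[neighbour]
                weight_sum += 1
    denominator = sum(zi * zi for zi in z.values())
    if weight_sum == 0 or denominator == 0:
        return None  # no neighbours, or a perfectly uniform field
    return (n / weight_sum) * numerator / denominator


print(morans_i([[1.0, 2.0, 3.0, 4.0]]))    # smooth gradient: positive (about 0.333)
print(morans_i([[1.0, 0.0], [0.0, 1.0]]))  # checkerboard: -1.0
print(coefficient_of_variation([1.0, None, 3.0]))  # about 0.707
```
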

**Outputs**:

```
reports/{PROJECT}_field_analysis_week{WW}_{YYYY}.xlsx   # Excel - 21 columns, per-field
reports/kpis/{PROJECT}_kpi_summary_tables_week{WW}.rds   # RDS - Summary for rendering
```

**Execution Context**:

- **SOBIT**: Via `80_calculate_kpis.sh`
- **Dev Laptop**:

```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" r_app/80_calculate_kpis.R 2026-02-19 angata 7
```


---

### Stages 90/91: Report Rendering (R Markdown)

**Scripts**:

- `r_app/90_CI_report_with_kpis_agronomic_support.Rmd` (agronomic_support client type)
- `r_app/91_CI_report_with_kpis_cane_supply.Rmd` (cane_supply client type)

**Inputs**:

- Weekly mosaics: `weekly_mosaic/{FIELD}/week_*.tif`
- KPI summary: `reports/kpis/{PROJECT}_kpi_summary_tables_week{WW}.rds`
- Field boundaries: `pivot.geojson`
- CI time series: `combined_CI_data.rds`
- Growth model predictions (Script 91 only)

**Key Processing**:

**Script 90 (Agronomic Support)**:

- Field uniformity trend plots (CV over 8 weeks)
- Spatial autocorrelation maps (Moran's I)
- Interactive field boundary map (tmap)
- Farm-level KPI averages
- Colorblind-friendly palette

**Script 91 (Cane Supply)**:

- Per-field status alerts (harvest readiness, stress)
- Phase assignment table
- Tonnage forecasts (CI curves × historical harvest)
- Age-based harvest window predictions
- Urgent/warning/opportunity alerts

**Output Format**: Microsoft Word (.docx) with embedded tables, images, and charts

```
reports/SmartCane_Report_week{WW}_{YYYY}.docx
```

**Execution Context**:

- **SOBIT**: Via `90_kpi_report.sh` (calls `rmarkdown::render()`)
- **Dev Laptop**:

```powershell
& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" -e `
  "rmarkdown::render('r_app/90_CI_report_with_kpis_agronomic_support.Rmd', `
  params=list(data_dir='angata', report_date=as.Date('2026-02-19')), `
  output_file='SmartCane_Report_week07_2026.docx', `
  output_dir='laravel_app/storage/app/angata/reports')"
```


---

## Exit Points: User-Facing Outputs

| Output Type | Format | Location | Audience |
|-------------|--------|----------|----------|
| **Reports** | Word (.docx) | `reports/SmartCane_Report_*.docx` | Agronomist / Farm manager |
| **Field Analysis** | Excel (.xlsx) | `reports/{PROJECT}_field_analysis_week*.xlsx` | Data analyst / Operations |
| **GeoTIFFs** | 5-band raster | `weekly_mosaic/{FIELD}/week_*.tif` | GIS systems |
| **Predictions** | CSV | `harvest_imminent_weekly.csv` (Python 31 output) | Harvest scheduling |


---

## File Storage Architecture

```
laravel_app/storage/app/{PROJECT}/
├── merged_tif/
│   ├── 2026-02-12.tif                   ← Stage 00 output (Python download)
│   ├── 2026-02-13.tif
│   └── 2026-02-19.tif
│
├── field_tiles/                         ← Stage 10 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif
│   │   └── 2026-02-19.tif
│   ├── Field_002/
│   │   └── ...
│   └── ...
│
├── field_tiles_CI/                      ← Stage 20 output
│   ├── Field_001/
│   │   ├── 2026-02-12.tif               (5-band with CI)
│   │   └── 2026-02-19.tif
│   └── ...
│
├── Data/
│   ├── pivot.geojson                    ← Input: field boundaries
│   ├── harvest.xlsx                     ← Input: harvest dates (Stage 30 requirement)
│   ├── extracted_ci/
│   │   ├── daily_vals/
│   │   │   └── Field_001/2026-02-19.rds ← Stage 20 output
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds     ← Stage 20 output (wide format)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds  ← Stage 30 output
│   └── growth_model_interpolated/       ← Stage 30 output
│
├── weekly_mosaic/                       ← Stage 40 output
│   ├── Field_001/
│   │   ├── week_07_2026.tif             (5-band, MAX-aggregated)
│   │   └── week_06_2026.tif
│   └── ...
│
└── reports/                             ← Stages 80/90/91 output
    ├── SmartCane_Report_week07_2026.docx
    ├── angata_field_analysis_week07_2026.xlsx
    └── kpis/
        └── angata_kpi_summary_tables_week07.rds
```

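For scripted checks around the pipeline, such as confirming a stage's inputs exist before launching it, the layout above can be captured in a small path helper. This is a hypothetical Python sketch: `stage_paths` and its dictionary keys are invented for illustration. The week number is passed in explicitly because this document does not define how the scripts derive it from a report date.

```python
from datetime import date
from pathlib import PurePosixPath

STORAGE_ROOT = PurePosixPath("laravel_app/storage/app")


def stage_paths(project, field, day, week, year):
    """Expected file locations for one project/field/date, following the tree above."""
    base = STORAGE_ROOT / project
    iso = day.isoformat()  # YYYY-MM-DD
    ww = f"{week:02d}"     # zero-padded week number, e.g. 07
    return {
        "merged_tif": base / "merged_tif" / f"{iso}.tif",                   # Stage 00
        "field_tile": base / "field_tiles" / field / f"{iso}.tif",          # Stage 10
        "field_tile_ci": base / "field_tiles_CI" / field / f"{iso}.tif",    # Stage 20
        "daily_ci_rds": base / "Data" / "extracted_ci" / "daily_vals" / field / f"{iso}.rds",
        "weekly_mosaic": base / "weekly_mosaic" / field / f"week_{ww}_{year}.tif",  # Stage 40
        "field_analysis": base / "reports" / f"{project}_field_analysis_week{ww}_{year}.xlsx",
        "kpi_summary": base / "reports" / "kpis" / f"{project}_kpi_summary_tables_week{ww}.rds",
    }


paths = stage_paths("angata", "Field_001", date(2026, 2, 19), 7, 2026)
print(paths["weekly_mosaic"])
# laravel_app/storage/app/angata/weekly_mosaic/Field_001/week_07_2026.tif
```

`PurePosixPath` keeps forward slashes on every OS. On the dev laptop the same relative paths could be wrapped in `pathlib.Path` to test for existence.
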

---

## Data Format Reference

### RDS Files (R Serialized Objects)

**combined_CI_data.rds** (Stage 20 output)

- Type: data.frame
- Rows: Field names
- Columns: ISO dates (YYYY-MM-DD)
- Values: Mean Canopy Index per field-date

**All_pivots_Cumulative_CI_quadrant_year_v2.rds** (Stage 30 output)

- Type: data.frame
- Columns: field_name, date, interpolated_ci, daily_change, cumulative_ci, season, phase
- Used by: Scripts 80, 90/91, harvest prediction

**{PROJECT}_kpi_summary_tables_week{WW}.rds** (Stage 80 output)

- Type: list of data.frames
- Contains: Weekly KPI summaries for all fields
- Used by: Scripts 90/91 (report rendering)


### GeoTIFF Bands

**merged_tif/{DATE}.tif** (Stage 00, 4-band)

- Band 1: Red
- Band 2: Green
- Band 3: Blue
- Band 4: NIR

**field_tiles_CI/{FIELD}/{DATE}.tif** (Stage 20, 5-band)

- Bands 1-4: R, G, B, NIR (uint16)
- Band 5: Canopy Index (float32)

**weekly_mosaic/{FIELD}/week_WW_YYYY.tif** (Stage 40, 5-band)

- Bands 1-4: R, G, B, NIR (uint16, MAX of week)
- Band 5: CI (float32, MAX of week)

---

## Next Steps

- See [CLIENT_TYPE_ARCHITECTURE.md](CLIENT_TYPE_ARCHITECTURE.md) for how agronomic_support and cane_supply types branch
- See [SOBIT_DEPLOYMENT.md](SOBIT_DEPLOYMENT.md) for Laravel queue orchestration
- See [DEV_LAPTOP_EXECUTION.md](DEV_LAPTOP_EXECUTION.md) for manual execution workflow