# SmartCane Processing Pipeline - Complete Script Overview
## Pipeline Execution Order
## Complete Pipeline Mermaid Diagram
```mermaid
%% Complete Pipeline
graph TD
%% ===== INPUTS =====
API["Planet API<br/>Credentials"]
GeoJSON["pivot.geojson<br/>(Field Boundaries)"]
HarvestIn["harvest.xlsx<br/>(from Stage 23)"]
%% ===== STAGE 00: DOWNLOAD =====
Stage00["Stage 00: Python<br/>00_download_8band_pu_optimized.py"]
Out00["merged_tif/<br/>YYYY-MM-DD.tif<br/>(4-band or 8-band)<br/>(configurable)"]
%% ===== STAGE 10: OPTIONAL TILING =====
Stage10["Stage 10: R<br/>10_create_per_field_tiffs.R<br/>(Per-field extraction)"]
Out10["daily_tiles_split/per_field/<br/>YYYY-MM-DD/*.tif<br/>(one per field)"]
%% ===== STAGE 20: CI EXTRACTION =====
Stage20["Stage 20: R<br/>20_ci_extraction.R"]
Out20a["combined_CI_data.rds<br/>(wide: fields × dates)"]
Out20b["daily RDS files<br/>(per-date stats)"]
%% ===== STAGE 21: RDS → CSV =====
Stage21["Stage 21: R<br/>21_convert_ci_rds_to_csv.R"]
Out21["ci_data_for_python.csv<br/>(long format + DOY)"]
%% ===== STAGE 22: BASELINE HARVEST =====
Stage22["Stage 22: Python<br/>22_harvest_baseline_prediction.py<br/>(RUN ONCE)"]
Out22["harvest_production_export.xlsx<br/>(baseline predictions)"]
%% ===== STAGE 23: HARVEST FORMAT =====
Stage23["Stage 23: Python<br/>23_convert_harvest_format.py"]
Out23["harvest.xlsx<br/>(standard format)<br/>→ Feeds back to Stage 80"]
%% ===== STAGE 30: GROWTH MODEL =====
Stage30["Stage 30: R<br/>30_interpolate_growth_model.R"]
Out30["All_pivots_Cumulative_CI...<br/>_quadrant_year_v2.rds<br/>(interpolated daily)"]
%% ===== STAGE 31: WEEKLY HARVEST =====
Stage31["Stage 31: Python<br/>31_harvest_imminent_weekly.py<br/>(Weekly)"]
Out31["harvest_imminent_weekly.csv<br/>(probabilities)"]
%% ===== STAGE 40: MOSAIC =====
Stage40["Stage 40: R<br/>40_mosaic_creation.R"]
Out40["weekly_mosaic/<br/>week_WW_YYYY.tif<br/>(5-band composite)"]
%% ===== STAGE 80: KPI =====
Stage80["Stage 80: R<br/>80_calculate_kpis.R"]
Out80a["field_analysis_week{WW}.xlsx"]
Out80b["kpi_summary_tables_week{WW}.rds"]
%% ===== STAGE 90: REPORT =====
Stage90["Stage 90: R/RMarkdown<br/>90_CI_report_with_kpis_simple.Rmd"]
Out90["SmartCane_Report_week{WW}_{YYYY}.docx<br/>(FINAL OUTPUT)"]
%% ===== CONNECTIONS: INPUTS TO STAGE 00 =====
API --> Stage00
GeoJSON --> Stage00
%% ===== STAGE 00 → 10 OR 20 =====
Stage00 --> Out00
Out00 --> Stage10
Out00 --> Stage20
%% ===== STAGE 10 → 20 =====
Stage10 --> Out10
Out10 --> Stage20
%% ===== STAGE 20 → 21, 30, 40 =====
GeoJSON --> Stage20
Stage20 --> Out20a
Stage20 --> Out20b
Out20a --> Stage21
Out20a --> Stage30
Out00 --> Stage40
%% ===== STAGE 21 → 22, 31 =====
Stage21 --> Out21
Out21 --> Stage22
Out21 --> Stage31
%% ===== STAGE 22 → 23 =====
Stage22 --> Out22
Out22 --> Stage23
%% ===== STAGE 23 → 80 & FEEDBACK =====
Stage23 --> Out23
Out23 -.->|"Feeds back<br/>(Season context)"| Stage80
%% ===== STAGE 30 → 80 =====
Stage30 --> Out30
Out30 --> Stage80
%% ===== STAGE 31 (PARALLEL) =====
Stage31 --> Out31
%% ===== STAGE 40 → 80, 90 =====
Stage40 --> Out40
Out40 --> Stage80
Out40 --> Stage90
%% ===== STAGE 80 → 90 =====
Stage80 --> Out80a
Stage80 --> Out80b
Out80a --> Stage90
Out80b --> Stage90
%% ===== STAGE 90 FINAL =====
Stage90 --> Out90
%% ===== ADDITIONAL INPUTS =====
HarvestIn --> Stage30
HarvestIn --> Stage80
GeoJSON --> Stage30
GeoJSON --> Stage40
GeoJSON --> Stage80
%% ===== STYLING =====
classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef pyStage fill:#fff3e0,stroke:#f57c00,stroke-width:2px
classDef rStage fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef output fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef finalOutput fill:#ffebee,stroke:#c62828,stroke-width:3px
class API,GeoJSON,HarvestIn input
class Stage00,Stage22,Stage23,Stage31 pyStage
class Stage10,Stage20,Stage21,Stage30,Stage40,Stage80,Stage90 rStage
class Out00,Out10,Out20a,Out20b,Out21,Out22,Out30,Out31,Out40,Out80a,Out80b output
class Out23,Out90 finalOutput
```
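
The arrows in the diagram imply an execution order. The following is a minimal sketch of how one valid order could be derived with Python's standard `graphlib`. The `DEPENDS_ON` mapping and the `execution_order` helper are hypothetical names, transcribed by hand from the edges above; they are not part of the pipeline itself.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: stage -> stages whose outputs it consumes,
# transcribed by hand from the diagram edges (not read from the scripts).
DEPENDS_ON = {
    "00": set(),
    "10": {"00"},              # optional tiling step
    "20": {"00", "10"},
    "21": {"20"},
    "22": {"21"},              # run once to establish the baseline
    "23": {"22"},
    "30": {"20", "23"},        # harvest.xlsx is an optional input here
    "31": {"21"},
    "40": {"00"},
    "80": {"23", "30", "40"},
    "90": {"40", "80"},
}


def execution_order(graph=DEPENDS_ON):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order()))
```

Several orders satisfy the same constraints (Stage 31 and Stage 40, for example, are independent of each other), so the printed order is only one of the valid choices.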
---
## Detailed Stage Descriptions
```
Stage 00: PYTHON - Download Satellite Data
└─ 00_download_8band_pu_optimized.py
INPUT: Planet API credentials, field boundaries (pivot.geojson), date range
OUTPUT: laravel_app/storage/app/{project}/merged_tif/{YYYY-MM-DD}.tif (4-band or 8-band)
RUN FREQUENCY: Daily or as-needed
NOTES: Download script configures band count; consolidates to single merged_tif/ folder
Stage 10: R - Create Per-Field Daily Tiles
└─ 10_create_per_field_tiffs.R
INPUT: Daily GeoTIFFs from merged_tif/
Field boundaries (pivot.geojson)
OUTPUT: laravel_app/storage/app/{project}/daily_tiles_split/per_field/{YYYY-MM-DD}/*.tif
RUN FREQUENCY: Optional - per-field extraction for efficient memory use
NOTES: Creates one GeoTIFF per field per day
Stage 20: R - Extract Canopy Index (CI) from Daily Imagery
└─ 20_ci_extraction_per_field.R
INPUT: Daily GeoTIFFs (merged_tif/ or daily_tiles_split/per_field/)
Field boundaries (pivot.geojson)
OUTPUT: RDS files:
- laravel_app/storage/app/{project}/Data/extracted_ci/daily_vals/extracted_{YYYY-MM-DD}_{suffix}.rds
- laravel_app/storage/app/{project}/Data/extracted_ci/cumulative_vals/combined_CI_data.rds (wide format)
RUN FREQUENCY: Daily or on-demand
COMMAND: Rscript 20_ci_extraction_per_field.R [end_date] [offset] [project_dir] [data_source]
EXAMPLE: Rscript 20_ci_extraction_per_field.R 2026-01-02 7 angata merged_tif
NOTES: Auto-detects per-field tiles if daily_tiles_split/per_field/ exists; outputs cumulative CI (fields × dates)
Stage 21: R - Convert CI RDS to CSV for Python Harvest Detection
└─ 21_convert_ci_rds_to_csv.R
INPUT: combined_CI_data.rds (from Stage 20)
OUTPUT: laravel_app/storage/app/{project}/Data/extracted_ci/ci_data_for_python/ci_data_for_python.csv
RUN FREQUENCY: After Stage 20
COMMAND: Rscript 21_convert_ci_rds_to_csv.R [project_dir]
EXAMPLE: Rscript 21_convert_ci_rds_to_csv.R angata
NOTES: Converts wide RDS (fields × dates) to long CSV; interpolates missing dates; adds DOY column
Stage 22: PYTHON - Baseline Harvest Prediction (LSTM Model 307)
└─ 22_harvest_baseline_prediction.py
INPUT: ci_data_for_python.csv (complete historical CI data)
OUTPUT: laravel_app/storage/app/{project}/Data/HarvestData/harvest_production_export.xlsx
RUN FREQUENCY: ONCE - establishes ground truth baseline for all fields
COMMAND: python 22_harvest_baseline_prediction.py [project_name]
EXAMPLE: python 22_harvest_baseline_prediction.py angata
NOTES: Two-step detection (Phase 1: growing window, Phase 2: ±40-day argmax refinement)
Tuned parameters: threshold=0.3, consecutive_days=2
Uses LSTM Model 307 dual output heads (imminent + detected)
Stage 23: PYTHON - Convert Harvest Format to Standard Structure
└─ 23_convert_harvest_format.py
INPUT: harvest_production_export.xlsx (from Stage 22)
CI data date range (determines season_start for first season)
OUTPUT: laravel_app/storage/app/{project}/Data/harvest.xlsx (standard format)
RUN FREQUENCY: After Stage 22
COMMAND: python 23_convert_harvest_format.py [project_name]
EXAMPLE: python 23_convert_harvest_format.py angata
NOTES: Converts to standard harvest.xlsx format with columns:
field, sub_field, year, season, season_start, season_end, age, sub_area, tonnage_ha
Season format: "Data{year} : {field}"
Only includes completed seasons (with season_end filled)
Stage 30: R - Growth Model Interpolation (Smooth CI Time Series)
└─ 30_interpolate_growth_model.R
INPUT: combined_CI_data.rds (from Stage 20)
harvest.xlsx (optional, for seasonal context)
OUTPUT: laravel_app/storage/app/{project}/Data/extracted_ci/cumulative_vals/
All_pivots_Cumulative_CI_quadrant_year_v2.rds
RUN FREQUENCY: Weekly or after CI extraction updates
COMMAND: Rscript 30_interpolate_growth_model.R [project_dir]
EXAMPLE: Rscript 30_interpolate_growth_model.R angata
NOTES: Linear interpolation across gaps; calculates daily change and cumulative CI
Outputs long-format data (Date, DOY, field, value, season, etc.)
Stage 31: PYTHON - Weekly Harvest Monitoring (Real-Time Alerts)
└─ 31_harvest_imminent_weekly.py
INPUT: ci_data_for_python.csv (recent CI data, last ~300 days)
harvest_production_export.xlsx (optional baseline reference)
OUTPUT: laravel_app/storage/app/{project}/Data/HarvestData/harvest_imminent_weekly.csv
RUN FREQUENCY: Weekly or daily for operational alerts
COMMAND: python 31_harvest_imminent_weekly.py [project_name]
EXAMPLE: python 31_harvest_imminent_weekly.py angata
NOTES: Single-run inference on recent data; outputs probabilities (imminent_prob, detected_prob)
Used for real-time decision support; compared against baseline from Stage 22
Stage 40: R - Create Weekly 5-Band Mosaics
└─ 40_mosaic_creation_per_field.R
INPUT: Daily GeoTIFFs (merged_tif/ or daily_tiles_split/per_field/)
Field boundaries (pivot.geojson)
OUTPUT: laravel_app/storage/app/{project}/weekly_mosaic/week_{WW}_{YYYY}.tif
RUN FREQUENCY: Weekly
COMMAND: Rscript 40_mosaic_creation_per_field.R [end_date] [offset] [project_dir]
EXAMPLE: Rscript 40_mosaic_creation_per_field.R 2026-01-14 7 angata
NOTES: Composites daily images using MAX function; 5 bands (R, G, B, NIR, CI)
Automatically selects images with acceptable cloud coverage
Output uses ISO week numbering (week_WW_YYYY)
Stage 80: R - Calculate KPIs & Per-Field Analysis
└─ 80_calculate_kpis.R
INPUT: Weekly mosaic (from Stage 40)
Growth model data (from Stage 30)
Field boundaries (pivot.geojson)
Harvest data (harvest.xlsx)
OUTPUT: laravel_app/storage/app/{project}/reports/
- {project}_field_analysis_week{WW}.xlsx
- {project}_kpi_summary_tables_week{WW}.rds
RUN FREQUENCY: Weekly
COMMAND: Rscript 80_calculate_kpis.R [end_date] [project_dir] [offset_days]
EXAMPLE: Rscript 80_calculate_kpis.R 2026-01-14 angata 7
NOTES: Parallel processing for 1000+ fields; calculates:
- Per-field uniformity (CV), phase assignment, growth trends
- Status triggers (germination, rapid growth, disease, harvest imminence)
- Farm-level KPI metrics (6 high-level indicators)
TEST_MODE=TRUE uses only recent weeks for development
Stage 90: R (RMarkdown) - Generate Executive Report (Word Document)
└─ 90_CI_report_with_kpis_simple.Rmd
INPUT: Weekly mosaic (from Stage 40)
KPI summary data (from Stage 80)
Field analysis (from Stage 80)
Field boundaries & harvest data (for context)
OUTPUT: laravel_app/storage/app/{project}/reports/
SmartCane_Report_week{WW}_{YYYY}.docx (PRIMARY OUTPUT)
SmartCane_Report_week{WW}_{YYYY}.html (optional)
RUN FREQUENCY: Weekly
RENDERING: R/RMarkdown with officer + flextable packages
NOTES: Executive summary with KPI overview, phase distribution, status triggers
Field-by-field detail pages with CI metrics and interpretation guides
Automatic unit conversion (hectares → acres)
```
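
To illustrate the wide-to-long reshaping described for Stage 21, here is a minimal Python sketch. It is not the R implementation: the function names, the nested-dict input shape, and the reading of `DOY` as calendar day of year are all assumptions, and the gap interpolation that Stage 21 performs is left out.

```python
import csv
import io
from datetime import date


def wide_to_long(wide):
    """Flatten {field: {"YYYY-MM-DD": ci_or_None}} into long-format rows.

    Missing observations (None) are skipped here; the real Stage 21
    interpolates them instead.
    """
    rows = []
    for field, by_date in wide.items():
        for day_str, value in sorted(by_date.items()):
            if value is None:
                continue
            day = date.fromisoformat(day_str)
            rows.append({
                "field": field,
                "Date": day_str,
                "DOY": day.timetuple().tm_yday,  # assumed: calendar day of year
                "value": value,
            })
    return rows


def rows_to_csv(rows):
    """Serialize long-format rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["field", "Date", "DOY", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = {"F1": {"2026-01-01": None, "2026-01-02": 1.5}}
    print(rows_to_csv(wide_to_long(sample)))
```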
---
## Data Storage & Persistence
All data persists to the file system. No database writes occur during analysis; the database is read only for metadata.
```
laravel_app/storage/app/{project}/
├── Data/
│   ├── pivot.geojson                          # Field boundaries (read-only input)
│   ├── harvest.xlsx                           # Season dates & yield (standard format from Stage 23)
│   ├── vrt/                                   # Virtual raster files (daily VRTs from Stage 20)
│   │   └── YYYY-MM-DD.vrt
│   ├── extracted_ci/
│   │   ├── ci_data_for_python/
│   │   │   └── ci_data_for_python.csv         # CSV for Python (from Stage 21)
│   │   ├── daily_vals/
│   │   │   └── extracted_YYYY-MM-DD_{suffix}.rds   # Daily field CI stats (from Stage 20)
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds           # Cumulative CI, wide format (from Stage 20)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds   # Interpolated daily (from Stage 30)
│   └── HarvestData/
│       ├── harvest_production_export.xlsx     # Baseline harvest predictions (from Stage 22)
│       └── harvest_imminent_weekly.csv        # Weekly monitoring output (from Stage 31)
│
├── merged_tif/                                # Raw satellite imagery (Stage 00 output)
│   └── YYYY-MM-DD.tif                         # 4-band or 8-band (configurable via download script)
│
├── daily_tiles_split/                         # (Optional) Per-field tile processing (Stage 10 output)
│   └── per_field/
│       └── YYYY-MM-DD/                        # Date-specific folder
│           └── {FIELD}_YYYY-MM-DD.tif         # One per-field GeoTIFF per day
│
├── weekly_mosaic/                             # Weekly composite mosaics (Stage 40 output)
│   └── week_WW_YYYY.tif                       # 5 bands: R, G, B, NIR, CI (composite)
│
└── reports/                                   # Analysis outputs & reports (Stage 80, 90 outputs)
    ├── SmartCane_Report_week{WW}_{YYYY}.docx  # FINAL REPORT (Stage 90)
    ├── SmartCane_Report_week{WW}_{YYYY}.html  # Alternative format
    ├── {project}_field_analysis_week{WW}.xlsx # Field-by-field data (Stage 80)
    ├── {project}_kpi_summary_tables_week{WW}.rds   # Summary RDS (Stage 80)
    └── kpis/
        └── week_WW_YYYY/                      # Week-specific KPI folder
```
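
The naming conventions in the tree can be captured in small path helpers. The sketch below is illustrative only: `STORAGE_ROOT`, `merged_tif_path`, and `weekly_mosaic_path` are hypothetical names, and the use of the ISO year together with a zero-padded ISO week is an assumption based on the `week_WW_YYYY` pattern and the ISO week numbering noted for Stage 40.

```python
from datetime import date
from pathlib import PurePosixPath

# Root taken from the layout above; relative to the repository checkout.
STORAGE_ROOT = PurePosixPath("laravel_app/storage/app")


def merged_tif_path(project, day):
    """Path of the daily merged GeoTIFF written by Stage 00."""
    return STORAGE_ROOT / project / "merged_tif" / f"{day.isoformat()}.tif"


def weekly_mosaic_path(project, day):
    """Path of the weekly mosaic covering `day` (assumed ISO week and ISO year)."""
    iso_year, iso_week, _weekday = day.isocalendar()
    name = f"week_{iso_week:02d}_{iso_year}.tif"
    return STORAGE_ROOT / project / "weekly_mosaic" / name


if __name__ == "__main__":
    print(merged_tif_path("angata", date(2026, 1, 14)))
    print(weekly_mosaic_path("angata", date(2026, 1, 14)))
```

Note that dates near New Year can belong to an ISO week of the neighbouring year, so the year in the file name may differ from the calendar year of the imagery if the pipeline follows the same convention.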
---
## Key File Formats
| Format | Stage | Purpose | Example |
|--------|-------|---------|---------|
| `.tif` (GeoTIFF) | 00, 10, 40 | Geospatial raster imagery | `2026-01-14.tif` (4-band), `week_02_2026.tif` (5-band) |
| `.vrt` (Virtual Raster) | 20 | Virtual pointer to TIFFs | `2026-01-14.vrt` |
| `.rds` (R Binary) | 20, 21, 30, 80 | R serialized data objects | `combined_CI_data.rds`, `All_pivots_Cumulative_CI_quadrant_year_v2.rds` |
| `.csv` (Comma-Separated) | 21, 31 | Tabular data for Python | `ci_data_for_python.csv`, `harvest_imminent_weekly.csv` |
| `.xlsx` (Excel) | 22, 23, 80 | Tabular reports & harvest data | `harvest.xlsx`, `harvest_production_export.xlsx`, field analysis |
| `.docx` (Word) | 90 | Executive report (final output) | `SmartCane_Report_week02_2026.docx` |
| `.json` | 10 | Tiling metadata | `tiling_config.json` |
| `.geojson` | Input | Field boundaries (read-only) | `pivot.geojson` |
---
## Script Dependencies & Utility Files
```
parameters_project.R
├─ Loaded by: 20_ci_extraction.R, 30_interpolate_growth_model.R,
│             40_mosaic_creation.R, 80_calculate_kpis.R, 90_CI_report_with_kpis_simple.Rmd
└─ Purpose: Initializes project config (paths, field boundaries, harvest data)

harvest_date_pred_utils.py
├─ Used by: 22_harvest_baseline_prediction.py, 23_convert_harvest_format.py, 31_harvest_imminent_weekly.py
└─ Purpose: LSTM model loading, feature extraction, two-step harvest detection

20_ci_extraction_utils.R
├─ Used by: 20_ci_extraction.R
└─ Purpose: CI calculation, field masking, RDS I/O, tile detection

30_growth_model_utils.R
├─ Used by: 30_interpolate_growth_model.R
└─ Purpose: Linear interpolation, daily metrics, seasonal grouping

40_mosaic_creation_utils.R, 40_mosaic_creation_tile_utils.R
├─ Used by: 40_mosaic_creation.R
└─ Purpose: Weekly composite creation, cloud assessment, raster masking

kpi_utils.R
├─ Used by: 80_calculate_kpis.R
└─ Purpose: Per-field statistics, phase assignment, trigger detection

report_utils.R
├─ Used by: 90_CI_report_with_kpis_simple.Rmd
└─ Purpose: Report building, table formatting, Word document generation
```
---
## Command-Line Execution Examples
### Daily/Weekly Workflow
```bash
# Stage 00: Download today's satellite data
cd python_app
python 00_download_8band_pu_optimized.py angata --cleanup
# Stage 20: Extract CI from daily imagery (last 7 days)
cd ../r_app
Rscript 20_ci_extraction_per_field.R 2026-01-14 7 angata merged_tif
# Stage 21: Convert CI to CSV for harvest detection
Rscript 21_convert_ci_rds_to_csv.R angata
# Stage 31: Weekly harvest monitoring (real-time alerts)
cd ../python_app
python 31_harvest_imminent_weekly.py angata
# Back to R for mosaic and KPIs
cd ../r_app
Rscript 40_mosaic_creation.R 2026-01-14 7 angata
Rscript 80_calculate_kpis.R 2026-01-14 angata 7
# Stage 90: Generate report
Rscript -e "rmarkdown::render('90_CI_report_with_kpis_simple.Rmd')"
```
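
For unattended runs, the same sequence could be driven from a small Python wrapper. This is a sketch under assumptions, not an existing script: `weekly_commands` and `run_all` are hypothetical helpers, the script names and arguments are copied from the shell block above, and the `python_app/` and `r_app/` directories are assumed to sit side by side under the current working directory.

```python
import subprocess


def weekly_commands(end_date, project, offset=7):
    """Return (working_dir, argv) pairs mirroring the shell workflow above."""
    days = str(offset)
    return [
        ("python_app", ["python", "00_download_8band_pu_optimized.py", project, "--cleanup"]),
        ("r_app", ["Rscript", "20_ci_extraction_per_field.R", end_date, days, project, "merged_tif"]),
        ("r_app", ["Rscript", "21_convert_ci_rds_to_csv.R", project]),
        ("python_app", ["python", "31_harvest_imminent_weekly.py", project]),
        ("r_app", ["Rscript", "40_mosaic_creation.R", end_date, days, project]),
        ("r_app", ["Rscript", "80_calculate_kpis.R", end_date, project, days]),
        ("r_app", ["Rscript", "-e", "rmarkdown::render('90_CI_report_with_kpis_simple.Rmd')"]),
    ]


def run_all(commands, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for cwd, argv in commands:
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the run at the first failing stage.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_all(weekly_commands("2026-01-14", "angata"), dry_run=True)
```

Stopping at the first failure is a deliberate choice here, since later stages read the files that earlier stages write.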
### One-Time Setup (Baseline Harvest Detection)
```bash
# Only run ONCE to establish baseline
cd python_app
python 22_harvest_baseline_prediction.py angata
# Convert to standard format
python 23_convert_harvest_format.py angata
```
---
## Processing Notes
### CI Extraction (Stage 20)
- Calculates CI = (NIR - Green) / (NIR + Green)
- Supports both 4-band and 8-band imagery with auto-detection
- Handles cloud masking via UDM band (8-band) or manual thresholding (4-band)
- Outputs cumulative RDS in wide format (fields × dates) for fast lookups
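
As a worked illustration of the CI formula stated above, the following Python sketch computes the index per pixel and averages it over a field. The function names and the list-of-pairs input are assumptions for illustration; the actual extraction runs in R on raster data with cloud masking, which is omitted here.

```python
from typing import Iterable, Optional, Tuple


def canopy_index(nir: float, green: float) -> Optional[float]:
    """CI = (NIR - Green) / (NIR + Green); None where the denominator is zero."""
    denominator = nir + green
    if denominator == 0:
        return None
    return (nir - green) / denominator


def field_mean_ci(pixels: Iterable[Tuple[float, float]]) -> Optional[float]:
    """Mean CI over (nir, green) pixel pairs, skipping undefined pixels."""
    values = []
    for nir, green in pixels:
        ci = canopy_index(nir, green)
        if ci is not None:
            values.append(ci)
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Two valid pixels and one empty pixel (both bands zero).
    print(field_mean_ci([(3.0, 1.0), (1.0, 1.0), (0.0, 0.0)]))
```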
### Growth Model (Stage 30)
- Linear interpolation across missing dates
- Maintains seasonal context for agricultural lifecycle tracking
- Outputs long-format data for trend analysis
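
The interpolation step can be pictured with a short Python sketch. It assumes a single field with sparse observations and fills each gap linearly, then derives a daily change and a running cumulative CI. The function name and the `daily_change` and `cumulative_CI` column names are hypothetical, and treating `DOY` as calendar day of year is an assumption; the production logic lives in the Stage 30 R scripts.

```python
from datetime import date, timedelta


def interpolate_daily(observations):
    """Fill gaps between observed dates linearly and add derived columns.

    observations: {date: ci_value} for one field.
    Returns one row per calendar day between the first and last observation.
    """
    known = sorted(observations.items())
    if not known:
        return []

    series = []
    for (d0, v0), (d1, v1) in zip(known, known[1:]):
        span = (d1 - d0).days
        for step in range(span):
            # Straight line between the two surrounding observations.
            series.append((d0 + timedelta(days=step), v0 + (v1 - v0) * step / span))
    series.append(known[-1])

    rows, cumulative, previous = [], 0.0, None
    for day, value in series:
        cumulative += value
        rows.append({
            "Date": day,
            "DOY": day.timetuple().tm_yday,
            "value": value,
            "daily_change": 0.0 if previous is None else value - previous,
            "cumulative_CI": cumulative,
        })
        previous = value
    return rows


if __name__ == "__main__":
    for row in interpolate_daily({date(2026, 1, 1): 1.0, date(2026, 1, 5): 3.0}):
        print(row)
```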
### Harvest Detection (Stages 22 & 31)
- **Model 307**: Unidirectional LSTM with dual output heads
  - Imminent Head: Probability that the field will be harvestable within the next 28 days
  - Detected Head: Probability of an immediate harvest event
- **Stage 22 (Baseline)**: Two-step detection on complete historical data
  - Phase 1: Growing window expansion (real-time simulation)
  - Phase 2: ±40-day refinement (argmax of the harvest signal)
- **Stage 31 (Weekly)**: Single-run inference on recent data (~300 days)
  - Compares against baseline for anomaly detection
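
One way to picture the two-step logic is the Python sketch below. It is a simplification under stated assumptions: it works on an already computed list of daily probabilities instead of re-running the LSTM on growing windows, and both function names are hypothetical. The defaults `threshold=0.3`, `consecutive_days=2`, and the 40-day window come from the Stage 22 notes; how the real code combines them may differ.

```python
def first_trigger(probs, threshold=0.3, consecutive_days=2):
    """Index where probs first stays >= threshold for consecutive_days, else None."""
    run = 0
    for i, p in enumerate(probs):
        run = run + 1 if p >= threshold else 0
        if run >= consecutive_days:
            return i - consecutive_days + 1  # first day of the qualifying run
    return None


def refine_by_argmax(probs, trigger, window=40):
    """Index of the highest probability within +/- window days of the trigger."""
    lo = max(0, trigger - window)
    hi = min(len(probs), trigger + window + 1)
    return max(range(lo, hi), key=lambda i: probs[i])


if __name__ == "__main__":
    daily = [0.1, 0.4, 0.1, 0.35, 0.5, 0.9, 0.2]
    start = first_trigger(daily)
    if start is not None:
        print(start, refine_by_argmax(daily, start))
```

In this toy series the isolated spike on day 1 is ignored because it does not persist for two days; the trigger lands on day 3 and the refinement moves the estimate to the peak on day 5.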
### KPI Calculation (Stage 80)
- **Per-field metrics**: Uniformity (CV), phase, growth trends, 4-week trends
- **Status triggers**: Germination, rapid growth, slow growth, non-uniform, weed pressure, harvest imminence
- **Farm-level KPIs**: 6 high-level indicators for executive summary
- **Parallel processing**: 1000+ fields processed in under 5 minutes
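
The uniformity metric is a coefficient of variation (CV), that is, the standard deviation of a field's CI values divided by their mean. A minimal Python sketch follows; the function name is hypothetical, and the choice of the population standard deviation is an assumption, since the text does not say which variant `kpi_utils.R` uses.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean; None for empty input or a zero mean."""
    values = list(values)
    if not values:
        return None
    average = mean(values)
    if average == 0:
        return None
    # Population standard deviation (assumed; a sample estimate would use stdev).
    return pstdev(values) / average


if __name__ == "__main__":
    print(coefficient_of_variation([2.0, 2.0, 2.0]))  # perfectly uniform field
    print(coefficient_of_variation([1.0, 3.0]))       # patchy field
```

A lower CV indicates a more uniform canopy, so a per-field threshold on this value is one plausible basis for the non-uniform status trigger.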
---
## Future Enhancements
- **Real-Time Monitoring**: Daily harvest probability updates integrated into web dashboard
- **SAR Integration**: Radar satellite data (Sentinel-1) for all-weather monitoring
- **IoT Sensors**: Ground-based soil moisture and weather integration
- **Advanced Yield Models**: Enhanced harvest forecasting with satellite + ground truth
- **Automated Alerts**: WhatsApp/email dispatch of critical agricultural advice