# SmartCane Processing Pipeline - Complete Script Overview

## Pipeline Execution Order

## Complete Pipeline Mermaid Diagram

```mermaid
%% Complete Pipeline
graph TD
    %% ===== INPUTS =====
    API["🔑 Planet API<br/>Credentials"]
    GeoJSON["🗺️ pivot.geojson<br/>(Field Boundaries)"]
    HarvestIn["📊 harvest.xlsx<br/>(from Stage 23)"]

    %% ===== STAGE 00: DOWNLOAD =====
    Stage00["Stage 00: Python<br/>00_download_8band_pu_optimized.py"]
    Out00["📦 merged_tif/<br/>YYYY-MM-DD.tif<br/>(4-band or 8-band)<br/>(configurable)"]

    %% ===== STAGE 10: OPTIONAL TILING =====
    Stage10["Stage 10: R<br/>10_create_per_field_tiffs.R<br/>(Per-field extraction)"]
    Out10["📦 daily_tiles_split/per_field/<br/>YYYY-MM-DD/*.tif<br/>(one per field)"]

    %% ===== STAGE 20: CI EXTRACTION =====
    Stage20["Stage 20: R<br/>20_ci_extraction.R"]
    Out20a["📦 combined_CI_data.rds<br/>(wide: fields × dates)"]
    Out20b["📦 daily RDS files<br/>(per-date stats)"]

    %% ===== STAGE 21: RDS → CSV =====
    Stage21["Stage 21: R<br/>21_convert_ci_rds_to_csv.R"]
    Out21["📦 ci_data_for_python.csv<br/>(long format + DOY)"]

    %% ===== STAGE 22: BASELINE HARVEST =====
    Stage22["Stage 22: Python<br/>22_harvest_baseline_prediction.py<br/>(RUN ONCE)"]
    Out22["📦 harvest_production_export.xlsx<br/>(baseline predictions)"]

    %% ===== STAGE 23: HARVEST FORMAT =====
    Stage23["Stage 23: Python<br/>23_convert_harvest_format.py"]
    Out23["📦 harvest.xlsx<br/>(standard format)<br/>→ Feeds back to Stage 80"]

    %% ===== STAGE 30: GROWTH MODEL =====
    Stage30["Stage 30: R<br/>30_interpolate_growth_model.R"]
    Out30["📦 All_pivots_Cumulative_CI...<br/>_quadrant_year_v2.rds<br/>(interpolated daily)"]

    %% ===== STAGE 31: WEEKLY HARVEST =====
    Stage31["Stage 31: Python<br/>31_harvest_imminent_weekly.py<br/>(Weekly)"]
    Out31["📦 harvest_imminent_weekly.csv<br/>(probabilities)"]

    %% ===== STAGE 40: MOSAIC =====
    Stage40["Stage 40: R<br/>40_mosaic_creation.R"]
    Out40["📦 weekly_mosaic/<br/>week_WW_YYYY.tif<br/>(5-band composite)"]

    %% ===== STAGE 80: KPI =====
    Stage80["Stage 80: R<br/>80_calculate_kpis.R"]
    Out80a["📦 field_analysis_week{WW}.xlsx"]
    Out80b["📦 kpi_summary_tables_week{WW}.rds"]

    %% ===== STAGE 90: REPORT =====
    Stage90["Stage 90: R/RMarkdown<br/>90_CI_report_with_kpis_simple.Rmd"]
    Out90["📦 SmartCane_Report_week{WW}_{YYYY}.docx<br/>(FINAL OUTPUT)"]

    %% ===== CONNECTIONS: INPUTS TO STAGE 00 =====
    API --> Stage00
    GeoJSON --> Stage00

    %% ===== STAGE 00 → 10 OR 20 =====
    Stage00 --> Out00
    Out00 --> Stage10
    Out00 --> Stage20

    %% ===== STAGE 10 → 20 =====
    Stage10 --> Out10
    Out10 --> Stage20

    %% ===== STAGE 20 → 21, 30, 40 =====
    GeoJSON --> Stage20
    Stage20 --> Out20a
    Stage20 --> Out20b
    Out20a --> Stage21
    Out20a --> Stage30
    Out00 --> Stage40

    %% ===== STAGE 21 → 22, 31 =====
    Stage21 --> Out21
    Out21 --> Stage22
    Out21 --> Stage31

    %% ===== STAGE 22 → 23 =====
    Stage22 --> Out22
    Out22 --> Stage23

    %% ===== STAGE 23 → 80 & FEEDBACK =====
    Stage23 --> Out23
    Out23 -.->|"Feeds back<br/>(Season context)"| Stage80

    %% ===== STAGE 30 → 80 =====
    Stage30 --> Out30
    Out30 --> Stage80

    %% ===== STAGE 31 (PARALLEL) =====
    Stage31 --> Out31

    %% ===== STAGE 40 → 80, 90 =====
    Stage40 --> Out40
    Out40 --> Stage80
    Out40 --> Stage90

    %% ===== STAGE 80 → 90 =====
    Stage80 --> Out80a
    Stage80 --> Out80b
    Out80a --> Stage90
    Out80b --> Stage90

    %% ===== STAGE 90 FINAL =====
    Stage90 --> Out90

    %% ===== ADDITIONAL INPUTS =====
    HarvestIn --> Stage30
    HarvestIn --> Stage80
    GeoJSON --> Stage30
    GeoJSON --> Stage40
    GeoJSON --> Stage80

    %% ===== STYLING =====
    classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef pyStage fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    classDef rStage fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef output fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    classDef finalOutput fill:#ffebee,stroke:#c62828,stroke-width:3px

    class API,GeoJSON,HarvestIn input
    class Stage00,Stage22,Stage23,Stage31 pyStage
    class Stage10,Stage20,Stage21,Stage30,Stage40,Stage80,Stage90 rStage
    class Out00,Out10,Out20a,Out20b,Out21,Out22,Out30,Out31,Out40,Out80a,Out80b output
    class Out23,Out90 finalOutput
```

---

## Detailed Stage Descriptions

```
Stage 00: PYTHON - Download Satellite Data
  └─ 00_download_8band_pu_optimized.py
     INPUT:  Planet API credentials, field boundaries (pivot.geojson), date range
     OUTPUT: laravel_app/storage/app/{project}/merged_tif/{YYYY-MM-DD}.tif (4-band or 8-band)
     RUN FREQUENCY: Daily or as-needed
     NOTES:  Download script configures band count; consolidates to single merged_tif/ folder

Stage 10: R - Create Per-Field Daily Tiles
  └─ 10_create_per_field_tiffs.R
     INPUT:  Daily GeoTIFFs from merged_tif/
             Field boundaries (pivot.geojson)
     OUTPUT: laravel_app/storage/app/{project}/daily_tiles_split/per_field/{YYYY-MM-DD}/*.tif
     RUN FREQUENCY: Optional - per-field extraction for efficient memory use
     NOTES:  Creates one GeoTIFF per field per day

Stage 20: R - Extract Canopy Index (CI) from Daily Imagery
  └─ 20_ci_extraction_per_field.R
     INPUT:  Daily GeoTIFFs
             (merged_tif/ or daily_tiles_split/per_field/)
             Field boundaries (pivot.geojson)
     OUTPUT: RDS files:
             - laravel_app/storage/app/{project}/Data/extracted_ci/daily_vals/extracted_{YYYY-MM-DD}_{suffix}.rds
             - laravel_app/storage/app/{project}/Data/extracted_ci/cumulative_vals/combined_CI_data.rds (wide format)
     RUN FREQUENCY: Daily or on-demand
     COMMAND: Rscript 20_ci_extraction_per_field.R [end_date] [offset] [project_dir] [data_source]
     EXAMPLE: Rscript 20_ci_extraction_per_field.R 2026-01-02 7 angata merged_tif
     NOTES:  Auto-detects per-field tiles if daily_tiles_split/per_field/ exists;
             outputs cumulative CI (fields × dates)

Stage 21: R - Convert CI RDS to CSV for Python Harvest Detection
  └─ 21_convert_ci_rds_to_csv.R
     INPUT:  combined_CI_data.rds (from Stage 20)
     OUTPUT: laravel_app/storage/app/{project}/Data/extracted_ci/ci_data_for_python/ci_data_for_python.csv
     RUN FREQUENCY: After Stage 20
     COMMAND: Rscript 21_convert_ci_rds_to_csv.R [project_dir]
     EXAMPLE: Rscript 21_convert_ci_rds_to_csv.R angata
     NOTES:  Converts wide RDS (fields × dates) to long CSV; interpolates missing dates; adds DOY column

Stage 22: PYTHON - Baseline Harvest Prediction (LSTM Model 307)
  └─ 22_harvest_baseline_prediction.py
     INPUT:  ci_data_for_python.csv (complete historical CI data)
     OUTPUT: laravel_app/storage/app/{project}/Data/HarvestData/harvest_production_export.xlsx
     RUN FREQUENCY: ONCE - establishes ground truth baseline for all fields
     COMMAND: python 22_harvest_baseline_prediction.py [project_name]
     EXAMPLE: python 22_harvest_baseline_prediction.py angata
     NOTES:  Two-step detection (Phase 1: growing window, Phase 2: ±40 day argmax refinement)
             Tuned parameters: threshold=0.3, consecutive_days=2
             Uses LSTM Model 307 dual output heads (imminent + detected)

Stage 23: PYTHON - Convert Harvest Format to Standard Structure
  └─ 23_convert_harvest_format.py
     INPUT:  harvest_production_export.xlsx (from Stage 22)
             CI data date range (determines season_start for first season)
     OUTPUT: laravel_app/storage/app/{project}/Data/harvest.xlsx (standard format)
     RUN FREQUENCY: After Stage 22
     COMMAND: python 23_convert_harvest_format.py [project_name]
     EXAMPLE: python 23_convert_harvest_format.py angata
     NOTES:  Converts to standard harvest.xlsx format with columns:
             field, sub_field, year, season, season_start, season_end, age, sub_area, tonnage_ha
             Season format: "Data{year} : {field}"
             Only includes completed seasons (with season_end filled)

Stage 30: R - Growth Model Interpolation (Smooth CI Time Series)
  └─ 30_interpolate_growth_model.R
     INPUT:  combined_CI_data.rds (from Stage 20)
             harvest.xlsx (optional, for seasonal context)
     OUTPUT: laravel_app/storage/app/{project}/Data/extracted_ci/cumulative_vals/
             All_pivots_Cumulative_CI_quadrant_year_v2.rds
     RUN FREQUENCY: Weekly or after CI extraction updates
     COMMAND: Rscript 30_interpolate_growth_model.R [project_dir]
     EXAMPLE: Rscript 30_interpolate_growth_model.R angata
     NOTES:  Linear interpolation across gaps; calculates daily change and cumulative CI
             Outputs long-format data (Date, DOY, field, value, season, etc.)

Stage 31: PYTHON - Weekly Harvest Monitoring (Real-Time Alerts)
  └─ 31_harvest_imminent_weekly.py
     INPUT:  ci_data_for_python.csv (recent CI data, last ~300 days)
             harvest_production_export.xlsx (optional baseline reference)
     OUTPUT: laravel_app/storage/app/{project}/Data/HarvestData/harvest_imminent_weekly.csv
     RUN FREQUENCY: Weekly or daily for operational alerts
     COMMAND: python 31_harvest_imminent_weekly.py [project_name]
     EXAMPLE: python 31_harvest_imminent_weekly.py angata
     NOTES:  Single-run inference on recent data; outputs probabilities (imminent_prob, detected_prob)
             Used for real-time decision support; compared against baseline from Stage 22

Stage 40: R - Create Weekly 5-Band Mosaics
  └─ 40_mosaic_creation_per_field.R
     INPUT:  Daily GeoTIFFs (merged_tif/ or daily_tiles_split/per_field/)
             Field boundaries (pivot.geojson)
     OUTPUT: laravel_app/storage/app/{project}/weekly_mosaic/week_{WW}_{YYYY}.tif
     RUN FREQUENCY: Weekly
     COMMAND: Rscript 40_mosaic_creation_per_field.R [end_date] [offset] [project_dir]
     EXAMPLE: Rscript 40_mosaic_creation_per_field.R 2026-01-14 7 angata
     NOTES:  Composites daily images using MAX function; 5 bands (R, G, B, NIR, CI)
             Automatically selects images with acceptable cloud coverage
             Output uses ISO week numbering (week_WW_YYYY)

Stage 80: R - Calculate KPIs & Per-Field Analysis
  └─ 80_calculate_kpis.R
     INPUT:  Weekly mosaic (from Stage 40)
             Growth model data (from Stage 30)
             Field boundaries (pivot.geojson)
             Harvest data (harvest.xlsx)
     OUTPUT: laravel_app/storage/app/{project}/reports/
             - {project}_field_analysis_week{WW}.xlsx
             - {project}_kpi_summary_tables_week{WW}.rds
     RUN FREQUENCY: Weekly
     COMMAND: Rscript 80_calculate_kpis.R [end_date] [project_dir] [offset_days]
     EXAMPLE: Rscript 80_calculate_kpis.R 2026-01-14 angata 7
     NOTES:  Parallel processing for 1000+ fields; calculates:
             - Per-field uniformity (CV), phase assignment, growth trends
             - Status triggers (germination, rapid growth, disease, harvest imminence)
             - Farm-level KPI metrics (6 high-level indicators)
             TEST_MODE=TRUE uses only recent weeks for development

Stage 90: R (RMarkdown) - Generate Executive Report (Word Document)
  └─ 90_CI_report_with_kpis_simple.Rmd
     INPUT:  Weekly mosaic (from Stage 40)
             KPI summary data (from Stage 80)
             Field analysis (from Stage 80)
             Field boundaries & harvest data (for context)
     OUTPUT: laravel_app/storage/app/{project}/reports/
             SmartCane_Report_week{WW}_{YYYY}.docx (PRIMARY OUTPUT)
             SmartCane_Report_week{WW}_{YYYY}.html (optional)
     RUN FREQUENCY: Weekly
     RENDERING: R/RMarkdown with officer + flextable packages
     NOTES:  Executive summary with KPI overview, phase distribution, status triggers
             Field-by-field detail pages with CI metrics and interpretation guides
             Automatic unit conversion (hectares ↔ acres)
```

---

## Data Storage & Persistence

All data persists to the file system. No database writes occur during analysis; the database is read only for metadata.

```
laravel_app/storage/app/{project}/
├── Data/
│   ├── pivot.geojson                      # Field boundaries (read-only input)
│   ├── harvest.xlsx                       # Season dates & yield (standard format from Stage 23)
│   ├── vrt/                               # Virtual raster files (daily VRTs from Stage 20)
│   │   └── YYYY-MM-DD.vrt
│   ├── extracted_ci/
│   │   ├── ci_data_for_python/
│   │   │   └── ci_data_for_python.csv     # CSV for Python (from Stage 21)
│   │   ├── daily_vals/
│   │   │   └── extracted_YYYY-MM-DD_{suffix}.rds   # Daily field CI stats (from Stage 20)
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds       # Cumulative CI, wide format (from Stage 20)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds   # Interpolated daily (from Stage 30)
│   └── HarvestData/
│       ├── harvest_production_export.xlsx # Baseline harvest predictions (from Stage 22)
│       └── harvest_imminent_weekly.csv    # Weekly monitoring output (from Stage 31)
│
├── merged_tif/                            # Raw satellite imagery (Stage 00 output)
│   └── YYYY-MM-DD.tif                     # 4-band or 8-band (configurable via download script)
│
├── daily_tiles_split/                     # (Optional) Per-field tile processing (Stage 10 output)
│   └── per_field/
│       └── YYYY-MM-DD/                    # Date-specific folder
│           └── {FIELD}_YYYY-MM-DD.tif     # One per-field GeoTIFF per day
│
├── weekly_mosaic/                         # Weekly composite mosaics (Stage 40 output)
│   └── week_WW_YYYY.tif                   # 5 bands: R, G, B, NIR, CI (composite)
│
└── reports/                               # Analysis outputs & reports (Stage 80, 90 outputs)
    ├── SmartCane_Report_week{WW}_{YYYY}.docx        # FINAL REPORT (Stage 90)
    ├── SmartCane_Report_week{WW}_{YYYY}.html        # Alternative format
    ├── {project}_field_analysis_week{WW}.xlsx       # Field-by-field data (Stage 80)
    ├── {project}_kpi_summary_tables_week{WW}.rds    # Summary RDS (Stage 80)
    └── kpis/
        └── week_WW_YYYY/                  # Week-specific KPI folder
```

---

## Key File Formats

| Format | Stage | Purpose | Example |
|--------|-------|---------|---------|
| `.tif` (GeoTIFF) | 00, 10, 40 | Geospatial raster imagery | `2026-01-14.tif` (4-band), `week_02_2026.tif` (5-band) |
| `.vrt` (Virtual Raster) | 20 | Virtual pointer to TIFFs | `2026-01-14.vrt` |
| `.rds` (R Binary) | 20, 21, 30, 80 | R serialized data objects | `combined_CI_data.rds`, `All_pivots_Cumulative_CI_quadrant_year_v2.rds` |
| `.csv` (Comma-Separated) | 21, 31 | Tabular data for Python | `ci_data_for_python.csv`, `harvest_imminent_weekly.csv` |
| `.xlsx` (Excel) | 22, 23, 80 | Tabular reports & harvest data | `harvest.xlsx`, `harvest_production_export.xlsx`, field analysis |
| `.docx` (Word) | 90 | Executive report (final output) | `SmartCane_Report_week02_2026.docx` |
| `.json` | 10 | Tiling metadata | `tiling_config.json` |
| `.geojson` | Input | Field boundaries (read-only) | `pivot.geojson` |

---

## Script Dependencies & Utility Files

```
parameters_project.R
  ├─ Loaded by: 20_ci_extraction.R, 30_interpolate_growth_model.R,
  │             40_mosaic_creation.R, 80_calculate_kpis.R, 90_CI_report_with_kpis_simple.Rmd
  └─ Purpose: Initializes project config (paths, field boundaries, harvest
              data)

harvest_date_pred_utils.py
  ├─ Used by: 22_harvest_baseline_prediction.py, 23_convert_harvest_format.py, 31_harvest_imminent_weekly.py
  └─ Purpose: LSTM model loading, feature extraction, two-step harvest detection

20_ci_extraction_utils.R
  ├─ Used by: 20_ci_extraction.R
  └─ Purpose: CI calculation, field masking, RDS I/O, tile detection

30_growth_model_utils.R
  ├─ Used by: 30_interpolate_growth_model.R
  └─ Purpose: Linear interpolation, daily metrics, seasonal grouping

40_mosaic_creation_utils.R, 40_mosaic_creation_tile_utils.R
  ├─ Used by: 40_mosaic_creation.R
  └─ Purpose: Weekly composite creation, cloud assessment, raster masking

kpi_utils.R
  ├─ Used by: 80_calculate_kpis.R
  └─ Purpose: Per-field statistics, phase assignment, trigger detection

report_utils.R
  ├─ Used by: 90_CI_report_with_kpis_simple.Rmd
  └─ Purpose: Report building, table formatting, Word document generation
```

---

## Command-Line Execution Examples

### Daily/Weekly Workflow

```bash
# Stage 00: Download today's satellite data
cd python_app
python 00_download_8band_pu_optimized.py angata --cleanup

# Stage 20: Extract CI from daily imagery (last 7 days)
cd ../r_app
Rscript 20_ci_extraction_per_field.R 2026-01-14 7 angata merged_tif

# Stage 21: Convert CI to CSV for harvest detection
Rscript 21_convert_ci_rds_to_csv.R angata

# Stage 31: Weekly harvest monitoring (real-time alerts)
cd ../python_app
python 31_harvest_imminent_weekly.py angata

# Back to R for mosaic and KPIs
cd ../r_app
Rscript 40_mosaic_creation.R 2026-01-14 7 angata
Rscript 80_calculate_kpis.R 2026-01-14 angata 7

# Stage 90: Generate report
Rscript -e "rmarkdown::render('90_CI_report_with_kpis_simple.Rmd')"
```

### One-Time Setup (Baseline Harvest Detection)

```bash
# Only run ONCE to establish baseline
cd python_app
python 22_harvest_baseline_prediction.py angata

# Convert to standard format
python 23_convert_harvest_format.py angata
```

---

## Processing Notes

### CI Extraction (Stage 20)
- Calculates CI = (NIR - Green) / (NIR + Green)
- Supports both 4-band and 8-band imagery with auto-detection
- Handles cloud masking via UDM band (8-band) or manual thresholding (4-band)
- Outputs cumulative RDS in wide format (fields × dates) for fast lookups

### Growth Model (Stage 30)
- Linear interpolation across missing dates
- Maintains seasonal context for agricultural lifecycle tracking
- Outputs long-format data for trend analysis

### Harvest Detection (Stages 22 & 31)
- **Model 307**: Unidirectional LSTM with dual output heads
  - Imminent Head: Probability field will be harvestable in next 28 days
  - Detected Head: Probability of immediate harvest event
- **Stage 22 (Baseline)**: Two-step detection on complete historical data
  - Phase 1: Growing window expansion (real-time simulation)
  - Phase 2: ±40 day refinement (argmax harvest signal)
- **Stage 31 (Weekly)**: Single-run inference on recent data (~300 days)
  - Compares against baseline for anomaly detection

### KPI Calculation (Stage 80)
- **Per-field metrics**: Uniformity (CV), phase, growth trends, 4-week trends
- **Status triggers**: Germination, rapid growth, slow growth, non-uniform, weed pressure, harvest imminence
- **Farm-level KPIs**: 6 high-level indicators for executive summary
- **Parallel processing**: ~1000+ fields processed in <5 minutes

---

## Future Enhancements

- **Real-Time Monitoring**: Daily harvest probability updates integrated into web dashboard
- **SAR Integration**: Radar satellite data (Sentinel-1) for all-weather monitoring
- **IoT Sensors**: Ground-based soil moisture and weather integration
- **Advanced Yield Models**: Enhanced harvest forecasting with satellite + ground truth
- **Automated Alerts**: WhatsApp/email dispatch of critical agricultural advice
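
---

## Example: CI Formula (Stage 20)

The CI formula listed in the Stage 20 processing notes, CI = (NIR - Green) / (NIR + Green), can be illustrated with a minimal Python sketch. The pipeline's own implementation lives in the R scripts and is not reproduced here. The function names `compute_ci` and `mean_field_ci`, the list-of-tuples pixel input, and the choice to return NaN when NIR + Green is zero are all assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple


def compute_ci(nir: float, green: float) -> float:
    """Return CI = (NIR - Green) / (NIR + Green) for one pixel.

    Assumption: a zero denominator yields NaN instead of raising.
    """
    denom = nir + green
    if denom == 0:
        return math.nan
    return (nir - green) / denom


def mean_field_ci(pixels: Iterable[Tuple[float, float]]) -> float:
    """Average CI over (nir, green) pixel pairs, skipping NaN pixels.

    Hypothetical helper: stands in for a per-field summary statistic.
    """
    values = [compute_ci(nir, green) for nir, green in pixels]
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


if __name__ == "__main__":
    # Three illustrative pixels: vegetated, no-data (both bands zero), bare
    field_pixels = [(0.75, 0.25), (0.0, 0.0), (0.5, 0.5)]
    print(mean_field_ci(field_pixels))
```

Skipping NaN pixels keeps empty or masked pixels from dragging down a field average; whether the R scripts treat masked pixels this way is not stated in this overview.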