SmartCane Processing Pipeline - Complete Script Overview
Pipeline Execution Order
Complete Pipeline Mermaid Diagram
graph TD
%% ===== INPUTS =====
API["🔑 Planet API<br/>Credentials"]
GeoJSON["🗺️ pivot.geojson<br/>(Field Boundaries)"]
HarvestIn["📊 harvest.xlsx<br/>(from Stage 23)"]
%% ===== STAGE 00: DOWNLOAD =====
Stage00["<b>Stage 00: Python</b><br/>00_download_8band_pu_optimized.py"]
Out00["📦 merged_tif_8b/<br/>YYYY-MM-DD.tif<br/>(4-band uint16)"]
%% ===== STAGE 10: OPTIONAL TILING =====
Stage10["<b>Stage 10: R</b><br/>10_create_master_grid...<br/>(Optional)"]
Out10["📦 daily_tiles_split/5x5/<br/>YYYY-MM-DD/*.tif<br/>(25 tiles)"]
%% ===== STAGE 20: CI EXTRACTION =====
Stage20["<b>Stage 20: R</b><br/>20_ci_extraction.R"]
Out20a["📦 combined_CI_data.rds<br/>(wide: fields × dates)"]
Out20b["📦 daily RDS files<br/>(per-date stats)"]
%% ===== STAGE 21: RDS → CSV =====
Stage21["<b>Stage 21: R</b><br/>21_convert_ci_rds_to_csv.R"]
Out21["📦 ci_data_for_python.csv<br/>(long format + DOY)"]
%% ===== STAGE 22: BASELINE HARVEST =====
Stage22["<b>Stage 22: Python</b><br/>22_harvest_baseline_prediction.py<br/>(RUN ONCE)"]
Out22["📦 harvest_production_export.xlsx<br/>(baseline predictions)"]
%% ===== STAGE 23: HARVEST FORMAT =====
Stage23["<b>Stage 23: Python</b><br/>23_convert_harvest_format.py"]
Out23["📦 harvest.xlsx<br/>(standard format)<br/>→ Feeds back to Stage 80"]
%% ===== STAGE 30: GROWTH MODEL =====
Stage30["<b>Stage 30: R</b><br/>30_interpolate_growth_model.R"]
Out30["📦 All_pivots_Cumulative_CI...<br/>_quadrant_year_v2.rds<br/>(interpolated daily)"]
%% ===== STAGE 31: WEEKLY HARVEST =====
Stage31["<b>Stage 31: Python</b><br/>31_harvest_imminent_weekly.py<br/>(Weekly)"]
Out31["📦 harvest_imminent_weekly.csv<br/>(probabilities)"]
%% ===== STAGE 40: MOSAIC =====
Stage40["<b>Stage 40: R</b><br/>40_mosaic_creation.R"]
Out40["📦 weekly_mosaic/<br/>week_WW_YYYY.tif<br/>(5-band composite)"]
%% ===== STAGE 80: KPI =====
Stage80["<b>Stage 80: R</b><br/>80_calculate_kpis.R"]
Out80a["📦 field_analysis_week{WW}.xlsx"]
Out80b["📦 kpi_summary_tables_week{WW}.rds"]
%% ===== STAGE 90: REPORT =====
Stage90["<b>Stage 90: R/RMarkdown</b><br/>90_CI_report_with_kpis_simple.Rmd"]
Out90["📦 SmartCane_Report_week{WW}_{YYYY}.docx<br/>(FINAL OUTPUT)"]
%% ===== CONNECTIONS: INPUTS TO STAGE 00 =====
API --> Stage00
GeoJSON --> Stage00
%% ===== STAGE 00 → 10 OR 20 =====
Stage00 --> Out00
Out00 --> Stage10
Out00 --> Stage20
%% ===== STAGE 10 → 20 =====
Stage10 --> Out10
Out10 --> Stage20
%% ===== STAGE 20 → 21, 30, 40 =====
GeoJSON --> Stage20
Stage20 --> Out20a
Stage20 --> Out20b
Out20a --> Stage21
Out20a --> Stage30
Out00 --> Stage40
%% ===== STAGE 21 → 22, 31 =====
Stage21 --> Out21
Out21 --> Stage22
Out21 --> Stage31
%% ===== STAGE 22 → 23 =====
Stage22 --> Out22
Out22 --> Stage23
%% ===== STAGE 23 → 80 & FEEDBACK =====
Stage23 --> Out23
Out23 -.->|"Feeds back<br/>(Season context)"| Stage80
%% ===== STAGE 30 → 80 =====
Stage30 --> Out30
Out30 --> Stage80
%% ===== STAGE 31 (PARALLEL) =====
Stage31 --> Out31
%% ===== STAGE 40 → 80, 90 =====
Stage40 --> Out40
Out40 --> Stage80
Out40 --> Stage90
%% ===== STAGE 80 → 90 =====
Stage80 --> Out80a
Stage80 --> Out80b
Out80a --> Stage90
Out80b --> Stage90
%% ===== STAGE 90 FINAL =====
Stage90 --> Out90
%% ===== ADDITIONAL INPUTS =====
HarvestIn --> Stage30
HarvestIn --> Stage80
GeoJSON --> Stage30
GeoJSON --> Stage40
GeoJSON --> Stage80
%% ===== STYLING =====
classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef pyStage fill:#fff3e0,stroke:#f57c00,stroke-width:2px
classDef rStage fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef output fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef finalOutput fill:#ffebee,stroke:#c62828,stroke-width:3px
class API,GeoJSON,HarvestIn input
class Stage00,Stage22,Stage23,Stage31 pyStage
class Stage10,Stage20,Stage21,Stage30,Stage40,Stage80,Stage90 rStage
class Out00,Out10,Out20a,Out20b,Out21,Out22,Out30,Out31,Out40,Out80a,Out80b output
class Out23,Out90 finalOutput
Detailed Stage Descriptions
Stage 00: PYTHON - Download Satellite Data
└─ 00_download_8band_pu_optimized.py
INPUT: Planet API credentials, field boundaries (pivot.geojson), date range
OUTPUT: laravel_app/storage/app/{project}/merged_tif_8b/{YYYY-MM-DD}.tif (4-band uint16)
RUN FREQUENCY: Daily or as-needed
NOTES: 8-band includes UDM cloud masking, optimized for PU cost
Stage 10: R - (Optional) Create Master Grid & Split TIFFs into Tiles
└─ 10_create_master_grid_and_split_tiffs.R
INPUT: Daily GeoTIFFs from merged_tif_8b/
OUTPUT: laravel_app/storage/app/{project}/daily_tiles_split/5x5/{YYYY-MM-DD}/*.tif
RUN FREQUENCY: Optional - only if tile-based processing desired
NOTES: Creates 25 tiles per day for memory-efficient processing; 5x5 grid hardcoded
Stage 20: R - Extract Canopy Index (CI) from Daily Imagery
└─ 20_ci_extraction.R
INPUT: Daily GeoTIFFs (merged_tif_8b/ or daily_tiles_split/)
Field boundaries (pivot.geojson)
Data source parameter (merged_tif_8b, merged_tif, merged_final_tif)
OUTPUT: RDS files:
- laravel_app/storage/app/{project}/Data/extracted_ci/daily_vals/extracted_{YYYY-MM-DD}_{suffix}.rds
- laravel_app/storage/app/{project}/Data/extracted_ci/cumulative_vals/combined_CI_data.rds (wide format)
RUN FREQUENCY: Daily or on-demand
COMMAND: Rscript 20_ci_extraction.R [end_date] [offset] [project_dir] [data_source]
EXAMPLE: Rscript 20_ci_extraction.R 2026-01-02 7 angata merged_tif_8b
NOTES: Auto-detects tiles if daily_tiles_split/ exists; outputs cumulative CI (fields × dates)
Stage 21: R - Convert CI RDS to CSV for Python Harvest Detection
└─ 21_convert_ci_rds_to_csv.R
INPUT: combined_CI_data.rds (from Stage 20)
OUTPUT: laravel_app/storage/app/{project}/Data/extracted_ci/ci_data_for_python/ci_data_for_python.csv
RUN FREQUENCY: After Stage 20
COMMAND: Rscript 21_convert_ci_rds_to_csv.R [project_dir]
EXAMPLE: Rscript 21_convert_ci_rds_to_csv.R angata
NOTES: Converts wide RDS (fields × dates) to long CSV; interpolates missing dates; adds DOY column
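The wide-to-long reshaping described for Stage 21 can be illustrated with a short Python sketch. The stage itself is an R script; this example only shows the shape of the transformation using pandas. The input table and the column names (`field`, `Date`, `value`, `DOY`) are assumptions for illustration, DOY is assumed to mean calendar day of year, and the interpolation of missing dates is not reproduced here.

```python
import pandas as pd

# Hypothetical wide table: one row per field, one column per acquisition date.
wide = pd.DataFrame(
    {
        "field": ["A1", "B2"],
        "2026-01-01": [1.2, 0.8],
        "2026-01-02": [1.3, None],  # missing observation stays NaN
    }
)

# Melt the date columns into rows: one (field, Date, value) record per cell.
long_df = wide.melt(id_vars="field", var_name="Date", value_name="value")
long_df["Date"] = pd.to_datetime(long_df["Date"])

# Assumption: DOY is the calendar day of year of each observation.
long_df["DOY"] = long_df["Date"].dt.dayofyear

long_df = long_df.sort_values(["field", "Date"]).reset_index(drop=True)
print(long_df)
```

The long layout has one row per field and date, which is usually easier to feed into per-field time-series code than a fields × dates matrix.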
Stage 22: PYTHON - Baseline Harvest Prediction (LSTM Model 307)
└─ 22_harvest_baseline_prediction.py
INPUT: ci_data_for_python.csv (complete historical CI data)
OUTPUT: laravel_app/storage/app/{project}/Data/HarvestData/harvest_production_export.xlsx
RUN FREQUENCY: ONCE - establishes ground truth baseline for all fields
COMMAND: python 22_harvest_baseline_prediction.py [project_name]
EXAMPLE: python 22_harvest_baseline_prediction.py angata
NOTES: Two-step detection (Phase 1: growing window, Phase 2: ±40 day argmax refinement)
Tuned parameters: threshold=0.3, consecutive_days=2
Uses LSTM Model 307 dual output heads (imminent + detected)
Stage 23: PYTHON - Convert Harvest Format to Standard Structure
└─ 23_convert_harvest_format.py
INPUT: harvest_production_export.xlsx (from Stage 22)
CI data date range (determines season_start for first season)
OUTPUT: laravel_app/storage/app/{project}/Data/harvest.xlsx (standard format)
RUN FREQUENCY: After Stage 22
COMMAND: python 23_convert_harvest_format.py [project_name]
EXAMPLE: python 23_convert_harvest_format.py angata
NOTES: Converts to standard harvest.xlsx format with columns:
field, sub_field, year, season, season_start, season_end, age, sub_area, tonnage_ha
Season format: "Data{year} : {field}"
Only includes completed seasons (with season_end filled)
Stage 30: R - Growth Model Interpolation (Smooth CI Time Series)
└─ 30_interpolate_growth_model.R
INPUT: combined_CI_data.rds (from Stage 20)
harvest.xlsx (optional, for seasonal context)
OUTPUT: laravel_app/storage/app/{project}/Data/extracted_ci/cumulative_vals/
All_pivots_Cumulative_CI_quadrant_year_v2.rds
RUN FREQUENCY: Weekly or after CI extraction updates
COMMAND: Rscript 30_interpolate_growth_model.R [project_dir]
EXAMPLE: Rscript 30_interpolate_growth_model.R angata
NOTES: Linear interpolation across gaps; calculates daily change and cumulative CI
Outputs long-format data (Date, DOY, field, value, season, etc.)
Stage 31: PYTHON - Weekly Harvest Monitoring (Real-Time Alerts)
└─ 31_harvest_imminent_weekly.py
INPUT: ci_data_for_python.csv (recent CI data, last ~300 days)
harvest_production_export.xlsx (optional baseline reference)
OUTPUT: laravel_app/storage/app/{project}/Data/HarvestData/harvest_imminent_weekly.csv
RUN FREQUENCY: Weekly or daily for operational alerts
COMMAND: python 31_harvest_imminent_weekly.py [project_name]
EXAMPLE: python 31_harvest_imminent_weekly.py angata
NOTES: Single-run inference on recent data; outputs probabilities (imminent_prob, detected_prob)
Used for real-time decision support; compared against baseline from Stage 22
Stage 40: R - Create Weekly 5-Band Mosaics
└─ 40_mosaic_creation.R
INPUT: Daily GeoTIFFs (merged_tif_8b/ or daily_tiles_split/)
Field boundaries (pivot.geojson)
OUTPUT: laravel_app/storage/app/{project}/weekly_mosaic/week_{WW}_{YYYY}.tif
RUN FREQUENCY: Weekly
COMMAND: Rscript 40_mosaic_creation.R [end_date] [offset] [project_dir]
EXAMPLE: Rscript 40_mosaic_creation.R 2026-01-14 7 angata
NOTES: Composites daily images using MAX function; 5 bands (R, G, B, NIR, CI)
Automatically selects images with acceptable cloud coverage
Output uses ISO week numbering (week_WW_YYYY)
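As a rough illustration of the MAX compositing and ISO week naming noted for Stage 40, the sketch below works on a single band held in a NumPy array. The actual stage is an R script operating on 5-band rasters; how it treats each band, how it selects acceptable images, and whether the week is derived from the end date are not stated here, so `max_composite` and `mosaic_name` are hypothetical helpers built on those assumptions.

```python
import datetime as dt

import numpy as np


def max_composite(daily_stack: np.ndarray) -> np.ndarray:
    """Per-pixel maximum over the time axis (axis 0), ignoring NaN pixels.

    NaN is used here as a stand-in for cloud-masked pixels (an assumption).
    """
    return np.nanmax(daily_stack, axis=0)


def mosaic_name(end_date: dt.date) -> str:
    """Build a week_WW_YYYY.tif name from the ISO week and ISO year of a date."""
    iso_year, iso_week, _ = end_date.isocalendar()
    return f"week_{iso_week:02d}_{iso_year}.tif"


# Three synthetic daily 2x2 rasters for one band.
stack = np.array(
    [
        [[0.2, np.nan], [0.5, 0.1]],
        [[0.4, 0.3], [np.nan, 0.1]],
        [[0.1, 0.6], [0.2, 0.7]],
    ]
)
composite = max_composite(stack)
print(composite)
print(mosaic_name(dt.date(2026, 1, 14)))
```

Note that ISO weeks can belong to a different year than the calendar date near year boundaries, which is why the sketch takes both week and year from `isocalendar()`.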
Stage 80: R - Calculate KPIs & Per-Field Analysis
└─ 80_calculate_kpis.R
INPUT: Weekly mosaic (from Stage 40)
Growth model data (from Stage 30)
Field boundaries (pivot.geojson)
Harvest data (harvest.xlsx)
OUTPUT: laravel_app/storage/app/{project}/reports/
- {project}_field_analysis_week{WW}.xlsx
- {project}_kpi_summary_tables_week{WW}.rds
RUN FREQUENCY: Weekly
COMMAND: Rscript 80_calculate_kpis.R [end_date] [project_dir] [offset_days]
EXAMPLE: Rscript 80_calculate_kpis.R 2026-01-14 angata 7
NOTES: Parallel processing for 1000+ fields; calculates:
- Per-field uniformity (CV), phase assignment, growth trends
- Status triggers (germination, rapid growth, disease, harvest imminence)
- Farm-level KPI metrics (6 high-level indicators)
TEST_MODE=TRUE uses only recent weeks for development
Stage 90: R (RMarkdown) - Generate Executive Report (Word Document)
└─ 90_CI_report_with_kpis_simple.Rmd
INPUT: Weekly mosaic (from Stage 40)
KPI summary data (from Stage 80)
Field analysis (from Stage 80)
Field boundaries & harvest data (for context)
OUTPUT: laravel_app/storage/app/{project}/reports/
SmartCane_Report_week{WW}_{YYYY}.docx (PRIMARY OUTPUT)
SmartCane_Report_week{WW}_{YYYY}.html (optional)
RUN FREQUENCY: Weekly
RENDERING: R/RMarkdown with officer + flextable packages
NOTES: Executive summary with KPI overview, phase distribution, status triggers
Field-by-field detail pages with CI metrics and interpretation guides
Automatic unit conversion (hectares ↔ acres)
Data Storage & Persistence
All data persists to the file system. The analysis performs no database writes; the database is read only for metadata.
laravel_app/storage/app/{project}/
├── Data/
│ ├── pivot.geojson # Field boundaries (read-only input)
│ ├── harvest.xlsx # Season dates & yield (standard format from Stage 23)
│ ├── vrt/ # Virtual raster files (daily VRTs from Stage 20)
│ │ └── YYYY-MM-DD.vrt
│ ├── extracted_ci/
│ │ ├── ci_data_for_python/
│ │ │ └── ci_data_for_python.csv # CSV for Python (from Stage 21)
│ │ ├── daily_vals/
│ │ │ └── extracted_YYYY-MM-DD_{suffix}.rds # Daily field CI stats (from Stage 20)
│ │ └── cumulative_vals/
│ │ ├── combined_CI_data.rds # Cumulative CI, wide format (from Stage 20)
│ │ └── All_pivots_Cumulative_CI_quadrant_year_v2.rds # Interpolated daily (from Stage 30)
│ └── HarvestData/
│ ├── harvest_production_export.xlsx # Baseline harvest predictions (from Stage 22)
│ └── harvest_imminent_weekly.csv # Weekly monitoring output (from Stage 31)
│
├── merged_tif_8b/ # Raw 4-band satellite imagery (Stage 00 output)
│ └── YYYY-MM-DD.tif # 4 bands: R, G, B, NIR (uint16 with UDM cloud masking)
│
├── daily_tiles_split/ # (Optional) Tile-based processing (Stage 10 output)
│ ├── 5x5/
│ │ ├── tiling_config.json # Metadata about tiling parameters
│ │ └── YYYY-MM-DD/ # Date-specific folder
│ │ └── YYYY-MM-DD_{00-24}.tif # 25 tiles per day
│
├── weekly_mosaic/ # Weekly composite mosaics (Stage 40 output)
│ └── week_WW_YYYY.tif # 5 bands: R, G, B, NIR, CI (composite)
│
└── reports/ # Analysis outputs & reports (Stage 80, 90 outputs)
├── SmartCane_Report_week{WW}_{YYYY}.docx # FINAL REPORT (Stage 90)
├── SmartCane_Report_week{WW}_{YYYY}.html # Alternative format
├── {project}_field_analysis_week{WW}.xlsx # Field-by-field data (Stage 80)
├── {project}_kpi_summary_tables_week{WW}.rds # Summary RDS (Stage 80)
└── kpis/
└── week_WW_YYYY/ # Week-specific KPI folder
Key File Formats
| Format | Stage | Purpose | Example |
|---|---|---|---|
| .tif (GeoTIFF) | 00, 10, 40 | Geospatial raster imagery | 2026-01-14.tif (4-band), week_02_2026.tif (5-band) |
| .vrt (Virtual Raster) | 20 | Virtual pointer to TIFFs | 2026-01-14.vrt |
| .rds (R Binary) | 20, 21, 30, 80 | R serialized data objects | combined_CI_data.rds, All_pivots_Cumulative_CI_quadrant_year_v2.rds |
| .csv (Comma-Separated) | 21, 31 | Tabular data for Python | ci_data_for_python.csv, harvest_imminent_weekly.csv |
| .xlsx (Excel) | 22, 23, 80 | Tabular reports & harvest data | harvest.xlsx, harvest_production_export.xlsx, field analysis |
| .docx (Word) | 90 | Executive report (final output) | SmartCane_Report_week02_2026.docx |
| .json | 10 | Tiling metadata | tiling_config.json |
| .geojson | Input | Field boundaries (read-only) | pivot.geojson |
Script Dependencies & Utility Files
parameters_project.R
├─ Loaded by: 20_ci_extraction.R, 30_interpolate_growth_model.R,
│ 40_mosaic_creation.R, 80_calculate_kpis.R, 90_CI_report_with_kpis_simple.Rmd
└─ Purpose: Initializes project config (paths, field boundaries, harvest data)
harvest_date_pred_utils.py
├─ Used by: 22_harvest_baseline_prediction.py, 23_convert_harvest_format.py, 31_harvest_imminent_weekly.py
└─ Purpose: LSTM model loading, feature extraction, two-step harvest detection
20_ci_extraction_utils.R
├─ Used by: 20_ci_extraction.R
└─ Purpose: CI calculation, field masking, RDS I/O, tile detection
30_growth_model_utils.R
├─ Used by: 30_interpolate_growth_model.R
└─ Purpose: Linear interpolation, daily metrics, seasonal grouping
40_mosaic_creation_utils.R, 40_mosaic_creation_tile_utils.R
├─ Used by: 40_mosaic_creation.R
└─ Purpose: Weekly composite creation, cloud assessment, raster masking
kpi_utils.R
├─ Used by: 80_calculate_kpis.R
└─ Purpose: Per-field statistics, phase assignment, trigger detection
report_utils.R
├─ Used by: 90_CI_report_with_kpis_simple.Rmd
└─ Purpose: Report building, table formatting, Word document generation
Command-Line Execution Examples
Daily/Weekly Workflow
# Stage 00: Download today's satellite data
cd python_app
python 00_download_8band_pu_optimized.py angata --cleanup
# Stage 20: Extract CI from daily imagery (last 7 days)
cd ../r_app
Rscript 20_ci_extraction.R 2026-01-14 7 angata merged_tif_8b
# Stage 21: Convert CI to CSV for harvest detection
Rscript 21_convert_ci_rds_to_csv.R angata
# Stage 31: Weekly harvest monitoring (real-time alerts)
cd ../python_app
python 31_harvest_imminent_weekly.py angata
# Back to R for mosaic and KPIs
cd ../r_app
Rscript 40_mosaic_creation.R 2026-01-14 7 angata
Rscript 80_calculate_kpis.R 2026-01-14 angata 7
# Stage 90: Generate report
Rscript -e "rmarkdown::render('90_CI_report_with_kpis_simple.Rmd')"
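The R and Python steps of the workflow above could also be driven from a small wrapper. The following is only a sketch: `weekly_commands` is a hypothetical helper, not part of the repository, and it simply assembles the argument lists shown above. It makes one detail explicit: Stage 40 takes `[end_date] [offset] [project_dir]`, while Stage 80 takes `[end_date] [project_dir] [offset_days]`.

```python
from typing import List, Tuple

Command = Tuple[str, List[str]]  # (working directory, argv)


def weekly_commands(
    end_date: str,
    project: str,
    offset: int = 7,
    data_source: str = "merged_tif_8b",
) -> List[Command]:
    """Assemble the Stage 20 -> 21 -> 31 -> 40 -> 80 commands in run order."""
    return [
        ("r_app", ["Rscript", "20_ci_extraction.R", end_date, str(offset), project, data_source]),
        ("r_app", ["Rscript", "21_convert_ci_rds_to_csv.R", project]),
        ("python_app", ["python", "31_harvest_imminent_weekly.py", project]),
        # Stage 40: offset comes BEFORE the project name.
        ("r_app", ["Rscript", "40_mosaic_creation.R", end_date, str(offset), project]),
        # Stage 80: offset comes AFTER the project name.
        ("r_app", ["Rscript", "80_calculate_kpis.R", end_date, project, str(offset)]),
    ]


commands = weekly_commands("2026-01-14", "angata")
for cwd, argv in commands:
    print(f"(cd {cwd} && {' '.join(argv)})")
```

To actually execute the steps, each entry could be passed to `subprocess.run(argv, cwd=cwd, check=True)` so that a failing stage stops the run before later stages consume stale inputs.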
One-Time Setup (Baseline Harvest Detection)
# Only run ONCE to establish baseline
cd python_app
python 22_harvest_baseline_prediction.py angata
# Convert to standard format
python 23_convert_harvest_format.py angata
Processing Notes
CI Extraction (Stage 20)
- Calculates CI = (NIR - Green) / (NIR + Green)
- Supports both 4-band and 8-band imagery with auto-detection
- Handles cloud masking via UDM band (8-band) or manual thresholding (4-band)
- Outputs cumulative RDS in wide format (fields × dates) for fast lookups
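The CI formula above can be sketched in Python with NumPy. This is an illustration of the arithmetic only, not the R implementation in 20_ci_extraction_utils.R; `compute_ci` is a hypothetical helper, and returning NaN where NIR + Green is zero is an assumption about how empty pixels should be treated.

```python
import numpy as np


def compute_ci(nir: np.ndarray, green: np.ndarray) -> np.ndarray:
    """CI = (NIR - Green) / (NIR + Green), with NaN where the denominator is 0."""
    # Cast first: subtracting uint16 arrays directly would wrap around
    # whenever Green > NIR.
    nir = nir.astype("float64")
    green = green.astype("float64")
    denom = nir + green
    ci = np.full(denom.shape, np.nan)
    np.divide(nir - green, denom, out=ci, where=denom != 0)
    return ci


# Tiny synthetic 2x2 bands stored as uint16, like the Stage 00 GeoTIFFs.
nir = np.array([[3000, 0], [4000, 1000]], dtype=np.uint16)
green = np.array([[1000, 0], [1000, 1000]], dtype=np.uint16)
ci = compute_ci(nir, green)
print(ci)
```

Because the index is a normalized difference, its values fall between -1 and 1 for non-negative reflectances.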
Growth Model (Stage 30)
- Linear interpolation across missing dates
- Maintains seasonal context for agricultural lifecycle tracking
- Outputs long-format data for trend analysis
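A minimal pandas sketch of the same idea for a single field is shown below: sparse observations are expanded to a daily series, gaps are filled linearly, and daily change and cumulative CI are derived from the result. The column names (`value`, `daily_change`, `cumulative_CI`) are assumptions for illustration, and the seasonal grouping done by the R script is not reproduced.

```python
import pandas as pd

# Hypothetical CI observations for one field on three cloud-free dates.
obs = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.to_datetime(["2026-01-01", "2026-01-03", "2026-01-07"]),
)

# Expand to one row per calendar day, then fill the gaps linearly.
daily = obs.asfreq("D").interpolate(method="linear")

df = pd.DataFrame({"value": daily})
df["daily_change"] = df["value"].diff()      # first day has no predecessor -> NaN
df["cumulative_CI"] = df["value"].cumsum()   # running sum of the daily CI
print(df)
```

Running the cumulative sum on the interpolated series, rather than on the raw observations, keeps it comparable between fields with different numbers of cloud-free dates.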
Harvest Detection (Stages 22 & 31)
- Model 307: Unidirectional LSTM with dual output heads
- Imminent Head: Probability that the field will be harvestable within the next 28 days
- Detected Head: Probability of immediate harvest event
- Stage 22 (Baseline): Two-step detection on complete historical data
- Phase 1: Growing window expansion (real-time simulation)
- Phase 2: ±40 day refinement (argmax harvest signal)
- Stage 31 (Weekly): Single-run inference on recent data (~300 days)
- Compares against baseline for anomaly detection
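The trigger-then-refine logic can be illustrated on a plain list of daily probabilities. This sketch does not load Model 307 or reproduce the growing-window inference. It assumes a probability series is already available and only shows how a threshold with a consecutive-day requirement (threshold=0.3, consecutive_days=2, as listed for Stage 22) followed by an argmax search within ±40 days could select a date. Both function names, and the strict `>` comparison, are assumptions.

```python
from typing import Optional, Sequence


def first_trigger(
    probs: Sequence[float], threshold: float = 0.3, consecutive_days: int = 2
) -> Optional[int]:
    """Index of the day that completes the first run of `consecutive_days`
    values above `threshold`, or None if no such run exists."""
    run = 0
    for i, p in enumerate(probs):
        run = run + 1 if p > threshold else 0
        if run >= consecutive_days:
            return i
    return None


def refine_by_argmax(probs: Sequence[float], trigger: int, half_window: int = 40) -> int:
    """Index of the highest probability within +/- half_window days of the trigger."""
    lo = max(0, trigger - half_window)
    hi = min(len(probs), trigger + half_window + 1)
    window = probs[lo:hi]
    return lo + max(range(len(window)), key=window.__getitem__)


# Synthetic daily probabilities for one field.
probs = [0.1, 0.4, 0.2, 0.35, 0.5, 0.9, 0.6, 0.1]
trigger = first_trigger(probs)  # the isolated 0.4 on day 1 is ignored
refined = refine_by_argmax(probs, trigger) if trigger is not None else None
print(trigger, refined)
```

Requiring consecutive days above the threshold suppresses single-day spikes, while the argmax step moves the estimate from the first alarm to the strongest signal nearby.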
KPI Calculation (Stage 80)
- Per-field metrics: Uniformity (CV), phase, growth trends, 4-week trends
- Status triggers: Germination, rapid growth, slow growth, non-uniform, weed pressure, harvest imminence
- Farm-level KPIs: 6 high-level indicators for executive summary
- Parallel processing: 1000+ fields processed in under 5 minutes
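The uniformity metric is a coefficient of variation (CV), which can be sketched with the Python standard library. This is not the code in kpi_utils.R: `coefficient_of_variation` is a hypothetical helper, and the use of the population standard deviation (treating a field's pixels as the whole population) is an assumption.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV = standard deviation / mean; NaN when the mean is zero."""
    mean = statistics.fmean(values)
    if mean == 0:
        return float("nan")
    # Assumption: population standard deviation over all pixels of the field.
    return statistics.pstdev(values) / mean


uniform_field = [2.0, 2.0, 2.0, 2.0]  # identical CI everywhere
patchy_field = [1.0, 3.0, 1.0, 3.0]   # same mean CI, high spread

cv_uniform = coefficient_of_variation(uniform_field)
cv_patchy = coefficient_of_variation(patchy_field)
print(cv_uniform, cv_patchy)
```

Both synthetic fields have the same mean CI, so only the CV separates the even canopy from the patchy one. A lower CV indicates a more uniform field.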
Future Enhancements
- Real-Time Monitoring: Daily harvest probability updates integrated into web dashboard
- SAR Integration: Radar satellite data (Sentinel-1) for all-weather monitoring
- IoT Sensors: Ground-based soil moisture and weather integration
- Advanced Yield Models: Enhanced harvest forecasting with satellite + ground truth
- Automated Alerts: WhatsApp/email dispatch of critical agricultural advice