SmartCane/webapps/docs/system_architecture.md


SmartCane System Architecture - Python + R Pipeline & File-Based Processing

🗂️ Quick Navigation

New Architecture Guides (start here for complete system understanding):

  • ARCHITECTURE_INTEGRATION_GUIDE.md: Start here! Integrates all dimensions: pipeline stages, client types, and execution models. Includes decision matrices and troubleshooting.

  • ARCHITECTURE_DATA_FLOW.md — Complete Stage 00-91 data pipeline with transformations, file formats, and storage locations. High-level overview + stage-by-stage details.

  • CLIENT_TYPE_ARCHITECTURE.md — Explains how agronomic_support (AURA) and cane_supply (ANGATA) client types branch at Stage 80. KPI differences, report differences, configuration mapping.

  • SOBIT_DEPLOYMENT.md — Production server deployment via Laravel job queue. Web UI, shell wrappers, job chaining, error handling, monitoring.

  • DEV_LAPTOP_EXECUTION.md — Developer manual execution on Windows laptops. PowerShell commands, stage-by-stage workflows, configuration, troubleshooting.


Overview

The SmartCane system is a file-based agricultural intelligence platform that processes satellite imagery through sequential Python and R scripts. Raw satellite imagery is downloaded via Planet API (Python), then flows through R processing stages (CI extraction, growth model interpolation, mosaic creation, KPI analysis, harvest detection) with outputs persisted as GeoTIFFs, RDS files, Excel sheets, and Word reports. Harvest monitoring is performed via ML-based harvest detection using LSTM models trained on historical CI sequences.

Processing Pipeline Overview

%% SmartCane Processing Pipeline
graph LR
    DL["🐍 Python Download"] --> TIFF["📦 Daily GeoTIFFs"]
    TIFF --> CI["🟢 CI Extraction<br/>(RDS)"]
    CI --> GM["🟡 Growth Model<br/>(RDS)"]
    TIFF --> GM
    CI --> CCI["📊 Cumulative<br/>CI Data"]
    GM --> KPI["🔴 KPI Calculation"]
    CCI -.-> KPI
    KPI --> FA["📋 Field Analysis &<br/>Report Generation"]
    FA --> OUT["📄 Excel + Word<br/>Outputs"]
    
    style DL fill:#fff3e0
    style TIFF fill:#e8f5e9
    style CI fill:#e8f5e9
    style GM fill:#e8f5e9
    style CCI fill:#fff9c4
    style KPI fill:#ffccbc
    style FA fill:#e0f2f1
    style OUT fill:#f1f8e9

SmartCane Modern Architecture: Complete Pipeline with Client Types & Execution Models

%% SmartCane Modern Architecture
graph TD
    subgraph INPUTS["🔹 INPUTS"]
        API["🔑 Planet API<br/>Credentials"]
        GJ["🗺️ Field Boundaries<br/>pivot.geojson"]
        HV["📊 Harvest Data<br/>harvest.xlsx"]
        CONFIG["⚙️ Configuration<br/>parameters_project.R"]
    end
    
    subgraph STAGE00["STAGE 00: Python Download"]
        PY["🐍 00_download_8band<br/>_pu_optimized.py"]
        PY_OUT["📦 merged_tif/{DATE}.tif<br/>4-band uint16<br/>(R,G,B,NIR)"]
    end
    
    subgraph STAGE10["STAGE 10: Per-Field Tiles"]
        R10["🔴 10_create_per_field<br/>_tiffs.R"]
        R10_OUT["📦 field_tiles/{FIELD}<br/>/{DATE}.tif<br/>4-band per-field"]
    end
    
    subgraph STAGE20["STAGE 20: CI Extraction"]
        R20["🟢 20_ci_extraction<br/>_per_field.R"]
        R20_UTIL["[Utils]<br/>ci_extraction<br/>_utils.R"]
        R20_OUT1["📦 field_tiles_CI<br/>/{FIELD}/{DATE}.tif<br/>5-band (incl. CI)"]
        R20_OUT2["📦 combined_CI<br/>_data.rds<br/>(wide format)"]
    end
    
    subgraph STAGE30["STAGE 30: Growth Model"]
        R30["🟡 30_interpolate<br/>_growth_model.R"]
        R30_UTIL["[Utils]<br/>growth_model_utils.R"]
        R30_OUT["📦 All_pivots_Cumulative<br/>_CI_quadrant_year_v2.rds<br/>(interpolated)"]
    end
    
    subgraph STAGE40["STAGE 40: Weekly Mosaic"]
        R40["🟣 40_mosaic_creation<br/>_per_field.R"]
        R40_UTIL["[Utils]<br/>mosaic_creation<br/>_utils.R"]
        R40_OUT["📦 weekly_mosaic<br/>/{FIELD}/week_WW.tif<br/>5-band MAX composite"]
    end
    
    subgraph STAGE80["STAGE 80: KPI Calculation<br/>(Client-Type Branching)"]
        R80["🟠 80_calculate_kpis.R<br/>(reads parameters)"]
        R80_SPLIT{"CLIENT_TYPE?"}
        R80_AGRO["[agronomic_support]<br/>80_utils_agronomic<br/>_support.R<br/>6 KPIs"]
        R80_CANE["[cane_supply]<br/>80_utils_cane<br/>_supply.R<br/>4 KPIs + harvest"]
        R80_OUT1["📦 field_analysis<br/>_week{WW}.xlsx"]
        R80_OUT2["📦 kpi_summary<br/>_tables_week{WW}.rds"]
    end
    
    subgraph STAGE90["STAGE 90: Report (Agronomic)"]
        R90["📄 90_CI_report_with_kpis<br/>_agronomic_support.Rmd"]
        R90_OUT["📦 SmartCane_Report<br/>_agronomic_support_*.docx<br/>(AURA/Chemba/etc)"]
    end
    
    subgraph STAGE91["STAGE 91: Report (Cane)"]
        R91["📄 91_CI_report_with_kpis<br/>_cane_supply.Rmd"]
        R91_OUT["📦 SmartCane_Report<br/>_cane_supply_*.docx<br/>(ANGATA)"]
    end
    
    subgraph EXEC["🔷 EXECUTION MODELS"]
        SOBIT["SOBIT Server<br/>(Production)"]
        SOBIT_EXEC["Laravel Job Queue<br/>→ Shell Wrappers<br/>→ Async Execution"]
        
        DEVLAP["Dev Laptop<br/>(Development)"]
        DEVLAP_EXEC["PowerShell<br/>→ Direct Rscript/python<br/>→ Sync Execution"]
    end
    
    %% ===== CONNECTIONS =====
    API --> PY
    GJ --> PY
    PY --> PY_OUT
    CONFIG --> PY
    
    PY_OUT --> R10
    GJ --> R10
    CONFIG --> R10
    R10 --> R10_OUT
    
    R10_OUT --> R20
    GJ --> R20
    CONFIG --> R20
    R20 --> R20_UTIL
    R20 --> R20_OUT1
    R20 --> R20_OUT2
    
    R20_OUT2 --> R30
    HV --> R30
    CONFIG --> R30
    R30 --> R30_UTIL
    R30 --> R30_OUT
    
    R20_OUT1 --> R40
    GJ --> R40
    CONFIG --> R40
    R40 --> R40_UTIL
    R40 --> R40_OUT
    
    R40_OUT --> R80
    R30_OUT --> R80
    GJ --> R80
    HV --> R80
    CONFIG --> R80
    R80 --> R80_SPLIT
    
    R80_SPLIT -->|PROJECT maps to<br/>agronomic_support| R80_AGRO
    R80_SPLIT -->|PROJECT maps to<br/>cane_supply| R80_CANE
    
    R80_AGRO --> R80_OUT1
    R80_AGRO --> R80_OUT2
    R80_CANE --> R80_OUT1
    R80_CANE --> R80_OUT2
    
    R80_OUT2 --> R90
    R40_OUT --> R90
    GJ --> R90
    R90 --> R90_OUT
    
    R80_OUT2 --> R91
    R40_OUT --> R91
    GJ --> R91
    R91 --> R91_OUT
    
    R90 -.->|Both execution| SOBIT
    R90 -.->|models support| DEVLAP
    R91 -.->|all stages| SOBIT
    R91 -.->|end-to-end| DEVLAP
    
    SOBIT --> SOBIT_EXEC
    DEVLAP --> DEVLAP_EXEC
    
    %% ===== STYLING =====
    classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000
    classDef python fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
    classDef stage_r fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#000
    classDef branch fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#000
    classDef util fill:#ede7f6,stroke:#512da8,stroke-width:1.5px,color:#000
    classDef output fill:#f1f8e9,stroke:#558b2f,stroke-width:2px,color:#000
    classDef exec fill:#e0f2f1,stroke:#00695c,stroke-width:2px,color:#000
    
    class INPUTS input
    class STAGE00 python
    class STAGE10,STAGE20,STAGE30,STAGE40,STAGE80,STAGE90,STAGE91 stage_r
    class R80_SPLIT,R80_AGRO,R80_CANE branch
    class R20_UTIL,R30_UTIL,R40_UTIL util
    class PY_OUT,R10_OUT,R20_OUT1,R20_OUT2,R30_OUT,R40_OUT,R80_OUT1,R80_OUT2,R90_OUT,R91_OUT output
    class EXEC,SOBIT,DEVLAP exec

Key Diagram Features

Color Coding:

  • 🔵 Blue = External inputs (API, files)
  • 🟠 Orange = Python stage
  • 🟢 Green = R processing stages
  • 🔴 Red/Pink = Branching logic (client types)
  • 🟣 Purple = Utility functions
  • 🟢 Light Green = Data outputs
  • 🟦 Teal = Execution models

Critical Paths:

  1. Unified Stages (00-40): All projects run identically
  2. Branching Point (Stage 80): parameters_project.R determines client type → sources appropriate utilities → renders appropriate report (90 or 91)
  3. Execution Models: Both SOBIT and Dev Laptop can run all stages; differ in orchestration

Client Type Routing:

  • PROJECT="angata" → CLIENT_TYPE="cane_supply" → Stage 91 report
  • PROJECT="aura"/"chemba" → CLIENT_TYPE="agronomic_support" → Stage 90 report
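
The routing above can be sketched as two lookup tables. This is only an illustration in Python: in the pipeline itself the mapping is resolved by parameters_project.R, and the names `CLIENT_TYPE_BY_PROJECT`, `REPORT_STAGE_BY_CLIENT_TYPE`, and `route_project` are hypothetical.

```python
from __future__ import annotations

# Illustrative routing tables; the real mapping lives in parameters_project.R.
CLIENT_TYPE_BY_PROJECT = {
    "angata": "cane_supply",
    "aura": "agronomic_support",
    "chemba": "agronomic_support",
}

REPORT_STAGE_BY_CLIENT_TYPE = {
    "agronomic_support": 90,
    "cane_supply": 91,
}


def route_project(project: str) -> tuple[str, int]:
    """Return (client_type, report_stage) for a project name."""
    key = project.strip().lower()
    if key not in CLIENT_TYPE_BY_PROJECT:
        raise ValueError(f"Unknown project: {project!r}")
    client_type = CLIENT_TYPE_BY_PROJECT[key]
    return client_type, REPORT_STAGE_BY_CLIENT_TYPE[client_type]


if __name__ == "__main__":
    for name in ("angata", "aura", "chemba"):
        print(name, route_project(name))
```

Keeping the two tables separate mirrors the two-step routing: a new project only needs an entry in the first table, while a new client type needs an entry in the second.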

Data Processing Pipeline

Stage 1: Satellite Data Acquisition (Python)

  • Script: python_app/01_planet_download.py
  • Inputs: API credentials, field boundaries (GeoJSON), date range
  • Outputs: Daily merged GeoTIFFs
  • File Location: laravel_app/storage/app/{project}/merged_tif/
  • File Format: YYYY-MM-DD.tif (4 bands: Red, Green, Blue, NIR, uint16)
  • Processing:
    • Downloads from Sentinel Hub BYOC collection
    • Applies cloud masking (UDM1 band)
    • Merges tiles into daily mosaics
    • Stores at 3m resolution

Stage 2: Canopy Index (CI) Extraction

  • Script: r_app/02_ci_extraction.R
  • Utility Functions: ci_extraction_utils.R (handles tile detection, RDS I/O)
  • Inputs: Daily GeoTIFFs, field boundaries (GeoJSON)
  • Outputs:
    • Daily extractions (RDS): Data/extracted_ci/daily_vals/extracted_{date}_{suffix}.rds
    • Cumulative dataset (RDS): Data/extracted_ci/cumulative_vals/combined_CI_data.rds
  • File Format:
    • Daily: Per-field statistics (mean CI, count, notNA pixels)
    • Cumulative: Wide format with fields as rows, dates as columns
  • Processing:
    • Calculates CI = (NIR / Green) - 1
    • Extracts stats per field using field geometry
    • Handles missing pixels (clouds → NA values)
    • Supports both full rasters and tile-based extraction
  • Key Parameters:
    • CI formula: (NIR / Green) - 1
    • Min valid pixels: 100 per field
    • Cloud masking: UDM1 != 0
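
A minimal NumPy sketch of the CI formula and the per-field statistics listed above. It is not the code in ci_extraction_utils.R: the function names are hypothetical, the field geometry is assumed to be already rasterized into a boolean mask, and returning NaN for fields below the minimum pixel count is an assumption about how that parameter is applied.

```python
import numpy as np

MIN_VALID_PIXELS = 100  # "Min valid pixels: 100 per field"


def canopy_index(nir: np.ndarray, green: np.ndarray) -> np.ndarray:
    """CI = (NIR / Green) - 1, with NaN where Green is zero or the input is masked."""
    nir = nir.astype("float64")
    green = green.astype("float64")
    ci = np.full(nir.shape, np.nan)
    valid = np.isfinite(nir) & np.isfinite(green) & (green > 0)
    ci[valid] = nir[valid] / green[valid] - 1.0
    return ci


def field_stats(ci: np.ndarray, field_mask: np.ndarray) -> dict:
    """Mean CI, pixel count, and non-NA pixel count for one field (boolean mask)."""
    values = ci[field_mask]
    not_na = int(np.count_nonzero(~np.isnan(values)))
    # Assumption: fields with too few valid pixels get a missing mean.
    mean_ci = float(np.nanmean(values)) if not_na >= MIN_VALID_PIXELS else float("nan")
    return {"mean_ci": mean_ci, "count": int(values.size), "notNA": not_na}
```

Casting to float before dividing matters because the input bands are uint16, where integer division would silently truncate the ratio.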

Stage 3: Growth Model Interpolation

  • Script: r_app/03_interpolate_growth_model.R
  • Utility Functions: growth_model_utils.R (interpolation, seasonal grouping)
  • Inputs:
    • Combined CI data (RDS from Stage 2)
    • Harvest data with season dates (Excel)
  • Outputs: Interpolated growth model (RDS)
  • File Location: Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds
  • File Format: Long-format data frame with columns:
    • Date, DOY (Day of Year), field, subField, value, season
    • CI_per_day, cumulative_CI, FitData (interpolated indicator)
  • Processing:
    • Filters CI by season dates
    • Linear interpolation across gaps: approxfun()
    • Calculates daily changes and cumulative sums
    • Groups by field and season year
  • Key Calculations:
    • CI_per_day = today's CI - yesterday's CI
    • cumulative_CI = running (cumulative) sum of daily CI
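
The interpolation and the two key calculations can be illustrated for a single field and season. This is a sketch under assumptions, not the logic in growth_model_utils.R: `np.interp` stands in for R's approxfun(), the function name is hypothetical, the first day's CI_per_day is set to 0, and at least one non-missing observation is required.

```python
import numpy as np


def interpolate_growth(days, ci) -> dict:
    """Fill gaps linearly on a daily grid, then derive daily change and cumulative CI.

    days: day offsets of the observations (for example DOY)
    ci:   CI values at those days, NaN where the observation is missing
    """
    days = np.asarray(days, dtype=float)
    ci = np.asarray(ci, dtype=float)
    observed = ~np.isnan(ci)
    order = np.argsort(days[observed])
    x, y = days[observed][order], ci[observed][order]

    daily = np.arange(x[0], x[-1] + 1)         # one row per day, first to last observation
    fit = np.interp(daily, x, y)               # linear interpolation across gaps
    ci_per_day = np.diff(fit, prepend=fit[0])  # today's CI minus yesterday's; 0 on day one
    cumulative = np.cumsum(fit)                # running sum of daily CI
    return {"day": daily, "FitData": fit, "CI_per_day": ci_per_day, "cumulative_CI": cumulative}
```

For observations on days 0, 2, and 4 with CI values 1, 2, and 4, the daily grid covers days 0 through 4 and the two missing days are filled with 1.5 and 3.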

Stage 4: Weekly Mosaic Creation

  • Script: r_app/04_mosaic_creation.R
  • Utility Functions: mosaic_creation_utils.R, mosaic_creation_tile_utils.R
  • Inputs:
    • Daily VRTs or GeoTIFFs from Stage 1
    • Field boundaries
  • Outputs: Weekly composite mosaic (GeoTIFF)
  • File Location: weekly_mosaic/week_{WW}_{YYYY}.tif
  • File Format: 5-band GeoTIFF (R, G, B, NIR, CI), same spatial extent as daily images
  • Processing:
    • Assesses cloud coverage per daily image
    • Selects images with acceptable cloud coverage (<45%)
    • Composites using MAX function (retains highest CI value)
    • Outputs single weekly composite
  • Key Parameters:
    • Cloud threshold (strict): <5% missing pixels
    • Cloud threshold (relaxed): <45% missing pixels
    • Composite function: MAX across selected images
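
A sketch of the selection and compositing steps for a single CI band, using the two thresholds above. The fallback order (strict first, then relaxed) and the function names are assumptions for illustration; the actual logic lives in mosaic_creation_utils.R and operates on all five bands.

```python
from __future__ import annotations

import numpy as np

STRICT_MAX_MISSING = 0.05   # strict threshold: <5% missing pixels
RELAXED_MAX_MISSING = 0.45  # relaxed threshold: <45% missing pixels


def missing_fraction(image: np.ndarray) -> float:
    """Share of NaN (cloud-masked) pixels in a 2-D CI raster."""
    return float(np.isnan(image).mean())


def weekly_max_composite(daily_images: list[np.ndarray]) -> np.ndarray | None:
    """Per-pixel MAX over the clearest available tier of daily images."""
    fractions = [missing_fraction(img) for img in daily_images]
    # Assumption: try the strict tier first, fall back to the relaxed tier.
    for limit in (STRICT_MAX_MISSING, RELAXED_MAX_MISSING):
        selected = [img for img, frac in zip(daily_images, fractions) if frac < limit]
        if selected:
            # np.fmax ignores NaN unless every input is NaN at that pixel.
            return np.fmax.reduce(np.stack(selected), axis=0)
    return None  # no image passed either threshold
```

Using `np.fmax` rather than `np.maximum` means a pixel that is cloudy on one day is filled from another day in the same week, which is the point of compositing.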

Stage 5: Field Analysis & KPI Calculation

  • Script: r_app/09_field_analysis_weekly.R or 09b_field_analysis_weekly.R (parallel version)
  • Utility Functions: field_analysis_utils.R, tile extraction functions
  • Inputs:
    • Current week mosaic (GeoTIFF)
    • Previous week mosaic (GeoTIFF)
    • Interpolated growth model (RDS)
    • Field boundaries (GeoJSON)
    • Harvest data (Excel)
  • Outputs:
    • Excel file: reports/{project}_field_analysis_week{WW}.xlsx
    • RDS summary data: reports/kpis/{project}_kpi_summary_tables_week{WW}.rds
  • File Format (Excel):
    • Sheet 1: Field-level data with CI metrics, phase, status triggers
    • Sheet 2: Summary statistics (monitored area, cloud coverage, phase distribution)
  • Processing (per field):
    • Extracts CI from current and previous week mosaics
    • Calculates field-level statistics: mean, std dev, CV (coefficient of variation)
    • Assigns growth phase based on field age (Germination, Tillering, Grand Growth, Maturation)
    • Detects status triggers (rapid growth, disease signals, weed pressure, harvest imminence)
    • Assesses cloud coverage per field
    • Parallel processing using furrr for 1000+ fields
  • Key Calculations:
    • Uniformity (CV): std_dev / mean, thresholds: <0.15 excellent, <0.25 good
    • Change: (current_mean - previous_mean) / previous_mean
    • Phase age: weeks since planting (from harvest.xlsx season_start)
    • Cloud coverage %: (non-NA pixels / total pixels in field) * 100, i.e. the share of valid (cloud-free) pixels, so higher values mean a clearer view
  • Status Triggers (non-exclusive):
    • Germination Started: 10% of field CI > 2.0
    • Rapid Growth: CI increase > 0.5 units week-over-week
    • Slow Growth: CI increase < 0.1 units week-over-week
    • Non-Uniform Growth: CV > 0.25 (field heterogeneity)
    • Weed Pressure: Rapid increase (>2.0 CI/week) with moderate area (<25%)
    • Harvest Imminence: Age > 240 days + CI plateau detected
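
The key calculations and three of the simpler triggers can be sketched as follows. The function names are hypothetical and this is not the implementation in field_analysis_utils.R. "Slow Growth" is read literally as a week-over-week change below 0.1 CI units, which also flags declines; that reading is an assumption.

```python
from __future__ import annotations

import numpy as np


def field_kpis(current: np.ndarray, previous: np.ndarray) -> dict:
    """Mean, CV, and week-over-week change from two arrays of per-pixel CI."""
    cur = current[~np.isnan(current)]
    prev = previous[~np.isnan(previous)]
    cur_mean, prev_mean = float(cur.mean()), float(prev.mean())
    return {
        "mean": cur_mean,
        "cv": float(cur.std(ddof=1)) / cur_mean,          # ddof=1 matches R's sd()
        "change_abs": cur_mean - prev_mean,                # CI units per week
        "change_rel": (cur_mean - prev_mean) / prev_mean,  # relative change
    }


def status_triggers(kpis: dict) -> list[str]:
    """Non-exclusive triggers: a field may match several at once."""
    triggers = []
    if kpis["change_abs"] > 0.5:
        triggers.append("Rapid Growth")
    if kpis["change_abs"] < 0.1:  # literal reading; also true for declines
        triggers.append("Slow Growth")
    if kpis["cv"] > 0.25:
        triggers.append("Non-Uniform Growth")
    return triggers
```

Returning a list rather than a single label keeps the triggers non-exclusive, as described above.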

Stage 6: Report Generation

  • Script: r_app/10_CI_report_with_kpis_simple.Rmd (RMarkdown)
  • Utility Functions: report_utils.R (doc building, table formatting)
  • Inputs:
    • Weekly mosaics (GeoTIFFs)
    • KPI data and field analysis (RDS)
    • Field boundaries, project config
  • Outputs:
    • Word document (PRIMARY OUTPUT): reports/SmartCane_Report_week{WW}_{YYYY}.docx
    • HTML version (optional): reports/SmartCane_Report_week{WW}_{YYYY}.html
  • Report Contents:
    • Executive summary (KPI overview, monitored area, cloud coverage)
    • Phase distribution tables and visualizations
    • Status trigger summary (fields with active triggers)
    • Field-by-field detail pages with CI metrics
    • Interpretation guides for agronomic thresholds
  • Report Generation Technology:
    • RMarkdown (.Rmd) rendered to Word via officer and flextable packages
    • Tables with automatic width/height fitting
    • Column interpretations embedded in reports
    • Areas reported in both hectares and acres

File Storage Structure

All data persists to the file system. No database writes occur during analysis—only reads for metadata.

laravel_app/storage/app/{project}/
├── Data/
│   ├── pivot.geojson                    # Field boundaries (read-only)
│   ├── pivot_2.geojson                  # ESA variant with extra fields
│   ├── harvest.xlsx                     # Season dates & yield data (read-only)
│   ├── vrt/                             # Virtual raster files (daily VRTs)
│   │   └── YYYY-MM-DD.vrt
│   ├── extracted_ci/
│   │   ├── daily_vals/
│   │   │   └── extracted_YYYY-MM-DD_{suffix}.rds    # Daily field stats
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds                  # Cumulative CI (wide)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds  # Interpolated
│   └── daily_tiles_split/               # (Optional) Per-field tile processing
│       ├── per_field/
│       │   └── YYYY-MM-DD/                  # Date-specific folders
│       │       └── {FIELD}_{YYYY-MM-DD}.tif # Per-field daily
│
├── merged_tif/                          # Raw daily satellite images (Stage 1 output)
│   └── YYYY-MM-DD.tif                   # 4 bands: R, G, B, NIR
│
├── merged_final_tif/                    # (Optional) Processed daily images
│   └── YYYY-MM-DD.tif                   # 5 bands: R, G, B, NIR, CI
│
├── weekly_mosaic/                       # Weekly composite mosaics (Stage 4 output)
│   └── week_WW_YYYY.tif                 # 5 bands, ISO week numbering
│
└── reports/
    ├── SmartCane_Report_week{WW}_{YYYY}.docx    # PRIMARY OUTPUT (Stage 6)
    ├── SmartCane_Report_week{WW}_{YYYY}.html    # Alternative format
    ├── {project}_field_analysis_week{WW}.xlsx   # PRIMARY OUTPUT (Stage 5)
    ├── {project}_harvest_predictions_week{WW}.xlsx  # Harvest tracking
    ├── {project}_cloud_coverage_week{WW}.rds    # Per-field cloud stats
    ├── {project}_kpi_summary_tables_week{WW}.rds    # Summary data (consumed by reports)
    └── kpis/
        └── week_WW_YYYY/
            └── *.csv                            # Individual KPI exports
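
Paths in this layout can be derived from a project name and a date. The helpers below are a hypothetical sketch, not part of the pipeline; they assume the week number is zero-padded to two digits and that the year in the file name is the ISO year, which can differ from the calendar year around New Year.

```python
from datetime import date
from pathlib import PurePosixPath

STORAGE_ROOT = PurePosixPath("laravel_app/storage/app")


def weekly_mosaic_path(project: str, day: date) -> PurePosixPath:
    """Path of the weekly mosaic covering `day`, using ISO week numbering."""
    iso_year, iso_week, _ = day.isocalendar()
    return STORAGE_ROOT / project / "weekly_mosaic" / f"week_{iso_week:02d}_{iso_year}.tif"


def merged_tif_path(project: str, day: date) -> PurePosixPath:
    """Path of the daily merged GeoTIFF for `day`."""
    return STORAGE_ROOT / project / "merged_tif" / f"{day.isoformat()}.tif"


if __name__ == "__main__":
    print(weekly_mosaic_path("angata", date(2025, 10, 8)))
    print(merged_tif_path("angata", date(2025, 10, 8)))
```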

Data Types by File

| File Extension | Purpose | Stage | Example Files |
|---|---|---|---|
| .tif | Geospatial raster imagery | 1, 4 | YYYY-MM-DD.tif, week_41_2025.tif |
| .vrt | Virtual raster (pointer to TIFFs) | 2 | YYYY-MM-DD.vrt |
| .rds | R serialized data (binary format) | 2, 3, 5, 6 | combined_CI_data.rds, kpi_results_week41.rds |
| .geojson | Field boundaries (read-only) | Input | pivot.geojson |
| .xlsx | Excel reports & harvest data | 5, 6 (output), Input (harvest) | field_analysis_week41.xlsx |
| .docx | Word reports (final output) | 6 | SmartCane_Report_week41_2025.docx |
| .html | HTML reports (alternative) | 6 | SmartCane_Report_week41_2025.html |
| .csv | Summary tables (for external use) | 5, 6 | field_details.csv, kpi_summary.csv |

Script Dependency Map

01_create_master_grid_and_split_tiffs.R (Optional)
    └→ [Utility] parameters_project.R

02_ci_extraction.R
    ├→ [Utility] parameters_project.R
    └→ [Utility] ci_extraction_utils.R
        └ Functions: find_satellite_images(), process_satellite_images(), 
                     process_ci_values(), process_ci_values_from_tiles()

03_interpolate_growth_model.R
    ├→ [Utility] parameters_project.R
    └→ [Utility] growth_model_utils.R
        └ Functions: load_combined_ci_data(), generate_interpolated_ci_data(),
                     calculate_growth_metrics(), save_growth_model()

04_mosaic_creation.R
    ├→ [Utility] parameters_project.R
    └→ [Utility] mosaic_creation_utils.R
        └ Functions: create_weekly_mosaic_from_tiles(), save_mosaic(),
                     assess_cloud_coverage()

09_field_analysis_weekly.R  (or 09b_field_analysis_weekly.R - parallel version)
    ├→ [Utility] parameters_project.R
    ├→ [Utility] field_analysis_utils.R
    └→ Outputs: Excel files, RDS summary files
        └ Functions: load_ci_data(), analyze_field_stats(),
                     assign_growth_phase(), detect_triggers(),
                     export_to_excel()

10_CI_report_with_kpis_simple.Rmd  (RMarkdown → rendered to .docx/.html)
    ├→ [Utility] parameters_project.R
    ├→ [Utility] report_utils.R
    └→ [Packages] officer, flextable
        └ Functions: body_add_flextable(), add_paragraph(), 
                     officer::read_docx(), save_docx()

Utility Files Description

  • parameters_project.R: Loads project configuration (paths, field boundaries, harvest data, project metadata)
  • ci_extraction_utils.R: CI calculation, field masking, RDS I/O for daily & cumulative CI data
  • growth_model_utils.R: Linear interpolation, seasonal grouping, daily metrics calculation
  • mosaic_creation_utils.R: Weekly mosaic compositing, cloud assessment, raster masking
  • field_analysis_utils.R: Per-field statistics, phase assignment, trigger detection, Excel export
  • report_utils.R: RMarkdown helpers, table formatting, Word document building via officer package

Data Type Reference

RDS (R Data Serialization)

RDS files store R data objects in binary format. They preserve data types, dimensions, and structure perfectly. Key RDS files in the pipeline:

| File | Structure | Rows | Columns | Use |
|---|---|---|---|---|
| combined_CI_data.rds | Data frame (wide format) | # fields | # dates | All-time CI by field |
| All_pivots_Cumulative_CI_quadrant_year_v2.rds | Data frame (long format) | ~1M+ rows | 11 columns | Interpolated daily CI, used for yield models |
| kpi_summary_tables_week{WW}.rds | List of data frames | varies | | Field KPIs, phase dist., triggers |
| cloud_coverage_week{WW}.rds | Data frame | # fields | 4 columns | Per-field cloud %, category |

Excel (.xlsx)

Primary output format for stakeholder consumption:

| Sheet | Content | Rows | Columns | Key Data |
|---|---|---|---|---|
| Field Data | Field-by-field analysis | # fields | ~15 | CI mean/std, phase, status, cloud% |
| Summary | Farm-wide statistics | 10-20 | 3 | Monitored area (ha/acres), cloud dist., phases |

Word (.docx)

Executive report format via RMarkdown → officer:

  • Title page with metadata (project, week, date, total fields, acreage)
  • Executive summary with KPIs
  • Phase analysis section with distribution tables
  • Status trigger summary
  • Field-by-field detail pages
  • Interpretation guides

Key Calculations & Thresholds

Canopy Index (CI)

CI = (NIR / Green) - 1

Range: -1 to +∞
Interpretation:
  CI < 0      → Non-vegetated (water, bare soil)
  0 ≤ CI < 1  → Sparse vegetation (early growth)
  1 ≤ CI < 2  → Moderate vegetation
  CI ≥ 2      → Dense vegetation (mature crop)

Growth Phase Assignment (Age-Based)

Based on weeks since planting (season_start from harvest.xlsx):

| Phase | Age Range | Characteristics |
|---|---|---|
| Germination | 0-6 weeks | Variable emergence, low CI |
| Tillering | 6-18 weeks | Shoot development, increasing CI |
| Grand Growth | 18-35 weeks | Peak growth, high CI accumulation |
| Maturation | 35+ weeks | Sugar accumulation, plateau or decline |
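
A small sketch of the age-based assignment. The function name is hypothetical, and because adjacent age ranges share their boundary values, the upper bound of each range is assumed to be exclusive (a field that is exactly 6 weeks old counts as Tillering).

```python
from datetime import date

# (exclusive upper bound in weeks, phase name); older fields are in Maturation
PHASES = [(6, "Germination"), (18, "Tillering"), (35, "Grand Growth")]


def growth_phase(season_start: date, as_of: date) -> str:
    """Assign a growth phase from the field age in weeks since season_start."""
    age_weeks = (as_of - season_start).days / 7
    if age_weeks < 0:
        raise ValueError("as_of is before season_start")
    for upper_bound, name in PHASES:
        if age_weeks < upper_bound:
            return name
    return "Maturation"
```

For example, a field with a season start of 2025-01-01 is in Germination on 2025-01-15 (2 weeks old) and reaches Maturation on 2025-09-03 (35 weeks old).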

Field Uniformity (Coefficient of Variation)

CV = std_dev / mean

Interpretation:
  CV < 0.15   → Excellent uniformity
  CV < 0.25   → Good uniformity
  CV < 0.35   → Moderate uniformity
  CV ≥ 0.35   → Poor uniformity (management attention needed)
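
The thresholds form a simple cascade, which can be sketched with a sorted break list. The names are illustrative, not taken from the R code.

```python
from bisect import bisect_right

CV_BREAKS = [0.15, 0.25, 0.35]
CV_LABELS = ["Excellent", "Good", "Moderate", "Poor"]


def classify_uniformity(cv: float) -> str:
    """Map a coefficient of variation to its uniformity class."""
    if cv != cv or cv < 0:  # reject NaN and negative values
        raise ValueError(f"invalid CV: {cv}")
    # bisect_right puts a value equal to a break into the next (worse) class,
    # matching "CV < 0.15" for Excellent and "CV >= 0.35" for Poor.
    return CV_LABELS[bisect_right(CV_BREAKS, cv)]
```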

Cloud Coverage Classification (Per-Field)

cloud_pct = (non_NA_pixels / total_pixels) * 100    # share of valid (cloud-free) pixels; higher = clearer

Categories:
  ≥99.5%          → Clear view (usable for analysis)
  >0% and <99.5%  → Partial coverage (biased estimates)
  0%              → No image available (excluded from analysis)
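
Note that the formula measures the share of valid (non-NA) pixels, so a high value means a clear view. A sketch of the classification, with hypothetical function names and the partial category read as strictly between 0% and 99.5%:

```python
import numpy as np


def valid_pixel_pct(field_pixels: np.ndarray) -> float:
    """Percentage of non-NA pixels in a field's CI values."""
    total = field_pixels.size
    if total == 0:
        return 0.0
    return 100.0 * np.count_nonzero(~np.isnan(field_pixels)) / total


def classify_coverage(pct: float) -> str:
    """Map the valid-pixel percentage to the per-field category."""
    if pct >= 99.5:
        return "Clear view"
    if pct > 0:
        return "Partial coverage"
    return "No image available"
```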

Status Triggers (Non-Exclusive)

Fields can have multiple simultaneous triggers:

| Trigger | Detection Method | Data Used |
|---|---|---|
| Germination Started | 10% of field CI > 2.0 | Current week CI extraction |
| Rapid Growth | Week-over-week increase > 0.5 CI units | Mosaic-based extraction |
| Slow Growth | Week-over-week increase < 0.1 CI units | Mosaic-based extraction |
| Non-Uniform | CV > 0.25 | Spatial stats per field |
| Weed Pressure | Rapid increase (>2.0 CI/week) + area <25% | Spatial clustering analysis |
| Harvest Imminence | Age > 240 days + CI plateau | Temporal analysis, phase assignment |

Processing Configuration & Parameters

Most parameters are configurable via command-line arguments or environment variables; values marked as hardcoded below can only be changed in the code:

Download Stage (Python)

  • DATE: End date for download (YYYY-MM-DD), default: today
  • DAYS: Days lookback, default: 7
  • resolution: Output resolution in meters, default: 3
  • max_threads: Concurrent download threads, default: 15
  • Grid split: (5, 5) bounding boxes (hardcoded)
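
One way to combine command-line arguments with environment-variable fallbacks for these parameters is sketched below. The flag names (`--date`, `--days`, `--resolution`, `--max-threads`), the function name, and the derived `start_date` are assumptions for illustration; the download script's actual interface may differ.

```python
import argparse
import os
from datetime import date, timedelta


def parse_download_args(argv=None, env=None):
    """Parse download parameters; CLI flags override DATE/DAYS from the environment."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Download-stage parameters (sketch)")
    parser.add_argument("--date", default=env.get("DATE"),
                        help="End date, YYYY-MM-DD (default: today)")
    parser.add_argument("--days", type=int, default=int(env.get("DAYS", 7)),
                        help="Days lookback (default: 7)")
    parser.add_argument("--resolution", type=float, default=3,
                        help="Output resolution in meters (default: 3)")
    parser.add_argument("--max-threads", type=int, default=15,
                        help="Concurrent download threads (default: 15)")
    args = parser.parse_args(argv)
    args.end_date = date.fromisoformat(args.date) if args.date else date.today()
    # Assumption: the lookback window starts `days` days before the end date.
    args.start_date = args.end_date - timedelta(days=args.days)
    return args


if __name__ == "__main__":
    print(parse_download_args())
```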

CI Extraction Stage (R)

  • end_date: End date (YYYY-MM-DD)
  • offset: Days lookback (default: 7)
  • project_dir: Project directory name (required)
  • data_source: Source folder (merged_tif or merged_final_tif)
  • Auto-detection: If daily_tiles_split/ exists, uses tile-based processing

Mosaic Creation Stage (R)

  • end_date: End date
  • offset: Days lookback
  • project_dir: Project directory
  • file_name: Custom output filename (optional)
  • Cloud thresholds: 5% (strict), 45% (relaxed) - hardcoded

Field Analysis Stage (R)

  • end_date: End date
  • project_dir: Project directory
  • Parallel workers: Auto-detected via future::plan() or user-configurable
  • Thresholds: CV, change, weed detection - configurable in code

Database Usage

The system does NOT write to the database during analysis. Database tables (project_reports, project_mosaics, project_mailings) are maintained by the Laravel application for:

  • Report metadata tracking
  • Email delivery history
  • Report version control

File system is the single source of truth for all analysis data.

FieldBoundaries --> KPIScript
HarvestData --> KPIScript
InterpolatedModel --> KPIScript
KPIScript --> KPI1
KPIScript --> KPI2
KPIScript --> KPI3
KPIScript --> KPI4
KPIScript --> KPI5
KPIScript --> KPI6
KPI1 & KPI2 & KPI3 & KPI4 & KPI5 & KPI6 --> KPIParams
KPIParams --> KPIResults

WeeklyMosaic --> ReportScript
KPIResults --> ReportScript
FieldBoundaries --> ReportScript
ReportScript --> Visualizations
Visualizations --> FinalReport

FinalReport --> EmailDelivery
FinalReport --> WebDashboard

Laravel --> ShellScripts
ShellScripts -.->|Triggers| Download
ShellScripts -.->|Triggers| CIScript
ShellScripts -.->|Triggers| GrowthScript
ShellScripts -.->|Triggers| MosaicScript
ShellScripts -.->|Triggers| KPIScript
ShellScripts -.->|Triggers| ReportScript

%% ===== STYLING =====
style INPUTS fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style DOWNLOAD fill:#fff3e0,stroke:#f57c00,stroke-width:2px
style CI_EXTRACTION fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
style GROWTH_MODEL fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
style WEEKLY_MOSAIC fill:#fce4ec,stroke:#c2185b,stroke-width:2px
style KPI_CALC fill:#e0f2f1,stroke:#00796b,stroke-width:2px
style REPORTING fill:#fff9c4,stroke:#f9a825,stroke-width:2px
style OUTPUTS fill:#ffebee,stroke:#c62828,stroke-width:2px

style DailyMosaics fill:#ffccbc,stroke:#333,stroke-width:1px
style CombinedCI fill:#ffccbc,stroke:#333,stroke-width:1px
style InterpolatedModel fill:#ffccbc,stroke:#333,stroke-width:1px
style WeeklyMosaic fill:#ffccbc,stroke:#333,stroke-width:1px
style KPIResults fill:#ffccbc,stroke:#333,stroke-width:1px
style FinalReport fill:#ffccbc,stroke:#333,stroke-width:1px

### Overall System Architecture

This diagram provides a high-level overview of the complete SmartCane system, showing how major components interact. It focuses on the system boundaries and main data flows between the Python API Downloader, R Processing Engine, Laravel Web App, and data storage components. This view helps understand how the system works as a whole.

```mermaid
%% Overall System Architecture
graph TD
    A["fa:fa-satellite External Satellite Data Providers API"] --> PyDL["fa:fa-download Python API Downloader"];
    C["fa:fa-users Users: Farm Data Input e.g., GeoJSON, Excel"] --> D{"fa:fa-laptop-code Laravel Web App"};

    subgraph SmartCane System
        PyDL --> G["fa:fa-folder-open File System: Raw Satellite Imagery, Rasters, RDS, Reports, Boundaries"];
        E["fa:fa-cogs R Processing Engine"] -- Reads --> G;
        E -- Writes --> G;
        
        D -- Manages/Triggers --> F["fa:fa-terminal Shell Script Orchestration"];
        F -- Executes --> PyDL;
        F -- Executes --> E;
        
        D -- Manages/Accesses --> G;
        D -- Reads/Writes --> H["fa:fa-database Database: Project Metadata, Users, Schedules"];

        E -- Generates --> I["fa:fa-file-alt Agronomic Reports: DOCX, HTML"];
        D -- Accesses/Delivers --> I;
    end

    D --> J["fa:fa-desktop Users: Web Interface (future)"];
    I -- Via Email (SMTP) --> K["fa:fa-envelope Users: Email Reports"];

    style E fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#bbf,stroke:#333,stroke-width:2px
    style PyDL fill:#ffdd57,stroke:#333,stroke-width:2px
```

R Processing Engine Detail

This diagram zooms in on the R Processing Engine subsystem, detailing the internal components and data flow. It shows how raw satellite imagery and field data progress through various R scripts to produce crop indices and reports. The diagram highlights the data transformation pipeline within this analytical core of the SmartCane system.

%% R Processing Engine Detail
graph TD
    subgraph R Processing Engine

        direction TB

        subgraph Inputs
            SatelliteImages["fa:fa-image Raw Satellite Imagery"]
            FieldBoundaries["fa:fa-map-marker-alt Field Boundaries .geojson"]
            HarvestData["fa:fa-file-excel Harvest Data .xlsx"]
            ProjectParams["fa:fa-file-code Project Parameters .R"]
        end

        subgraph Core R Scripts & Processes
            ParamConfig("fa:fa-cogs parameters_project.R")
            MosaicScript("fa:fa-images mosaic_creation.R")
            CIExtractionScript("fa:fa-microscope ci_extraction.R")
            ReportUtils("fa:fa-tools executive_report_utils.R")
            DashboardRmd("fa:fa-tachometer-alt CI_report_dashboard_planet_enhanced.Rmd")
            SummaryRmd("fa:fa-list-alt CI_report_executive_summary.Rmd")
        end

        subgraph Outputs
            WeeklyMosaics["fa:fa-file-image Weekly Mosaics .tif"]
            CIDataRDS["fa:fa-database CI Data .rds"]
            CIRasters["fa:fa-layer-group CI Rasters .tif"]
            DashboardReport["fa:fa-chart-bar Dashboard Report .docx/.html"]
            SummaryReport["fa:fa-file-invoice Executive Summary .docx/.html"]
        end

        %% Data Flow
        ProjectParams --> ParamConfig;

        SatelliteImages --> MosaicScript;
        FieldBoundaries --> MosaicScript;
        ParamConfig --> MosaicScript;
        MosaicScript --> WeeklyMosaics;

        WeeklyMosaics --> CIExtractionScript;
        FieldBoundaries --> CIExtractionScript;
        ParamConfig --> CIExtractionScript;
        CIExtractionScript --> CIDataRDS;
        CIExtractionScript --> CIRasters;

        CIDataRDS --> ReportUtils;
        CIRasters --> ReportUtils;
        HarvestData --> ReportUtils;
        ParamConfig --> ReportUtils;

        ReportUtils --> DashboardRmd;
        ReportUtils --> SummaryRmd;
        ParamConfig --> DashboardRmd;
        ParamConfig --> SummaryRmd;

        DashboardRmd --> DashboardReport;
        SummaryRmd --> SummaryReport;

    end

    ShellOrchestration["fa:fa-terminal Shell Scripts e.g., build_mosaic.sh, build_report.sh"] -->|Triggers| R_Processing_Engine["fa:fa-cogs R Processing Engine"]

    style R_Processing_Engine fill:#f9f,stroke:#333,stroke-width:2px
    style Inputs fill:#ccf,stroke:#333,stroke-width:1px
    style Outputs fill:#cfc,stroke:#333,stroke-width:1px
    style Core_R_Scripts_Processes fill:#ffc,stroke:#333,stroke-width:1px

Python API Downloader Detail

This diagram focuses on the Python API Downloader subsystem, showing its internal components and workflow. It illustrates how API credentials, field boundaries, and other inputs are processed through various Python functions to download, process, and prepare satellite imagery. This view reveals the technical implementation details of the data acquisition layer.

%% Python API Downloader Detail
graph TD
    subgraph Python API Downloader

        direction TB

        subgraph Inputs_Py [Inputs]
            APICreds["fa:fa-key API Credentials (SH_CLIENT_ID, SH_CLIENT_SECRET)"]
            DateRangeParams["fa:fa-calendar-alt Date Range Parameters (days_needed, specific_date)"]
            GeoJSONInput["fa:fa-map-marker-alt Field Boundaries (pivot.geojson)"]
            ProjectConfig["fa:fa-cogs Project Configuration (project_name, paths)"]
            EvalScripts["fa:fa-file-code Evalscripts (JS for cloud masking & band selection)"]
        end

        subgraph Core_Python_Logic_Py [Core Python Logic & Libraries]
            SetupConfig["fa:fa-cog SentinelHubConfig & BYOC Definition"]
            DateSlotGen["fa:fa-calendar-check Date Slot Generation (slots)"]
            GeoProcessing["fa:fa-map GeoJSON Parsing & BBox Splitting (geopandas, BBoxSplitter)"]
            AvailabilityCheck["fa:fa-search-location Image Availability Check (SentinelHubCatalog)"]
            RequestHandler["fa:fa-paper-plane Request Generation (SentinelHubRequest, get_true_color_request_day)"]
            DownloadClient["fa:fa-cloud-download-alt Image Download (SentinelHubDownloadClient, download_function)"]
            MergeUtility["fa:fa-object-group Tile Merging (gdal.BuildVRT, gdal.Translate, merge_files)"]
            CleanupUtility["fa:fa-trash-alt Intermediate File Cleanup (empty_folders)"]
        end

        subgraph Outputs_Py [Outputs]
            RawSatImages["fa:fa-file-image Raw Downloaded Satellite Imagery Tiles (response.tiff in dated subfolders)"]
            MergedTifs["fa:fa-images Merged TIFs (merged_tif/{slot}.tif)"]
            VirtualRasters["fa:fa-layer-group Virtual Rasters (merged_virtual/merged{slot}.vrt)"]
            DownloadLogs["fa:fa-file-alt Console Output Logs (print statements)"]
        end
        
        ExternalSatAPI["fa:fa-satellite External Satellite Data Providers API (Planet via Sentinel Hub)"]

        %% Data Flow for Python Downloader
        APICreds --> SetupConfig;
        DateRangeParams --> DateSlotGen;
        GeoJSONInput --> GeoProcessing;
        ProjectConfig --> SetupConfig;
        ProjectConfig --> GeoProcessing;
        ProjectConfig --> MergeUtility;
        ProjectConfig --> CleanupUtility;
        EvalScripts --> RequestHandler;

        DateSlotGen -- Available Slots --> AvailabilityCheck;
        GeoProcessing -- BBox List --> AvailabilityCheck;
        SetupConfig --> AvailabilityCheck;
        AvailabilityCheck -- Filtered Slots & BBoxes --> RequestHandler;
        
        RequestHandler -- Download Requests --> DownloadClient;
        SetupConfig --> DownloadClient;
        DownloadClient -- Downloads Data From --> ExternalSatAPI;
        ExternalSatAPI -- Returns Image Data --> DownloadClient;
        DownloadClient -- Writes --> RawSatImages;
        DownloadClient -- Generates --> DownloadLogs;
        
        RawSatImages --> MergeUtility;
        MergeUtility -- Writes --> MergedTifs;
        MergeUtility -- Writes --> VirtualRasters;
        
    end

    ShellOrchestratorPy["fa:fa-terminal Shell Scripts (e.g., runpython.sh triggering planet_download.ipynb)"] -->|Triggers| Python_API_Downloader["fa:fa-download Python API Downloader"];
    
    style Python_API_Downloader fill:#ffdd57,stroke:#333,stroke-width:2px
    style Inputs_Py fill:#cdeeff,stroke:#333,stroke-width:1px
    style Outputs_Py fill:#d4efdf,stroke:#333,stroke-width:1px
    style Core_Python_Logic_Py fill:#fff5cc,stroke:#333,stroke-width:1px
    style ExternalSatAPI fill:#f5b7b1,stroke:#333,stroke-width:2px
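
The date slot generation and output naming shown in the diagram above can be illustrated with a minimal sketch. The function names `generate_slots` and `expected_outputs` are hypothetical. The sketch assumes that a slot is an ISO date string and that `days_needed` counts consecutive days ending on `specific_date` (or today when no date is given); the actual downloader may define slots differently. The output paths follow the `merged_tif/{slot}.tif` and `merged_virtual/merged{slot}.vrt` patterns named in the diagram.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def generate_slots(days_needed: int, specific_date: Optional[str] = None) -> List[str]:
    """Return ISO dates, oldest first, for the days_needed days ending on specific_date.

    Assumption: slots are consecutive calendar days; when specific_date is
    omitted, the window ends today.
    """
    if days_needed < 1:
        raise ValueError("days_needed must be at least 1")
    end = date.fromisoformat(specific_date) if specific_date else date.today()
    start = end - timedelta(days=days_needed - 1)
    return [(start + timedelta(days=i)).isoformat() for i in range(days_needed)]


def expected_outputs(slot: str) -> Dict[str, PurePosixPath]:
    """Build the relative output paths for one slot, using the diagram's naming patterns."""
    return {
        "merged_tif": PurePosixPath("merged_tif") / f"{slot}.tif",
        "merged_vrt": PurePosixPath("merged_virtual") / f"merged{slot}.vrt",
    }
```

Calling `generate_slots(7)` would produce the last seven days as slots, and `expected_outputs(slot)` gives the files a downstream stage could look for once tile merging has finished for that slot.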

SmartCane Engine Integration Diagram

This diagram illustrates how the Python and R components are integrated within the SmartCane Engine. Unlike the first diagram, which shows the overall system, this one focuses on how the two processing components interact with each other and with the rest of the system. It emphasizes the orchestration layer and the data flows between the core processing components and external systems.

%% SmartCane Engine Integration
graph TD
    %% External Systems & Users
    Users_DataInput["fa:fa-user Users: Farm Data Input (GeoJSON, Excel, etc.)"] --> Laravel_WebApp;
    ExternalSatAPI["fa:fa-satellite External Satellite Data Providers API"];

    %% Main Application Components
    Laravel_WebApp["fa:fa-globe Laravel Web App (Frontend & Control Plane)"];
    Shell_Orchestration["fa:fa-terminal Shell Script Orchestration (e.g., runcane.sh, runpython.sh, build_mosaic.sh)"];
    subgraph SmartCane_Engine ["SmartCane Engine (Data Processing Core)"]
        direction TB
        Python_Downloader["fa:fa-download Python API Downloader"];
        R_Engine["fa:fa-chart-line R Processing Engine"];
    end
    %% Data Storage
    FileSystem["fa:fa-folder File System (Raw Imagery, Rasters, RDS, Reports, Boundaries)"];
    Database["fa:fa-database Database (Project Metadata, Users, Schedules)"];

    %% User Outputs
    Users_WebView["fa:fa-desktop Users: Web Interface (future)"];
    Users_EmailReports["fa:fa-envelope Users: Email Reports (Agronomic Reports)"];
    AgronomicReports["fa:fa-file-alt Agronomic Reports (DOCX, HTML)"];

    %% --- Data Flows & Interactions ---

    %% Laravel to Orchestration & Engine
    Laravel_WebApp -- Manages/Triggers --> Shell_Orchestration;
    Shell_Orchestration -- Executes --> Python_Downloader;
    Shell_Orchestration -- Executes --> R_Engine;

    %% Python Downloader within Engine
    ExternalSatAPI -- Satellite Data --> Python_Downloader;
    Python_Downloader -- Writes Raw Data --> FileSystem;
    %% Inputs to Python (simplified for this view - details in Python-specific diagram)
    %% Laravel_WebApp -- Provides Config/Boundaries --> Python_Downloader; 


    %% R Engine within Engine
    %% Inputs to R (simplified - details in R-specific diagram)
    %% Laravel_WebApp -- Provides Config/Boundaries --> R_Engine;
    R_Engine -- Reads Processed Data/Imagery --> FileSystem;
    R_Engine -- Writes Derived Products --> FileSystem;
    R_Engine -- Generates --> AgronomicReports;

    %% Laravel interaction with Data Storage
    Laravel_WebApp -- Manages/Accesses --> FileSystem;
    Laravel_WebApp -- Reads/Writes --> Database;
    
    %% Output Delivery
    Laravel_WebApp --> Users_WebView;
    AgronomicReports --> Users_EmailReports;
    %% Assuming a mechanism like SMTP, potentially triggered by Laravel or R-Engine completion
    Laravel_WebApp -- Delivers/Displays --> AgronomicReports;


    %% Styling
    style SmartCane_Engine fill:#e6ffe6,stroke:#333,stroke-width:2px
    style Python_Downloader fill:#ffdd57,stroke:#333,stroke-width:2px
    style R_Engine fill:#f9f,stroke:#333,stroke-width:2px
    style Laravel_WebApp fill:#bbf,stroke:#333,stroke-width:2px
    style Shell_Orchestration fill:#f0ad4e,stroke:#333,stroke-width:2px
    style FileSystem fill:#d1e0e0,stroke:#333,stroke-width:1px
    style Database fill:#d1e0e0,stroke:#333,stroke-width:1px
    style ExternalSatAPI fill:#f5b7b1,stroke:#333,stroke-width:2px
    style AgronomicReports fill:#d4efdf,stroke:#333,stroke-width:1px

Future Directions

Several enhancements and new capabilities are planned for the SmartCane platform:

  • Advanced Management Dashboard: Development of a more comprehensive and interactive management dashboard to provide users with deeper insights and greater control over their operations.
  • Enhanced Yield Prediction Models: Improving the accuracy and granularity of yield predictions by incorporating more variables and advanced machine learning techniques.
  • Integrated Weather and Irrigation Advice: Leveraging weather forecast data and soil moisture information (potentially from new data sources) to provide precise irrigation scheduling and weather-related agronomic advice.
  • AI-Guided Agronomic Advice: Implementing sophisticated AI algorithms to analyze integrated data (satellite, weather, soil, farm practices) and offer tailored, actionable agronomic recommendations.
  • Automated Advice Generation: Developing capabilities for the system to automatically generate and disseminate critical advice and alerts to users based on real-time data analysis.
  • Expanded Data Source Integration:
    • Radar Data: Incorporating radar satellite imagery (e.g., Sentinel-1) for all-weather monitoring capabilities, particularly useful during cloudy seasons for assessing crop structure, soil moisture, and biomass.
    • IoT and Ground Sensors: Integrating data from in-field IoT devices and soil sensors for highly localized and continuous monitoring of environmental and soil conditions.
  • Client-Facing Portal: Exploration and potential development of a client-facing portal to allow end-users direct access to their data, dashboards, and reports, complementing the current internal management interface.

These developments aim to strengthen SmartCane as a decision support system for sustainable and efficient agricultural practices.

Conclusion and Integration Summary

The SmartCane architecture combines several technologies and subsystems into a single processing chain. The key subsystems work together as follows:

Subsystem Integration

  1. Data Flow Sequence

    • The Laravel Web App initiates the workflow and manages user interactions
    • Shell scripts orchestrate the execution sequence of the processing subsystems
    • The Python API Downloader acquires raw data from external sources
    • The R Processing Engine transforms this data into actionable insights
    • Results flow back to users through email reports and, once available, the web interface
  2. Technology Integration

    • Python + R: Each language is used where it is strongest, with Python handling API communication and data acquisition and R handling statistical analysis and report generation
    • Laravel + Processing Engine: Clear separation between web presentation layer and computational backend
    • File System + Database: Hybrid data storage approach with file system for imagery and reports, database for metadata and user information
  3. Key Integration Mechanisms

    • File System Bridge: The different subsystems primarily communicate through standardized file formats (GeoTIFF, GeoJSON, RDS, DOCX)
    • Shell Script Orchestration: Acts as the "glue" between subsystems, ensuring proper execution sequence and environment setup
    • Standardized Data Formats: Use of widely-accepted geospatial and data formats enables interoperability
  4. Extensibility and Scalability

    • The modular architecture allows for replacement or enhancement of individual components
    • The clear subsystem boundaries enable parallel development and testing
    • Standard interfaces simplify integration of new data sources, algorithms, or output methods
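
The combination of script orchestration and the file system bridge can be sketched as a small runner that executes stages in order and checks that each one has written its expected file before the next stage starts. This is an illustrative sketch only, written in Python rather than shell for readability; the `Stage` type and the `run_pipeline` function are hypothetical and do not reproduce the project's shell scripts.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, NamedTuple


class Stage(NamedTuple):
    """One pipeline step (hypothetical structure, for illustration only)."""
    name: str
    command: List[str]  # argv passed to subprocess.run
    output: Path        # file the stage is expected to write


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages sequentially and stop at the first failure.

    A stage counts as successful only if its command exits with code 0 and
    its expected output file exists, since downstream stages read that file.
    """
    completed = []
    for stage in stages:
        result = subprocess.run(stage.command, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"{stage.name} failed: {result.stderr.strip()}")
        if not stage.output.exists():
            raise FileNotFoundError(f"{stage.name} did not write {stage.output}")
        completed.append(stage.name)
    return completed
```

In practice, a stage command might wrap a script such as `runpython.sh`, with the merged GeoTIFF for the requested date as its expected output. Checking for the file rather than only the exit code reflects the file-based handoff between subsystems described above.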

The SmartCane architecture balances complexity with maintainability by relying on well-established technologies and clear boundaries between subsystems. Separating data acquisition, processing, and presentation limits the impact that a change in one layer has on the others, while the consistent file-based data flow keeps the handoff between stages predictable.