<!-- filepath: c:\Users\timon\Resilience BV\4020 SCane ESA DEMO - Documenten\General\4020 SCDEMO Team\4020 TechnicalData\WP3\smartcane\r_app\system_architecture.md -->
# SmartCane System Architecture - Python + R Pipeline & File-Based Processing
## 🗂️ Quick Navigation
**New Architecture Guides** (start here for complete system understanding):
- **[ARCHITECTURE_INTEGRATION_GUIDE.md](ARCHITECTURE_INTEGRATION_GUIDE.md)**: *Start here!* Integrates all dimensions (pipeline stages, client types, and execution models). Includes decision matrices and troubleshooting.
- **[ARCHITECTURE_DATA_FLOW.md](ARCHITECTURE_DATA_FLOW.md)**: Complete Stage 00-91 data pipeline with transformations, file formats, and storage locations. High-level overview plus stage-by-stage details.
- **[CLIENT_TYPE_ARCHITECTURE.md](CLIENT_TYPE_ARCHITECTURE.md)**: Explains how the agronomic_support (AURA) and cane_supply (ANGATA) client types branch at Stage 80. Covers KPI differences, report differences, and configuration mapping.
- **[SOBIT_DEPLOYMENT.md](SOBIT_DEPLOYMENT.md)**: Production server deployment via Laravel job queue. Covers the web UI, shell wrappers, job chaining, error handling, and monitoring.
- **[DEV_LAPTOP_EXECUTION.md](DEV_LAPTOP_EXECUTION.md)**: Developer manual execution on Windows laptops. Covers PowerShell commands, stage-by-stage workflows, configuration, and troubleshooting.
---
## Overview
The SmartCane system is a file-based agricultural intelligence platform that processes satellite imagery through sequential Python and R scripts. Raw satellite imagery is downloaded via Planet API (Python), then flows through R processing stages (CI extraction, growth model interpolation, mosaic creation, KPI analysis, harvest detection) with outputs persisted as GeoTIFFs, RDS files, Excel sheets, and Word reports. Harvest monitoring is performed via ML-based harvest detection using LSTM models trained on historical CI sequences.
## Processing Pipeline Overview
```mermaid
%% SmartCane Processing Pipeline
graph LR
DL["🐍 Python Download"] --> TIFF["📦 Daily GeoTIFFs"]
TIFF --> CI["🟢 CI Extraction<br/>(RDS)"]
CI --> GM["🟡 Growth Model<br/>(RDS)"]
TIFF --> GM
CI --> CCI["📊 Cumulative<br/>CI Data"]
GM --> KPI["🔴 KPI Calculation"]
CCI -.-> KPI
KPI --> FA["📋 Field Analysis &<br/>Report Generation"]
FA --> OUT["📄 Excel + Word<br/>Outputs"]
style DL fill:#fff3e0
style TIFF fill:#e8f5e9
style CI fill:#e8f5e9
style GM fill:#e8f5e9
style CCI fill:#fff9c4
style KPI fill:#ffccbc
style FA fill:#e0f2f1
style OUT fill:#f1f8e9
```
## SmartCane Modern Architecture: Complete Pipeline with Client Types & Execution Models
```mermaid
%% SmartCane Modern Architecture
graph TD
subgraph INPUTS["🔹 INPUTS"]
API["🔑 Planet API<br/>Credentials"]
GJ["🗺️ Field Boundaries<br/>pivot.geojson"]
HV["📊 Harvest Data<br/>harvest.xlsx"]
CONFIG["⚙️ Configuration<br/>parameters_project.R"]
end
subgraph STAGE00["STAGE 00: Python Download"]
PY["🐍 00_download_8band<br/>_pu_optimized.py"]
PY_OUT["📦 merged_tif/{DATE}.tif<br/>4-band uint16<br/>(R,G,B,NIR)"]
end
subgraph STAGE10["STAGE 10: Per-Field Tiles"]
R10["🔴 10_create_per_field<br/>_tiffs.R"]
R10_OUT["📦 field_tiles/{FIELD}<br/>/{DATE}.tif<br/>4-band per-field"]
end
subgraph STAGE20["STAGE 20: CI Extraction"]
R20["🟢 20_ci_extraction<br/>_per_field.R"]
R20_UTIL["[Utils]<br/>ci_extraction<br/>_utils.R"]
R20_OUT1["📦 field_tiles_CI<br/>/{FIELD}/{DATE}.tif<br/>5-band + CI"]
R20_OUT2["📦 combined_CI<br/>_data.rds<br/>(wide format)"]
end
subgraph STAGE30["STAGE 30: Growth Model"]
R30["🟡 30_interpolate<br/>_growth_model.R"]
R30_UTIL["[Utils]<br/>growth_model_utils.R"]
R30_OUT["📦 All_pivots_Cumulative<br/>_CI_quadrant_year_v2.rds<br/>(interpolated)"]
end
subgraph STAGE40["STAGE 40: Weekly Mosaic"]
R40["🟣 40_mosaic_creation<br/>_per_field.R"]
R40_UTIL["[Utils]<br/>mosaic_creation<br/>_utils.R"]
R40_OUT["📦 weekly_mosaic<br/>/{FIELD}/week_WW.tif<br/>5-band MAX composite"]
end
subgraph STAGE80["STAGE 80: KPI Calculation<br/>(Client-Type Branching)"]
R80["🟠 80_calculate_kpis.R<br/>(reads parameters)"]
R80_SPLIT{"CLIENT_TYPE?"}
R80_AGRO["[agronomic_support]<br/>80_utils_agronomic<br/>_support.R<br/>6 KPIs"]
R80_CANE["[cane_supply]<br/>80_utils_cane<br/>_supply.R<br/>4 KPIs + harvest"]
R80_OUT1["📦 field_analysis<br/>_week{WW}.xlsx"]
R80_OUT2["📦 kpi_summary<br/>_tables_week{WW}.rds"]
end
subgraph STAGE90["STAGE 90: Report (Agronomic)"]
R90["📄 90_CI_report_with_kpis<br/>_agronomic_support.Rmd"]
R90_OUT["📦 SmartCane_Report<br/>_agronomic_support_*.docx<br/>(AURA/Chemba/etc)"]
end
subgraph STAGE91["STAGE 91: Report (Cane)"]
R91["📄 91_CI_report_with_kpis<br/>_cane_supply.Rmd"]
R91_OUT["📦 SmartCane_Report<br/>_cane_supply_*.docx<br/>(ANGATA)"]
end
subgraph EXEC["🔷 EXECUTION MODELS"]
SOBIT["SOBIT Server<br/>(Production)"]
SOBIT_EXEC["Laravel Job Queue<br/>→ Shell Wrappers<br/>→ Async Execution"]
DEVLAP["Dev Laptop<br/>(Development)"]
DEVLAP_EXEC["PowerShell<br/>→ Direct Rscript/python<br/>→ Sync Execution"]
end
%% ===== CONNECTIONS =====
API --> PY
GJ --> PY
PY --> PY_OUT
CONFIG --> PY
PY_OUT --> R10
GJ --> R10
CONFIG --> R10
R10 --> R10_OUT
R10_OUT --> R20
GJ --> R20
CONFIG --> R20
R20 --> R20_UTIL
R20 --> R20_OUT1
R20 --> R20_OUT2
R20_OUT2 --> R30
HV --> R30
CONFIG --> R30
R30 --> R30_UTIL
R30 --> R30_OUT
R20_OUT1 --> R40
GJ --> R40
CONFIG --> R40
R40 --> R40_UTIL
R40 --> R40_OUT
R40_OUT --> R80
R30_OUT --> R80
GJ --> R80
HV --> R80
CONFIG --> R80
R80 --> R80_SPLIT
R80_SPLIT -->|PROJECT maps to<br/>agronomic_support| R80_AGRO
R80_SPLIT -->|PROJECT maps to<br/>cane_supply| R80_CANE
R80_AGRO --> R80_OUT1
R80_AGRO --> R80_OUT2
R80_CANE --> R80_OUT1
R80_CANE --> R80_OUT2
R80_OUT2 --> R90
R40_OUT --> R90
GJ --> R90
R90 --> R90_OUT
R80_OUT2 --> R91
R40_OUT --> R91
GJ --> R91
R91 --> R91_OUT
R90 -.->|Both execution| SOBIT
R90 -.->|models support| DEVLAP
R91 -.->|all stages| SOBIT
R91 -.->|end-to-end| DEVLAP
SOBIT --> SOBIT_EXEC
DEVLAP --> DEVLAP_EXEC
%% ===== STYLING =====
classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000
classDef python fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
classDef stage_r fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#000
classDef branch fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#000
classDef util fill:#ede7f6,stroke:#512da8,stroke-width:1.5px,color:#000
classDef output fill:#f1f8e9,stroke:#558b2f,stroke-width:2px,color:#000
classDef exec fill:#e0f2f1,stroke:#00695c,stroke-width:2px,color:#000
class INPUTS input
class STAGE00 python
class STAGE10,STAGE20,STAGE30,STAGE40,STAGE80,STAGE90,STAGE91 stage_r
class R80_SPLIT,R80_AGRO,R80_CANE branch
class R20_UTIL,R30_UTIL,R40_UTIL util
class PY_OUT,R10_OUT,R20_OUT1,R20_OUT2,R30_OUT,R40_OUT,R80_OUT1,R80_OUT2,R90_OUT,R91_OUT output
class EXEC,SOBIT,DEVLAP exec
```
---
### Key Diagram Features
**Color Coding**:
- 🔵 Blue = External inputs (API, files)
- 🟠 Orange = Python stage
- 🟢 Green = R processing stages
- 🔴 Red/Pink = Branching logic (client types)
- 🟣 Purple = Utility functions
- 🟢 Light Green = Data outputs
- 🟦 Teal = Execution models
**Critical Paths**:
1. **Unified Stages (00-40)**: All projects run identically
2. **Branching Point (Stage 80)**: `parameters_project.R` determines client type → sources appropriate utilities → renders appropriate report (90 or 91)
3. **Execution Models**: Both SOBIT and Dev Laptop can run all stages; differ in orchestration
**Client Type Routing**:
- PROJECT="angata" → CLIENT_TYPE="cane_supply" → Stage 91 report
- PROJECT="aura"/"chemba" → CLIENT_TYPE="agronomic_support" → Stage 90 report
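The routing above amounts to two lookups: project name to client type, then client type to report stage. The sketch below illustrates that idea in Python; the real mapping lives in `parameters_project.R`, and the dictionary and function names here are hypothetical.

```python
# Hypothetical lookup tables mirroring the routing rules listed above.
CLIENT_TYPE_BY_PROJECT = {
    "angata": "cane_supply",
    "aura": "agronomic_support",
    "chemba": "agronomic_support",
}
REPORT_STAGE_BY_CLIENT_TYPE = {
    "cane_supply": 91,
    "agronomic_support": 90,
}


def resolve_report_stage(project: str) -> tuple:
    """Return (client_type, report_stage) for a project name."""
    key = project.strip().lower()
    if key not in CLIENT_TYPE_BY_PROJECT:
        # Failing loudly is an assumption; the R code may fall back to a default.
        raise KeyError(f"No client type configured for project {project!r}")
    client_type = CLIENT_TYPE_BY_PROJECT[key]
    return client_type, REPORT_STAGE_BY_CLIENT_TYPE[client_type]
```

Keeping the two tables separate means a new project only needs one entry, provided its client type already exists.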
## Data Processing Pipeline
### Stage 1: Satellite Data Acquisition (Python)
- **Script**: `python_app/01_planet_download.py`
- **Inputs**: API credentials, field boundaries (GeoJSON), date range
- **Outputs**: Daily merged GeoTIFFs
- **File Location**: `laravel_app/storage/app/{project}/merged_tif/`
- **File Format**: `YYYY-MM-DD.tif` (4 bands: Red, Green, Blue, NIR, uint16)
- **Processing**:
  - Downloads from Sentinel Hub BYOC collection
  - Applies cloud masking (UDM1 band)
  - Merges tiles into daily mosaics
  - Stores at 3 m resolution
### Stage 2: Canopy Index (CI) Extraction
- **Script**: `r_app/02_ci_extraction.R`
- **Utility Functions**: `ci_extraction_utils.R` (handles tile detection, RDS I/O)
- **Inputs**: Daily GeoTIFFs, field boundaries (GeoJSON)
- **Outputs**:
  - Daily extractions (RDS): `Data/extracted_ci/daily_vals/extracted_{date}_{suffix}.rds`
  - Cumulative dataset (RDS): `Data/extracted_ci/cumulative_vals/combined_CI_data.rds`
- **File Format**:
  - Daily: Per-field statistics (mean CI, count, notNA pixels)
  - Cumulative: Wide format with fields as rows, dates as columns
- **Processing**:
  - Calculates CI = (NIR / Green) - 1
  - Extracts stats per field using field geometry
  - Handles missing pixels (clouds → NA values)
  - Supports both full rasters and tile-based extraction
- **Key Parameters**:
  - CI formula: `(NIR / Green) - 1`
  - Min valid pixels: 100 per field
  - Cloud masking: UDM1 != 0
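As a rough illustration of the extraction step, the sketch below computes CI per pixel and aggregates the per-field statistics named above (mean CI, count, notNA). It is a minimal Python sketch, not the R implementation in `ci_extraction_utils.R`: the function names are hypothetical, pixels are given as plain `(nir, green)` pairs, and returning NaN for fields below the minimum pixel count is an assumption.

```python
import math

MIN_VALID_PIXELS = 100  # "Min valid pixels: 100 per field"


def canopy_index(nir, green):
    """CI = (NIR / Green) - 1; NaN for masked (None) or zero-green pixels."""
    if nir is None or green is None or green == 0:
        return math.nan
    return nir / green - 1.0


def field_stats(pixels):
    """pixels: iterable of (nir, green) pairs; None marks a cloud-masked value."""
    values = [canopy_index(nir, green) for nir, green in pixels]
    valid = [v for v in values if not math.isnan(v)]
    enough = len(valid) >= MIN_VALID_PIXELS
    return {
        # Assumption: a field with too few valid pixels gets no mean at all.
        "mean_ci": sum(valid) / len(valid) if enough else math.nan,
        "count": len(values),
        "notNA": len(valid),
    }
```

For example, a field with 150 clear pixels at NIR 3000 and Green 1000 plus 50 masked pixels yields a mean CI of 2.0 with `count` 200 and `notNA` 150.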
### Stage 3: Growth Model Interpolation
- **Script**: `r_app/03_interpolate_growth_model.R`
- **Utility Functions**: `growth_model_utils.R` (interpolation, seasonal grouping)
- **Inputs**:
  - Combined CI data (RDS from Stage 2)
  - Harvest data with season dates (Excel)
- **Outputs**: Interpolated growth model (RDS)
- **File Location**: `Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds`
- **File Format**: Long-format data frame with columns:
  - `Date`, `DOY` (Day of Year), `field`, `subField`, `value`, `season`
  - `CI_per_day`, `cumulative_CI`, `FitData` (interpolated indicator)
- **Processing**:
  - Filters CI by season dates
  - Linear interpolation across gaps: `approxfun()`
  - Calculates daily changes and cumulative sums
  - Groups by field and season year
- **Key Calculations**:
  - `CI_per_day` = today's CI - yesterday's CI
  - `cumulative_CI` = running (cumulative) sum of daily CI
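The interpolation and the two derived columns can be sketched for a single field and season as follows. This is an illustrative Python version, not the R code built on `approxfun()`: the function names are hypothetical, `FitData` is assumed to be true for interpolated rows, the first day's `CI_per_day` is assumed to be 0, and `cumulative_CI` is assumed to sum the daily `value` column.

```python
from datetime import date, timedelta


def interpolate_daily(observations):
    """observations: {date: CI} with gaps; returns one row per calendar day."""
    days = sorted(observations)
    if not days:
        return []
    rows = []
    for left, right in zip(days, days[1:]):
        span = (right - left).days
        for step in range(span):
            # Straight line between the two neighbouring observations.
            value = observations[left] + (step / span) * (
                observations[right] - observations[left]
            )
            rows.append({"Date": left + timedelta(days=step),
                         "value": value,
                         "FitData": step > 0})
    rows.append({"Date": days[-1], "value": observations[days[-1]], "FitData": False})
    return rows


def add_growth_metrics(rows):
    """Add CI_per_day (day-over-day change) and cumulative_CI (running sum)."""
    cumulative = 0.0
    previous = None
    for row in rows:
        row["CI_per_day"] = 0.0 if previous is None else row["value"] - previous
        cumulative += row["value"]
        row["cumulative_CI"] = cumulative
        previous = row["value"]
    return rows
```

With observations of 1.0 on 1 January and 3.0 on 5 January, the three missing days are filled with 1.5, 2.0, and 2.5, and the cumulative CI on 5 January is 10.0.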
### Stage 4: Weekly Mosaic Creation
- **Script**: `r_app/04_mosaic_creation.R`
- **Utility Functions**: `mosaic_creation_utils.R`, `mosaic_creation_tile_utils.R`
- **Inputs**:
  - Daily VRTs or GeoTIFFs from Stage 1
  - Field boundaries
- **Outputs**: Weekly composite mosaic (GeoTIFF)
- **File Location**: `weekly_mosaic/week_{WW}_{YYYY}.tif`
- **File Format**: 5-band GeoTIFF (R, G, B, NIR, CI), same spatial extent as daily images
- **Processing**:
  - Assesses cloud coverage per daily image
  - Selects images with acceptable cloud coverage (<45%)
  - Composites using MAX function (retains highest CI value)
  - Outputs single weekly composite
- **Key Parameters**:
  - Cloud threshold (strict): <5% missing pixels
  - Cloud threshold (relaxed): <45% missing pixels
  - Composite function: MAX across selected images
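The selection and compositing logic can be sketched on flat lists of CI values, with NaN marking missing pixels. This is a simplified Python illustration with hypothetical function names; the "try strict first, then fall back to relaxed" order is an assumption about how the two thresholds interact, and the real stage works on 5-band rasters rather than single lists.

```python
import math

STRICT_MAX_MISSING = 0.05   # "<5% missing pixels"
RELAXED_MAX_MISSING = 0.45  # "<45% missing pixels"


def missing_fraction(image):
    """Share of NaN pixels in one daily image (a flat list of CI values)."""
    return sum(1 for v in image if math.isnan(v)) / len(image)


def select_images(images):
    """Prefer images under the strict threshold; otherwise use the relaxed one."""
    strict = [img for img in images if missing_fraction(img) < STRICT_MAX_MISSING]
    if strict:
        return strict
    return [img for img in images if missing_fraction(img) < RELAXED_MAX_MISSING]


def max_composite(images):
    """Per-pixel maximum across images, ignoring NaN; NaN if no image has data."""
    composite = []
    for pixel_values in zip(*images):
        valid = [v for v in pixel_values if not math.isnan(v)]
        composite.append(max(valid) if valid else math.nan)
    return composite
```

Because the maximum ignores NaN, a pixel that is cloudy on one day is filled from any other selected day that observed it.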
### Stage 5: Field Analysis & KPI Calculation
- **Script**: `r_app/09_field_analysis_weekly.R` or `09b_field_analysis_weekly.R` (parallel version)
- **Utility Functions**: `field_analysis_utils.R`, tile extraction functions
- **Inputs**:
  - Current week mosaic (GeoTIFF)
  - Previous week mosaic (GeoTIFF)
  - Interpolated growth model (RDS)
  - Field boundaries (GeoJSON)
  - Harvest data (Excel)
- **Outputs**:
  - Excel file: `reports/{project}_field_analysis_week{WW}.xlsx`
  - RDS summary data: `reports/kpis/{project}_kpi_summary_tables_week{WW}.rds`
- **File Format (Excel)**:
  - Sheet 1: Field-level data with CI metrics, phase, status triggers
  - Sheet 2: Summary statistics (monitored area, cloud coverage, phase distribution)
- **Processing** (per field):
  - Extracts CI from current and previous week mosaics
  - Calculates field-level statistics: mean, std dev, CV (coefficient of variation)
  - Assigns growth phase based on field age (Germination, Tillering, Grand Growth, Maturation)
  - Detects status triggers (rapid growth, disease signals, weed pressure, harvest imminence)
  - Assesses cloud coverage per field
  - Parallel processing using `furrr` for 1000+ fields
- **Key Calculations**:
  - **Uniformity (CV)**: std_dev / mean, thresholds: <0.15 excellent, <0.25 good
  - **Change**: (current_mean - previous_mean) / previous_mean
  - **Phase age**: weeks since planting (from harvest.xlsx season_start)
  - **Cloud coverage %**: (non-NA pixels / total pixels in field) * 100. Despite the name, this is the share of valid (cloud-free) pixels, so higher values mean a clearer view
- **Status Triggers** (non-exclusive):
  - Germination Started: 10% of field CI > 2.0
  - Rapid Growth: CI increase > 0.5 units week-over-week
  - Slow Growth: CI increase < 0.1 units week-over-week
  - Non-Uniform Growth: CV > 0.25 (field heterogeneity)
  - Weed Pressure: Rapid increase (>2.0 CI/week) with moderate area (<25%)
  - Harvest Imminence: Age > 240 days + CI plateau detected
### Stage 6: Report Generation
- **Script**: `r_app/10_CI_report_with_kpis_simple.Rmd` (RMarkdown)
- **Utility Functions**: `report_utils.R` (doc building, table formatting)
- **Inputs**:
  - Weekly mosaics (GeoTIFFs)
  - KPI data and field analysis (RDS)
  - Field boundaries, project config
- **Outputs**:
  - **Word document** (PRIMARY OUTPUT): `reports/SmartCane_Report_week{WW}_{YYYY}.docx`
  - **HTML version** (optional): `reports/SmartCane_Report_week{WW}_{YYYY}.html`
- **Report Contents**:
  - Executive summary (KPI overview, monitored area, cloud coverage)
  - Phase distribution tables and visualizations
  - Status trigger summary (fields with active triggers)
  - Field-by-field detail pages with CI metrics
  - Interpretation guides for agronomic thresholds
- **Report Generation Technology**:
  - RMarkdown (`.Rmd`) rendered to Word via `officer` and `flextable` packages
  - Tables with automatic width/height fitting
  - Column interpretations embedded in reports
  - Areas reported in both hectares and acres
---
## File Storage Structure
All data persists to the file system. No database writes occur during analysis—only reads for metadata.
```
laravel_app/storage/app/{project}/
├── Data/
│   ├── pivot.geojson                 # Field boundaries (read-only)
│   ├── pivot_2.geojson               # ESA variant with extra fields
│   ├── harvest.xlsx                  # Season dates & yield data (read-only)
│   ├── vrt/                          # Virtual raster files (daily VRTs)
│   │   └── YYYY-MM-DD.vrt
│   ├── extracted_ci/
│   │   ├── daily_vals/
│   │   │   └── extracted_YYYY-MM-DD_{suffix}.rds               # Daily field stats
│   │   └── cumulative_vals/
│   │       ├── combined_CI_data.rds                            # Cumulative CI (wide)
│   │       └── All_pivots_Cumulative_CI_quadrant_year_v2.rds   # Interpolated
│   └── daily_tiles_split/            # (Optional) Per-field tile processing
│       └── per_field/
│           └── YYYY-MM-DD/           # Date-specific folders
│               └── {FIELD}_{YYYY-MM-DD}.tif   # Per-field daily
├── merged_tif/                       # Raw daily satellite images (Stage 1 output)
│   └── YYYY-MM-DD.tif                # 4 bands: R, G, B, NIR
├── merged_final_tif/                 # (Optional) Processed daily images
│   └── YYYY-MM-DD.tif                # 5 bands: R, G, B, NIR, CI
├── weekly_mosaic/                    # Weekly composite mosaics (Stage 4 output)
│   └── week_WW_YYYY.tif              # 5 bands, ISO week numbering
└── reports/
    ├── SmartCane_Report_week{WW}_{YYYY}.docx         # PRIMARY OUTPUT (Stage 6)
    ├── SmartCane_Report_week{WW}_{YYYY}.html         # Alternative format
    ├── {project}_field_analysis_week{WW}.xlsx        # PRIMARY OUTPUT (Stage 5)
    ├── {project}_harvest_predictions_week{WW}.xlsx   # Harvest tracking
    ├── {project}_cloud_coverage_week{WW}.rds         # Per-field cloud stats
    ├── {project}_kpi_summary_tables_week{WW}.rds     # Summary data (consumed by reports)
    └── kpis/
        └── week_WW_YYYY/
            └── *.csv                 # Individual KPI exports
```
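Because every stage finds its inputs by naming convention, building those paths consistently matters. The sketch below shows one way to derive the daily and weekly file names in Python; the helper names are hypothetical, and zero-padding the week number to two digits is an assumption based on the `WW` placeholder.

```python
from datetime import date
from pathlib import PurePosixPath

STORAGE_ROOT = PurePosixPath("laravel_app/storage/app")


def merged_tif_path(project, day):
    """Daily merged image: merged_tif/YYYY-MM-DD.tif"""
    return STORAGE_ROOT / project / "merged_tif" / f"{day.isoformat()}.tif"


def weekly_mosaic_path(project, day):
    """Weekly composite: weekly_mosaic/week_WW_YYYY.tif (ISO week numbering)."""
    iso_year, iso_week, _ = day.isocalendar()
    # Use the ISO year, not day.year: late-December dates can belong to
    # week 1 of the following ISO year.
    name = f"week_{iso_week:02d}_{iso_year}.tif"
    return STORAGE_ROOT / project / "weekly_mosaic" / name
```

For instance, 30 December 2024 falls in ISO week 1 of 2025, so its mosaic would be named `week_01_2025.tif` under this scheme.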
### Data Types by File
| File Extension | Purpose | Stage | Example Files |
|---|---|---|---|
| `.tif` | Geospatial raster imagery | 1, 4 | `YYYY-MM-DD.tif`, `week_41_2025.tif` |
| `.vrt` | Virtual raster (pointer to TIFFs) | 2 | `YYYY-MM-DD.vrt` |
| `.rds` | R serialized data (binary format) | 2, 3, 5, 6 | `combined_CI_data.rds`, `kpi_summary_tables_week41.rds` |
| `.geojson` | Field boundaries (read-only) | Input | `pivot.geojson` |
| `.xlsx` | Excel reports & harvest data | 5, 6 (output), Input (harvest) | `field_analysis_week41.xlsx` |
| `.docx` | Word reports (final output) | 6 | `SmartCane_Report_week41_2025.docx` |
| `.html` | HTML reports (alternative) | 6 | `SmartCane_Report_week41_2025.html` |
| `.csv` | Summary tables (for external use) | 5, 6 | `field_details.csv`, `kpi_summary.csv` |
---
## Script Dependency Map
```
01_create_master_grid_and_split_tiffs.R (Optional)
  └→ [Utility] parameters_project.R

02_ci_extraction.R
  ├→ [Utility] parameters_project.R
  └→ [Utility] ci_extraction_utils.R
       └ Functions: find_satellite_images(), process_satellite_images(),
                    process_ci_values(), process_ci_values_from_tiles()

03_interpolate_growth_model.R
  ├→ [Utility] parameters_project.R
  └→ [Utility] growth_model_utils.R
       └ Functions: load_combined_ci_data(), generate_interpolated_ci_data(),
                    calculate_growth_metrics(), save_growth_model()

04_mosaic_creation.R
  ├→ [Utility] parameters_project.R
  └→ [Utility] mosaic_creation_utils.R
       └ Functions: create_weekly_mosaic_from_tiles(), save_mosaic(),
                    assess_cloud_coverage()

09_field_analysis_weekly.R (or 09b_field_analysis_weekly.R - parallel version)
  ├→ [Utility] parameters_project.R
  ├→ [Utility] field_analysis_utils.R
  └→ Outputs: Excel files, RDS summary files
       └ Functions: load_ci_data(), analyze_field_stats(),
                    assign_growth_phase(), detect_triggers(),
                    export_to_excel()

10_CI_report_with_kpis_simple.Rmd (RMarkdown → rendered to .docx/.html)
  ├→ [Utility] parameters_project.R
  ├→ [Utility] report_utils.R
  └→ [Packages] officer, flextable
       └ Functions: body_add_flextable(), add_paragraph(),
                    officer::read_docx(), save_docx()
```
### Utility Files Description
- **`parameters_project.R`**: Loads project configuration (paths, field boundaries, harvest data, project metadata)
- **`ci_extraction_utils.R`**: CI calculation, field masking, RDS I/O for daily & cumulative CI data
- **`growth_model_utils.R`**: Linear interpolation, seasonal grouping, daily metrics calculation
- **`mosaic_creation_utils.R`**: Weekly mosaic compositing, cloud assessment, raster masking
- **`field_analysis_utils.R`**: Per-field statistics, phase assignment, trigger detection, Excel export
- **`report_utils.R`**: RMarkdown helpers, table formatting, Word document building via `officer` package
---
## Data Type Reference
### RDS (R Data Serialization)
RDS files store R data objects in binary format. They preserve data types, dimensions, and structure perfectly. Key RDS files in the pipeline:
| File | Structure | Rows | Columns | Use |
|---|---|---|---|---|
| `combined_CI_data.rds` | Data frame (wide format) | # fields | # dates | All-time CI by field |
| `All_pivots_Cumulative_CI_quadrant_year_v2.rds` | Data frame (long format) | ~1M+ rows | 11 columns | Interpolated daily CI, used for yield models |
| `kpi_summary_tables_week{WW}.rds` | List of data frames | — | varies | Field KPIs, phase dist., triggers |
| `cloud_coverage_week{WW}.rds` | Data frame | # fields | 4 columns | Per-field cloud %, category |
### Excel (.xlsx)
Primary output format for stakeholder consumption:
| Sheet | Content | Rows | Columns | Key Data |
|---|---|---|---|---|
| Field Data | Field-by-field analysis | # fields | ~15 | CI mean/std, phase, status, cloud% |
| Summary | Farm-wide statistics | 10-20 | 3 | Monitored area (ha/acres), cloud dist., phases |
### Word (.docx)
Executive report format via RMarkdown → `officer`:
- Title page with metadata (project, week, date, total fields, acreage)
- Executive summary with KPIs
- Phase analysis section with distribution tables
- Status trigger summary
- Field-by-field detail pages
- Interpretation guides
---
## Key Calculations & Thresholds
### Canopy Index (CI)
```
CI = (NIR / Green) - 1
Range: -1 to +∞
Interpretation:
CI < 0 → Non-vegetated (water, bare soil)
0 < CI < 1 → Sparse vegetation (early growth)
1 < CI < 2 → Moderate vegetation
CI > 2 → Dense vegetation (mature crop)
```
### Growth Phase Assignment (Age-Based)
Based on weeks since planting (`season_start` from harvest.xlsx):
| Phase | Age Range | Characteristics |
|---|---|---|
| Germination | 0-6 weeks | Variable emergence, low CI |
| Tillering | 6-18 weeks | Shoot development, increasing CI |
| Grand Growth | 18-35 weeks | Peak growth, high CI accumulation |
| Maturation | 35+ weeks | Sugar accumulation, plateau or decline |
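The age-based lookup in the table can be sketched as a simple threshold scan. This Python version is illustrative only: the function name mirrors `assign_growth_phase()` from the dependency map but is not the R implementation, and treating each boundary week as the start of the later phase (for example, exactly 6 weeks counts as Tillering) is an assumption, since the table's ranges overlap at their edges.

```python
from datetime import date

# (lower bound in weeks, phase), checked from oldest phase to youngest.
PHASES = [
    (35, "Maturation"),
    (18, "Grand Growth"),
    (6, "Tillering"),
    (0, "Germination"),
]


def assign_growth_phase(season_start, analysis_date):
    """Return the phase for a field, or None if the season has not started."""
    age_weeks = (analysis_date - season_start).days / 7
    if age_weeks < 0:
        return None
    for lower_bound, phase in PHASES:
        if age_weeks >= lower_bound:
            return phase
    return None
```

A field planted on 1 January 2025 would be in Germination on 29 January (4 weeks) and in Maturation on 1 October (39 weeks).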
### Field Uniformity (Coefficient of Variation)
```
CV = std_dev / mean
Interpretation:
CV < 0.15 → Excellent uniformity
CV < 0.25 → Good uniformity
CV < 0.35 → Moderate uniformity
CV ≥ 0.35 → Poor uniformity (management attention needed)
```
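A minimal Python sketch of the uniformity calculation and its categories follows. The function names are hypothetical, and using the sample standard deviation (which is what R's `sd()` computes) is an assumption about the pipeline.

```python
import math
import statistics


def coefficient_of_variation(ci_values):
    """CV = std_dev / mean over a field's CI pixels (needs at least 2 values)."""
    mean = statistics.fmean(ci_values)
    if mean == 0:
        return math.nan  # CV is undefined when the mean is zero
    return statistics.stdev(ci_values) / mean


def classify_uniformity(cv):
    """Map a CV value to the categories listed above."""
    if cv < 0.15:
        return "Excellent"
    if cv < 0.25:
        return "Good"
    if cv < 0.35:
        return "Moderate"
    return "Poor"
```

Note that CV becomes unstable for fields with a mean CI near zero, such as recently harvested fields, so the category is most meaningful once the canopy has developed.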
### Cloud Coverage Classification (Per-Field)
```
cloud_pct = (non_NA_pixels / total_pixels) * 100   # share of valid (cloud-free) pixels; higher = clearer
Categories:
≥99.5% → Clear view (usable for analysis)
>0% and <99.5% → Partial coverage (biased estimates)
0% → No image available (excluded from analysis)
```
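The classification can be sketched in a few lines of Python. The function names are hypothetical; the percentage follows the formula above, so it counts pixels that are not NA, and a field with no pixels at all is assumed to fall into the "no image" category.

```python
def valid_pixel_pct(non_na_pixels, total_pixels):
    """Percentage of pixels in the field that are not NA (not cloud-masked)."""
    if total_pixels == 0:
        return 0.0  # assumption: an empty field counts as "no image"
    return 100 * non_na_pixels / total_pixels


def classify_coverage(pct):
    """Map the percentage to the three categories listed above."""
    if pct >= 99.5:
        return "Clear view"
    if pct > 0:
        return "Partial coverage"
    return "No image available"
```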
### Status Triggers (Non-Exclusive)
Fields can have multiple simultaneous triggers:
| Trigger | Detection Method | Data Used |
|---|---|---|
| **Germination Started** | 10% of field CI > 2.0 | Current week CI extraction |
| **Rapid Growth** | Week-over-week increase > 0.5 CI units | Mosaic-based extraction |
| **Slow Growth** | Week-over-week increase < 0.1 CI units | Mosaic-based extraction |
| **Non-Uniform** | CV > 0.25 | Spatial stats per field |
| **Weed Pressure** | Rapid increase (>2.0 CI/week) + area <25% | Spatial clustering analysis |
| **Harvest Imminence** | Age > 240 days + CI plateau | Temporal analysis, phase assignment |
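Since the triggers are non-exclusive, each one is an independent check that may add a label to the field. The Python sketch below covers only the triggers that can be computed from field-level summaries; Germination Started and Weed Pressure need pixel-level data and are omitted. The function signature is hypothetical, and the plateau test is passed in as a boolean because its detection method is not specified here.

```python
def detect_triggers(current_mean, previous_mean, cv, age_days, ci_plateau):
    """Return the list of active triggers for one field (may be empty)."""
    triggers = []
    delta = current_mean - previous_mean  # week-over-week change in CI units
    if delta > 0.5:
        triggers.append("Rapid Growth")
    if delta < 0.1:
        # Read literally, this also fires when CI decreases.
        triggers.append("Slow Growth")
    if cv > 0.25:
        triggers.append("Non-Uniform")
    if age_days > 240 and ci_plateau:
        triggers.append("Harvest Imminence")
    return triggers
```

A field can therefore carry both a growth trigger and a uniformity trigger in the same week, which is why the reports list triggers per field rather than assigning a single status.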
---
## Processing Configuration & Parameters
Most parameters are configurable via command-line arguments or environment variables; items marked as hardcoded below can only be changed in the code:
### Download Stage (Python)
- `DATE`: End date for download (YYYY-MM-DD), default: today
- `DAYS`: Days lookback, default: 7
- `resolution`: Output resolution in meters, default: 3
- `max_threads`: Concurrent download threads, default: 15
- Grid split: `(5, 5)` bounding boxes (hardcoded)
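One way to resolve `DATE` and `DAYS` with the defaults listed above is sketched below. This is not the downloader's actual argument handling: the `--date` and `--days` flag names, the precedence of flags over environment variables, and the choice of `end - DAYS` as the window start are all assumptions.

```python
import argparse
import os
from datetime import date, timedelta


def parse_download_args(argv=None, env=None):
    """Return (start_date, end_date) from CLI flags, env vars, or defaults."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Download window (sketch)")
    # Environment variables supply the defaults; explicit flags override them.
    parser.add_argument("--date", default=env.get("DATE", date.today().isoformat()))
    parser.add_argument("--days", type=int, default=int(env.get("DAYS", 7)))
    args = parser.parse_args(argv)
    end_date = date.fromisoformat(args.date)
    start_date = end_date - timedelta(days=args.days)
    return start_date, end_date
```

Calling `parse_download_args(["--date", "2025-10-08", "--days", "7"], env={})` returns the window from 1 October 2025 to 8 October 2025.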
### CI Extraction Stage (R)
- `end_date`: End date (YYYY-MM-DD)
- `offset`: Days lookback (default: 7)
- `project_dir`: Project directory name (required)
- `data_source`: Source folder (merged_tif or merged_final_tif)
- Auto-detection: If `daily_tiles_split/` exists, uses tile-based processing
### Mosaic Creation Stage (R)
- `end_date`: End date
- `offset`: Days lookback
- `project_dir`: Project directory
- `file_name`: Custom output filename (optional)
- Cloud thresholds: 5% (strict), 45% (relaxed) - hardcoded
### Field Analysis Stage (R)
- `end_date`: End date
- `project_dir`: Project directory
- Parallel workers: Auto-detected via `future::plan()` or user-configurable
- Thresholds: CV, change, weed detection - configurable in code
---
## Database Usage
The system does NOT write to the database during analysis. Database tables (`project_reports`, `project_mosaics`, `project_mailings`) are maintained by the Laravel application for:
- Report metadata tracking
- Email delivery history
- Report version control
File system is the single source of truth for all analysis data.
```mermaid
%% KPI calculation, reporting, and orchestration flows
graph TD
FieldBoundaries --> KPIScript
HarvestData --> KPIScript
InterpolatedModel --> KPIScript
KPIScript --> KPI1
KPIScript --> KPI2
KPIScript --> KPI3
KPIScript --> KPI4
KPIScript --> KPI5
KPIScript --> KPI6
KPI1 & KPI2 & KPI3 & KPI4 & KPI5 & KPI6 --> KPIParams
KPIParams --> KPIResults
WeeklyMosaic --> ReportScript
KPIResults --> ReportScript
FieldBoundaries --> ReportScript
ReportScript --> Visualizations
Visualizations --> FinalReport
FinalReport --> EmailDelivery
FinalReport --> WebDashboard
Laravel --> ShellScripts
ShellScripts -.->|Triggers| Download
ShellScripts -.->|Triggers| CIScript
ShellScripts -.->|Triggers| GrowthScript
ShellScripts -.->|Triggers| MosaicScript
ShellScripts -.->|Triggers| KPIScript
ShellScripts -.->|Triggers| ReportScript
%% ===== STYLING =====
style InterpolatedModel fill:#ffccbc,stroke:#333,stroke-width:1px
style WeeklyMosaic fill:#ffccbc,stroke:#333,stroke-width:1px
style KPIResults fill:#ffccbc,stroke:#333,stroke-width:1px
style FinalReport fill:#ffccbc,stroke:#333,stroke-width:1px
```
### Overall System Architecture
This diagram provides a high-level overview of the complete SmartCane system, showing how major components interact. It focuses on the system boundaries and main data flows between the Python API Downloader, R Processing Engine, Laravel Web App, and data storage components. This view helps understand how the system works as a whole.
```mermaid
%% Overall System Architecture
graph TD
A["fa:fa-satellite External Satellite Data Providers API"] --> PyDL["fa:fa-download Python API Downloader"];
C["fa:fa-users Users: Farm Data Input e.g., GeoJSON, Excel"] --> D{"fa:fa-laptop-code Laravel Web App"};
subgraph SmartCane System
PyDL --> G["fa:fa-folder-open File System: Raw Satellite Imagery, Rasters, RDS, Reports, Boundaries"];
E["fa:fa-cogs R Processing Engine"] -- Reads --> G;
E -- Writes --> G;
D -- Manages/Triggers --> F["fa:fa-terminal Shell Script Orchestration"];
F -- Executes --> PyDL;
F -- Executes --> E;
D -- Manages/Accesses --> G;
D -- Reads/Writes --> H["fa:fa-database Database: Project Metadata, Users, Schedules"];
E -- Generates --> I["fa:fa-file-alt Agronomic Reports: DOCX, HTML"];
D -- Accesses/Delivers --> I;
end
D --> J["fa:fa-desktop Users: Web Interface (future)"];
I -- Via Email (SMTP) --> K["fa:fa-envelope Users: Email Reports"];
style E fill:#f9f,stroke:#333,stroke-width:2px
style D fill:#bbf,stroke:#333,stroke-width:2px
style PyDL fill:#ffdd57,stroke:#333,stroke-width:2px
```
### R Processing Engine Detail
This diagram zooms in on the R Processing Engine subsystem, detailing the internal components and data flow. It shows how raw satellite imagery and field data progress through various R scripts to produce crop indices and reports. The diagram highlights the data transformation pipeline within this analytical core of the SmartCane system.
```mermaid
%% R Processing Engine Detail
graph TD
subgraph R Processing Engine
direction TB
subgraph Inputs
SatelliteImages["fa:fa-image Raw Satellite Imagery"]
FieldBoundaries["fa:fa-map-marker-alt Field Boundaries .geojson"]
HarvestData["fa:fa-file-excel Harvest Data .xlsx"]
ProjectParams["fa:fa-file-code Project Parameters .R"]
end
subgraph Core R Scripts & Processes
ParamConfig("fa:fa-cogs parameters_project.R")
MosaicScript("fa:fa-images mosaic_creation.R")
CIExtractionScript("fa:fa-microscope ci_extraction.R")
ReportUtils("fa:fa-tools executive_report_utils.R")
DashboardRmd("fa:fa-tachometer-alt CI_report_dashboard_planet_enhanced.Rmd")
SummaryRmd("fa:fa-list-alt CI_report_executive_summary.Rmd")
end
subgraph Outputs
WeeklyMosaics["fa:fa-file-image Weekly Mosaics .tif"]
CIDataRDS["fa:fa-database CI Data .rds"]
CIRasters["fa:fa-layer-group CI Rasters .tif"]
DashboardReport["fa:fa-chart-bar Dashboard Report .docx/.html"]
SummaryReport["fa:fa-file-invoice Executive Summary .docx/.html"]
end
%% Data Flow
ProjectParams --> ParamConfig;
SatelliteImages --> MosaicScript;
FieldBoundaries --> MosaicScript;
ParamConfig --> MosaicScript;
MosaicScript --> WeeklyMosaics;
WeeklyMosaics --> CIExtractionScript;
FieldBoundaries --> CIExtractionScript;
ParamConfig --> CIExtractionScript;
CIExtractionScript --> CIDataRDS;
CIExtractionScript --> CIRasters;
CIDataRDS --> ReportUtils;
CIRasters --> ReportUtils;
HarvestData --> ReportUtils;
ParamConfig --> ReportUtils;
ReportUtils --> DashboardRmd;
ReportUtils --> SummaryRmd;
ParamConfig --> DashboardRmd;
ParamConfig --> SummaryRmd;
DashboardRmd --> DashboardReport;
SummaryRmd --> SummaryReport;
end
ShellOrchestration["fa:fa-terminal Shell Scripts e.g., build_mosaic.sh, build_report.sh"] -->|Triggers| R_Processing_Engine["fa:fa-cogs R Processing Engine"]
style R_Processing_Engine fill:#f9f,stroke:#333,stroke-width:2px
style Inputs fill:#ccf,stroke:#333,stroke-width:1px
style Outputs fill:#cfc,stroke:#333,stroke-width:1px
style Core_R_Scripts_Processes fill:#ffc,stroke:#333,stroke-width:1px
```
### Python API Downloader Detail
This diagram focuses on the Python API Downloader subsystem, showing its internal components and workflow. It illustrates how API credentials, field boundaries, and other inputs are processed through various Python functions to download, process, and prepare satellite imagery. This view reveals the technical implementation details of the data acquisition layer.
```mermaid
%% Python API Downloader Detail
graph TD
subgraph Python API Downloader
direction TB
subgraph Inputs_Py [Inputs]
APICreds["fa:fa-key API Credentials (SH_CLIENT_ID, SH_CLIENT_SECRET)"]
DateRangeParams["fa:fa-calendar-alt Date Range Parameters (days_needed, specific_date)"]
GeoJSONInput["fa:fa-map-marker-alt Field Boundaries (pivot.geojson)"]
ProjectConfig["fa:fa-cogs Project Configuration (project_name, paths)"]
EvalScripts["fa:fa-file-code Evalscripts (JS for cloud masking & band selection)"]
end
subgraph Core_Python_Logic_Py [Core Python Logic & Libraries]
SetupConfig["fa:fa-cog SentinelHubConfig & BYOC Definition"]
DateSlotGen["fa:fa-calendar-check Date Slot Generation (slots)"]
GeoProcessing["fa:fa-map GeoJSON Parsing & BBox Splitting (geopandas, BBoxSplitter)"]
AvailabilityCheck["fa:fa-search-location Image Availability Check (SentinelHubCatalog)"]
RequestHandler["fa:fa-paper-plane Request Generation (SentinelHubRequest, get_true_color_request_day)"]
DownloadClient["fa:fa-cloud-download-alt Image Download (SentinelHubDownloadClient, download_function)"]
MergeUtility["fa:fa-object-group Tile Merging (gdal.BuildVRT, gdal.Translate, merge_files)"]
CleanupUtility["fa:fa-trash-alt Intermediate File Cleanup (empty_folders)"]
end
subgraph Outputs_Py [Outputs]
RawSatImages["fa:fa-file-image Raw Downloaded Satellite Imagery Tiles (response.tiff in dated subfolders)"]
MergedTifs["fa:fa-images Merged TIFs (merged_tif/{slot}.tif)"]
VirtualRasters["fa:fa-layer-group Virtual Rasters (merged_virtual/merged{slot}.vrt)"]
DownloadLogs["fa:fa-file-alt Console Output Logs (print statements)"]
end
ExternalSatAPI["fa:fa-satellite External Satellite Data Providers API (Planet via Sentinel Hub)"]
%% Data Flow for Python Downloader
APICreds --> SetupConfig;
DateRangeParams --> DateSlotGen;
GeoJSONInput --> GeoProcessing;
ProjectConfig --> SetupConfig;
ProjectConfig --> GeoProcessing;
ProjectConfig --> MergeUtility;
ProjectConfig --> CleanupUtility;
EvalScripts --> RequestHandler;
DateSlotGen -- Available Slots --> AvailabilityCheck;
GeoProcessing -- BBox List --> AvailabilityCheck;
SetupConfig --> AvailabilityCheck;
AvailabilityCheck -- Filtered Slots & BBoxes --> RequestHandler;
RequestHandler -- Download Requests --> DownloadClient;
SetupConfig --> DownloadClient;
DownloadClient -- Downloads Data From --> ExternalSatAPI;
ExternalSatAPI -- Returns Image Data --> DownloadClient;
DownloadClient -- Writes --> RawSatImages;
DownloadClient -- Generates --> DownloadLogs;
RawSatImages --> MergeUtility;
MergeUtility -- Writes --> MergedTifs;
MergeUtility -- Writes --> VirtualRasters;
end
ShellOrchestratorPy["fa:fa-terminal Shell Scripts (e.g., runpython.sh triggering planet_download.ipynb)"] -->|Triggers| Python_API_Downloader["fa:fa-download Python API Downloader"];
style Python_API_Downloader fill:#ffdd57,stroke:#333,stroke-width:2px
style Inputs_Py fill:#cdeeff,stroke:#333,stroke-width:1px
style Outputs_Py fill:#d4efdf,stroke:#333,stroke-width:1px
style Core_Python_Logic_Py fill:#fff5cc,stroke:#333,stroke-width:1px
style ExternalSatAPI fill:#f5b7b1,stroke:#333,stroke-width:2px
```
### SmartCane Engine Integration Diagram
This diagram illustrates the integration of Python and R components within the SmartCane Engine. Unlike the first diagram that shows the overall system, this one specifically focuses on how the two processing components interact with each other and the rest of the system. It emphasizes the orchestration layer and data flows between the core processing components and external systems.
```mermaid
%% SmartCane Engine Integration
graph TD
%% External Systems & Users
Users_DataInput["fa:fa-user Users: Farm Data Input (GeoJSON, Excel, etc.)"] --> Laravel_WebApp;
ExternalSatAPI["fa:fa-satellite External Satellite Data Providers API"];
%% Main Application Components
Laravel_WebApp["fa:fa-globe Laravel Web App (Frontend & Control Plane)"];
Shell_Orchestration["fa:fa-terminal Shell Script Orchestration (e.g., runcane.sh, runpython.sh, build_mosaic.sh)"];
subgraph SmartCane_Engine ["SmartCane Engine (Data Processing Core)"]
direction TB
Python_Downloader["fa:fa-download Python API Downloader"];
R_Engine["fa:fa-chart-line R Processing Engine"];
end
%% Data Storage
FileSystem["fa:fa-folder File System (Raw Imagery, Rasters, RDS, Reports, Boundaries)"];
Database["fa:fa-database Database (Project Metadata, Users, Schedules)"];
%% User Outputs
Users_WebView["fa:fa-desktop Users: Web Interface (future)"];
Users_EmailReports["fa:fa-envelope Users: Email Reports (Agronomic Reports)"];
AgronomicReports["fa:fa-file-alt Agronomic Reports (DOCX, HTML)"];
%% --- Data Flows & Interactions ---
%% Laravel to Orchestration & Engine
Laravel_WebApp -- Manages/Triggers --> Shell_Orchestration;
Shell_Orchestration -- Executes --> Python_Downloader;
Shell_Orchestration -- Executes --> R_Engine;
%% Python Downloader within Engine
ExternalSatAPI -- Satellite Data --> Python_Downloader;
Python_Downloader -- Writes Raw Data --> FileSystem;
%% Inputs to Python (simplified for this view - details in Python-specific diagram)
%% Laravel_WebApp -- Provides Config/Boundaries --> Python_Downloader;
%% R Engine within Engine
%% Inputs to R (simplified - details in R-specific diagram)
%% Laravel_WebApp -- Provides Config/Boundaries --> R_Engine;
R_Engine -- Reads Processed Data/Imagery --> FileSystem;
R_Engine -- Writes Derived Products --> FileSystem;
R_Engine -- Generates --> AgronomicReports;
%% Laravel interaction with Data Storage
Laravel_WebApp -- Manages/Accesses --> FileSystem;
Laravel_WebApp -- Reads/Writes --> Database;
%% Output Delivery
Laravel_WebApp --> Users_WebView;
AgronomicReports --> Users_EmailReports;
%% Assuming a mechanism like SMTP, potentially triggered by Laravel or R-Engine completion
Laravel_WebApp -- Delivers/Displays --> AgronomicReports;
%% Styling
style SmartCane_Engine fill:#e6ffe6,stroke:#333,stroke-width:2px
style Python_Downloader fill:#ffdd57,stroke:#333,stroke-width:2px
style R_Engine fill:#f9f,stroke:#333,stroke-width:2px
style Laravel_WebApp fill:#bbf,stroke:#333,stroke-width:2px
style Shell_Orchestration fill:#f0ad4e,stroke:#333,stroke-width:2px
style FileSystem fill:#d1e0e0,stroke:#333,stroke-width:1px
style Database fill:#d1e0e0,stroke:#333,stroke-width:1px
style ExternalSatAPI fill:#f5b7b1,stroke:#333,stroke-width:2px
style AgronomicReports fill:#d4efdf,stroke:#333,stroke-width:1px
```
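The orchestration layer's role in the diagram above (run the downloader, then the R engine, in a fixed order) can be illustrated with a small sequential runner. This is a sketch under stated assumptions: the real system uses shell scripts, whereas here each stage is a plain Python callable, and the stage names and the `run_pipeline` helper are hypothetical. In practice each callable would likely wrap a call to a shell script or `Rscript`.

```python
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run stages in order; stop at the first failure and return the completed stage names."""
    completed = []
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:
            # Later stages depend on this stage's files, so do not continue
            print(f"Stage '{name}' failed: {exc}")
            break
        completed.append(name)
    return completed


# Stand-in stages (hypothetical); the second one simulates a failure
def download():
    print("downloading imagery")


def process():
    raise RuntimeError("missing input raster")


def report():
    print("rendering report")


done = run_pipeline([
    ("python_download", download),
    ("r_processing", process),
    ("report", report),
])
print(done)  # ['python_download']
```

Stopping at the first failed stage reflects the file-based design: a downstream stage has nothing valid to read if the upstream stage did not finish writing its outputs.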
## Future Directions
Several enhancements and new capabilities are planned for the SmartCane platform:
- **Advanced Management Dashboard**: Development of a more comprehensive and interactive management dashboard to provide users with deeper insights and greater control over their operations.
- **Enhanced Yield Prediction Models**: Improving the accuracy and granularity of yield predictions by incorporating more variables and advanced machine learning techniques.
- **Integrated Weather and Irrigation Advice**: Leveraging weather forecast data and soil moisture information (potentially from new data sources) to provide precise irrigation scheduling and weather-related agronomic advice.
- **AI-Guided Agronomic Advice**: Implementing sophisticated AI algorithms to analyze integrated data (satellite, weather, soil, farm practices) and offer tailored, actionable agronomic recommendations.
- **Automated Advice Generation**: Developing capabilities for the system to automatically generate and disseminate critical advice and alerts to users based on real-time data analysis.
- **Expanded Data Source Integration**:
    - **Radar Data**: Incorporating radar satellite imagery (e.g., Sentinel-1) for all-weather monitoring capabilities, particularly useful during cloudy seasons for assessing crop structure, soil moisture, and biomass.
    - **IoT and Ground Sensors**: Integrating data from in-field IoT devices and soil sensors for highly localized and continuous monitoring of environmental and soil conditions.
- **Client-Facing Portal**: Exploration and potential development of a client-facing portal to allow end-users direct access to their data, dashboards, and reports, complementing the current internal management interface.
These developments aim to make SmartCane a more capable decision support system that supports sustainable and efficient agricultural practices.
## Conclusion and Integration Summary
The SmartCane architecture combines several technologies and subsystems to address agricultural monitoring and advisory needs. The key subsystems work together as follows:
### Subsystem Integration
1. **Data Flow Sequence**
    - The Laravel Web App initiates the workflow and manages user interactions
    - Shell scripts orchestrate the execution sequence of the processing subsystems
    - The Python API Downloader acquires raw data from external sources
    - The R Processing Engine transforms this data into actionable insights
    - Results flow back to users through the web interface and email reports
2. **Technology Integration**
    - **Python + R**: Different programming languages are leveraged for their respective strengths: Python for API communication and data acquisition, R for statistical analysis and report generation
    - **Laravel + Processing Engine**: Clear separation between web presentation layer and computational backend
    - **File System + Database**: Hybrid data storage approach with file system for imagery and reports, database for metadata and user information
3. **Key Integration Mechanisms**
    - **File System Bridge**: The different subsystems primarily communicate through standardized file formats (GeoTIFF, GeoJSON, RDS, DOCX)
    - **Shell Script Orchestration**: Acts as the "glue" between subsystems, ensuring proper execution sequence and environment setup
    - **Standardized Data Formats**: Use of widely-accepted geospatial and data formats enables interoperability
4. **Extensibility and Scalability**
    - The modular architecture allows for replacement or enhancement of individual components
    - The clear subsystem boundaries enable parallel development and testing
    - Standard interfaces simplify integration of new data sources, algorithms, or output methods
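As an illustration of the file system bridge described in the list above, the following sketch shows one common way a producing stage can hand a file to a consuming stage without the consumer ever seeing a half-written file. It is an assumption-laden example, not SmartCane's actual mechanism: the `publish` and `pending_inputs` helpers, the `merged_tif` directory name, and the file name are hypothetical, and the payload is placeholder bytes rather than a real GeoTIFF.

```python
import os
import tempfile
from pathlib import Path


def publish(path: Path, data: bytes) -> None:
    """Write to a temporary file in the target directory, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    # The rename is atomic when source and destination are on the same filesystem
    os.replace(tmp_name, path)


def pending_inputs(directory: Path, pattern: str = "*.tif") -> list[Path]:
    """List finished files a downstream stage can pick up; *.tmp files never match."""
    return sorted(directory.glob(pattern))


with tempfile.TemporaryDirectory() as root:
    out_dir = Path(root) / "merged_tif"  # hypothetical output folder
    publish(out_dir / "2024-01-01.tif", b"placeholder bytes, not a real GeoTIFF")
    found = [p.name for p in pending_inputs(out_dir)]

print(found)  # ['2024-01-01.tif']
```

Because the subsystems share only files, a convention like this lets the Python and R sides stay independent: the consumer needs to know the directory and the file pattern, and nothing about how the producer is implemented.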
The SmartCane architecture balances complexity with maintainability by using well-established technologies and clear boundaries between subsystems. The separation of concerns between data acquisition, processing, and presentation layers ensures that changes in one area minimally impact others, while the consistent data flow pathways ensure that information moves smoothly through the system.