# SmartCane Copilot Instructions for AI Coding Agents

## Architecture Overview

**SmartCane** is a multi-stage agricultural intelligence platform that processes satellite imagery into crop health analysis and harvest predictions. The system spans **Python, R, and PHP**, with a file-based processing architecture.

### Core Data Pipeline
```
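[Python] Download 4-band imagery                                    00_download_8band_pu_optimized.py
    → [R Stage 01] Create tile grid & split into 25 tiles/day       01_create_master_grid_and_split_tiffs.R
    → [R Stage 02] Extract CI (Canopy Index) per field              02_ci_extraction.R
    → [R Stage 03] Growth model interpolation (smooth time series)  03_interpolate_growth_model.R
    → [R Stage 04] Create weekly 5-band mosaics                     04_mosaic_creation.R
    → [R Stage 05] Field-level KPI calculation & alerting           09_field_analysis_weekly.R
    → [R Stage 06] Generate Word/HTML reports                       10_CI_report_with_kpis_simple.Rmd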
```
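Stage 01 splits each day's imagery into 25 tiles. The sketch below assumes this is a regular 5 × 5 grid laid over the raster's bounding box. It shows only that grid geometry, in Python. The `Tile` type, `make_tile_grid` and the two-digit tile IDs are hypothetical. The real `01_create_master_grid_and_split_tiffs.R` may build its grid differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tile:
    tile_id: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def make_tile_grid(xmin: float, ymin: float, xmax: float, ymax: float,
                   n_rows: int = 5, n_cols: int = 5) -> List[Tile]:
    """Split a bounding box into an n_rows x n_cols grid (row-major, top row first)."""
    if n_rows < 1 or n_cols < 1:
        raise ValueError("grid must have at least one row and one column")
    width = (xmax - xmin) / n_cols
    height = (ymax - ymin) / n_rows
    tiles = []
    for row in range(n_rows):
        for col in range(n_cols):
            # Row 0 is the northern edge, matching raster row order.
            top = ymax - row * height
            tiles.append(Tile(
                tile_id=f"{row * n_cols + col + 1:02d}",  # hypothetical ID format
                xmin=xmin + col * width,
                ymin=top - height,
                xmax=xmin + (col + 1) * width,
                ymax=top,
            ))
    return tiles
```

Each tile would then be cropped from the daily TIFF. A grid computed once (a "master" grid) keeps tile IDs stable across dates.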

**Key Data Flow**:
- Raw 4-band GeoTIFFs (RGB+NIR, uint16) downloaded from Planet API via Python
- Stored in: `laravel_app/storage/app/{project}/merged_tif/`
- Field boundaries (`pivot.geojson`) used across all stages for masking/analysis
- Outputs: RDS (intermediate data), TIF (rasters), Excel/Word (reports)

### Main Components
| Component | Purpose | Key Files |
|-----------|---------|-----------|
| `r_app/` | Core 6-stage R pipeline | `01-10_*.R`, `*_utils.R`, `parameters_project.R` |
| `python_app/` | Satellite download & ML harvest prediction | `00_download_8band_pu_optimized.py`, `01-02_harvest_*.py` |
| `r_app/experiments/sar_dashboard/` | SAR (Sentinel-1 radar) analysis | `download_s1_*.py`, `generate_sar_report.R` |
| `laravel_app/` | Web dashboard (optional integration) | Standard Laravel structure |

## Critical Developer Workflows

### 1. R Package Setup (DO FIRST)
```powershell
Rscript r_app/package_manager.R
# Manages renv.lock; commit renv.lock but NOT the renv/ folder
```

### 2. Full Pipeline (Typical Weekly Run)
```powershell
# 2a. Download satellite data
cd python_app
python 00_download_8band_pu_optimized.py angata  # or chemba, xinavane, etc.

# 2b. Run full R pipeline
cd ../r_app
Rscript 01_create_master_grid_and_split_tiffs.R
Rscript 02_ci_extraction.R
Rscript 03_interpolate_growth_model.R
Rscript 04_mosaic_creation.R
Rscript 09_field_analysis_weekly.R
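# Stage 06: .Rmd reports must be rendered with rmarkdown, not executed directly
Rscript -e "rmarkdown::render('10_CI_report_with_kpis_simple.Rmd')"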
```
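The R part of the weekly run can also be driven from Python. A runner like this stops at the first failing stage, so later stages never read stale or partial outputs. The sketch below makes these assumptions:

- It runs with `r_app/` as the working directory, as in the commands above.
- `rmarkdown` is installed.
- `build_commands` and `run_pipeline` are hypothetical helpers, not existing project code.

The download step is left out because it is already a Python script.

```python
import subprocess
from pathlib import Path

# Stage scripts in the order used by the weekly run above.
R_STAGES = [
    "01_create_master_grid_and_split_tiffs.R",
    "02_ci_extraction.R",
    "03_interpolate_growth_model.R",
    "04_mosaic_creation.R",
    "09_field_analysis_weekly.R",
]
REPORT_RMD = "10_CI_report_with_kpis_simple.Rmd"


def build_commands(rscript: str = "Rscript") -> list:
    """One argv list per stage; the .Rmd report is rendered via rmarkdown::render."""
    cmds = [[rscript, script] for script in R_STAGES]
    cmds.append([rscript, "-e", f"rmarkdown::render('{REPORT_RMD}')"])
    return cmds


def run_pipeline(r_app_dir, rscript: str = "Rscript", runner=subprocess.run) -> None:
    """Run every stage from r_app/, stopping at the first non-zero exit code."""
    for cmd in build_commands(rscript):
        # check=True raises CalledProcessError, so no later stage runs after a failure.
        runner(cmd, cwd=Path(r_app_dir), check=True)
```

On Windows, pass the full Rscript path from *Terminal Commands*, for example `run_pipeline("r_app", rscript=r"C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe")`. Because the arguments are passed as a list, the spaces in the path need no extra quoting.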

### 3. Python Data Download
- **Script**: `00_download_8band_pu_optimized.py`
- **Usage**: `python 00_download_8band_pu_optimized.py [PROJECT] [--options]`
- **Key Options**: `--date`, `--resolution`, `--clear-all`
- **Batch Mode**: `python download_planet_missing_dates.py --start 2025-11-01 --end 2025-12-24 --project angata`
- **Output**: `laravel_app/storage/app/{project}/merged_tif/{YYYY-MM-DD}.tif` (4-band uint16)
- **Cost**: ~1,500-2,000 PU (processing units) per date, kept low through cloud masking and a reduced bounding box
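The per-date cost makes it easy to budget a batch download before running it. The sketch below takes the 1,500-2,000 PU range above and assumes every calendar date in an inclusive range is downloaded. That makes the result an upper bound, since dates already on disk would not be fetched again. `estimate_pu` is a hypothetical helper.

```python
from datetime import date


def estimate_pu(start: date, end: date, pu_low: int = 1500, pu_high: int = 2000):
    """Return (n_dates, low_pu, high_pu) for an inclusive date range."""
    n_dates = (end - start).days + 1
    return n_dates, n_dates * pu_low, n_dates * pu_high


# The batch-mode example above: 2025-11-01 to 2025-12-24.
n, low, high = estimate_pu(date(2025, 11, 1), date(2025, 12, 24))
print(f"{n} dates -> {low:,}-{high:,} PU")  # 54 dates -> 81,000-108,000 PU
```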

### 4. Stage-Specific Execution
```powershell
# Only KPI + reporting (if earlier stages done):
Rscript 09_field_analysis_weekly.R
Rscript -e "rmarkdown::render('10_CI_report_with_kpis_simple.Rmd')"

# Crop messaging/alerts (WhatsApp-ready output):
Rscript 06_crop_messaging.R [week] [prev_week] [estate_name]

# Experimental harvest prediction:
Rscript 11_yield_prediction_comparison.R
Rscript 12_temporal_yield_forecasting.R

# SAR (radar) analysis:
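cd r_app/experiments/sar_dashboard   # from the repo root
python download_s1_simba.py          # run the report only after the download succeeds
Rscript generate_sar_report.R simba  # (`&&` chaining requires PowerShell 7+)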
```

## Project-Specific Conventions

### Configuration & Parameterization
- **Central config**: `r_app/parameters_project.R` — sets PROJECT, paths, field boundaries, data_dir
- **Project names**: angata, chemba, xinavane, esa, simba (affects file paths, field boundaries, thresholds)
- **Hard-coded dependencies are problematic** (see SC-50, SC-60) — use parameterized paths instead of `"pivot.geojson"` literals
- All scripts source `parameters_project.R` at start to get global config

### Data Storage & Naming
- **Daily TIFFs**: `merged_tif/{YYYY-MM-DD}.tif` (4 bands: R,G,B,NIR)
- **Tiles**: `daily_tiles_split/{YYYY-MM-DD}/{YYYY-MM-DD}_{TILE_ID}.tif` (25 tiles per day)
- **CI data (RDS)**: `combined_CI/combined_CI_data.rds` (cumulative, wide format: fields × dates)
- **Weekly mosaic**: `weekly_mosaic/week_{WW}.tif` (5 bands: R,G,B,NIR,CI)
- **Outputs**: `output/` for reports; Excel/Word saved with date/week naming
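The naming conventions above can be centralised in one place, so scripts build paths rather than hard-coding them. The sketch below is written in Python for illustration; in the R pipeline this logic would presumably live in `parameters_project.R`. Its assumptions:

- The project root is `laravel_app/storage/app/{project}/`, as in the file tree.
- `{WW}` is a zero-padded two-digit week number.
- The tile ID format is unknown here, so it is passed in as-is.
- All function names are hypothetical.

The `base` parameter lets callers point at another root instead of relying on a literal path.

```python
from datetime import date
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]
# Default root taken from this document; pass `base` explicitly to override it.
DEFAULT_BASE = Path("laravel_app/storage/app")


def project_dir(project: str, base: PathLike = DEFAULT_BASE) -> Path:
    return Path(base) / project


def daily_tif(project: str, day: date, base: PathLike = DEFAULT_BASE) -> Path:
    # merged_tif/{YYYY-MM-DD}.tif, 4 bands (R, G, B, NIR)
    return project_dir(project, base) / "merged_tif" / f"{day.isoformat()}.tif"


def tile_tif(project: str, day: date, tile_id: str, base: PathLike = DEFAULT_BASE) -> Path:
    # daily_tiles_split/{YYYY-MM-DD}/{YYYY-MM-DD}_{TILE_ID}.tif
    d = day.isoformat()
    return project_dir(project, base) / "daily_tiles_split" / d / f"{d}_{tile_id}.tif"


def weekly_mosaic(project: str, week: int, base: PathLike = DEFAULT_BASE) -> Path:
    # weekly_mosaic/week_{WW}.tif; assumes a zero-padded two-digit week number
    return project_dir(project, base) / "weekly_mosaic" / f"week_{week:02d}.tif"


def combined_ci(project: str, base: PathLike = DEFAULT_BASE) -> Path:
    # combined_CI/combined_CI_data.rds (cumulative, wide format)
    return project_dir(project, base) / "combined_CI" / "combined_CI_data.rds"
```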

### Field Uniformity & Alerting
- **Two-dimensional analysis**: Time (week-over-week trends) + Space (within-field homogeneity)
- **Uniformity metric**: Coefficient of Variation (CV = standard deviation / mean of CI within a field): CV < 0.08 (excellent), < 0.15 (good), > 0.25 (poor)
- **Change thresholds**: +0.5 (increase alert), -0.5 (decrease alert) — tunable in code
- **Alert categories**: 🚨 URGENT, ⚠️ ALERT, ✅ POSITIVE, 💡 OPPORTUNITY
- **Cloud handling**: When CI = 0 (no data), skip the temporal analysis and fall back to a spatial-only assessment
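The two dimensions above can be combined into one per-field check: CV for space and week-over-week change in mean CI for time, with a spatial-only fallback when last week has no data. The sketch below makes these assumptions:

- It uses the population standard deviation.
- It treats pixels with CI = 0 as no data.
- A change of exactly ±0.5 triggers an alert.
- "intermediate" is a placeholder label for the 0.15-0.25 band, which the list leaves unnamed.

The function names are hypothetical. The real logic lives in R (`kpi_utils.R`, `09_field_analysis_weekly.R`) and may differ.

```python
from statistics import mean, pstdev
from typing import Sequence

CHANGE_THRESHOLD = 0.5  # week-over-week change in field-mean CI ("tunable in code")


def uniformity_class(cv: float) -> str:
    """Map a coefficient of variation to the uniformity bands listed above."""
    if cv < 0.08:
        return "excellent"
    if cv < 0.15:
        return "good"
    if cv > 0.25:
        return "poor"
    return "intermediate"  # placeholder: 0.15-0.25 is not named above


def assess_field(ci_pixels: Sequence[float], prev_mean_ci: float) -> dict:
    """Spatial (CV) plus temporal (change) assessment for one field.

    ci_pixels: this week's per-pixel CI values; 0 is treated as no data.
    prev_mean_ci: last week's field-mean CI; 0 means no data (e.g. cloud).
    """
    valid = [v for v in ci_pixels if v != 0]
    if not valid:
        return {"uniformity": None, "change": None, "alert": "no data"}
    field_mean = mean(valid)
    cv = pstdev(valid) / field_mean
    result = {"uniformity": uniformity_class(cv), "change": None, "alert": None}
    if prev_mean_ci == 0:
        return result  # no previous data: spatial-only assessment
    change = field_mean - prev_mean_ci
    result["change"] = change
    if change >= CHANGE_THRESHOLD:
        result["alert"] = "increase"
    elif change <= -CHANGE_THRESHOLD:
        result["alert"] = "decrease"
    return result
```

Mapping these results onto the 🚨/⚠️/✅/💡 categories is intentionally left out; that mapping is defined in the R code, not here.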

### RDS File Conventions
- **Wide format**: Rows = fields, Columns = dates (CI values)
- **Key files**:
  - `combined_CI_data.rds` — all fields, all dates, cumulative
  - `All_pivots_Cumulative_CI_quadrant_year_v2.rds` — growth model output with quadrant-level analysis
  - `{project}_kpi_summary_tables_week{WW}.rds` — weekly KPI results
- **RDS I/O**: Managed by utility functions (`ci_extraction_utils.R`, `growth_model_utils.R`)

### Word Report Output
- Templates in `r_app/` (e.g., `10_CI_report_with_kpis_simple.Rmd`)
- Use flextable for split-column tables (wide data rendered as multiple tables)
- Include interpretation guides (thresholds, units: hectares + acres)
- Filenames: `SmartCane_Report_week{WW}_{YYYY}.docx`
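Rendering wide data as several tables is mostly column bookkeeping. Every chunk repeats the field identifier and takes the next few date or week columns. The sketch below shows only that logic, in Python. `split_wide_table` and the chunk size are hypothetical. In the R report, each chunk would then become one flextable.

```python
from typing import Dict, List

Row = Dict[str, object]


def split_wide_table(rows: List[Row], id_col: str, max_cols: int) -> List[List[Row]]:
    """Split rows into sub-tables with the ID column plus at most max_cols value columns."""
    if max_cols < 1:
        raise ValueError("max_cols must be at least 1")
    if not rows:
        return []
    value_cols = [c for c in rows[0] if c != id_col]
    chunks = []
    for start in range(0, len(value_cols), max_cols):
        cols = [id_col] + value_cols[start:start + max_cols]
        chunks.append([{c: row[c] for c in cols} for row in rows])
    return chunks
```

Repeating the ID column in every chunk keeps each printed table readable on its own page.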

## Integration Points & Dependencies

### R ↔ Python
- Python downloads → R expects `merged_tif/` filled with 4-band TIFFs
- R harvest prediction (`11_*.R`, `12_*.R`) uses ML models from Python (`model_307.pt`, `model_config.json`, scalers)

### R ↔ GeoJSON (Field Boundaries)
- **File**: `laravel_app/storage/app/{project}/pivot.geojson`
- **Used in**: All stages (download, CI extraction, masking, reporting)
- **Critical**: Ensure geometry is current; affects all per-field statistics

### R ↔ Harvest Data
- **File**: `laravel_app/storage/app/{project}/harvest.xlsx`
- **Used in**: Growth model (Stage 03), field analysis (Stage 05), reporting
- **Format**: Date columns, field identifiers, harvest event flags

### SAR (Separate Ecosystem, experimental)
- Independent Python download + R reporting pipeline
- Data stored: `r_app/experiments/sar_dashboard/data/{client}/weekly_SAR_mosaic/`
- Outputs: Word reports with VV/VH backscatter, RVI index, harvest detection
- Field boundaries: `r_app/experiments/pivot.geojson` (project-specific)

## Common Patterns & Gotchas

### Pattern: Utility Functions for Reusable Logic
- `ci_extraction_utils.R` — tile detection, RDS I/O, CI calculation variants
- `growth_model_utils.R` — interpolation, smoothing, gap-filling
- `kpi_utils.R` — threshold-based alerting, uniformity metrics
- **Why**: Keeps main scripts readable and lets individual functions be tested in isolation

### Pattern: Source Config Once
```r
source("parameters_project.R")  # Sets PROJECT, data_dir, field_boundaries_path, etc.
# All downstream code uses these globals
```

### Gotcha: File Path Dependencies
- **Problem**: Hard-coded paths like `file.path("laravel_app/storage/app", PROJECT, "pivot.geojson")`
- **Better**: Pass as parameters or configure in `parameters_project.R`
- **Benefit**: Code reusable across projects; easier testing

### Gotcha: Cloud Masking in CI Extraction
- If `CI == 0` globally (across all fields) for a date, the entire date is flagged as cloudy
- The growth model then drops that date entirely rather than interpolating it, which affects trend analysis
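The date-level rule can be expressed on one column of the wide CI table (the CI value of every field for a date). The sketch below approximates "globally" as "every field has CI == 0". Dates with only some zero values are kept. The function names are hypothetical, and the growth model itself is R code.

```python
from typing import Dict, List, Sequence

CIByDate = Dict[str, Sequence[float]]  # date -> CI value per field (one wide-table column)


def cloudy_dates(ci_by_date: CIByDate) -> List[str]:
    """Dates on which every field has CI == 0, i.e. treated as fully cloudy."""
    return sorted(d for d, values in ci_by_date.items()
                  if len(values) > 0 and all(v == 0 for v in values))


def drop_cloudy_dates(ci_by_date: CIByDate) -> CIByDate:
    """Remove fully cloudy dates entirely; partially cloudy dates are kept."""
    cloudy = set(cloudy_dates(ci_by_date))
    return {d: values for d, values in ci_by_date.items() if d not in cloudy}
```

Dropping a date removes a time point, so any trend across it spans a longer gap. This is why a run of cloudy dates weakens week-over-week comparisons.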

## AI Agent Behavior & Approach

### Terminal Commands
- **R execution on Windows**: Use PowerShell `&` operator with full R path
  - Syntax: `& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" script.R`
  - Example: `& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" 01_create_master_grid_and_split_tiffs.R`
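  - The call operator `&` is required because the quoted path contains spaces; without it, PowerShell parses the quoted path as a string rather than a command
- Verify that the R path (including the version folder) matches the local Windows installation before running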

### Testing & Temporary Code
- **Test file naming**: Always include `test` in filename to signal removal (e.g., `test_ci_extraction.R`, `test_download.py`)
- **Test locations**:
  - R tests: `r_app/test/test_*.R`
  - Python tests: `python_app/test/test_*.py`
- **Default test project**: Use `angata` for development/testing unless specified otherwise
- **Cleanup**: Test files are temporary—learnings are copied to production scripts, then test files are deleted
- **Do NOT commit** test files to main branch

### Output & Documentation
- **NO auto-generated summary files**: Do NOT create README.md, SUMMARY.txt, or any markdown/text summaries unless explicitly requested
- **Chat-based summaries only**: Briefly summarize results and findings directly in chat conversation
- This keeps the repo clean and avoids clutter; the user will ask for docs if needed

### Critical Thinking & Partnership
- **Ask clarifying questions** before implementing:
  - "Why do you want to modify this stage? What problem are we solving?"
  - "Have you considered [alternative approach]? It might be simpler/faster/cleaner"
  - "Is this for a specific project (angata/esa/chemba)? That affects which configs to change"
- **Suggest alternatives**:
  - If a request seems inefficient, propose better options
  - Point out if changes might affect other stages or projects
  - Highlight potential issues (e.g., "This change requires updating parameters_project.R for each project")
- **Challenge assumptions**:
  - "Do we need to modify Stage 02, or would a configuration change in parameters_project.R work?"
  - "Is this a one-time fix or a pattern we should refactor?"
  - "Will this break existing reports or KPI calculations?"
- **Be a thinking partner, not an order-taker**: Help the user make better decisions, not just execute requests

## File Structure (Key Locations)

```
r_app/
  ├── 01_create_master_grid_and_split_tiffs.R
  ├── 02_ci_extraction.R
  ├── 03_interpolate_growth_model.R
  ├── 04_mosaic_creation.R
  ├── 06_crop_messaging.R
  ├── 09_field_analysis_weekly.R
  ├── 10_CI_report_with_kpis_simple.Rmd
  ├── 11_yield_prediction_comparison.R
  ├── 12_temporal_yield_forecasting.R
  ├── *_utils.R (ci_extraction, growth_model, kpi, report, crop_messaging)
  ├── parameters_project.R
  ├── package_manager.R
  ├── renv.lock
  ├── system_architecture/system_architecture.md (full architecture doc)
  └── experiments/sar_dashboard/
        ├── download_s1_*.py
        ├── generate_sar_report.R
        └── sar_dashboard_utils.R

python_app/
  ├── 00_download_8band_pu_optimized.py
  ├── download_planet_missing_dates.py
  ├── 01_harvest_baseline_prediction.py
  ├── 02_harvest_imminent_weekly.py
  ├── model_307.pt
  └── requirements_*.txt

laravel_app/
  ├── storage/app/{project}/
  │   ├── merged_tif/
  │   ├── daily_tiles_split/
  │   ├── combined_CI/
  │   ├── weekly_mosaic/
  │   ├── pivot.geojson
  │   └── harvest.xlsx
  └── ... (standard Laravel)

output/
  └── (all generated reports, Excel, Word, HTML)
```

## Linear Issue Integration

### Creating Issues
To create a Linear issue, simply ask in the chat:

```
"Create a Linear issue: Fix hard-coded paths in Stage 02 CI extraction"
```

Provide context like:
- **Title**: Clear, action-oriented (what needs doing)
- **Description**: Problem statement, impact, acceptance criteria
- **Project**: Which project (Inception Phase Angata, General backlog, etc.)
- **Priority**: High, Medium, Low
- **Assignee**: Who should work on it (e.g., yourself, Dimitra)
- **Related issues**: Reference other issues if it blocks/relates to them

Example:
```
Create Linear issue:
Title: Only create tiles that overlap with GeoJSON boundaries
Description: Master grid creates all 25 tiles even when empty. Filter by field geometry to save storage.
Project: Inception Phase Angata
Priority: Medium
Related: SC-50 (parameterization work)
```

### Referencing Issues
In chat, reference issues by their ID to include full context:

```
"Work on SC-59 - update the system architecture documentation"
```

This pulls the issue details into the conversation so I understand the full scope.

### Workflow
1. **Create issue** → Clear task definition
2. **Reference issue in chat** → I fetch details automatically
3. **Ask for implementation** → I work toward issue's acceptance criteria
4. **Close issue** → Mark as done in Linear, summarize in chat


## Debugging & Troubleshooting

### Data Validation Checkpoints
After each major stage, verify:
- **Post-Download**: merged_tif/ contains expected date ranges; file sizes ~150-300MB each
- **Post-CI Extraction**: combined_CI_data.rds dimensions match (# fields × # dates); no all-NA columns
- **Post-Growth Model**: Interpolated values are within expected CI range; no unexpected gaps
- **Pre-Reporting**: Weekly mosaic TIF has 5 bands; field analysis RDS has KPI columns present
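Two of these checkpoints are easy to script: missing dates in `merged_tif/` and all-NA date columns in the wide CI table. The sketch below makes these assumptions:

- Files follow the `{YYYY-MM-DD}.tif` convention.
- The CI table has already been loaded into Python as `{date column: values per field}`. Reading the `.rds` itself is out of scope.

All names are hypothetical.

```python
import math
from datetime import date, timedelta
from pathlib import Path
from typing import Dict, Iterable, List, Sequence


def expected_dates(start: date, end: date) -> List[date]:
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def missing_dates_from_names(names: Iterable[str], start: date, end: date) -> List[date]:
    """Dates in [start, end] with no '{YYYY-MM-DD}.tif' among the given file names."""
    present = {Path(n).stem for n in names if n.endswith(".tif")}
    return [d for d in expected_dates(start, end) if d.isoformat() not in present]


def missing_dates(merged_tif_dir: Path, start: date, end: date) -> List[date]:
    """Same check against a merged_tif/ folder on disk."""
    names = (p.name for p in Path(merged_tif_dir).glob("*.tif"))
    return missing_dates_from_names(names, start, end)


def is_missing(value: object) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def all_na_columns(table: Dict[str, Sequence[object]]) -> List[str]:
    """Date columns (column -> values per field) that contain no data at all."""
    return [col for col, values in table.items() if all(is_missing(v) for v in values)]
```

Any dates reported as missing can be fed to `download_planet_missing_dates.py` via its `--start`/`--end` range.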

## Next Steps for AI Agents

1. **Understanding a Script**: Check `parameters_project.R` first for config, then trace utility functions
2. **Adding Features**: Determine which stage (01-06 or experimental) it belongs to; follow existing pattern in that stage
3. **Testing**: Use standalone test data in `r_app/experiments/`, or download a small date range with the `--start`/`--end` flags of `download_planet_missing_dates.py`
4. **Documentation**: Update `r_app/system_architecture/system_architecture.md` when architecture changes
5. **Refactoring**: Avoid hard-coded paths; parameterize and test across multiple projects (angata, esa, etc.)

---

_For detailed system architecture, see `r_app/system_architecture/system_architecture.md`._