# SmartCane Copilot Instructions for AI Coding Agents
## Architecture Overview
**SmartCane** is a multi-stage agricultural intelligence platform that processes satellite imagery into crop health analysis and harvest predictions. The system spans **Python, R, and PHP**, with a file-based processing architecture.
### Core Data Pipeline
```
[Python] Download 4-band imagery
→ [R Stage 01] Create tile grid & split into 25 tiles/day
→ [R Stage 02] Extract CI (Canopy Index) per field
→ [R Stage 03] Growth model interpolation (smooth time series)
→ [R Stage 04] Create weekly 5-band mosaics
→ [R Stage 05] Field-level KPI calculation & alerting
→ [R Stage 06] Generate Word/HTML reports
```
**Key Data Flow**:
- Raw 4-band GeoTIFFs (RGB+NIR, uint16) downloaded from Planet API via Python
- Stored in: `laravel_app/storage/app/{project}/merged_tif/`
- Field boundaries (`pivot.geojson`) used across all stages for masking/analysis
- Outputs: RDS (intermediate data), TIF (rasters), Excel/Word (reports)
### Main Components
| Component | Purpose | Key Files |
|-----------|---------|-----------|
| `r_app/` | Core 6-stage R pipeline | `01-10_*.R`, `*_utils.R`, `parameters_project.R` |
| `python_app/` | Satellite download & ML harvest prediction | `00_download_8band_pu_optimized.py`, `01-02_harvest_*.py` |
| `r_app/experiments/sar_dashboard/` | SAR (Sentinel-1 radar) analysis | `download_s1_*.py`, `generate_sar_report.R` |
| `laravel_app/` | Web dashboard (optional integration) | Standard Laravel structure |
## Critical Developer Workflows
### 1. R Package Setup (DO FIRST)
```powershell
Rscript r_app/package_manager.R
# Manages renv.lock; commit renv.lock but NOT the renv/ folder
```
### 2. Full Pipeline (Typical Weekly Run)
```powershell
# 2a. Download satellite data
cd python_app
python 00_download_8band_pu_optimized.py angata # or chemba, xinavane, etc.
# 2b. Run full R pipeline
cd ../r_app
Rscript 01_create_master_grid_and_split_tiffs.R
Rscript 02_ci_extraction.R
Rscript 03_interpolate_growth_model.R
Rscript 04_mosaic_creation.R
Rscript 09_field_analysis_weekly.R
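# .Rmd files cannot be run by Rscript directly; render them with rmarkdown
Rscript -e "rmarkdown::render('10_CI_report_with_kpis_simple.Rmd')"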
```
### 3. Python Data Download
- **Script**: `00_download_8band_pu_optimized.py`
- **Usage**: `python 00_download_8band_pu_optimized.py [PROJECT] [--options]`
- **Key Options**: `--date`, `--resolution`, `--clear-all`
- **Batch Mode**: `python download_planet_missing_dates.py --start 2025-11-01 --end 2025-12-24 --project angata`
- **Output**: `laravel_app/storage/app/{project}/merged_tif/{YYYY-MM-DD}.tif` (4-band uint16)
- **Cost**: ~1,500-2,000 PU/date (optimized for cloud masking & bbox reduction)
### 4. Stage-Specific Execution
```powershell
# Only KPI + reporting (if earlier stages done):
Rscript 09_field_analysis_weekly.R
Rscript -e "rmarkdown::render('10_CI_report_with_kpis_simple.Rmd')"  # Rscript cannot run .Rmd directly
# Crop messaging/alerts (WhatsApp-ready output):
Rscript 06_crop_messaging.R [week] [prev_week] [estate_name]
# Experimental harvest prediction:
Rscript 11_yield_prediction_comparison.R
Rscript 12_temporal_yield_forecasting.R
# SAR (radar) analysis:
cd r_app/experiments/sar_dashboard
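# `&&` needs PowerShell 7+; on Windows PowerShell 5.1 run the two steps separately
python download_s1_simba.py
Rscript generate_sar_report.R simba  # run only if the download succeeded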
```
## Project-Specific Conventions
### Configuration & Parameterization
- **Central config**: `r_app/parameters_project.R` — sets PROJECT, paths, field boundaries, data_dir
- **Project names**: angata, chemba, xinavane, esa, simba (affects file paths, field boundaries, thresholds)
- **Hard-coded dependencies are problematic** (see SC-50, SC-60) — use parameterized paths instead of `"pivot.geojson"` literals
- All scripts source `parameters_project.R` at start to get global config
### Data Storage & Naming
- **Daily TIFFs**: `merged_tif/{YYYY-MM-DD}.tif` (4 bands: R,G,B,NIR)
- **Tiles**: `daily_tiles_split/{YYYY-MM-DD}/{YYYY-MM-DD}_{TILE_ID}.tif` (25 tiles per day)
- **CI data (RDS)**: `combined_CI/combined_CI_data.rds` (cumulative, wide format: fields × dates)
- **Weekly mosaic**: `weekly_mosaic/week_{WW}.tif` (5 bands: R,G,B,NIR,CI)
- **Outputs**: `output/` for reports; Excel/Word saved with date/week naming
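
The naming scheme above can be captured in a small path helper. This is a minimal sketch, not code from the repo: the function names are hypothetical, the storage root is taken from the paths documented in this file, and the two-digit zero padding of the week number is an assumption.

```python
from datetime import date
from pathlib import Path

# Storage root as documented in this file; adjust if the layout differs.
STORAGE_ROOT = Path("laravel_app/storage/app")


def project_dir(project: str) -> Path:
    """Return the storage directory for one project (e.g. 'angata')."""
    return STORAGE_ROOT / project


def daily_tif_path(project: str, day: date) -> Path:
    """merged_tif/{YYYY-MM-DD}.tif (4 bands: R, G, B, NIR)."""
    return project_dir(project) / "merged_tif" / f"{day.isoformat()}.tif"


def tile_path(project: str, day: date, tile_id: str) -> Path:
    """daily_tiles_split/{YYYY-MM-DD}/{YYYY-MM-DD}_{TILE_ID}.tif."""
    stamp = day.isoformat()
    return project_dir(project) / "daily_tiles_split" / stamp / f"{stamp}_{tile_id}.tif"


def weekly_mosaic_path(project: str, week: int) -> Path:
    """weekly_mosaic/week_{WW}.tif; zero padding to two digits is an assumption."""
    return project_dir(project) / "weekly_mosaic" / f"week_{week:02d}.tif"
```

Keeping every path behind one helper means a change to the layout touches a single place, which is the same goal as centralizing config in `parameters_project.R`.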
### Field Uniformity & Alerting
- **Two-dimensional analysis**: Time (week-over-week trends) + Space (within-field homogeneity)
- **Uniformity metric**: Coefficient of Variation (CV), lower is more uniform: CV < 0.08 (excellent), < 0.15 (good), > 0.25 (poor)
- **Change thresholds**: +0.5 (increase alert), -0.5 (decrease alert) — tunable in code
- **Alert categories**: 🚨 URGENT, ⚠️ ALERT, ✅ POSITIVE, 💡 OPPORTUNITY
- **Cloud handling**: When CI = 0 (no data), skip the temporal analysis and run a spatial-only assessment
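
The thresholds above can be sketched as follows. This is illustrative Python, not the logic in `kpi_utils.R`: the function names are hypothetical, the population standard deviation is an assumption, the label for the unnamed 0.15 to 0.25 band is a placeholder, and treating the ±0.5 threshold as inclusive is also an assumption.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean of the CI values within one field."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("mean CI is zero; CV is undefined")
    # Assumption: population standard deviation.
    return pstdev(values) / mu


def uniformity_label(cv):
    """Map a CV to the bands listed above.

    The source does not name the 0.15-0.25 band, so 'moderate' is a placeholder.
    """
    if cv < 0.08:
        return "excellent"
    if cv < 0.15:
        return "good"
    if cv > 0.25:
        return "poor"
    return "moderate"


def change_alert(ci_this_week, ci_prev_week, threshold=0.5):
    """Return 'increase', 'decrease', or None for a week-over-week CI change."""
    # CI == 0 means no data (cloud): skip the temporal comparison.
    if ci_this_week == 0 or ci_prev_week == 0:
        return None
    delta = ci_this_week - ci_prev_week
    if delta >= threshold:
        return "increase"
    if delta <= -threshold:
        return "decrease"
    return None
```

For example, `uniformity_label(coefficient_of_variation(pixel_values))` gives the spatial rating for one field, while `change_alert` covers the temporal dimension.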
### RDS File Conventions
- **Wide format**: Rows = fields, Columns = dates (CI values)
- **Key files**:
  - `combined_CI_data.rds`: all fields, all dates, cumulative
  - `All_pivots_Cumulative_CI_quadrant_year_v2.rds`: growth model output with quadrant-level analysis
  - `{project}_kpi_summary_tables_week{WW}.rds`: weekly KPI results
- **RDS I/O**: Managed by utility functions (`ci_extraction_utils.R`, `growth_model_utils.R`)
### Word Report Output
- Templates in `r_app/` (e.g., `10_CI_report_with_kpis_simple.Rmd`)
- Use flextable for split-column tables (wide data rendered as multiple tables)
- Include interpretation guides (thresholds, units: hectares + acres)
- Filenames: `SmartCane_Report_week{WW}_{YYYY}.docx`
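
The report filename pattern can be derived from a date as sketched below. The function name is hypothetical, and ISO 8601 week numbering is an assumption; the R pipeline may count weeks differently.

```python
from datetime import date


def report_filename(day: date) -> str:
    """Build SmartCane_Report_week{WW}_{YYYY}.docx for the week containing `day`."""
    # Assumption: ISO 8601 week numbering, paired with the ISO year.
    iso_year, iso_week, _ = day.isocalendar()
    return f"SmartCane_Report_week{iso_week:02d}_{iso_year}.docx"
```

Under ISO numbering the week's year can differ from the calendar year: 2024-12-30 falls in week 01 of 2025, so the year in the filename should come from the same calendar as the week number.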
## Integration Points & Dependencies
### R ↔ Python
- Python downloads → R expects `merged_tif/` filled with 4-band TIFFs
- R harvest prediction (`11_*.R`, `12_*.R`) uses ML models from Python (`model_307.pt`, `model_config.json`, scalers)
### R ↔ GeoJSON (Field Boundaries)
- **File**: `laravel_app/storage/app/{project}/pivot.geojson`
- **Used in**: All stages (download, CI extraction, masking, reporting)
- **Critical**: Ensure geometry is current; affects all per-field statistics
### R ↔ Harvest Data
- **File**: `laravel_app/storage/app/{project}/harvest.xlsx`
- **Used in**: Growth model (Stage 03), field analysis (Stage 05), reporting
- **Format**: Date columns, field identifiers, harvest event flags
### SAR (Separate Ecosystem, experimental)
- Independent Python download + R reporting pipeline
- Data stored: `r_app/experiments/sar_dashboard/data/{client}/weekly_SAR_mosaic/`
- Outputs: Word reports with VV/VH backscatter, RVI index, harvest detection
- Field boundaries: `r_app/experiments/pivot.geojson` (project-specific)
## Common Patterns & Gotchas
### Pattern: Utility Functions for Reusable Logic
- `ci_extraction_utils.R` — tile detection, RDS I/O, CI calculation variants
- `growth_model_utils.R` — interpolation, smoothing, gap-filling
- `kpi_utils.R` — threshold-based alerting, uniformity metrics
- **Why**: Keeps main scripts readable, enables testing individual logic
### Pattern: Source Config Once
```r
source("parameters_project.R") # Sets PROJECT, data_dir, field_boundaries_path, etc.
# All downstream code uses these globals
```
### Gotcha: File Path Dependencies
- **Problem**: Hard-coded paths like `file.path("laravel_app/storage/app", PROJECT, "pivot.geojson")`
- **Better**: Pass as parameters or configure in `parameters_project.R`
- **Benefit**: Code reusable across projects; easier testing
### Gotcha: Cloud Masking in CI Extraction
- If `CI == 0` globally for a date → entire date flagged as cloudy
- Growth model drops entire date rather than interpolating; impacts trend analysis
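
The behavior described above can be sketched in a few lines. This is illustrative Python rather than the R implementation: the function names are hypothetical, the nested dict stands in for the wide RDS table (rows = fields, columns = dates), and treating a missing value as CI = 0 is an assumption.

```python
def cloudy_dates(ci_by_field):
    """Return the dates on which every field has CI == 0.

    `ci_by_field` mirrors the wide layout: {field_id: {date_str: ci}}.
    """
    all_dates = sorted({d for row in ci_by_field.values() for d in row})
    return [
        d for d in all_dates
        # Assumption: a date missing for a field counts as no data (0).
        if all(row.get(d, 0) == 0 for row in ci_by_field.values())
    ]


def drop_dates(ci_by_field, dates):
    """Return a copy of the table without the given date columns."""
    skip = set(dates)
    return {
        field: {d: ci for d, ci in row.items() if d not in skip}
        for field, row in ci_by_field.items()
    }
```

Note that a date is only dropped when all fields are zero; a single field with CI = 0 on an otherwise clear date stays in the table.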
## AI Agent Behavior & Approach
### Terminal Commands
- **R execution on Windows**: Use PowerShell `&` operator with full R path
  - Syntax: `& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" script.R`
  - Example: `& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" 01_create_master_grid_and_split_tiffs.R`
  - The `&` (call) operator is needed because the quoted path contains spaces; without it PowerShell treats the quoted path as a string, not a command
  - Verify that the R path matches the local Windows installation before running
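
Verifying the R path before running can be scripted. The sketch below is an assumption-laden helper, not part of the repo: both function names are hypothetical, and the default path is simply the one from the example above.

```python
import shutil
from pathlib import Path

# Path from the example above; adjust to the local R installation.
DEFAULT_RSCRIPT = Path(r"C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe")


def resolve_rscript(candidate: Path = DEFAULT_RSCRIPT):
    """Return a usable Rscript path, or None if none is found."""
    if candidate.is_file():
        return str(candidate)
    # Fall back to whatever Rscript is on PATH.
    return shutil.which("Rscript")


def powershell_command(rscript: str, script: str) -> str:
    """Format the PowerShell call-operator invocation for one R script."""
    return f'& "{rscript}" {script}'
```

If `resolve_rscript()` returns `None`, the installed R version likely differs from `R-4.4.3` and the path needs updating.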
### Testing & Temporary Code
- **Test file naming**: Always include `test` in filename to signal removal (e.g., `test_ci_extraction.R`, `test_download.py`)
- **Test locations**:
  - R tests: `r_app/test/test_*.R`
  - Python tests: `python_app/test/test_*.py`
- **Default test project**: Use `angata` for development/testing unless specified otherwise
- **Cleanup**: Test files are temporary—learnings are copied to production scripts, then test files are deleted
- **Do NOT commit** test files to main branch
### Output & Documentation
- **NO auto-generated summary files**: Do NOT create README.md, SUMMARY.txt, or any markdown/text summaries unless explicitly requested
- **Chat-based summaries only**: Briefly summarize results and findings directly in chat conversation
- This keeps the repo clean and avoids clutter; user will ask for docs if needed
### Critical Thinking & Partnership
- **Ask clarifying questions** before implementing:
  - "Why do you want to modify this stage? What problem are we solving?"
  - "Have you considered [alternative approach]? It might be simpler/faster/cleaner"
  - "Is this for a specific project (angata/esa/chemba)? That affects which configs to change"
- **Suggest alternatives**:
  - If a request seems inefficient, propose better options
  - Point out if changes might affect other stages or projects
  - Highlight potential issues (e.g., "This change requires updating parameters_project.R for each project")
- **Challenge assumptions**:
  - "Do we need to modify Stage 02, or would a configuration change in parameters_project.R work?"
  - "Is this a one-time fix or a pattern we should refactor?"
  - "Will this break existing reports or KPI calculations?"
- **Be a thinking partner, not an order-taker**: Help the user make better decisions rather than just executing requests
## File Structure (Key Locations)
```
r_app/
├── 01_create_master_grid_and_split_tiffs.R
├── 02_ci_extraction.R
├── 03_interpolate_growth_model.R
├── 04_mosaic_creation.R
├── 06_crop_messaging.R
├── 09_field_analysis_weekly.R
├── 10_CI_report_with_kpis_simple.Rmd
├── 11_yield_prediction_comparison.R
├── 12_temporal_yield_forecasting.R
├── *_utils.R (ci_extraction, growth_model, kpi, report, crop_messaging)
├── parameters_project.R
├── package_manager.R
├── renv.lock
├── system_architecture/system_architecture.md (full architecture doc)
└── experiments/sar_dashboard/
    ├── download_s1_*.py
    ├── generate_sar_report.R
    └── sar_dashboard_utils.R
python_app/
├── 00_download_8band_pu_optimized.py
├── download_planet_missing_dates.py
├── 01_harvest_baseline_prediction.py
├── 02_harvest_imminent_weekly.py
├── model_307.pt
└── requirements_*.txt
laravel_app/
├── storage/app/{project}/
│   ├── merged_tif/
│   ├── daily_tiles_split/
│   ├── combined_CI/
│   └── weekly_mosaic/
└── ... (standard Laravel)
output/
└── (all generated reports, Excel, Word, HTML)
```
## Linear Issue Integration
### Creating Issues
To create a Linear issue, simply ask in the chat:
```
"Create a Linear issue: Fix hard-coded paths in Stage 02 CI extraction"
```
Provide context like:
- **Title**: Clear, action-oriented (what needs doing)
- **Description**: Problem statement, impact, acceptance criteria
- **Project**: Which project (Inception Phase Angata, General backlog, etc.)
- **Priority**: High, Medium, Low
- **Assignee**: Who should work on it (e.g., yourself, Dimitra)
- **Related issues**: Reference other issues if it blocks/relates to them
Example:
```
Create Linear issue:
Title: Only create tiles that overlap with GeoJSON boundaries
Description: Master grid creates all 25 tiles even when empty. Filter by field geometry to save storage.
Project: Inception Phase Angata
Priority: Medium
Related: SC-50 (parameterization work)
```
### Referencing Issues
In chat, reference issues by their ID to include full context:
```
"Work on SC-59 - update the system architecture documentation"
```
This pulls the issue details into the conversation so I understand the full scope.
### Workflow
1. **Create issue** → Clear task definition
2. **Reference issue in chat** → I fetch details automatically
3. **Ask for implementation** → I work toward issue's acceptance criteria
4. **Close issue** → Mark as done in Linear, summarize in chat
## Debugging & Troubleshooting
### Data Validation Checkpoints
After each major stage, verify:
- **Post-Download**: merged_tif/ contains expected date ranges; file sizes ~150-300MB each
- **Post-CI Extraction**: combined_CI_data.rds dimensions match (# fields × # dates); no all-NA columns
- **Post-Growth Model**: Interpolated values are within expected CI range; no unexpected gaps
- **Pre-Reporting**: Weekly mosaic TIF has 5 bands; field analysis RDS has KPI columns present
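
The post-download checkpoint can be automated along these lines. This is a minimal sketch with a hypothetical function name; it assumes one `{YYYY-MM-DD}.tif` per day in `merged_tif/` and uses the 150-300 MB range quoted above as the default size window.

```python
from datetime import date, timedelta
from pathlib import Path

MB = 1024 * 1024


def check_downloads(merged_tif: Path, start: date, end: date,
                    min_mb: float = 150, max_mb: float = 300):
    """Return (missing_dates, odd_sized_files) for the inclusive date range."""
    missing, odd_sized = [], []
    day = start
    while day <= end:
        tif = merged_tif / f"{day.isoformat()}.tif"
        if not tif.is_file():
            missing.append(day.isoformat())
        else:
            size_mb = tif.stat().st_size / MB
            # Flag files outside the expected size window for manual review.
            if not (min_mb <= size_mb <= max_mb):
                odd_sized.append(tif.name)
        day += timedelta(days=1)
    return missing, odd_sized
```

A missing date is not necessarily an error, since imagery may be unavailable for some days; the lists are meant as prompts for a manual check rather than hard failures.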
## Next Steps for AI Agents
1. **Understanding a Script**: Check `parameters_project.R` first for config, then trace utility functions
2. **Adding Features**: Determine which stage (01-06 or experimental) it belongs to; follow existing pattern in that stage
3. **Testing**: Use standalone test data in `r_app/experiments/` or small date ranges with `--start/--end` flags
4. **Documentation**: Update `r_app/system_architecture/system_architecture.md` when architecture changes
5. **Refactoring**: Avoid hard-coded paths; parameterize and test across multiple projects (angata, esa, etc.)
---
_For detailed system architecture, see `r_app/system_architecture/system_architecture.md`._