# SmartCane Copilot Instructions for AI Coding Agents

## Architecture Overview

**SmartCane** is a multi-stage agricultural intelligence platform that turns satellite imagery into crop health analysis and harvest predictions. The system spans **Python, R, and PHP** and uses a file-based processing architecture.

### Core Data Pipeline

```
[Python] Download 4-band imagery
  → [R Stage 01] Create tile grid & split into 25 tiles/day
  → [R Stage 02] Extract CI (Canopy Index) per field
  → [R Stage 03] Growth model interpolation (smooth time series)
  → [R Stage 04] Create weekly 5-band mosaics
  → [R Stage 05] Field-level KPI calculation & alerting
  → [R Stage 06] Generate Word/HTML reports
```

Stages 01 through 04 run as scripts `01_*.R` through `04_*.R`. Stage 05 runs as `09_field_analysis_weekly.R` and Stage 06 as `10_CI_report_with_kpis_simple.Rmd`, as in the weekly run below.

**Key Data Flow**:
- Raw 4-band GeoTIFFs (RGB+NIR, uint16) are downloaded from the Planet API via Python
- Stored in: `laravel_app/storage/app/{project}/merged_tif/`
- Field boundaries (`pivot.geojson`) are used across all stages for masking/analysis
- Outputs: RDS (intermediate data), TIF (rasters), Excel/Word (reports)

### Main Components

| Component | Purpose | Key Files |
|-----------|---------|-----------|
| `r_app/` | Core 6-stage R pipeline | `01-10_*.R`, `*_utils.R`, `parameters_project.R` |
| `python_app/` | Satellite download & ML harvest prediction | `00_download_8band_pu_optimized.py`, `01-02_harvest_*.py` |
| `r_app/experiments/sar_dashboard/` | SAR (Sentinel-1 radar) analysis | `download_s1_*.py`, `generate_sar_report.R` |
| `laravel_app/` | Web dashboard (optional integration) | Standard Laravel structure |

## Critical Developer Workflows

### 1. R Package Setup (DO FIRST)

```powershell
Rscript r_app/package_manager.R  # Manages renv.lock; commit renv.lock but NOT the renv/ folder
```

### 2. Full Pipeline (Typical Weekly Run)

```powershell
# 2a. Download satellite data
cd python_app
python 00_download_8band_pu_optimized.py angata  # or chemba, xinavane, etc.

# 2b. Run full R pipeline
cd ../r_app
Rscript 01_create_master_grid_and_split_tiffs.R
Rscript 02_ci_extraction.R
Rscript 03_interpolate_growth_model.R
Rscript 04_mosaic_creation.R
Rscript 09_field_analysis_weekly.R
# .Rmd files cannot be executed by Rscript directly; render them with rmarkdown
Rscript -e "rmarkdown::render('10_CI_report_with_kpis_simple.Rmd')"
```

### 3. Python Data Download
- **Script**: `00_download_8band_pu_optimized.py`
- **Usage**: `python 00_download_8band_pu_optimized.py [PROJECT] [--options]`
- **Key Options**: `--date`, `--resolution`, `--clear-all`
- **Batch Mode**: `python download_planet_missing_dates.py --start 2025-11-01 --end 2025-12-24 --project angata`
- **Output**: `laravel_app/storage/app/{project}/merged_tif/{YYYY-MM-DD}.tif` (4-band uint16)
- **Cost**: ~1,500-2,000 PU per date (optimized through cloud masking & bbox reduction)

### 4. Stage-Specific Execution

```powershell
# Only KPI + reporting (if earlier stages are done):
Rscript 09_field_analysis_weekly.R
Rscript -e "rmarkdown::render('10_CI_report_with_kpis_simple.Rmd')"

# Crop messaging/alerts (WhatsApp-ready output):
Rscript 06_crop_messaging.R [week] [prev_week] [estate_name]

# Experimental harvest prediction:
Rscript 11_yield_prediction_comparison.R
Rscript 12_temporal_yield_forecasting.R

# SAR (radar) analysis, starting from the repository root:
cd r_app/experiments/sar_dashboard
# `&&` needs PowerShell 7+; this form also works in Windows PowerShell 5.1
python download_s1_simba.py; if ($LASTEXITCODE -eq 0) { Rscript generate_sar_report.R simba }
```

## Project-Specific Conventions

### Configuration & Parameterization
- **Central config**: `r_app/parameters_project.R` sets PROJECT, paths, field boundaries, and `data_dir`
- **Project names**: angata, chemba, xinavane, esa, simba (affects file paths, field boundaries, thresholds)
- **Hard-coded dependencies are problematic** (see SC-50, SC-60): use parameterized paths instead of `"pivot.geojson"` literals
- All scripts source `parameters_project.R` at start to get the global config

### Data Storage & Naming
- **Daily TIFFs**: `merged_tif/{YYYY-MM-DD}.tif` (4 bands: R,G,B,NIR)
- **Tiles**: `daily_tiles_split/{YYYY-MM-DD}/{YYYY-MM-DD}_{TILE_ID}.tif` (25 tiles per day)
- **CI data (RDS)**:
  `combined_CI/combined_CI_data.rds` (cumulative, wide format: fields × dates)
- **Weekly mosaic**: `weekly_mosaic/week_{WW}.tif` (5 bands: R,G,B,NIR,CI)
- **Outputs**: `output/` for reports; Excel/Word files are saved with date/week naming

### Field Uniformity & Alerting
- **Two-dimensional analysis**: Time (week-over-week trends) + Space (within-field homogeneity)
- **Uniformity metric**: Coefficient of Variation (CV = standard deviation / mean of CI within a field): CV < 0.08 (excellent), < 0.15 (good), > 0.25 (poor)
- **Change thresholds**: +0.5 (increase alert), -0.5 (decrease alert); tunable in code
- **Alert categories**: 🚨 URGENT, ⚠️ ALERT, ✅ POSITIVE, 💡 OPPORTUNITY
- **Cloud handling**: When CI = 0 (no data), skip temporal analysis and use a spatial-only assessment

### RDS File Conventions
- **Wide format**: Rows = fields, Columns = dates (CI values)
- **Key files**:
  - `combined_CI_data.rds`: all fields, all dates, cumulative
  - `All_pivots_Cumulative_CI_quadrant_year_v2.rds`: growth model output with quadrant-level analysis
  - `{project}_kpi_summary_tables_week{WW}.rds`: weekly KPI results
- **RDS I/O**: Managed by utility functions (`ci_extraction_utils.R`, `growth_model_utils.R`)

### Word Report Output
- Templates live in `r_app/` (e.g., `10_CI_report_with_kpis_simple.Rmd`)
- Use flextable for split-column tables (wide data rendered as multiple tables)
- Include interpretation guides (thresholds, units: hectares + acres)
- Filenames: `SmartCane_Report_week{WW}_{YYYY}.docx`

## Integration Points & Dependencies

### R ↔ Python
- Python downloads imagery; R expects `merged_tif/` to contain the 4-band TIFFs
- R harvest prediction (`11_*.R`, `12_*.R`) uses ML models from Python (`model_307.pt`, `model_config.json`, scalers)

### R ↔ GeoJSON (Field Boundaries)
- **File**: `laravel_app/storage/app/{project}/pivot.geojson`
- **Used in**: All stages (download, CI extraction, masking, reporting)
- **Critical**: Ensure the geometry is current; it affects all per-field statistics

### R ↔ Harvest Data
- **File**:
  `laravel_app/storage/app/{project}/harvest.xlsx`
- **Used in**: Growth model (Stage 03), field analysis (Stage 05), reporting
- **Format**: Date columns, field identifiers, harvest event flags

### SAR (Separate Ecosystem, experimental)
- Independent Python download + R reporting pipeline
- Data stored in: `r_app/experiments/sar_dashboard/data/{client}/weekly_SAR_mosaic/`
- Outputs: Word reports with VV/VH backscatter, RVI (Radar Vegetation Index), harvest detection
- Field boundaries: `r_app/experiments/pivot.geojson` (project-specific)

## Common Patterns & Gotchas

### Pattern: Utility Functions for Reusable Logic
- `ci_extraction_utils.R`: tile detection, RDS I/O, CI calculation variants
- `growth_model_utils.R`: interpolation, smoothing, gap-filling
- `kpi_utils.R`: threshold-based alerting, uniformity metrics
- **Why**: Keeps main scripts readable and lets individual pieces of logic be tested on their own

### Pattern: Source Config Once

```r
source("parameters_project.R")  # Sets PROJECT, data_dir, field_boundaries_path, etc.
# All downstream code uses these globals
```

### Gotcha: File Path Dependencies
- **Problem**: Hard-coded paths like `file.path("laravel_app/storage/app", PROJECT, "pivot.geojson")`
- **Better**: Pass paths as parameters or configure them in `parameters_project.R`
- **Benefit**: Code is reusable across projects and easier to test

### Gotcha: Cloud Masking in CI Extraction
- If `CI == 0` globally for a date, the entire date is flagged as cloudy
- The growth model drops the entire date rather than interpolating; this affects trend analysis

## AI Agent Behavior & Approach

### Terminal Commands
- **R execution on Windows**: Use the PowerShell call operator `&` with the full path to `Rscript.exe`
  - Syntax: `& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" script.R`
  - Example: `& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" 01_create_master_grid_and_split_tiffs.R`
  - The path contains spaces, so it must be quoted; PowerShell treats a bare quoted string as a value, and `&` is what runs it as a command
  - Verify that the R path (including the version folder) matches the local Windows installation before running

### Testing & Temporary Code
- **Test file naming**:
  Always include `test` in the filename to mark it for later removal (e.g., `test_ci_extraction.R`, `test_download.py`)
- **Test locations**:
  - R tests: `r_app/test/test_*.R`
  - Python tests: `python_app/test/test_*.py`
- **Default test project**: Use `angata` for development/testing unless specified otherwise
- **Cleanup**: Test files are temporary; learnings are copied into production scripts, then the test files are deleted
- **Do NOT commit** test files to the main branch

### Output & Documentation
- **NO auto-generated summary files**: Do NOT create README.md, SUMMARY.txt, or any markdown/text summaries unless explicitly requested
- **Chat-based summaries only**: Briefly summarize results and findings directly in the chat conversation
- This keeps the repo clean and avoids clutter; the user will ask for docs if needed

### Critical Thinking & Partnership
- **Ask clarifying questions** before implementing:
  - "Why do you want to modify this stage? What problem are we solving?"
  - "Have you considered [alternative approach]? It might be simpler/faster/cleaner"
  - "Is this for a specific project (angata/esa/chemba)? That affects which configs to change"
- **Suggest alternatives**:
  - If a request seems inefficient, propose better options
  - Point out if changes might affect other stages or projects
  - Highlight potential issues (e.g., "This change requires updating parameters_project.R for each project")
- **Challenge assumptions**:
  - "Do we need to modify Stage 02, or would a configuration change in parameters_project.R work?"
  - "Is this a one-time fix or a pattern we should refactor?"
  - "Will this break existing reports or KPI calculations?"
- **Be a thinking partner, not an order-taker**: Help the user make better decisions instead of just executing requests

## File Structure (Key Locations)

```
r_app/
├── 01_create_master_grid_and_split_tiffs.R
├── 02_ci_extraction.R
├── 03_interpolate_growth_model.R
├── 04_mosaic_creation.R
├── 06_crop_messaging.R
├── 09_field_analysis_weekly.R
├── 10_CI_report_with_kpis_simple.Rmd
├── 11_yield_prediction_comparison.R
├── 12_temporal_yield_forecasting.R
├── *_utils.R (ci_extraction, growth_model, kpi, report, crop_messaging)
├── parameters_project.R
├── package_manager.R
├── renv.lock
├── system_architecture/system_architecture.md (full architecture doc)
└── experiments/sar_dashboard/
    ├── download_s1_*.py
    ├── generate_sar_report.R
    └── sar_dashboard_utils.R

python_app/
├── 00_download_8band_pu_optimized.py
├── download_planet_missing_dates.py
├── 01_harvest_baseline_prediction.py
├── 02_harvest_imminent_weekly.py
├── model_307.pt
└── requirements_*.txt

laravel_app/
├── storage/app/{project}/
│   ├── pivot.geojson
│   ├── harvest.xlsx
│   ├── merged_tif/
│   ├── daily_tiles_split/
│   ├── combined_CI/
│   └── weekly_mosaic/
└── ... (standard Laravel)

output/
└── (all generated reports, Excel, Word, HTML)
```

## Linear Issue Integration

### Creating Issues
To create a Linear issue, simply ask in the chat:

```
"Create a Linear issue: Fix hard-coded paths in Stage 02 CI extraction"
```

Provide context such as:
- **Title**: Clear and action-oriented (what needs doing)
- **Description**: Problem statement, impact, acceptance criteria
- **Project**: Which project (Inception Phase Angata, General backlog, etc.)
- **Priority**: High, Medium, Low
- **Assignee**: Who should work on it (e.g., yourself, Dimitra)
- **Related issues**: Reference other issues it blocks or relates to

Example:

```
Create Linear issue:
Title: Only create tiles that overlap with GeoJSON boundaries
Description: Master grid creates all 25 tiles even when empty. Filter by field geometry to save storage.
Project: Inception Phase Angata
Priority: Medium
Related: SC-50 (parameterization work)
```

### Referencing Issues
In chat, reference issues by their ID to include the full context:

```
"Work on SC-59 - update the system architecture documentation"
```

This pulls the issue details into the conversation so I understand the full scope.

### Workflow
1. **Create issue** → Clear task definition
2. **Reference issue in chat** → I fetch details automatically
3. **Ask for implementation** → I work toward the issue's acceptance criteria
4. **Close issue** → Mark as done in Linear, summarize in chat

## Debugging & Troubleshooting

### Data Validation Checkpoints
After each major stage, verify:
- **Post-Download**: `merged_tif/` contains the expected date range; file sizes are ~150-300 MB each
- **Post-CI Extraction**: `combined_CI_data.rds` dimensions match (# fields × # dates); no all-NA columns
- **Post-Growth Model**: Interpolated values are within the expected CI range; no unexpected gaps
- **Pre-Reporting**: The weekly mosaic TIF has 5 bands; the field analysis RDS has the KPI columns present

## Next Steps for AI Agents
1. **Understanding a Script**: Check `parameters_project.R` first for config, then trace the utility functions
2. **Adding Features**: Determine which stage (01-06 or experimental) the feature belongs to; follow the existing pattern in that stage
3. **Testing**: Use standalone test data in `r_app/experiments/` or small date ranges via the `--start`/`--end` flags of `download_planet_missing_dates.py`
4. **Documentation**: Update `r_app/system_architecture/system_architecture.md` when the architecture changes
5. **Refactoring**: Avoid hard-coded paths; parameterize and test across multiple projects (angata, esa, etc.)

---

_For detailed system architecture, see `r_app/system_architecture/system_architecture.md`._
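The uniformity bands and week-over-week change thresholds under "Field Uniformity & Alerting" can be sketched in Python as below. This is a minimal illustration, not the logic in `kpi_utils.R`: the function names are hypothetical, CV uses the population standard deviation, the ±0.5 thresholds are assumed to be inclusive and to apply to a field's weekly mean CI, and the 0.15 to 0.25 CV range is returned as `"unlabeled"` because this guide gives it no name.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence

# Thresholds taken from the "Field Uniformity & Alerting" conventions.
CV_EXCELLENT = 0.08
CV_GOOD = 0.15
CV_POOR = 0.25
# Assumption: applies to a field's weekly mean CI, and the comparison is inclusive.
CHANGE_THRESHOLD = 0.5


def coefficient_of_variation(pixel_ci: Sequence[float]) -> float:
    """CV = population standard deviation / mean of a field's CI pixel values."""
    m = mean(pixel_ci)
    if m == 0:
        raise ValueError("mean CI is 0 (no data); CV is undefined")
    return pstdev(pixel_ci) / m


def uniformity_class(cv: float) -> str:
    """Map a CV value onto the bands named in this guide."""
    if cv < CV_EXCELLENT:
        return "excellent"
    if cv < CV_GOOD:
        return "good"
    if cv > CV_POOR:
        return "poor"
    return "unlabeled"  # 0.15 <= CV <= 0.25 has no label in the guide


def change_alert(ci_this_week: float, ci_last_week: float) -> Optional[str]:
    """Week-over-week alert; None when no threshold is crossed or data is missing."""
    if ci_this_week == 0 or ci_last_week == 0:
        return None  # CI == 0 means no data (cloud): skip temporal analysis
    delta = ci_this_week - ci_last_week
    if delta >= CHANGE_THRESHOLD:
        return "increase"
    if delta <= -CHANGE_THRESHOLD:
        return "decrease"
    return None
```

For example, a field whose pixels read `[1.0, 3.0]` has a CV of 0.5 and falls in the "poor" band. Keeping the thresholds as module-level constants mirrors the note that they are tunable in code.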
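The "Data Storage & Naming" conventions and the post-download checkpoint can be expressed as small path helpers. The Python sketch below builds the documented paths and lists dates in a range that have no `merged_tif` file. It is a rough sketch and does not reflect how `download_planet_missing_dates.py` works internally. The helper names and the `root` parameter are hypothetical, `{WW}` is assumed to be a zero-padded two-digit week number (the week-numbering scheme is not stated, so the week is passed in rather than derived from a date), and `TILE_ID` is treated as an opaque string.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import List

# Storage root documented above; relative to the repository root.
STORAGE_ROOT = Path("laravel_app/storage/app")


def merged_tif_path(project: str, day: date, root: Path = STORAGE_ROOT) -> Path:
    """merged_tif/{YYYY-MM-DD}.tif"""
    return root / project / "merged_tif" / f"{day.isoformat()}.tif"


def tile_path(project: str, day: date, tile_id: str, root: Path = STORAGE_ROOT) -> Path:
    """daily_tiles_split/{YYYY-MM-DD}/{YYYY-MM-DD}_{TILE_ID}.tif (TILE_ID format assumed opaque)."""
    d = day.isoformat()
    return root / project / "daily_tiles_split" / d / f"{d}_{tile_id}.tif"


def weekly_mosaic_path(project: str, week: int, root: Path = STORAGE_ROOT) -> Path:
    """weekly_mosaic/week_{WW}.tif, assuming WW is zero-padded to two digits."""
    return root / project / "weekly_mosaic" / f"week_{week:02d}.tif"


def missing_dates(project: str, start: date, end: date, root: Path = STORAGE_ROOT) -> List[date]:
    """Dates in [start, end] (inclusive) with no merged_tif file on disk."""
    missing = []
    day = start
    while day <= end:
        if not merged_tif_path(project, day, root).exists():
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

Passing `root` explicitly keeps the helpers testable against a scratch directory, in line with the guide's advice to parameterize paths rather than hard-code them.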
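The cloud-masking gotcha (a date where CI is 0 for every field is treated as cloudy and dropped) and the post-CI-extraction check for all-NA columns both operate on the wide fields × dates table. The Python sketch below uses a hypothetical in-memory stand-in for `combined_CI_data.rds`, a dict of `{field: {date: CI or None}}` with `None` playing the role of `NA`. It is an illustration under those assumptions, not the code in `ci_extraction_utils.R` or `growth_model_utils.R`.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the wide CI table: one row per field,
# one column per date, with None standing in for R's NA.
CiTable = Dict[str, Dict[str, Optional[float]]]


def date_columns(table: CiTable) -> List[str]:
    """All date columns present in any row, sorted (ISO dates sort chronologically)."""
    cols = set()
    for row in table.values():
        cols.update(row)
    return sorted(cols)


def cloudy_dates(table: CiTable) -> List[str]:
    """Dates where CI == 0 for every field, so the whole date counts as cloudy."""
    return [d for d in date_columns(table)
            if all(row.get(d) == 0 for row in table.values())]


def all_na_dates(table: CiTable) -> List[str]:
    """Post-CI-extraction check: date columns with no values at all."""
    return [d for d in date_columns(table)
            if all(row.get(d) is None for row in table.values())]


def drop_dates(table: CiTable, dates: List[str]) -> CiTable:
    """Remove the given date columns from every field row."""
    drop = set(dates)
    return {field: {d: v for d, v in row.items() if d not in drop}
            for field, row in table.items()}
```

A date is only flagged when every field reads 0, which matches the "globally for a date" wording; a single zero field on an otherwise valid date is left for the per-field cloud handling described under "Field Uniformity & Alerting".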