SmartCane Copilot Instructions for AI Coding Agents
Architecture Overview
SmartCane is a multi-stage agricultural intelligence platform that processes satellite imagery into crop health analysis and harvest predictions. The system spans Python, R, and PHP, with a file-based processing architecture.
Core Data Pipeline
[Python] Download 4-band imagery
→ [R Stage 01] Create tile grid & split into 25 tiles/day
→ [R Stage 02] Extract CI (Canopy Index) per field
→ [R Stage 03] Growth model interpolation (smooth time series)
→ [R Stage 04] Create weekly 5-band mosaics
→ [R Stage 05] Field-level KPI calculation & alerting
→ [R Stage 06] Generate Word/HTML reports
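Stage 01's grid-and-split step can be pictured as dividing the image extent into a 5 × 5 grid (25 tiles). It can then keep only the tiles that intersect a field, which is the improvement proposed in the Linear issue example further down.

The Python sketch below is illustrative only. The real stage is written in R. The uniform 5 × 5 layout, the `(xmin, ymin, xmax, ymax)` box tuples, the `01`..`25` tile IDs and the `make_grid`/`overlapping_tiles` names are all assumptions. A bounding-box test is also only a coarse filter; exact polygon intersection would need a geometry library.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def make_grid(extent: BBox, n: int = 5) -> Dict[str, BBox]:
    """Split an extent into an n x n grid of equal tiles (n=5 gives 25 tiles)."""
    xmin, ymin, xmax, ymax = extent
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    tiles: Dict[str, BBox] = {}
    for row in range(n):
        for col in range(n):
            tile_id = f"{row * n + col + 1:02d}"  # hypothetical ID scheme: 01..25
            tiles[tile_id] = (
                xmin + col * dx,
                ymin + row * dy,
                xmin + (col + 1) * dx,
                ymin + (row + 1) * dy,
            )
    return tiles


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """True if two boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def overlapping_tiles(tiles: Dict[str, BBox], fields: List[BBox]) -> List[str]:
    """Keep only tile IDs whose box overlaps at least one field bounding box."""
    return [tid for tid, box in tiles.items() if any(bboxes_overlap(box, f) for f in fields)]
```

For example, on a 100 × 100 extent, a single small field in the lower-left corner, `(5, 5, 15, 15)`, overlaps only tile `01`. The other 24 tiles would not need to be written.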
Key Data Flow:
- Raw 4-band GeoTIFFs (RGB+NIR, uint16) downloaded from the Planet API via Python
- Stored in: `laravel_app/storage/app/{project}/merged_tif/`
- Field boundaries (`pivot.geojson`) used across all stages for masking/analysis
- Outputs: RDS (intermediate data), TIF (rasters), Excel/Word (reports)
Main Components
| Component | Purpose | Key Files |
|---|---|---|
| `r_app/` | Core 6-stage R pipeline | `01-10_*.R`, `*_utils.R`, `parameters_project.R` |
| `python_app/` | Satellite download & ML harvest prediction | `00_download_8band_pu_optimized.py`, `01-02_harvest_*.py` |
| `r_app/experiments/sar_dashboard/` | SAR (Sentinel-1 radar) analysis | `download_s1_*.py`, `generate_sar_report.R` |
| `laravel_app/` | Web dashboard (optional integration) | Standard Laravel structure |
Critical Developer Workflows
1. R Package Setup (DO FIRST)
Rscript r_app/package_manager.R
# Manages renv.lock; commit renv.lock but NOT the renv/ folder
2. Full Pipeline (Typical Weekly Run)
# 2a. Download satellite data
cd python_app
python 00_download_8band_pu_optimized.py angata # or chemba, xinavane, etc.
# 2b. Run full R pipeline
cd ../r_app
Rscript 01_create_master_grid_and_split_tiffs.R
Rscript 02_ci_extraction.R
Rscript 03_interpolate_growth_model.R
Rscript 04_mosaic_creation.R
Rscript 09_field_analysis_weekly.R
Rscript -e "rmarkdown::render('10_CI_report_with_kpis_simple.Rmd')"
3. Python Data Download
- Script: `00_download_8band_pu_optimized.py`
- Usage: `python 00_download_8band_pu_optimized.py [PROJECT] [--options]`
- Key Options: `--date`, `--resolution`, `--clear-all`
- Batch Mode: `python download_planet_missing_dates.py --start 2025-11-01 --end 2025-12-24 --project angata`
- Output: `laravel_app/storage/app/{project}/merged_tif/{YYYY-MM-DD}.tif` (4-band uint16)
- Cost: ~1,500-2,000 PU/date (optimized through cloud masking & bbox reduction)
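Batch mode fills gaps in `merged_tif/`. The sketch below shows one way to do two things: find which dates in a range have no `{YYYY-MM-DD}.tif` yet, and turn the ~1,500-2,000 PU/date figure above into a rough budget.

This is not the logic of `download_planet_missing_dates.py`. The `missing_dates` and `estimate_pu` helpers are hypothetical. Treating every calendar day as a candidate date is also an assumption, since real acquisitions may not be daily.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import List, Tuple


def missing_dates(merged_dir: Path, start: date, end: date) -> List[date]:
    """Return dates in [start, end] that have no merged_tif/{YYYY-MM-DD}.tif file."""
    present = {p.stem for p in merged_dir.glob("*.tif")}  # e.g. "2025-11-01"
    n_days = (end - start).days + 1
    candidates = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in candidates if d.isoformat() not in present]


def estimate_pu(n_dates: int, low: int = 1500, high: int = 2000) -> Tuple[int, int]:
    """Rough PU budget range using the ~1,500-2,000 PU/date rule of thumb."""
    return n_dates * low, n_dates * high
```

Suppose two dates in a four-day window are missing. `estimate_pu(2)` then gives a range of 3,000 to 4,000 PU for backfilling them.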
4. Stage-Specific Execution
# Only KPI + reporting (if earlier stages done):
Rscript 09_field_analysis_weekly.R
Rscript -e "rmarkdown::render('10_CI_report_with_kpis_simple.Rmd')"
# Crop messaging/alerts (WhatsApp-ready output):
Rscript 06_crop_messaging.R [week] [prev_week] [estate_name]
# Experimental harvest prediction:
Rscript 11_yield_prediction_comparison.R
Rscript 12_temporal_yield_forecasting.R
# SAR (radar) analysis:
cd r_app/experiments/sar_dashboard
python download_s1_simba.py && Rscript generate_sar_report.R simba
Project-Specific Conventions
Configuration & Parameterization
- Central config: `r_app/parameters_project.R` sets PROJECT, paths, field boundaries, and data_dir
- Project names: angata, chemba, xinavane, esa, simba (these affect file paths, field boundaries, and thresholds)
- Hard-coded dependencies are problematic (see SC-50, SC-60); use parameterized paths instead of `"pivot.geojson"` literals
- All scripts source `parameters_project.R` at startup to get the global config
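The central-config idea can be illustrated with a small Python analogue of what `parameters_project.R` provides. It is one place that validates the project name and derives every path from a configurable storage root, so no script needs a `"pivot.geojson"` literal.

The `ProjectConfig` class and `load_config` function are hypothetical. The directory names follow the conventions in this document. The project set is taken from the names listed above and may not be complete.

```python
from dataclasses import dataclass
from pathlib import Path

# Taken from the project names listed above; extend as new estates are added.
KNOWN_PROJECTS = {"angata", "chemba", "xinavane", "esa", "simba"}


@dataclass(frozen=True)
class ProjectConfig:
    project: str
    data_dir: Path  # e.g. laravel_app/storage/app/{project}

    @property
    def field_boundaries_path(self) -> Path:
        return self.data_dir / "pivot.geojson"

    @property
    def merged_tif_dir(self) -> Path:
        return self.data_dir / "merged_tif"

    @property
    def harvest_path(self) -> Path:
        return self.data_dir / "harvest.xlsx"


def load_config(project: str, storage_root: Path = Path("laravel_app/storage/app")) -> ProjectConfig:
    """Validate the project name once and derive all paths from storage_root."""
    if project not in KNOWN_PROJECTS:
        raise ValueError(f"Unknown project {project!r}; expected one of {sorted(KNOWN_PROJECTS)}")
    return ProjectConfig(project=project, data_dir=storage_root / project)
```

Passing a different `storage_root`, such as a temporary directory, keeps the same code usable for tests and for other projects.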
Data Storage & Naming
- Daily TIFFs: `merged_tif/{YYYY-MM-DD}.tif` (4 bands: R, G, B, NIR)
- Tiles: `daily_tiles_split/{YYYY-MM-DD}/{YYYY-MM-DD}_{TILE_ID}.tif` (25 tiles per day)
- CI data (RDS): `combined_CI/combined_CI_data.rds` (cumulative, wide format: fields × dates)
- Weekly mosaic: `weekly_mosaic/week_{WW}.tif` (5 bands: R, G, B, NIR, CI)
- Outputs: `output/` for reports; Excel/Word files are saved with date/week naming
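The naming patterns above are easy to centralize in helper functions. This sketch builds the relative paths for daily TIFFs, tiles and weekly mosaics.

The helper names are hypothetical. Using the two-digit, zero-padded ISO week number for `{WW}` is an assumption, since this document does not say which week convention the pipeline uses.

```python
from datetime import date
from pathlib import PurePosixPath


def daily_tif(d: date) -> PurePosixPath:
    """merged_tif/{YYYY-MM-DD}.tif"""
    return PurePosixPath("merged_tif") / f"{d.isoformat()}.tif"


def tile_tif(d: date, tile_id: str) -> PurePosixPath:
    """daily_tiles_split/{YYYY-MM-DD}/{YYYY-MM-DD}_{TILE_ID}.tif"""
    day = d.isoformat()
    return PurePosixPath("daily_tiles_split") / day / f"{day}_{tile_id}.tif"


def weekly_mosaic(d: date) -> PurePosixPath:
    """weekly_mosaic/week_{WW}.tif, assuming {WW} is the zero-padded ISO week."""
    week = d.isocalendar()[1]
    return PurePosixPath("weekly_mosaic") / f"week_{week:02d}.tif"
```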
Field Uniformity & Alerting
- Two-dimensional analysis: Time (week-over-week trends) + Space (within-field homogeneity)
- Uniformity metric: Coefficient of Variation (CV); CV < 0.08 is excellent, < 0.15 good, > 0.25 poor
- Change thresholds: +0.5 (increase alert) and -0.5 (decrease alert) on the week-over-week CI change; tunable in code
- Alert categories: 🚨 URGENT, ⚠️ ALERT, ✅ POSITIVE, 💡 OPPORTUNITY
- Cloud handling: When CI = 0 (no data), temporal analysis is skipped and only a spatial assessment is made
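The thresholds above can be expressed as two small classification functions. This is a hedged sketch, not the project's `kpi_utils.R` logic. Several choices in it are assumptions:

- The function names are hypothetical.
- CV uses the population standard deviation.
- The band between 0.15 and 0.25, which the thresholds leave unlabeled, is called `"moderate"` purely as a placeholder.
- Alerts fire at exactly ±0.5.

The CI = 0 rule follows the "no data" convention stated above.

```python
from statistics import mean, pstdev
from typing import List, Optional

INCREASE_ALERT = 0.5   # week-over-week CI change thresholds; tunable
DECREASE_ALERT = -0.5


def uniformity_class(pixel_ci: List[float]) -> str:
    """Classify within-field homogeneity by CV = population sd / mean of valid pixels."""
    valid = [v for v in pixel_ci if v != 0]  # CI == 0 is treated as no data (cloud)
    if not valid:
        return "no data"
    cv = pstdev(valid) / mean(valid)
    if cv < 0.08:
        return "excellent"
    if cv < 0.15:
        return "good"
    if cv > 0.25:
        return "poor"
    return "moderate"  # placeholder label for 0.15 <= CV <= 0.25


def change_alert(ci_now: float, ci_prev: float) -> Optional[str]:
    """Week-over-week change alert; skipped when either week has no data (CI == 0)."""
    if ci_now == 0 or ci_prev == 0:
        return None  # fall back to a spatial-only assessment
    delta = ci_now - ci_prev
    if delta >= INCREASE_ALERT:
        return "increase"
    if delta <= DECREASE_ALERT:
        return "decrease"
    return None
```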
RDS File Conventions
- Wide format: Rows = fields, Columns = dates (CI values)
- Key files:
  - `combined_CI_data.rds`: all fields, all dates, cumulative
  - `All_pivots_Cumulative_CI_quadrant_year_v2.rds`: growth model output with quadrant-level analysis
  - `{project}_kpi_summary_tables_week{WW}.rds`: weekly KPI results
- RDS I/O: Managed by utility functions (`ci_extraction_utils.R`, `growth_model_utils.R`)
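Because the CI tables are wide (one row per field, one column per date), per-date observations have to be pivoted into that layout. Below is a stdlib-only Python sketch of the reshaping, with missing field/date combinations left as `None`.

The `(field, date, ci)` record shape and the `to_wide` name are assumptions. The real RDS I/O is handled by the R utilities named above.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str, float]  # (field_id, "YYYY-MM-DD", ci)


def to_wide(records: Iterable[Record]) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Pivot long (field, date, ci) records into rows = fields, columns = dates."""
    cells: Dict[Tuple[str, str], float] = {}
    fields, dates = set(), set()
    for field, day, ci in records:
        cells[(field, day)] = ci  # a later record for the same cell overwrites an earlier one
        fields.add(field)
        dates.add(day)
    columns = sorted(dates)  # ISO dates sort chronologically as strings
    rows = {f: [cells.get((f, d)) for d in columns] for f in sorted(fields)}
    return columns, rows
```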
Word Report Output
- Templates in `r_app/` (e.g., `10_CI_report_with_kpis_simple.Rmd`)
- Use flextable for split-column tables (wide data rendered as multiple tables)
- Include interpretation guides (thresholds; units in hectares + acres)
- Filenames: `SmartCane_Report_week{WW}_{YYYY}.docx`
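Two of the reporting conventions above are simple enough to sketch:

- Splitting a wide table into several narrower tables that each repeat the identifier column. In the R report, flextable handles the rendering.
- Showing areas in both hectares and acres, using 1 ha ≈ 2.47105 acres.

The helper names, the default of six data columns per table and the zero-padded week in the filename are illustrative assumptions.

```python
from typing import List, Sequence

HA_TO_ACRES = 2.47105  # 1 hectare is approximately 2.47105 acres


def split_columns(header: Sequence[str], rows: Sequence[Sequence], max_cols: int = 6) -> List[List[List]]:
    """Split a wide table into chunks of at most max_cols data columns.

    Column 0 (e.g. the field ID) is repeated in every chunk; each chunk is
    returned as [header_row, *data_rows].
    """
    tables = []
    for start in range(1, len(header), max_cols):
        cols = [0] + list(range(start, min(start + max_cols, len(header))))
        tables.append([[header[c] for c in cols]] + [[row[c] for c in cols] for row in rows])
    return tables


def area_label(hectares: float) -> str:
    """Format an area as 'X ha (Y acres)'."""
    return f"{hectares:.1f} ha ({hectares * HA_TO_ACRES:.1f} acres)"


def report_filename(week: int, year: int) -> str:
    """SmartCane_Report_week{WW}_{YYYY}.docx, assuming a zero-padded week."""
    return f"SmartCane_Report_week{week:02d}_{year}.docx"
```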
Integration Points & Dependencies
R ↔ Python
- Python downloads → R expects `merged_tif/` to be filled with 4-band TIFFs
- R harvest prediction (`11_*.R`, `12_*.R`) uses ML models from Python (`model_307.pt`, `model_config.json`, scalers)
R ↔ GeoJSON (Field Boundaries)
- File: `laravel_app/storage/app/{project}/pivot.geojson`
- Used in: All stages (download, CI extraction, masking, reporting)
- Critical: Ensure the geometry is current, since it affects all per-field statistics
R ↔ Harvest Data
- File: `laravel_app/storage/app/{project}/harvest.xlsx`
- Used in: Growth model (Stage 03), field analysis (Stage 05), reporting
- Format: Date columns, field identifiers, harvest event flags
SAR (Separate Ecosystem, experimental)
- Independent Python download + R reporting pipeline
- Data stored in: `r_app/experiments/sar_dashboard/data/{client}/weekly_SAR_mosaic/`
- Outputs: Word reports with VV/VH backscatter, RVI index, harvest detection
- Field boundaries: `r_app/experiments/pivot.geojson` (project-specific)
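A commonly used dual-polarization form of the radar vegetation index for Sentinel-1 is RVI = 4·VH / (VV + VH), computed on backscatter in linear power units rather than decibels. This document does not say whether `generate_sar_report.R` uses exactly this form, so treat the following as an illustrative sketch. The function names are hypothetical.

```python
def db_to_linear(db: float) -> float:
    """Convert backscatter from decibels to linear power: 10^(dB/10)."""
    return 10 ** (db / 10)


def dual_pol_rvi(vv_db: float, vh_db: float) -> float:
    """Dual-pol radar vegetation index, RVI = 4*VH / (VV + VH), in linear units."""
    vv = db_to_linear(vv_db)
    vh = db_to_linear(vh_db)
    return 4 * vh / (vv + vh)
```

Converting to linear units first matters: applying the ratio directly to dB values, which are usually negative, gives meaningless results.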
Common Patterns & Gotchas
Pattern: Utility Functions for Reusable Logic
- `ci_extraction_utils.R`: tile detection, RDS I/O, CI calculation variants
- `growth_model_utils.R`: interpolation, smoothing, gap-filling
- `kpi_utils.R`: threshold-based alerting, uniformity metrics
- Why: Keeps main scripts readable and enables testing individual pieces of logic
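As an example of the kind of logic that belongs in a utility file, here is a minimal gap-filling routine in the spirit of `growth_model_utils.R`. It linearly interpolates missing CI values between observed neighbours and leaves leading and trailing gaps unfilled.

Linear interpolation over evenly spaced observations is a stand-in chosen for the sketch. The actual growth model may use a different interpolation or smoothing method, and irregular dates would need date-based weights. `fill_gaps` is a hypothetical name.

```python
from typing import List, Optional


def fill_gaps(series: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate interior None values in an evenly spaced series.

    Leading and trailing gaps are left as None (no extrapolation).
    """
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            out[i] = series[left] + frac * (series[right] - series[left])
    return out
```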
Pattern: Source Config Once
source("parameters_project.R") # Sets PROJECT, data_dir, field_boundaries_path, etc.
# All downstream code uses these globals
Gotcha: File Path Dependencies
- Problem: Hard-coded paths like `file.path("laravel_app/storage/app", PROJECT, "pivot.geojson")`
- Better: Pass paths as parameters or configure them in `parameters_project.R`
- Benefit: Code is reusable across projects and easier to test
Gotcha: Cloud Masking in CI Extraction
- If `CI == 0` globally for a date, the entire date is flagged as cloudy
- The growth model drops the entire date rather than interpolating it, which affects trend analysis
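The date-level rule above amounts to dropping every date column in which all fields have CI == 0. Below is a stdlib sketch that works on the wide layout (rows = fields, columns = dates).

`drop_cloudy_dates` is a hypothetical helper. Not counting `None` as cloudy is an assumption. A field that is 0 on a date when other fields have data is kept, consistent with per-field cloud handling.

```python
from typing import Dict, List, Optional, Tuple

Wide = Dict[str, List[Optional[float]]]  # field -> CI values aligned with `dates`


def drop_cloudy_dates(dates: List[str], rows: Wide) -> Tuple[List[str], Wide, List[str]]:
    """Remove dates where CI == 0 for every field; return kept dates, rows, dropped dates."""
    cloudy = {
        j for j in range(len(dates))
        if rows and all(values[j] == 0 for values in rows.values())
    }
    kept = [d for j, d in enumerate(dates) if j not in cloudy]
    new_rows = {f: [v for j, v in enumerate(vals) if j not in cloudy] for f, vals in rows.items()}
    dropped = [dates[j] for j in sorted(cloudy)]
    return kept, new_rows, dropped
```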
AI Agent Behavior & Approach
Terminal Commands
- R execution on Windows: Use the PowerShell `&` (call) operator with the full R path
  - Syntax: `& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" script.R`
  - Example: `& "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" 01_create_master_grid_and_split_tiffs.R`
  - The `&` operator is required to run a command whose path contains spaces; a quoted path on its own is treated as a plain string
- Verify that the R path matches the Windows installation before running
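If the R version in the path above does not match the installed one, the path can be located programmatically. The Python sketch below assumes the default layout `C:\Program Files\R\R-<version>\bin\x64\Rscript.exe`.

`find_rscript` is a hypothetical helper that picks the highest installed version, comparing versions numerically so that 4.10 ranks above 4.4. The root directory is a parameter so the function can be tried outside Windows.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

VERSION_DIR = re.compile(r"^R-(\d+)\.(\d+)\.(\d+)$")


def find_rscript(r_root: Path = Path(r"C:\Program Files\R")) -> Optional[Path]:
    """Return Rscript.exe from the newest R-x.y.z folder under r_root, or None."""
    if not r_root.is_dir():
        return None
    candidates: List[Tuple[Tuple[int, ...], Path]] = []
    for child in r_root.iterdir():
        match = VERSION_DIR.match(child.name)
        exe = child / "bin" / "x64" / "Rscript.exe"
        if match and exe.is_file():
            version = tuple(int(part) for part in match.groups())
            candidates.append((version, exe))
    return max(candidates)[1] if candidates else None
```

You can then pass the returned path to the `&` operator as shown in the syntax line above.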
Testing & Temporary Code
- Test file naming: Always include `test` in the filename to signal that the file is temporary (e.g., `test_ci_extraction.R`, `test_download.py`)
- Test locations:
  - R tests: `r_app/test/test_*.R`
  - Python tests: `python_app/test/test_*.py`
- Default test project: Use `angata` for development/testing unless specified otherwise
- Cleanup: Test files are temporary; learnings are copied to production scripts, then the test files are deleted
- Do NOT commit test files to the main branch
Output & Documentation
- NO auto-generated summary files: Do NOT create README.md, SUMMARY.txt, or any markdown/text summaries unless explicitly requested
- Chat-based summaries only: Briefly summarize results and findings directly in the chat conversation
- This keeps the repo clean and avoids clutter; the user will ask for docs if needed
Critical Thinking & Partnership
- Ask clarifying questions before implementing:
- "Why do you want to modify this stage? What problem are we solving?"
- "Have you considered [alternative approach]? It might be simpler/faster/cleaner"
- "Is this for a specific project (angata/esa/chemba)? That affects which configs to change"
- Suggest alternatives:
- If a request seems inefficient, propose better options
- Point out if changes might affect other stages or projects
- Highlight potential issues (e.g., "This change requires updating parameters_project.R for each project")
- Challenge assumptions:
- "Do we need to modify Stage 02, or would a configuration change in parameters_project.R work?"
- "Is this a one-time fix or a pattern we should refactor?"
- "Will this break existing reports or KPI calculations?"
- Be a thinking partner, not an order-taker: Help the user make better decisions rather than just executing requests
File Structure (Key Locations)
r_app/
├── 01_create_master_grid_and_split_tiffs.R
├── 02_ci_extraction.R
├── 03_interpolate_growth_model.R
├── 04_mosaic_creation.R
├── 06_crop_messaging.R
├── 09_field_analysis_weekly.R
├── 10_CI_report_with_kpis_simple.Rmd
├── 11_yield_prediction_comparison.R
├── 12_temporal_yield_forecasting.R
├── *_utils.R (ci_extraction, growth_model, kpi, report, crop_messaging)
├── parameters_project.R
├── package_manager.R
├── renv.lock
├── system_architecture/system_architecture.md (full architecture doc)
└── experiments/sar_dashboard/
    ├── download_s1_*.py
    ├── generate_sar_report.R
    └── sar_dashboard_utils.R
python_app/
├── 00_download_8band_pu_optimized.py
├── download_planet_missing_dates.py
├── 01_harvest_baseline_prediction.py
├── 02_harvest_imminent_weekly.py
├── model_307.pt
└── requirements_*.txt
laravel_app/
├── storage/app/{project}/
│ ├── merged_tif/
│ ├── daily_tiles_split/
│ ├── combined_CI/
│ └── weekly_mosaic/
└── ... (standard Laravel)
output/
└── (all generated reports, Excel, Word, HTML)
Linear Issue Integration
Creating Issues
To create a Linear issue, simply ask in the chat:
"Create a Linear issue: Fix hard-coded paths in Stage 02 CI extraction"
Provide context like:
- Title: Clear, action-oriented (what needs doing)
- Description: Problem statement, impact, acceptance criteria
- Project: Which project (Inception Phase Angata, General backlog, etc.)
- Priority: High, Medium, Low
- Assignee: Who should work on it (e.g., yourself, Dimitra)
- Related issues: Reference other issues if it blocks/relates to them
Example:
Create Linear issue:
Title: Only create tiles that overlap with GeoJSON boundaries
Description: Master grid creates all 25 tiles even when empty. Filter by field geometry to save storage.
Project: Inception Phase Angata
Priority: Medium
Related: SC-50 (parameterization work)
Referencing Issues
In chat, reference issues by their ID to include full context:
"Work on SC-59 - update the system architecture documentation"
This pulls the issue details into the conversation so I understand the full scope.
Workflow
- Create issue → Clear task definition
- Reference issue in chat → I fetch details automatically
- Ask for implementation → I work toward issue's acceptance criteria
- Close issue → Mark as done in Linear, summarize in chat
Debugging & Troubleshooting
Data Validation Checkpoints
After each major stage, verify:
- Post-Download: merged_tif/ contains expected date ranges; file sizes ~150-300MB each
- Post-CI Extraction: combined_CI_data.rds dimensions match (# fields × # dates); no all-NA columns
- Post-Growth Model: Interpolated values are within expected CI range; no unexpected gaps
- Pre-Reporting: Weekly mosaic TIF has 5 bands; field analysis RDS has KPI columns present
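Some of these checkpoints are mechanical enough to script. This sketch covers two of them: the post-download file-size check and the post-CI-extraction shape check on the wide table.

The helper names are hypothetical. Taking 1 MB as 2^20 bytes is an assumption. The 150-300 MB range is the rough guide from the list above, not a hard limit, so treat any file it flags as a prompt for a closer look.

```python
from pathlib import Path
from typing import Dict, List, Optional

MB = 1024 * 1024  # assumption: 1 MB taken as 2**20 bytes


def odd_sized_tifs(merged_dir: Path, low_mb: float = 150, high_mb: float = 300) -> List[str]:
    """List merged_tif files whose size falls outside the expected range."""
    return [
        p.name for p in sorted(merged_dir.glob("*.tif"))
        if not low_mb * MB <= p.stat().st_size <= high_mb * MB
    ]


def check_wide_ci(dates: List[str], rows: Dict[str, List[Optional[float]]], n_fields: int) -> List[str]:
    """Return problems: wrong field count, ragged rows, or all-NA date columns."""
    problems = []
    if len(rows) != n_fields:
        problems.append(f"expected {n_fields} fields, found {len(rows)}")
    for field, values in rows.items():
        if len(values) != len(dates):
            problems.append(f"field {field}: {len(values)} values for {len(dates)} dates")
    for j, day in enumerate(dates):
        if all(j >= len(v) or v[j] is None for v in rows.values()):
            problems.append(f"date {day}: all values missing")
    return problems
```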
Next Steps for AI Agents
- Understanding a Script: Check `parameters_project.R` first for config, then trace the utility functions
- Adding Features: Determine which stage (01-06 or experimental) the feature belongs to; follow the existing pattern in that stage
- Testing: Use standalone test data in `r_app/experiments/` or small date ranges with the `--start`/`--end` flags
- Documentation: Update `r_app/system_architecture/system_architecture.md` when the architecture changes
- Refactoring: Avoid hard-coded paths; parameterize and test across multiple projects (angata, esa, etc.)
For detailed system architecture, see r_app/system_architecture/system_architecture.md. For related Linear issues (code quality, architecture docs), see SC-59, SC-60, SC-61.