Timon fc7e5f1ee0 Enhance download progress feedback and optimize tile overlap checks

- Added a progress bar to the tile download process for better user feedback.
- Simplified the tile overlap checking logic in the R script to improve performance and readability.

2026-01-13 11:30:38 +01:00

14 KiB


SmartCane Copilot Instructions for AI Coding Agents

Architecture Overview

SmartCane is a multi-stage agricultural intelligence platform that processes satellite imagery into crop health analysis and harvest predictions. The system spans Python, R, and PHP, with a file-based processing architecture.

Core Data Pipeline

[Python] Download 4-band imagery
    → [R Stage 01] Create tile grid & split into 25 tiles/day
    → [R Stage 02] Extract CI (Canopy Index) per field
    → [R Stage 03] Growth model interpolation (smooth time series)
    → [R Stage 04] Create weekly 5-band mosaics
    → [R Stage 05] Field-level KPI calculation & alerting
    → [R Stage 06] Generate Word/HTML reports

Key Data Flow:

Raw 4-band GeoTIFFs (RGB+NIR, uint16) downloaded from Planet API via Python
Stored in: laravel_app/storage/app/{project}/merged_tif/
Field boundaries (pivot.geojson) used across all stages for masking/analysis
Outputs: RDS (intermediate data), TIF (rasters), Excel/Word (reports)
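
Stage 01 splits each day's imagery into 25 tiles, and the latest commit mentions checking which tiles overlap the field boundaries. The following is a minimal sketch of that idea, not the R implementation: it assumes the 25 tiles form a regular 5 x 5 grid and approximates each field by its bounding box. The names `make_grid`, `overlaps`, and `tiles_with_fields` are hypothetical.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def make_grid(extent: BBox, rows: int = 5, cols: int = 5) -> List[BBox]:
    """Split an extent into rows x cols equal tiles (5 x 5 = 25 is an assumption)."""
    xmin, ymin, xmax, ymax = extent
    dx = (xmax - xmin) / cols
    dy = (ymax - ymin) / rows
    return [
        (xmin + c * dx, ymin + r * dy, xmin + (c + 1) * dx, ymin + (r + 1) * dy)
        for r in range(rows)
        for c in range(cols)
    ]


def overlaps(a: BBox, b: BBox) -> bool:
    """True when two bounding boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def tiles_with_fields(extent: BBox, field_boxes: List[BBox],
                      rows: int = 5, cols: int = 5) -> List[Tuple[int, BBox]]:
    """Return (tile_index, tile_bbox) for tiles overlapping at least one field bbox."""
    return [
        (i, tile)
        for i, tile in enumerate(make_grid(extent, rows, cols))
        if any(overlaps(tile, fb) for fb in field_boxes)
    ]
```

A bounding-box test can keep a tile that the true field polygon does not actually touch, so a real implementation would follow it with a geometry-level intersection.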

Main Components

| Component | Purpose | Key Files |
|---|---|---|
| `r_app/` | Core 6-stage R pipeline | `01-10_*.R`, `*_utils.R`, `parameters_project.R` |
| `python_app/` | Satellite download & ML harvest prediction | `00_download_8band_pu_optimized.py`, `01-02_harvest_*.py` |
| `r_app/experiments/sar_dashboard/` | SAR (Sentinel-1 radar) analysis | `download_s1_*.py`, `generate_sar_report.R` |
| `laravel_app/` | Web dashboard (optional integration) | Standard Laravel structure |

Critical Developer Workflows

1. R Package Setup (DO FIRST)

Rscript r_app/package_manager.R
# Manages renv.lock; commit renv.lock but NOT the renv/ folder

2. Full Pipeline (Typical Weekly Run)

# 2a. Download satellite data
cd python_app
python 00_download_8band_pu_optimized.py angata  # or chemba, xinavane, etc.

# 2b. Run full R pipeline
cd ../r_app
Rscript 01_create_master_grid_and_split_tiffs.R
Rscript 02_ci_extraction.R
Rscript 03_interpolate_growth_model.R
Rscript 04_mosaic_creation.R
Rscript 09_field_analysis_weekly.R
Rscript -e "rmarkdown::render('10_CI_report_with_kpis_simple.Rmd')"  # .Rmd files are rendered, not run directly by Rscript
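
The weekly R run can also be driven from one script that stops at the first failing stage. This is a sketch under assumptions, not an existing SmartCane tool: `stage_command` and `run_pipeline` are hypothetical names, the stage list is copied from the commands above, and the `.Rmd` report is rendered through `rmarkdown::render` because `Rscript` cannot execute an R Markdown file directly.

```python
import subprocess
from typing import List, Optional, Sequence

STAGES = [
    "01_create_master_grid_and_split_tiffs.R",
    "02_ci_extraction.R",
    "03_interpolate_growth_model.R",
    "04_mosaic_creation.R",
    "09_field_analysis_weekly.R",
    "10_CI_report_with_kpis_simple.Rmd",
]


def stage_command(script: str, rscript: str = "Rscript") -> List[str]:
    """Build the argument list for one stage; .Rmd files go through rmarkdown::render."""
    if script.lower().endswith(".rmd"):
        return [rscript, "-e", f"rmarkdown::render('{script}')"]
    return [rscript, script]


def run_pipeline(stages: Sequence[str] = STAGES, rscript: str = "Rscript",
                 cwd: str = "r_app", runner=subprocess.run) -> Optional[str]:
    """Run stages in order; return the first failing script name, or None if all pass."""
    for script in stages:
        result = runner(stage_command(script, rscript), cwd=cwd)
        if result.returncode != 0:
            return script
    return None


if __name__ == "__main__":
    failed = run_pipeline()
    print("pipeline ok" if failed is None else f"stopped at {failed}")
```

Passing the command as a list means a full executable path with spaces (for example under `C:\Program Files`) can be supplied as `rscript` without extra quoting.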

3. Python Data Download

Script: 00_download_8band_pu_optimized.py
Usage: python 00_download_8band_pu_optimized.py [PROJECT] [--options]
Key Options: --date, --resolution, --clear-all
Batch Mode: python download_planet_missing_dates.py --start 2025-11-01 --end 2025-12-24 --project angata
Output: laravel_app/storage/app/{project}/merged_tif/{YYYY-MM-DD}.tif (4-band uint16)
Cost: ~1,500-2,000 PU/date (optimized for cloud masking & bbox reduction)
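
Before spending processing units on a batch download, it can help to list which dates are actually absent from `merged_tif/`. The sketch below relies only on the `{YYYY-MM-DD}.tif` naming stated above; `stems_in` and `missing_dates` are hypothetical helpers and do not describe how `download_planet_missing_dates.py` works internally.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Iterable, List, Set


def stems_in(merged_tif_dir: Path) -> Set[str]:
    """File names without extension for every .tif in the directory."""
    return {p.stem for p in merged_tif_dir.glob("*.tif")}


def missing_dates(present_stems: Iterable[str], start: date, end: date) -> List[date]:
    """Dates in [start, end] (inclusive) that have no {YYYY-MM-DD}.tif yet."""
    present = set(present_stems)
    n_days = (end - start).days + 1
    candidates = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in candidates if d.isoformat() not in present]


if __name__ == "__main__":
    folder = Path("laravel_app/storage/app/angata/merged_tif")
    for d in missing_dates(stems_in(folder), date(2025, 11, 1), date(2025, 12, 24)):
        print(d.isoformat())
```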

4. Stage-Specific Execution

# Only KPI + reporting (if earlier stages done):
Rscript 09_field_analysis_weekly.R
Rscript -e "rmarkdown::render('10_CI_report_with_kpis_simple.Rmd')"  # .Rmd files are rendered, not run directly by Rscript

# Crop messaging/alerts (WhatsApp-ready output):
Rscript 06_crop_messaging.R [week] [prev_week] [estate_name]

# Experimental harvest prediction:
Rscript 11_yield_prediction_comparison.R
Rscript 12_temporal_yield_forecasting.R

# SAR (radar) analysis:
cd r_app/experiments/sar_dashboard
python download_s1_simba.py && Rscript generate_sar_report.R simba

Project-Specific Conventions

Configuration & Parameterization

Central config: r_app/parameters_project.R — sets PROJECT, paths, field boundaries, data_dir
Project names: angata, chemba, xinavane, esa, simba (affects file paths, field boundaries, thresholds)
Hard-coded dependencies are problematic (see SC-50, SC-60) — use parameterized paths instead of "pivot.geojson" literals
All scripts source parameters_project.R at start to get global config

Data Storage & Naming

Daily TIFFs: merged_tif/{YYYY-MM-DD}.tif (4 bands: R,G,B,NIR)
Tiles: daily_tiles_split/{YYYY-MM-DD}/{YYYY-MM-DD}_{TILE_ID}.tif (25 tiles per day)
CI data (RDS): combined_CI/combined_CI_data.rds (cumulative, wide format: fields × dates)
Weekly mosaic: weekly_mosaic/week_{WW}.tif (5 bands: R,G,B,NIR,CI)
Outputs: output/ for reports; Excel/Word saved with date/week naming
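
The naming patterns above can be checked mechanically, which is useful when a stage silently skips files it cannot match. This is a small sketch: the regular expressions assume `TILE_ID` is a run of letters, digits, or underscores and that `WW` is a week number from 1 to 53, neither of which is specified here. `parse_tile_name` and `parse_mosaic_week` are hypothetical helpers.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Assumed patterns for {YYYY-MM-DD}_{TILE_ID}.tif and week_{WW}.tif
TILE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})_(\w+)\.tif$")
MOSAIC_RE = re.compile(r"^week_(\d{1,2})\.tif$")


class TileName(NamedTuple):
    day: date
    tile_id: str


def parse_tile_name(name: str) -> Optional[TileName]:
    """Parse a daily tile file name; return None if it does not follow the convention."""
    m = TILE_RE.match(name)
    if not m:
        return None
    try:
        day = date.fromisoformat(m.group(1))
    except ValueError:  # e.g. month 13
        return None
    return TileName(day, m.group(2))


def parse_mosaic_week(name: str) -> Optional[int]:
    """Parse a weekly mosaic file name; return the week number or None."""
    m = MOSAIC_RE.match(name)
    if not m:
        return None
    week = int(m.group(1))
    return week if 1 <= week <= 53 else None
```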

Field Uniformity & Alerting

Two-dimensional analysis: Time (week-over-week trends) + Space (within-field homogeneity)
Uniformity metric: Coefficient of Variation (CV), where CV < 0.08 is excellent, CV < 0.15 is good, and CV > 0.25 is poor
Change thresholds: +0.5 (increase alert), -0.5 (decrease alert) — tunable in code
Alert categories: 🚨 URGENT, ⚠️ ALERT, ✅ POSITIVE, 💡 OPPORTUNITY
Cloud handling: When CI = 0 (no data), skip the temporal analysis and produce a spatial-only assessment
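
The thresholds in this section translate into a few small functions. The sketch below is illustrative rather than a copy of `kpi_utils.R`: it assumes the population standard deviation, treats the ±0.5 change thresholds as inclusive, and uses the hypothetical label "moderate" for the 0.15 to 0.25 band, which has no name in the list above.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def coefficient_of_variation(values: Sequence[float]) -> Optional[float]:
    """CV = standard deviation / mean; None when the mean is zero."""
    m = mean(values)
    if m == 0:
        return None
    return pstdev(values) / m  # population SD is an assumption


def classify_uniformity(cv: float) -> str:
    """Map a CV value onto the uniformity classes listed above."""
    if cv < 0.08:
        return "excellent"
    if cv < 0.15:
        return "good"
    if cv > 0.25:
        return "poor"
    return "moderate"  # hypothetical label for the unnamed 0.15-0.25 band


def change_alert(current: float, previous: float, threshold: float = 0.5) -> Optional[str]:
    """Week-over-week CI change; None when either week has CI = 0 (no data)."""
    if current == 0 or previous == 0:
        return None  # skip temporal analysis, as in the cloud-handling rule
    delta = current - previous
    if delta >= threshold:
        return "increase"
    if delta <= -threshold:
        return "decrease"
    return "stable"
```

For example, pixel values `[1, 3]` give a mean of 2 and a population standard deviation of 1, so the CV is 0.5 and the field is classified as "poor".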

RDS File Conventions

Wide format: Rows = fields, Columns = dates (CI values)
Key files:
- combined_CI_data.rds — all fields, all dates, cumulative
- All_pivots_Cumulative_CI_quadrant_year_v2.rds — growth model output with quadrant-level analysis
- {project}_kpi_summary_tables_week{WW}.rds — weekly KPI results
RDS I/O: Managed by utility functions (ci_extraction_utils.R, growth_model_utils.R)

Word Report Output

Templates in r_app/ (e.g., 10_CI_report_with_kpis_simple.Rmd)
Use flextable for split-column tables (wide data rendered as multiple tables)
Include interpretation guides (thresholds, units: hectares + acres)
Filenames: SmartCane_Report_week{WW}_{YYYY}.docx
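
As a small illustration of the filename and unit conventions above, the following sketch builds a report name and a dual-unit area label. Zero-padding the week to two digits and the one-decimal formatting are assumptions, and both function names are hypothetical.

```python
ACRES_PER_HECTARE = 2.47105  # standard conversion factor


def report_filename(week: int, year: int) -> str:
    """SmartCane_Report_week{WW}_{YYYY}.docx, with WW zero-padded (an assumption)."""
    return f"SmartCane_Report_week{week:02d}_{year}.docx"


def area_label(hectares: float) -> str:
    """Format an area in hectares with its equivalent in acres."""
    return f"{hectares:.1f} ha ({hectares * ACRES_PER_HECTARE:.1f} ac)"
```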

Integration Points & Dependencies

R ↔ Python

Python downloads → R expects merged_tif/ filled with 4-band TIFFs
R harvest prediction (11_*.R, 12_*.R) uses ML models from Python (model_307.pt, model_config.json, scalers)

R ↔ GeoJSON (Field Boundaries)

File: laravel_app/storage/app/{project}/pivot.geojson
Used in: All stages (download, CI extraction, masking, reporting)
Critical: Ensure geometry is current; affects all per-field statistics

R ↔ Harvest Data

File: laravel_app/storage/app/{project}/harvest.xlsx
Used in: Growth model (Stage 03), field analysis (Stage 05), reporting
Format: Date columns, field identifiers, harvest event flags

SAR (Separate Ecosystem, experimental)

Independent Python download + R reporting pipeline
Data stored: r_app/experiments/sar_dashboard/data/{client}/weekly_SAR_mosaic/
Outputs: Word reports with VV/VH backscatter, RVI index, harvest detection
Field boundaries: r_app/experiments/pivot.geojson (project-specific)
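
The SAR reports mention an RVI index derived from VV/VH backscatter. This document does not say which formulation `generate_sar_report.R` uses; the sketch below shows one commonly used dual-polarisation approximation, RVI = 4·VH / (VV + VH), computed on linear power after converting from decibels. Treat both the formula choice and the function names as assumptions.

```python
def db_to_linear(db: float) -> float:
    """Convert backscatter from decibels to linear power."""
    return 10 ** (db / 10)


def rvi(vv_db: float, vh_db: float) -> float:
    """Dual-pol Radar Vegetation Index, 4*VH / (VV + VH), on linear power values."""
    vv = db_to_linear(vv_db)
    vh = db_to_linear(vh_db)
    return 4 * vh / (vv + vh)
```

With this formulation the value approaches 0 when VH is negligible relative to VV and equals 2 when the two channels are equal.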

Common Patterns & Gotchas

Pattern: Utility Functions for Reusable Logic

ci_extraction_utils.R — tile detection, RDS I/O, CI calculation variants
growth_model_utils.R — interpolation, smoothing, gap-filling
kpi_utils.R — threshold-based alerting, uniformity metrics
Why: Keeps main scripts readable, enables testing individual logic

Pattern: Source Config Once

source("parameters_project.R")  # Sets PROJECT, data_dir, field_boundaries_path, etc.
# All downstream code uses these globals

Gotcha: File Path Dependencies

Problem: Hard-coded paths like file.path("laravel_app/storage/app", PROJECT, "pivot.geojson")
Better: Pass as parameters or configure in parameters_project.R
Benefit: Code reusable across projects; easier testing
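
The same principle applies on the Python side. A minimal sketch, assuming a hypothetical `ProjectPaths` class that does not exist in the repository: the storage root and project name are passed in once, and every other path is derived from them, so tests can point the code at a temporary directory.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProjectPaths:
    """Derive all per-project locations from a project name and a storage root."""
    project: str
    storage_root: Path = Path("laravel_app/storage/app")

    @property
    def project_dir(self) -> Path:
        return self.storage_root / self.project

    @property
    def field_boundaries(self) -> Path:
        return self.project_dir / "pivot.geojson"

    @property
    def merged_tif(self) -> Path:
        return self.project_dir / "merged_tif"

    def daily_tif(self, day: str) -> Path:
        """Path of the merged GeoTIFF for a YYYY-MM-DD date string."""
        return self.merged_tif / f"{day}.tif"


if __name__ == "__main__":
    paths = ProjectPaths("angata")
    print(paths.field_boundaries)
    print(paths.daily_tif("2025-11-01"))
```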

Gotcha: Cloud Masking in CI Extraction

If CI == 0 for every field on a date, the entire date is flagged as cloudy
The growth model then drops that date rather than interpolating across it, which affects trend analysis
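
The rule can be sketched on a toy version of the wide CI table. This is an illustration only, not the logic in `ci_extraction_utils.R` or `growth_model_utils.R`: the table is modelled as a nested dict, a field with no entry for a date is treated as CI = 0, and `cloudy_dates` and `drop_dates` are hypothetical names.

```python
from typing import Dict, Iterable, List

WideCI = Dict[str, Dict[str, float]]  # field id -> {date string -> CI value}


def cloudy_dates(table: WideCI) -> List[str]:
    """Dates on which every field has CI == 0 (missing entries count as 0)."""
    dates = sorted({d for row in table.values() for d in row})
    return [d for d in dates if all(row.get(d, 0) == 0 for row in table.values())]


def drop_dates(table: WideCI, dates: Iterable[str]) -> WideCI:
    """Return a copy of the table without the given date columns."""
    drop = set(dates)
    return {
        field: {d: v for d, v in row.items() if d not in drop}
        for field, row in table.items()
    }
```

A date where only some fields read 0 is kept, so a single clouded field does not remove the whole acquisition.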

AI Agent Behavior & Approach

Terminal Commands

R execution on Windows: Use PowerShell & operator with full R path
- Syntax: & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" script.R
- Example: & "C:\Program Files\R\R-4.4.3\bin\x64\Rscript.exe" 01_create_master_grid_and_split_tiffs.R
- The & (call) operator is required because the executable path is a quoted string containing spaces
Verify R path is correct for the Windows installation before running

Testing & Temporary Code

Test file naming: Always include test in filename to signal removal (e.g., test_ci_extraction.R, test_download.py)
Test locations:
- R tests: r_app/test/test_*.R
- Python tests: python_app/test/test_*.py
Default test project: Use angata for development/testing unless specified otherwise
Cleanup: Test files are temporary—learnings are copied to production scripts, then test files are deleted
Do NOT commit test files to main branch

Output & Documentation

NO auto-generated summary files: Do NOT create README.md, SUMMARY.txt, or any markdown/text summaries unless explicitly requested
Chat-based summaries only: Briefly summarize results and findings directly in chat conversation
This keeps the repo clean and avoids clutter; user will ask for docs if needed

Critical Thinking & Partnership

Ask clarifying questions before implementing:
- "Why do you want to modify this stage? What problem are we solving?"
- "Have you considered [alternative approach]? It might be simpler/faster/cleaner"
- "Is this for a specific project (angata/esa/chemba)? That affects which configs to change"
Suggest alternatives:
- If a request seems inefficient, propose better options
- Point out if changes might affect other stages or projects
- Highlight potential issues (e.g., "This change requires updating parameters_project.R for each project")
Challenge assumptions:
- "Do we need to modify Stage 02, or would a configuration change in parameters_project.R work?"
- "Is this a one-time fix or a pattern we should refactor?"
- "Will this break existing reports or KPI calculations?"
Be a thinking partner, not an order-taker: Help the user make better decisions, not just execute requests

File Structure (Key Locations)

r_app/
  ├── 01_create_master_grid_and_split_tiffs.R
  ├── 02_ci_extraction.R
  ├── 03_interpolate_growth_model.R
  ├── 04_mosaic_creation.R
  ├── 06_crop_messaging.R
  ├── 09_field_analysis_weekly.R
  ├── 10_CI_report_with_kpis_simple.Rmd
  ├── 11_yield_prediction_comparison.R
  ├── 12_temporal_yield_forecasting.R
  ├── *_utils.R (ci_extraction, growth_model, kpi, report, crop_messaging)
  ├── parameters_project.R
  ├── package_manager.R
  ├── renv.lock
  ├── system_architecture/system_architecture.md (full architecture doc)
  └── experiments/sar_dashboard/
        ├── download_s1_*.py
        ├── generate_sar_report.R
        └── sar_dashboard_utils.R

python_app/
  ├── 00_download_8band_pu_optimized.py
  ├── download_planet_missing_dates.py
  ├── 01_harvest_baseline_prediction.py
  ├── 02_harvest_imminent_weekly.py
  ├── model_307.pt
  └── requirements_*.txt

laravel_app/
  ├── storage/app/{project}/
  │   ├── merged_tif/
  │   ├── daily_tiles_split/
  │   ├── combined_CI/
  │   └── weekly_mosaic/
  └── ... (standard Laravel)

output/
  └── (all generated reports, Excel, Word, HTML)

Linear Issue Integration

Creating Issues

To create a Linear issue, simply ask in the chat:

"Create a Linear issue: Fix hard-coded paths in Stage 02 CI extraction"

Provide context like:

Title: Clear, action-oriented (what needs doing)
Description: Problem statement, impact, acceptance criteria
Project: Which project (Inception Phase Angata, General backlog, etc.)
Priority: High, Medium, Low
Assignee: Who should work on it (e.g., yourself, Dimitra)
Related issues: Reference other issues if it blocks/relates to them

Example:

Create Linear issue:
Title: Only create tiles that overlap with GeoJSON boundaries
Description: Master grid creates all 25 tiles even when empty. Filter by field geometry to save storage.
Project: Inception Phase Angata
Priority: Medium
Related: SC-50 (parameterization work)

Referencing Issues

In chat, reference issues by their ID to include full context:

"Work on SC-59 - update the system architecture documentation"

This pulls the issue details into the conversation so I understand the full scope.

Workflow

Create issue → Clear task definition
Reference issue in chat → I fetch details automatically
Ask for implementation → I work toward issue's acceptance criteria
Close issue → Mark as done in Linear, summarize in chat

Debugging & Troubleshooting

Data Validation Checkpoints

After each major stage, verify:

Post-Download: merged_tif/ contains expected date ranges; file sizes ~150-300MB each
Post-CI Extraction: combined_CI_data.rds dimensions match (# fields × # dates); no all-NA columns
Post-Growth Model: Interpolated values are within expected CI range; no unexpected gaps
Pre-Reporting: Weekly mosaic TIF has 5 bands; field analysis RDS has KPI columns present
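
Some of these checkpoints are simple enough to automate. The sketch below covers the file-size range and the CI table shape on a plain-Python stand-in for the wide table (a nested dict); reading the actual RDS or TIF files is out of scope here. The function names are hypothetical, and interpreting "MB" as 1024 x 1024 bytes is an assumption.

```python
import math
from typing import Dict, List, Optional

WideCI = Dict[str, Dict[str, Optional[float]]]  # field id -> {date string -> CI or None}


def is_missing(value: Optional[float]) -> bool:
    """True for None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def all_na_dates(table: WideCI) -> List[str]:
    """Date columns in which every field is missing."""
    dates = sorted({d for row in table.values() for d in row})
    return [d for d in dates if all(is_missing(row.get(d)) for row in table.values())]


def dimensions_match(table: WideCI, n_fields: int, n_dates: int) -> bool:
    """True when the table has n_fields rows and each row has n_dates columns."""
    return len(table) == n_fields and all(len(row) == n_dates for row in table.values())


def size_in_expected_range(size_bytes: int, low_mb: float = 150, high_mb: float = 300) -> bool:
    """Check a merged GeoTIFF size against the ~150-300MB guideline."""
    size_mb = size_bytes / (1024 * 1024)
    return low_mb <= size_mb <= high_mb
```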

Next Steps for AI Agents

Understanding a Script: Check parameters_project.R first for config, then trace utility functions
Adding Features: Determine which stage (01-06 or experimental) it belongs to; follow existing pattern in that stage
Testing: Use standalone test data in r_app/experiments/ or small date ranges with --start/--end flags
Documentation: Update r_app/system_architecture/system_architecture.md when architecture changes
Refactoring: Avoid hard-coded paths; parameterize and test across multiple projects (angata, esa, etc.)

For detailed system architecture, see r_app/system_architecture/system_architecture.md. For related Linear issues (code quality, architecture docs), see SC-59, SC-60, SC-61.
