SmartCane/r_app/system_architecture/REVIEW_SUMMARY.md
2026-01-06 14:17:37 +01:00

8.8 KiB

SmartCane Code Review & Architecture Update Summary

Date: 2025-10-14
Reviewer: GitHub Copilot
Requested by: Timon


What Was Done

1. Comprehensive Quality Check

Reviewed all main processing scripts and their utility functions:

  • 01_planet_download.py / .ipynb
  • 02_ci_extraction.R + ci_extraction_utils.R
  • 03_interpolate_growth_model.R + growth_model_utils.R
  • 04_mosaic_creation.R + mosaic_creation_utils.R
  • 09_calculate_kpis.R + kpi_utils.R
  • crop_messaging_utils.R
  • parameters_project.R

Focus Areas:

  • Hardcoded values vs parameterization
  • Function purity (no embedded data)
  • Code reusability
  • Error handling
  • Documentation quality

2. Quality Check Report Created

File: r_app/system_architecture/QUALITY_CHECK_REPORT.md

Key Findings:

  • Overall Grade: B+ (Good, with room for improvement)
  • Urgent Issues: 2 (API credentials, threshold inconsistency)
  • High Priority: 3 (cloud thresholds, KPI thresholds, placeholder data)
  • Medium/Low: Various code cleanup and documentation items

Best Practices Found:

  • growth_model_utils.R - Exemplary code (A+)
  • ci_extraction_utils.R - Excellent parameterization (A+)
  • R scripts generally better than Python script

Critical Issues:

  1. 🚨 SECURITY: API credentials hardcoded in 01_planet_download.py
  2. ⚠️ CONSISTENCY: Different thresholds for same metrics in crop_messaging_utils.R vs kpi_utils.R
  3. ⚠️ HARDCODED VALUES: Cloud coverage thresholds (5%, 45%) embedded in code
  4. ⚠️ HARDCODED VALUES: All KPI classification thresholds in case_when statements
  5. ⚠️ PLACEHOLDER DATA: Field sizes generated randomly instead of calculated

3. System Architecture Documentation Enhanced

File: r_app/system_architecture/system_architecture.md

Added Sections:

A. Detailed Data Flow Documentation

  • 8 processing stages with full details
  • Inputs, outputs, and intermediate data for each stage
  • Parameters and thresholds used at each step
  • File naming conventions and directory structure
  • Database vs file system storage decisions

B. Comprehensive Pipeline Diagram

  • New Mermaid diagram showing complete data flow
  • All 6 processing stages visualized
  • Intermediate data products shown
  • Parameters annotated on diagram
  • Color-coded by stage type

C. Data Transformation Tracking

  • How data changes format at each stage
  • Wide ↔ long format conversions
  • Raster → statistics extractions
  • 4-band → 5-band transformations
  • Daily → weekly aggregations

D. Parameters Reference Table

Complete listing of:

  • Resolution settings
  • Threshold values
  • Cloud coverage limits
  • KPI classification boundaries
  • Temporal parameters (days, weeks)

Key Insights for Your Colleague

Understanding the Data Flow

  1. Start Point: Raw satellite images (4 bands: R, G, B, NIR)
  2. First Transform: Calculate CI = NIR/Green - 1 → 5-band rasters
  3. Second Transform: Extract statistics per field → RDS files
  4. Third Transform: Interpolate sparse data → continuous growth model
  5. Fourth Transform: Composite daily images → weekly mosaics
  6. Fifth Transform: Calculate 6 KPIs from mosaics + growth model
  7. Final Output: Word/HTML reports with visualizations

Where to Make Changes

If you want to change...

  1. Cloud coverage tolerance:

    • Currently: 5% (strict), 45% (relaxed)
    • File: mosaic_creation_utils.R lines 158-159
    • Recommendation: Move to parameters_project.R
  2. KPI thresholds (field uniformity, weed risk, etc.):

    • Currently: Hardcoded in kpi_utils.R case_when statements
    • Recommendation: Create analysis_constants.R file
    • Will affect reporting and classification
  3. Satellite resolution:

    • Currently: 3 meters/pixel
    • File: 01_planet_download.py line 126
    • Recommendation: Add to config or command-line arg
  4. CI formula:

    • Currently: (NIR / Green) - 1
    • File: ci_extraction_utils.R line 92
    • Note: This is agronomically specific, change with caution
  5. Week numbering system:

    • Currently: ISO 8601 weeks
    • Files: All mosaic and KPI scripts
    • Note: Would require changes across multiple scripts

Intermediate Data You Can Inspect

All stored in: laravel_app/storage/app/{project}/

  1. Raw daily images: merged_tif/{date}.tif (after download)
  2. Processed CI rasters: merged_final_tif/{date}.tif (after CI extraction)
  3. Daily CI statistics: Data/extracted_ci/daily_vals/extracted_{date}.rds
  4. Cumulative CI data: Data/extracted_ci/cumulative_vals/combined_CI_data.rds
  5. Growth model: Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds
  6. Weekly mosaics: weekly_mosaic/week_{WW}_{YYYY}.tif
  7. KPI results: reports/kpis/kpi_results_week{WW}.rds

Parameters That Control Behavior

Download Stage:

  • DAYS env var → lookback period
  • DATE env var → end date
  • PROJECT_DIR env var → which project
  • resolution = 3 → image resolution

CI Extraction Stage:

  • end_date arg → processing date
  • offset arg → days to look back
  • project_dir arg → project name
  • min_valid_pixels = 100 → quality threshold

Mosaic Stage:

  • CLOUD_THRESHOLD_STRICT = 5% → preferred images
  • CLOUD_THRESHOLD_RELAXED = 45% → acceptable images
  • ISO week numbering → file naming

KPI Stage:

  • CV < 0.15 → Excellent uniformity
  • CV < 0.25 → Good uniformity
  • >2.0 CI/week → Weed detection
  • 240 days → Canopy closure age
  • ±0.5 CI → Significant change

Recommendations for Improvement

Immediate Actions (Before Next Run)

  1. Fix API credentials: Move to environment variables

    export SENTINEL_HUB_CLIENT_ID="your-id-here"
    export SENTINEL_HUB_CLIENT_SECRET="your-secret-here"
    
  2. Unify thresholds: Create shared constants file

    # r_app/analysis_constants.R
    UNIFORMITY_EXCELLENT <- 0.15
    UNIFORMITY_GOOD <- 0.25
    UNIFORMITY_MODERATE <- 0.35
    # ... etc
    

Short-Term Improvements

  1. Extract cloud thresholds to configuration
  2. Replace placeholder field sizes with actual calculations
  3. Add validation for input data (dates, files exist, etc.)
  4. Clean up commented code throughout

Long-Term Enhancements

  1. Configuration system: YAML/JSON for project-specific settings
  2. Unit tests: For utility functions
  3. Logging improvements: More detailed progress tracking
  4. Documentation: Add agronomic justification for thresholds

Files Created/Modified

Created:

  1. r_app/system_architecture/QUALITY_CHECK_REPORT.md (comprehensive quality analysis)
  2. r_app/system_architecture/REVIEW_SUMMARY.md (this file)

Modified:

  1. r_app/system_architecture/system_architecture.md:
    • Added detailed data flow section (8 stages)
    • Added comprehensive pipeline diagram
    • Added parameters reference table
    • Added data transformation tracking
    • Added file system structure

Next Steps

For You (Timon):

  1. Review QUALITY_CHECK_REPORT.md for detailed findings
  2. Prioritize urgent fixes (API credentials, threshold consolidation)
  3. Decide on configuration approach (constants file vs YAML)
  4. Plan timeline for improvements

For Your Colleague:

  1. Read updated system_architecture.md for full system understanding
  2. Use the data flow diagram to trace processing steps
  3. Refer to "Where to Make Changes" section when modifying code
  4. Check "Intermediate Data" section when debugging

For the Team:

  1. Discuss threshold standardization approach
  2. Review and approve configuration strategy
  3. Plan testing for any threshold changes
  4. Document agronomic basis for current thresholds

Questions Answered

Are all functions actual functions?
Yes! Functions are well-parameterized. Only minor issues found (mostly constant definitions).

Is there hardcoded data in functions?
Some hardcoded thresholds in kpi_utils.R case_when statements. Most other functions are clean.

Can graphs work on anything?
Yes, visualization functions accept data as parameters, no hardcoded columns.

What data flows where?
Fully documented in updated system_architecture.md with detailed 8-stage pipeline.

What parameters are used?
Complete reference table added showing all configurable parameters by stage.

Where are intermediate steps saved?
Full file system structure documented with all intermediate data locations.

Where can changes be made?
"Where to Make Changes" section provides specific files and line numbers.


Contact

For questions about this review:

  • Review created by: GitHub Copilot
  • Date: October 14, 2025
  • Based on SmartCane codebase version as of Oct 2025

End of Summary