Timon d22dc2f96e commit all stuff

2026-01-06 14:17:37 +01:00

SmartCane Code Review & Architecture Update Summary

Date: 2025-10-14
Reviewer: GitHub Copilot
Requested by: Timon

What Was Done

1. Comprehensive Quality Check ✅

Reviewed all main processing scripts and their utility functions:

✅ 01_planet_download.py / .ipynb
✅ 02_ci_extraction.R + ci_extraction_utils.R
✅ 03_interpolate_growth_model.R + growth_model_utils.R
✅ 04_mosaic_creation.R + mosaic_creation_utils.R
✅ 09_calculate_kpis.R + kpi_utils.R
✅ crop_messaging_utils.R
✅ parameters_project.R

Focus Areas:

Hardcoded values vs parameterization
Function purity (no embedded data)
Code reusability
Error handling
Documentation quality

2. Quality Check Report Created ✅

File: r_app/system_architecture/QUALITY_CHECK_REPORT.md

Key Findings:

Overall Grade: B+ (Good, with room for improvement)
Urgent Issues: 2 (API credentials, threshold inconsistency)
High Priority: 3 (cloud thresholds, KPI thresholds, placeholder data)
Medium/Low: Various code cleanup and documentation items

Best Practices Found:

growth_model_utils.R - Exemplary code (A+)
ci_extraction_utils.R - Excellent parameterization (A+)
The R scripts are generally of higher quality than the Python script

Critical Issues:

🚨 SECURITY: API credentials hardcoded in 01_planet_download.py
⚠️ CONSISTENCY: Different thresholds for same metrics in crop_messaging_utils.R vs kpi_utils.R
⚠️ HARDCODED VALUES: Cloud coverage thresholds (5%, 45%) embedded in code
⚠️ HARDCODED VALUES: All KPI classification thresholds in case_when statements
⚠️ PLACEHOLDER DATA: Field sizes generated randomly instead of calculated

3. System Architecture Documentation Enhanced ✅

File: r_app/system_architecture/system_architecture.md

Added Sections:

A. Detailed Data Flow Documentation

8 processing stages with full details
Inputs, outputs, and intermediate data for each stage
Parameters and thresholds used at each step
File naming conventions and directory structure
Database vs file system storage decisions

B. Comprehensive Pipeline Diagram

New Mermaid diagram showing complete data flow
All 6 processing stages visualized
Intermediate data products shown
Parameters annotated on diagram
Color-coded by stage type

C. Data Transformation Tracking

How data changes format at each stage
Wide ↔ long format conversions
Raster → statistics extractions
4-band → 5-band transformations
Daily → weekly aggregations

D. Parameters Reference Table

Complete listing of:

Resolution settings
Threshold values
Cloud coverage limits
KPI classification boundaries
Temporal parameters (days, weeks)

Key Insights for Your Colleague

Understanding the Data Flow

Start Point: Raw satellite images (4 bands: R, G, B, NIR)
First Transform: Calculate CI = (NIR / Green) - 1 and append it as a fifth band → 5-band rasters
Second Transform: Extract statistics per field → RDS files
Third Transform: Interpolate sparse data → continuous growth model
Fourth Transform: Composite daily images → weekly mosaics
Fifth Transform: Calculate 6 KPIs from mosaics + growth model
Final Output: Word/HTML reports with visualizations
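
The first transform can be illustrated per pixel. This is a minimal sketch only: the real implementation lives in ci_extraction_utils.R and operates on rasters, and the function names and the NaN handling for a zero Green value are assumptions made here, not taken from the codebase.

```python
import math


def chlorophyll_index(nir, green):
    """CI = (NIR / Green) - 1. Returns NaN when Green is zero (assumed handling)."""
    if green == 0:
        return math.nan
    return nir / green - 1.0


def add_ci_band(pixel):
    """Turn a 4-band pixel (R, G, B, NIR) into a 5-band pixel with CI appended."""
    red, green, blue, nir = pixel
    return (red, green, blue, nir, chlorophyll_index(nir, green))


if __name__ == "__main__":
    print(add_ci_band((0.10, 0.25, 0.05, 0.50)))
```

A pixel with NIR twice its Green reflectance gets CI = 1.0, which is why any change to the formula shifts every downstream threshold.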

Where to Make Changes

If you want to change...

Cloud coverage tolerance:
- Currently: 5% (strict), 45% (relaxed)
- File: mosaic_creation_utils.R lines 158-159
- Recommendation: Move to parameters_project.R
KPI thresholds (field uniformity, weed risk, etc.):
- Currently: Hardcoded in kpi_utils.R case_when statements
- Recommendation: Create analysis_constants.R file
- Will affect reporting and classification
Satellite resolution:
- Currently: 3 meters/pixel
- File: 01_planet_download.py line 126
- Recommendation: Add to config or command-line arg
CI formula:
- Currently: (NIR / Green) - 1
- File: ci_extraction_utils.R line 92
- Note: This is agronomically specific, change with caution
Week numbering system:
- Currently: ISO 8601 weeks
- Files: All mosaic and KPI scripts
- Note: Would require changes across multiple scripts

Intermediate Data You Can Inspect

All stored in: laravel_app/storage/app/{project}/

Raw daily images: merged_tif/{date}.tif (after download)
Processed CI rasters: merged_final_tif/{date}.tif (after CI extraction)
Daily CI statistics: Data/extracted_ci/daily_vals/extracted_{date}.rds
Cumulative CI data: Data/extracted_ci/cumulative_vals/combined_CI_data.rds
Growth model: Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds
Weekly mosaics: weekly_mosaic/week_{WW}_{YYYY}.tif
KPI results: reports/kpis/kpi_results_week{WW}.rds
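
When looking for a weekly mosaic by date, the ISO week and ISO year have to be derived together. The sketch below builds a name following the week_{WW}_{YYYY}.tif pattern listed above; the helper name, the zero-padding of the week, and the use of the ISO year (rather than the calendar year) are assumptions that should be checked against the mosaic scripts.

```python
from datetime import date


def weekly_mosaic_name(day):
    """Build a weekly mosaic file name from a date using ISO 8601 week numbering."""
    iso_year, iso_week, _weekday = day.isocalendar()
    # Zero-padded week and ISO year are assumptions about the naming convention.
    return f"week_{iso_week:02d}_{iso_year}.tif"


if __name__ == "__main__":
    print(weekly_mosaic_name(date(2025, 10, 14)))
    print(weekly_mosaic_name(date(2024, 12, 30)))
```

Dates near New Year are the case to watch: 30 December 2024 belongs to ISO week 1 of 2025, so mixing the calendar year with the ISO week would point at the wrong file.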

Parameters That Control Behavior

Download Stage:

DAYS env var → lookback period
DATE env var → end date
PROJECT_DIR env var → which project
resolution = 3 → image resolution
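
How the three environment variables might combine into a download window is sketched below. The function name, the default of 7 days, the YYYY-MM-DD date format, and the choice to subtract DAYS from the end date are all assumptions for illustration; 01_planet_download.py may differ on each point.

```python
import os
from datetime import date, timedelta


def download_window(env=None):
    """Return (start_date, end_date, project) from DAYS, DATE and PROJECT_DIR."""
    env = os.environ if env is None else env
    days = int(env.get("DAYS", "7"))  # hypothetical default lookback
    if days < 1:
        raise ValueError("DAYS must be a positive integer")
    end_text = env.get("DATE")
    end = date.fromisoformat(end_text) if end_text else date.today()
    project = env.get("PROJECT_DIR")
    if not project:
        raise ValueError("PROJECT_DIR is not set")
    return end - timedelta(days=days), end, project


if __name__ == "__main__":
    sample = {"DAYS": "7", "DATE": "2025-10-14", "PROJECT_DIR": "demo"}
    print(download_window(sample))
```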

CI Extraction Stage:

end_date arg → processing date
offset arg → days to look back
project_dir arg → project name
min_valid_pixels = 100 → quality threshold

Mosaic Stage:

CLOUD_THRESHOLD_STRICT = 5% → preferred images
CLOUD_THRESHOLD_RELAXED = 45% → acceptable images
ISO week numbering → file naming
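
The two-tier cloud rule can be read as "use strict images if any exist, otherwise fall back to relaxed ones". The sketch below encodes that reading; it is an interpretation of the summary, not a port of mosaic_creation_utils.R, and the function name, the input shape, and the inclusive comparison (<=) are assumptions.

```python
CLOUD_THRESHOLD_STRICT = 5.0    # percent, preferred images
CLOUD_THRESHOLD_RELAXED = 45.0  # percent, acceptable images


def select_images(cloud_pct_by_image):
    """Pick images for a mosaic: strict tier first, relaxed tier as fallback."""
    def within(limit):
        return sorted(
            name for name, pct in cloud_pct_by_image.items() if pct <= limit
        )

    # An empty strict list is falsy, so the relaxed tier is tried next.
    return within(CLOUD_THRESHOLD_STRICT) or within(CLOUD_THRESHOLD_RELAXED)


if __name__ == "__main__":
    print(select_images({"a.tif": 3.0, "b.tif": 20.0}))
    print(select_images({"a.tif": 30.0, "b.tif": 60.0}))
```

Keeping both limits as named constants at the top is the same idea as the recommendation to move them into parameters_project.R.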

KPI Stage:

CV < 0.15 → Excellent uniformity
CV < 0.25 → Good uniformity
>2.0 CI/week → Weed detection
240 days → Canopy closure age
±0.5 CI → Significant change
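
The uniformity thresholds amount to an ordered cascade, which is what a case_when expresses in R. A minimal Python illustration follows; the constant and function names are hypothetical, and the "Other" label is a placeholder because this summary does not list the classes beyond "Good".

```python
UNIFORMITY_EXCELLENT = 0.15  # CV below this is "Excellent"
UNIFORMITY_GOOD = 0.25       # CV below this is "Good"


def classify_uniformity(cv):
    """Classify field uniformity from the coefficient of variation (CV)."""
    if cv < UNIFORMITY_EXCELLENT:
        return "Excellent"
    if cv < UNIFORMITY_GOOD:
        return "Good"
    return "Other"  # placeholder: remaining classes are not given in this summary


if __name__ == "__main__":
    for cv in (0.10, 0.15, 0.30):
        print(cv, classify_uniformity(cv))
```

Because the comparisons are strict, a CV of exactly 0.15 falls into "Good". Boundary behaviour like this is one place where crop_messaging_utils.R and kpi_utils.R could silently disagree.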

Recommendations for Improvement

Immediate Actions (Before Next Run)

Fix API credentials: Move to environment variables

export SENTINEL_HUB_CLIENT_ID="your-id-here"
export SENTINEL_HUB_CLIENT_SECRET="your-secret-here"

Unify thresholds: Create shared constants file

# r_app/analysis_constants.R
UNIFORMITY_EXCELLENT <- 0.15
UNIFORMITY_GOOD <- 0.25
UNIFORMITY_MODERATE <- 0.35
# ... etc
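
On the Python side, the credentials fix could look roughly like the sketch below. It reads the two variables exported above and fails fast when either is missing. The function name is hypothetical, and how the values are passed on to the download client is left out because that part of 01_planet_download.py is not shown here.

```python
import os

CLIENT_ID_VAR = "SENTINEL_HUB_CLIENT_ID"
CLIENT_SECRET_VAR = "SENTINEL_HUB_CLIENT_SECRET"


def load_credentials(env=None):
    """Return (client_id, client_secret) from the environment, or raise."""
    env = os.environ if env is None else env
    missing = [
        name for name in (CLIENT_ID_VAR, CLIENT_SECRET_VAR) if not env.get(name)
    ]
    if missing:
        # Name the missing variables, but never echo any secret values.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[CLIENT_ID_VAR], env[CLIENT_SECRET_VAR]


if __name__ == "__main__":
    try:
        client_id, _secret = load_credentials()
        print("Credentials loaded for client", client_id)
    except RuntimeError as error:
        print(error)
```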

Short-Term Improvements

Extract cloud thresholds to configuration
Replace placeholder field sizes with actual calculations
Add validation for input data (dates, files exist, etc.)
Clean up commented code throughout
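
Input validation of the kind suggested above might start as small as this. It is a sketch under assumptions: the function name is hypothetical, dates are assumed to be YYYY-MM-DD strings, and collecting problems in a list instead of raising on the first one is a design choice made here.

```python
from datetime import date
from pathlib import Path


def validate_inputs(end_date_text, required_paths):
    """Return a list of problems found; an empty list means the inputs look fine."""
    problems = []
    try:
        date.fromisoformat(end_date_text)
    except ValueError:
        problems.append(f"end_date is not a valid date: {end_date_text!r}")
    for path in required_paths:
        if not Path(path).exists():
            problems.append(f"missing input: {path}")
    return problems


if __name__ == "__main__":
    print(validate_inputs("2025-13-40", ["/no/such/file.tif"]))
```

Reporting every problem at once saves a rerun per mistake when a stage is started with several bad arguments.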

Long-Term Enhancements

Configuration system: YAML/JSON for project-specific settings
Unit tests: For utility functions
Logging improvements: More detailed progress tracking
Documentation: Add agronomic justification for thresholds
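
A configuration system does not need to be elaborate to remove most hardcoded values. The sketch below merges project-specific JSON overrides onto defaults taken from the values listed in this summary. The key names and the rule of rejecting unknown keys are assumptions, and YAML would work the same way with a third-party parser.

```python
import json

# Defaults mirror values reported in this summary; key names are hypothetical.
DEFAULTS = {
    "resolution": 3,
    "min_valid_pixels": 100,
    "cloud_threshold_strict": 5.0,
    "cloud_threshold_relaxed": 45.0,
}


def load_settings(json_text):
    """Merge project-specific overrides (JSON text) onto the defaults."""
    overrides = json.loads(json_text)
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        # Rejecting unknown keys catches typos such as "resolutoin" early.
        raise ValueError("Unknown settings: " + ", ".join(unknown))
    return {**DEFAULTS, **overrides}


if __name__ == "__main__":
    print(load_settings('{"cloud_threshold_relaxed": 30.0}'))
```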

Files Created/Modified

Created:

✅ r_app/system_architecture/QUALITY_CHECK_REPORT.md (comprehensive quality analysis)
✅ r_app/system_architecture/REVIEW_SUMMARY.md (this file)

Modified:

✅ r_app/system_architecture/system_architecture.md:
- Added detailed data flow section (8 stages)
- Added comprehensive pipeline diagram
- Added parameters reference table
- Added data transformation tracking
- Added file system structure

Next Steps

For You (Timon):

Review QUALITY_CHECK_REPORT.md for detailed findings
Prioritize urgent fixes (API credentials, threshold consolidation)
Decide on configuration approach (constants file vs YAML)
Plan timeline for improvements

For Your Colleague:

Read updated system_architecture.md for full system understanding
Use the data flow diagram to trace processing steps
Refer to "Where to Make Changes" section when modifying code
Check "Intermediate Data" section when debugging

For the Team:

Discuss threshold standardization approach
Review and approve configuration strategy
Plan testing for any threshold changes
Document agronomic basis for current thresholds

Questions Answered

✅ Are all functions actual functions?
Yes! Functions are well-parameterized. Only minor issues found (mostly constant definitions).

✅ Is there hardcoded data in functions?
Some. Thresholds are hardcoded in the case_when statements in kpi_utils.R; most other functions are clean.

✅ Can graphs work on anything?
Yes. Visualization functions accept data as parameters and contain no hardcoded column names.

✅ What data flows where?
Fully documented in updated system_architecture.md with detailed 8-stage pipeline.

✅ What parameters are used?
Complete reference table added showing all configurable parameters by stage.

✅ Where are intermediate steps saved?
Full file system structure documented with all intermediate data locations.

✅ Where can changes be made?
"Where to Make Changes" section provides specific files and line numbers.

Contact

For questions about this review:

Review created by: GitHub Copilot
Date: October 14, 2025
Based on SmartCane codebase version as of Oct 2025

End of Summary
