SmartCane Code Review & Architecture Update Summary
Date: 2025-10-14
Reviewer: GitHub Copilot
Requested by: Timon
What Was Done
1. Comprehensive Quality Check ✅
Reviewed all main processing scripts and their utility functions:
- ✅ 01_planet_download.py/.ipynb
- ✅ 02_ci_extraction.R + ci_extraction_utils.R
- ✅ 03_interpolate_growth_model.R + growth_model_utils.R
- ✅ 04_mosaic_creation.R + mosaic_creation_utils.R
- ✅ 09_calculate_kpis.R + kpi_utils.R
- ✅ crop_messaging_utils.R
- ✅ parameters_project.R
Focus Areas:
- Hardcoded values vs parameterization
- Function purity (no embedded data)
- Code reusability
- Error handling
- Documentation quality
2. Quality Check Report Created ✅
File: r_app/system_architecture/QUALITY_CHECK_REPORT.md
Key Findings:
- Overall Grade: B+ (Good, with room for improvement)
- Urgent Issues: 2 (API credentials, threshold inconsistency)
- High Priority: 3 (cloud thresholds, KPI thresholds, placeholder data)
- Medium/Low: Various code cleanup and documentation items
Best Practices Found:
- growth_model_utils.R - Exemplary code (A+)
- ci_extraction_utils.R - Excellent parameterization (A+)
- R scripts generally better than Python script
Critical Issues:
- 🚨 SECURITY: API credentials hardcoded in 01_planet_download.py
- ⚠️ CONSISTENCY: Different thresholds for same metrics in crop_messaging_utils.R vs kpi_utils.R
- ⚠️ HARDCODED VALUES: Cloud coverage thresholds (5%, 45%) embedded in code
- ⚠️ HARDCODED VALUES: All KPI classification thresholds in case_when statements
- ⚠️ PLACEHOLDER DATA: Field sizes generated randomly instead of calculated
3. System Architecture Documentation Enhanced ✅
File: r_app/system_architecture/system_architecture.md
Added Sections:
A. Detailed Data Flow Documentation
- 8 processing stages with full details
- Inputs, outputs, and intermediate data for each stage
- Parameters and thresholds used at each step
- File naming conventions and directory structure
- Database vs file system storage decisions
B. Comprehensive Pipeline Diagram
- New Mermaid diagram showing complete data flow
- All 6 processing stages visualized
- Intermediate data products shown
- Parameters annotated on diagram
- Color-coded by stage type
C. Data Transformation Tracking
- How data changes format at each stage
- Wide ↔ long format conversions
- Raster → statistics extractions
- 4-band → 5-band transformations
- Daily → weekly aggregations
D. Parameters Reference Table
Complete listing of:
- Resolution settings
- Threshold values
- Cloud coverage limits
- KPI classification boundaries
- Temporal parameters (days, weeks)
Key Insights for Your Colleague
Understanding the Data Flow
- Start Point: Raw satellite images (4 bands: R, G, B, NIR)
- First Transform: Calculate CI = NIR/Green - 1 → 5-band rasters
- Second Transform: Extract statistics per field → RDS files
- Third Transform: Interpolate sparse data → continuous growth model
- Fourth Transform: Composite daily images → weekly mosaics
- Fifth Transform: Calculate 6 KPIs from mosaics + growth model
- Final Output: Word/HTML reports with visualizations
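The first transform above can be illustrated with a minimal Python sketch. It is not the code in ci_extraction_utils.R; the function name `add_ci_band`, the flat-list band layout, and the handling of zero Green values (returned as NaN) are assumptions made for illustration only.

```python
import math


def add_ci_band(red, green, blue, nir):
    """Return five bands: the four inputs plus CI = NIR / Green - 1."""
    ci = [
        # Assumption: pixels with Green == 0 are marked as NaN rather than failing.
        (n / g - 1.0) if g else math.nan
        for n, g in zip(nir, green)
    ]
    return [red, green, blue, nir, ci]


# Two hypothetical pixels; the second has Green == 0.
bands = add_ci_band(
    red=[0.10, 0.20],
    green=[0.25, 0.00],
    blue=[0.05, 0.10],
    nir=[0.50, 0.40],
)
print(len(bands), bands[4])
```

The output keeps the four original bands in order and appends CI as the fifth, which matches the 4-band → 5-band transformation described in the architecture document.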
Where to Make Changes
If you want to change...
- Cloud coverage tolerance:
  - Currently: 5% (strict), 45% (relaxed)
  - File: mosaic_creation_utils.R lines 158-159
  - Recommendation: Move to parameters_project.R
- KPI thresholds (field uniformity, weed risk, etc.):
  - Currently: Hardcoded in kpi_utils.R case_when statements
  - Recommendation: Create analysis_constants.R file
  - Will affect reporting and classification
- Satellite resolution:
  - Currently: 3 meters/pixel
  - File: 01_planet_download.py line 126
  - Recommendation: Add to config or command-line arg
- CI formula:
  - Currently: (NIR / Green) - 1
  - File: ci_extraction_utils.R line 92
  - Note: This is agronomically specific, change with caution
- Week numbering system:
  - Currently: ISO 8601 weeks
  - Files: All mosaic and KPI scripts
  - Note: Would require changes across multiple scripts
Intermediate Data You Can Inspect
All stored in: laravel_app/storage/app/{project}/
- Raw daily images: merged_tif/{date}.tif (after download)
- Processed CI rasters: merged_final_tif/{date}.tif (after CI extraction)
- Daily CI statistics: Data/extracted_ci/daily_vals/extracted_{date}.rds
- Cumulative CI data: Data/extracted_ci/cumulative_vals/combined_CI_data.rds
- Growth model: Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds
- Weekly mosaics: weekly_mosaic/week_{WW}_{YYYY}.tif
- KPI results: reports/kpis/kpi_results_week{WW}.rds
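When debugging, it can help to compute which weekly mosaic a given date belongs to. The sketch below builds the path from the pattern listed above. It assumes that {WW} is a zero-padded ISO 8601 week and that {YYYY} is the ISO week-numbering year (which differs from the calendar year around New Year); the function name and the project name "demo" are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def weekly_mosaic_path(project, day):
    """Build the expected weekly mosaic path for the ISO week containing `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    base = PurePosixPath("laravel_app/storage/app") / project
    # Assumption: week is zero-padded to two digits, year is the ISO year.
    return base / "weekly_mosaic" / f"week_{iso_week:02d}_{iso_year}.tif"


print(weekly_mosaic_path("demo", date(2025, 10, 14)))
# A date late in December can already belong to week 1 of the next ISO year.
print(weekly_mosaic_path("demo", date(2024, 12, 30)))
```

If the scripts use the calendar year instead of the ISO year in file names, the second case is where the two conventions diverge, so it is worth checking against an actual file listing.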
Parameters That Control Behavior
Download Stage:
- DAYS env var → lookback period
- DATE env var → end date
- PROJECT_DIR env var → which project
- resolution = 3 → image resolution
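As a rough illustration of how the download-stage variables might combine into a date window, here is a minimal sketch. It is not taken from 01_planet_download.py: the function name, the YYYY-MM-DD format for DATE, the 7-day default for DAYS, and the project name "demo" are all assumptions.

```python
import os
from datetime import date, timedelta


def download_window(env=None, today=None):
    """Return (start, end, project) derived from DAYS, DATE and PROJECT_DIR."""
    env = os.environ if env is None else env
    # Assumption: DATE is an ISO date (YYYY-MM-DD); fall back to today if unset.
    end = date.fromisoformat(env["DATE"]) if env.get("DATE") else (today or date.today())
    # Assumption: a 7-day default lookback; the real default may differ.
    days = int(env.get("DAYS", "7"))
    start = end - timedelta(days=days)
    return start, end, env.get("PROJECT_DIR")


# A plain dict stands in for the process environment here.
window = download_window({"DAYS": "3", "DATE": "2025-10-14", "PROJECT_DIR": "demo"})
print(window)
```

Passing the environment in as a mapping keeps the function easy to test without touching real environment variables.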
CI Extraction Stage:
- end_date arg → processing date
- offset arg → days to look back
- project_dir arg → project name
- min_valid_pixels = 100 → quality threshold
Mosaic Stage:
- CLOUD_THRESHOLD_STRICT = 5% → preferred images
- CLOUD_THRESHOLD_RELAXED = 45% → acceptable images
- ISO week numbering → file naming
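One plausible reading of the two cloud thresholds is a fallback rule: use images under the strict limit when any exist, otherwise accept images under the relaxed limit. The sketch below shows that rule only. The actual logic in mosaic_creation_utils.R may differ, and the function name, the strict `<` comparison, and the example file names are assumptions.

```python
CLOUD_THRESHOLD_STRICT = 5.0    # percent, preferred images
CLOUD_THRESHOLD_RELAXED = 45.0  # percent, acceptable images


def select_images(cloud_pct_by_image):
    """Prefer images under the strict limit; otherwise fall back to the relaxed one."""
    strict = [
        name for name, pct in cloud_pct_by_image.items()
        if pct < CLOUD_THRESHOLD_STRICT
    ]
    if strict:
        return strict
    return [
        name for name, pct in cloud_pct_by_image.items()
        if pct < CLOUD_THRESHOLD_RELAXED
    ]


# Hypothetical cloud percentages for one week of daily images.
week = {"2025-10-06.tif": 60.0, "2025-10-07.tif": 30.0, "2025-10-08.tif": 12.0}
print(select_images(week))
```

Keeping both limits as named constants is also the shape they would take once moved to a shared parameters file.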
KPI Stage:
- CV < 0.15 → Excellent uniformity
- CV < 0.25 → Good uniformity
- >2.0 CI/week → Weed detection
- 240 days → Canopy closure age
- ±0.5 CI → Significant change
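To make the uniformity thresholds concrete, the sketch below computes a coefficient of variation and maps it to the two classes listed above. It is an illustration, not the case_when logic in kpi_utils.R: the function names are hypothetical, the use of the sample standard deviation is an assumption, and the catch-all label "Other" stands in for whatever lower classes the real code defines.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(values) / mean(values)


def classify_uniformity(cv):
    """Map a CV value to the uniformity classes listed in this summary."""
    if cv < 0.15:
        return "Excellent"
    if cv < 0.25:
        return "Good"
    # Assumption: placeholder label for everything at or above 0.25.
    return "Other"


# Hypothetical per-pixel CI values for one field.
field_ci = [2.0, 2.2, 1.9, 2.1, 2.0]
cv = coefficient_of_variation(field_ci)
print(round(cv, 3), classify_uniformity(cv))
```

Because the boundaries use strict `<`, a CV of exactly 0.15 falls into "Good"; whether the real code treats the boundary the same way should be checked when the thresholds are consolidated.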
Recommendations for Improvement
Immediate Actions (Before Next Run)
- Fix API credentials: Move to environment variables

      export SENTINEL_HUB_CLIENT_ID="your-id-here"
      export SENTINEL_HUB_CLIENT_SECRET="your-secret-here"

- Unify thresholds: Create shared constants file

      # r_app/analysis_constants.R
      UNIFORMITY_EXCELLENT <- 0.15
      UNIFORMITY_GOOD <- 0.25
      UNIFORMITY_MODERATE <- 0.35
      # ... etc
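On the Python side, the credentials fix could look roughly like the sketch below, which reads the two variables named above and fails early with a clear message when either is missing. The function name `load_credentials` is hypothetical, and how the values are then passed to the download client is left out because it depends on the existing script.

```python
import os

REQUIRED_VARS = ("SENTINEL_HUB_CLIENT_ID", "SENTINEL_HUB_CLIENT_SECRET")


def load_credentials(env=None):
    """Read API credentials from the environment instead of hardcoding them."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail before any download is attempted, naming what is missing.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["SENTINEL_HUB_CLIENT_ID"], env["SENTINEL_HUB_CLIENT_SECRET"]


# A dict stands in for os.environ so the example runs without real secrets.
client_id, client_secret = load_credentials(
    {"SENTINEL_HUB_CLIENT_ID": "your-id-here", "SENTINEL_HUB_CLIENT_SECRET": "your-secret-here"}
)
```

Once this is in place, the hardcoded values should also be removed from version control history or rotated, since they have already been committed.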
Short-Term Improvements
- Extract cloud thresholds to configuration
- Replace placeholder field sizes with actual calculations
- Add validation for input data (dates, files exist, etc.)
- Clean up commented code throughout
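The input validation item above could start from something like this sketch, which collects all problems instead of stopping at the first. The function name, the argument names (borrowed from the CI extraction stage), and the specific checks are assumptions, not existing code.

```python
from datetime import date
from pathlib import Path


def validate_inputs(end_date, offset, required_files):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    try:
        date.fromisoformat(end_date)
    except ValueError:
        errors.append(f"end_date is not a valid YYYY-MM-DD date: {end_date!r}")
    if offset <= 0:
        errors.append(f"offset must be a positive number of days, got {offset}")
    for path in required_files:
        if not Path(path).is_file():
            errors.append(f"required file not found: {path}")
    return errors


# Hypothetical bad inputs: invalid month, zero offset, missing file.
problems = validate_inputs("2025-13-01", 0, ["/no/such/file.tif"])
for problem in problems:
    print(problem)
```

Returning a list rather than raising on the first failure lets a run report every misconfiguration at once.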
Long-Term Enhancements
- Configuration system: YAML/JSON for project-specific settings
- Unit tests: For utility functions
- Logging improvements: More detailed progress tracking
- Documentation: Add agronomic justification for thresholds
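If the JSON route is chosen for the configuration system, a project file could override a small set of defaults. The sketch below uses the current values from this summary as defaults; the key names, the function name, and the decision to reject unknown keys are assumptions for illustration.

```python
import json

# Defaults taken from the values reported in this summary; key names are hypothetical.
DEFAULTS = {
    "resolution_m": 3,
    "cloud_threshold_strict_pct": 5,
    "cloud_threshold_relaxed_pct": 45,
    "min_valid_pixels": 100,
}


def load_settings(json_text):
    """Merge project-specific overrides from JSON text over the defaults."""
    overrides = json.loads(json_text)
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        # Reject typos early rather than silently ignoring them.
        raise KeyError("Unknown settings: " + ", ".join(unknown))
    return {**DEFAULTS, **overrides}


settings = load_settings('{"cloud_threshold_relaxed_pct": 40}')
print(settings)
```

The same structure maps directly onto YAML if a YAML parser is available; JSON is used here only because it needs no extra dependency.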
Files Created/Modified
Created:
- ✅ r_app/system_architecture/QUALITY_CHECK_REPORT.md (comprehensive quality analysis)
- ✅ r_app/system_architecture/REVIEW_SUMMARY.md (this file)
Modified:
- ✅ r_app/system_architecture/system_architecture.md:
  - Added detailed data flow section (8 stages)
  - Added comprehensive pipeline diagram
  - Added parameters reference table
  - Added data transformation tracking
  - Added file system structure
Next Steps
For You (Timon):
- Review QUALITY_CHECK_REPORT.md for detailed findings
- Prioritize urgent fixes (API credentials, threshold consolidation)
- Decide on configuration approach (constants file vs YAML)
- Plan timeline for improvements
For Your Colleague:
- Read updated system_architecture.md for full system understanding
- Use the data flow diagram to trace processing steps
- Refer to "Where to Make Changes" section when modifying code
- Check "Intermediate Data" section when debugging
For the Team:
- Discuss threshold standardization approach
- Review and approve configuration strategy
- Plan testing for any threshold changes
- Document agronomic basis for current thresholds
Questions Answered
✅ Are all functions actual functions?
Yes! Functions are well-parameterized. Only minor issues found (mostly constant definitions).
✅ Is there hardcoded data in functions?
Some hardcoded thresholds in kpi_utils.R case_when statements. Most other functions are clean.
✅ Can graphs work on anything?
Yes. Visualization functions accept data as parameters and contain no hardcoded columns.
✅ What data flows where?
Fully documented in updated system_architecture.md with detailed 8-stage pipeline.
✅ What parameters are used?
Complete reference table added showing all configurable parameters by stage.
✅ Where are intermediate steps saved?
Full file system structure documented with all intermediate data locations.
✅ Where can changes be made?
"Where to Make Changes" section provides specific files and line numbers.
Contact
For questions about this review:
- Review created by: GitHub Copilot
- Date: October 14, 2025
- Based on SmartCane codebase version as of Oct 2025
End of Summary