277 lines
8.8 KiB
Markdown
277 lines
8.8 KiB
Markdown
# SmartCane Code Review & Architecture Update Summary
|
|
|
|
**Date**: 2025-10-14
|
|
**Reviewer**: GitHub Copilot
|
|
**Requested by**: Timon
|
|
|
|
---
|
|
|
|
## What Was Done
|
|
|
|
### 1. Comprehensive Quality Check ✅
|
|
|
|
Reviewed all main processing scripts and their utility functions:
|
|
- ✅ `01_planet_download.py` / `.ipynb`
|
|
- ✅ `02_ci_extraction.R` + `ci_extraction_utils.R`
|
|
- ✅ `03_interpolate_growth_model.R` + `growth_model_utils.R`
|
|
- ✅ `04_mosaic_creation.R` + `mosaic_creation_utils.R`
|
|
- ✅ `09_calculate_kpis.R` + `kpi_utils.R`
|
|
- ✅ `crop_messaging_utils.R`
|
|
- ✅ `parameters_project.R`
|
|
|
|
**Focus Areas**:
|
|
- Hardcoded values vs parameterization
|
|
- Function purity (no embedded data)
|
|
- Code reusability
|
|
- Error handling
|
|
- Documentation quality
|
|
|
|
### 2. Quality Check Report Created ✅
|
|
|
|
**File**: `r_app/system_architecture/QUALITY_CHECK_REPORT.md`
|
|
|
|
**Key Findings**:
|
|
- **Overall Grade**: B+ (Good, with room for improvement)
|
|
- **Urgent Issues**: 2 (API credentials, threshold inconsistency)
|
|
- **High Priority**: 3 (cloud thresholds, KPI thresholds, placeholder data)
|
|
- **Medium/Low**: Various code cleanup and documentation items
|
|
|
|
**Best Practices Found**:
|
|
- `growth_model_utils.R` - Exemplary code (A+)
|
|
- `ci_extraction_utils.R` - Excellent parameterization (A+)
|
|
- R scripts generally better than Python script
|
|
|
|
**Critical Issues**:
|
|
1. 🚨 **SECURITY**: API credentials hardcoded in `01_planet_download.py`
|
|
2. ⚠️ **CONSISTENCY**: Different thresholds for same metrics in `crop_messaging_utils.R` vs `kpi_utils.R`
|
|
3. ⚠️ **HARDCODED VALUES**: Cloud coverage thresholds (5%, 45%) embedded in code
|
|
4. ⚠️ **HARDCODED VALUES**: All KPI classification thresholds in `case_when` statements
|
|
5. ⚠️ **PLACEHOLDER DATA**: Field sizes generated randomly instead of calculated
|
|
|
|
### 3. System Architecture Documentation Enhanced ✅
|
|
|
|
**File**: `r_app/system_architecture/system_architecture.md`
|
|
|
|
**Added Sections**:
|
|
|
|
#### A. Detailed Data Flow Documentation
|
|
- 8 processing stages with full details
|
|
- Inputs, outputs, and intermediate data for each stage
|
|
- Parameters and thresholds used at each step
|
|
- File naming conventions and directory structure
|
|
- Database vs file system storage decisions
|
|
|
|
#### B. Comprehensive Pipeline Diagram
|
|
- New Mermaid diagram showing complete data flow
|
|
- All 6 processing stages visualized
|
|
- Intermediate data products shown
|
|
- Parameters annotated on diagram
|
|
- Color-coded by stage type
|
|
|
|
#### C. Data Transformation Tracking
|
|
- How data changes format at each stage
|
|
- Wide ↔ long format conversions
|
|
- Raster → statistics extractions
|
|
- 4-band → 5-band transformations
|
|
- Daily → weekly aggregations
|
|
|
|
#### D. Parameters Reference Table
|
|
Complete listing of:
|
|
- Resolution settings
|
|
- Threshold values
|
|
- Cloud coverage limits
|
|
- KPI classification boundaries
|
|
- Temporal parameters (days, weeks)
|
|
|
|
---
|
|
|
|
## Key Insights for Your Colleague
|
|
|
|
### Understanding the Data Flow
|
|
|
|
1. **Start Point**: Raw satellite images (4 bands: R, G, B, NIR)
|
|
2. **First Transform**: Calculate CI = NIR/Green - 1 → 5-band rasters
|
|
3. **Second Transform**: Extract statistics per field → RDS files
|
|
4. **Third Transform**: Interpolate sparse data → continuous growth model
|
|
5. **Fourth Transform**: Composite daily images → weekly mosaics
|
|
6. **Fifth Transform**: Calculate 6 KPIs from mosaics + growth model
|
|
7. **Final Output**: Word/HTML reports with visualizations
|
|
|
|
### Where to Make Changes
|
|
|
|
**If you want to change...**
|
|
|
|
1. **Cloud coverage tolerance**:
|
|
- Currently: 5% (strict), 45% (relaxed)
|
|
- File: `mosaic_creation_utils.R` lines 158-159
|
|
- Recommendation: Move to `parameters_project.R`
|
|
|
|
2. **KPI thresholds** (field uniformity, weed risk, etc.):
|
|
- Currently: Hardcoded in `kpi_utils.R` `case_when` statements
|
|
- Recommendation: Create `analysis_constants.R` file
|
|
- Will affect reporting and classification
|
|
|
|
3. **Satellite resolution**:
|
|
- Currently: 3 meters/pixel
|
|
- File: `01_planet_download.py` line 126
|
|
- Recommendation: Add to config or command-line arg
|
|
|
|
4. **CI formula**:
|
|
- Currently: `(NIR / Green) - 1`
|
|
- File: `ci_extraction_utils.R` line 92
|
|
- Note: This is agronomically specific, change with caution
|
|
|
|
5. **Week numbering system**:
|
|
- Currently: ISO 8601 weeks
|
|
- Files: All mosaic and KPI scripts
|
|
- Note: Would require changes across multiple scripts
|
|
|
|
### Intermediate Data You Can Inspect
|
|
|
|
All stored in: `laravel_app/storage/app/{project}/`
|
|
|
|
1. **Raw daily images**: `merged_tif/{date}.tif` (after download)
|
|
2. **Processed CI rasters**: `merged_final_tif/{date}.tif` (after CI extraction)
|
|
3. **Daily CI statistics**: `Data/extracted_ci/daily_vals/extracted_{date}.rds`
|
|
4. **Cumulative CI data**: `Data/extracted_ci/cumulative_vals/combined_CI_data.rds`
|
|
5. **Growth model**: `Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds`
|
|
6. **Weekly mosaics**: `weekly_mosaic/week_{WW}_{YYYY}.tif`
|
|
7. **KPI results**: `reports/kpis/kpi_results_week{WW}.rds`
|
|
|
|
### Parameters That Control Behavior
|
|
|
|
**Download Stage**:
|
|
- `DAYS` env var → lookback period
|
|
- `DATE` env var → end date
|
|
- `PROJECT_DIR` env var → which project
|
|
- `resolution = 3` → image resolution
|
|
|
|
**CI Extraction Stage**:
|
|
- `end_date` arg → processing date
|
|
- `offset` arg → days to look back
|
|
- `project_dir` arg → project name
|
|
- `min_valid_pixels = 100` → quality threshold
|
|
|
|
**Mosaic Stage**:
|
|
- `CLOUD_THRESHOLD_STRICT = 5%` → preferred images
|
|
- `CLOUD_THRESHOLD_RELAXED = 45%` → acceptable images
|
|
- ISO week numbering → file naming
|
|
|
|
**KPI Stage**:
|
|
- `CV < 0.15` → Excellent uniformity
|
|
- `CV < 0.25` → Good uniformity
|
|
- `>2.0 CI/week` → Weed detection
|
|
- `240 days` → Canopy closure age
|
|
- `±0.5 CI` → Significant change
|
|
|
|
---
|
|
|
|
## Recommendations for Improvement
|
|
|
|
### Immediate Actions (Before Next Run)
|
|
|
|
1. **Fix API credentials**: Move to environment variables
|
|
```bash
|
|
export SENTINEL_HUB_CLIENT_ID="your-id-here"
|
|
export SENTINEL_HUB_CLIENT_SECRET="your-secret-here"
|
|
```
|
|
|
|
2. **Unify thresholds**: Create shared constants file
|
|
```r
|
|
# r_app/analysis_constants.R
|
|
UNIFORMITY_EXCELLENT <- 0.15
|
|
UNIFORMITY_GOOD <- 0.25
|
|
UNIFORMITY_MODERATE <- 0.35
|
|
# ... etc
|
|
```
|
|
|
|
### Short-Term Improvements
|
|
|
|
1. **Extract cloud thresholds** to configuration
|
|
2. **Replace placeholder field sizes** with actual calculations
|
|
3. **Add validation** for input data (dates, files exist, etc.)
|
|
4. **Clean up commented code** throughout
|
|
|
|
### Long-Term Enhancements
|
|
|
|
1. **Configuration system**: YAML/JSON for project-specific settings
|
|
2. **Unit tests**: For utility functions
|
|
3. **Logging improvements**: More detailed progress tracking
|
|
4. **Documentation**: Add agronomic justification for thresholds
|
|
|
|
---
|
|
|
|
## Files Created/Modified
|
|
|
|
### Created:
|
|
1. ✅ `r_app/system_architecture/QUALITY_CHECK_REPORT.md` (comprehensive quality analysis)
|
|
2. ✅ `r_app/system_architecture/REVIEW_SUMMARY.md` (this file)
|
|
|
|
### Modified:
|
|
1. ✅ `r_app/system_architecture/system_architecture.md`:
|
|
- Added detailed data flow section (8 stages)
|
|
- Added comprehensive pipeline diagram
|
|
- Added parameters reference table
|
|
- Added data transformation tracking
|
|
- Added file system structure
|
|
|
|
---
|
|
|
|
## Next Steps
|
|
|
|
### For You (Timon):
|
|
1. Review `QUALITY_CHECK_REPORT.md` for detailed findings
|
|
2. Prioritize urgent fixes (API credentials, threshold consolidation)
|
|
3. Decide on configuration approach (constants file vs YAML)
|
|
4. Plan timeline for improvements
|
|
|
|
### For Your Colleague:
|
|
1. Read updated `system_architecture.md` for full system understanding
|
|
2. Use the data flow diagram to trace processing steps
|
|
3. Refer to "Where to Make Changes" section when modifying code
|
|
4. Check "Intermediate Data" section when debugging
|
|
|
|
### For the Team:
|
|
1. Discuss threshold standardization approach
|
|
2. Review and approve configuration strategy
|
|
3. Plan testing for any threshold changes
|
|
4. Document agronomic basis for current thresholds
|
|
|
|
---
|
|
|
|
## Questions Answered
|
|
|
|
✅ **Are all functions actual functions?**
|
|
Yes! Functions are well-parameterized. Only minor issues found (mostly constant definitions).
|
|
|
|
✅ **Is there hardcoded data in functions?**
|
|
Some hardcoded thresholds in `kpi_utils.R` case_when statements. Most other functions are clean.
|
|
|
|
✅ **Can graphs work on anything?**
|
|
Yes, visualization functions accept data as parameters, no hardcoded columns.
|
|
|
|
✅ **What data flows where?**
|
|
Fully documented in updated system_architecture.md with detailed 8-stage pipeline.
|
|
|
|
✅ **What parameters are used?**
|
|
Complete reference table added showing all configurable parameters by stage.
|
|
|
|
✅ **Where are intermediate steps saved?**
|
|
Full file system structure documented with all intermediate data locations.
|
|
|
|
✅ **Where can changes be made?**
|
|
"Where to Make Changes" section provides specific files and line numbers.
|
|
|
|
---
|
|
|
|
## Contact
|
|
|
|
For questions about this review:
|
|
- Review created by: GitHub Copilot
|
|
- Date: October 14, 2025
|
|
- Based on SmartCane codebase version as of Oct 2025
|
|
|
|
---
|
|
|
|
**End of Summary**
|