SmartCane/webapps/docs/REVIEW_SUMMARY.md

# SmartCane Code Review & Architecture Update Summary

**Date**: 2025-10-14
**Reviewer**: GitHub Copilot
**Requested by**: Timon

---

## What Was Done

### 1. Comprehensive Quality Check ✅

Reviewed all main processing scripts and their utility functions:
- ✅ `01_planet_download.py` / `.ipynb`
- ✅ `02_ci_extraction.R` + `ci_extraction_utils.R`
- ✅ `03_interpolate_growth_model.R` + `growth_model_utils.R`
- ✅ `04_mosaic_creation.R` + `mosaic_creation_utils.R`
- ✅ `09_calculate_kpis.R` + `kpi_utils.R`
- ✅ `crop_messaging_utils.R`
- ✅ `parameters_project.R`

**Focus Areas**:
- Hardcoded values vs parameterization
- Function purity (no embedded data)
- Code reusability
- Error handling
- Documentation quality

### 2. Quality Check Report Created ✅

**File**: `r_app/system_architecture/QUALITY_CHECK_REPORT.md`

**Key Findings**:
- **Overall Grade**: B+ (Good, with room for improvement)
- **Urgent Issues**: 2 (API credentials, threshold inconsistency)
- **High Priority**: 3 (cloud thresholds, KPI thresholds, placeholder data)
- **Medium/Low**: Various code cleanup and documentation items

**Best Practices Found**:
- `growth_model_utils.R` - Exemplary code (A+)
- `ci_extraction_utils.R` - Excellent parameterization (A+)
- R scripts generally better than Python script

**Critical Issues**:
1. 🚨 **SECURITY**: API credentials hardcoded in `01_planet_download.py`
2. ⚠️ **CONSISTENCY**: Different thresholds for same metrics in `crop_messaging_utils.R` vs `kpi_utils.R`
3. ⚠️ **HARDCODED VALUES**: Cloud coverage thresholds (5%, 45%) embedded in code
4. ⚠️ **HARDCODED VALUES**: All KPI classification thresholds in `case_when` statements
5. ⚠️ **PLACEHOLDER DATA**: Field sizes generated randomly instead of calculated

### 3. System Architecture Documentation Enhanced ✅

**File**: `r_app/system_architecture/system_architecture.md`

**Added Sections**:

#### A. Detailed Data Flow Documentation
- 8 processing stages with full details
- Inputs, outputs, and intermediate data for each stage
- Parameters and thresholds used at each step
- File naming conventions and directory structure
- Database vs file system storage decisions

#### B. Comprehensive Pipeline Diagram
- New Mermaid diagram showing complete data flow
- All 6 processing stages visualized
- Intermediate data products shown
- Parameters annotated on diagram
- Color-coded by stage type

#### C. Data Transformation Tracking
- How data changes format at each stage
- Wide ↔ long format conversions
- Raster → statistics extractions
- 4-band → 5-band transformations
- Daily → weekly aggregations

#### D. Parameters Reference Table
Complete listing of:
- Resolution settings
- Threshold values
- Cloud coverage limits
- KPI classification boundaries
- Temporal parameters (days, weeks)

---

## Key Insights for Your Colleague

### Understanding the Data Flow

1. **Start Point**: Raw satellite images (4 bands: R, G, B, NIR)
2. **First Transform**: Calculate CI = NIR/Green - 1 → 5-band rasters
3. **Second Transform**: Extract statistics per field → RDS files
4. **Third Transform**: Interpolate sparse data → continuous growth model
5. **Fourth Transform**: Composite daily images → weekly mosaics
6. **Fifth Transform**: Calculate 6 KPIs from mosaics + growth model
7. **Final Output**: Word/HTML reports with visualizations

### Where to Make Changes

**If you want to change...**

1. **Cloud coverage tolerance**:
   - Currently: 5% (strict), 45% (relaxed)
   - File: `mosaic_creation_utils.R` lines 158-159
   - Recommendation: Move to `parameters_project.R`

2. **KPI thresholds** (field uniformity, weed risk, etc.):
   - Currently: Hardcoded in `kpi_utils.R` `case_when` statements
   - Recommendation: Create `analysis_constants.R` file
   - Will affect reporting and classification

3. **Satellite resolution**:
   - Currently: 3 meters/pixel
   - File: `01_planet_download.py` line 126
   - Recommendation: Add to config or command-line arg

4. **CI formula**:
   - Currently: `(NIR / Green) - 1`
   - File: `ci_extraction_utils.R` line 92
   - Note: This is agronomically specific, change with caution

5. **Week numbering system**:
   - Currently: ISO 8601 weeks
   - Files: All mosaic and KPI scripts
   - Note: Would require changes across multiple scripts

### Intermediate Data You Can Inspect

All stored in: `laravel_app/storage/app/{project}/`

1. **Raw daily images**: `merged_tif/{date}.tif` (after download)
2. **Processed CI rasters**: `merged_final_tif/{date}.tif` (after CI extraction)
3. **Daily CI statistics**: `Data/extracted_ci/daily_vals/extracted_{date}.rds`
4. **Cumulative CI data**: `Data/extracted_ci/cumulative_vals/combined_CI_data.rds`
5. **Growth model**: `Data/extracted_ci/cumulative_vals/All_pivots_Cumulative_CI_quadrant_year_v2.rds`
6. **Weekly mosaics**: `weekly_mosaic/week_{WW}_{YYYY}.tif`
7. **KPI results**: `reports/kpis/kpi_results_week{WW}.rds`

### Parameters That Control Behavior

**Download Stage**:
- `DAYS` env var → lookback period
- `DATE` env var → end date
- `PROJECT_DIR` env var → which project
- `resolution = 3` → image resolution

**CI Extraction Stage**:
- `end_date` arg → processing date
- `offset` arg → days to look back
- `project_dir` arg → project name
- `min_valid_pixels = 100` → quality threshold

**Mosaic Stage**:
- `CLOUD_THRESHOLD_STRICT = 5%` → preferred images
- `CLOUD_THRESHOLD_RELAXED = 45%` → acceptable images
- ISO week numbering → file naming

**KPI Stage**:
- `CV < 0.15` → Excellent uniformity
- `CV < 0.25` → Good uniformity
- `>2.0 CI/week` → Weed detection
- `240 days` → Canopy closure age
- `±0.5 CI` → Significant change

---

## Recommendations for Improvement

### Immediate Actions (Before Next Run)

1. **Fix API credentials**: Move to environment variables
   ```bash
   export SENTINEL_HUB_CLIENT_ID="your-id-here"
   export SENTINEL_HUB_CLIENT_SECRET="your-secret-here"
   ```

2. **Unify thresholds**: Create shared constants file
   ```r
   # r_app/analysis_constants.R
   UNIFORMITY_EXCELLENT <- 0.15
   UNIFORMITY_GOOD <- 0.25
   UNIFORMITY_MODERATE <- 0.35
   # ... etc
   ```

### Short-Term Improvements

1. **Extract cloud thresholds** to configuration
2. **Replace placeholder field sizes** with actual calculations
3. **Add validation** for input data (dates, files exist, etc.)
4. **Clean up commented code** throughout

### Long-Term Enhancements

1. **Configuration system**: YAML/JSON for project-specific settings
2. **Unit tests**: For utility functions
3. **Logging improvements**: More detailed progress tracking
4. **Documentation**: Add agronomic justification for thresholds

---

## Files Created/Modified

### Created:
1. ✅ `r_app/system_architecture/QUALITY_CHECK_REPORT.md` (comprehensive quality analysis)
2. ✅ `r_app/system_architecture/REVIEW_SUMMARY.md` (this file)

### Modified:
1. ✅ `r_app/system_architecture/system_architecture.md`:
   - Added detailed data flow section (8 stages)
   - Added comprehensive pipeline diagram
   - Added parameters reference table
   - Added data transformation tracking
   - Added file system structure

---

## Next Steps

### For You (Timon):
1. Review `QUALITY_CHECK_REPORT.md` for detailed findings
2. Prioritize urgent fixes (API credentials, threshold consolidation)
3. Decide on configuration approach (constants file vs YAML)
4. Plan timeline for improvements

### For Your Colleague:
1. Read updated `system_architecture.md` for full system understanding
2. Use the data flow diagram to trace processing steps
3. Refer to "Where to Make Changes" section when modifying code
4. Check "Intermediate Data" section when debugging

### For the Team:
1. Discuss threshold standardization approach
2. Review and approve configuration strategy
3. Plan testing for any threshold changes
4. Document agronomic basis for current thresholds

---

## Questions Answered

✅ **Are all functions actual functions?**
Yes! Functions are well-parameterized. Only minor issues found (mostly constant definitions).

✅ **Is there hardcoded data in functions?**
Some hardcoded thresholds in `kpi_utils.R` case_when statements. Most other functions are clean.

✅ **Can graphs work on anything?**
Yes, visualization functions accept data as parameters, no hardcoded columns.

✅ **What data flows where?**
Fully documented in updated system_architecture.md with detailed 8-stage pipeline.

✅ **What parameters are used?**
Complete reference table added showing all configurable parameters by stage.

✅ **Where are intermediate steps saved?**
Full file system structure documented with all intermediate data locations.

✅ **Where can changes be made?**
"Where to Make Changes" section provides specific files and line numbers.

---

## Contact

For questions about this review:
- Review created by: GitHub Copilot
- Date: October 14, 2025
- Based on SmartCane codebase version as of Oct 2025

---

**End of Summary**