SmartCane/webapps/docs/QUALITY_CHECK_REPORT.md

# SmartCane Code Quality Check Report
**Date**: 2025-10-14
**Reviewer**: GitHub Copilot
**Scope**: Main processing scripts and utilities

## Executive Summary

The SmartCane codebase demonstrates **generally good quality** with well-structured utility functions, proper error handling, and clear separation of concerns. However, there are **several areas for improvement** including hardcoded values, inconsistent parameterization, and opportunities for better code reusability.

**Overall Grade**: B+ (Good, with room for improvement)

---

## Detailed Findings by Script

### 1. Python: `01_planet_download.py` / `.ipynb`

#### ✅ Strengths
- **Good parameterization**: Most values pulled from environment variables (`os.environ.get()`)
- **Flexible date handling**: Supports both `DATE` env var and `Sys.Date()` fallback
- **Well-structured functions**: Clear separation (`get_true_color_request_day`, `download_function`, `merge_files`)
- **Error handling**: Try-catch blocks for file operations
- **Reusable**: Functions accept parameters rather than hardcoding

#### ⚠️ Issues Found

**HARDCODED VALUES**:
```python
# Line ~13-14: API Credentials HARDCODED (SECURITY RISK!)
config.sh_client_id = '1a72d811-4f0e-4447-8282-df09608cff44'
config.sh_client_secret = 'FcBlRL29i9ZmTzhmKTv1etSMFs5PxSos'
```
**Impact**: CRITICAL - Credentials exposed in code
**Fix**: Move to environment variables or config file

```python
# Line ~16-17: Collection ID hardcoded
collection_id = 'c691479f-358c-46b1-b0f0-e12b70a9856c'
```
**Impact**: Medium - Would need code change for different collections
**Fix**: Add to configuration

```python
# Line ~22-23: Default project hardcoded
project = 'kibos' # or xinavane or chemba_test_8b
```
**Impact**: Low - Has `os.getenv('PROJECT_DIR', project)` fallback
**Recommendation**: Remove commented projects, keep only default

```python
# Line ~25: Days default hardcoded
days = 9 # change back to 28 which is the default. 3 years is 1095 days.
```
**Impact**: Low - Already handled by `os.environ.get("DAYS", days)`
**Recommendation**: Clean up comments, set consistent default

```python
# Line ~126: Resolution hardcoded
resolution = 3
```
**Impact**: Medium - Should be configurable per project
**Fix**: Add as parameter or config variable

**JUPYTER NOTEBOOK SPECIFIC ISSUES**:
```python
# Line ~88 (notebook): Hardcoded project
project = 'esa' #or xinavane or chemba_test_8b
```
**Impact**: Medium - Requires manual editing for each run
**Fix**: Use input cell or environment variable

```python
# Line ~40 (notebook): Conditional geojson path
geojson_file = Path(BASE_PATH /'Data'/ ('pivot_2.geojson' if project == "esa" else 'pivot.geojson'))
```
**Impact**: Low - Project-specific logic, acceptable
**Note**: Consider documenting why ESA needs different file

#### 🔧 Recommendations
1. **URGENT**: Move API credentials to environment variables
2. **HIGH**: Parameterize resolution via config or command-line arg
3. **MEDIUM**: Clean up commented code and standardize defaults
4. **LOW**: Add docstrings to all functions

---

### 2. R: `02_ci_extraction.R`

#### ✅ Strengths
- **Excellent parameterization**: All major values from command-line args
- **Good defaults**: Sensible fallbacks for missing arguments
- **Error handling**: Proper try-catch blocks with fallback paths
- **Modular design**: Clear separation into utils file
- **Logging**: Uses `log_message()` throughout

#### ⚠️ Issues Found

**NONE SIGNIFICANT** - This script is well-written!

Minor observations:
```r
# Line ~40: Commented default date
#end_date <- "2023-10-01"
```
**Impact**: None - Just cleanup needed
**Fix**: Remove or move to documentation

```r
# Line ~64: Assignment to global environment
assign("ci_extraction_script", ci_extraction_script, envir = .GlobalEnv)
```
**Impact**: Low - Legitimate use for flag passing
**Note**: This is acceptable for script orchestration

#### 🔧 Recommendations
1. **LOW**: Remove commented code
2. **LOW**: Consider adding validation for date ranges (e.g., not in future)

---

### 3. R: `ci_extraction_utils.R`

#### ✅ Strengths
- **Pure functions**: No hardcoded data in function bodies!
- **Excellent documentation**: Comprehensive roxygen comments
- **Error handling**: Proper validation and try-catch blocks
- **Terra optimization**: Uses `terra::global()` for efficiency
- **Reusable**: Functions accept all required parameters

#### ⚠️ Issues Found

**POTENTIAL ISSUE**:
```r
# Line ~85: Hardcoded band names assumption
names(loaded_raster) <- c("Red", "Green", "Blue", "NIR")
```
**Impact**: Low - Assumes 4-band raster structure
**Context**: This is CORRECT for Planet data
**Recommendation**: Add validation to check `terra::nlyr(loaded_raster) >= 4`

```r
# Line ~92: Hardcoded CI calculation
CI <- loaded_raster$NIR / loaded_raster$Green - 1
```
**Impact**: None - This is the CI formula
**Note**: Consider adding comment explaining formula choice

**MINOR CONCERN**:
```r
# Line ~145: Hardcoded min_valid_pixels default
min_valid_pixels = 100
```
**Impact**: Low - Good default, but could be configurable
**Recommendation**: Consider making this a parameter in `parameters_project.R`

#### 🔧 Recommendations
1. **MEDIUM**: Add band count validation before naming
2. **LOW**: Document CI formula with reference
3. **LOW**: Consider making `min_valid_pixels` configurable

---

### 4. R: `03_interpolate_growth_model.R`

#### ✅ Strengths
- **Clean script structure**: Minimal code, delegates to utils
- **Good arg handling**: Command-line args with defaults
- **Proper sourcing**: Handles both default and r_app paths
- **Global env assignment**: Justified for config passing

#### ⚠️ Issues Found

**NONE SIGNIFICANT** - Well-designed script!

Minor note:
```r
# Line ~66: Hardcoded filename
"All_pivots_Cumulative_CI_quadrant_year_v2.rds"
```
**Impact**: None - This is the OUTPUT filename
**Note**: Consider making this a constant in `parameters_project.R`

#### 🔧 Recommendations
1. **LOW**: Move output filenames to parameters file for consistency
2. **LOW**: Add progress messages during year processing loop

---

### 5. R: `growth_model_utils.R`

#### ✅ Strengths
- **EXCELLENT**: No hardcoded values in function bodies
- **Pure functions**: All data passed as parameters
- **Good error handling**: Comprehensive try-catch blocks
- **Detailed logging**: Informative messages at each step
- **Well-documented**: Clear roxygen comments

#### ⚠️ Issues Found

**NONE** - This is exemplary code!

#### 🔧 Recommendations
**No changes needed** - This file serves as a model for others

---

### 6. R: `04_mosaic_creation.R`

#### ✅ Strengths
- **Good parameterization**: Command-line args with defaults
- **Flexible**: Custom output filename support
- **Clear delegation**: Processing logic in utils

#### ⚠️ Issues Found

**COMMENTED DEFAULTS**:
```r
# Line ~27: Commented test dates
#end_date <- "2025-07-22"  # Default date for testing
#end_date <-  "2025-07-08"   # Default date for testing
```
**Impact**: None - Just cleanup needed
**Recommendation**: Remove or move to comments block

**WEEK CALCULATION**:
```r
# mosaic_creation_utils.R Line ~37: ISO week vs regular week
week <- lubridate::isoweek(end_date)
year <- lubridate::isoyear(end_date)
```
**Impact**: None - Consistent use of ISO weeks
**Note**: Ensure ALL scripts use same week system (they do)

#### 🔧 Recommendations
1. **LOW**: Clean up commented test dates
2. **LOW**: Add comment explaining ISO week choice

---

### 7. R: `mosaic_creation_utils.R`

#### ✅ Strengths
- **No hardcoded data**: All parameters passed to functions
- **Sophisticated cloud handling**: Multi-threshold approach
- **Terra optimized**: Efficient raster operations
- **Good documentation**: Clear function purposes

#### ⚠️ Issues Found

**MAGIC NUMBERS** (Cloud thresholds):
```r
# Line ~158-159: Cloud coverage thresholds
missing_pixels_df$thres_5perc[i] <- as.integer(missing_pixels_df$missing_pixels_percentage[i] < 5)
missing_pixels_df$thres_40perc[i] <- as.integer(missing_pixels_df$missing_pixels_percentage[i] < 45)
```
**Impact**: Medium - These thresholds determine mosaic quality
**Fix**: Move to `parameters_project.R` as configurable constants:
```r
CLOUD_THRESHOLD_STRICT <- 5  # Percent
CLOUD_THRESHOLD_RELAXED <- 45  # Percent
```

**REPEATED LOGIC**:
The cloud mask selection logic (lines ~203-269) has substantial code duplication across different threshold conditions.
**Recommendation**: Extract to helper function:
```r
create_mosaic_from_rasters <- function(raster_list, cloud_masks = NULL) { ... }
```

#### 🔧 Recommendations
1. **HIGH**: Extract cloud thresholds to constants
2. **MEDIUM**: Reduce code duplication in mosaic creation logic
3. **LOW**: Add more detailed logging for which images are selected

---

### 8. R: `09_calculate_kpis.R`

#### ✅ Strengths
- **Good structure**: Clean main() function
- **Proper delegation**: All logic in utils
- **Comprehensive**: Handles all 6 KPIs
- **Good error messages**: Clear feedback

#### ⚠️ Issues Found

**NONE SIGNIFICANT** - Well-designed orchestration script

#### 🔧 Recommendations
1. **LOW**: Consider adding KPI validation (e.g., check for required input files)

---

### 9. R: `kpi_utils.R`

#### ✅ Strengths
- **Mostly parameterized**: Functions accept necessary inputs
- **Good documentation**: Clear function purposes
- **Terra optimization**: Efficient raster operations

#### ⚠️ Issues Found

**HARDCODED THRESHOLDS** (Multiple locations):

```r
# Line ~57: Field uniformity thresholds
uniformity_level <- dplyr::case_when(
  cv_value < 0.15 ~ "Excellent",
  cv_value < 0.25 ~ "Good",
  cv_value < 0.35 ~ "Moderate",
  TRUE ~ "Poor"
)
```
**Impact**: HIGH - Core KPI classification logic
**Fix**: Move to constants at file top:
```r
UNIFORMITY_EXCELLENT_THRESHOLD <- 0.15
UNIFORMITY_GOOD_THRESHOLD <- 0.25
UNIFORMITY_MODERATE_THRESHOLD <- 0.35
```

```r
# Line ~598: Weed rapid growth threshold (CHANGED FROM 1.5)
rapid_growth_pixels <- sum(ci_change > 2.0)
```
**Impact**: HIGH - Core weed detection logic
**Fix**: Add to constants:
```r
WEED_RAPID_GROWTH_THRESHOLD <- 2.0  # CI units per week
```

```r
# Line ~606: Weed risk thresholds (UPDATED)
weed_risk <- dplyr::case_when(
  rapid_growth_pct < 10 ~ "Low",
  rapid_growth_pct < 25 ~ "Moderate",
  TRUE ~ "High"
)
```
**Impact**: HIGH - Core weed KPI classification
**Fix**: Add to constants:
```r
WEED_RISK_LOW_THRESHOLD <- 10  # Percent
WEED_RISK_MODERATE_THRESHOLD <- 25  # Percent
```

```r
# Line ~560: Field age threshold for weed analysis
if (!is.na(field_age) && field_age >= 240) {
```
**Impact**: MEDIUM - Determines weed analysis scope
**Fix**: Add to constants:
```r
CANOPY_CLOSURE_AGE_DAYS <- 240  # 8 months
```

```r
# Line ~545: Growth decline risk thresholds
risk_level <- dplyr::case_when(
  risk_score < 0.5 ~ "Low",
  risk_score < 1.5 ~ "Moderate",
  risk_score < 3.0 ~ "High",
  TRUE ~ "Very-high"
)
```
**Impact**: HIGH - Core decline KPI
**Fix**: Add to constants

**MAGIC NUMBER** (Projection):
```r
# Line ~116, ~447: Equal Earth projection
"EPSG:6933"
```
**Impact**: Low - Consistent across codebase
**Recommendation**: Add constant:
```r
EQUAL_AREA_PROJECTION <- "EPSG:6933"  # Equal Earth for area calculations
```

**PLACEHOLDER DATA**:
```r
# Line ~864: Fallback field sizes
field_details$`Field Size (ha)` <- round(runif(nrow(field_details), 2.5, 4.5), 1)
```
**Impact**: MEDIUM - Creates fake data if real data missing
**Context**: Comment says "placeholder - can be calculated from field boundaries"
**Fix**: Actually calculate from `field_boundaries`:
```r
field_details$`Field Size (ha)` <- calculate_field_areas(field_boundaries_sf)
```

#### 🔧 Recommendations
1. **URGENT**: Extract ALL KPI thresholds to constants section
2. **HIGH**: Replace placeholder field sizes with actual calculations
3. **MEDIUM**: Add validation that thresholds make logical sense (e.g., GOOD < MODERATE)
4. **LOW**: Add comments explaining threshold choices (agronomic basis)

---

### 10. R: `crop_messaging_utils.R`

#### ✅ Strengths
- **EXCELLENT constants section**: Thresholds defined at top (lines ~28-43)
- **Good documentation**: Clear threshold purposes
- **Pure functions**: Parameters passed correctly
- **Comprehensive spatial analysis**: Moran's I, entropy, CV

#### ⚠️ Issues Found

**INCONSISTENCY WITH KPI_UTILS**:
```r
# crop_messaging_utils.R:
CI_CHANGE_INCREASE_THRESHOLD <- 0.5
UNIFORMITY_THRESHOLD <- 0.15
EXCELLENT_UNIFORMITY_THRESHOLD <- 0.08
POOR_UNIFORMITY_THRESHOLD <- 0.25

# kpi_utils.R: (HARDCODED IN CASE_WHEN)
cv_value < 0.15 ~ "Excellent"
cv_value < 0.25 ~ "Good"
cv_value < 0.35 ~ "Moderate"
```
**Impact**: HIGH - Different thresholds for same metrics!
**Fix**: Create SHARED constants file:
```r
# r_app/analysis_constants.R
UNIFORMITY_EXCELLENT <- 0.15
UNIFORMITY_GOOD <- 0.25
UNIFORMITY_MODERATE <- 0.35
```

#### 🔧 Recommendations
1. **URGENT**: Unify threshold constants across `crop_messaging_utils.R` and `kpi_utils.R`
2. **HIGH**: Create shared constants file: `r_app/analysis_constants.R`
3. **MEDIUM**: Source this file in both utils files

---

### 11. R: `parameters_project.R`

#### ✅ Strengths
- **Centralized configuration**: Single source of truth for paths
- **Clean initialization**: Well-structured setup functions
- **Good error handling**: Fallback for missing files
- **CRS validation**: Proper handling of spatial reference systems

#### ⚠️ Issues Found

**HARDCODED HARVEST DATA COLUMNS**:
```r
# Line ~106-113: Column selection and renaming
dplyr::select(c(
  "field", "sub_field", "year", "season_start",
  "season_end", "age", "sub_area", "tonnage_ha"
))
```
**Impact**: Medium - Assumes exact Excel column names
**Recommendation**: Add validation or flexible column matching

**CONDITIONAL LOGIC**:
```r
# Line ~52-56: Project-specific file selection
use_pivot_2 <- exists("project_dir") && project_dir == "esa" &&
               exists("ci_extraction_script")
```
**Impact**: Low - Works but fragile
**Recommendation**: Consider configuration file:
```yaml
projects:
  esa:
    field_boundaries: pivot_2.geojson
  default:
    field_boundaries: pivot.geojson
```

#### 🔧 Recommendations
1. **MEDIUM**: Add harvest data column validation
2. **LOW**: Consider YAML/JSON config for project-specific settings
3. **LOW**: Add project directory validation (exists, writable)

---

## Summary of Critical Issues

### 🚨 URGENT (Fix Immediately)
1. **Security**: API credentials hardcoded in `01_planet_download.py` (lines 13-14)
2. **Consistency**: Threshold mismatch between `crop_messaging_utils.R` and `kpi_utils.R`

### ⚠️ HIGH Priority (Fix Soon)
1. **Parameterization**: Cloud coverage thresholds in `mosaic_creation_utils.R` (lines 158-159)
2. **Parameterization**: All KPI thresholds in `kpi_utils.R` should be constants
3. **Data Quality**: Replace placeholder field sizes with actual calculations in `kpi_utils.R` (line 864)

### 📋 MEDIUM Priority (Improvement)
1. **Configuration**: Make resolution parameterizable in `01_planet_download.py`
2. **Code Quality**: Reduce duplication in mosaic creation logic
3. **Validation**: Add harvest data column validation in `parameters_project.R`

### 💡 LOW Priority (Nice to Have)
1. **Cleanup**: Remove commented code throughout
2. **Documentation**: Add formula references and threshold justifications
3. **Structure**: Consider shared configuration file for project-specific settings

---

## Recommendations for Improvement

### 1. Create Shared Constants File
```r
# r_app/analysis_constants.R

# Crop Index Change Thresholds
CI_CHANGE_INCREASE_THRESHOLD <- 0.5
CI_CHANGE_DECREASE_THRESHOLD <- -0.5

# Field Uniformity Thresholds (CV as decimal)
UNIFORMITY_EXCELLENT <- 0.15
UNIFORMITY_GOOD <- 0.25
UNIFORMITY_MODERATE <- 0.35

# Cloud Coverage Thresholds (percent)
CLOUD_THRESHOLD_STRICT <- 5
CLOUD_THRESHOLD_RELAXED <- 45

# KPI Thresholds
WEED_RAPID_GROWTH_THRESHOLD <- 2.0  # CI units per week
WEED_RISK_LOW_THRESHOLD <- 10  # Percent
WEED_RISK_MODERATE_THRESHOLD <- 25  # Percent
CANOPY_CLOSURE_AGE_DAYS <- 240  # 8 months

# Spatial Analysis Thresholds
MORAN_THRESHOLD_HIGH <- 0.95
MORAN_THRESHOLD_MODERATE <- 0.85
MORAN_THRESHOLD_LOW <- 0.7

# Projections
EQUAL_AREA_PROJECTION <- "EPSG:6933"  # Equal Earth for area calculations
WGS84_PROJECTION <- "EPSG:4326"  # WGS84 for lat/lon
```

### 2. Security Improvements
```python
# 01_planet_download.py - Use environment variables
import os
config.sh_client_id = os.getenv('SENTINEL_HUB_CLIENT_ID')
config.sh_client_secret = os.getenv('SENTINEL_HUB_CLIENT_SECRET')

if not config.sh_client_id or not config.sh_client_secret:
    raise ValueError("Sentinel Hub credentials not found in environment variables")
```

### 3. Configuration File Structure
```yaml
# config/projects.yaml
default:
  field_boundaries: pivot.geojson
  resolution: 3
  cloud_threshold_strict: 5
  cloud_threshold_relaxed: 45

esa:
  field_boundaries: pivot_2.geojson
  resolution: 3
  cloud_threshold_strict: 5
  cloud_threshold_relaxed: 45
  notes: "Uses pivot_2.geojson for yield prediction extra fields"

kibos:
  field_boundaries: pivot.geojson
  resolution: 3
```

---

## Testing Recommendations

1. **Unit Tests**: Add tests for utility functions
   - `calculate_cv()` with known datasets
   - `calculate_spatial_autocorrelation()` with synthetic fields
   - `categorize_change()` with boundary cases

2. **Integration Tests**: Test full workflows
   - CI extraction with test imagery
   - Mosaic creation with varying cloud cover
   - KPI calculation with known inputs

3. **Validation**: Add data validation
   - Date ranges (not in future, reasonable past)
   - File existence checks before processing
   - CRS compatibility verification

---

## Code Quality Metrics

| Script | Hardcoded Values | Function Quality | Error Handling | Documentation | Grade |
|--------|------------------|------------------|----------------|---------------|-------|
| `01_planet_download.py` | 5 issues | Good | Good | Minimal | B- |
| `02_ci_extraction.R` | 0 issues | Excellent | Excellent | Good | A |
| `ci_extraction_utils.R` | 1 minor | Excellent | Excellent | Excellent | A+ |
| `03_interpolate_growth_model.R` | 0 issues | Excellent | Excellent | Good | A |
| `growth_model_utils.R` | 0 issues | Excellent | Excellent | Excellent | A+ |
| `04_mosaic_creation.R` | 0 issues | Good | Good | Good | A- |
| `mosaic_creation_utils.R` | 2 issues | Good | Good | Good | B+ |
| `09_calculate_kpis.R` | 0 issues | Good | Good | Good | A- |
| `kpi_utils.R` | 8 issues | Good | Good | Good | B |
| `crop_messaging_utils.R` | 1 issue | Excellent | Excellent | Excellent | A |
| `parameters_project.R` | 1 issue | Excellent | Excellent | Good | A |

**Overall Assessment**: The codebase is **well-structured with good practices**, but needs **threshold consolidation** and **security fixes** for API credentials.

---

## Conclusion

The SmartCane codebase demonstrates solid software engineering practices with good separation of concerns, comprehensive error handling, and mostly well-parameterized functions. The primary areas for improvement are:

1. **Consolidating threshold constants** into a shared file
2. **Securing API credentials** via environment variables
3. **Removing hardcoded magic numbers** from KPI calculations
4. **Cleaning up commented code** and test values

The R code quality is particularly strong, with excellent examples in `growth_model_utils.R` and `ci_extraction_utils.R` that could serve as templates for other modules.