# SmartCane Code Quality Check Report **Date**: 2025-10-14 **Reviewer**: GitHub Copilot **Scope**: Main processing scripts and utilities ## Executive Summary The SmartCane codebase demonstrates **generally good quality** with well-structured utility functions, proper error handling, and clear separation of concerns. However, there are **several areas for improvement** including hardcoded values, inconsistent parameterization, and opportunities for better code reusability. **Overall Grade**: B+ (Good, with room for improvement) --- ## Detailed Findings by Script ### 1. Python: `01_planet_download.py` / `.ipynb` #### ✅ Strengths - **Good parameterization**: Most values pulled from environment variables (`os.environ.get()`) - **Flexible date handling**: Supports both `DATE` env var and `Sys.Date()` fallback - **Well-structured functions**: Clear separation (`get_true_color_request_day`, `download_function`, `merge_files`) - **Error handling**: Try-catch blocks for file operations - **Reusable**: Functions accept parameters rather than hardcoding #### ⚠️ Issues Found **HARDCODED VALUES**: ```python # Line ~13-14: API Credentials HARDCODED (SECURITY RISK!) config.sh_client_id = '1a72d811-4f0e-4447-8282-df09608cff44' config.sh_client_secret = 'FcBlRL29i9ZmTzhmKTv1etSMFs5PxSos' ``` **Impact**: CRITICAL - Credentials exposed in code **Fix**: Move to environment variables or config file ```python # Line ~16-17: Collection ID hardcoded collection_id = 'c691479f-358c-46b1-b0f0-e12b70a9856c' ``` **Impact**: Medium - Would need code change for different collections **Fix**: Add to configuration ```python # Line ~22-23: Default project hardcoded project = 'kibos' # or xinavane or chemba_test_8b ``` **Impact**: Low - Has `os.getenv('PROJECT_DIR', project)` fallback **Recommendation**: Remove commented projects, keep only default ```python # Line ~25: Days default hardcoded days = 9 # change back to 28 which is the default. 3 years is 1095 days. ``` **Impact**: Low - Already handled by `os.environ.get("DAYS", days)` **Recommendation**: Clean up comments, set consistent default ```python # Line ~126: Resolution hardcoded resolution = 3 ``` **Impact**: Medium - Should be configurable per project **Fix**: Add as parameter or config variable **JUPYTER NOTEBOOK SPECIFIC ISSUES**: ```python # Line ~88 (notebook): Hardcoded project project = 'esa' #or xinavane or chemba_test_8b ``` **Impact**: Medium - Requires manual editing for each run **Fix**: Use input cell or environment variable ```python # Line ~40 (notebook): Conditional geojson path geojson_file = Path(BASE_PATH /'Data'/ ('pivot_2.geojson' if project == "esa" else 'pivot.geojson')) ``` **Impact**: Low - Project-specific logic, acceptable **Note**: Consider documenting why ESA needs different file #### 🔧 Recommendations 1. **URGENT**: Move API credentials to environment variables 2. **HIGH**: Parameterize resolution via config or command-line arg 3. **MEDIUM**: Clean up commented code and standardize defaults 4. **LOW**: Add docstrings to all functions --- ### 2. R: `02_ci_extraction.R` #### ✅ Strengths - **Excellent parameterization**: All major values from command-line args - **Good defaults**: Sensible fallbacks for missing arguments - **Error handling**: Proper try-catch blocks with fallback paths - **Modular design**: Clear separation into utils file - **Logging**: Uses `log_message()` throughout #### ⚠️ Issues Found **NONE SIGNIFICANT** - This script is well-written! Minor observations: ```r # Line ~40: Commented default date #end_date <- "2023-10-01" ``` **Impact**: None - Just cleanup needed **Fix**: Remove or move to documentation ```r # Line ~64: Assignment to global environment assign("ci_extraction_script", ci_extraction_script, envir = .GlobalEnv) ``` **Impact**: Low - Legitimate use for flag passing **Note**: This is acceptable for script orchestration #### 🔧 Recommendations 1. **LOW**: Remove commented code 2. **LOW**: Consider adding validation for date ranges (e.g., not in future) --- ### 3. R: `ci_extraction_utils.R` #### ✅ Strengths - **Pure functions**: No hardcoded data in function bodies! - **Excellent documentation**: Comprehensive roxygen comments - **Error handling**: Proper validation and try-catch blocks - **Terra optimization**: Uses `terra::global()` for efficiency - **Reusable**: Functions accept all required parameters #### ⚠️ Issues Found **POTENTIAL ISSUE**: ```r # Line ~85: Hardcoded band names assumption names(loaded_raster) <- c("Red", "Green", "Blue", "NIR") ``` **Impact**: Low - Assumes 4-band raster structure **Context**: This is CORRECT for Planet data **Recommendation**: Add validation to check `terra::nlyr(loaded_raster) >= 4` ```r # Line ~92: Hardcoded CI calculation CI <- loaded_raster$NIR / loaded_raster$Green - 1 ``` **Impact**: None - This is the CI formula **Note**: Consider adding comment explaining formula choice **MINOR CONCERN**: ```r # Line ~145: Hardcoded min_valid_pixels default min_valid_pixels = 100 ``` **Impact**: Low - Good default, but could be configurable **Recommendation**: Consider making this a parameter in `parameters_project.R` #### 🔧 Recommendations 1. **MEDIUM**: Add band count validation before naming 2. **LOW**: Document CI formula with reference 3. **LOW**: Consider making `min_valid_pixels` configurable --- ### 4. R: `03_interpolate_growth_model.R` #### ✅ Strengths - **Clean script structure**: Minimal code, delegates to utils - **Good arg handling**: Command-line args with defaults - **Proper sourcing**: Handles both default and r_app paths - **Global env assignment**: Justified for config passing #### ⚠️ Issues Found **NONE SIGNIFICANT** - Well-designed script! Minor note: ```r # Line ~66: Hardcoded filename "All_pivots_Cumulative_CI_quadrant_year_v2.rds" ``` **Impact**: None - This is the OUTPUT filename **Note**: Consider making this a constant in `parameters_project.R` #### 🔧 Recommendations 1. **LOW**: Move output filenames to parameters file for consistency 2. **LOW**: Add progress messages during year processing loop --- ### 5. R: `growth_model_utils.R` #### ✅ Strengths - **EXCELLENT**: No hardcoded values in function bodies - **Pure functions**: All data passed as parameters - **Good error handling**: Comprehensive try-catch blocks - **Detailed logging**: Informative messages at each step - **Well-documented**: Clear roxygen comments #### ⚠️ Issues Found **NONE** - This is exemplary code! #### 🔧 Recommendations **No changes needed** - This file serves as a model for others --- ### 6. R: `04_mosaic_creation.R` #### ✅ Strengths - **Good parameterization**: Command-line args with defaults - **Flexible**: Custom output filename support - **Clear delegation**: Processing logic in utils #### ⚠️ Issues Found **COMMENTED DEFAULTS**: ```r # Line ~27: Commented test dates #end_date <- "2025-07-22" # Default date for testing #end_date <- "2025-07-08" # Default date for testing ``` **Impact**: None - Just cleanup needed **Recommendation**: Remove or move to comments block **WEEK CALCULATION**: ```r # mosaic_creation_utils.R Line ~37: ISO week vs regular week week <- lubridate::isoweek(end_date) year <- lubridate::isoyear(end_date) ``` **Impact**: None - Consistent use of ISO weeks **Note**: Ensure ALL scripts use same week system (they do) #### 🔧 Recommendations 1. **LOW**: Clean up commented test dates 2. **LOW**: Add comment explaining ISO week choice --- ### 7. R: `mosaic_creation_utils.R` #### ✅ Strengths - **No hardcoded data**: All parameters passed to functions - **Sophisticated cloud handling**: Multi-threshold approach - **Terra optimized**: Efficient raster operations - **Good documentation**: Clear function purposes #### ⚠️ Issues Found **MAGIC NUMBERS** (Cloud thresholds): ```r # Line ~158-159: Cloud coverage thresholds missing_pixels_df$thres_5perc[i] <- as.integer(missing_pixels_df$missing_pixels_percentage[i] < 5) missing_pixels_df$thres_40perc[i] <- as.integer(missing_pixels_df$missing_pixels_percentage[i] < 45) ``` **Impact**: Medium - These thresholds determine mosaic quality **Fix**: Move to `parameters_project.R` as configurable constants: ```r CLOUD_THRESHOLD_STRICT <- 5 # Percent CLOUD_THRESHOLD_RELAXED <- 45 # Percent ``` **REPEATED LOGIC**: The cloud mask selection logic (lines ~203-269) has substantial code duplication across different threshold conditions. **Recommendation**: Extract to helper function: ```r create_mosaic_from_rasters <- function(raster_list, cloud_masks = NULL) { ... } ``` #### 🔧 Recommendations 1. **HIGH**: Extract cloud thresholds to constants 2. **MEDIUM**: Reduce code duplication in mosaic creation logic 3. **LOW**: Add more detailed logging for which images are selected --- ### 8. R: `09_calculate_kpis.R` #### ✅ Strengths - **Good structure**: Clean main() function - **Proper delegation**: All logic in utils - **Comprehensive**: Handles all 6 KPIs - **Good error messages**: Clear feedback #### ⚠️ Issues Found **NONE SIGNIFICANT** - Well-designed orchestration script #### 🔧 Recommendations 1. **LOW**: Consider adding KPI validation (e.g., check for required input files) --- ### 9. R: `kpi_utils.R` #### ✅ Strengths - **Mostly parameterized**: Functions accept necessary inputs - **Good documentation**: Clear function purposes - **Terra optimization**: Efficient raster operations #### ⚠️ Issues Found **HARDCODED THRESHOLDS** (Multiple locations): ```r # Line ~57: Field uniformity thresholds uniformity_level <- dplyr::case_when( cv_value < 0.15 ~ "Excellent", cv_value < 0.25 ~ "Good", cv_value < 0.35 ~ "Moderate", TRUE ~ "Poor" ) ``` **Impact**: HIGH - Core KPI classification logic **Fix**: Move to constants at file top: ```r UNIFORMITY_EXCELLENT_THRESHOLD <- 0.15 UNIFORMITY_GOOD_THRESHOLD <- 0.25 UNIFORMITY_MODERATE_THRESHOLD <- 0.35 ``` ```r # Line ~598: Weed rapid growth threshold (CHANGED FROM 1.5) rapid_growth_pixels <- sum(ci_change > 2.0) ``` **Impact**: HIGH - Core weed detection logic **Fix**: Add to constants: ```r WEED_RAPID_GROWTH_THRESHOLD <- 2.0 # CI units per week ``` ```r # Line ~606: Weed risk thresholds (UPDATED) weed_risk <- dplyr::case_when( rapid_growth_pct < 10 ~ "Low", rapid_growth_pct < 25 ~ "Moderate", TRUE ~ "High" ) ``` **Impact**: HIGH - Core weed KPI classification **Fix**: Add to constants: ```r WEED_RISK_LOW_THRESHOLD <- 10 # Percent WEED_RISK_MODERATE_THRESHOLD <- 25 # Percent ``` ```r # Line ~560: Field age threshold for weed analysis if (!is.na(field_age) && field_age >= 240) { ``` **Impact**: MEDIUM - Determines weed analysis scope **Fix**: Add to constants: ```r CANOPY_CLOSURE_AGE_DAYS <- 240 # 8 months ``` ```r # Line ~545: Growth decline risk thresholds risk_level <- dplyr::case_when( risk_score < 0.5 ~ "Low", risk_score < 1.5 ~ "Moderate", risk_score < 3.0 ~ "High", TRUE ~ "Very-high" ) ``` **Impact**: HIGH - Core decline KPI **Fix**: Add to constants **MAGIC NUMBER** (Projection): ```r # Line ~116, ~447: Equal Earth projection "EPSG:6933" ``` **Impact**: Low - Consistent across codebase **Recommendation**: Add constant: ```r EQUAL_AREA_PROJECTION <- "EPSG:6933" # Equal Earth for area calculations ``` **PLACEHOLDER DATA**: ```r # Line ~864: Fallback field sizes field_details$`Field Size (ha)` <- round(runif(nrow(field_details), 2.5, 4.5), 1) ``` **Impact**: MEDIUM - Creates fake data if real data missing **Context**: Comment says "placeholder - can be calculated from field boundaries" **Fix**: Actually calculate from `field_boundaries`: ```r field_details$`Field Size (ha)` <- calculate_field_areas(field_boundaries_sf) ``` #### 🔧 Recommendations 1. **URGENT**: Extract ALL KPI thresholds to constants section 2. **HIGH**: Replace placeholder field sizes with actual calculations 3. **MEDIUM**: Add validation that thresholds make logical sense (e.g., GOOD < MODERATE) 4. **LOW**: Add comments explaining threshold choices (agronomic basis) --- ### 10. R: `crop_messaging_utils.R` #### ✅ Strengths - **EXCELLENT constants section**: Thresholds defined at top (lines ~28-43) - **Good documentation**: Clear threshold purposes - **Pure functions**: Parameters passed correctly - **Comprehensive spatial analysis**: Moran's I, entropy, CV #### ⚠️ Issues Found **INCONSISTENCY WITH KPI_UTILS**: ```r # crop_messaging_utils.R: CI_CHANGE_INCREASE_THRESHOLD <- 0.5 UNIFORMITY_THRESHOLD <- 0.15 EXCELLENT_UNIFORMITY_THRESHOLD <- 0.08 POOR_UNIFORMITY_THRESHOLD <- 0.25 # kpi_utils.R: (HARDCODED IN CASE_WHEN) cv_value < 0.15 ~ "Excellent" cv_value < 0.25 ~ "Good" cv_value < 0.35 ~ "Moderate" ``` **Impact**: HIGH - Different thresholds for same metrics! **Fix**: Create SHARED constants file: ```r # r_app/analysis_constants.R UNIFORMITY_EXCELLENT <- 0.15 UNIFORMITY_GOOD <- 0.25 UNIFORMITY_MODERATE <- 0.35 ``` #### 🔧 Recommendations 1. **URGENT**: Unify threshold constants across `crop_messaging_utils.R` and `kpi_utils.R` 2. **HIGH**: Create shared constants file: `r_app/analysis_constants.R` 3. **MEDIUM**: Source this file in both utils files --- ### 11. R: `parameters_project.R` #### ✅ Strengths - **Centralized configuration**: Single source of truth for paths - **Clean initialization**: Well-structured setup functions - **Good error handling**: Fallback for missing files - **CRS validation**: Proper handling of spatial reference systems #### ⚠️ Issues Found **HARDCODED HARVEST DATA COLUMNS**: ```r # Line ~106-113: Column selection and renaming dplyr::select(c( "field", "sub_field", "year", "season_start", "season_end", "age", "sub_area", "tonnage_ha" )) ``` **Impact**: Medium - Assumes exact Excel column names **Recommendation**: Add validation or flexible column matching **CONDITIONAL LOGIC**: ```r # Line ~52-56: Project-specific file selection use_pivot_2 <- exists("project_dir") && project_dir == "esa" && exists("ci_extraction_script") ``` **Impact**: Low - Works but fragile **Recommendation**: Consider configuration file: ```yaml projects: esa: field_boundaries: pivot_2.geojson default: field_boundaries: pivot.geojson ``` #### 🔧 Recommendations 1. **MEDIUM**: Add harvest data column validation 2. **LOW**: Consider YAML/JSON config for project-specific settings 3. **LOW**: Add project directory validation (exists, writable) --- ## Summary of Critical Issues ### 🚨 URGENT (Fix Immediately) 1. **Security**: API credentials hardcoded in `01_planet_download.py` (lines 13-14) 2. **Consistency**: Threshold mismatch between `crop_messaging_utils.R` and `kpi_utils.R` ### ⚠️ HIGH Priority (Fix Soon) 1. **Parameterization**: Cloud coverage thresholds in `mosaic_creation_utils.R` (lines 158-159) 2. **Parameterization**: All KPI thresholds in `kpi_utils.R` should be constants 3. **Data Quality**: Replace placeholder field sizes with actual calculations in `kpi_utils.R` (line 864) ### 📋 MEDIUM Priority (Improvement) 1. **Configuration**: Make resolution parameterizable in `01_planet_download.py` 2. **Code Quality**: Reduce duplication in mosaic creation logic 3. **Validation**: Add harvest data column validation in `parameters_project.R` ### 💡 LOW Priority (Nice to Have) 1. **Cleanup**: Remove commented code throughout 2. **Documentation**: Add formula references and threshold justifications 3. **Structure**: Consider shared configuration file for project-specific settings --- ## Recommendations for Improvement ### 1. Create Shared Constants File ```r # r_app/analysis_constants.R # Crop Index Change Thresholds CI_CHANGE_INCREASE_THRESHOLD <- 0.5 CI_CHANGE_DECREASE_THRESHOLD <- -0.5 # Field Uniformity Thresholds (CV as decimal) UNIFORMITY_EXCELLENT <- 0.15 UNIFORMITY_GOOD <- 0.25 UNIFORMITY_MODERATE <- 0.35 # Cloud Coverage Thresholds (percent) CLOUD_THRESHOLD_STRICT <- 5 CLOUD_THRESHOLD_RELAXED <- 45 # KPI Thresholds WEED_RAPID_GROWTH_THRESHOLD <- 2.0 # CI units per week WEED_RISK_LOW_THRESHOLD <- 10 # Percent WEED_RISK_MODERATE_THRESHOLD <- 25 # Percent CANOPY_CLOSURE_AGE_DAYS <- 240 # 8 months # Spatial Analysis Thresholds MORAN_THRESHOLD_HIGH <- 0.95 MORAN_THRESHOLD_MODERATE <- 0.85 MORAN_THRESHOLD_LOW <- 0.7 # Projections EQUAL_AREA_PROJECTION <- "EPSG:6933" # Equal Earth for area calculations WGS84_PROJECTION <- "EPSG:4326" # WGS84 for lat/lon ``` ### 2. Security Improvements ```python # 01_planet_download.py - Use environment variables import os config.sh_client_id = os.getenv('SENTINEL_HUB_CLIENT_ID') config.sh_client_secret = os.getenv('SENTINEL_HUB_CLIENT_SECRET') if not config.sh_client_id or not config.sh_client_secret: raise ValueError("Sentinel Hub credentials not found in environment variables") ``` ### 3. Configuration File Structure ```yaml # config/projects.yaml default: field_boundaries: pivot.geojson resolution: 3 cloud_threshold_strict: 5 cloud_threshold_relaxed: 45 esa: field_boundaries: pivot_2.geojson resolution: 3 cloud_threshold_strict: 5 cloud_threshold_relaxed: 45 notes: "Uses pivot_2.geojson for yield prediction extra fields" kibos: field_boundaries: pivot.geojson resolution: 3 ``` --- ## Testing Recommendations 1. **Unit Tests**: Add tests for utility functions - `calculate_cv()` with known datasets - `calculate_spatial_autocorrelation()` with synthetic fields - `categorize_change()` with boundary cases 2. **Integration Tests**: Test full workflows - CI extraction with test imagery - Mosaic creation with varying cloud cover - KPI calculation with known inputs 3. **Validation**: Add data validation - Date ranges (not in future, reasonable past) - File existence checks before processing - CRS compatibility verification --- ## Code Quality Metrics | Script | Hardcoded Values | Function Quality | Error Handling | Documentation | Grade | |--------|------------------|------------------|----------------|---------------|-------| | `01_planet_download.py` | 5 issues | Good | Good | Minimal | B- | | `02_ci_extraction.R` | 0 issues | Excellent | Excellent | Good | A | | `ci_extraction_utils.R` | 1 minor | Excellent | Excellent | Excellent | A+ | | `03_interpolate_growth_model.R` | 0 issues | Excellent | Excellent | Good | A | | `growth_model_utils.R` | 0 issues | Excellent | Excellent | Excellent | A+ | | `04_mosaic_creation.R` | 0 issues | Good | Good | Good | A- | | `mosaic_creation_utils.R` | 2 issues | Good | Good | Good | B+ | | `09_calculate_kpis.R` | 0 issues | Good | Good | Good | A- | | `kpi_utils.R` | 8 issues | Good | Good | Good | B | | `crop_messaging_utils.R` | 1 issue | Excellent | Excellent | Excellent | A | | `parameters_project.R` | 1 issue | Excellent | Excellent | Good | A | **Overall Assessment**: The codebase is **well-structured with good practices**, but needs **threshold consolidation** and **security fixes** for API credentials. --- ## Conclusion The SmartCane codebase demonstrates solid software engineering practices with good separation of concerns, comprehensive error handling, and mostly well-parameterized functions. The primary areas for improvement are: 1. **Consolidating threshold constants** into a shared file 2. **Securing API credentials** via environment variables 3. **Removing hardcoded magic numbers** from KPI calculations 4. **Cleaning up commented code** and test values The R code quality is particularly strong, with excellent examples in `growth_model_utils.R` and `ci_extraction_utils.R` that could serve as templates for other modules.