19 KiB
SmartCane Code Quality Check Report
Date: 2025-10-14
Reviewer: GitHub Copilot
Scope: Main processing scripts and utilities
Executive Summary
The SmartCane codebase demonstrates generally good quality with well-structured utility functions, proper error handling, and clear separation of concerns. However, there are several areas for improvement including hardcoded values, inconsistent parameterization, and opportunities for better code reusability.
Overall Grade: B+ (Good, with room for improvement)
Detailed Findings by Script
1. Python: 01_planet_download.py / .ipynb
✅ Strengths
- Good parameterization: Most values pulled from environment variables (
os.environ.get()) - Flexible date handling: Supports both
DATEenv var andSys.Date()fallback - Well-structured functions: Clear separation (
get_true_color_request_day,download_function,merge_files) - Error handling: Try-catch blocks for file operations
- Reusable: Functions accept parameters rather than hardcoding
⚠️ Issues Found
HARDCODED VALUES:
# Line ~13-14: API Credentials HARDCODED (SECURITY RISK!)
config.sh_client_id = '1a72d811-4f0e-4447-8282-df09608cff44'
config.sh_client_secret = 'FcBlRL29i9ZmTzhmKTv1etSMFs5PxSos'
Impact: CRITICAL - Credentials exposed in code
Fix: Move to environment variables or config file
# Line ~16-17: Collection ID hardcoded
collection_id = 'c691479f-358c-46b1-b0f0-e12b70a9856c'
Impact: Medium - Would need code change for different collections
Fix: Add to configuration
# Line ~22-23: Default project hardcoded
project = 'kibos' # or xinavane or chemba_test_8b
Impact: Low - Has os.getenv('PROJECT_DIR', project) fallback
Recommendation: Remove commented projects, keep only default
# Line ~25: Days default hardcoded
days = 9 # change back to 28 which is the default. 3 years is 1095 days.
Impact: Low - Already handled by os.environ.get("DAYS", days)
Recommendation: Clean up comments, set consistent default
# Line ~126: Resolution hardcoded
resolution = 3
Impact: Medium - Should be configurable per project
Fix: Add as parameter or config variable
JUPYTER NOTEBOOK SPECIFIC ISSUES:
# Line ~88 (notebook): Hardcoded project
project = 'esa' #or xinavane or chemba_test_8b
Impact: Medium - Requires manual editing for each run
Fix: Use input cell or environment variable
# Line ~40 (notebook): Conditional geojson path
geojson_file = Path(BASE_PATH /'Data'/ ('pivot_2.geojson' if project == "esa" else 'pivot.geojson'))
Impact: Low - Project-specific logic, acceptable
Note: Consider documenting why ESA needs different file
🔧 Recommendations
- URGENT: Move API credentials to environment variables
- HIGH: Parameterize resolution via config or command-line arg
- MEDIUM: Clean up commented code and standardize defaults
- LOW: Add docstrings to all functions
2. R: 02_ci_extraction.R
✅ Strengths
- Excellent parameterization: All major values from command-line args
- Good defaults: Sensible fallbacks for missing arguments
- Error handling: Proper try-catch blocks with fallback paths
- Modular design: Clear separation into utils file
- Logging: Uses
log_message()throughout
⚠️ Issues Found
NONE SIGNIFICANT - This script is well-written!
Minor observations:
# Line ~40: Commented default date
#end_date <- "2023-10-01"
Impact: None - Just cleanup needed
Fix: Remove or move to documentation
# Line ~64: Assignment to global environment
assign("ci_extraction_script", ci_extraction_script, envir = .GlobalEnv)
Impact: Low - Legitimate use for flag passing
Note: This is acceptable for script orchestration
🔧 Recommendations
- LOW: Remove commented code
- LOW: Consider adding validation for date ranges (e.g., not in future)
3. R: ci_extraction_utils.R
✅ Strengths
- Pure functions: No hardcoded data in function bodies!
- Excellent documentation: Comprehensive roxygen comments
- Error handling: Proper validation and try-catch blocks
- Terra optimization: Uses
terra::global()for efficiency - Reusable: Functions accept all required parameters
⚠️ Issues Found
POTENTIAL ISSUE:
# Line ~85: Hardcoded band names assumption
names(loaded_raster) <- c("Red", "Green", "Blue", "NIR")
Impact: Low - Assumes 4-band raster structure
Context: This is CORRECT for Planet data
Recommendation: Add validation to check terra::nlyr(loaded_raster) >= 4
# Line ~92: Hardcoded CI calculation
CI <- loaded_raster$NIR / loaded_raster$Green - 1
Impact: None - This is the CI formula
Note: Consider adding comment explaining formula choice
MINOR CONCERN:
# Line ~145: Hardcoded min_valid_pixels default
min_valid_pixels = 100
Impact: Low - Good default, but could be configurable
Recommendation: Consider making this a parameter in parameters_project.R
🔧 Recommendations
- MEDIUM: Add band count validation before naming
- LOW: Document CI formula with reference
- LOW: Consider making
min_valid_pixelsconfigurable
4. R: 03_interpolate_growth_model.R
✅ Strengths
- Clean script structure: Minimal code, delegates to utils
- Good arg handling: Command-line args with defaults
- Proper sourcing: Handles both default and r_app paths
- Global env assignment: Justified for config passing
⚠️ Issues Found
NONE SIGNIFICANT - Well-designed script!
Minor note:
# Line ~66: Hardcoded filename
"All_pivots_Cumulative_CI_quadrant_year_v2.rds"
Impact: None - This is the OUTPUT filename
Note: Consider making this a constant in parameters_project.R
🔧 Recommendations
- LOW: Move output filenames to parameters file for consistency
- LOW: Add progress messages during year processing loop
5. R: growth_model_utils.R
✅ Strengths
- EXCELLENT: No hardcoded values in function bodies
- Pure functions: All data passed as parameters
- Good error handling: Comprehensive try-catch blocks
- Detailed logging: Informative messages at each step
- Well-documented: Clear roxygen comments
⚠️ Issues Found
NONE - This is exemplary code!
🔧 Recommendations
No changes needed - This file serves as a model for others
6. R: 04_mosaic_creation.R
✅ Strengths
- Good parameterization: Command-line args with defaults
- Flexible: Custom output filename support
- Clear delegation: Processing logic in utils
⚠️ Issues Found
COMMENTED DEFAULTS:
# Line ~27: Commented test dates
#end_date <- "2025-07-22" # Default date for testing
#end_date <- "2025-07-08" # Default date for testing
Impact: None - Just cleanup needed
Recommendation: Remove or move to comments block
WEEK CALCULATION:
# mosaic_creation_utils.R Line ~37: ISO week vs regular week
week <- lubridate::isoweek(end_date)
year <- lubridate::isoyear(end_date)
Impact: None - Consistent use of ISO weeks
Note: Ensure ALL scripts use same week system (they do)
🔧 Recommendations
- LOW: Clean up commented test dates
- LOW: Add comment explaining ISO week choice
7. R: mosaic_creation_utils.R
✅ Strengths
- No hardcoded data: All parameters passed to functions
- Sophisticated cloud handling: Multi-threshold approach
- Terra optimized: Efficient raster operations
- Good documentation: Clear function purposes
⚠️ Issues Found
MAGIC NUMBERS (Cloud thresholds):
# Line ~158-159: Cloud coverage thresholds
missing_pixels_df$thres_5perc[i] <- as.integer(missing_pixels_df$missing_pixels_percentage[i] < 5)
missing_pixels_df$thres_40perc[i] <- as.integer(missing_pixels_df$missing_pixels_percentage[i] < 45)
Impact: Medium - These thresholds determine mosaic quality
Fix: Move to parameters_project.R as configurable constants:
CLOUD_THRESHOLD_STRICT <- 5 # Percent
CLOUD_THRESHOLD_RELAXED <- 45 # Percent
REPEATED LOGIC:
The cloud mask selection logic (lines ~203-269) has substantial code duplication across different threshold conditions.
Recommendation: Extract to helper function:
create_mosaic_from_rasters <- function(raster_list, cloud_masks = NULL) { ... }
🔧 Recommendations
- HIGH: Extract cloud thresholds to constants
- MEDIUM: Reduce code duplication in mosaic creation logic
- LOW: Add more detailed logging for which images are selected
8. R: 09_calculate_kpis.R
✅ Strengths
- Good structure: Clean main() function
- Proper delegation: All logic in utils
- Comprehensive: Handles all 6 KPIs
- Good error messages: Clear feedback
⚠️ Issues Found
NONE SIGNIFICANT - Well-designed orchestration script
🔧 Recommendations
- LOW: Consider adding KPI validation (e.g., check for required input files)
9. R: kpi_utils.R
✅ Strengths
- Mostly parameterized: Functions accept necessary inputs
- Good documentation: Clear function purposes
- Terra optimization: Efficient raster operations
⚠️ Issues Found
HARDCODED THRESHOLDS (Multiple locations):
# Line ~57: Field uniformity thresholds
uniformity_level <- dplyr::case_when(
cv_value < 0.15 ~ "Excellent",
cv_value < 0.25 ~ "Good",
cv_value < 0.35 ~ "Moderate",
TRUE ~ "Poor"
)
Impact: HIGH - Core KPI classification logic
Fix: Move to constants at file top:
UNIFORMITY_EXCELLENT_THRESHOLD <- 0.15
UNIFORMITY_GOOD_THRESHOLD <- 0.25
UNIFORMITY_MODERATE_THRESHOLD <- 0.35
# Line ~598: Weed rapid growth threshold (CHANGED FROM 1.5)
rapid_growth_pixels <- sum(ci_change > 2.0)
Impact: HIGH - Core weed detection logic
Fix: Add to constants:
WEED_RAPID_GROWTH_THRESHOLD <- 2.0 # CI units per week
# Line ~606: Weed risk thresholds (UPDATED)
weed_risk <- dplyr::case_when(
rapid_growth_pct < 10 ~ "Low",
rapid_growth_pct < 25 ~ "Moderate",
TRUE ~ "High"
)
Impact: HIGH - Core weed KPI classification
Fix: Add to constants:
WEED_RISK_LOW_THRESHOLD <- 10 # Percent
WEED_RISK_MODERATE_THRESHOLD <- 25 # Percent
# Line ~560: Field age threshold for weed analysis
if (!is.na(field_age) && field_age >= 240) {
Impact: MEDIUM - Determines weed analysis scope
Fix: Add to constants:
CANOPY_CLOSURE_AGE_DAYS <- 240 # 8 months
# Line ~545: Growth decline risk thresholds
risk_level <- dplyr::case_when(
risk_score < 0.5 ~ "Low",
risk_score < 1.5 ~ "Moderate",
risk_score < 3.0 ~ "High",
TRUE ~ "Very-high"
)
Impact: HIGH - Core decline KPI
Fix: Add to constants
MAGIC NUMBER (Projection):
# Line ~116, ~447: Equal Earth projection
"EPSG:6933"
Impact: Low - Consistent across codebase
Recommendation: Add constant:
EQUAL_AREA_PROJECTION <- "EPSG:6933" # Equal Earth for area calculations
PLACEHOLDER DATA:
# Line ~864: Fallback field sizes
field_details$`Field Size (ha)` <- round(runif(nrow(field_details), 2.5, 4.5), 1)
Impact: MEDIUM - Creates fake data if real data missing
Context: Comment says "placeholder - can be calculated from field boundaries"
Fix: Actually calculate from field_boundaries:
field_details$`Field Size (ha)` <- calculate_field_areas(field_boundaries_sf)
🔧 Recommendations
- URGENT: Extract ALL KPI thresholds to constants section
- HIGH: Replace placeholder field sizes with actual calculations
- MEDIUM: Add validation that thresholds make logical sense (e.g., GOOD < MODERATE)
- LOW: Add comments explaining threshold choices (agronomic basis)
10. R: crop_messaging_utils.R
✅ Strengths
- EXCELLENT constants section: Thresholds defined at top (lines ~28-43)
- Good documentation: Clear threshold purposes
- Pure functions: Parameters passed correctly
- Comprehensive spatial analysis: Moran's I, entropy, CV
⚠️ Issues Found
INCONSISTENCY WITH KPI_UTILS:
# crop_messaging_utils.R:
CI_CHANGE_INCREASE_THRESHOLD <- 0.5
UNIFORMITY_THRESHOLD <- 0.15
EXCELLENT_UNIFORMITY_THRESHOLD <- 0.08
POOR_UNIFORMITY_THRESHOLD <- 0.25
# kpi_utils.R: (HARDCODED IN CASE_WHEN)
cv_value < 0.15 ~ "Excellent"
cv_value < 0.25 ~ "Good"
cv_value < 0.35 ~ "Moderate"
Impact: HIGH - Different thresholds for same metrics!
Fix: Create SHARED constants file:
# r_app/analysis_constants.R
UNIFORMITY_EXCELLENT <- 0.15
UNIFORMITY_GOOD <- 0.25
UNIFORMITY_MODERATE <- 0.35
🔧 Recommendations
- URGENT: Unify threshold constants across
crop_messaging_utils.Randkpi_utils.R - HIGH: Create shared constants file:
r_app/analysis_constants.R - MEDIUM: Source this file in both utils files
11. R: parameters_project.R
✅ Strengths
- Centralized configuration: Single source of truth for paths
- Clean initialization: Well-structured setup functions
- Good error handling: Fallback for missing files
- CRS validation: Proper handling of spatial reference systems
⚠️ Issues Found
HARDCODED HARVEST DATA COLUMNS:
# Line ~106-113: Column selection and renaming
dplyr::select(c(
"field", "sub_field", "year", "season_start",
"season_end", "age", "sub_area", "tonnage_ha"
))
Impact: Medium - Assumes exact Excel column names
Recommendation: Add validation or flexible column matching
CONDITIONAL LOGIC:
# Line ~52-56: Project-specific file selection
use_pivot_2 <- exists("project_dir") && project_dir == "esa" &&
exists("ci_extraction_script")
Impact: Low - Works but fragile
Recommendation: Consider configuration file:
projects:
esa:
field_boundaries: pivot_2.geojson
default:
field_boundaries: pivot.geojson
🔧 Recommendations
- MEDIUM: Add harvest data column validation
- LOW: Consider YAML/JSON config for project-specific settings
- LOW: Add project directory validation (exists, writable)
Summary of Critical Issues
🚨 URGENT (Fix Immediately)
- Security: API credentials hardcoded in
01_planet_download.py(lines 13-14) - Consistency: Threshold mismatch between
crop_messaging_utils.Randkpi_utils.R
⚠️ HIGH Priority (Fix Soon)
- Parameterization: Cloud coverage thresholds in
mosaic_creation_utils.R(lines 158-159) - Parameterization: All KPI thresholds in
kpi_utils.Rshould be constants - Data Quality: Replace placeholder field sizes with actual calculations in
kpi_utils.R(line 864)
📋 MEDIUM Priority (Improvement)
- Configuration: Make resolution parameterizable in
01_planet_download.py - Code Quality: Reduce duplication in mosaic creation logic
- Validation: Add harvest data column validation in
parameters_project.R
💡 LOW Priority (Nice to Have)
- Cleanup: Remove commented code throughout
- Documentation: Add formula references and threshold justifications
- Structure: Consider shared configuration file for project-specific settings
Recommendations for Improvement
1. Create Shared Constants File
# r_app/analysis_constants.R
# Crop Index Change Thresholds
CI_CHANGE_INCREASE_THRESHOLD <- 0.5
CI_CHANGE_DECREASE_THRESHOLD <- -0.5
# Field Uniformity Thresholds (CV as decimal)
UNIFORMITY_EXCELLENT <- 0.15
UNIFORMITY_GOOD <- 0.25
UNIFORMITY_MODERATE <- 0.35
# Cloud Coverage Thresholds (percent)
CLOUD_THRESHOLD_STRICT <- 5
CLOUD_THRESHOLD_RELAXED <- 45
# KPI Thresholds
WEED_RAPID_GROWTH_THRESHOLD <- 2.0 # CI units per week
WEED_RISK_LOW_THRESHOLD <- 10 # Percent
WEED_RISK_MODERATE_THRESHOLD <- 25 # Percent
CANOPY_CLOSURE_AGE_DAYS <- 240 # 8 months
# Spatial Analysis Thresholds
MORAN_THRESHOLD_HIGH <- 0.95
MORAN_THRESHOLD_MODERATE <- 0.85
MORAN_THRESHOLD_LOW <- 0.7
# Projections
EQUAL_AREA_PROJECTION <- "EPSG:6933" # Equal Earth for area calculations
WGS84_PROJECTION <- "EPSG:4326" # WGS84 for lat/lon
2. Security Improvements
# 01_planet_download.py - Use environment variables
import os
config.sh_client_id = os.getenv('SENTINEL_HUB_CLIENT_ID')
config.sh_client_secret = os.getenv('SENTINEL_HUB_CLIENT_SECRET')
if not config.sh_client_id or not config.sh_client_secret:
raise ValueError("Sentinel Hub credentials not found in environment variables")
3. Configuration File Structure
# config/projects.yaml
default:
field_boundaries: pivot.geojson
resolution: 3
cloud_threshold_strict: 5
cloud_threshold_relaxed: 45
esa:
field_boundaries: pivot_2.geojson
resolution: 3
cloud_threshold_strict: 5
cloud_threshold_relaxed: 45
notes: "Uses pivot_2.geojson for yield prediction extra fields"
kibos:
field_boundaries: pivot.geojson
resolution: 3
Testing Recommendations
-
Unit Tests: Add tests for utility functions
calculate_cv()with known datasetscalculate_spatial_autocorrelation()with synthetic fieldscategorize_change()with boundary cases
-
Integration Tests: Test full workflows
- CI extraction with test imagery
- Mosaic creation with varying cloud cover
- KPI calculation with known inputs
-
Validation: Add data validation
- Date ranges (not in future, reasonable past)
- File existence checks before processing
- CRS compatibility verification
Code Quality Metrics
| Script | Hardcoded Values | Function Quality | Error Handling | Documentation | Grade |
|---|---|---|---|---|---|
01_planet_download.py |
5 issues | Good | Good | Minimal | B- |
02_ci_extraction.R |
0 issues | Excellent | Excellent | Good | A |
ci_extraction_utils.R |
1 minor | Excellent | Excellent | Excellent | A+ |
03_interpolate_growth_model.R |
0 issues | Excellent | Excellent | Good | A |
growth_model_utils.R |
0 issues | Excellent | Excellent | Excellent | A+ |
04_mosaic_creation.R |
0 issues | Good | Good | Good | A- |
mosaic_creation_utils.R |
2 issues | Good | Good | Good | B+ |
09_calculate_kpis.R |
0 issues | Good | Good | Good | A- |
kpi_utils.R |
8 issues | Good | Good | Good | B |
crop_messaging_utils.R |
1 issue | Excellent | Excellent | Excellent | A |
parameters_project.R |
1 issue | Excellent | Excellent | Good | A |
Overall Assessment: The codebase is well-structured with good practices, but needs threshold consolidation and security fixes for API credentials.
Conclusion
The SmartCane codebase demonstrates solid software engineering practices with good separation of concerns, comprehensive error handling, and mostly well-parameterized functions. The primary areas for improvement are:
- Consolidating threshold constants into a shared file
- Securing API credentials via environment variables
- Removing hardcoded magic numbers from KPI calculations
- Cleaning up commented code and test values
The R code quality is particularly strong, with excellent examples in growth_model_utils.R and ci_extraction_utils.R that could serve as templates for other modules.