SmartCane/webapps/docs/QUALITY_CHECK_REPORT.md

19 KiB

SmartCane Code Quality Check Report

Date: 2025-10-14
Reviewer: GitHub Copilot
Scope: Main processing scripts and utilities

Executive Summary

The SmartCane codebase demonstrates generally good quality with well-structured utility functions, proper error handling, and clear separation of concerns. However, there are several areas for improvement including hardcoded values, inconsistent parameterization, and opportunities for better code reusability.

Overall Grade: B+ (Good, with room for improvement)


Detailed Findings by Script

1. Python: 01_planet_download.py / .ipynb

Strengths

  • Good parameterization: Most values pulled from environment variables (os.environ.get())
  • Flexible date handling: Supports both DATE env var and Sys.Date() fallback
  • Well-structured functions: Clear separation (get_true_color_request_day, download_function, merge_files)
  • Error handling: Try-catch blocks for file operations
  • Reusable: Functions accept parameters rather than hardcoding

⚠️ Issues Found

HARDCODED VALUES:

# Line ~13-14: API Credentials HARDCODED (SECURITY RISK!)
config.sh_client_id = '1a72d811-4f0e-4447-8282-df09608cff44'
config.sh_client_secret = 'FcBlRL29i9ZmTzhmKTv1etSMFs5PxSos'

Impact: CRITICAL - Credentials exposed in code
Fix: Move to environment variables or config file

# Line ~16-17: Collection ID hardcoded
collection_id = 'c691479f-358c-46b1-b0f0-e12b70a9856c'

Impact: Medium - Would need code change for different collections
Fix: Add to configuration

# Line ~22-23: Default project hardcoded
project = 'kibos' # or xinavane or chemba_test_8b

Impact: Low - Has os.getenv('PROJECT_DIR', project) fallback
Recommendation: Remove commented projects, keep only default

# Line ~25: Days default hardcoded
days = 9 # change back to 28 which is the default. 3 years is 1095 days.

Impact: Low - Already handled by os.environ.get("DAYS", days)
Recommendation: Clean up comments, set consistent default

# Line ~126: Resolution hardcoded
resolution = 3

Impact: Medium - Should be configurable per project
Fix: Add as parameter or config variable

JUPYTER NOTEBOOK SPECIFIC ISSUES:

# Line ~88 (notebook): Hardcoded project
project = 'esa' #or xinavane or chemba_test_8b

Impact: Medium - Requires manual editing for each run
Fix: Use input cell or environment variable

# Line ~40 (notebook): Conditional geojson path
geojson_file = Path(BASE_PATH /'Data'/ ('pivot_2.geojson' if project == "esa" else 'pivot.geojson'))

Impact: Low - Project-specific logic, acceptable
Note: Consider documenting why ESA needs different file

🔧 Recommendations

  1. URGENT: Move API credentials to environment variables
  2. HIGH: Parameterize resolution via config or command-line arg
  3. MEDIUM: Clean up commented code and standardize defaults
  4. LOW: Add docstrings to all functions

2. R: 02_ci_extraction.R

Strengths

  • Excellent parameterization: All major values from command-line args
  • Good defaults: Sensible fallbacks for missing arguments
  • Error handling: Proper try-catch blocks with fallback paths
  • Modular design: Clear separation into utils file
  • Logging: Uses log_message() throughout

⚠️ Issues Found

NONE SIGNIFICANT - This script is well-written!

Minor observations:

# Line ~40: Commented default date
#end_date <- "2023-10-01"

Impact: None - Just cleanup needed
Fix: Remove or move to documentation

# Line ~64: Assignment to global environment
assign("ci_extraction_script", ci_extraction_script, envir = .GlobalEnv)

Impact: Low - Legitimate use for flag passing
Note: This is acceptable for script orchestration

🔧 Recommendations

  1. LOW: Remove commented code
  2. LOW: Consider adding validation for date ranges (e.g., not in future)

3. R: ci_extraction_utils.R

Strengths

  • Pure functions: No hardcoded data in function bodies!
  • Excellent documentation: Comprehensive roxygen comments
  • Error handling: Proper validation and try-catch blocks
  • Terra optimization: Uses terra::global() for efficiency
  • Reusable: Functions accept all required parameters

⚠️ Issues Found

POTENTIAL ISSUE:

# Line ~85: Hardcoded band names assumption
names(loaded_raster) <- c("Red", "Green", "Blue", "NIR")

Impact: Low - Assumes 4-band raster structure
Context: This is CORRECT for Planet data
Recommendation: Add validation to check terra::nlyr(loaded_raster) >= 4

# Line ~92: Hardcoded CI calculation
CI <- loaded_raster$NIR / loaded_raster$Green - 1

Impact: None - This is the CI formula
Note: Consider adding comment explaining formula choice

MINOR CONCERN:

# Line ~145: Hardcoded min_valid_pixels default
min_valid_pixels = 100

Impact: Low - Good default, but could be configurable
Recommendation: Consider making this a parameter in parameters_project.R

🔧 Recommendations

  1. MEDIUM: Add band count validation before naming
  2. LOW: Document CI formula with reference
  3. LOW: Consider making min_valid_pixels configurable

4. R: 03_interpolate_growth_model.R

Strengths

  • Clean script structure: Minimal code, delegates to utils
  • Good arg handling: Command-line args with defaults
  • Proper sourcing: Handles both default and r_app paths
  • Global env assignment: Justified for config passing

⚠️ Issues Found

NONE SIGNIFICANT - Well-designed script!

Minor note:

# Line ~66: Hardcoded filename
"All_pivots_Cumulative_CI_quadrant_year_v2.rds"

Impact: None - This is the OUTPUT filename
Note: Consider making this a constant in parameters_project.R

🔧 Recommendations

  1. LOW: Move output filenames to parameters file for consistency
  2. LOW: Add progress messages during year processing loop

5. R: growth_model_utils.R

Strengths

  • EXCELLENT: No hardcoded values in function bodies
  • Pure functions: All data passed as parameters
  • Good error handling: Comprehensive try-catch blocks
  • Detailed logging: Informative messages at each step
  • Well-documented: Clear roxygen comments

⚠️ Issues Found

NONE - This is exemplary code!

🔧 Recommendations

No changes needed - This file serves as a model for others


6. R: 04_mosaic_creation.R

Strengths

  • Good parameterization: Command-line args with defaults
  • Flexible: Custom output filename support
  • Clear delegation: Processing logic in utils

⚠️ Issues Found

COMMENTED DEFAULTS:

# Line ~27: Commented test dates
#end_date <- "2025-07-22"  # Default date for testing
#end_date <-  "2025-07-08"   # Default date for testing

Impact: None - Just cleanup needed
Recommendation: Remove or move to comments block

WEEK CALCULATION:

# mosaic_creation_utils.R Line ~37: ISO week vs regular week
week <- lubridate::isoweek(end_date)
year <- lubridate::isoyear(end_date)

Impact: None - Consistent use of ISO weeks
Note: Ensure ALL scripts use same week system (they do)

🔧 Recommendations

  1. LOW: Clean up commented test dates
  2. LOW: Add comment explaining ISO week choice

7. R: mosaic_creation_utils.R

Strengths

  • No hardcoded data: All parameters passed to functions
  • Sophisticated cloud handling: Multi-threshold approach
  • Terra optimized: Efficient raster operations
  • Good documentation: Clear function purposes

⚠️ Issues Found

MAGIC NUMBERS (Cloud thresholds):

# Line ~158-159: Cloud coverage thresholds
missing_pixels_df$thres_5perc[i] <- as.integer(missing_pixels_df$missing_pixels_percentage[i] < 5)
missing_pixels_df$thres_40perc[i] <- as.integer(missing_pixels_df$missing_pixels_percentage[i] < 45)

Impact: Medium - These thresholds determine mosaic quality
Fix: Move to parameters_project.R as configurable constants:

CLOUD_THRESHOLD_STRICT <- 5  # Percent
CLOUD_THRESHOLD_RELAXED <- 45  # Percent

REPEATED LOGIC: The cloud mask selection logic (lines ~203-269) has substantial code duplication across different threshold conditions.
Recommendation: Extract to helper function:

create_mosaic_from_rasters <- function(raster_list, cloud_masks = NULL) { ... }

🔧 Recommendations

  1. HIGH: Extract cloud thresholds to constants
  2. MEDIUM: Reduce code duplication in mosaic creation logic
  3. LOW: Add more detailed logging for which images are selected

8. R: 09_calculate_kpis.R

Strengths

  • Good structure: Clean main() function
  • Proper delegation: All logic in utils
  • Comprehensive: Handles all 6 KPIs
  • Good error messages: Clear feedback

⚠️ Issues Found

NONE SIGNIFICANT - Well-designed orchestration script

🔧 Recommendations

  1. LOW: Consider adding KPI validation (e.g., check for required input files)

9. R: kpi_utils.R

Strengths

  • Mostly parameterized: Functions accept necessary inputs
  • Good documentation: Clear function purposes
  • Terra optimization: Efficient raster operations

⚠️ Issues Found

HARDCODED THRESHOLDS (Multiple locations):

# Line ~57: Field uniformity thresholds
uniformity_level <- dplyr::case_when(
  cv_value < 0.15 ~ "Excellent",
  cv_value < 0.25 ~ "Good", 
  cv_value < 0.35 ~ "Moderate",
  TRUE ~ "Poor"
)

Impact: HIGH - Core KPI classification logic
Fix: Move to constants at file top:

UNIFORMITY_EXCELLENT_THRESHOLD <- 0.15
UNIFORMITY_GOOD_THRESHOLD <- 0.25
UNIFORMITY_MODERATE_THRESHOLD <- 0.35
# Line ~598: Weed rapid growth threshold (CHANGED FROM 1.5)
rapid_growth_pixels <- sum(ci_change > 2.0)

Impact: HIGH - Core weed detection logic
Fix: Add to constants:

WEED_RAPID_GROWTH_THRESHOLD <- 2.0  # CI units per week
# Line ~606: Weed risk thresholds (UPDATED)
weed_risk <- dplyr::case_when(
  rapid_growth_pct < 10 ~ "Low",
  rapid_growth_pct < 25 ~ "Moderate",
  TRUE ~ "High"
)

Impact: HIGH - Core weed KPI classification
Fix: Add to constants:

WEED_RISK_LOW_THRESHOLD <- 10  # Percent
WEED_RISK_MODERATE_THRESHOLD <- 25  # Percent
# Line ~560: Field age threshold for weed analysis
if (!is.na(field_age) && field_age >= 240) {

Impact: MEDIUM - Determines weed analysis scope
Fix: Add to constants:

CANOPY_CLOSURE_AGE_DAYS <- 240  # 8 months
# Line ~545: Growth decline risk thresholds
risk_level <- dplyr::case_when(
  risk_score < 0.5 ~ "Low",
  risk_score < 1.5 ~ "Moderate", 
  risk_score < 3.0 ~ "High",
  TRUE ~ "Very-high"
)

Impact: HIGH - Core decline KPI
Fix: Add to constants

MAGIC NUMBER (Projection):

# Line ~116, ~447: Equal Earth projection
"EPSG:6933"

Impact: Low - Consistent across codebase
Recommendation: Add constant:

EQUAL_AREA_PROJECTION <- "EPSG:6933"  # Equal Earth for area calculations

PLACEHOLDER DATA:

# Line ~864: Fallback field sizes
field_details$`Field Size (ha)` <- round(runif(nrow(field_details), 2.5, 4.5), 1)

Impact: MEDIUM - Creates fake data if real data missing
Context: Comment says "placeholder - can be calculated from field boundaries"
Fix: Actually calculate from field_boundaries:

field_details$`Field Size (ha)` <- calculate_field_areas(field_boundaries_sf)

🔧 Recommendations

  1. URGENT: Extract ALL KPI thresholds to constants section
  2. HIGH: Replace placeholder field sizes with actual calculations
  3. MEDIUM: Add validation that thresholds make logical sense (e.g., GOOD < MODERATE)
  4. LOW: Add comments explaining threshold choices (agronomic basis)

10. R: crop_messaging_utils.R

Strengths

  • EXCELLENT constants section: Thresholds defined at top (lines ~28-43)
  • Good documentation: Clear threshold purposes
  • Pure functions: Parameters passed correctly
  • Comprehensive spatial analysis: Moran's I, entropy, CV

⚠️ Issues Found

INCONSISTENCY WITH KPI_UTILS:

# crop_messaging_utils.R:
CI_CHANGE_INCREASE_THRESHOLD <- 0.5
UNIFORMITY_THRESHOLD <- 0.15
EXCELLENT_UNIFORMITY_THRESHOLD <- 0.08
POOR_UNIFORMITY_THRESHOLD <- 0.25

# kpi_utils.R: (HARDCODED IN CASE_WHEN)
cv_value < 0.15 ~ "Excellent"
cv_value < 0.25 ~ "Good"
cv_value < 0.35 ~ "Moderate"

Impact: HIGH - Different thresholds for same metrics!
Fix: Create SHARED constants file:

# r_app/analysis_constants.R
UNIFORMITY_EXCELLENT <- 0.15
UNIFORMITY_GOOD <- 0.25
UNIFORMITY_MODERATE <- 0.35

🔧 Recommendations

  1. URGENT: Unify threshold constants across crop_messaging_utils.R and kpi_utils.R
  2. HIGH: Create shared constants file: r_app/analysis_constants.R
  3. MEDIUM: Source this file in both utils files

11. R: parameters_project.R

Strengths

  • Centralized configuration: Single source of truth for paths
  • Clean initialization: Well-structured setup functions
  • Good error handling: Fallback for missing files
  • CRS validation: Proper handling of spatial reference systems

⚠️ Issues Found

HARDCODED HARVEST DATA COLUMNS:

# Line ~106-113: Column selection and renaming
dplyr::select(c(
  "field", "sub_field", "year", "season_start",
  "season_end", "age", "sub_area", "tonnage_ha"
))

Impact: Medium - Assumes exact Excel column names
Recommendation: Add validation or flexible column matching

CONDITIONAL LOGIC:

# Line ~52-56: Project-specific file selection
use_pivot_2 <- exists("project_dir") && project_dir == "esa" &&
               exists("ci_extraction_script")

Impact: Low - Works but fragile
Recommendation: Consider configuration file:

projects:
  esa:
    field_boundaries: pivot_2.geojson
  default:
    field_boundaries: pivot.geojson

🔧 Recommendations

  1. MEDIUM: Add harvest data column validation
  2. LOW: Consider YAML/JSON config for project-specific settings
  3. LOW: Add project directory validation (exists, writable)

Summary of Critical Issues

🚨 URGENT (Fix Immediately)

  1. Security: API credentials hardcoded in 01_planet_download.py (lines 13-14)
  2. Consistency: Threshold mismatch between crop_messaging_utils.R and kpi_utils.R

⚠️ HIGH Priority (Fix Soon)

  1. Parameterization: Cloud coverage thresholds in mosaic_creation_utils.R (lines 158-159)
  2. Parameterization: All KPI thresholds in kpi_utils.R should be constants
  3. Data Quality: Replace placeholder field sizes with actual calculations in kpi_utils.R (line 864)

📋 MEDIUM Priority (Improvement)

  1. Configuration: Make resolution parameterizable in 01_planet_download.py
  2. Code Quality: Reduce duplication in mosaic creation logic
  3. Validation: Add harvest data column validation in parameters_project.R

💡 LOW Priority (Nice to Have)

  1. Cleanup: Remove commented code throughout
  2. Documentation: Add formula references and threshold justifications
  3. Structure: Consider shared configuration file for project-specific settings

Recommendations for Improvement

1. Create Shared Constants File

# r_app/analysis_constants.R

# Crop Index Change Thresholds
CI_CHANGE_INCREASE_THRESHOLD <- 0.5
CI_CHANGE_DECREASE_THRESHOLD <- -0.5

# Field Uniformity Thresholds (CV as decimal)
UNIFORMITY_EXCELLENT <- 0.15
UNIFORMITY_GOOD <- 0.25
UNIFORMITY_MODERATE <- 0.35

# Cloud Coverage Thresholds (percent)
CLOUD_THRESHOLD_STRICT <- 5
CLOUD_THRESHOLD_RELAXED <- 45

# KPI Thresholds
WEED_RAPID_GROWTH_THRESHOLD <- 2.0  # CI units per week
WEED_RISK_LOW_THRESHOLD <- 10  # Percent
WEED_RISK_MODERATE_THRESHOLD <- 25  # Percent
CANOPY_CLOSURE_AGE_DAYS <- 240  # 8 months

# Spatial Analysis Thresholds
MORAN_THRESHOLD_HIGH <- 0.95
MORAN_THRESHOLD_MODERATE <- 0.85
MORAN_THRESHOLD_LOW <- 0.7

# Projections
EQUAL_AREA_PROJECTION <- "EPSG:6933"  # Equal Earth for area calculations
WGS84_PROJECTION <- "EPSG:4326"  # WGS84 for lat/lon

2. Security Improvements

# 01_planet_download.py - Use environment variables
import os
config.sh_client_id = os.getenv('SENTINEL_HUB_CLIENT_ID')
config.sh_client_secret = os.getenv('SENTINEL_HUB_CLIENT_SECRET')

if not config.sh_client_id or not config.sh_client_secret:
    raise ValueError("Sentinel Hub credentials not found in environment variables")

3. Configuration File Structure

# config/projects.yaml
default:
  field_boundaries: pivot.geojson
  resolution: 3
  cloud_threshold_strict: 5
  cloud_threshold_relaxed: 45

esa:
  field_boundaries: pivot_2.geojson
  resolution: 3
  cloud_threshold_strict: 5
  cloud_threshold_relaxed: 45
  notes: "Uses pivot_2.geojson for yield prediction extra fields"

kibos:
  field_boundaries: pivot.geojson
  resolution: 3

Testing Recommendations

  1. Unit Tests: Add tests for utility functions

    • calculate_cv() with known datasets
    • calculate_spatial_autocorrelation() with synthetic fields
    • categorize_change() with boundary cases
  2. Integration Tests: Test full workflows

    • CI extraction with test imagery
    • Mosaic creation with varying cloud cover
    • KPI calculation with known inputs
  3. Validation: Add data validation

    • Date ranges (not in future, reasonable past)
    • File existence checks before processing
    • CRS compatibility verification

Code Quality Metrics

Script Hardcoded Values Function Quality Error Handling Documentation Grade
01_planet_download.py 5 issues Good Good Minimal B-
02_ci_extraction.R 0 issues Excellent Excellent Good A
ci_extraction_utils.R 1 minor Excellent Excellent Excellent A+
03_interpolate_growth_model.R 0 issues Excellent Excellent Good A
growth_model_utils.R 0 issues Excellent Excellent Excellent A+
04_mosaic_creation.R 0 issues Good Good Good A-
mosaic_creation_utils.R 2 issues Good Good Good B+
09_calculate_kpis.R 0 issues Good Good Good A-
kpi_utils.R 8 issues Good Good Good B
crop_messaging_utils.R 1 issue Excellent Excellent Excellent A
parameters_project.R 1 issue Excellent Excellent Good A

Overall Assessment: The codebase is well-structured with good practices, but needs threshold consolidation and security fixes for API credentials.


Conclusion

The SmartCane codebase demonstrates solid software engineering practices with good separation of concerns, comprehensive error handling, and mostly well-parameterized functions. The primary areas for improvement are:

  1. Consolidating threshold constants into a shared file
  2. Securing API credentials via environment variables
  3. Removing hardcoded magic numbers from KPI calculations
  4. Cleaning up commented code and test values

The R code quality is particularly strong, with excellent examples in growth_model_utils.R and ci_extraction_utils.R that could serve as templates for other modules.