Timon ddeab0e3bd moved files, added sys architecture to webapps

2026-02-19 12:42:54 +01:00

19 KiB

Raw Blame History

SmartCane Code Quality Check Report

Date: 2025-10-14
Reviewer: GitHub Copilot
Scope: Main processing scripts and utilities

Executive Summary

The SmartCane codebase demonstrates generally good quality with well-structured utility functions, proper error handling, and clear separation of concerns. However, there are several areas for improvement including hardcoded values, inconsistent parameterization, and opportunities for better code reusability.

Overall Grade: B+ (Good, with room for improvement)

Detailed Findings by Script

1. Python: `01_planet_download.py` / `.ipynb`

✅ Strengths

Good parameterization: Most values pulled from environment variables (os.environ.get())
Flexible date handling: Supports both DATE env var and Sys.Date() fallback
Well-structured functions: Clear separation (get_true_color_request_day, download_function, merge_files)
Error handling: Try-catch blocks for file operations
Reusable: Functions accept parameters rather than hardcoding

⚠️ Issues Found

HARDCODED VALUES:

# Line ~13-14: API Credentials HARDCODED (SECURITY RISK!)
config.sh_client_id = '1a72d811-4f0e-4447-8282-df09608cff44'
config.sh_client_secret = 'FcBlRL29i9ZmTzhmKTv1etSMFs5PxSos'

Impact: CRITICAL - Credentials exposed in code
Fix: Move to environment variables or config file

# Line ~16-17: Collection ID hardcoded
collection_id = 'c691479f-358c-46b1-b0f0-e12b70a9856c'

Impact: Medium - Would need code change for different collections
Fix: Add to configuration

# Line ~22-23: Default project hardcoded
project = 'kibos' # or xinavane or chemba_test_8b

Impact: Low - Has os.getenv('PROJECT_DIR', project) fallback
Recommendation: Remove commented projects, keep only default

# Line ~25: Days default hardcoded
days = 9 # change back to 28 which is the default. 3 years is 1095 days.

Impact: Low - Already handled by os.environ.get("DAYS", days)
Recommendation: Clean up comments, set consistent default

# Line ~126: Resolution hardcoded
resolution = 3

Impact: Medium - Should be configurable per project
Fix: Add as parameter or config variable

JUPYTER NOTEBOOK SPECIFIC ISSUES:

# Line ~88 (notebook): Hardcoded project
project = 'esa' #or xinavane or chemba_test_8b

Impact: Medium - Requires manual editing for each run
Fix: Use input cell or environment variable

# Line ~40 (notebook): Conditional geojson path
geojson_file = Path(BASE_PATH /'Data'/ ('pivot_2.geojson' if project == "esa" else 'pivot.geojson'))

Impact: Low - Project-specific logic, acceptable
Note: Consider documenting why ESA needs different file

🔧 Recommendations

URGENT: Move API credentials to environment variables
HIGH: Parameterize resolution via config or command-line arg
MEDIUM: Clean up commented code and standardize defaults
LOW: Add docstrings to all functions

2. R: `02_ci_extraction.R`

✅ Strengths

Excellent parameterization: All major values from command-line args
Good defaults: Sensible fallbacks for missing arguments
Error handling: Proper try-catch blocks with fallback paths
Modular design: Clear separation into utils file
Logging: Uses log_message() throughout

⚠️ Issues Found

NONE SIGNIFICANT - This script is well-written!

Minor observations:

# Line ~40: Commented default date
#end_date <- "2023-10-01"

Impact: None - Just cleanup needed
Fix: Remove or move to documentation

# Line ~64: Assignment to global environment
assign("ci_extraction_script", ci_extraction_script, envir = .GlobalEnv)

Impact: Low - Legitimate use for flag passing
Note: This is acceptable for script orchestration

🔧 Recommendations

LOW: Remove commented code
LOW: Consider adding validation for date ranges (e.g., not in future)

3. R: `ci_extraction_utils.R`

✅ Strengths

Pure functions: No hardcoded data in function bodies!
Excellent documentation: Comprehensive roxygen comments
Error handling: Proper validation and try-catch blocks
Terra optimization: Uses terra::global() for efficiency
Reusable: Functions accept all required parameters

⚠️ Issues Found

POTENTIAL ISSUE:

# Line ~85: Hardcoded band names assumption
names(loaded_raster) <- c("Red", "Green", "Blue", "NIR")

Impact: Low - Assumes 4-band raster structure
Context: This is CORRECT for Planet data
Recommendation: Add validation to check terra::nlyr(loaded_raster) >= 4

# Line ~92: Hardcoded CI calculation
CI <- loaded_raster$NIR / loaded_raster$Green - 1

Impact: None - This is the CI formula
Note: Consider adding comment explaining formula choice

MINOR CONCERN:

# Line ~145: Hardcoded min_valid_pixels default
min_valid_pixels = 100

Impact: Low - Good default, but could be configurable
Recommendation: Consider making this a parameter in parameters_project.R

🔧 Recommendations

MEDIUM: Add band count validation before naming
LOW: Document CI formula with reference
LOW: Consider making min_valid_pixels configurable

4. R: `03_interpolate_growth_model.R`

✅ Strengths

Clean script structure: Minimal code, delegates to utils
Good arg handling: Command-line args with defaults
Proper sourcing: Handles both default and r_app paths
Global env assignment: Justified for config passing

⚠️ Issues Found

NONE SIGNIFICANT - Well-designed script!

Minor note:

# Line ~66: Hardcoded filename
"All_pivots_Cumulative_CI_quadrant_year_v2.rds"

Impact: None - This is the OUTPUT filename
Note: Consider making this a constant in parameters_project.R

🔧 Recommendations

LOW: Move output filenames to parameters file for consistency
LOW: Add progress messages during year processing loop

5. R: `growth_model_utils.R`

✅ Strengths

EXCELLENT: No hardcoded values in function bodies
Pure functions: All data passed as parameters
Good error handling: Comprehensive try-catch blocks
Detailed logging: Informative messages at each step
Well-documented: Clear roxygen comments

⚠️ Issues Found

NONE - This is exemplary code!

🔧 Recommendations

No changes needed - This file serves as a model for others

6. R: `04_mosaic_creation.R`

✅ Strengths

Good parameterization: Command-line args with defaults
Flexible: Custom output filename support
Clear delegation: Processing logic in utils

⚠️ Issues Found

COMMENTED DEFAULTS:

# Line ~27: Commented test dates
#end_date <- "2025-07-22"  # Default date for testing
#end_date <-  "2025-07-08"   # Default date for testing

Impact: None - Just cleanup needed
Recommendation: Remove or move to comments block

WEEK CALCULATION:

# mosaic_creation_utils.R Line ~37: ISO week vs regular week
week <- lubridate::isoweek(end_date)
year <- lubridate::isoyear(end_date)

Impact: None - Consistent use of ISO weeks
Note: Ensure ALL scripts use same week system (they do)

🔧 Recommendations

LOW: Clean up commented test dates
LOW: Add comment explaining ISO week choice

7. R: `mosaic_creation_utils.R`

✅ Strengths

No hardcoded data: All parameters passed to functions
Sophisticated cloud handling: Multi-threshold approach
Terra optimized: Efficient raster operations
Good documentation: Clear function purposes

⚠️ Issues Found

MAGIC NUMBERS (Cloud thresholds):

# Line ~158-159: Cloud coverage thresholds
missing_pixels_df$thres_5perc[i] <- as.integer(missing_pixels_df$missing_pixels_percentage[i] < 5)
missing_pixels_df$thres_40perc[i] <- as.integer(missing_pixels_df$missing_pixels_percentage[i] < 45)

Impact: Medium - These thresholds determine mosaic quality
Fix: Move to parameters_project.R as configurable constants:

CLOUD_THRESHOLD_STRICT <- 5  # Percent
CLOUD_THRESHOLD_RELAXED <- 45  # Percent

REPEATED LOGIC: The cloud mask selection logic (lines ~203-269) has substantial code duplication across different threshold conditions.
Recommendation: Extract to helper function:

create_mosaic_from_rasters <- function(raster_list, cloud_masks = NULL) { ... }

🔧 Recommendations

HIGH: Extract cloud thresholds to constants
MEDIUM: Reduce code duplication in mosaic creation logic
LOW: Add more detailed logging for which images are selected

8. R: `09_calculate_kpis.R`

✅ Strengths

Good structure: Clean main() function
Proper delegation: All logic in utils
Comprehensive: Handles all 6 KPIs
Good error messages: Clear feedback

⚠️ Issues Found

NONE SIGNIFICANT - Well-designed orchestration script

🔧 Recommendations

LOW: Consider adding KPI validation (e.g., check for required input files)

9. R: `kpi_utils.R`

✅ Strengths

Mostly parameterized: Functions accept necessary inputs
Good documentation: Clear function purposes
Terra optimization: Efficient raster operations

⚠️ Issues Found

HARDCODED THRESHOLDS (Multiple locations):

# Line ~57: Field uniformity thresholds
uniformity_level <- dplyr::case_when(
  cv_value < 0.15 ~ "Excellent",
  cv_value < 0.25 ~ "Good", 
  cv_value < 0.35 ~ "Moderate",
  TRUE ~ "Poor"
)

Impact: HIGH - Core KPI classification logic
Fix: Move to constants at file top:

UNIFORMITY_EXCELLENT_THRESHOLD <- 0.15
UNIFORMITY_GOOD_THRESHOLD <- 0.25
UNIFORMITY_MODERATE_THRESHOLD <- 0.35

# Line ~598: Weed rapid growth threshold (CHANGED FROM 1.5)
rapid_growth_pixels <- sum(ci_change > 2.0)

Impact: HIGH - Core weed detection logic
Fix: Add to constants:

WEED_RAPID_GROWTH_THRESHOLD <- 2.0  # CI units per week

# Line ~606: Weed risk thresholds (UPDATED)
weed_risk <- dplyr::case_when(
  rapid_growth_pct < 10 ~ "Low",
  rapid_growth_pct < 25 ~ "Moderate",
  TRUE ~ "High"
)

Impact: HIGH - Core weed KPI classification
Fix: Add to constants:

WEED_RISK_LOW_THRESHOLD <- 10  # Percent
WEED_RISK_MODERATE_THRESHOLD <- 25  # Percent

# Line ~560: Field age threshold for weed analysis
if (!is.na(field_age) && field_age >= 240) {

Impact: MEDIUM - Determines weed analysis scope
Fix: Add to constants:

CANOPY_CLOSURE_AGE_DAYS <- 240  # 8 months

# Line ~545: Growth decline risk thresholds
risk_level <- dplyr::case_when(
  risk_score < 0.5 ~ "Low",
  risk_score < 1.5 ~ "Moderate", 
  risk_score < 3.0 ~ "High",
  TRUE ~ "Very-high"
)

Impact: HIGH - Core decline KPI
Fix: Add to constants

MAGIC NUMBER (Projection):

# Line ~116, ~447: Equal Earth projection
"EPSG:6933"

Impact: Low - Consistent across codebase
Recommendation: Add constant:

EQUAL_AREA_PROJECTION <- "EPSG:6933"  # Equal Earth for area calculations

PLACEHOLDER DATA:

# Line ~864: Fallback field sizes
field_details$`Field Size (ha)` <- round(runif(nrow(field_details), 2.5, 4.5), 1)

Impact: MEDIUM - Creates fake data if real data missing
Context: Comment says "placeholder - can be calculated from field boundaries"
Fix: Actually calculate from field_boundaries:

field_details$`Field Size (ha)` <- calculate_field_areas(field_boundaries_sf)

🔧 Recommendations

URGENT: Extract ALL KPI thresholds to constants section
HIGH: Replace placeholder field sizes with actual calculations
MEDIUM: Add validation that thresholds make logical sense (e.g., GOOD < MODERATE)
LOW: Add comments explaining threshold choices (agronomic basis)

10. R: `crop_messaging_utils.R`

✅ Strengths

EXCELLENT constants section: Thresholds defined at top (lines ~28-43)
Good documentation: Clear threshold purposes
Pure functions: Parameters passed correctly
Comprehensive spatial analysis: Moran's I, entropy, CV

⚠️ Issues Found

INCONSISTENCY WITH KPI_UTILS:

# crop_messaging_utils.R:
CI_CHANGE_INCREASE_THRESHOLD <- 0.5
UNIFORMITY_THRESHOLD <- 0.15
EXCELLENT_UNIFORMITY_THRESHOLD <- 0.08
POOR_UNIFORMITY_THRESHOLD <- 0.25

# kpi_utils.R: (HARDCODED IN CASE_WHEN)
cv_value < 0.15 ~ "Excellent"
cv_value < 0.25 ~ "Good"
cv_value < 0.35 ~ "Moderate"

Impact: HIGH - Different thresholds for same metrics!
Fix: Create SHARED constants file:

# r_app/analysis_constants.R
UNIFORMITY_EXCELLENT <- 0.15
UNIFORMITY_GOOD <- 0.25
UNIFORMITY_MODERATE <- 0.35

🔧 Recommendations

URGENT: Unify threshold constants across crop_messaging_utils.R and kpi_utils.R
HIGH: Create shared constants file: r_app/analysis_constants.R
MEDIUM: Source this file in both utils files

11. R: `parameters_project.R`

✅ Strengths

Centralized configuration: Single source of truth for paths
Clean initialization: Well-structured setup functions
Good error handling: Fallback for missing files
CRS validation: Proper handling of spatial reference systems

⚠️ Issues Found

HARDCODED HARVEST DATA COLUMNS:

# Line ~106-113: Column selection and renaming
dplyr::select(c(
  "field", "sub_field", "year", "season_start",
  "season_end", "age", "sub_area", "tonnage_ha"
))

Impact: Medium - Assumes exact Excel column names
Recommendation: Add validation or flexible column matching

CONDITIONAL LOGIC:

# Line ~52-56: Project-specific file selection
use_pivot_2 <- exists("project_dir") && project_dir == "esa" &&
               exists("ci_extraction_script")

Impact: Low - Works but fragile
Recommendation: Consider configuration file:

projects:
  esa:
    field_boundaries: pivot_2.geojson
  default:
    field_boundaries: pivot.geojson

🔧 Recommendations

MEDIUM: Add harvest data column validation
LOW: Consider YAML/JSON config for project-specific settings
LOW: Add project directory validation (exists, writable)

Summary of Critical Issues

🚨 URGENT (Fix Immediately)

Security: API credentials hardcoded in 01_planet_download.py (lines 13-14)
Consistency: Threshold mismatch between crop_messaging_utils.R and kpi_utils.R

⚠️ HIGH Priority (Fix Soon)

Parameterization: Cloud coverage thresholds in mosaic_creation_utils.R (lines 158-159)
Parameterization: All KPI thresholds in kpi_utils.R should be constants
Data Quality: Replace placeholder field sizes with actual calculations in kpi_utils.R (line 864)

📋 MEDIUM Priority (Improvement)

Configuration: Make resolution parameterizable in 01_planet_download.py
Code Quality: Reduce duplication in mosaic creation logic
Validation: Add harvest data column validation in parameters_project.R

💡 LOW Priority (Nice to Have)

Cleanup: Remove commented code throughout
Documentation: Add formula references and threshold justifications
Structure: Consider shared configuration file for project-specific settings

Recommendations for Improvement

1. Create Shared Constants File

# r_app/analysis_constants.R

# Crop Index Change Thresholds
CI_CHANGE_INCREASE_THRESHOLD <- 0.5
CI_CHANGE_DECREASE_THRESHOLD <- -0.5

# Field Uniformity Thresholds (CV as decimal)
UNIFORMITY_EXCELLENT <- 0.15
UNIFORMITY_GOOD <- 0.25
UNIFORMITY_MODERATE <- 0.35

# Cloud Coverage Thresholds (percent)
CLOUD_THRESHOLD_STRICT <- 5
CLOUD_THRESHOLD_RELAXED <- 45

# KPI Thresholds
WEED_RAPID_GROWTH_THRESHOLD <- 2.0  # CI units per week
WEED_RISK_LOW_THRESHOLD <- 10  # Percent
WEED_RISK_MODERATE_THRESHOLD <- 25  # Percent
CANOPY_CLOSURE_AGE_DAYS <- 240  # 8 months

# Spatial Analysis Thresholds
MORAN_THRESHOLD_HIGH <- 0.95
MORAN_THRESHOLD_MODERATE <- 0.85
MORAN_THRESHOLD_LOW <- 0.7

# Projections
EQUAL_AREA_PROJECTION <- "EPSG:6933"  # Equal Earth for area calculations
WGS84_PROJECTION <- "EPSG:4326"  # WGS84 for lat/lon

2. Security Improvements

# 01_planet_download.py - Use environment variables
import os
config.sh_client_id = os.getenv('SENTINEL_HUB_CLIENT_ID')
config.sh_client_secret = os.getenv('SENTINEL_HUB_CLIENT_SECRET')

if not config.sh_client_id or not config.sh_client_secret:
    raise ValueError("Sentinel Hub credentials not found in environment variables")

3. Configuration File Structure

# config/projects.yaml
default:
  field_boundaries: pivot.geojson
  resolution: 3
  cloud_threshold_strict: 5
  cloud_threshold_relaxed: 45

esa:
  field_boundaries: pivot_2.geojson
  resolution: 3
  cloud_threshold_strict: 5
  cloud_threshold_relaxed: 45
  notes: "Uses pivot_2.geojson for yield prediction extra fields"

kibos:
  field_boundaries: pivot.geojson
  resolution: 3

Testing Recommendations

Unit Tests: Add tests for utility functions
- calculate_cv() with known datasets
- calculate_spatial_autocorrelation() with synthetic fields
- categorize_change() with boundary cases
Integration Tests: Test full workflows
- CI extraction with test imagery
- Mosaic creation with varying cloud cover
- KPI calculation with known inputs
Validation: Add data validation
- Date ranges (not in future, reasonable past)
- File existence checks before processing
- CRS compatibility verification

Code Quality Metrics

Script	Hardcoded Values	Function Quality	Error Handling	Documentation	Grade
`01_planet_download.py`	5 issues	Good	Good	Minimal	B-
`02_ci_extraction.R`	0 issues	Excellent	Excellent	Good	A
`ci_extraction_utils.R`	1 minor	Excellent	Excellent	Excellent	A+
`03_interpolate_growth_model.R`	0 issues	Excellent	Excellent	Good	A
`growth_model_utils.R`	0 issues	Excellent	Excellent	Excellent	A+
`04_mosaic_creation.R`	0 issues	Good	Good	Good	A-
`mosaic_creation_utils.R`	2 issues	Good	Good	Good	B+
`09_calculate_kpis.R`	0 issues	Good	Good	Good	A-
`kpi_utils.R`	8 issues	Good	Good	Good	B
`crop_messaging_utils.R`	1 issue	Excellent	Excellent	Excellent	A
`parameters_project.R`	1 issue	Excellent	Excellent	Good	A

Overall Assessment: The codebase is well-structured with good practices, but needs threshold consolidation and security fixes for API credentials.

Conclusion

The SmartCane codebase demonstrates solid software engineering practices with good separation of concerns, comprehensive error handling, and mostly well-parameterized functions. The primary areas for improvement are:

Consolidating threshold constants into a shared file
Securing API credentials via environment variables
Removing hardcoded magic numbers from KPI calculations
Cleaning up commented code and test values

The R code quality is particularly strong, with excellent examples in growth_model_utils.R and ci_extraction_utils.R that could serve as templates for other modules.

19 KiB Raw Blame History

SmartCane Code Quality Check Report

Executive Summary

Detailed Findings by Script

1. Python: 01_planet_download.py / .ipynb

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

2. R: 02_ci_extraction.R

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

3. R: ci_extraction_utils.R

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

4. R: 03_interpolate_growth_model.R

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

5. R: growth_model_utils.R

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

6. R: 04_mosaic_creation.R

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

7. R: mosaic_creation_utils.R

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

8. R: 09_calculate_kpis.R

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

9. R: kpi_utils.R

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

10. R: crop_messaging_utils.R

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

11. R: parameters_project.R

✅ Strengths

⚠️ Issues Found

🔧 Recommendations

Summary of Critical Issues

🚨 URGENT (Fix Immediately)

⚠️ HIGH Priority (Fix Soon)

📋 MEDIUM Priority (Improvement)

💡 LOW Priority (Nice to Have)

Recommendations for Improvement

1. Create Shared Constants File

2. Security Improvements

3. Configuration File Structure

Testing Recommendations

Code Quality Metrics

Conclusion

19 KiB

Raw Blame History

1. Python: `01_planet_download.py` / `.ipynb`

2. R: `02_ci_extraction.R`

3. R: `ci_extraction_utils.R`

4. R: `03_interpolate_growth_model.R`

5. R: `growth_model_utils.R`

6. R: `04_mosaic_creation.R`

7. R: `mosaic_creation_utils.R`

8. R: `09_calculate_kpis.R`

9. R: `kpi_utils.R`

10. R: `crop_messaging_utils.R`

11. R: `parameters_project.R`