fixed issue 110 moving stuff to 00 common

This commit is contained in:
Timon 2026-02-03 16:14:10 +01:00
parent c4ef10f44f
commit c313f87959
15 changed files with 461 additions and 937 deletions


@@ -1,751 +0,0 @@
# SmartCane Pipeline Code Review
## Efficiency, Cleanup, and Architecture Analysis
**Date**: January 29, 2026
**Scope**: `run_full_pipeline.R` + all called scripts (10, 20, 21, 30, 31, 40, 80, 90, 91) + utility files
**Status**: Comprehensive review completed
---
## EXECUTIVE SUMMARY
Your pipeline is **well-structured and intentional**, but has accumulated significant technical debt through development iterations. The main issues are:
1. **🔴 HIGH IMPACT**: **3 separate mosaic mode detection functions** doing identical work
2. **🔴 HIGH IMPACT**: **Week/year calculations duplicated 10+ times** across 6+ files
3. **🟡 MEDIUM IMPACT**: **40+ debug statements** cluttering output
4. **🟡 MEDIUM IMPACT**: **File existence checks repeated** in multiple places (especially KPI checks)
5. **🟢 LOW IMPACT**: Minor redundancy in command construction, but manageable
**Estimated cleanup effort**: 2-3 hours for core refactoring; significant code quality gains.
**Workflow clarity issue**: The split between `merged_tif` vs `merged_tif_8b` and `weekly_mosaic` vs `weekly_tile_max` is **not clearly documented**. This should be clarified.
---
## 1. DUPLICATED FUNCTIONS & LOGIC
### 1.1 Mosaic Mode Detection (CRITICAL REDUNDANCY)
**Problem**: Three identical implementations of `detect_mosaic_mode()`:
| Location | Function Name | Lines | Issue |
|----------|---------------|-------|-------|
| `run_full_pipeline.R` | `detect_mosaic_mode_early()` | ~20 lines | Detects tiled vs single-file |
| `run_full_pipeline.R` | `detect_mosaic_mode_simple()` | ~20 lines | Detects tiled vs single-file (duplicate) |
| `parameters_project.R` | `detect_mosaic_mode()` | ~30 lines | Detects tiled vs single-file (different signature) |
**Impact**: If you change the detection logic, you must update 3 places. Bug risk is high.
**Solution**: Create **single canonical function in `parameters_project.R`**:
```r
# SINGLE SOURCE OF TRUTH
detect_mosaic_mode <- function(project_dir) {
  weekly_tile_max <- file.path("laravel_app", "storage", "app", project_dir, "weekly_tile_max")
  if (dir.exists(weekly_tile_max)) {
    subfolders <- list.dirs(weekly_tile_max, full.names = FALSE, recursive = FALSE)
    if (length(grep("^\\d+x\\d+$", subfolders)) > 0) return("tiled")
  }
  weekly_mosaic <- file.path("laravel_app", "storage", "app", project_dir, "weekly_mosaic")
  if (dir.exists(weekly_mosaic) &&
      length(list.files(weekly_mosaic, pattern = "^week_.*\\.tif$")) > 0) {
    return("single-file")
  }
  return("unknown")
}
```
Then replace all three calls in `run_full_pipeline.R` with this single function.
---
### 1.2 Week/Year Calculations (CRITICAL REDUNDANCY)
**Problem**: The pattern `week_num <- as.numeric(format(..., "%V"))` + `year_num <- as.numeric(format(..., "%G"))` appears **13+ times** across multiple files.
**Locations**:
- `run_full_pipeline.R`: Lines 82, 126-127, 229-230, 630, 793-794 (5 times)
- `80_calculate_kpis.R`: Lines 323-324 (1 time)
- `80_weekly_stats_utils.R`: Lines 829-830 (1 time)
- `kpi_utils.R`: Line 45 (1 time)
- `80_kpi_utils.R`: Lines 177-178 (1 time)
- Plus inline in sprintf statements: ~10+ additional times
**Impact**:
- High maintenance burden
- Risk of inconsistency (%Y vs %G confusion noted at line 82 in `run_full_pipeline.R`)
- Code verbosity
**Solution**: Create **utility function in `parameters_project.R`**:
```r
get_iso_week_year <- function(date) {
  list(
    week = as.numeric(format(date, "%V")),
    year = as.numeric(format(date, "%G"))  # ISO year, not calendar year
  )
}

# Usage:
wwy <- get_iso_week_year(end_date)
cat(sprintf("Week %02d/%d\n", wwy$week, wwy$year))
```
**Also add convenience function**:
```r
format_week_year <- function(date, separator = "_") {
  wwy <- get_iso_week_year(date)
  sprintf("week_%02d%s%d", wwy$week, separator, wwy$year)
}

# Usage: format_week_year(end_date)  # "week_02_2026"
```
---
### 1.3 File Path Construction (MEDIUM REDUNDANCY)
**Problem**: Repeated patterns like:
```r
file.path("laravel_app", "storage", "app", project_dir, "weekly_mosaic")
file.path("laravel_app", "storage", "app", project_dir, "reports", "kpis", kpi_subdir)
```
**Solution**: Centralize in `parameters_project.R`:
```r
# Project-agnostic path builders
get_project_storage_path <- function(project_dir, subdir = NULL) {
  base <- file.path("laravel_app", "storage", "app", project_dir)
  if (!is.null(subdir)) file.path(base, subdir) else base
}

get_mosaic_dir <- function(project_dir, mosaic_mode = "auto") {
  if (mosaic_mode == "auto") mosaic_mode <- detect_mosaic_mode(project_dir)
  if (mosaic_mode == "tiled") {
    get_project_storage_path(project_dir, "weekly_tile_max/5x5")
  } else {
    get_project_storage_path(project_dir, "weekly_mosaic")
  }
}

get_kpi_dir <- function(project_dir, client_type) {
  subdir <- if (client_type == "agronomic_support") "field_level" else "field_analysis"
  get_project_storage_path(project_dir, file.path("reports", "kpis", subdir))
}
```
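As a quick sanity check of the base builder above (the project name is illustrative; the other two helpers depend on on-disk project state, so only the pure function is exercised here):

```r
# Copied from the suggested helper above so this snippet runs standalone
get_project_storage_path <- function(project_dir, subdir = NULL) {
  base <- file.path("laravel_app", "storage", "app", project_dir)
  if (!is.null(subdir)) file.path(base, subdir) else base
}

get_project_storage_path("angata")
# "laravel_app/storage/app/angata"
get_project_storage_path("angata", "weekly_mosaic")
# "laravel_app/storage/app/angata/weekly_mosaic"
```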
---
## 2. DEBUG STATEMENTS & LOGGING CLUTTER
### 2.1 Excessive Debug Output
The pipeline prints **40+ debug statements** that pollute the terminal output. Examples:
**In `run_full_pipeline.R`**:
```r
Line 82:  cat(sprintf("  Running week: %02d / %d\n", ...))  # Bug: year formatted with %Y instead of %G
Line 218: cat(sprintf("[KPI_DIR_CREATED] Created directory: %s\n", ...))
Line 223: cat(sprintf("[KPI_DIR_EXISTS] %s\n", ...))
Line 224: cat(sprintf("[KPI_DEBUG] Total files in directory: %d\n", ...))
Line 225: cat(sprintf("[KPI_DEBUG] Sample files: %s\n", ...))
Line 240: cat(sprintf("[KPI_DEBUG_W%02d_%d] Pattern: '%s' | Found: %d files\n", ...))
Line 630: cat("DEBUG: Running command:", cmd, "\n")  # Script 31 execution: prints the full conda command
```
**In `80_calculate_kpis.R`**:
```
Line 323: message(paste("Calculating statistics for all fields - Week", week_num, year))
Line 417: # Plus many more ...
```
**Impact**:
- Makes output hard to scan for real issues
- Test developers skip important messages
- Production logs become noise
**Solution**: Replace with **structured logging** (3 levels):
```r
# Add to parameters_project.R
smartcane_log <- function(message, level = "INFO") {
  timestamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  prefix <- sprintf("[%s] %s", level, timestamp)
  cat(sprintf("%s | %s\n", prefix, message))
}

smartcane_debug <- function(message) {
  if (Sys.getenv("SMARTCANE_DEBUG") == "TRUE") {
    smartcane_log(message, level = "DEBUG")
  }
}

smartcane_warn <- function(message) {
  smartcane_log(message, level = "WARN")
}
```
**Usage**:
```r
# Keep important messages
smartcane_log(sprintf("Downloaded %d dates, %d failed", download_count, download_failed))
# Hide debug clutter (only show if DEBUG=TRUE)
smartcane_debug(sprintf("KPI directory exists: %s", kpi_dir))
# Warnings stay visible
smartcane_warn("Some downloads failed, but continuing pipeline")
```
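A quick demonstration of the gating behaviour (function definitions repeated from above so the snippet runs standalone):

```r
# Definitions as proposed above
smartcane_log <- function(message, level = "INFO") {
  timestamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  cat(sprintf("[%s] %s | %s\n", level, timestamp, message))
}
smartcane_debug <- function(message) {
  if (Sys.getenv("SMARTCANE_DEBUG") == "TRUE") {
    smartcane_log(message, level = "DEBUG")
  }
}

smartcane_debug("hidden by default")   # silent: SMARTCANE_DEBUG is not "TRUE"
Sys.setenv(SMARTCANE_DEBUG = "TRUE")
smartcane_debug("now visible")         # printed with a [DEBUG] prefix
Sys.setenv(SMARTCANE_DEBUG = "FALSE")  # restore quiet mode
```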
---
### 2.2 Redundant Status Checks in KPI Section
**Lines 218-270 in `run_full_pipeline.R`**: The KPI requirement check has **deeply nested debug statements**.
```r
if (dir.exists(kpi_dir)) {
  cat(sprintf("[KPI_DIR_EXISTS] %s\n", kpi_dir))
  all_kpi_files <- list.files(kpi_dir)
  cat(sprintf("[KPI_DEBUG] Total files in directory: %d\n", length(all_kpi_files)))
  if (length(all_kpi_files) > 0) {
    cat(sprintf("[KPI_DEBUG] Sample files: %s\n", ...))
  }
} else {
  cat(sprintf("[KPI_DIR_MISSING] Directory does not exist: %s\n", kpi_dir))
}
```
**Solution**: Simplify to:
```r
if (!dir.exists(kpi_dir)) {
  dir.create(kpi_dir, recursive = TRUE, showWarnings = FALSE)
}
all_kpi_files <- list.files(kpi_dir)
smartcane_debug(sprintf("KPI directory: %d files found", length(all_kpi_files)))
```
---
## 3. DOUBLE CALCULATIONS & INEFFICIENCIES
### 3.1 KPI Existence Check (Calculated Twice)
**Problem**: KPI existence is checked **twice** in `run_full_pipeline.R`:
1. **First check (Lines 228-270)**: Initial KPI requirement check that calculates `kpis_needed` dataframe
2. **Second check (Lines 786-810)**: Verification after Script 80 runs (almost identical logic)
Both loops do:
```r
for (weeks_back in 0:(reporting_weeks_needed - 1)) {
  check_date <- end_date - (weeks_back * 7)
  week_num <- as.numeric(format(check_date, "%V"))
  year_num <- as.numeric(format(check_date, "%G"))
  week_pattern <- sprintf("week%02d_%d", week_num, year_num)
  kpi_files_this_week <- list.files(kpi_dir, pattern = week_pattern)
  has_kpis <- length(kpi_files_this_week) > 0
  # ... same logic again
}
```
**Impact**: Slower pipeline execution, code duplication
**Solution**: Create **reusable function in utility file**:
```r
check_kpi_completeness <- function(project_dir, client_type, end_date, reporting_weeks_needed) {
  kpi_dir <- get_kpi_dir(project_dir, client_type)
  kpis_needed <- data.frame()
  for (weeks_back in 0:(reporting_weeks_needed - 1)) {
    check_date <- end_date - (weeks_back * 7)
    wwy <- get_iso_week_year(check_date)
    week_pattern <- sprintf("week%02d_%d", wwy$week, wwy$year)
    has_kpis <- any(grepl(week_pattern, list.files(kpi_dir)))
    kpis_needed <- rbind(kpis_needed, data.frame(
      week = wwy$week,
      year = wwy$year,
      date = check_date,
      has_kpis = has_kpis
    ))
  }
  return(list(
    kpis_df = kpis_needed,
    missing_count = sum(!kpis_needed$has_kpis),
    all_complete = all(kpis_needed$has_kpis)
  ))
}

# Then in run_full_pipeline.R:
initial_kpi_check <- check_kpi_completeness(project_dir, client_type, end_date, reporting_weeks_needed)
# ... after Script 80 runs:
final_kpi_check <- check_kpi_completeness(project_dir, client_type, end_date, reporting_weeks_needed)
if (final_kpi_check$all_complete) {
  smartcane_log("✓ All KPIs available")
}
```
---
### 3.2 Mosaic Mode Detection (Called Multiple Times per Run)
**Current code**:
- Line 99-117: `detect_mosaic_mode_early()` called once
- Line 301-324: `detect_mosaic_mode_simple()` called again
- Result: **the same detection logic runs at least twice per pipeline run**
**Solution**: Call once, store result:
```r
mosaic_mode <- detect_mosaic_mode(project_dir) # Once at top
# Then reuse throughout:
if (mosaic_mode == "tiled") { ... }
else if (mosaic_mode == "single-file") { ... }
```
---
### 3.3 Missing Weeks Calculation Inefficiency
**Lines 126-170**: The loop builds `weeks_needed` dataframe, then **immediately** iterates again to find which ones are missing.
**Current code**:
```r
# First: build all weeks
weeks_needed <- data.frame()
for (weeks_back in 0:(reporting_weeks_needed - 1)) {
# ... build weeks_needed
}
# Then: check which are missing (loop again)
missing_weeks <- data.frame()
for (i in 1:nrow(weeks_needed)) {
# ... check each week
}
```
**Solution**: Combine into **single loop**:
```r
weeks_needed <- data.frame()
missing_weeks <- data.frame()
earliest_missing_date <- end_date
for (weeks_back in 0:(reporting_weeks_needed - 1)) {
check_date <- end_date - (weeks_back * 7)
wwy <- get_iso_week_year(check_date)
# Add to weeks_needed
weeks_needed <- rbind(weeks_needed, data.frame(
week = wwy$week, year = wwy$year, date = check_date
))
# Check if missing, add to missing_weeks if so
week_pattern <- sprintf("week_%02d_%d", wwy$week, wwy$year)
mosaic_dir <- get_mosaic_dir(project_dir, mosaic_mode)
if (length(list.files(mosaic_dir, pattern = week_pattern)) == 0) {
missing_weeks <- rbind(missing_weeks, data.frame(
week = wwy$week, year = wwy$year, week_end_date = check_date
))
if (check_date - 6 < earliest_missing_date) {
earliest_missing_date <- check_date - 6
}
}
}
```
---
### 3.4 Data Source Detection Logic
**Lines 58-84**: The `data_source_used` detection is overly complex:
```r
data_source_used <- "merged_tif_8b" # Default
if (dir.exists(merged_tif_path)) {
tif_files <- list.files(merged_tif_path, pattern = "\\.tif$")
if (length(tif_files) > 0) {
data_source_used <- "merged_tif"
# ...
} else if (dir.exists(merged_tif_8b_path)) {
tif_files_8b <- list.files(merged_tif_8b_path, pattern = "\\.tif$")
# ...
}
} else if (dir.exists(merged_tif_8b_path)) {
# ...
}
```
**Issues**:
- Multiple nested conditions doing the same check
- `tif_files` and `tif_files_8b` are computed, but only their lengths are checked; the file lists themselves are never used
- Logic could be cleaner
**Solution**: Create utility function:
```r
detect_data_source <- function(project_dir) {
  storage_dir <- get_project_storage_path(project_dir)
  for (source in c("merged_tif", "merged_tif_8b")) {
    source_dir <- file.path(storage_dir, source)
    if (dir.exists(source_dir)) {
      tifs <- list.files(source_dir, pattern = "\\.tif$")
      if (length(tifs) > 0) return(source)
    }
  }
  smartcane_warn("No data source found - defaulting to merged_tif_8b")
  return("merged_tif_8b")
}
```
---
## 4. WORKFLOW CLARITY ISSUES
### 4.1 TIFF Data Format Confusion
**Problem**: Why are there TWO different TIFF folders?
- `merged_tif`: 4-band data (RGB + NIR)
- `merged_tif_8b`: 8-band data (appears to include UDM cloud masking from Planet)
**Currently in code**:
```r
data_source <- if (project_dir == "angata") "merged_tif_8b" else "merged_tif"
```
**Issues**:
- Hard-coded per project, not based on what's actually available
- Not documented **why** angata uses 8-band
- Unclear what the 8-band data adds (cloud masking? extra bands?)
- Scripts handle both, but it's not clear when to use which
**Recommendation**:
1. **Document in `parameters_project.R`** what each data source contains:
```r
DATA_SOURCE_FORMATS <- list(
  "merged_tif" = list(
    bands = 4,
    description = "4-band PlanetScope: Red, Green, Blue, NIR",
    projects = c("aura", "chemba", "xinavane"),
    note = "Standard format from Planet API"
  ),
  "merged_tif_8b" = list(
    bands = 8,
    description = "8-band PlanetScope with UDM: RGB+NIR + 4-band cloud mask",
    projects = c("angata"),
    note = "Enhanced with cloud confidence from UDM2 (Unusable Data Mask)"
  )
)
```
2. **Update hard-coded assignment** to be data-driven:
```r
# OLD: data_source <- if (project_dir == "angata") "merged_tif_8b" else "merged_tif"
# NEW: detect what's actually available
data_source <- detect_data_source(project_dir)
```
---
### 4.2 Mosaic Storage Format Confusion
**Problem**: Why are there TWO different mosaic storage styles?
- `weekly_mosaic/`: Single TIF file per week (monolithic)
- `weekly_tile_max/5x5/`: Tiled TIFFs per week (25+ files per week)
**Currently in code**:
- Detected automatically via `detect_mosaic_mode()`
- But **no documentation** on when/why each is used
**Recommendation**:
1. **Document the trade-offs in `parameters_project.R`**:
```r
MOSAIC_MODES <- list(
  "single-file" = list(
    description = "One TIF per week",
    storage_path = "weekly_mosaic/",
    files_per_week = 1,
    pros = c("Simpler file management", "Easier to load full mosaic"),
    cons = c("Slower for field-specific analysis", "Large file I/O"),
    suitable_for = c("agronomic_support", "dashboard visualization")
  ),
  "tiled" = list(
    description = "5×5 grid of tiles per week",
    storage_path = "weekly_tile_max/5x5/",
    files_per_week = 25,
    pros = c("Parallel field processing", "Faster per-field queries", "Scalable to 1000+ fields"),
    cons = c("More file management", "Requires tile_grid metadata"),
    suitable_for = c("cane_supply", "large-scale operations")
  )
)
```
2. **Document why angata uses tiled, aura uses single-file**:
- Is it a function of field count? (Angata = cane_supply, large fields → tiled)
- Is it historical? (Legacy decision?)
- Should new projects choose based on client type?
---
### 4.3 Client Type Mapping Clarity
**Current structure** in `parameters_project.R`:
```r
CLIENT_TYPE_MAP <- list(
"angata" = "cane_supply",
"aura" = "agronomic_support",
"chemba" = "cane_supply",
"xinavane" = "cane_supply",
"esa" = "cane_supply"
)
```
**Issues**:
- Not clear **why** aura is agronomic_support while angata/chemba are cane_supply
- No documentation of what each client type needs
- Scripts branch heavily on `skip_cane_supply_only` logic
**Recommendation**:
Add metadata to explain the distinction:
```r
CLIENT_TYPES <- list(
"cane_supply" = list(
description = "Sugar mill supply chain optimization",
requires_harvest_prediction = TRUE, # Script 31
requires_phase_assignment = TRUE, # Based on planting date
per_field_detail = TRUE, # Script 91 Excel report
data_sources = c("merged_tif", "merged_tif_8b"),
mosaic_mode = "tiled",
projects = c("angata", "chemba", "xinavane", "esa")
),
"agronomic_support" = list(
description = "Farm-level decision support for agronomists",
requires_harvest_prediction = FALSE,
requires_phase_assignment = FALSE,
per_field_detail = FALSE,
farm_level_kpis = TRUE, # Script 90 Word report
data_sources = c("merged_tif"),
mosaic_mode = "single-file",
projects = c("aura")
)
)
```
---
## 5. COMMAND CONSTRUCTION REDUNDANCY
### 5.1 Rscript Path Repetition
**Problem**: The hard-coded Rscript path is repeated five times; three examples:
```r
Line 519: '"C:\\Program Files\\R\\R-4.4.3\\bin\\x64\\Rscript.exe"'
Line 676: '"C:\\Program Files\\R\\R-4.4.3\\bin\\x64\\Rscript.exe"'
Line 685: '"C:\\Program Files\\R\\R-4.4.3\\bin\\x64\\Rscript.exe"'
```
**Solution**: Define once in `parameters_project.R`:
```r
RSCRIPT_PATH <- "C:\\Program Files\\R\\R-4.4.3\\bin\\x64\\Rscript.exe"
# Usage:
cmd <- sprintf('"%s" --vanilla r_app/20_ci_extraction.R ...', RSCRIPT_PATH)
```
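If the pipeline is guaranteed to run under the same R installation that launches it, a more portable alternative (a suggestion, not from the original codebase) is to derive the path at runtime instead of hard-coding the version:

```r
# Derive Rscript from the currently running R installation.
# R.home("bin") points at the bin directory of this R (arch-specific on Windows),
# so the path survives R version upgrades without editing parameters_project.R.
RSCRIPT_PATH <- file.path(
  R.home("bin"),
  if (.Platform$OS.type == "windows") "Rscript.exe" else "Rscript"
)
```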
---
## 6. SPECIFIC LINE-BY-LINE ISSUES
### 6.1 Line 82 Bug: Wrong Format Code
```r
cat(sprintf("  Running week: %02d / %d\n",
            as.numeric(format(end_date, "%V")),
            as.numeric(format(end_date, "%Y"))))  # ❌ Should be %G, not %Y
```
**Issue**: Uses calendar year `%Y` instead of ISO week year `%G`. On dates like 2025-12-30 (week 1 of 2026), this will print "Week 01 / 2025" (confusing).
**Fix**:
```r
wwy <- get_iso_week_year(end_date)
cat(sprintf(" Running week: %02d / %d\n", wwy$week, wwy$year))
```
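To see the mismatch concretely, a minimal base-R check on the year-boundary date mentioned above:

```r
# 2025-12-30 falls in ISO week 1 of ISO year 2026
d <- as.Date("2025-12-30")
format(d, "%V")  # ISO week number: "01"
format(d, "%Y")  # calendar year:   "2025"  -> the buggy pairing "Week 01 / 2025"
format(d, "%G")  # ISO week year:   "2026"  -> the correct pairing "Week 01 / 2026"
```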
---
### 6.2 Line 630 Debug Statement
```r
cmd <- sprintf('conda run -n pytorch_gpu python python_app/31_harvest_imminent_weekly.py %s', project_dir)
cat("DEBUG: Running command:", cmd, "\n") # ❌ Prints full conda command
```
**Solution**: Use `smartcane_debug()` function:
```r
cmd <- sprintf('conda run -n pytorch_gpu python python_app/31_harvest_imminent_weekly.py %s', project_dir)
smartcane_debug(sprintf("Running Python 31: %s", cmd))
```
---
### 6.3 Lines 719-723: Verbose Script 31 Verification
```r
# Check for THIS WEEK's specific file
current_week <- as.numeric(format(end_date, "%V"))
current_year <- as.numeric(format(end_date, "%Y"))
expected_file <- file.path(...)
```
**Issue**: Calculates week twice (already done earlier). Also uses `%Y` (should be `%G`).
**Solution**: Reuse earlier `wwy` calculation or create helper.
---
## 7. REFACTORING ROADMAP
### Phase 1: Foundation (1 hour)
- [ ] Consolidate `detect_mosaic_mode()` into single function in `parameters_project.R`
- [ ] Create `get_iso_week_year()` and `format_week_year()` utilities
- [ ] Create `get_project_storage_path()`, `get_mosaic_dir()`, `get_kpi_dir()` helpers
- [ ] Add logging functions (`smartcane_log()`, `smartcane_debug()`, `smartcane_warn()`)
### Phase 2: Deduplication (1 hour)
- [ ] Replace all 13+ week_num/year_num calculations with `get_iso_week_year()`
- [ ] Replace all 3 `detect_mosaic_mode_*()` calls with single function
- [ ] Combine duplicate KPI checks into `check_kpi_completeness()` function
- [ ] Fix line 82 and 630 format bugs
### Phase 3: Cleanup (1 hour)
- [ ] Remove all debug statements (40+), replace with `smartcane_debug()`
- [ ] Simplify nested conditions in data_source detection
- [ ] Combine missing weeks detection into single loop
- [ ] Extract Rscript path to constant
### Phase 4: Documentation (30 min)
- [ ] Add comments explaining `merged_tif` vs `merged_tif_8b` trade-offs
- [ ] Document `single-file` vs `tiled` mosaic modes and when to use each
- [ ] Clarify client type mapping in `CLIENT_TYPE_MAP`
- [ ] Add inline comments for non-obvious logic
---
## 8. ARCHITECTURE & WORKFLOW RECOMMENDATIONS
### 8.1 Clear Data Flow Diagram
Add to `r_app/system_architecture/system_architecture.md`:
```
INPUT SOURCES:
├── Planet API 4-band or 8-band imagery
├── Field boundaries (pivot.geojson)
└── Harvest data (harvest.xlsx, optional for cane_supply)
STORAGE TIERS:
├── Tier 1: Raw data (merged_tif/ or merged_tif_8b/)
├── Tier 2: Daily tiles (daily_tiles_split/{grid_size}/{dates}/)
├── Tier 3: Extracted CI (Data/extracted_ci/daily_vals/*.rds)
├── Tier 4: Weekly mosaics (weekly_mosaic/ OR weekly_tile_max/5x5/)
└── Tier 5: KPI outputs (reports/kpis/{field_level|field_analysis}/)
DECISION POINTS:
└─ Client type (cane_supply vs agronomic_support)
├─ Drives script selection (Scripts 21, 22, 23, 31, 90/91)
├─ Drives data source (merged_tif_8b for cane_supply, merged_tif for agronomic)
├─ Drives mosaic mode (tiled for cane_supply, single-file for agronomic)
└─ Drives KPI subdirectory (field_analysis vs field_level)
```
### 8.2 .sh Scripts Alignment
You mention `.sh` scripts in the online environment. If they're **not calling the R pipeline**, there's a **split responsibility** issue:
**Question**: Are the `.sh` scripts:
- (A) Independent duplicates of the R pipeline logic? (BAD - maintenance nightmare)
- (B) Wrappers calling the R pipeline? (GOOD - single source of truth)
- (C) Different workflow for online vs local? (RED FLAG - they diverge)
**Recommendation**: If using `.sh` for production, ensure they **call the same R scripts** (`run_full_pipeline.R`). Example:
```bash
#!/bin/bash
# Wrapper that ensures the same R pipeline is called
cd /path/to/smartcane
Rscript r_app/run_full_pipeline.R  # assumes Rscript is on PATH on the server
```
---
## 9. SUMMARY TABLE: Issues by Severity
| Issue | Type | Impact | Effort | Priority |
|-------|------|--------|--------|----------|
| 3 mosaic detection functions | Duplication | HIGH | 30 min | P0 |
| 13+ week/year calculations | Duplication | HIGH | 1 hour | P0 |
| 40+ debug statements | Clutter | MEDIUM | 1 hour | P1 |
| KPI check run twice | Inefficiency | LOW | 30 min | P2 |
| Line 82: %Y should be %G | Bug | LOW | 5 min | P2 |
| Data source confusion | Documentation | MEDIUM | 30 min | P1 |
| Mosaic mode confusion | Documentation | MEDIUM | 30 min | P1 |
| Client type mapping | Documentation | MEDIUM | 30 min | P1 |
| Data source detection complexity | Code style | LOW | 15 min | P3 |
---
## 10. RECOMMENDED NEXT STEPS
1. **Review this report** with your team to align on priorities
2. **Create Linear issues** for each phase of refactoring
3. **Start with Phase 1** (foundation utilities) - builds confidence for Phase 2
4. **Test thoroughly** after each phase - the pipeline is complex and easy to break
5. **Update `.sh` scripts** if they duplicate R logic
6. **Document data flow** in `system_architecture/system_architecture.md`
---
## Questions for Clarification
Before implementing, please clarify:
1. **Data source split**: Why does angata use `merged_tif_8b` (8-band with cloud mask) while aura uses `merged_tif` (4-band)? Is this:
- A function of client need (cane_supply requires cloud masking)?
- Historical (legacy decision for angata)?
- Should new projects choose based on availability?
2. **Mosaic mode split**: Why tiled for angata but single-file for aura? Should this be:
- Hard-coded per project?
- Based on field count/client type?
- Auto-detected from first run?
3. **Production vs local**: Are the `.sh` scripts in the online environment:
- Calling this same R pipeline?
- Duplicating logic independently?
- A different workflow entirely?
4. **Client type growth**: Are there other client types planned beyond `cane_supply` and `agronomic_support`? (e.g., extension_service?)
---
**Report prepared**: January 29, 2026
**Total code reviewed**: ~2,500 lines across 10 files
**Estimated refactoring time**: 3-4 hours
**Estimated maintenance savings**: 5-10 hours/month (fewer bugs, easier updates)

r_app/00_common_utils.R (new file, 401 lines)

@@ -0,0 +1,401 @@
# ==============================================================================
# 00_COMMON_UTILS.R
# ==============================================================================
# GENERIC UTILITY FUNCTIONS FOR SMARTCANE PIPELINE
#
# PURPOSE:
# Centralized location for foundational utilities used across multiple scripts.
# These functions have NO project knowledge, NO client-type dependencies,
# NO domain-specific logic.
#
# USAGE:
# All scripts (10, 20, 21, 30, 40, 80, 90, 91) should source this file:
#
# source(here::here("r_app", "parameters_project.R")) # Config first
# source(here::here("r_app", "00_common_utils.R")) # Then common utilities
#
# FUNCTIONS:
# 1. safe_log() — Generic logging with [LEVEL] prefix
# 2. smartcane_debug() — Conditional debug logging
# 3. smartcane_warn() — Convenience wrapper for WARN-level messages
# 4. date_list() — Generate date sequences for processing windows
# 5. get_iso_week() — Extract ISO week number from date
# 6. get_iso_year() — Extract ISO year from date
# 7. get_iso_week_year() — Extract both ISO week and year as list
# 8. format_week_label() — Format date as week/year label (e.g., "week01_2025")
# 9. load_field_boundaries() — Load field geometries from GeoJSON
# 10. load_harvesting_data() — Load harvest schedule from Excel
#
# ==============================================================================
#' Safe Logging Function
#'
#' Generic logging with [LEVEL] prefix. Works standalone without any framework.
#' Consistent with SmartCane logging standard.
#'
#' @param message The message to log
#' @param level The log level (default: "INFO"). Options: "INFO", "WARN", "ERROR", "DEBUG"
#' @return NULL (invisible, used for side effects)
#'
#' @examples
#' safe_log("Processing started", "INFO")
#' safe_log("Check input file", "WARNING")
#' safe_log("Failed to load data", "ERROR")
#'
safe_log <- function(message, level = "INFO") {
  prefix <- sprintf("[%s]", level)
  cat(sprintf("%s %s\n", prefix, message))
}
#' SmartCane Debug Logging (Conditional)
#'
#' Logs DEBUG-level messages only if verbose=TRUE or SMARTCANE_DEBUG env var is set.
#' Useful for development/troubleshooting without cluttering normal output.
#'
#' @param message The message to log
#' @param verbose Whether to output regardless of SMARTCANE_DEBUG (default: FALSE)
#' @return NULL (invisible, used for side effects)
#'
#' @examples
#' smartcane_debug("Processing field 1", verbose = FALSE) # Only if SMARTCANE_DEBUG=TRUE
#' smartcane_debug("Detailed state info", verbose = TRUE) # Always outputs
#'
smartcane_debug <- function(message, verbose = FALSE) {
  if (!verbose && Sys.getenv("SMARTCANE_DEBUG") != "TRUE") {
    return(invisible(NULL))
  }
  safe_log(message, level = "DEBUG")
}
#' SmartCane Warning Logging
#'
#' Logs WARN-level messages. Convenience wrapper around safe_log().
#'
#' @param message The message to log
#' @return NULL (invisible, used for side effects)
#'
#' @examples
#' smartcane_warn("Check data format before proceeding")
#'
smartcane_warn <- function(message) {
  safe_log(message, level = "WARN")
}
#' Extract ISO Week Number from Date
#'
#' Extracts ISO week number (1-53) from a date using %V format.
#' ISO weeks follow the international standard: Week 1 starts on Monday.
#'
#' @param date A Date object or string convertible to Date
#' @return Numeric: ISO week number (1-53)
#'
#' @examples
#' get_iso_week(as.Date("2025-01-15")) # Returns: 3
#'
get_iso_week <- function(date) {
  as.numeric(format(date, "%V"))
}
#' Extract ISO Year from Date
#'
#' Extracts ISO year from a date using %G format.
#' ISO year can differ from calendar year around year boundaries.
#'
#' @param date A Date object or string convertible to Date
#' @return Numeric: ISO year
#'
#' @examples
#' get_iso_year(as.Date("2025-01-01")) # Returns: 2025
#'
get_iso_year <- function(date) {
  as.numeric(format(date, "%G"))
}
#' Extract ISO Week and Year as List
#'
#' Combines get_iso_week() and get_iso_year() for convenience.
#'
#' @param date A Date object or string convertible to Date
#' @return List with elements: week (1-53), year
#'
#' @examples
#' wwy <- get_iso_week_year(as.Date("2025-01-15"))
#' # Returns: list(week = 3, year = 2025)
#'
get_iso_week_year <- function(date) {
  list(
    week = as.numeric(format(date, "%V")),
    year = as.numeric(format(date, "%G"))
  )
}
#' Format Date as Week/Year Label
#'
#' Converts a date into a readable week label format.
#' Useful for filenames, directory names, and output identification.
#'
#' @param date A Date object or string convertible to Date
#' @param separator Separator between week number and year (default: "_")
#' @return String in format "week##_YYYY" (e.g., "week03_2025")
#'
#' @examples
#' format_week_label(as.Date("2025-01-15")) # "week03_2025"
#' format_week_label(as.Date("2025-01-15"), "-") # "week03-2025"
#'
format_week_label <- function(date, separator = "_") {
  wwy <- get_iso_week_year(date)
  sprintf("week%02d%s%d", wwy$week, separator, wwy$year)
}
#' Load Field Boundaries from GeoJSON
#'
#' Loads field polygon geometries from GeoJSON file (pivot.geojson or pivot_2.geojson).
#' Handles CRS validation and column standardization.
#'
#' @param data_dir Directory containing GeoJSON file
#' @return List with elements:
#' - field_boundaries_sf: sf (Simple Features) object
#' - field_boundaries: terra SpatVect object (if conversion successful, else sf fallback)
#'
#' @details
#' Automatically selects pivot_2.geojson for ESA project during CI extraction,
#' otherwise uses pivot.geojson. Handles both multi-polygon and simple polygon geometries.
#'
#' @examples
#' boundaries <- load_field_boundaries("laravel_app/storage/app/angata")
#' head(boundaries$field_boundaries_sf)
#'
load_field_boundaries <- function(data_dir) {
# Choose field boundaries file based on project and script type
# ESA project uses pivot_2.geojson ONLY for scripts 02-03 (CI extraction & growth model)
# All other scripts (including 04-mosaic, 09-KPIs, 10-reports) use pivot.geojson
use_pivot_2 <- exists("project_dir") && project_dir == "esa" &&
exists("ci_extraction_script") # ci_extraction_script flag set by scripts 02-03
if (use_pivot_2) {
field_boundaries_path <- here(data_dir, "pivot_2.geojson")
} else {
field_boundaries_path <- here(data_dir, "pivot.geojson")
}
if (!file.exists(field_boundaries_path)) {
stop(paste("Field boundaries file not found at path:", field_boundaries_path))
}
tryCatch({
# Read GeoJSON with explicit CRS handling
field_boundaries_sf <- st_read(field_boundaries_path, quiet = TRUE)
# Remove OBJECTID column immediately if it exists
if ("OBJECTID" %in% names(field_boundaries_sf)) {
field_boundaries_sf <- field_boundaries_sf %>% select(-OBJECTID)
}
# Validate and fix CRS if needed
tryCatch({
# Simply assign WGS84 if not already set (safe approach)
if (is.na(sf::st_crs(field_boundaries_sf)$epsg)) {
st_crs(field_boundaries_sf) <- 4326
warning("CRS was missing, assigned WGS84 (EPSG:4326)")
}
}, error = function(e) {
tryCatch({
st_crs(field_boundaries_sf) <<- 4326
}, error = function(e2) {
warning(paste("Could not set CRS:", e2$message))
})
})
# Handle column names - accommodate optional sub_area column
if ("sub_area" %in% names(field_boundaries_sf)) {
field_boundaries_sf <- field_boundaries_sf %>%
dplyr::select(field, sub_field, sub_area) %>%
sf::st_set_geometry("geometry")
} else {
field_boundaries_sf <- field_boundaries_sf %>%
dplyr::select(field, sub_field) %>%
sf::st_set_geometry("geometry")
}
# Convert to terra vector if possible, otherwise use sf
field_boundaries <- tryCatch({
field_boundaries_terra <- terra::vect(field_boundaries_sf)
crs_value <- tryCatch(terra::crs(field_boundaries_terra), error = function(e) NULL)
crs_str <- if (!is.null(crs_value)) as.character(crs_value) else ""
if (is.null(crs_value) || length(crs_value) == 0 || nchar(crs_str) == 0) {
terra::crs(field_boundaries_terra) <- "EPSG:4326"
warning("Terra object CRS was empty, assigned WGS84 (EPSG:4326)")
}
field_boundaries_terra
}, error = function(e) {
warning(paste("Terra conversion failed, using sf object instead:", e$message))
field_boundaries_sf
})
return(list(
field_boundaries_sf = field_boundaries_sf,
field_boundaries = field_boundaries
))
}, error = function(e) {
safe_log(paste("Error in load_field_boundaries:", e$message), level = "ERROR")
stop(paste("Error loading field boundaries:", e$message))
})
}
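A minimal usage sketch for the loader above (the GeoJSON path is illustrative only; `load_field_boundaries()` stops if the file is missing):

```r
# Hypothetical usage; the path below is illustrative, not a guaranteed location.
bounds <- load_field_boundaries("laravel_app/storage/app/angata/field_boundaries.geojson")

bounds$field_boundaries_sf  # always an sf data frame with field, sub_field (and sub_area if present)
bounds$field_boundaries     # terra SpatVector, or the sf object if terra conversion failed
```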
#' Load Harvesting Data from Excel
#'
#' Loads crop harvest schedule from harvest.xlsx file.
#' Handles flexible date formats (numeric, YYYY-MM-DD, DD/MM/YYYY, etc.).
#'
#' @param data_dir Directory containing harvest.xlsx file
#' @return Data frame with columns: field, sub_field, year, season_start, season_end,
#' age (weeks), sub_area, tonnage_ha. Returns NULL if file not found.
#'
#' @examples
#' harvest <- load_harvesting_data("laravel_app/storage/app/angata")
#' head(harvest)
#'
load_harvesting_data <- function(data_dir) {
harvest_file <- here(data_dir, "harvest.xlsx")
if (!file.exists(harvest_file)) {
warning(paste("Harvest data file not found at path:", harvest_file))
return(NULL)
}
# Helper function to parse dates with multiple format detection
parse_flexible_date <- function(x) {
if (is.null(x) || is.na(x)) return(NA_real_)  # check NULL first: is.na(NULL) is logical(0), which errors in ||
if (inherits(x, "Date")) return(x)
if (inherits(x, "POSIXct")) return(as.Date(x))
# If it's numeric (Excel date serial), convert directly
if (is.numeric(x)) {
return(as.Date(x, origin = "1899-12-30"))
}
# Try character conversion with multiple formats
x_char <- as.character(x)
formats <- c("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S")
for (fmt in formats) {
result <- suppressWarnings(as.Date(x_char, format = fmt))
if (!is.na(result)) return(result)
}
return(NA_real_)  # same NA type as the NULL/NA branch above
}
tryCatch({
harvesting_data <- read_excel(harvest_file) %>%
dplyr::select(
c(
"field",
"sub_field",
"year",
"season_start",
"season_end",
"age",
"sub_area",
"tonnage_ha"
)
) %>%
mutate(
field = as.character(field),
sub_field = as.character(sub_field),
year = as.numeric(year),
season_start = sapply(season_start, parse_flexible_date),
season_end = sapply(season_end, parse_flexible_date),
season_start = as.Date(season_start, origin = "1970-01-01"),
season_end = as.Date(season_end, origin = "1970-01-01"),
age = as.numeric(age),
sub_area = as.character(sub_area),
tonnage_ha = as.numeric(tonnage_ha)
) %>%
mutate(
season_end = case_when(
season_end > Sys.Date() ~ Sys.Date(),
is.na(season_end) ~ Sys.Date(),
TRUE ~ season_end
),
age = round(as.numeric(season_end - season_start) / 7, 0)
)
return(harvesting_data)
}, error = function(e) {
warning(paste("Error loading harvesting data:", e$message))
return(NULL)
})
}
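`parse_flexible_date()` is scoped inside `load_harvesting_data()`, so it cannot be called directly. A standalone sketch of the same multi-format logic (name and formats hypothetical, mirroring the helper above), including the Excel-serial round trip — `"1899-12-30"` is the origin that makes R's `as.Date()` agree with Excel's 1900 date system:

```r
# Standalone sketch of the multi-format parsing used by parse_flexible_date().
# Name is illustrative; the real helper lives inside load_harvesting_data().
parse_date_sketch <- function(x) {
  if (is.null(x) || length(x) == 0 || is.na(x)) return(as.Date(NA))
  if (inherits(x, "Date")) return(x)
  # Excel date serials count days from 1899-12-30 in the 1900 date system
  if (is.numeric(x)) return(as.Date(x, origin = "1899-12-30"))
  for (fmt in c("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")) {
    out <- suppressWarnings(as.Date(as.character(x), format = fmt))
    if (!is.na(out)) return(out)
  }
  as.Date(NA)
}

stopifnot(parse_date_sketch("2025-01-15") == as.Date("2025-01-15"))
stopifnot(parse_date_sketch("15/01/2025") == as.Date("2025-01-15"))
# Round-trip an Excel-style serial through the same origin:
serial <- as.numeric(as.Date("2025-01-15") - as.Date("1899-12-30"))
stopifnot(parse_date_sketch(serial) == as.Date("2025-01-15"))
```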
#' Generate a Sequence of Dates for Processing
#'
#' Creates a date range from start_date to end_date and extracts week/year info.
#' Used by Scripts 20, 30, 40 to determine data processing windows.
#'
#' @param end_date The end date for the sequence (Date object or "YYYY-MM-DD" string)
#' @param offset Number of days to look back from end_date (e.g., 7 for one week)
#' @return A list containing:
#'   - week: ISO week number of end_date (the reporting week)
#'   - year: ISO year of end_date
#' - days_filter: Vector of dates in "YYYY-MM-DD" format
#' - start_date: Start date as Date object
#' - end_date: End date as Date object
#'
#' @details
#' Uses `lubridate::isoweek()` and `lubridate::isoyear()`, which follow the
#' ISO-8601 convention: weeks start on Monday and week 1 is the week containing
#' the first Thursday of the year. Near year boundaries these can differ from
#' `lubridate::week()`/`lubridate::year()`, which simply count days from January 1.
#'
#' @examples
#' dates <- date_list(as.Date("2025-01-15"), offset = 7)
#' # Returns: week=3, year=2025, days_filter = c("2025-01-09", ..., "2025-01-15")
#'
#' dates <- date_list("2025-12-31", offset = 14)
#' # Handles string input and returns 14 days of data
#'
date_list <- function(end_date, offset) {
# Input validation
if (!lubridate::is.Date(end_date)) {
# as.Date() errors on unparseable strings rather than returning NA, so catch it
end_date <- tryCatch(as.Date(end_date), error = function(e) as.Date(NA))
if (is.na(end_date)) {
stop("Invalid end_date provided. Expected a Date object or a string convertible to Date.")
}
}
offset <- as.numeric(offset)
if (is.na(offset) || offset < 1) {
stop("Invalid offset provided. Expected a positive number.")
}
# Calculate date range
offset <- offset - 1 # Adjust offset to include end_date
start_date <- end_date - lubridate::days(offset)
# Extract ISO week and year information (from END date for reporting period)
week <- lubridate::isoweek(end_date)
year <- lubridate::isoyear(end_date)
# Generate sequence of dates
days_filter <- seq(from = start_date, to = end_date, by = "day")
days_filter <- format(days_filter, "%Y-%m-%d") # Format for consistent filtering
# Log the date range
safe_log(paste("Date range generated from", start_date, "to", end_date))
return(list(
"week" = week,
"year" = year,
"days_filter" = days_filter,
"start_date" = start_date,
"end_date" = end_date
))
}
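A quick demonstration of the behaviour documented above, assuming `00_common_utils.R` (which defines `safe_log()` and `date_list()`) has been sourced. The second half shows why the ISO functions matter: `week()` and `isoweek()` disagree near year boundaries.

```r
library(lubridate)

dates <- date_list(as.Date("2025-01-15"), offset = 7)
# week/year come from the END date via isoweek()/isoyear():
# 2025-01-15 falls in ISO week 3 (ISO week 1 of 2025 runs 2024-12-30 .. 2025-01-05)
stopifnot(dates$week == 3, dates$year == 2025)
stopifnot(length(dates$days_filter) == 7)
stopifnot(dates$days_filter[1] == "2025-01-09")

# Why isoweek()/isoyear() and not week()/year():
lubridate::week(as.Date("2024-12-30"))     # 53 (counts days from January 1)
lubridate::isoweek(as.Date("2024-12-30"))  # 1  (Monday of ISO week 1 of 2025)
lubridate::isoyear(as.Date("2024-12-30"))  # 2025
```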
# ==============================================================================
# END 00_COMMON_UTILS.R
# ==============================================================================

View file

@ -46,6 +46,7 @@ library(sf)
# LOAD CENTRALIZED PARAMETERS & PATHS
# ==============================================================================
source(here::here("r_app", "parameters_project.R"))
source(here::here("r_app", "00_common_utils.R"))
# Get project parameter from command line
args <- commandArgs(trailingOnly = TRUE)
@ -58,9 +59,9 @@ if (length(args) == 0) {
# Load centralized path structure (creates all directories automatically)
paths <- setup_project_directories(PROJECT)
smartcane_log(paste("Project:", PROJECT))
smartcane_log(paste("Base path:", paths$laravel_storage_dir))
smartcane_log(paste("Data dir:", paths$data_dir))
safe_log(paste("Project:", PROJECT))
safe_log(paste("Base path:", paths$laravel_storage_dir))
safe_log(paste("Data dir:", paths$data_dir))
# Unified function to crop TIFF to field boundaries
# Called by both migration and processing phases
@ -72,14 +73,14 @@ crop_tiff_to_fields <- function(tif_path, tif_date, fields, output_base_dir) {
# Load raster
if (!file.exists(tif_path)) {
smartcane_log(paste("ERROR: TIFF not found:", tif_path))
safe_log(paste("ERROR: TIFF not found:", tif_path))
return(list(created = 0, skipped = 0, errors = 1))
}
rast <- tryCatch({
rast(tif_path)
}, error = function(e) {
smartcane_log(paste("ERROR loading raster:", e$message))
safe_log(paste("ERROR loading raster:", e$message))
return(NULL)
})
@ -99,7 +100,7 @@ crop_tiff_to_fields <- function(tif_path, tif_date, fields, output_base_dir) {
overlapping_indices <- unique(unlist(overlapping_indices))
if (length(overlapping_indices) == 0) {
smartcane_log(paste("No fields intersect TIFF:", basename(tif_path)))
safe_log(paste("No fields intersect TIFF:", basename(tif_path)))
return(list(created = 0, skipped = 0, errors = 0))
}
@ -129,7 +130,7 @@ crop_tiff_to_fields <- function(tif_path, tif_date, fields, output_base_dir) {
writeRaster(field_rast, output_path, overwrite = TRUE)
created <- created + 1
}, error = function(e) {
smartcane_log(paste("ERROR cropping field", field_name, ":", e$message))
safe_log(paste("ERROR cropping field", field_name, ":", e$message))
errors <<- errors + 1
})
}
@ -142,13 +143,13 @@ crop_tiff_to_fields <- function(tif_path, tif_date, fields, output_base_dir) {
# NORMAL MODE: Otherwise, process merged_tif/ → field_tiles/
process_new_merged_tif <- function(merged_tif_dir, field_tiles_dir, fields, field_tiles_ci_dir = NULL) {
smartcane_log("\n========================================")
smartcane_log("PHASE 2: PROCESSING NEW DOWNLOADS")
smartcane_log("========================================")
safe_log("\n========================================")
safe_log("PHASE 2: PROCESSING NEW DOWNLOADS")
safe_log("========================================")
# Check if download directory exists
if (!dir.exists(merged_tif_dir)) {
smartcane_log("No merged_tif/ directory found - no new data to process")
safe_log("No merged_tif/ directory found - no new data to process")
return(list(total_created = 0, total_skipped = 0, total_errors = 0))
}
@ -164,10 +165,10 @@ process_new_merged_tif <- function(merged_tif_dir, field_tiles_dir, fields, fiel
full.names = TRUE
)
smartcane_log(paste("Found", length(tiff_files), "TIFF(s) to process"))
safe_log(paste("Found", length(tiff_files), "TIFF(s) to process"))
if (length(tiff_files) == 0) {
smartcane_log("No new TIFFs found - nothing to process")
safe_log("No new TIFFs found - nothing to process")
return(list(total_created = 0, total_skipped = 0, total_errors = 0))
}
@ -196,13 +197,13 @@ process_new_merged_tif <- function(merged_tif_dir, field_tiles_dir, fields, fiel
}
if (date_migrated) {
smartcane_log(paste("Skipping:", tif_date, "(already migrated and processed by Script 20)"))
safe_log(paste("Skipping:", tif_date, "(already migrated and processed by Script 20)"))
total_skipped <- total_skipped + 1
next
}
}
smartcane_log(paste("Processing:", tif_date))
safe_log(paste("Processing:", tif_date))
result <- crop_tiff_to_fields(tif_path, tif_date, fields, field_tiles_dir)
total_created <- total_created + result$created
@ -210,7 +211,7 @@ process_new_merged_tif <- function(merged_tif_dir, field_tiles_dir, fields, fiel
total_errors <- total_errors + result$errors
}
smartcane_log(paste("Processing complete: created =", total_created,
safe_log(paste("Processing complete: created =", total_created,
", skipped =", total_skipped, ", errors =", total_errors))
return(list(total_created = total_created, total_skipped = total_skipped,
@ -222,9 +223,9 @@ process_new_merged_tif <- function(merged_tif_dir, field_tiles_dir, fields, fiel
# MAIN EXECUTION
# ==============================================================================
smartcane_log("========================================")
smartcane_log(paste("Script 10: Per-Field TIFF Creation for", PROJECT))
smartcane_log("========================================")
safe_log("========================================")
safe_log(paste("Script 10: Per-Field TIFF Creation for", PROJECT))
safe_log("========================================")
# Load field boundaries using centralized path (no dir.create needed - already created by setup_project_directories)
fields <- load_field_boundaries(paths$field_boundaries_path)
@ -238,11 +239,11 @@ field_tiles_ci_dir <- paths$field_tiles_ci_dir
# Pass field_tiles_ci_dir so it can skip dates already migrated
process_result <- process_new_merged_tif(merged_tif_dir, field_tiles_dir, fields, field_tiles_ci_dir)
smartcane_log("\n========================================")
smartcane_log("FINAL SUMMARY")
smartcane_log("========================================")
smartcane_log(paste("Processing: created =", process_result$total_created,
safe_log("\n========================================")
safe_log("FINAL SUMMARY")
safe_log("========================================")
safe_log(paste("Processing: created =", process_result$total_created,
", skipped =", process_result$total_skipped,
", errors =", process_result$total_errors))
smartcane_log("Script 10 complete")
smartcane_log("========================================\n")
safe_log("Script 10 complete")
safe_log("========================================\n")

View file

@ -114,6 +114,15 @@ main <- function() {
# Load centralized path structure (creates all directories automatically)
paths <- setup_project_directories(project_dir)
cat("[DEBUG] Attempting to source r_app/00_common_utils.R\n")
tryCatch({
source("r_app/00_common_utils.R")
cat("[DEBUG] Successfully sourced r_app/00_common_utils.R\n")
}, error = function(e) {
cat("[ERROR] Failed to source r_app/00_common_utils.R:\n", e$message, "\n")
stop(e)
})
cat("[DEBUG] Attempting to source r_app/20_ci_extraction_utils.R\n")
tryCatch({
source("r_app/20_ci_extraction_utils.R")

View file

@ -11,24 +11,6 @@
# - calc_ci_from_raster(): Calculate CI from 4-band raster (Chlorophyll Index formula: NIR/Green - 1)
# - extract_ci_by_subfield(): Extract per-sub_field CI statistics from raster
#' Safe logging function that works whether log_message exists or not
#'
#' @param message The message to log
#' @param level The log level (default: "INFO")
#' @return NULL (used for side effects)
#'
safe_log <- function(message, level = "INFO") {
if (exists("log_message")) {
log_message(message, level)
} else {
if (level %in% c("ERROR", "WARNING")) {
warning(message)
} else {
message(message)
}
}
}
#' Generate a sequence of dates for processing
#'
#' @param end_date The end date for the sequence (Date object)

View file

@ -5,24 +5,6 @@
# Utility functions for growth model interpolation and manipulation.
# These functions support the creation of continuous growth models from point measurements.
#' Safe logging function that works whether log_message exists or not
#'
#' @param message The message to log
#' @param level The log level (default: "INFO")
#' @return NULL (used for side effects)
#'
safe_log <- function(message, level = "INFO") {
if (exists("log_message")) {
log_message(message, level)
} else {
if (level %in% c("ERROR", "WARNING")) {
warning(message)
} else {
message(message)
}
}
}
#' Load and prepare the combined CI data (Per-Field Architecture)
#'
#' @param daily_vals_dir Directory containing per-field daily RDS files (Data/extracted_ci/daily_vals)

View file

@ -20,9 +20,11 @@ suppressPackageStartupMessages({
})
# =============================================================================
# Load utility functions from 30_growth_model_utils.R
# Load configuration and utility functions
# =============================================================================
source("r_app/30_growth_model_utils.R")
source(here::here("r_app", "parameters_project.R"))
source(here::here("r_app", "00_common_utils.R"))
source(here::here("r_app", "30_growth_model_utils.R"))
# =============================================================================
# Main Processing

View file

@ -124,6 +124,7 @@ main <- function() {
tryCatch({
source("r_app/parameters_project.R")
source("r_app/00_common_utils.R")
source("r_app/40_mosaic_creation_utils.R")
safe_log(paste("Successfully sourced files from 'r_app' directory."))
}, error = function(e) {

View file

@ -17,23 +17,6 @@
# ↓
# Scripts 90/91: Read weekly_mosaic/{FIELD}/week_WW_YYYY.tif (unchanged interface)
#' Safe logging function
#' @param message The message to log
#' @param level The log level (default: "INFO")
#' @return NULL (used for side effects)
#'
safe_log <- function(message, level = "INFO") {
if (exists("log_message")) {
log_message(message, level)
} else {
if (level %in% c("ERROR", "WARNING")) {
warning(message)
} else {
message(message)
}
}
}
#' Generate date range for processing (ISO week-based)
#'
#' @param end_date The end date (Date object or YYYY-MM-DD string)

View file

@ -35,66 +35,14 @@ detect_tile_structure_from_files <- function(merged_final_tif_dir) {
))
}
#' Safe logging function
#' @param message The message to log
#' @param level The log level (default: "INFO")
#' @return NULL (used for side effects)
#'
safe_log <- function(message, level = "INFO") {
if (exists("log_message")) {
log_message(message, level)
} else {
if (level %in% c("ERROR", "WARNING")) {
warning(message)
} else {
message(message)
}
}
}
#' Generate a sequence of dates for processing
#'
#' @param end_date The end date for the sequence (Date object)
#' @param offset Number of days to look back from end_date
#' @return A list containing week number, year, and a sequence of dates for filtering
#'
date_list <- function(end_date, offset) {
# Input validation
if (!lubridate::is.Date(end_date)) {
end_date <- as.Date(end_date)
if (is.na(end_date)) {
stop("Invalid end_date provided. Expected a Date object or a string convertible to Date.")
}
}
offset <- as.numeric(offset)
if (is.na(offset) || offset < 1) {
stop("Invalid offset provided. Expected a positive number.")
}
# Calculate date range
offset <- offset - 1 # Adjust offset to include end_date
start_date <- end_date - lubridate::days(offset)
# Extract week and year information
week <- lubridate::isoweek(end_date)
year <- lubridate::isoyear(end_date)
# Generate sequence of dates
days_filter <- seq(from = start_date, to = end_date, by = "day")
days_filter <- format(days_filter, "%Y-%m-%d") # Format for consistent filtering
# Log the date range
safe_log(paste("Date range generated from", start_date, "to", end_date))
return(list(
"week" = week,
"year" = year,
"days_filter" = days_filter,
"start_date" = start_date,
"end_date" = end_date
))
}
# NOTE: date_list() is now in 00_common_utils.R - import from there
# This function was duplicated and has been consolidated
#' Create a weekly mosaic from available VRT files
#'

View file

@ -122,6 +122,18 @@ suppressPackageStartupMessages({
# LOAD UTILITY FUNCTIONS FROM SEPARATED MODULES
# ============================================================================
tryCatch({
source(here("r_app", "parameters_project.R"))
}, error = function(e) {
stop("Error loading parameters_project.R: ", e$message)
})
tryCatch({
source(here("r_app", "00_common_utils.R"))
}, error = function(e) {
stop("Error loading 00_common_utils.R: ", e$message)
})
tryCatch({
source(here("r_app", "80_weekly_stats_utils.R"))
}, error = function(e) {

View file

@ -18,22 +18,6 @@ MORAN_THRESHOLD_HIGH <- 0.95 # Above this = very strong clustering (problemati
MORAN_THRESHOLD_MODERATE <- 0.85 # Above this = moderate clustering
MORAN_THRESHOLD_LOW <- 0.7 # Above this = normal field continuity
#' Logging utility for consistent message handling
#' @param message The message to log
#' @param level The log level (default: "INFO")
#' @return NULL (used for side effects)
safe_log <- function(message, level = "INFO") {
if (exists("log_message")) {
log_message(message, level)
} else {
if (level %in% c("ERROR", "WARNING")) {
warning(message)
} else {
message(message)
}
}
}
#' Calculate coefficient of variation for uniformity assessment
#' @param values Numeric vector of CI values
#' @return Coefficient of variation (CV) as decimal

View file

@ -711,24 +711,11 @@ get_kpi_dir <- function(project_dir, client_type) {
get_project_storage_path(project_dir, file.path("reports", "kpis", subdir))
}
# Logging functions for clean output
smartcane_log <- function(message, level = "INFO", verbose = TRUE) {
if (!verbose) return(invisible(NULL))
timestamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
prefix <- sprintf("[%s]", level)
cat(sprintf("%s %s\n", prefix, message))
}
smartcane_debug <- function(message, verbose = FALSE) {
if (!verbose && Sys.getenv("SMARTCANE_DEBUG") != "TRUE") {
return(invisible(NULL))
}
smartcane_log(message, level = "DEBUG", verbose = TRUE)
}
smartcane_warn <- function(message) {
smartcane_log(message, level = "WARN", verbose = TRUE)
}
# Logging functions moved to 00_common_utils.R
# - smartcane_log() — Main logging function with level prefix
# - smartcane_debug() — Conditional debug logging
# - smartcane_warn() — Warning wrapper
# Import with: source("r_app/00_common_utils.R")
# ============================================================================
# PHASE 3 & 4: OPTIMIZATION & DOCUMENTATION

View file

@ -4,24 +4,6 @@
# These functions support the creation of maps, charts and report elements
# for the CI_report_dashboard_planet.Rmd document.
#' Safe logging function that works whether log_message exists or not
#'
#' @param message The message to log
#' @param level The log level (default: "INFO")
#' @return NULL (used for side effects)
#'
safe_log <- function(message, level = "INFO") {
if (exists("log_message")) {
log_message(message, level)
} else {
if (level %in% c("ERROR", "WARNING")) {
warning(message)
} else {
message(message)
}
}
}
#' Creates a sub-chunk for use within RMarkdown documents
#'
#' @param g A ggplot object to render in the sub-chunk

View file

@ -41,6 +41,7 @@ RSCRIPT_PATH <- file.path("C:", "Program Files", "R", "R-4.4.3", "bin", "x64", "
# Load client type mapping and centralized paths from parameters_project.R
source("r_app/parameters_project.R")
source("r_app/00_common_utils.R")
paths <- setup_project_directories(project_dir)
client_type <- get_client_type(project_dir)
cat(sprintf("\nProject: %s → Client Type: %s\n", project_dir, client_type))