# SmartCane Pipeline Code Review

## Efficiency, Cleanup, and Architecture Analysis

**Date**: January 29, 2026
**Scope**: `run_full_pipeline.R` + all called scripts (10, 20, 21, 30, 31, 40, 80, 90, 91) + utility files
**Status**: Comprehensive review completed

---

## EXECUTIVE SUMMARY

Your pipeline is **well-structured and intentional**, but has accumulated significant technical debt through development iterations. The main issues are:

1. **🔴 HIGH IMPACT**: **3 separate mosaic mode detection functions** doing identical work
2. **🔴 HIGH IMPACT**: **Week/year calculations duplicated 13+ times** across 6+ files
3. **🟡 MEDIUM IMPACT**: **40+ debug statements** cluttering output
4. **🟡 MEDIUM IMPACT**: **File existence checks repeated** in multiple places (especially KPI checks)
5. **🟢 LOW IMPACT**: Minor redundancy in command construction, but manageable

**Estimated cleanup effort**: 2-3 hours for core refactoring; significant code quality gains.

**Workflow clarity issue**: The split between `merged_tif` vs `merged_tif_8b` and `weekly_mosaic` vs `weekly_tile_max` is **not clearly documented**. This should be clarified.

---

## 1. DUPLICATED FUNCTIONS & LOGIC

### 1.1 Mosaic Mode Detection (CRITICAL REDUNDANCY)

**Problem**: Three near-identical implementations of `detect_mosaic_mode()`:

| Location | Function Name | Lines | Issue |
|----------|---------------|-------|-------|
| `run_full_pipeline.R` | `detect_mosaic_mode_early()` | ~20 lines | Detects tiled vs single-file |
| `run_full_pipeline.R` | `detect_mosaic_mode_simple()` | ~20 lines | Detects tiled vs single-file (duplicate) |
| `parameters_project.R` | `detect_mosaic_mode()` | ~30 lines | Detects tiled vs single-file (different signature) |

**Impact**: If you change the detection logic, you must update 3 places. Bug risk is high.
**Solution**: Create a **single canonical function in `parameters_project.R`**:

```r
# SINGLE SOURCE OF TRUTH
detect_mosaic_mode <- function(project_dir) {
  weekly_tile_max <- file.path("laravel_app", "storage", "app", project_dir, "weekly_tile_max")
  if (dir.exists(weekly_tile_max)) {
    subfolders <- list.dirs(weekly_tile_max, full.names = FALSE, recursive = FALSE)
    if (length(grep("^\\d+x\\d+$", subfolders)) > 0) return("tiled")
  }
  weekly_mosaic <- file.path("laravel_app", "storage", "app", project_dir, "weekly_mosaic")
  if (dir.exists(weekly_mosaic) &&
      length(list.files(weekly_mosaic, pattern = "^week_.*\\.tif$")) > 0) {
    return("single-file")
  }
  return("unknown")
}
```

Then replace all three calls in `run_full_pipeline.R` with this single function.

---

### 1.2 Week/Year Calculations (CRITICAL REDUNDANCY)

**Problem**: The pattern `week_num <- as.numeric(format(..., "%V"))` + `year_num <- as.numeric(format(..., "%G"))` appears **13+ times** across multiple files.

**Locations**:

- `run_full_pipeline.R`: Lines 82, 126-127, 229-230, 630, 793-794 (5 times)
- `80_calculate_kpis.R`: Lines 323-324 (1 time)
- `80_weekly_stats_utils.R`: Lines 829-830 (1 time)
- `kpi_utils.R`: Line 45 (1 time)
- `80_kpi_utils.R`: Lines 177-178 (1 time)
- Plus inline in sprintf statements: ~10+ additional times

**Impact**:

- High maintenance burden
- Risk of inconsistency (`%V` vs `%Y` confusion noted at line 82 in `run_full_pipeline.R`)
- Code verbosity

**Solution**: Create a **utility function in `parameters_project.R`**:

```r
get_iso_week_year <- function(date) {
  list(
    week = as.numeric(format(date, "%V")),
    year = as.numeric(format(date, "%G"))  # ISO year, not calendar year
  )
}

# Usage:
wwy <- get_iso_week_year(end_date)
cat(sprintf("Week %02d/%d\n", wwy$week, wwy$year))
```

**Also add a convenience function**:

```r
format_week_year <- function(date, separator = "_") {
  wwy <- get_iso_week_year(date)
  sprintf("week_%02d%s%d", wwy$week, separator, wwy$year)
}

# Usage:
format_week_year(end_date)  # "week_02_2026"
```

---

### 1.3 File Path Construction (MEDIUM REDUNDANCY)

**Problem**: Repeated patterns like:

```r
file.path("laravel_app", "storage", "app", project_dir, "weekly_mosaic")
file.path("laravel_app", "storage", "app", project_dir, "reports", "kpis", kpi_subdir)
```

**Solution**: Centralize in `parameters_project.R`:

```r
# Project-agnostic path builders
get_project_storage_path <- function(project_dir, subdir = NULL) {
  base <- file.path("laravel_app", "storage", "app", project_dir)
  if (!is.null(subdir)) file.path(base, subdir) else base
}

get_mosaic_dir <- function(project_dir, mosaic_mode = "auto") {
  if (mosaic_mode == "auto") mosaic_mode <- detect_mosaic_mode(project_dir)
  if (mosaic_mode == "tiled") {
    get_project_storage_path(project_dir, "weekly_tile_max/5x5")
  } else {
    get_project_storage_path(project_dir, "weekly_mosaic")
  }
}

get_kpi_dir <- function(project_dir, client_type) {
  subdir <- if (client_type == "agronomic_support") "field_level" else "field_analysis"
  get_project_storage_path(project_dir, file.path("reports", "kpis", subdir))
}
```

---

## 2. DEBUG STATEMENTS & LOGGING CLUTTER

### 2.1 Excessive Debug Output

The pipeline prints **40+ debug statements** that pollute the terminal output. Examples:

**In `run_full_pipeline.R`**:

```r
Line 82:  cat(sprintf("  Running week: %02d / %d\n", ...))  # Note: prints calendar year (%Y); should use %G
Line 218: cat(sprintf("[KPI_DIR_CREATED] Created directory: %s\n", ...))
Line 223: cat(sprintf("[KPI_DIR_EXISTS] %s\n", ...))
Line 224: cat(sprintf("[KPI_DEBUG] Total files in directory: %d\n", ...))
Line 225: cat(sprintf("[KPI_DEBUG] Sample files: %s\n", ...))
Line 240: cat(sprintf("[KPI_DEBUG_W%02d_%d] Pattern: '%s' | Found: %d files\n", ...))
Line 630: cat("DEBUG: Running command:", cmd, "\n")  # Script 31 execution - prints full conda command
```

**In `80_calculate_kpis.R`**:

```
Line 323: message(paste("Calculating statistics for all fields - Week", week_num, year))
Line 417: # Plus many more ...
```

**Impact**:

- Makes output hard to scan for real issues
- Developers learn to skip important messages
- Production logs become noise

**Solution**: Replace with **structured logging** (3 levels):

```r
# Add to parameters_project.R
smartcane_log <- function(message, level = "INFO") {
  timestamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  prefix <- sprintf("[%s] %s", level, timestamp)
  cat(sprintf("%s | %s\n", prefix, message))
}

smartcane_debug <- function(message) {
  if (Sys.getenv("SMARTCANE_DEBUG") == "TRUE") {
    smartcane_log(message, level = "DEBUG")
  }
}

smartcane_warn <- function(message) {
  smartcane_log(message, level = "WARN")
}
```

**Usage**:

```r
# Keep important messages
smartcane_log(sprintf("Downloaded %d dates, %d failed", download_count, download_failed))

# Hide debug clutter (only shown if SMARTCANE_DEBUG=TRUE)
smartcane_debug(sprintf("KPI directory exists: %s", kpi_dir))

# Warnings stay visible
smartcane_warn("Some downloads failed, but continuing pipeline")
```

---

### 2.2 Redundant Status Checks in KPI Section

**Lines 218-270 in `run_full_pipeline.R`**: The KPI requirement check has **deeply nested debug statements**.

```r
if (dir.exists(kpi_dir)) {
  cat(sprintf("[KPI_DIR_EXISTS] %s\n", kpi_dir))
  all_kpi_files <- list.files(kpi_dir)
  cat(sprintf("[KPI_DEBUG] Total files in directory: %d\n", length(all_kpi_files)))
  if (length(all_kpi_files) > 0) {
    cat(sprintf("[KPI_DEBUG] Sample files: %s\n", ...))
  }
} else {
  cat(sprintf("[KPI_DIR_MISSING] Directory does not exist: %s\n", kpi_dir))
}
```

**Solution**: Simplify to:

```r
if (!dir.exists(kpi_dir)) {
  dir.create(kpi_dir, recursive = TRUE, showWarnings = FALSE)
}
all_kpi_files <- list.files(kpi_dir)
smartcane_debug(sprintf("KPI directory: %d files found", length(all_kpi_files)))
```

---

## 3. DOUBLE CALCULATIONS & INEFFICIENCIES

### 3.1 KPI Existence Check (Calculated Twice)

**Problem**: KPI existence is checked **twice** in `run_full_pipeline.R`:

1. **First check (Lines 228-270)**: Initial KPI requirement check that builds the `kpis_needed` dataframe
2. **Second check (Lines 786-810)**: Verification after Script 80 runs (almost identical logic)

Both loops do:

```r
for (weeks_back in 0:(reporting_weeks_needed - 1)) {
  check_date <- end_date - (weeks_back * 7)
  week_num <- as.numeric(format(check_date, "%V"))
  year_num <- as.numeric(format(check_date, "%G"))
  week_pattern <- sprintf("week%02d_%d", week_num, year_num)
  kpi_files_this_week <- list.files(kpi_dir, pattern = week_pattern)
  has_kpis <- length(kpi_files_this_week) > 0
  # ... same logic again
}
```

**Impact**: Slower pipeline execution, code duplication

**Solution**: Create a **reusable function in a utility file**:

```r
check_kpi_completeness <- function(project_dir, client_type, end_date, reporting_weeks_needed) {
  kpi_dir <- get_kpi_dir(project_dir, client_type)
  kpis_needed <- data.frame()

  for (weeks_back in 0:(reporting_weeks_needed - 1)) {
    check_date <- end_date - (weeks_back * 7)
    wwy <- get_iso_week_year(check_date)
    week_pattern <- sprintf("week%02d_%d", wwy$week, wwy$year)
    has_kpis <- any(grepl(week_pattern, list.files(kpi_dir)))

    kpis_needed <- rbind(kpis_needed, data.frame(
      week = wwy$week,
      year = wwy$year,
      date = check_date,
      has_kpis = has_kpis
    ))
  }

  return(list(
    kpis_df = kpis_needed,
    missing_count = sum(!kpis_needed$has_kpis),
    all_complete = all(kpis_needed$has_kpis)
  ))
}

# Then in run_full_pipeline.R:
initial_kpi_check <- check_kpi_completeness(project_dir, client_type, end_date, reporting_weeks_needed)
# ... after Script 80 runs:
final_kpi_check <- check_kpi_completeness(project_dir, client_type, end_date, reporting_weeks_needed)
if (final_kpi_check$all_complete) {
  smartcane_log("✓ All KPIs available")
}
```

---

### 3.2 Mosaic Mode Detection (Called Twice per Run)

**Current code**:

- Lines 99-117: `detect_mosaic_mode_early()` called once
- Lines 301-324: `detect_mosaic_mode_simple()` called again
- Result: **the same detection logic runs twice unnecessarily**

**Solution**: Call once, store the result:

```r
mosaic_mode <- detect_mosaic_mode(project_dir)  # Once at top

# Then reuse throughout:
if (mosaic_mode == "tiled") {
  ...
} else if (mosaic_mode == "single-file") {
  ...
}
```

---

### 3.3 Missing Weeks Calculation Inefficiency

**Lines 126-170**: The loop builds the `weeks_needed` dataframe, then **immediately** iterates again to find which weeks are missing.

**Current code**:

```r
# First: build all weeks
weeks_needed <- data.frame()
for (weeks_back in 0:(reporting_weeks_needed - 1)) {
  # ... build weeks_needed
}

# Then: check which are missing (loop again)
missing_weeks <- data.frame()
for (i in 1:nrow(weeks_needed)) {
  # ... check each week
}
```

**Solution**: Combine into a **single loop**:

```r
weeks_needed <- data.frame()
missing_weeks <- data.frame()
earliest_missing_date <- end_date
mosaic_dir <- get_mosaic_dir(project_dir, mosaic_mode)  # hoisted out of the loop

for (weeks_back in 0:(reporting_weeks_needed - 1)) {
  check_date <- end_date - (weeks_back * 7)
  wwy <- get_iso_week_year(check_date)

  # Add to weeks_needed
  weeks_needed <- rbind(weeks_needed, data.frame(
    week = wwy$week,
    year = wwy$year,
    date = check_date
  ))

  # Check if missing; add to missing_weeks if so
  week_pattern <- sprintf("week_%02d_%d", wwy$week, wwy$year)
  if (length(list.files(mosaic_dir, pattern = week_pattern)) == 0) {
    missing_weeks <- rbind(missing_weeks, data.frame(
      week = wwy$week,
      year = wwy$year,
      week_end_date = check_date
    ))
    if (check_date - 6 < earliest_missing_date) {
      earliest_missing_date <- check_date - 6
    }
  }
}
```

---

### 3.4 Data Source Detection Logic

**Lines 58-84**: The `data_source_used` detection is overly complex:

```r
data_source_used <- "merged_tif_8b"  # Default
if (dir.exists(merged_tif_path)) {
  tif_files <- list.files(merged_tif_path, pattern = "\\.tif$")
  if (length(tif_files) > 0) {
    data_source_used <- "merged_tif"
    # ...
  } else if (dir.exists(merged_tif_8b_path)) {
    tif_files_8b <- list.files(merged_tif_8b_path, pattern = "\\.tif$")
    # ...
  }
} else if (dir.exists(merged_tif_8b_path)) {
  # ...
}
```

**Issues**:

- Multiple nested conditions doing the same check
- `tif_files` and `tif_files_8b` are listed but only their counts are checked (not used later)
- Logic could be cleaner

**Solution**: Create a utility function:

```r
detect_data_source <- function(project_dir) {
  storage_dir <- get_project_storage_path(project_dir)
  for (source in c("merged_tif", "merged_tif_8b")) {
    source_dir <- file.path(storage_dir, source)
    if (dir.exists(source_dir)) {
      tifs <- list.files(source_dir, pattern = "\\.tif$")
      if (length(tifs) > 0) return(source)
    }
  }
  smartcane_warn("No data source found - defaulting to merged_tif_8b")
  return("merged_tif_8b")
}
```

---

## 4. WORKFLOW CLARITY ISSUES

### 4.1 TIFF Data Format Confusion

**Problem**: Why are there TWO different TIFF folders?

- `merged_tif`: 4-band data (RGB + NIR)
- `merged_tif_8b`: 8-band data (appears to include UDM cloud masking from Planet)

**Currently in code**:

```r
data_source <- if (project_dir == "angata") "merged_tif_8b" else "merged_tif"
```

**Issues**:

- Hard-coded per project, not based on what's actually available
- Not documented **why** angata uses 8-band
- Unclear what the 8-band data adds (cloud masking? extra bands?)
- Scripts handle both, but it's not clear when to use which

**Recommendation**:

1. **Document in `parameters_project.R`** what each data source contains:

```r
DATA_SOURCE_FORMATS <- list(
  "merged_tif" = list(
    bands = 4,
    description = "4-band PlanetScope: Red, Green, Blue, NIR",
    projects = c("aura", "chemba", "xinavane"),
    note = "Standard format from Planet API"
  ),
  "merged_tif_8b" = list(
    bands = 8,
    description = "8-band PlanetScope with UDM: RGB+NIR + 4-band cloud mask",
    projects = c("angata"),
    note = "Enhanced with cloud confidence from UDM2 (Unusable Data Mask)"
  )
)
```

2. **Update the hard-coded assignment** to be data-driven:

```r
# OLD:
data_source <- if (project_dir == "angata") "merged_tif_8b" else "merged_tif"

# NEW: detect what's actually available
data_source <- detect_data_source(project_dir)
```

---

### 4.2 Mosaic Storage Format Confusion

**Problem**: Why are there TWO different mosaic storage styles?

- `weekly_mosaic/`: Single TIF file per week (monolithic)
- `weekly_tile_max/5x5/`: Tiled TIFFs per week (25+ files per week)

**Currently in code**:

- Detected automatically via `detect_mosaic_mode()`
- But **no documentation** on when/why each is used

**Recommendation**:

1. **Document the trade-offs in `parameters_project.R`**:

```r
MOSAIC_MODES <- list(
  "single-file" = list(
    description = "One TIF per week",
    storage_path = "weekly_mosaic/",
    files_per_week = 1,
    pros = c("Simpler file management", "Easier to load full mosaic"),
    cons = c("Slower for field-specific analysis", "Large file I/O"),
    suitable_for = c("agronomic_support", "dashboard visualization")
  ),
  "tiled" = list(
    description = "5x5 grid of tiles per week",
    storage_path = "weekly_tile_max/5x5/",
    files_per_week = 25,
    pros = c("Parallel field processing", "Faster per-field queries", "Scalable to 1000+ fields"),
    cons = c("More file management", "Requires tile_grid metadata"),
    suitable_for = c("cane_supply", "large-scale operations")
  )
)
```

2. **Document why angata uses tiled, aura uses single-file**:
   - Is it a function of field count? (Angata = cane_supply, large fields → tiled)
   - Is it historical? (Legacy decision?)
   - Should new projects choose based on client type?
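If the answer to that last question is yes, the per-project hard-coding could be replaced by a small helper that derives the mode from the client type. A minimal sketch, assuming the `CLIENT_TYPE_MAP` list from `parameters_project.R`; the function name and the `default_mode` fallback are hypothetical:

```r
# Hypothetical helper: derive mosaic mode from client type instead of
# hard-coding it per project. Assumes CLIENT_TYPE_MAP maps project_dir
# to a client type; default_mode is an assumed fallback for new projects.
choose_mosaic_mode <- function(project_dir,
                               client_type_map = CLIENT_TYPE_MAP,
                               default_mode = "single-file") {
  client_type <- client_type_map[[project_dir]]
  if (is.null(client_type)) return(default_mode)  # unknown project
  if (client_type == "cane_supply") "tiled" else "single-file"
}

# Usage (local example map, mirroring parameters_project.R):
CLIENT_TYPE_MAP <- list("angata" = "cane_supply", "aura" = "agronomic_support")
choose_mosaic_mode("angata", CLIENT_TYPE_MAP)  # returns "tiled"
choose_mosaic_mode("aura", CLIENT_TYPE_MAP)    # returns "single-file"
```

This keeps the mapping in one place and lets `detect_mosaic_mode()` remain a pure "what is on disk" check.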
---

### 4.3 Client Type Mapping Clarity

**Current structure** in `parameters_project.R`:

```r
CLIENT_TYPE_MAP <- list(
  "angata" = "cane_supply",
  "aura" = "agronomic_support",
  "chemba" = "cane_supply",
  "xinavane" = "cane_supply",
  "esa" = "cane_supply"
)
```

**Issues**:

- Not clear **why** aura is agronomic_support while angata/chemba are cane_supply
- No documentation of what each client type needs
- Scripts branch heavily on `skip_cane_supply_only` logic

**Recommendation**: Add metadata to explain the distinction:

```r
CLIENT_TYPES <- list(
  "cane_supply" = list(
    description = "Sugar mill supply chain optimization",
    requires_harvest_prediction = TRUE,   # Script 31
    requires_phase_assignment = TRUE,     # Based on planting date
    per_field_detail = TRUE,              # Script 91 Excel report
    data_sources = c("merged_tif", "merged_tif_8b"),
    mosaic_mode = "tiled",
    projects = c("angata", "chemba", "xinavane", "esa")
  ),
  "agronomic_support" = list(
    description = "Farm-level decision support for agronomists",
    requires_harvest_prediction = FALSE,
    requires_phase_assignment = FALSE,
    per_field_detail = FALSE,
    farm_level_kpis = TRUE,               # Script 90 Word report
    data_sources = c("merged_tif"),
    mosaic_mode = "single-file",
    projects = c("aura")
  )
)
```

---

## 5. COMMAND CONSTRUCTION REDUNDANCY

### 5.1 Rscript Path Repetition

**Problem**: The Rscript path is repeated five times, e.g.:

```r
Line 519: '"C:\\Program Files\\R\\R-4.4.3\\bin\\x64\\Rscript.exe"'
Line 676: '"C:\\Program Files\\R\\R-4.4.3\\bin\\x64\\Rscript.exe"'
Line 685: '"C:\\Program Files\\R\\R-4.4.3\\bin\\x64\\Rscript.exe"'
```

**Solution**: Define once in `parameters_project.R`:

```r
RSCRIPT_PATH <- "C:\\Program Files\\R\\R-4.4.3\\bin\\x64\\Rscript.exe"

# Usage:
cmd <- sprintf('"%s" --vanilla r_app/20_ci_extraction.R ...', RSCRIPT_PATH)
```

---

## 6. SPECIFIC LINE-BY-LINE ISSUES

### 6.1 Line 82 Bug: Wrong Format Code

```r
cat(sprintf("  Running week: %02d / %d\n",
            as.numeric(format(end_date, "%V")),
            as.numeric(format(end_date, "%Y"))))  # ❌ Should be %G, not %Y
```

**Issue**: Uses calendar year `%Y` instead of ISO week year `%G`. On dates like 2025-12-30 (week 1 of 2026), this will print "Week 01 / 2025" (confusing).

**Fix**:

```r
wwy <- get_iso_week_year(end_date)
cat(sprintf("  Running week: %02d / %d\n", wwy$week, wwy$year))
```

---

### 6.2 Line 630 Debug Statement

```r
cmd <- sprintf('conda run -n pytorch_gpu python python_app/31_harvest_imminent_weekly.py %s', project_dir)
cat("DEBUG: Running command:", cmd, "\n")  # ❌ Prints full conda command
```

**Solution**: Use the `smartcane_debug()` function:

```r
cmd <- sprintf('conda run -n pytorch_gpu python python_app/31_harvest_imminent_weekly.py %s', project_dir)
smartcane_debug(sprintf("Running Python 31: %s", cmd))
```

---

### 6.3 Lines 719-723: Verbose Script 31 Verification

```r
# Check for THIS WEEK's specific file
current_week <- as.numeric(format(end_date, "%V"))
current_year <- as.numeric(format(end_date, "%Y"))
expected_file <- file.path(...)
```

**Issue**: Calculates the week a second time (already done earlier). Also uses `%Y` (should be `%G`).

**Solution**: Reuse the earlier `wwy` calculation, or create a helper.

---

## 7. REFACTORING ROADMAP

### Phase 1: Foundation (1 hour)

- [ ] Consolidate `detect_mosaic_mode()` into a single function in `parameters_project.R`
- [ ] Create `get_iso_week_year()` and `format_week_year()` utilities
- [ ] Create `get_project_storage_path()`, `get_mosaic_dir()`, `get_kpi_dir()` helpers
- [ ] Add logging functions (`smartcane_log()`, `smartcane_debug()`, `smartcane_warn()`)

### Phase 2: Deduplication (1 hour)

- [ ] Replace all 13+ week_num/year_num calculations with `get_iso_week_year()`
- [ ] Replace all 3 `detect_mosaic_mode_*()` calls with the single function
- [ ] Combine the duplicate KPI checks into a `check_kpi_completeness()` function
- [ ] Fix the line 82 and 630 format bugs

### Phase 3: Cleanup (1 hour)

- [ ] Remove all debug statements (40+), replace with `smartcane_debug()`
- [ ] Simplify nested conditions in data_source detection
- [ ] Combine missing weeks detection into a single loop
- [ ] Extract the Rscript path to a constant

### Phase 4: Documentation (30 min)

- [ ] Add comments explaining `merged_tif` vs `merged_tif_8b` trade-offs
- [ ] Document `single-file` vs `tiled` mosaic modes and when to use each
- [ ] Clarify the client type mapping in `CLIENT_TYPE_MAP`
- [ ] Add inline comments for non-obvious logic

---

## 8. ARCHITECTURE & WORKFLOW RECOMMENDATIONS

### 8.1 Clear Data Flow Diagram

Add to `r_app/system_architecture/system_architecture.md`:

```
INPUT SOURCES:
├── Planet API 4-band or 8-band imagery
├── Field boundaries (pivot.geojson)
└── Harvest data (harvest.xlsx, optional for cane_supply)

STORAGE TIERS:
├── Tier 1: Raw data (merged_tif/ or merged_tif_8b/)
├── Tier 2: Daily tiles (daily_tiles_split/{grid_size}/{dates}/)
├── Tier 3: Extracted CI (Data/extracted_ci/daily_vals/*.rds)
├── Tier 4: Weekly mosaics (weekly_mosaic/ OR weekly_tile_max/5x5/)
└── Tier 5: KPI outputs (reports/kpis/{field_level|field_analysis}/)

DECISION POINTS:
└─ Client type (cane_supply vs agronomic_support)
   ├─ Drives script selection (Scripts 21, 22, 23, 31, 90/91)
   ├─ Drives data source (merged_tif_8b for cane_supply, merged_tif for agronomic)
   ├─ Drives mosaic mode (tiled for cane_supply, single-file for agronomic)
   └─ Drives KPI subdirectory (field_analysis vs field_level)
```

### 8.2 .sh Scripts Alignment

You mention `.sh` scripts in the online environment. If they're **not calling the R pipeline**, there's a **split responsibility** issue.

**Question**: Are the `.sh` scripts:

- (A) Independent duplicates of the R pipeline logic? (BAD - maintenance nightmare)
- (B) Wrappers calling the R pipeline? (GOOD - single source of truth)
- (C) A different workflow for online vs local? (RED FLAG - they will diverge)

**Recommendation**: If using `.sh` for production, ensure they **call the same R scripts** (`run_full_pipeline.R`). Example:

```bash
#!/bin/bash
# Wrapper that ensures the R pipeline is called
# (assumes Rscript is on the server's PATH)
cd /path/to/smartcane && Rscript r_app/run_full_pipeline.R
```

---

## 9. SUMMARY TABLE: Issues by Severity

| Issue | Type | Impact | Effort | Priority |
|-------|------|--------|--------|----------|
| 3 mosaic detection functions | Duplication | HIGH | 30 min | P0 |
| 13+ week/year calculations | Duplication | HIGH | 1 hour | P0 |
| 40+ debug statements | Clutter | MEDIUM | 1 hour | P1 |
| KPI check run twice | Inefficiency | LOW | 30 min | P2 |
| Line 82: %Y should be %G | Bug | LOW | 5 min | P2 |
| Data source confusion | Documentation | MEDIUM | 30 min | P1 |
| Mosaic mode confusion | Documentation | MEDIUM | 30 min | P1 |
| Client type mapping | Documentation | MEDIUM | 30 min | P1 |
| Data source detection complexity | Code style | LOW | 15 min | P3 |

---

## 10. RECOMMENDED NEXT STEPS

1. **Review this report** with your team to align on priorities
2. **Create Linear issues** for each phase of refactoring
3. **Start with Phase 1** (foundation utilities) - builds confidence for Phase 2
4. **Test thoroughly** after each phase - the pipeline is complex and easy to break
5. **Update `.sh` scripts** if they duplicate R logic
6. **Document data flow** in `system_architecture/system_architecture.md`

---

## Questions for Clarification

Before implementing, please clarify:

1. **Data source split**: Why does angata use `merged_tif_8b` (8-band with cloud mask) while aura uses `merged_tif` (4-band)? Is this:
   - A function of client need (cane_supply requires cloud masking)?
   - Historical (a legacy decision for angata)?
   - Should new projects choose based on availability?

2. **Mosaic mode split**: Why tiled for angata but single-file for aura? Should this be:
   - Hard-coded per project?
   - Based on field count/client type?
   - Auto-detected from the first run?

3. **Production vs local**: Are the `.sh` scripts in the online environment:
   - Calling this same R pipeline?
   - Duplicating logic independently?
   - A different workflow entirely?

4. **Client type growth**: Are there other client types planned beyond `cane_supply` and `agronomic_support`? (e.g., extension_service?)

---

**Report prepared**: January 29, 2026
**Total code reviewed**: ~2,500 lines across 10 files
**Estimated refactoring time**: 3-4 hours
**Estimated maintenance savings**: 5-10 hours/month (fewer bugs, easier updates)
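---

## Appendix: Example Regression Test

Step 4 above ("test thoroughly after each phase") can be anchored with a small regression test for the proposed `get_iso_week_year()` utility, covering exactly the year-boundary case behind the line 82 bug. A minimal sketch using the `testthat` package (the utility body is the one proposed in section 1.2; the test file name and organization are left to you):

```r
library(testthat)

# Utility under test (as proposed in section 1.2)
get_iso_week_year <- function(date) {
  list(
    week = as.numeric(format(date, "%V")),
    year = as.numeric(format(date, "%G"))  # ISO year, not calendar year
  )
}

test_that("ISO week/year handles year boundaries", {
  # 2025-12-30 falls in ISO week 1 of 2026 (the %Y-vs-%G bug from section 6.1)
  wwy <- get_iso_week_year(as.Date("2025-12-30"))
  expect_equal(wwy$week, 1)
  expect_equal(wwy$year, 2026)

  # Mid-year date: calendar year and ISO year agree
  wwy <- get_iso_week_year(as.Date("2026-06-15"))
  expect_equal(wwy$year, 2026)
})
```

Running this after Phase 1 gives a fast check that the consolidation did not reintroduce the calendar-year confusion.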