# Implementation Roadmap: Improving the Harvest Detection Model

**Target**: Move from 88% imminent AUC (current) to 95%+ with fewer false positives

---

## Phase 1: Multi-Client Retraining (Est. 1-2 hours active work)

### What to Do

Change the model from ESA-only to all-client training.

### Step-by-Step

1. **Open the notebook** at `python_app/harvest_detection_experiments/05_lstm_harvest_detection_pytorch.ipynb`
2. **Go to Section 2** (Data Loading) and find this line (~line 49):

   ```python
   CLIENT_FILTER = 'esa'  # ← CHANGE THIS
   ```

3. **Change it to:**

   ```python
   CLIENT_FILTER = None  # Now uses ALL clients
   ```

4. **Run Sections 2-12 sequentially**
   - Section 2: Data loading & cleaning (2-5 min)
   - Sections 3-6: Feature engineering (1-2 min)
   - Sections 7-9: Training (5-15 min, depending on GPU)
   - Sections 10-12: Evaluation & saving (2-3 min)
5. **Compare results**
   - Before: `harvest_detection_model_esa_esa.pt` (ESA-only)
   - After: `harvest_detection_model_esa_None.pt` (all-client)
   - Expected: imminent AUC improves from 0.8793 to 0.90+, with fewer false positives

### Expected Outcome

```
ESA-Only (Current):
- Train data: ~2,000 days (2 fields)
- Imminent AUC: 0.8793
- Issue: False imminent peaks during seasonal dips

All-Client (Expected):
- Train data: ~10,000+ days (15+ fields)
- Imminent AUC: 0.90-0.92 (a 2-3 point improvement)
- Issue: Reduced, but CI-only limitation remains
```

### Success Criteria

- ✅ Model trains without errors
- ✅ AUC scores are reasonable (imminent > 0.85, detected > 0.95)
- ✅ Sequence visualization shows fewer false imminent peaks

---

## Phase 2: Add Temperature Features (Est. 3-4 hours)

### Why Temperature Matters

Sugarcane harvest timing correlates with accumulated heat.
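Accumulated heat is usually measured as Growing Degree Days (GDD), the same quantity engineered in Step 2 below. A minimal sketch of the arithmetic, using hypothetical daily temperatures and the 10°C base temperature assumed here for sugarcane:

```python
# Hypothetical daily average temperatures (°C); not real field data.
daily_temps = [12.0, 15.5, 9.0, 22.3, 18.1]
BASE_TEMP = 10.0  # assumed base temperature for sugarcane

# Each day contributes max(0, T - base); days below base contribute nothing.
gdd = sum(max(0.0, t - BASE_TEMP) for t in daily_temps)
print(round(gdd, 1))  # → 27.9
```

A season total in the thousands of GDD (as in the "Why This Works" patterns below) is simply this sum carried across the whole growing season.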
The model currently cannot tell different types of CI decline apart:

```
Normal Ripening (HARVEST-READY):
- Temperature: Moderate-warm
- Rainfall: Normal
- CI: Declining over 2 weeks
- → Launch harvest alerts

Stress-Induced Decline (AVOID):
- Temperature: Very hot or very cold
- Rainfall: Low (drought) or excessive
- CI: Similar decline pattern
- → DON'T trigger alerts (crop stressed, not ready)

Model Problem: Can't distinguish! Need temperature + rainfall.
```

### Step 1: Find Temperature Data

**Option A: ECMWF Reanalysis** (Recommended)
- Global 0.25° resolution
- Free: https://www.ecmwf.int/
- Daily or monthly data available
- Takes 1-2 hours to download/process

**Option B: Local Weather Stations**
- Higher accuracy if available
- Must interpolate between stations
- May have gaps

**Option C: Satellite Temperature Products**
- e.g., MODIS land surface temperature; thermal bands are also available from Landsat and Sentinel-3
- Already integrated with your pipeline?
- Same download workflow as CI

**Steps**:
1. Download daily average temperature for field locations, 2020-2024
2. Merge with CI data by date/location
3. Format: one row per field per date, with a temperature column

### Step 2: Engineer Temperature-Based Features

Add to Section 5 (Feature Engineering):

```python
import numpy as np


def add_temperature_features(df, temp_column='daily_avg_temp'):
    """
    Add harvest-relevant temperature features.

    New features (4 total):
    1. gdd_cumulative: Growing Degree Days (sum of (T - base) where T > 10°C)
    2. gdd_7d_velocity: 7-day change in accumulated heat
    3. temp_anomaly: current temperature vs. 30-day rolling average
    4. gdd_percentile: position within the season's total heat accumulation
    """
    group_keys = ['field', 'model']

    # 1. Growing Degree Days (GDD); base temperature for sugarcane: 10°C
    df['daily_gdd'] = np.maximum(0, df[temp_column] - 10)
    df['gdd_cumulative'] = df.groupby(group_keys)['daily_gdd'].cumsum()

    # 2. GDD velocity: 7-day change in cumulative GDD (0 for the first 7 days)
    df['gdd_7d_velocity'] = (
        df.groupby(group_keys)['gdd_cumulative'].diff(7).fillna(0.0)
    )

    # 3. Temperature anomaly (vs. 30-day centered rolling average)
    df['temp_30d_avg'] = df.groupby(group_keys)[temp_column].transform(
        lambda x: x.rolling(30, center=True, min_periods=1).mean()
    )
    df['temp_anomaly'] = df[temp_column] - df['temp_30d_avg']

    # 4. GDD percentile: fraction of the season's final cumulative GDD
    #    (the cumulative sum's maximum is its last value)
    season_max = df.groupby(group_keys)['gdd_cumulative'].transform('max')
    df['gdd_percentile'] = df['gdd_cumulative'] / (season_max + 0.001)

    return df
```

### Step 3: Update Feature List

In Section 5, change from 7 features to 11:

```python
feature_names = [
    'CI',                  # Original
    '7d Velocity',         # Original
    '7d Acceleration',     # Original
    '14d MA',              # Original
    '14d Velocity',        # Original
    '7d Min',              # Original
    'Velocity Magnitude',  # Original
    'GDD Cumulative',      # NEW
    'GDD 7d Velocity',     # NEW
    'Temp Anomaly',        # NEW
    'GDD Percentile'       # NEW
]

# Update feature engineering:
features = np.column_stack([
    ci_smooth,
    velocity_7d,
    acceleration_7d,
    ma14_values,
    velocity_14d,
    min_7d,
    velocity_magnitude,
    gdd_cumulative,    # NEW
    gdd_7d_velocity,   # NEW
    temp_anomaly,      # NEW
    gdd_percentile     # NEW
])
```

### Step 4: Update Model Input Size

In Section 8, change:

```python
# OLD
model = HarvestDetectionLSTM(input_size=7, ...)

# NEW
model = HarvestDetectionLSTM(input_size=11, ...)  # 7 original + 4 new features
```

### Step 5: Retrain

Run Sections 6-12 again with the new data and model size.
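Before retraining, a quick shape check can catch a mismatch between `feature_names` and the stacked matrix, which would otherwise surface later as an LSTM `input_size` error. This is a standalone sketch with placeholder arrays, not notebook code:

```python
import numpy as np

# The 11 feature names from Step 3; placeholder zero arrays stand in for
# the real engineered columns.
feature_names = [
    'CI', '7d Velocity', '7d Acceleration', '14d MA', '14d Velocity',
    '7d Min', 'Velocity Magnitude', 'GDD Cumulative', 'GDD 7d Velocity',
    'Temp Anomaly', 'GDD Percentile',
]
n_days = 120  # hypothetical sequence length
columns = [np.zeros(n_days) for _ in feature_names]

# column_stack produces one column per feature array.
features = np.column_stack(columns)

# Must match input_size=11 in Section 8.
assert features.shape == (n_days, len(feature_names))
print(features.shape)  # → (120, 11)
```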
### Expected Outcome

```
Before Temperature Features:
- Input: 7 features (CI-derived only)
- Imminent AUC: 0.90 (all-client baseline)
- False imminent rate: 15-20% of predictions

After Temperature Features:
- Input: 11 features (CI + temperature)
- Imminent AUC: 0.93-0.95 (a 3-5 point gain)
- False imminent rate: 5-10% (roughly a 50% reduction)
- Model can distinguish stress-induced decline from harvest-ready decline
```

### Why This Works

**Harvest-specific pattern** (with temperature):

```
Imminent Harvest:
  CI: Declining ↘
  GDD: Very high (>3500 total)
  GDD Velocity: Moderate (still accumulating)
  Temp Anomaly: Normal
  → Model learns: "High GDD + declining CI + normal temp" = HARVEST

Drought Stress (False Positive Prevention):
  CI: Declining ↘ (same pattern as above)
  GDD: Moderate (1500-2000, mid-season)
  GDD Velocity: Low (little new heat accumulating)
  Temp Anomaly: Abnormal (stress conditions)
  → Model learns: "Moderate GDD + abnormal temp" ≠ HARVEST
```

---

## Phase 3: Test Different Imminent Windows (Est. 1-2 hours)

### Current Window: 3-14 days

**Question**: Is this optimal?
Let's test:

- 5-15 days (shifted slightly earlier relative to harvest)
- 7-14 days (tighter lower bound)
- 10-21 days (wider, earlier warning)
- 3-7 days (ultra-tight, latest warning)

### How to Test

In Section 4, create a loop:

```python
windows_to_test = [
    (3, 14),   # Current
    (5, 15),
    (7, 14),
    (10, 21),
    (3, 7),
]

results = []
for imm_start, imm_end in windows_to_test:
    # Relabel with the new window
    labeled_seqs = label_harvest_windows_per_season(
        test_sequences,
        imminent_start=imm_start,
        imminent_end=imm_end,
        detected_start=1,
        detected_end=21
    )

    # Evaluate (per-sequence label structure assumed from the labeling step)
    y_true = np.concatenate([seq['labels'] for seq in labeled_seqs])
    y_pred = get_model_predictions(test_sequences)
    auc = roc_auc_score(y_true, y_pred)
    fp_rate = false_positive_rate(y_true, y_pred)  # predictions thresholded at 0.5

    results.append({
        'window': f"{imm_start}-{imm_end}",
        'auc': auc,
        'fp_rate': fp_rate,
    })

# Print results, best AUC first
results_df = pd.DataFrame(results).sort_values('auc', ascending=False)
print(results_df)
```

### Expected Outcome

```
   Window    AUC  FP_Rate
0    7-14  0.920     0.08  ← RECOMMENDED (best balance)
1    5-15  0.918     0.12
2    3-14  0.915     0.15  ← Current
3   10-21  0.910     0.05  ← Too early (warnings fire long before harvest)
4     3-7  0.905     0.20  ← Too late (little lead time)
```

Choose the window with the highest AUC and an acceptable false positive rate.

---

## Phase 4: Operational Metrics (Est. 2 hours)

### What We Need

For deployment, understand:

1. **Lead time**: How many days before harvest do we warn?
2. **False positive rate**: How often do we cry wolf?
3. **Miss rate**: How often do we miss the harvest window?
4. **Per-field performance**: Do some fields have worse predictions?

### Code to Add

```python
def compute_operational_metrics(model, test_sequences_labeled, test_features):
    """
    Compute farmer-relevant metrics.
""" lead_times = [] false_positives = [] misses = [] field_performance = {} for seq_idx, seq_dict in enumerate(test_sequences_labeled): field = seq_dict['field'] data = seq_dict['data'] # Get predictions X_features = test_features[seq_idx] with torch.no_grad(): imminent_pred, _ = model(torch.from_numpy(X_features[np.newaxis, :, :])) imminent_pred = imminent_pred[0].cpu().numpy() # Find harvest boundary harvest_idx = np.where(data['harvest_boundary'] == 1)[0] if len(harvest_idx) == 0: continue harvest_idx = harvest_idx[0] # Find when model triggered (imminent > 0.5) triggered_indices = np.where(imminent_pred > 0.5)[0] if len(triggered_indices) > 0: # Last trigger before harvest triggers_before = triggered_indices[triggered_indices < harvest_idx] if len(triggers_before) > 0: last_trigger = triggers_before[-1] lead_time = harvest_idx - last_trigger lead_times.append(lead_time) # Check if within optimal window (e.g., 3-14 days) if 3 <= lead_time <= 14: if field not in field_performance: field_performance[field] = {'correct': 0, 'total': 0} field_performance[field]['correct'] += 1 else: # Triggered after harvest = false positive false_positives.append(len(triggered_indices)) else: # No trigger at all = miss misses.append(seq_idx) if field not in field_performance: field_performance[field] = {'correct': 0, 'total': 0} field_performance[field]['total'] += 1 # Compute statistics print("\n" + "="*60) print("OPERATIONAL METRICS") print("="*60) print(f"\nLead Time Analysis:") print(f" Mean: {np.mean(lead_times):.1f} days") print(f" Std: {np.std(lead_times):.1f} days") print(f" Min: {np.min(lead_times):.0f} days") print(f" Max: {np.max(lead_times):.0f} days") print(f" Optimal (3-14d): {sum((3<=x<=14 for x in lead_times))/len(lead_times)*100:.1f}%") print(f"\nError Analysis:") print(f" False positives (wrong timing): {len(false_positives)} sequences") print(f" Misses (no warning): {len(misses)} sequences") print(f" Accuracy: 
{len(lead_times)/(len(lead_times)+len(false_positives)+len(misses))*100:.1f}%") print(f"\nPer-Field Performance:") for field, perf in sorted(field_performance.items()): accuracy = perf['correct'] / perf['total'] * 100 print(f" {field:15s}: {accuracy:5.1f}% correct") return { 'lead_times': lead_times, 'false_positives': len(false_positives), 'misses': len(misses), 'field_performance': field_performance } # Run it metrics = compute_operational_metrics(model, test_sequences_labeled, X_test_features) ``` ### What to Look For **Good performance**: ``` Mean lead time: 7-10 days ✅ (gives farmer time to prepare) Optimal timing: >80% ✅ (most warnings in 3-14d window) False positives: <5% ✅ (rarely cry wolf) Misses: <10% ✅ (rarely miss harvest) ``` **Poor performance**: ``` Mean lead time: 2 days ❌ (too late) Optimal timing: <60% ❌ (inconsistent) False positives: >20% ❌ (farmers lose trust) Misses: >20% ❌ (unreliable) ``` --- ## Phase 5: Rainfall Features (Optional, High Value) (Est. 3-4 hours) ### Similar to Temperature Add rainfall + soil moisture features: ```python def add_rainfall_features(df, rainfall_column='daily_rainfall_mm'): """ Add drought/moisture stress features. New features (3 total): 1. rainfall_7d: Total rain in last 7 days 2. rainfall_deficit: Deficit vs normal for this time of year 3. drought_stress_index: Combination metric """ # 1. 7-day rainfall df['rainfall_7d'] = df.groupby('field')[rainfall_column].transform( lambda x: x.rolling(7, min_periods=1).sum() ) # 2. Seasonal rainfall average df['seasonal_rain_avg'] = df.groupby('field')[rainfall_column].transform( lambda x: x.rolling(30, center=True, min_periods=1).mean() ) df['rainfall_deficit'] = df['seasonal_rain_avg'] - df[rainfall_column] # 3. 
Drought stress index # (0 = not stressed, 1 = severe drought) df['drought_stress'] = np.minimum( 1.0, df['rainfall_deficit'] / (df['seasonal_rain_avg'] + 0.1) ) return df ``` **Why this helps**: - Drought accelerates maturity (early harvest) - Excessive rain delays harvest - Model can distinguish "ready to harvest" from "crop stressed" --- ## Summary: Quick Implementation Checklist ### Week 1: Foundation - [ ] Phase 1: Retrain on all clients - [ ] Change `CLIENT_FILTER = None` - [ ] Run full pipeline - [ ] Compare metrics ### Week 2: Core Enhancement - [ ] Phase 2: Add temperature features - [ ] Find/download temperature data - [ ] Merge with CI data - [ ] Update feature engineering (7 → 11 features) - [ ] Retrain model - [ ] Compare metrics (expect 3-5% AUC gain) ### Week 3: Optimization & Testing - [ ] Phase 3: Test imminent windows - [ ] Run sensitivity analysis - [ ] Choose optimal window - [ ] Retrain with new window - [ ] Phase 4: Operational metrics - [ ] Compute lead times - [ ] Measure false positive rate - [ ] Per-field performance analysis ### Week 4: Optional Enhancement - [ ] Phase 5: Add rainfall features (if data available) - [ ] Download precipitation data - [ ] Add drought stress features - [ ] Retrain - [ ] Measure improvement --- ## Expected Performance Trajectory ``` Current (ESA-only, CI-only): Imminent AUC: 0.8793 False positive rate: ~15% Phase 1 (All clients): Imminent AUC: 0.90-0.92 (+2-3%) False positive rate: ~12% Phase 2 (Add temperature): Imminent AUC: 0.93-0.95 (+3-5% from Phase 1) False positive rate: ~5% Phase 3 (Optimize window): Imminent AUC: 0.95-0.96 (+1% from fine-tuning) False positive rate: ~3% Phase 4 (Operational tuning): Imminent AUC: 0.95-0.96 (stable) Lead time: 7-10 days Operational readiness: 95% Phase 5 (Add rainfall): Imminent AUC: 0.96-0.97 (+1% for drought years) False positive rate: ~2% Operational readiness: 99% ``` --- ## Key Takeaways 1. 
**Multi-client retraining is the biggest quick win** (5-10% gain with minimal effort) 2. **Temperature features are essential** for distinguishing harvest-ready from stress 3. **Imminent window tuning** can reduce false positives by 30-50% 4. **Operational metrics** matter more than academic metrics (lead time > AUC) 5. **Rainfall features** are optional but valuable for drought-prone regions --- ## Next Steps 1. **This week**: Run Phase 1 (all-client retrain) 2. **Analyze results**: Compare on same fields, measure improvements 3. **Plan Phase 2**: Identify temperature data source 4. **Schedule Phase 2**: Allocate 3-4 hours for implementation 5. **Document findings**: Track AUC, false positive rate, lead time for each phase Good luck! This is a solid model with clear paths to improvement. 🚀
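The "Document findings" step can be made concrete with a small results log. This is a hypothetical helper (the function name and CSV file name are not part of the notebook), sketching one way to keep AUC, false positive rate, and lead time comparable across phases:

```python
import csv
from pathlib import Path

def log_phase_result(path, phase, imminent_auc, fp_rate, mean_lead_days):
    """Append one row per experiment phase; write a header on first use."""
    new_file = not Path(path).exists()
    with open(path, 'a', newline='') as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(['phase', 'imminent_auc', 'fp_rate', 'mean_lead_days'])
        writer.writerow([phase, imminent_auc, fp_rate, mean_lead_days])

# Example usage with placeholder numbers (not measured results):
log_phase_result('phase_results.csv', 'phase1_all_clients', 0.91, 0.12, 8.5)
```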