# Implementation Roadmap: Improving the Harvest Detection Model

**Target**: Move from 88% imminent AUC (current) to 95%+ with fewer false positives

---

## Phase 1: Multi-Client Retraining (Est. 1-2 hours active work)

### What to Do

Change the model from ESA-only to all-client training.

### Step-by-Step

1. **Open the notebook** at `python_app/harvest_detection_experiments/05_lstm_harvest_detection_pytorch.ipynb`
2. **Go to Section 2** (Data Loading) and find this line (~line 49):

   ```python
   CLIENT_FILTER = 'esa'  # ← CHANGE THIS
   ```

3. **Change it to:**

   ```python
   CLIENT_FILTER = None  # Now uses ALL clients
   ```

4. **Run Sections 2-12 sequentially**
   - Section 2: Data loading & cleaning (2-5 min)
   - Sections 3-6: Feature engineering (1-2 min)
   - Sections 7-9: Training (5-15 min, depending on GPU)
   - Sections 10-12: Evaluation & saving (2-3 min)
5. **Compare results**
   - Before: `harvest_detection_model_esa_esa.pt` (ESA-only)
   - After: `harvest_detection_model_esa_None.pt` (all-client)
   - Expected: imminent AUC improves from 0.8793 to 0.90+, with fewer false positives

### Expected Outcome

```
ESA-Only (Current):
- Train data: ~2,000 days (2 fields)
- Imminent AUC: 0.8793
- Issue: False imminent peaks during seasonal dips

All-Client (Expected):
- Train data: ~10,000+ days (15+ fields)
- Imminent AUC: 0.90-0.92 (a 2-3 point improvement)
- Issue: Reduced, but CI-only limitation remains
```

### Success Criteria

- ✅ Model trains without errors
- ✅ AUC scores are reasonable (imminent > 0.85, detected > 0.95)
- ✅ Sequence visualization shows fewer false imminent peaks

---

## Phase 2: Add Temperature Features (Est. 3-4 hours)

### Why Temperature Matters

Sugarcane harvest timing correlates with accumulated heat.
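Accumulated heat is usually measured as Growing Degree Days (GDD), the same quantity engineered in Step 2 below. A minimal sketch of the arithmetic, using hypothetical daily temperatures and the 10°C base temperature assumed here for sugarcane:

```python
# Hypothetical daily average temperatures (°C); not real field data.
daily_temps = [12.0, 15.5, 9.0, 22.3, 18.1]
BASE_TEMP = 10.0  # assumed base temperature for sugarcane

# Each day contributes max(0, T - base); days below base contribute nothing.
gdd = sum(max(0.0, t - BASE_TEMP) for t in daily_temps)
print(round(gdd, 1))  # → 27.9
```

A season total in the thousands of GDD (as in the "Why This Works" patterns below) is simply this sum carried across the whole growing season.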
The model currently cannot tell different types of CI decline apart:

```
Normal Ripening (HARVEST-READY):
- Temperature: Moderate-warm
- Rainfall: Normal
- CI: Declining over 2 weeks
- → Launch harvest alerts

Stress-Induced Decline (AVOID):
- Temperature: Very hot or very cold
- Rainfall: Low (drought) or excessive
- CI: Similar decline pattern
- → DON'T trigger alerts (crop stressed, not ready)

Model Problem: Can't distinguish! Need temperature + rainfall.
```

### Step 1: Find Temperature Data

**Option A: ECMWF Reanalysis** (Recommended)
- Global 0.25° resolution
- Free: https://www.ecmwf.int/
- Daily or monthly data available
- Takes 1-2 hours to download/process

**Option B: Local Weather Stations**
- Higher accuracy if available
- Must interpolate between stations
- May have gaps

**Option C: Satellite Temperature Products**
- e.g., MODIS land surface temperature; thermal bands are also available from Landsat and Sentinel-3
- Already integrated with your pipeline?
- Same download workflow as CI

**Steps**:
1. Download daily average temperature for field locations, 2020-2024
2. Merge with CI data by date/location
3. Format: one row per field per date, with a temperature column

### Step 2: Engineer Temperature-Based Features

Add to Section 5 (Feature Engineering):

```python
import numpy as np


def add_temperature_features(df, temp_column='daily_avg_temp'):
    """
    Add harvest-relevant temperature features.

    New features (4 total):
    1. gdd_cumulative: Growing Degree Days (sum of (T - base) where T > 10°C)
    2. gdd_7d_velocity: 7-day change in accumulated heat
    3. temp_anomaly: current temperature vs. 30-day rolling average
    4. gdd_percentile: position within the season's total heat accumulation
    """
    group_keys = ['field', 'model']

    # 1. Growing Degree Days (GDD); base temperature for sugarcane: 10°C
    df['daily_gdd'] = np.maximum(0, df[temp_column] - 10)
    df['gdd_cumulative'] = df.groupby(group_keys)['daily_gdd'].cumsum()

    # 2. GDD velocity: 7-day change in cumulative GDD (0 for the first 7 days)
    df['gdd_7d_velocity'] = (
        df.groupby(group_keys)['gdd_cumulative'].diff(7).fillna(0.0)
    )

    # 3. Temperature anomaly (vs. 30-day centered rolling average)
    df['temp_30d_avg'] = df.groupby(group_keys)[temp_column].transform(
        lambda x: x.rolling(30, center=True, min_periods=1).mean()
    )
    df['temp_anomaly'] = df[temp_column] - df['temp_30d_avg']

    # 4. GDD percentile: fraction of the season's final cumulative GDD
    #    (the cumulative sum's maximum is its last value)
    season_max = df.groupby(group_keys)['gdd_cumulative'].transform('max')
    df['gdd_percentile'] = df['gdd_cumulative'] / (season_max + 0.001)

    return df
```

### Step 3: Update Feature List

In Section 5, change from 7 features to 11:

```python
feature_names = [
    'CI',                  # Original
    '7d Velocity',         # Original
    '7d Acceleration',     # Original
    '14d MA',              # Original
    '14d Velocity',        # Original
    '7d Min',              # Original
    'Velocity Magnitude',  # Original
    'GDD Cumulative',      # NEW
    'GDD 7d Velocity',     # NEW
    'Temp Anomaly',        # NEW
    'GDD Percentile'       # NEW
]

# Update feature engineering:
features = np.column_stack([
    ci_smooth,
    velocity_7d,
    acceleration_7d,
    ma14_values,
    velocity_14d,
    min_7d,
    velocity_magnitude,
    gdd_cumulative,    # NEW
    gdd_7d_velocity,   # NEW
    temp_anomaly,      # NEW
    gdd_percentile     # NEW
])
```

### Step 4: Update Model Input Size

In Section 8, change:

```python
# OLD
model = HarvestDetectionLSTM(input_size=7, ...)

# NEW
model = HarvestDetectionLSTM(input_size=11, ...)  # 7 original + 4 new features
```

### Step 5: Retrain

Run Sections 6-12 again with the new data and model size.
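Before retraining, a quick shape check can catch a mismatch between `feature_names` and the stacked matrix, which would otherwise surface later as an LSTM `input_size` error. This is a standalone sketch with placeholder arrays, not notebook code:

```python
import numpy as np

# The 11 feature names from Step 3; placeholder zero arrays stand in for
# the real engineered columns.
feature_names = [
    'CI', '7d Velocity', '7d Acceleration', '14d MA', '14d Velocity',
    '7d Min', 'Velocity Magnitude', 'GDD Cumulative', 'GDD 7d Velocity',
    'Temp Anomaly', 'GDD Percentile',
]
n_days = 120  # hypothetical sequence length
columns = [np.zeros(n_days) for _ in feature_names]

# column_stack produces one column per feature array.
features = np.column_stack(columns)

# Must match input_size=11 in Section 8.
assert features.shape == (n_days, len(feature_names))
print(features.shape)  # → (120, 11)
```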
### Expected Outcome

```
Before Temperature Features:
- Input: 7 features (CI-derived only)
- Imminent AUC: 0.90 (all-client baseline)
- False imminent rate: 15-20% of predictions

After Temperature Features:
- Input: 11 features (CI + temperature)
- Imminent AUC: 0.93-0.95 (a 3-5 point gain)
- False imminent rate: 5-10% (roughly a 50% reduction)
- Model can distinguish stress-induced decline from harvest-ready decline
```

### Why This Works

**Harvest-specific pattern** (with temperature):

```
Imminent Harvest:
  CI: Declining ↘
  GDD: Very high (>3500 total)
  GDD Velocity: Moderate (still accumulating)
  Temp Anomaly: Normal
  → Model learns: "High GDD + declining CI + normal temp" = HARVEST

Drought Stress (False Positive Prevention):
  CI: Declining ↘ (same pattern as above)
  GDD: Moderate (1500-2000, mid-season)
  GDD Velocity: Low (little new heat accumulating)
  Temp Anomaly: Abnormal (stress conditions)
  → Model learns: "Moderate GDD + abnormal temp" ≠ HARVEST
```

---

## Phase 3: Test Different Imminent Windows (Est. 1-2 hours)

### Current Window: 3-14 days

**Question**: Is this optimal?
Let's test:

- 5-15 days (shifted slightly earlier relative to harvest)
- 7-14 days (tighter lower bound)
- 10-21 days (wider, earlier warning)
- 3-7 days (ultra-tight, latest warning)

### How to Test

In Section 4, create a loop:

```python
windows_to_test = [
    (3, 14),   # Current
    (5, 15),
    (7, 14),
    (10, 21),
    (3, 7),
]

results = []
for imm_start, imm_end in windows_to_test:
    # Relabel with the new window
    labeled_seqs = label_harvest_windows_per_season(
        test_sequences,
        imminent_start=imm_start,
        imminent_end=imm_end,
        detected_start=1,
        detected_end=21
    )

    # Evaluate (per-sequence label structure assumed from the labeling step)
    y_true = np.concatenate([seq['labels'] for seq in labeled_seqs])
    y_pred = get_model_predictions(test_sequences)
    auc = roc_auc_score(y_true, y_pred)
    fp_rate = false_positive_rate(y_true, y_pred)  # predictions thresholded at 0.5

    results.append({
        'window': f"{imm_start}-{imm_end}",
        'auc': auc,
        'fp_rate': fp_rate,
    })

# Print results, best AUC first
results_df = pd.DataFrame(results).sort_values('auc', ascending=False)
print(results_df)
```

### Expected Outcome

```
   Window    AUC  FP_Rate
0    7-14  0.920     0.08  ← RECOMMENDED (best balance)
1    5-15  0.918     0.12
2    3-14  0.915     0.15  ← Current
3   10-21  0.910     0.05  ← Too early (warnings fire long before harvest)
4     3-7  0.905     0.20  ← Too late (little lead time)
```

Choose the window with the highest AUC and an acceptable false positive rate.

---

## Phase 4: Operational Metrics (Est. 2 hours)

### What We Need

For deployment, understand:

1. **Lead time**: How many days before harvest do we warn?
2. **False positive rate**: How often do we cry wolf?
3. **Miss rate**: How often do we miss the harvest window?
4. **Per-field performance**: Do some fields have worse predictions?

### Code to Add

```python
def compute_operational_metrics(model, test_sequences_labeled, test_features):
    """
    Compute farmer-relevant metrics.
""" lead_times = [] false_positives = [] misses = [] field_performance = {} for seq_idx, seq_dict in enumerate(test_sequences_labeled): field = seq_dict['field'] data = seq_dict['data'] # Get predictions X_features = test_features[seq_idx] with torch.no_grad(): imminent_pred, _ = model(torch.from_numpy(X_features[np.newaxis, :, :])) imminent_pred = imminent_pred[0].cpu().numpy() # Find harvest boundary harvest_idx = np.where(data['harvest_boundary'] == 1)[0] if len(harvest_idx) == 0: continue harvest_idx = harvest_idx[0] # Find when model triggered (imminent > 0.5) triggered_indices = np.where(imminent_pred > 0.5)[0] if len(triggered_indices) > 0: # Last trigger before harvest triggers_before = triggered_indices[triggered_indices < harvest_idx] if len(triggers_before) > 0: last_trigger = triggers_before[-1] lead_time = harvest_idx - last_trigger lead_times.append(lead_time) # Check if within optimal window (e.g., 3-14 days) if 3 <= lead_time <= 14: if field not in field_performance: field_performance[field] = {'correct': 0, 'total': 0} field_performance[field]['correct'] += 1 else: # Triggered after harvest = false positive false_positives.append(len(triggered_indices)) else: # No trigger at all = miss misses.append(seq_idx) if field not in field_performance: field_performance[field] = {'correct': 0, 'total': 0} field_performance[field]['total'] += 1 # Compute statistics print("\n" + "="*60) print("OPERATIONAL METRICS") print("="*60) print(f"\nLead Time Analysis:") print(f" Mean: {np.mean(lead_times):.1f} days") print(f" Std: {np.std(lead_times):.1f} days") print(f" Min: {np.min(lead_times):.0f} days") print(f" Max: {np.max(lead_times):.0f} days") print(f" Optimal (3-14d): {sum((3<=x<=14 for x in lead_times))/len(lead_times)*100:.1f}%") print(f"\nError Analysis:") print(f" False positives (wrong timing): {len(false_positives)} sequences") print(f" Misses (no warning): {len(misses)} sequences") print(f" Accuracy: 
{len(lead_times)/(len(lead_times)+len(false_positives)+len(misses))*100:.1f}%") print(f"\nPer-Field Performance:") for field, perf in sorted(field_performance.items()): accuracy = perf['correct'] / perf['total'] * 100 print(f" {field:15s}: {accuracy:5.1f}% correct") return { 'lead_times': lead_times, 'false_positives': len(false_positives), 'misses': len(misses), 'field_performance': field_performance } # Run it metrics = compute_operational_metrics(model, test_sequences_labeled, X_test_features) ``` ### What to Look For **Good performance**: ``` Mean lead time: 7-10 days ✅ (gives farmer time to prepare) Optimal timing: >80% ✅ (most warnings in 3-14d window) False positives: <5% ✅ (rarely cry wolf) Misses: <10% ✅ (rarely miss harvest) ``` **Poor performance**: ``` Mean lead time: 2 days ❌ (too late) Optimal timing: <60% ❌ (inconsistent) False positives: >20% ❌ (farmers lose trust) Misses: >20% ❌ (unreliable) ``` --- ## Phase 5: Rainfall Features (Optional, High Value) (Est. 3-4 hours) ### Similar to Temperature Add rainfall + soil moisture features: ```python def add_rainfall_features(df, rainfall_column='daily_rainfall_mm'): """ Add drought/moisture stress features. New features (3 total): 1. rainfall_7d: Total rain in last 7 days 2. rainfall_deficit: Deficit vs normal for this time of year 3. drought_stress_index: Combination metric """ # 1. 7-day rainfall df['rainfall_7d'] = df.groupby('field')[rainfall_column].transform( lambda x: x.rolling(7, min_periods=1).sum() ) # 2. Seasonal rainfall average df['seasonal_rain_avg'] = df.groupby('field')[rainfall_column].transform( lambda x: x.rolling(30, center=True, min_periods=1).mean() ) df['rainfall_deficit'] = df['seasonal_rain_avg'] - df[rainfall_column] # 3. 
Drought stress index # (0 = not stressed, 1 = severe drought) df['drought_stress'] = np.minimum( 1.0, df['rainfall_deficit'] / (df['seasonal_rain_avg'] + 0.1) ) return df ``` **Why this helps**: - Drought accelerates maturity (early harvest) - Excessive rain delays harvest - Model can distinguish "ready to harvest" from "crop stressed" --- ## Summary: Quick Implementation Checklist ### Week 1: Foundation - [ ] Phase 1: Retrain on all clients - [ ] Change `CLIENT_FILTER = None` - [ ] Run full pipeline - [ ] Compare metrics ### Week 2: Core Enhancement - [ ] Phase 2: Add temperature features - [ ] Find/download temperature data - [ ] Merge with CI data - [ ] Update feature engineering (7 → 11 features) - [ ] Retrain model - [ ] Compare metrics (expect 3-5% AUC gain) ### Week 3: Optimization & Testing - [ ] Phase 3: Test imminent windows - [ ] Run sensitivity analysis - [ ] Choose optimal window - [ ] Retrain with new window - [ ] Phase 4: Operational metrics - [ ] Compute lead times - [ ] Measure false positive rate - [ ] Per-field performance analysis ### Week 4: Optional Enhancement - [ ] Phase 5: Add rainfall features (if data available) - [ ] Download precipitation data - [ ] Add drought stress features - [ ] Retrain - [ ] Measure improvement --- ## Expected Performance Trajectory ``` Current (ESA-only, CI-only): Imminent AUC: 0.8793 False positive rate: ~15% Phase 1 (All clients): Imminent AUC: 0.90-0.92 (+2-3%) False positive rate: ~12% Phase 2 (Add temperature): Imminent AUC: 0.93-0.95 (+3-5% from Phase 1) False positive rate: ~5% Phase 3 (Optimize window): Imminent AUC: 0.95-0.96 (+1% from fine-tuning) False positive rate: ~3% Phase 4 (Operational tuning): Imminent AUC: 0.95-0.96 (stable) Lead time: 7-10 days Operational readiness: 95% Phase 5 (Add rainfall): Imminent AUC: 0.96-0.97 (+1% for drought years) False positive rate: ~2% Operational readiness: 99% ``` --- ## Key Takeaways 1. 
**Multi-client retraining is the biggest quick win** (5-10% gain with minimal effort) 2. **Temperature features are essential** for distinguishing harvest-ready from stress 3. **Imminent window tuning** can reduce false positives by 30-50% 4. **Operational metrics** matter more than academic metrics (lead time > AUC) 5. **Rainfall features** are optional but valuable for drought-prone regions --- ## Next Steps 1. **This week**: Run Phase 1 (all-client retrain) 2. **Analyze results**: Compare on same fields, measure improvements 3. **Plan Phase 2**: Identify temperature data source 4. **Schedule Phase 2**: Allocate 3-4 hours for implementation 5. **Document findings**: Track AUC, false positive rate, lead time for each phase Good luck! This is a solid model with clear paths to improvement. 🚀
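The "Document findings" step can be made concrete with a small results log. This is a hypothetical helper (the function name and CSV file name are not part of the notebook), sketching one way to keep AUC, false positive rate, and lead time comparable across phases:

```python
import csv
from pathlib import Path

def log_phase_result(path, phase, imminent_auc, fp_rate, mean_lead_days):
    """Append one row per experiment phase; write a header on first use."""
    new_file = not Path(path).exists()
    with open(path, 'a', newline='') as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(['phase', 'imminent_auc', 'fp_rate', 'mean_lead_days'])
        writer.writerow([phase, imminent_auc, fp_rate, mean_lead_days])

# Example usage with placeholder numbers (not measured results):
log_phase_result('phase_results.csv', 'phase1_all_clients', 0.91, 0.12, 8.5)
```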