Implementation Roadmap: Improving the Harvest Detection Model
Target: Move from 0.88 imminent AUC (current) to 0.95+ with fewer false positives
Phase 1: Multi-Client Retraining (Est. 1-2 hours active work)
What to Do
Change the model from ESA-only to all-client training.
Step-by-Step
- Open the notebook at python_app/harvest_detection_experiments/05_lstm_harvest_detection_pytorch.ipynb
- Go to Section 2 (Data Loading) and find this line (~line 49):
CLIENT_FILTER = 'esa'  # ← CHANGE THIS
- Change it to:
CLIENT_FILTER = None  # now uses ALL clients
- Run Sections 2-12 sequentially:
- Section 2: Data loading & cleaning (2-5 min)
- Sections 3-6: Feature engineering (1-2 min)
- Sections 7-9: Training (5-15 min, depending on GPU)
- Sections 10-12: Evaluation & saving (2-3 min)
- Compare results
- Before: harvest_detection_model_esa_esa.pt (ESA-only)
- After: harvest_detection_model_esa_None.pt (all-client)
- Expected: imminent AUC improves from 0.8793 → 0.90+, with fewer false positives
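Conceptually, the CLIENT_FILTER switch presumably gates the training data as sketched below. This is a hypothetical illustration — filter_clients and the "client" column name are stand-ins, not the notebook's actual code:

```python
import pandas as pd

# Hypothetical sketch of how CLIENT_FILTER is likely applied in Section 2;
# the real column and variable names in the notebook may differ.
def filter_clients(df: pd.DataFrame, client_filter=None) -> pd.DataFrame:
    """Return rows for one client, or all rows when client_filter is None."""
    if client_filter is None:
        return df  # all-client training
    return df[df["client"] == client_filter]

ci = pd.DataFrame({
    "client": ["esa", "esa", "acme", "acme"],
    "field":  ["f1",  "f1",  "f2",   "f2"],
    "CI":     [0.61,  0.58,  0.72,   0.70],
})

esa_only = filter_clients(ci, "esa")    # 2 rows (ESA-only baseline)
all_clients = filter_clients(ci, None)  # 4 rows (all-client retrain)
```

Setting the filter to None simply stops discarding the other clients' rows, which is where the extra ~8,000 training days come from.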
Expected Outcome
ESA-Only (Current):
- Train data: ~2,000 days (2 fields)
- Imminent AUC: 0.8793
- Issue: False imminent peaks during seasonal dips
All-Client (Expected):
- Train data: ~10,000+ days (15+ fields)
- Imminent AUC: 0.90-0.92 (a 2-5% improvement)
- Issue: Reduced, but CI-only limitation remains
Success Criteria
- ✅ Model trains without errors
- ✅ AUC scores reasonable (imminent > 0.85, detected > 0.95)
- ✅ Sequence visualization shows fewer false imminent peaks
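The AUC thresholds above can be sanity-checked with scikit-learn's roc_auc_score, which the notebook already uses. A toy example with well-separated scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy check: a mostly-separated score distribution yields a high AUC,
# mirroring the success criteria (imminent > 0.85, detected > 0.95).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.6, 0.5, 0.8, 0.9, 0.95])

# One positive (0.5) ranks below one negative (0.6): 15 of 16 pairs concordant.
auc = roc_auc_score(y_true, y_score)  # 15/16 = 0.9375
```

AUC is threshold-free, which is why the operational metrics in Phase 4 (which do use a 0.5 trigger threshold) are tracked separately.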
Phase 2: Add Temperature Features (Est. 3-4 hours)
Why Temperature Matters
Sugarcane harvest timing correlates with accumulated heat, and a CI decline can have two very different causes:
Normal Ripening (HARVEST-READY):
- Temperature: Moderate-warm
- Rainfall: Normal
- CI: Declining over 2 weeks
- → Launch harvest alerts
Stress-Induced Decline (AVOID):
- Temperature: Very hot or very cold
- Rainfall: Low (drought) or excessive
- CI: Similar decline pattern
- → DON'T trigger alerts (crop stressed, not ready)
Model problem: with CI alone, these two patterns look identical to the model. Temperature (and ideally rainfall) features are needed to tell them apart.
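A toy numeric illustration of the ambiguity (all numbers illustrative, not from the dataset): two fields can show the same CI decline while their accumulated heat differs sharply.

```python
import numpy as np

# Two fields with the SAME 14-day CI decline...
ci_decline = np.linspace(0.65, 0.45, 14)

# ...but very different thermal histories (daily mean °C, illustrative).
temps_ready = np.full(200, 28.0)    # long warm season: crop actually matured
temps_stressed = np.full(60, 38.0)  # short scorching spell: heat stress

def total_gdd(temps, base=10.0):
    """Growing Degree Days: sum of daily (T - base), clipped at 0."""
    return np.maximum(0.0, temps - base).sum()

gdd_ready = total_gdd(temps_ready)        # 200 * 18 = 3600: past maturity
gdd_stressed = total_gdd(temps_stressed)  # 60 * 28 = 1680: nowhere near it
```

The CI trace alone is identical in both cases; the GDD totals (3600 vs. 1680) are what let the model separate them.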
Step 1: Find Temperature Data
Option A: ECMWF ERA5 Reanalysis (Recommended)
- Global 0.25° resolution
- Free: https://www.ecmwf.int/
- Daily or monthly data available
- Takes 1-2 hours to download/process
Option B: Local Weather Stations
- Higher accuracy if available
- Must interpolate between stations
- May have gaps
Option C: Satellite Land Surface Temperature
- From MODIS, Landsat, or Sentinel-3
- May already be integrated with your pipeline
- Same download workflow as CI
Steps:
- Download daily average temperature for field locations, 2020-2024
- Merge with CI data by date/location
- Format: One row per field, per date with temperature column
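The merge in step 2 could look like the sketch below, assuming the CI table keys on field and date and the weather table uses the same columns (adjust names to the real schema):

```python
import pandas as pd

ci = pd.DataFrame({
    "field": ["f1", "f1"],
    "date": pd.to_datetime(["2023-06-01", "2023-06-02"]),
    "CI": [0.62, 0.60],
})
weather = pd.DataFrame({
    "field": ["f1", "f1"],
    "date": pd.to_datetime(["2023-06-01", "2023-06-02"]),
    "daily_avg_temp": [24.5, 26.0],
})

# Left-join so every CI observation is kept even if a weather day is missing;
# remaining gaps are then interpolated per field.
merged = ci.merge(weather, on=["field", "date"], how="left")
merged["daily_avg_temp"] = merged.groupby("field")["daily_avg_temp"].transform(
    lambda s: s.interpolate(limit_direction="both")
)
```

The left join guards against silently dropping CI rows when the weather download has gaps, which would shorten sequences and shift labels.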
Step 2: Engineer Temperature-Based Features
Add to Section 5 (Feature Engineering):
import numpy as np

def add_temperature_features(df, temp_column='daily_avg_temp'):
    """
    Add harvest-relevant temperature features.

    New features (4 total):
    1. gdd_cumulative: Growing Degree Days (sum of (T - base) where T > 10°C)
    2. gdd_7d_velocity: 7-day change in accumulated heat
    3. temp_anomaly: current temp vs. 30-day rolling average
    4. gdd_percentile: position within the season's total heat accumulation
    """
    # 1. Growing Degree Days (GDD); base temp for sugarcane: 10°C
    df['daily_gdd'] = np.maximum(0, df[temp_column] - 10)
    df['gdd_cumulative'] = df.groupby(['field', 'model'])['daily_gdd'].cumsum()

    # 2. GDD velocity: change in accumulated heat over the last 7 days
    df['gdd_7d_velocity'] = 0.0
    for (field, model), group in df.groupby(['field', 'model']):
        idx = group.index
        gdd_values = group['gdd_cumulative'].values
        for i in range(7, len(gdd_values)):
            df.loc[idx[i], 'gdd_7d_velocity'] = gdd_values[i] - gdd_values[i - 7]

    # 3. Temperature anomaly (vs. 30-day centered rolling average)
    df['temp_30d_avg'] = df.groupby('field')[temp_column].transform(
        lambda x: x.rolling(30, center=True, min_periods=1).mean()
    )
    df['temp_anomaly'] = df[temp_column] - df['temp_30d_avg']

    # 4. GDD percentile within the season. Note: this normalizes by the
    #    season-end total, so it is only valid for completed seasons /
    #    offline labeling, not for live inference mid-season.
    df['gdd_percentile'] = 0.0
    for (field, model), group in df.groupby(['field', 'model']):
        idx = group.index
        gdd_values = group['gdd_cumulative'].values
        max_gdd = gdd_values[-1]
        df.loc[idx, 'gdd_percentile'] = gdd_values / (max_gdd + 0.001)

    return df
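The per-row loop for gdd_7d_velocity can be slow on large frames; an equivalent vectorized form using a grouped diff produces the same values (toy data for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "field": ["f1"] * 10,
    "model": ["m"] * 10,
    "daily_avg_temp": [12, 14, 15, 20, 22, 18, 16, 25, 24, 23],
})
df["daily_gdd"] = np.maximum(0, df["daily_avg_temp"] - 10)
df["gdd_cumulative"] = df.groupby(["field", "model"])["daily_gdd"].cumsum()

# diff(7) is exactly gdd[i] - gdd[i-7] within each group; the first 7 rows
# have no 7-day history, so fill them with 0 as the loop version does.
df["gdd_7d_velocity"] = (
    df.groupby(["field", "model"])["gdd_cumulative"].diff(7).fillna(0.0)
)
```

On multi-year, multi-field frames this replaces a Python-level double loop with a single grouped operation, without changing the feature values.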
Step 3: Update Feature List
In Section 5, change from 7 features to 11:
feature_names = [
'CI', # Original
'7d Velocity', # Original
'7d Acceleration', # Original
'14d MA', # Original
'14d Velocity', # Original
'7d Min', # Original
'Velocity Magnitude', # Original
'GDD Cumulative', # NEW
'GDD 7d Velocity', # NEW
'Temp Anomaly', # NEW
'GDD Percentile' # NEW
]
# Update feature engineering:
features = np.column_stack([
ci_smooth,
velocity_7d,
acceleration_7d,
ma14_values,
velocity_14d,
min_7d,
velocity_magnitude,
gdd_cumulative, # NEW
gdd_7d_velocity, # NEW
temp_anomaly, # NEW
gdd_percentile # NEW
])
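A quick shape sanity check that the stacked feature matrix matches the new model input size (toy arrays standing in for the real feature columns):

```python
import numpy as np

T = 50  # toy sequence length
cols = [np.random.rand(T) for _ in range(11)]  # 7 original + 4 new features

features = np.column_stack(cols)
# Each row is one day, each column one feature; the second dimension
# must match input_size=11 in Section 8 or the LSTM forward pass will fail.
assert features.shape == (T, 11)
```

A mismatch here is the most common failure mode after adding features, so it is worth asserting before training starts.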
Step 4: Update Model Input Size
In Section 8, change:
# OLD
model = HarvestDetectionLSTM(input_size=7, ...)
# NEW
model = HarvestDetectionLSTM(input_size=11, ...) # 7 + 4 new features
Step 5: Retrain
Run Sections 6-12 again with new data + model size.
Expected Outcome
Before Temperature Features:
- Input: 7 features (CI-derived only)
- Imminent AUC: 0.90 (all-client baseline)
- False imminent rate: 15-20% of predictions
After Temperature Features:
- Input: 11 features (CI + temperature)
- Imminent AUC: 0.93-0.95 (3-5% gain)
- False imminent rate: 5-10% (50% reduction!)
- Model can distinguish: Stress-decline vs. harvest-ready decline
Why This Works
Harvest-specific pattern (with temperature):
Imminent Harvest:
CI: Declining ↘
GDD: Very high (>3500 total)
GDD Velocity: Moderate (still accumulating)
Temp Anomaly: Normal
→ Model learns: "High GDD + declining CI + normal temp" = HARVEST
Drought Stress (False Positive Prevention):
CI: Declining ↘ (same as above)
GDD: Moderate (1500-2000)
GDD Velocity: Negative (cooling, winter)
Temp Anomaly: Very hot
→ Model learns: "Low GDD + stress temp" ≠ HARVEST
Phase 3: Test Different Imminent Windows (Est. 1-2 hours)
Current Window: 3-14 days
Question: Is this optimal? Let's test:
- 5-15 days (shift window up, slightly earlier warning)
- 7-14 days (tighten lower bound)
- 10-21 days (wider, earlier warning)
- 3-7 days (ultra-tight, latest warning)
How to Test
In Section 4, create a loop:
windows_to_test = [
    (3, 14),   # current
    (5, 15),
    (7, 14),
    (10, 21),
    (3, 7),
]

results = []
for imm_start, imm_end in windows_to_test:
    # Relabel with the new window
    labeled_seqs = label_harvest_windows_per_season(
        test_sequences,
        imminent_start=imm_start,
        imminent_end=imm_end,
        detected_start=1,
        detected_end=21
    )

    # Evaluate (assumes each labeled sequence exposes its per-day imminent
    # labels under 'imminent_labels'; adjust the key to the notebook's schema)
    y_true = np.concatenate([seq['imminent_labels'] for seq in labeled_seqs])
    y_pred = get_model_predictions(test_sequences)
    auc = roc_auc_score(y_true, y_pred)
    fp_rate = false_positive_rate(y_true, y_pred)

    results.append({
        'window': f"{imm_start}-{imm_end}",
        'auc': auc,
        'fp_rate': fp_rate,
    })

# Print results
results_df = pd.DataFrame(results).sort_values('auc', ascending=False)
print(results_df)
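Note that false_positive_rate is not a scikit-learn function; a minimal helper (thresholding scores at 0.5 to match the Phase 4 trigger rule — an assumption, not the notebook's definition) might look like:

```python
import numpy as np

def false_positive_rate(y_true, y_score, threshold=0.5):
    """FP / (FP + TN): fraction of negative days flagged as imminent."""
    y_true = np.asarray(y_true).astype(bool)
    y_hat = np.asarray(y_score) >= threshold
    negatives = ~y_true
    if negatives.sum() == 0:
        return 0.0
    return float((y_hat & negatives).sum() / negatives.sum())

# 1 of the 3 negative days scores above threshold → FP rate of 1/3
fp = false_positive_rate([0, 0, 0, 1], [0.6, 0.2, 0.1, 0.9])
```

Unlike AUC, this metric depends on the 0.5 threshold, so it should be recomputed if the operational trigger threshold is ever tuned.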
Expected Outcome
   Window    AUC  FP_Rate
0    7-14  0.920     0.08  ← RECOMMENDED (best balance)
1    5-15  0.918     0.12
2    3-14  0.915     0.15  ← current window
3   10-21  0.910     0.05  ← warns too early
4     3-7  0.905     0.20  ← warns too late (short lead time)
Choose the window with highest AUC and acceptable false positive rate.
Phase 4: Operational Metrics (Est. 2 hours)
What We Need
For deployment, understand:
- Lead time: How many days before harvest do we warn?
- False positive rate: How often do we cry wolf?
- Miss rate: How often do we miss the harvest window?
- Per-field performance: Do some fields have worse predictions?
Code to Add
def compute_operational_metrics(model, test_sequences_labeled, test_features):
    """
    Compute farmer-relevant metrics: lead time, false positives, misses,
    and per-field accuracy.
    """
    lead_times = []
    false_positives = []
    misses = []
    field_performance = {}

    for seq_idx, seq_dict in enumerate(test_sequences_labeled):
        field = seq_dict['field']
        data = seq_dict['data']

        # Get per-day imminent predictions for this sequence
        X_features = test_features[seq_idx]
        with torch.no_grad():
            imminent_pred, _ = model(
                torch.from_numpy(X_features[np.newaxis, :, :]).float()
            )
        imminent_pred = imminent_pred[0].cpu().numpy()

        # Find the harvest boundary
        harvest_idx = np.where(data['harvest_boundary'] == 1)[0]
        if len(harvest_idx) == 0:
            continue
        harvest_idx = harvest_idx[0]

        # Find when the model triggered (imminent > 0.5)
        triggered_indices = np.where(imminent_pred > 0.5)[0]

        if len(triggered_indices) > 0:
            # Last trigger before harvest
            triggers_before = triggered_indices[triggered_indices < harvest_idx]
            if len(triggers_before) > 0:
                last_trigger = triggers_before[-1]
                lead_time = harvest_idx - last_trigger
                lead_times.append(lead_time)

                # Check if within the optimal window (e.g., 3-14 days)
                if 3 <= lead_time <= 14:
                    if field not in field_performance:
                        field_performance[field] = {'correct': 0, 'total': 0}
                    field_performance[field]['correct'] += 1
            else:
                # Only triggered after harvest = false positive
                false_positives.append(len(triggered_indices))
        else:
            # No trigger at all = miss
            misses.append(seq_idx)

        if field not in field_performance:
            field_performance[field] = {'correct': 0, 'total': 0}
        field_performance[field]['total'] += 1

    # Report statistics
    print("\n" + "=" * 60)
    print("OPERATIONAL METRICS")
    print("=" * 60)

    print("\nLead Time Analysis:")
    print(f"  Mean: {np.mean(lead_times):.1f} days")
    print(f"  Std:  {np.std(lead_times):.1f} days")
    print(f"  Min:  {np.min(lead_times):.0f} days")
    print(f"  Max:  {np.max(lead_times):.0f} days")
    print(f"  Optimal (3-14d): {sum(3 <= x <= 14 for x in lead_times) / len(lead_times) * 100:.1f}%")

    print("\nError Analysis:")
    print(f"  False positives (wrong timing): {len(false_positives)} sequences")
    print(f"  Misses (no warning): {len(misses)} sequences")
    print(f"  Accuracy: {len(lead_times) / (len(lead_times) + len(false_positives) + len(misses)) * 100:.1f}%")

    print("\nPer-Field Performance:")
    for field, perf in sorted(field_performance.items()):
        accuracy = perf['correct'] / perf['total'] * 100
        print(f"  {field:15s}: {accuracy:5.1f}% correct")

    return {
        'lead_times': lead_times,
        'false_positives': len(false_positives),
        'misses': len(misses),
        'field_performance': field_performance,
    }

# Run it
metrics = compute_operational_metrics(model, test_sequences_labeled, X_test_features)
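The core lead-time logic inside the function can be exercised on a synthetic sequence (illustrative numbers only):

```python
import numpy as np

# Synthetic per-day imminent scores for one sequence; harvest on day 12.
imminent_pred = np.array(
    [0.1, 0.2, 0.3, 0.7, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.1, 0.1, 0.0]
)
harvest_idx = 12

triggered = np.where(imminent_pred > 0.5)[0]          # days 3, 4, 5
triggers_before = triggered[triggered < harvest_idx]  # all precede harvest
lead_time = harvest_idx - triggers_before[-1]         # 12 - 5 = 7 days' warning
in_optimal_window = 3 <= lead_time <= 14
```

Using the *last* trigger before harvest measures the most recent warning the farmer received, which is the conservative choice for lead-time reporting.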
What to Look For
Good performance:
Mean lead time: 7-10 days ✅ (gives farmer time to prepare)
Optimal timing: >80% ✅ (most warnings in 3-14d window)
False positives: <5% ✅ (rarely cry wolf)
Misses: <10% ✅ (rarely miss harvest)
Poor performance:
Mean lead time: 2 days ❌ (too late)
Optimal timing: <60% ❌ (inconsistent)
False positives: >20% ❌ (farmers lose trust)
Misses: >20% ❌ (unreliable)
Phase 5: Rainfall Features (Optional, High Value) (Est. 3-4 hours)
Similar to Temperature
Add rainfall + soil moisture features:
def add_rainfall_features(df, rainfall_column='daily_rainfall_mm'):
    """
    Add drought/moisture stress features.

    New features (3 total):
    1. rainfall_7d: total rain over the last 7 days
    2. rainfall_deficit: deficit vs. the 30-day average for this time of year
    3. drought_stress: combined index (0 = not stressed, 1 = severe drought)
    """
    # 1. 7-day trailing rainfall
    df['rainfall_7d'] = df.groupby('field')[rainfall_column].transform(
        lambda x: x.rolling(7, min_periods=1).sum()
    )

    # 2. Deficit vs. the 30-day centered rainfall average
    df['seasonal_rain_avg'] = df.groupby('field')[rainfall_column].transform(
        lambda x: x.rolling(30, center=True, min_periods=1).mean()
    )
    df['rainfall_deficit'] = df['seasonal_rain_avg'] - df[rainfall_column]

    # 3. Drought stress index (0 = not stressed, 1 = severe drought)
    df['drought_stress'] = np.minimum(
        1.0,
        df['rainfall_deficit'] / (df['seasonal_rain_avg'] + 0.1)
    )

    return df
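A self-contained check of the rolling-sum behavior the function relies on (toy rainfall series):

```python
import pandas as pd

df = pd.DataFrame({
    "field": ["f1"] * 10,
    "daily_rainfall_mm": [0, 5, 0, 0, 10, 0, 0, 0, 2, 0],
})

# 7-day trailing total per field, as in add_rainfall_features;
# min_periods=1 lets the first days use whatever history exists.
df["rainfall_7d"] = df.groupby("field")["daily_rainfall_mm"].transform(
    lambda x: x.rolling(7, min_periods=1).sum()
)
```

By day 9 the early rain events have rolled out of the window, so rainfall_7d drops even though no new dry spell started — exactly the signal the deficit feature builds on.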
Why this helps:
- Drought accelerates maturity (early harvest)
- Excessive rain delays harvest
- Model can distinguish "ready to harvest" from "crop stressed"
Summary: Quick Implementation Checklist
Week 1: Foundation
- Phase 1: Retrain on all clients
- Change CLIENT_FILTER = None
- Run full pipeline
- Compare metrics
Week 2: Core Enhancement
- Phase 2: Add temperature features
- Find/download temperature data
- Merge with CI data
- Update feature engineering (7 → 11 features)
- Retrain model
- Compare metrics (expect 3-5% AUC gain)
Week 3: Optimization & Testing
- Phase 3: Test imminent windows
- Run sensitivity analysis
- Choose optimal window
- Retrain with new window
- Phase 4: Operational metrics
- Compute lead times
- Measure false positive rate
- Per-field performance analysis
Week 4: Optional Enhancement
- Phase 5: Add rainfall features (if data available)
- Download precipitation data
- Add drought stress features
- Retrain
- Measure improvement
Expected Performance Trajectory
Current (ESA-only, CI-only):
Imminent AUC: 0.8793
False positive rate: ~15%
Phase 1 (All clients):
Imminent AUC: 0.90-0.92 (+2-5%)
False positive rate: ~12%
Phase 2 (Add temperature):
Imminent AUC: 0.93-0.95 (+3-5% from Phase 1)
False positive rate: ~5%
Phase 3 (Optimize window):
Imminent AUC: 0.95-0.96 (+1% from fine-tuning)
False positive rate: ~3%
Phase 4 (Operational tuning):
Imminent AUC: 0.95-0.96 (stable)
Lead time: 7-10 days
Operational readiness: 95%
Phase 5 (Add rainfall):
Imminent AUC: 0.96-0.97 (+1% for drought years)
False positive rate: ~2%
Operational readiness: 99%
Key Takeaways
- Multi-client retraining is the biggest quick win (2-5% AUC gain with minimal effort)
- Temperature features are essential for distinguishing harvest-ready from stress
- Imminent window tuning can reduce false positives by 30-50%
- Operational metrics matter more than academic metrics (lead time > AUC)
- Rainfall features are optional but valuable for drought-prone regions
Next Steps
- This week: Run Phase 1 (all-client retrain)
- Analyze results: Compare on same fields, measure improvements
- Plan Phase 2: Identify temperature data source
- Schedule Phase 2: Allocate 3-4 hours for implementation
- Document findings: Track AUC, false positive rate, lead time for each phase
Good luck! This is a solid model with clear paths to improvement. 🚀