# Action Plan: Fix False Imminent Triggers (CI-Only + Confidence Intervals)

**Problem**: Noise and clouds cause false imminent triggers (the model learns from noisy data)

**Solution**: Better smoothing + uncertainty quantification to filter noise

**Effort**: 4-5 hours implementation + 30 min training

---

## Root Cause Analysis

Your graph shows a smooth blue LOESS curve (real field state) vs. a jagged red line (noisy measurements).

**Current model problem:**

- Feature engineering uses raw noisy data
- Model learns "this noise pattern = harvest signal"
- When clouds/sensor errors create similar noise → false trigger

**Fix:**

1. Derive features from the SMOOTHED curve only (remove noise at source)
2. Add a "stability" feature (harvest = smooth decline, noise = jagged)
3. Add a "decline rate" feature (harvest = consistent slope)
4. Add confidence intervals to identify uncertain predictions (= noise)

---

## Step-by-Step Implementation

### STEP 1: Update Feature Engineering (Section 5)

**What**: Replace the 7 current features with new CI-only features

**How**: Use 21-day median + 7-day mean smoothing as the foundation

**Features**:

- Smoothed CI (from smooth curve, not raw)
- 7d velocity (from smooth curve)
- 7d acceleration (from smooth curve)
- 21d MA (very long-term trend)
- 21d velocity (slow changes only)
- **Decline rate** (NEW - slope of smooth curve, harvest = negative slope)
- **Stability** (NEW - smoothness metric, harvest = high stability)

**Code**: See `CI_ONLY_IMPROVEMENTS.md` → "Solution 1: Aggressive Smoothing"

**Expected result**: Model learns real patterns, not noise

### STEP 2: Add Monte Carlo Dropout (Confidence Intervals)

**What**: Run each prediction 30 times with dropout ON to get an uncertainty estimate

**Why**: High uncertainty = model unsure = probably noise

**How**: Keep dropout active during inference and ensemble the predictions

**Code**: See `CI_ONLY_IMPROVEMENTS.md` → "Solution 2: Add Confidence Intervals"

**Expected result**: Each prediction has a mean + 95% CI

### STEP 3: Filter by Uncertainty

**What**: Only alert on HIGH probability + LOW uncertainty

**Why**: Filters out noise-driven false positives

**How**: Use a threshold like `prob > 0.5 AND std < 0.10`

**Code**: See `CI_ONLY_IMPROVEMENTS.md` → "Solution 3: Use Uncertainty to Filter"

**Expected result**: False positive rate drops 30-50% without losing real harvests

### STEP 4: Retrain & Evaluate

**Runtime**: ~30 minutes on GPU (standard)

---

## What NOT to Do (Yet)

❌ **Don't add temperature data yet**

❌ **Don't add rainfall data yet**

❌ **Don't add soil moisture yet**

Reason: Fix CI-only first. Once this works reliably, external data will add value. Adding too many features now would make it harder to tell which change fixed the problem.

---

## Expected Performance

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Imminent AUC | 0.8793 | 0.90-0.92 | +0.02 to +0.04 |
| False positive rate | ~15% | ~3-5% | -70% |
| **Recall** (catches real harvests) | 100% | 85-90% | -10-15% |

**Trade-off**: You lose 10-15% of early warnings to filter 70% of false positives. Acceptable trade.

---

## Testing Strategy

After implementation, test on the same 6 sequences you've been using:

```
For each sequence:
  1. Plot imminent probability + confidence bands
  2. Plot uncertainty over time
  3. Verify:
     - Cloud dips show HIGH uncertainty
     - Real harvest shows LOW uncertainty
     - False triggers disappeared
```

---

## File Location

All documentation is now in: `python_app/harvest_detection_experiments/`

Main files:

- `CI_ONLY_IMPROVEMENTS.md` ← Implementation details + code
- `README_EVALUATION.md` ← Navigation guide
- Other `.md` files for reference

---

## Timeline

- **Day 1**: Read CI_ONLY_IMPROVEMENTS.md, plan implementation
- **Day 2-3**: Implement Step 1 (new features)
- **Day 4**: Implement Steps 2-3 (Monte Carlo + filtering)
- **Day 5**: Retrain + test
- **Day 5+**: Evaluate results, iterate

Total: **3-4 focused days** of work

---

## Success Criteria

✅ Model trained without errors

✅ Uncertainty bands visible in plots

✅ Cloud dips show high uncertainty

✅ Real harvest shows low uncertainty

✅ False positive rate < 5%

✅ Recall > 85% (still catches most real harvests)
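
The last two success criteria are numeric, so they can be checked mechanically. The sketch below is one way to do that, assuming per-time-step binary labels and alerts and the standard definitions FPR = FP / (FP + TN) and recall = TP / (TP + FN). The function names `alert_metrics` and `meets_success_criteria` are hypothetical, and the plan itself does not say whether the rates are counted per time step or per harvest event.

```python
def alert_metrics(y_true, y_alert):
    """Return (false_positive_rate, recall) for binary labels and binary alerts."""
    pairs = list(zip(y_true, y_alert))
    tp = sum(1 for t, a in pairs if t and a)
    fp = sum(1 for t, a in pairs if not t and a)
    fn = sum(1 for t, a in pairs if t and not a)
    tn = sum(1 for t, a in pairs if not t and not a)
    # Guard against empty classes so the function never divides by zero.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, recall


def meets_success_criteria(y_true, y_alert, max_fpr=0.05, min_recall=0.85):
    """Check the two numeric criteria: FPR < 5% and recall > 85%."""
    fpr, recall = alert_metrics(y_true, y_alert)
    return fpr < max_fpr and recall > min_recall
```

Counting per harvest event instead of per time step would change both numbers, so it is worth fixing the counting unit before comparing against the "Before" column.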
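
Step 1 above lists seven features derived from a 21-day median followed by a 7-day mean. The actual code lives in `CI_ONLY_IMPROVEMENTS.md`; what follows is only a minimal NumPy sketch of how such features could be computed. Everything beyond the window lengths is an assumption: trailing (causal) windows, one sample per day, a 14-day window for decline rate and stability, and the particular stability formula `1 / (1 + std of daily changes)`. All function names are hypothetical.

```python
import numpy as np


def rolling(x, window, fn):
    """Trailing-window statistic; early positions use the shorter window available."""
    return np.array([fn(x[max(0, i - window + 1): i + 1]) for i in range(len(x))])


def lagged_diff(x, lag):
    """x[t] - x[t - lag], with zeros where no lagged value exists yet."""
    out = np.zeros(len(x), dtype=float)
    out[lag:] = x[lag:] - x[:-lag]
    return out


def window_slope(w):
    """Least-squares slope of a window (CI units per day); 0 for a single point."""
    if len(w) < 2:
        return 0.0
    return np.polyfit(np.arange(len(w)), w, 1)[0]


def ci_features(ci_raw, slope_window=14):
    """Build the seven CI-only features from a daily CI series (assumed layout)."""
    ci_raw = np.asarray(ci_raw, dtype=float)
    # 21-day median removes isolated spikes, 7-day mean smooths what remains.
    smooth = rolling(rolling(ci_raw, 21, np.median), 7, np.mean)
    vel7 = lagged_diff(smooth, 7)             # 7d velocity
    acc7 = lagged_diff(vel7, 7)               # 7d acceleration
    ma21 = rolling(smooth, 21, np.mean)       # 21d moving average
    vel21 = lagged_diff(ma21, 21)             # 21d velocity
    decline = rolling(smooth, slope_window, window_slope)  # negative = declining
    daily = np.diff(smooth, prepend=smooth[0])
    # Assumed stability metric: close to 1 when day-to-day changes are uniform.
    stability = 1.0 / (1.0 + rolling(daily, slope_window, np.std))
    return np.column_stack([smooth, vel7, acc7, ma21, vel21, decline, stability])
```

With a steadily declining series and a single one-day cloud dip, the median stage keeps the smoothed CI within about one day's change of the clean curve, which is the behaviour Step 1 is after.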
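
Step 2 above asks for 30 forward passes with dropout left on. The sketch below shows only the ensembling logic in NumPy and is framework-agnostic: `mc_dropout_predict` (a hypothetical name) accepts any callable that returns a fresh stochastic prediction on each call. `make_toy_model` is a stand-in with random weights, not the project's model, and exists only so the example runs. The 95% band here is an empirical percentile interval over the passes, which is an assumption about how the interval is meant to be formed.

```python
import numpy as np


def mc_dropout_predict(stochastic_forward, x, n_passes=30):
    """Run n_passes stochastic passes; return mean, std, and a 95% interval."""
    samples = np.stack([stochastic_forward(x) for _ in range(n_passes)])
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    # Empirical 2.5th / 97.5th percentiles across the passes.
    lower, upper = np.percentile(samples, [2.5, 97.5], axis=0)
    return mean, std, lower, upper


def make_toy_model(n_features, hidden=16, drop_rate=0.3, seed=0):
    """Toy one-hidden-layer network with dropout that stays active at inference."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.3, size=(n_features, hidden))
    w2 = rng.normal(scale=0.3, size=hidden)

    def forward(x):
        h = np.maximum(x @ w1, 0.0)                # ReLU hidden layer
        keep = rng.random(h.shape) >= drop_rate    # new dropout mask every call
        h = h * keep / (1.0 - drop_rate)           # inverted-dropout scaling
        return 1.0 / (1.0 + np.exp(-(h @ w2)))     # sigmoid probability

    return forward


forward = make_toy_model(n_features=7)
x = np.ones((5, 7))                                # 5 samples, 7 features
mean, std, lower, upper = mc_dropout_predict(forward, x)
```

With only 30 passes the percentile band is coarse. In a deep-learning framework the same wrapper applies once the model's dropout layers are kept in training mode during inference.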
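
The Step 3 rule `prob > 0.5 AND std < 0.10` can be expressed in a few lines. This is a sketch with hypothetical function names; it assumes each prediction arrives as a `(mean probability, standard deviation)` pair, for example from the Monte Carlo passes in Step 2.

```python
def should_alert(prob_mean, prob_std, prob_threshold=0.5, std_threshold=0.10):
    """Alert only when the probability is high AND the uncertainty is low."""
    return prob_mean > prob_threshold and prob_std < std_threshold


def filter_alerts(predictions, prob_threshold=0.5, std_threshold=0.10):
    """Return the indices of (mean, std) pairs that pass both thresholds."""
    return [
        i
        for i, (mean, std) in enumerate(predictions)
        if should_alert(mean, std, prob_threshold, std_threshold)
    ]


# Confident harvest signal, uncertain spike (likely a cloud dip), quiet day.
example = [(0.90, 0.03), (0.80, 0.25), (0.30, 0.02)]
alerts = filter_alerts(example)
```

Only the first entry passes: the second has a high probability but too much spread, which is exactly the case the filter is meant to suppress. Both thresholds are starting points and would normally be tuned on the 6 test sequences.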