Action Plan: Fix False Imminent Triggers (CI-Only + Confidence Intervals)
Problem: Noise/clouds cause false imminent triggers (model learns on noisy data)
Solution: Better smoothing + uncertainty quantification to filter noise
Effort: 4-5 hours implementation + 30 min training
Root Cause Analysis
Your graph shows a smooth blue LOESS curve (the real field state) versus a jagged red line (the noisy measurements)
Current model problem:
- Feature engineering uses raw noisy data
- Model learns "this noise pattern = harvest signal"
- When clouds/sensor errors create similar noise → False trigger
Fix:
- Derive features from SMOOTHED curve only (remove noise at source)
- Add "stability" feature (harvest = smooth decline, noise = jagged)
- Add "decline rate" feature (harvest = consistent slope)
- Add confidence intervals to identify uncertain predictions (= noise)
Step-by-Step Implementation
STEP 1: Update Feature Engineering (Section 5)
What: Replace the current 7 features with 7 new CI-only features
How: Use 21-day median + 7-day mean smoothing as foundation
Features:
- Smoothed CI (from smooth curve, not raw)
- 7d velocity (from smooth curve)
- 7d acceleration (from smooth curve)
- 21d MA (very long-term trend)
- 21d velocity (slow changes only)
- Decline rate (NEW - slope of smooth curve, harvest = negative slope)
- Stability (NEW - smoothness metric, harvest = high stability)
Code: See CI_ONLY_IMPROVEMENTS.md → "Solution 1: Aggressive Smoothing"
Expected result: Model learns real patterns, not noise
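A minimal sketch of the smoothing-first feature set, assuming pandas and a daily CI series; the function and column names are hypothetical, and the exact formulas live in CI_ONLY_IMPROVEMENTS.md:

```python
import numpy as np
import pandas as pd

def engineer_ci_features(ci: pd.Series) -> pd.DataFrame:
    """Derive all 7 features from the smoothed curve, never from raw CI."""
    # Foundation: 21-day rolling median (robust to cloud spikes),
    # then a 7-day rolling mean for extra smoothing
    smooth = ci.rolling(21, min_periods=1).median().rolling(7, min_periods=1).mean()

    feats = pd.DataFrame(index=ci.index)
    feats["ci_smooth"] = smooth                                  # smoothed CI
    feats["vel_7d"] = smooth.diff(7)                             # 7d velocity
    feats["acc_7d"] = feats["vel_7d"].diff(7)                    # 7d acceleration
    feats["ma_21d"] = smooth.rolling(21, min_periods=1).mean()   # very long-term trend
    feats["vel_21d"] = smooth.diff(21)                           # slow changes only
    feats["decline_rate"] = smooth.diff(7) / 7.0                 # slope; harvest = negative
    # Stability: low residual noise around the smooth curve = high stability
    resid = (ci - smooth).rolling(7, min_periods=1).std()
    feats["stability"] = 1.0 / (1.0 + resid)
    return feats
```

Note the key design choice: the raw `ci` series only appears once, in the stability residual; every trend feature reads from `smooth`, so cloud spikes cannot reach the model.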
STEP 2: Add Monte Carlo Dropout (Confidence Intervals)
What: Run prediction 30 times with dropout ON, get uncertainty
Why: High uncertainty = model unsure = probably noise
How: Keep dropout active during inference, ensemble predictions
Code: See CI_ONLY_IMPROVEMENTS.md → "Solution 2: Add Confidence Intervals"
Expected result: Each prediction has mean + 95% CI
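The Monte Carlo dropout idea can be sketched framework-free with NumPy; in the real model you would keep the network's dropout layers active at inference instead of applying masks by hand. The tiny 2-layer MLP and its weight shapes here are purely illustrative:

```python
import numpy as np

def mc_dropout_predict(x, W1, b1, W2, b2, n_samples=30, p_drop=0.2, seed=0):
    """Run the forward pass n_samples times with dropout ON, return
    the ensemble mean, std, and a 95% interval from the spread."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ W1 + b1)        # ReLU hidden layer
        mask = rng.random(h.shape) > p_drop     # dropout stays active at inference
        h = h * mask / (1.0 - p_drop)           # inverted-dropout scaling
        logit = h @ W2 + b2
        preds.append(1.0 / (1.0 + np.exp(-logit)))  # sigmoid probability
    preds = np.array(preds)
    mean, std = preds.mean(), preds.std()
    return mean, std, (mean - 1.96 * std, mean + 1.96 * std)
```

The ensemble spread is the uncertainty signal: noisy inputs land in regions the dropout sub-networks disagree on, so they come back with a wide interval.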
STEP 3: Filter by Uncertainty
What: Only alert on HIGH probability + LOW uncertainty
Why: Filters out noise-driven false positives
How: Use threshold like prob > 0.5 AND std < 0.10
Code: See CI_ONLY_IMPROVEMENTS.md → "Solution 3: Use Uncertainty to Filter"
Expected result: False positive rate drops 30-50% without losing real harvests
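The uncertainty filter itself is a one-liner; a vectorized sketch with the threshold values from the step above (function name hypothetical):

```python
import numpy as np

def filter_alerts(prob_mean, prob_std, prob_thresh=0.5, std_thresh=0.10):
    """True only where probability is HIGH and uncertainty is LOW."""
    prob_mean = np.asarray(prob_mean)
    prob_std = np.asarray(prob_std)
    return (prob_mean > prob_thresh) & (prob_std < std_thresh)
```

For example, a confident 0.8 fires an alert, while the same 0.8 with a wide ensemble spread is suppressed as probable noise.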
STEP 4: Retrain & Evaluate
Runtime: ~30 minutes on a standard GPU
What NOT to Do (Yet)
❌ Don't add temperature data yet
❌ Don't add rainfall data yet
❌ Don't add soil moisture yet
Reason: Fix CI-only first. Once this works perfectly, external data will add value. Adding too many features now would confuse the problem.
Expected Performance
| Metric | Before | After | Change |
|---|---|---|---|
| Imminent AUC | 0.8793 | 0.90-0.92 | +0.02-0.04 |
| False positive rate | ~15% | ~3-5% | -70% |
| Recall (catches real harvests) | 100% | 85-90% | -10-15% |
Trade-off: You lose 10-15% of early warnings to filter out 70% of false positives. An acceptable trade.
Testing Strategy
After implementation, test on the same 6 sequences you've been using:
For each sequence:
1. Plot imminent probability + confidence bands
2. Plot uncertainty over time
3. Verify:
- Cloud dips show HIGH uncertainty
- Real harvest shows LOW uncertainty
- False triggers disappeared
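The two diagnostic plots above can be sketched with matplotlib; the helper name is hypothetical, and `prob_mean`/`prob_std` are assumed to come from the Monte Carlo ensemble in Step 2:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering for batch runs
import matplotlib.pyplot as plt

def plot_sequence_diagnostics(days, prob_mean, prob_std, title="sequence"):
    """Top: imminent probability with its 95% band. Bottom: uncertainty."""
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 5))
    ax1.plot(days, prob_mean, label="imminent probability")
    ax1.fill_between(days, prob_mean - 1.96 * prob_std,
                     prob_mean + 1.96 * prob_std, alpha=0.3, label="95% CI")
    ax1.axhline(0.5, ls="--", c="grey", label="alert threshold")
    ax1.set_ylabel("probability")
    ax1.set_title(title)
    ax1.legend()
    ax2.plot(days, prob_std, c="tab:red")
    ax2.axhline(0.10, ls="--", c="grey")  # std < 0.10 filter line
    ax2.set_ylabel("uncertainty (std)")
    ax2.set_xlabel("day")
    return fig
```

Cloud dips should show a widening band in the top panel and a spike above the dashed line in the bottom panel; a real harvest should stay below it.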
File Location
All documentation is now in:
python_app/harvest_detection_experiments/
Main files:
- CI_ONLY_IMPROVEMENTS.md ← Implementation details + code
- README_EVALUATION.md ← Navigation guide
- Other .md files for reference
Timeline
- Day 1: Read CI_ONLY_IMPROVEMENTS.md, plan implementation
- Day 2-3: Implement Step 1 (new features)
- Day 4: Implement Steps 2-3 (Monte Carlo + filtering)
- Day 5: Retrain + test
- Day 5+: Evaluate results, iterate
Total: 3-4 focused days of work
Success Criteria
✅ Model trained without errors
✅ Uncertainty bands visible in plots
✅ Cloud dips show high uncertainty
✅ Real harvest shows low uncertainty
✅ False positive rate < 5%
✅ Recall > 85% (still catches most real harvests)