
Action Plan: Fix False Imminent Triggers (CI-Only + Confidence Intervals)

Problem: Noise/clouds cause false imminent triggers (model learns on noisy data)
Solution: Better smoothing + uncertainty quantification to filter noise
Effort: 4-5 hours implementation + 30 min training


Root Cause Analysis

Your graph shows: Smooth blue LOESS curve (real field state) vs. Jagged red line (noisy measurements)

Current model problem:

  • Feature engineering uses raw noisy data
  • Model learns "this noise pattern = harvest signal"
  • When clouds/sensor errors create similar noise → False trigger

Fix:

  1. Derive features from SMOOTHED curve only (remove noise at source)
  2. Add "stability" feature (harvest = smooth decline, noise = jagged)
  3. Add "decline rate" feature (harvest = consistent slope)
  4. Add confidence intervals to identify uncertain predictions (= noise)

Step-by-Step Implementation

STEP 1: Update Feature Engineering (Section 5)

What: Replace the current 7 features with a new CI-only feature set
How: Use 21-day median + 7-day mean smoothing as foundation
Features:

  • Smoothed CI (from smooth curve, not raw)
  • 7d velocity (from smooth curve)
  • 7d acceleration (from smooth curve)
  • 21d MA (very long-term trend)
  • 21d velocity (slow changes only)
  • Decline rate (NEW - slope of smooth curve, harvest = negative slope)
  • Stability (NEW - smoothness metric, harvest = high stability)

Code: See CI_ONLY_IMPROVEMENTS.md → "Solution 1: Aggressive Smoothing"

Expected result: Model learns real patterns, not noise
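The feature list above can be sketched as follows. This is a minimal illustration, not the final implementation in CI_ONLY_IMPROVEMENTS.md: the function and column names are made up here, and the exact window arithmetic (e.g. dividing the 7-day difference to get a per-day decline rate, or using inverse jaggedness as the stability metric) is an assumption.

```python
import numpy as np
import pandas as pd

def engineer_ci_features(ci: pd.Series) -> pd.DataFrame:
    """Derive the 7 CI-only features from a daily CI time series.
    Window sizes (21-day median + 7-day mean) follow the plan above;
    names and exact formulas are illustrative."""
    # Aggressive smoothing: a 21-day rolling median kills spikes,
    # then a 7-day rolling mean softens the remaining steps.
    smooth = ci.rolling(21, min_periods=1).median().rolling(7, min_periods=1).mean()

    feats = pd.DataFrame(index=ci.index)
    feats["ci_smooth"] = smooth
    feats["vel_7d"] = smooth.diff(7)            # 7d velocity (from smooth curve)
    feats["acc_7d"] = feats["vel_7d"].diff(7)   # 7d acceleration (from smooth curve)
    feats["ma_21d"] = ci.rolling(21, min_periods=1).mean()   # long-term trend
    feats["vel_21d"] = feats["ma_21d"].diff(21)              # slow changes only
    # Decline rate: per-day slope of the smooth curve over the last week;
    # a real harvest shows a consistently negative value here.
    feats["decline_rate"] = smooth.diff(7) / 7.0
    # Stability: inverse of the day-to-day jaggedness of the RAW series.
    # Harvest declines are smooth (high stability); cloud noise is jagged (low).
    feats["stability"] = 1.0 / (1.0 + ci.diff().abs().rolling(7, min_periods=1).std())
    return feats
```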

STEP 2: Add Monte Carlo Dropout (Confidence Intervals)

What: Run prediction 30 times with dropout ON, get uncertainty
Why: High uncertainty = model unsure = probably noise
How: Keep dropout active during inference, ensemble predictions

Code: See CI_ONLY_IMPROVEMENTS.md → "Solution 2: Add Confidence Intervals"

Expected result: Each prediction has mean + 95% CI
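In PyTorch terms, the Monte Carlo dropout step looks roughly like this. It is a sketch under the assumption that the model outputs one harvest probability per input; `mc_dropout_predict` and `n_samples` are names invented here, and the 95% interval uses a normal approximation of the MC samples.

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model: torch.nn.Module, x: torch.Tensor, n_samples: int = 30):
    """Run n_samples stochastic forward passes with dropout ON and
    ensemble them into a mean prediction plus an uncertainty estimate."""
    model.train()  # train() keeps Dropout active; no_grad() still disables gradients
    preds = torch.stack([model(x) for _ in range(n_samples)])  # (n_samples, batch, ...)
    mean = preds.mean(dim=0)
    std = preds.std(dim=0)
    # 95% interval, assuming the MC samples are roughly normal
    lo, hi = mean - 1.96 * std, mean + 1.96 * std
    model.eval()  # restore normal inference mode
    return mean, std, lo, hi
```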

STEP 3: Filter by Uncertainty

What: Only alert on HIGH probability + LOW uncertainty
Why: Filters out noise-driven false positives
How: Use thresholds such as prob > 0.5 AND std < 0.10

Code: See CI_ONLY_IMPROVEMENTS.md → "Solution 3: Use Uncertainty to Filter"

Expected result: False positive rate drops 30-50% without losing real harvests
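The filter itself is one predicate over the MC-dropout outputs. A minimal sketch, with the thresholds from above as defaults (the function name is illustrative; tune both thresholds on the validation sequences):

```python
def should_alert(prob_mean: float, prob_std: float,
                 prob_threshold: float = 0.5, std_threshold: float = 0.10) -> bool:
    """Alert only on HIGH probability AND LOW uncertainty.
    High std means the model is unsure -- probably noise, so stay quiet."""
    return prob_mean > prob_threshold and prob_std < std_threshold
```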

STEP 4: Retrain & Evaluate

Runtime: ~30 minutes on GPU (standard)


What NOT to Do (Yet)

Don't add temperature data yet
Don't add rainfall data yet
Don't add soil moisture yet

Reason: Fix CI-only first. Once this works perfectly, external data will add value. Adding too many features now would confuse the problem.


Expected Performance

| Metric | Before | After | Change |
|---|---|---|---|
| Imminent AUC | 0.8793 | 0.90-0.92 | +1-3% |
| False positive rate | ~15% | ~3-5% | -70% |
| Recall (catches real harvests) | 100% | 85-90% | -10-15% |

Trade-off: You lose 10-15% of early warnings to filter 70% of false positives. Acceptable trade.


Testing Strategy

After implementation, test on the same 6 sequences you've been using:

For each sequence:
  1. Plot imminent probability + confidence bands
  2. Plot uncertainty over time
  3. Verify:
     - Cloud dips show HIGH uncertainty
     - Real harvest shows LOW uncertainty
     - False triggers disappeared
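The two diagnostic plots can be produced with a small helper like the one below. This is a sketch, not project code: the argument names are invented, and it assumes you feed in per-day MC-dropout outputs for one sequence. The dashed lines mark the Step 3 thresholds.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def plot_sequence(dates, prob_mean, prob_std, out_path="sequence_check.png"):
    """Top panel: imminent probability with 95% bands.
    Bottom panel: uncertainty (std) over time."""
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
    ax1.plot(dates, prob_mean, label="imminent probability")
    ax1.fill_between(dates,
                     prob_mean - 1.96 * prob_std,
                     prob_mean + 1.96 * prob_std,
                     alpha=0.3, label="95% CI")
    ax1.axhline(0.5, ls="--", c="gray")   # probability threshold from Step 3
    ax1.legend()
    ax2.plot(dates, prob_std, c="tab:red", label="uncertainty (std)")
    ax2.axhline(0.10, ls="--", c="gray")  # std threshold from Step 3
    ax2.legend()
    fig.savefig(out_path)
    plt.close(fig)
```

Cloud dips should sit inside wide bands in the top panel and spike in the bottom one; a real harvest should show a narrow-band decline with low std throughout.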

File Location

All documentation is now in:
python_app/harvest_detection_experiments/

Main files:

  • CI_ONLY_IMPROVEMENTS.md ← Implementation details + code
  • README_EVALUATION.md ← Navigation guide
  • Other .md files for reference

Timeline

  • Day 1: Read CI_ONLY_IMPROVEMENTS.md, plan implementation
  • Day 2-3: Implement Step 1 (new features)
  • Day 4: Implement Steps 2-3 (Monte Carlo + filtering)
  • Day 5: Retrain + test
  • Day 5+: Evaluate results, iterate

Total: 3-4 focused days of work


Success Criteria

  • Model trained without errors
  • Uncertainty bands visible in plots
  • Cloud dips show high uncertainty
  • Real harvest shows low uncertainty
  • False positive rate < 5%
  • Recall > 85% (still catches most real harvests)