
Action Plan: Fix False Imminent Triggers (CI-Only + Confidence Intervals)

Problem: Noise/clouds cause false imminent triggers (model learns on noisy data)
Solution: Better smoothing + uncertainty quantification to filter noise
Effort: 4-5 hours implementation + 30 min training


Root Cause Analysis

Your graph shows: Smooth blue LOESS curve (real field state) vs. Jagged red line (noisy measurements)

Current model problem:

  • Feature engineering uses raw noisy data
  • Model learns "this noise pattern = harvest signal"
  • When clouds/sensor errors create similar noise → False trigger

Fix:

  1. Derive features from SMOOTHED curve only (remove noise at source)
  2. Add "stability" feature (harvest = smooth decline, noise = jagged)
  3. Add "decline rate" feature (harvest = consistent slope)
  4. Add confidence intervals to identify uncertain predictions (= noise)

Step-by-Step Implementation

STEP 1: Update Feature Engineering (Section 5)

What: Replace the current 7 features with a new CI-only feature set
How: Use 21-day median + 7-day mean smoothing as foundation
Features:

  • Smoothed CI (from smooth curve, not raw)
  • 7d velocity (from smooth curve)
  • 7d acceleration (from smooth curve)
  • 21d MA (very long-term trend)
  • 21d velocity (slow changes only)
  • Decline rate (NEW - slope of smooth curve, harvest = negative slope)
  • Stability (NEW - smoothness metric, harvest = high stability)

Code: See CI_ONLY_IMPROVEMENTS.md → "Solution 1: Aggressive Smoothing"

Expected result: Model learns real patterns, not noise
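The feature list above can be sketched as follows. This is a minimal illustration, not the final implementation in CI_ONLY_IMPROVEMENTS.md: the function and column names are made up here, and the exact window arithmetic (e.g. dividing the 7-day difference to get a per-day decline rate, or using inverse jaggedness as the stability metric) is an assumption.

```python
import numpy as np
import pandas as pd

def engineer_ci_features(ci: pd.Series) -> pd.DataFrame:
    """Derive the 7 CI-only features from a daily CI time series.
    Window sizes (21-day median + 7-day mean) follow the plan above;
    names and exact formulas are illustrative."""
    # Aggressive smoothing: a 21-day rolling median kills spikes,
    # then a 7-day rolling mean softens the remaining steps.
    smooth = ci.rolling(21, min_periods=1).median().rolling(7, min_periods=1).mean()

    feats = pd.DataFrame(index=ci.index)
    feats["ci_smooth"] = smooth
    feats["vel_7d"] = smooth.diff(7)            # 7d velocity (from smooth curve)
    feats["acc_7d"] = feats["vel_7d"].diff(7)   # 7d acceleration (from smooth curve)
    feats["ma_21d"] = ci.rolling(21, min_periods=1).mean()   # long-term trend
    feats["vel_21d"] = feats["ma_21d"].diff(21)              # slow changes only
    # Decline rate: per-day slope of the smooth curve over the last week;
    # a real harvest shows a consistently negative value here.
    feats["decline_rate"] = smooth.diff(7) / 7.0
    # Stability: inverse of the day-to-day jaggedness of the RAW series.
    # Harvest declines are smooth (high stability); cloud noise is jagged (low).
    feats["stability"] = 1.0 / (1.0 + ci.diff().abs().rolling(7, min_periods=1).std())
    return feats
```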

STEP 2: Add Monte Carlo Dropout (Confidence Intervals)

What: Run prediction 30 times with dropout ON, get uncertainty
Why: High uncertainty = model unsure = probably noise
How: Keep dropout active during inference, ensemble predictions

Code: See CI_ONLY_IMPROVEMENTS.md → "Solution 2: Add Confidence Intervals"

Expected result: Each prediction has mean + 95% CI
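In PyTorch terms, the Monte Carlo dropout step looks roughly like this. It is a sketch under the assumption that the model outputs one harvest probability per input; `mc_dropout_predict` and `n_samples` are names invented here, and the 95% interval uses a normal approximation of the MC samples.

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model: torch.nn.Module, x: torch.Tensor, n_samples: int = 30):
    """Run n_samples stochastic forward passes with dropout ON and
    ensemble them into a mean prediction plus an uncertainty estimate."""
    model.train()  # train() keeps Dropout active; no_grad() still disables gradients
    preds = torch.stack([model(x) for _ in range(n_samples)])  # (n_samples, batch, ...)
    mean = preds.mean(dim=0)
    std = preds.std(dim=0)
    # 95% interval, assuming the MC samples are roughly normal
    lo, hi = mean - 1.96 * std, mean + 1.96 * std
    model.eval()  # restore normal inference mode
    return mean, std, lo, hi
```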

STEP 3: Filter by Uncertainty

What: Only alert on HIGH probability + LOW uncertainty
Why: Filters out noise-driven false positives
How: Use thresholds such as prob > 0.5 AND std < 0.10

Code: See CI_ONLY_IMPROVEMENTS.md → "Solution 3: Use Uncertainty to Filter"

Expected result: False positive rate drops 30-50% without losing real harvests
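The filter itself is one predicate over the MC-dropout outputs. A minimal sketch, with the thresholds from above as defaults (the function name is illustrative; tune both thresholds on the validation sequences):

```python
def should_alert(prob_mean: float, prob_std: float,
                 prob_threshold: float = 0.5, std_threshold: float = 0.10) -> bool:
    """Alert only on HIGH probability AND LOW uncertainty.
    High std means the model is unsure -- probably noise, so stay quiet."""
    return prob_mean > prob_threshold and prob_std < std_threshold
```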

STEP 4: Retrain & Evaluate

Runtime: ~30 minutes on GPU (standard)


What NOT to Do (Yet)

Don't add temperature data yet
Don't add rainfall data yet
Don't add soil moisture yet

Reason: Fix CI-only first. Once this works perfectly, external data will add value. Adding too many features now would confuse the problem.


Expected Performance

| Metric | Before | After | Change |
|---|---|---|---|
| Imminent AUC | 0.8793 | 0.90-0.92 | +1-3% |
| False positive rate | ~15% | ~3-5% | -70% |
| Recall (catches real harvests) | 100% | 85-90% | -10-15% |

Trade-off: You lose 10-15% of early warnings to filter 70% of false positives. Acceptable trade.


Testing Strategy

After implementation, test on the same 6 sequences you've been using:

For each sequence:
  1. Plot imminent probability + confidence bands
  2. Plot uncertainty over time
  3. Verify:
     - Cloud dips show HIGH uncertainty
     - Real harvest shows LOW uncertainty
     - False triggers disappeared
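The two diagnostic plots can be produced with a small helper like the one below. This is a sketch, not project code: the argument names are invented, and it assumes you feed in per-day MC-dropout outputs for one sequence. The dashed lines mark the Step 3 thresholds.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def plot_sequence(dates, prob_mean, prob_std, out_path="sequence_check.png"):
    """Top panel: imminent probability with 95% bands.
    Bottom panel: uncertainty (std) over time."""
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
    ax1.plot(dates, prob_mean, label="imminent probability")
    ax1.fill_between(dates,
                     prob_mean - 1.96 * prob_std,
                     prob_mean + 1.96 * prob_std,
                     alpha=0.3, label="95% CI")
    ax1.axhline(0.5, ls="--", c="gray")   # probability threshold from Step 3
    ax1.legend()
    ax2.plot(dates, prob_std, c="tab:red", label="uncertainty (std)")
    ax2.axhline(0.10, ls="--", c="gray")  # std threshold from Step 3
    ax2.legend()
    fig.savefig(out_path)
    plt.close(fig)
```

Cloud dips should sit inside wide bands in the top panel and spike in the bottom one; a real harvest should show a narrow-band decline with low std throughout.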

File Location

All documentation is now in:
python_app/harvest_detection_experiments/

Main files:

  • CI_ONLY_IMPROVEMENTS.md ← Implementation details + code
  • README_EVALUATION.md ← Navigation guide
  • Other .md files for reference

Timeline

  • Day 1: Read CI_ONLY_IMPROVEMENTS.md, plan implementation
  • Day 2-3: Implement Step 1 (new features)
  • Day 4: Implement Steps 2-3 (Monte Carlo + filtering)
  • Day 5: Retrain + test
  • Day 5+: Evaluate results, iterate

Total: 3-4 focused days of work


Success Criteria

  • Model trained without errors
  • Uncertainty bands visible in plots
  • Cloud dips show high uncertainty
  • Real harvest shows low uncertainty
  • False positive rate < 5%
  • Recall > 85% (still catches most real harvests)