SmartCane/python_app/harvest_detection_experiments/experiment_framework/README.md

Harvest Detection Experiment Framework

Systematic experimentation framework for harvest detection using LSTM/GRU models with comprehensive feature engineering and automated result tracking.

Overview

This framework enables systematic, reproducible experiments for optimizing harvest detection models. It separates concerns:

  • Configuration (YAML files) - Define experiments without touching code
  • Execution (Python scripts) - Automated training, evaluation, comparison
  • Results (Organized folders) - All metrics, models, and plots saved automatically

Quick Start

1. Run a Single Experiment

cd experiment_framework
python run_experiment.py --exp exp_001

This will:

  • Load data from lstm_complete_data.csv
  • Extract features defined in config/experiments.yaml
  • Train with 5-fold cross-validation
  • Evaluate on held-out test set
  • Save all results to results/001_trends_only/

2. Run Multiple Experiments (Batch)

python run_experiment.py --exp exp_001,exp_002,exp_003

Runs experiments 001, 002, and 003 sequentially.
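
The actual argument handling in run_experiment.py may differ; as a minimal sketch of how a comma-separated `--exp` value could be split into individual experiment IDs (function name hypothetical):

```python
import argparse

def parse_exp_ids(raw):
    """Split a comma-separated --exp value into individual experiment IDs."""
    return [exp_id.strip() for exp_id in raw.split(",") if exp_id.strip()]

parser = argparse.ArgumentParser(description="Run one or more experiments")
parser.add_argument("--exp", required=True,
                    help="Experiment ID(s), e.g. exp_001 or exp_001,exp_002")

# Passing args explicitly here so the sketch runs standalone
args = parser.parse_args(["--exp", "exp_001,exp_002,exp_003"])

for exp_id in parse_exp_ids(args.exp):
    print(exp_id)  # each experiment would be run here, in order
```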

3. Compare All Results

python analyze_results.py --experiments all --rank-by imminent_auc

This generates:

  • results/comparison_table.csv - Sortable metrics table
  • results/comparison_imminent_auc.png - Bar chart of AUC scores
  • results/comparison_all_metrics.png - Multi-metric comparison

4. Find Top Performers

python analyze_results.py --rank-by imminent_auc --top 3

Shows the top 3 experiments ranked by imminent AUC.

Project Structure

experiment_framework/
├── config/
│   └── experiments.yaml          # All experiment configurations
├── src/
│   ├── data_loader.py            # Data loading & preprocessing
│   ├── feature_engineering.py    # 25-feature extraction system
│   ├── models.py                 # LSTM/GRU architectures
│   ├── training.py               # K-fold CV training engine
│   └── evaluation.py             # Metrics & visualization
├── run_experiment.py             # Main execution script
├── analyze_results.py            # Comparison dashboard
└── results/                      # Auto-generated results
    ├── 001_trends_only/
    │   ├── config.json           # Exact config used
    │   ├── model.pt              # Trained weights
    │   ├── metrics.json          # All metrics
    │   ├── training_curves.png   # Loss curves
    │   ├── roc_curves.png        # ROC plots
    │   └── confusion_matrices.png
    └── comparison/
        ├── comparison_table.csv
        └── comparison_*.png

Phase 1 Experiments (Feature Selection)

Goal: Identify which feature types improve harvest detection most.

| Exp ID | Features                  | Count | Purpose                 |
|--------|---------------------------|-------|-------------------------|
| 001    | CI, 7d_MA, 14d_MA, 21d_MA | 4     | Baseline (trends only)  |
| 002    | 001 + velocities          | 7     | Add rate of change      |
| 003    | 002 + accelerations       | 10    | Add momentum            |
| 004    | 001 + mins                | 7     | Add structural lows     |
| 005    | 001 + maxs                | 7     | Add structural highs    |
| 006    | 001 + ranges              | 7     | Add volatility          |
| 007    | 001 + stds                | 7     | Add noise indicators    |
| 008    | 001 + CVs                 | 7     | Add relative stability  |
| 009    | Trends + vel + mins + std | 13    | Combined best features  |
| 010    | All 25 features           | 25    | Full feature set        |

All experiments use:

  • Model: LSTM, hidden_size=128, num_layers=1, dropout=0.5
  • Window: 28-1 days before harvest
  • Training: 5-fold CV, 150 epochs, early stopping (patience=20)
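
The early-stopping behaviour above (patience=20) can be sketched as follows; the actual loop in src/training.py may differ, and `val_losses` here stands in for one validation loss per epoch:

```python
def train_with_early_stopping(val_losses, patience=20, max_epochs=150):
    """Return the epoch at which training stops.

    Stops when validation loss has not improved for `patience`
    consecutive epochs, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch  # early stop: no improvement for `patience` epochs
    return min(len(val_losses), max_epochs)  # ran to completion
```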

Feature Engineering System

25 Total Features (All Causal/Operational)

Tier 1: State (4)

  • CI_raw, 7d_MA, 14d_MA, 21d_MA

Tier 2: Velocity (3)

  • 7d_velocity, 14d_velocity, 21d_velocity

Tier 3: Acceleration (3)

  • 7d_acceleration, 14d_acceleration, 21d_acceleration

Tier 4: Structural (9)

  • Min: 7d_min, 14d_min, 21d_min
  • Max: 7d_max, 14d_max, 21d_max
  • Range: 7d_range, 14d_range, 21d_range

Tier 5: Stability (6)

  • Std: 7d_std, 14d_std, 21d_std
  • CV: 7d_CV, 14d_CV, 21d_CV

All features use backward-looking rolling windows (causal) for operational deployment.
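
To illustrate what "backward-looking" means here, a dependency-free sketch of a causal rolling mean mirroring the `min_periods=1` semantics of the pandas calls in src/feature_engineering.py (helper name hypothetical):

```python
def causal_rolling_mean(values, window):
    """Backward-looking rolling mean: each output uses only the current
    value and up to `window - 1` earlier ones (min_periods=1 semantics),
    so no future information leaks into the feature."""
    out = []
    for i in range(len(values)):
        past = values[max(0, i - window + 1):i + 1]  # current + prior days only
        out.append(sum(past) / len(past))
    return out

# Example: a 3-day moving average over a short CI series
ci = [10, 12, 11, 15]
print(causal_rolling_mean(ci, window=3))
```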

Output Metrics

Cross-Validation (K-Fold)

  • Imminent AUC (mean ± std across folds)
  • Detected AUC (mean ± std across folds)

Test Set (Held-Out 15%)

  • Imminent: AUC, F1, Precision, Recall
  • Detected: AUC, F1, Precision, Recall
  • Total predictions (timesteps)
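
The threshold metrics above follow the standard definitions; a minimal sketch for reference (the framework itself presumably computes these via scikit-learn):

```python
def binary_prf(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = harvest event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```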

Visualizations Per Experiment

  • Training/validation loss curves (all folds)
  • ROC curves (imminent + detected)
  • Confusion matrices (imminent + detected)

Customization

Add New Experiment

Edit config/experiments.yaml:

exp_011:
  name: "011_my_custom_experiment"
  description: "Testing something new"
  features:
    - CI_raw
    - 7d_MA
    - 7d_velocity
  model:
    type: LSTM  # or GRU
    hidden_size: 256
    num_layers: 2
    dropout: 0.6
  training:
    imminent_days_before: 30
    imminent_days_before_end: 1
    k_folds: 5
    num_epochs: 200
    # ... other params

Then run:

python run_experiment.py --exp exp_011

Add New Feature

Edit src/feature_engineering.py, add to compute_feature():

elif feature_name == '30d_MA':
    return ci_series.rolling(window=30, min_periods=1, center=False).mean().values

Then use in experiment config.

Workflow Recommendations

1. Feature Selection (Phase 1)

# Run all Phase 1 experiments
python run_experiment.py --exp exp_001,exp_002,exp_003,exp_004,exp_005,exp_006,exp_007,exp_008,exp_009,exp_010

# Compare results
python analyze_results.py --experiments all --rank-by imminent_auc

Expected Time: ~30-60 minutes per experiment on GPU (5-fold CV × 150 epochs)

2. Identify Best Features

# Show top 3
python analyze_results.py --rank-by imminent_auc --top 3

Decision: Choose feature set with highest test AUC that generalizes well (CV AUC ≈ test AUC).
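
The "generalizes well" check can be automated; a sketch that ranks experiments by test AUC and flags a large CV-vs-test gap (metric key names and the numbers below are illustrative assumptions, not the framework's actual schema):

```python
def rank_experiments(results, gap_threshold=0.05):
    """Sort experiments by test imminent AUC (descending) and flag those
    whose CV AUC exceeds test AUC by more than `gap_threshold` - a sign
    of overfitting. `results` maps experiment name -> metrics dict."""
    ranked = sorted(results.items(),
                    key=lambda kv: kv[1]["test_imminent_auc"], reverse=True)
    return [(name, m["test_imminent_auc"],
             m["cv_imminent_auc"] - m["test_imminent_auc"] > gap_threshold)
            for name, m in ranked]

# Illustrative placeholder values, not real results
results = {
    "001_trends_only": {"cv_imminent_auc": 0.82, "test_imminent_auc": 0.80},
    "010_all_features": {"cv_imminent_auc": 0.91, "test_imminent_auc": 0.78},
}
for name, auc, overfit in rank_experiments(results):
    print(name, auc, "possible overfit" if overfit else "ok")
```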

3. Model Architecture Optimization (Phase 2)

Once the best features are identified, test different architectures:

  • Vary hidden_size: 64, 128, 256
  • Vary num_layers: 1, 2
  • Try GRU vs LSTM

4. Hyperparameter Tuning (Phase 3)

Fine-tune best model:

  • Dropout: 0.3, 0.5, 0.7
  • Learning rate: 0.0005, 0.001, 0.002
  • Window length: 21-1, 28-1, 35-1
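
The Phase 3 sweep above amounts to a small grid; a sketch generating one candidate config per combination (window lengths expressed via the `imminent_days_before` fields from the YAML example earlier; the `learning_rate` key is an assumption):

```python
from itertools import product

dropouts = [0.3, 0.5, 0.7]
learning_rates = [0.0005, 0.001, 0.002]
# (imminent_days_before, imminent_days_before_end), i.e. 21-1, 28-1, 35-1
windows = [(21, 1), (28, 1), (35, 1)]

grid = [
    {"dropout": d, "learning_rate": lr,
     "imminent_days_before": w[0], "imminent_days_before_end": w[1]}
    for d, lr, w in product(dropouts, learning_rates, windows)
]
print(len(grid))  # 3 x 3 x 3 = 27 candidate configs
```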

Tips

  • Always compare CV AUC vs test AUC - a large gap signals overfitting
  • Start with the baseline (exp_001) - it establishes minimum performance
  • Change one thing at a time - isolate the impact of features vs model vs hyperparameters
  • Check confusion matrices - understand failure modes (false positives vs false negatives)
  • Monitor training curves - early stopping firing means the model converged; long plateaus suggest more capacity is needed

Troubleshooting

CUDA out of memory:

python run_experiment.py --exp exp_001 --device cpu

Experiment not found: Check the exact experiment ID in config/experiments.yaml (IDs are case-sensitive)

Import errors: Ensure you run the scripts from the experiment_framework/ directory

Next Steps

After Phase 1 completes:

  1. Identify best feature set
  2. Configure Phase 2 experiments (model architecture) in experiments.yaml
  3. Run Phase 2, compare results
  4. Select final model for production

Requirements

  • Python 3.8+
  • PyTorch 1.10+
  • scikit-learn
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • pyyaml

Install:

pip install torch scikit-learn pandas numpy matplotlib seaborn pyyaml