# SmartCane Data Validation Tool

A standalone, client-side tool that validates Excel harvest data and GeoJSON field boundaries before they are uploaded to the SmartCane system.

## Features

### 🚦 Traffic Light System

- **🟢 GREEN**: All checks passed
- **🟡 YELLOW**: Warnings detected (non-critical issues)
- **🔴 RED**: Errors detected (blocking issues)

### ✅ Validation Checks

1. **Excel Column Validation**
   - Checks for all 8 required columns: `field`, `sub_field`, `year`, `season_start`, `season_end`, `age`, `sub_area`, `tonnage_ha`
   - Identifies extra columns that will be ignored
   - Shows missing columns that must be added

2. **GeoJSON Properties Validation**
   - Checks that all features have the required properties: `field`, `sub_field`
   - Identifies redundant properties that will be ignored

3. **Coordinate Reference System (CRS)**
   - Validates that the CRS is **EPSG:32736 (UTM Zone 36S)**
   - This CRS was validated from your Angata farm coordinates
   - Explains why this specific CRS is required

4. **Field Name Matching**
   - Compares field names between Excel and GeoJSON
   - Shows which fields exist in only one dataset
   - Highlights misspellings or missing fields
   - Provides a complete matching summary table

5. **Data Type & Content Validation**
   - Checks column data types:
     - `year`: Must be an integer
     - `season_start`, `season_end`: Must be valid dates
     - `age`, `sub_area`, `tonnage_ha`: Must be numeric (decimal)
   - Identifies rows with missing `season_start` dates
   - Flags invalid date formats and non-numeric values

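
As a rough illustration of check 1 above, the column comparison can be sketched as a set difference between the header row and the required list. This is a minimal sketch, not the tool's actual code: `checkColumns` and its return shape are hypothetical names, and it assumes the header row has already been extracted from the workbook as an array of strings.

```javascript
// Required columns, as listed in this README.
const REQUIRED_COLUMNS = [
  "field", "sub_field", "year", "season_start",
  "season_end", "age", "sub_area", "tonnage_ha",
];

// Hypothetical helper: compare a sheet's header row with the required list.
function checkColumns(headers) {
  const present = new Set(headers);
  // Required columns that the sheet does not contain (blocking).
  const missing = REQUIRED_COLUMNS.filter((name) => !present.has(name));
  // Columns the sheet contains that are not required (ignored, warning only).
  const extra = headers.filter((name) => !REQUIRED_COLUMNS.includes(name));
  return { missing, extra };
}

const result = checkColumns(["field", "sub_field", "year", "season_start", "age", "notes"]);
console.log(result.missing); // [ 'season_end', 'sub_area', 'tonnage_ha' ]
console.log(result.extra);   // [ 'notes' ]
```

In this sketch a non-empty `missing` list would map to a blocking error and a non-empty `extra` list to a warning, matching the traffic light levels described above.
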
## File Requirements

### Excel File (harvest.xlsx)

```
| field  | sub_field  | year | season_start | season_end | age | sub_area | tonnage_ha |
|--------|------------|------|--------------|------------|-----|----------|------------|
| kowawa | kowawa     | 2023 | 2023-01-15   | 2024-01-14 | 1.5 | 45       | 125.5      |
| Tamu   | Tamu Upper | 2023 | 2023-02-01   | 2024-01-31 | 1.0 | 30       | 98.0       |
```

**Data Types:**
- `field`, `sub_field`: Text (numeric identifiers are accepted if stored as text)
- `year`: Integer
- `season_start`, `season_end`: Date (YYYY-MM-DD format)
- `age`, `sub_area`, `tonnage_ha`: Decimal/Float

**Extra columns** are allowed but will not be processed.

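
The data type rules above could be checked row by row along the following lines. This is a sketch under stated assumptions, not the tool's implementation: `isValidDate`, `isNumeric`, and `validateRow` are hypothetical helpers, dates are assumed to arrive as `YYYY-MM-DD` strings (a spreadsheet parser may instead deliver serial numbers or `Date` objects), and an empty `season_end` is assumed to be tolerated.

```javascript
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// True only for real calendar dates written as YYYY-MM-DD.
function isValidDate(value) {
  if (typeof value !== "string" || !ISO_DATE.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  // Round-trip comparison rejects dates such as 2023-02-30 that Date may roll over.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

// Accepts finite numbers and non-empty numeric strings.
function isNumeric(value) {
  if (typeof value === "number") return Number.isFinite(value);
  return typeof value === "string" && value.trim() !== "" && Number.isFinite(Number(value));
}

// Returns a list of human-readable issues for one spreadsheet row.
function validateRow(row, rowNumber) {
  const issues = [];
  const isBlank = (v) => v === undefined || v === null || v === "";

  if (!isNumeric(row.year) || !Number.isInteger(Number(row.year))) {
    issues.push(`Row ${rowNumber}: year must be an integer`);
  }
  if (isBlank(row.season_start)) {
    issues.push(`Row ${rowNumber}: season_start is missing`);
  }
  for (const column of ["season_start", "season_end"]) {
    if (!isBlank(row[column]) && !isValidDate(row[column])) {
      issues.push(`Row ${rowNumber}: ${column} is not a valid YYYY-MM-DD date`);
    }
  }
  for (const column of ["age", "sub_area", "tonnage_ha"]) {
    if (!isNumeric(row[column])) {
      issues.push(`Row ${rowNumber}: ${column} must be numeric`);
    }
  }
  return issues;
}

const sample = {
  field: "kowawa", sub_field: "kowawa", year: 2023,
  season_start: "2023-01-15", season_end: "2024-01-14",
  age: 1.5, sub_area: 45, tonnage_ha: 125.5,
};
console.log(validateRow(sample, 1)); // []
// Three issues: year is not an integer, season_start is missing, age is not numeric.
console.log(validateRow({ ...sample, year: "2023.5", season_start: "", age: "n/a" }, 2));
```
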
### GeoJSON File (pivot.geojson)

```json
{
  "type": "FeatureCollection",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:EPSG::32736"
    }
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "field": "kowawa",
        "sub_field": "kowawa"
      },
      "geometry": {
        "type": "MultiPolygon",
        "coordinates": [...]
      }
    }
  ]
}
```

**Required Properties:**
- `field`: Field identifier (must match Excel)
- `sub_field`: Sub-field identifier (must match Excel)

**Optional Properties:**
- `STATUS`, `name`, `age`, etc. - These are allowed but not required

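
One way to picture the required/optional distinction is a pass over every feature that records missing required properties and collects any other property names. The sketch below is illustrative only: `checkFeatureProperties` and its return shape are hypothetical, and it assumes the GeoJSON has already been parsed into an object.

```javascript
const REQUIRED_PROPERTIES = ["field", "sub_field"];

// Hypothetical helper: report missing required properties and extra property names.
function checkFeatureProperties(geojson) {
  const missing = [];      // one entry per (feature, required property) that is absent or empty
  const extra = new Set(); // property names that are present but not required
  (geojson.features || []).forEach((feature, index) => {
    const props = feature.properties || {};
    for (const name of REQUIRED_PROPERTIES) {
      const value = props[name];
      if (value === undefined || value === null || value === "") {
        missing.push({ featureIndex: index, property: name });
      }
    }
    Object.keys(props)
      .filter((name) => !REQUIRED_PROPERTIES.includes(name))
      .forEach((name) => extra.add(name));
  });
  return { missing, extra: [...extra] };
}

const collection = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { field: "kowawa", sub_field: "kowawa", STATUS: "active" }, geometry: null },
    { type: "Feature", properties: { field: "Tamu" }, geometry: null },
  ],
};
const report = checkFeatureProperties(collection);
console.log(report.missing); // [ { featureIndex: 1, property: 'sub_field' } ]
console.log(report.extra);   // [ 'STATUS' ]
```
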
**CRS:**
- Must be EPSG:32736 (UTM Zone 36S)
- This was determined by analyzing your Angata farm coordinates

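
A CRS check of this kind might read the EPSG code out of the `crs` member and compare it with the expected value. The following is a sketch rather than the tool's code: `readEpsgCode` and `checkCrs` are hypothetical names, and accepting the short form `EPSG:32736` alongside the URN form shown above is an assumption.

```javascript
const EXPECTED_EPSG = "32736";

// Extract the numeric EPSG code from the crs member, or null if none is declared.
function readEpsgCode(geojson) {
  const name = geojson?.crs?.properties?.name;
  if (typeof name !== "string") return null;
  // Matches both "urn:ogc:def:crs:EPSG::32736" and "EPSG:32736".
  const match = name.match(/EPSG:{1,2}(\d+)$/i);
  return match ? match[1] : null;
}

// Hypothetical check: the CRS must be declared explicitly and must be EPSG:32736.
function checkCrs(geojson) {
  const code = readEpsgCode(geojson);
  if (code === null) {
    return { ok: false, message: "No EPSG code found in the crs member" };
  }
  if (code !== EXPECTED_EPSG) {
    return { ok: false, message: `Expected EPSG:${EXPECTED_EPSG}, found EPSG:${code}` };
  }
  return { ok: true, message: `CRS is EPSG:${EXPECTED_EPSG}` };
}

const withCrs = (name) => ({ type: "FeatureCollection", crs: { type: "name", properties: { name } }, features: [] });
console.log(checkCrs(withCrs("urn:ogc:def:crs:EPSG::32736")).ok);      // true
console.log(checkCrs(withCrs("urn:ogc:def:crs:EPSG::32737")).message); // Expected EPSG:32736, found EPSG:32737
console.log(checkCrs({ type: "FeatureCollection", features: [] }).ok); // false
```
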
## Deployment

### Local Use (Recommended for Security)
1. Download the `data_validation_tool` folder
2. Open `index.html` in a web browser
3. Files are processed entirely client-side; no data is sent to any server

### Netlify Deployment
1. Connect Netlify to your GitHub repository
2. Leave the build command empty (no build step is needed)
3. Set the publish directory to `data_validation_tool`
4. Deploy

Or use the Netlify CLI:
```bash
npm install -g netlify-cli
netlify deploy --dir data_validation_tool
```

### Manual Testing
1. Use the provided sample files:
   - Excel: `laravel_app/storage/app/aura/Data/harvest.xlsx`
   - GeoJSON: `laravel_app/storage/app/aura/Data/pivot.geojson`
2. Open `index.html`
3. Upload both files
4. Review validation results

## Technical Details

### Browser Requirements
- Modern browser with ES6 support (Chrome, Firefox, Safari, Edge)
- Must support the FileReader API and JSON parsing
- Requires the XLSX library for Excel parsing

### Dependencies
- **XLSX.js**: For reading Excel files (loaded via CDN in `index.html`)

### What Happens When You Upload
1. Each file is read into memory (client-side only)
2. Excel: Parsed into JSON using the XLSX library
3. GeoJSON: Parsed directly as JSON
4. All validation runs in your browser
5. Results are displayed locally
6. **No files are sent to any server**

## Validation Rules

### Traffic Light Logic

**All GREEN (✓ Passed)**
- All required columns/properties present
- Correct CRS
- All field names match
- All data types valid

**YELLOW (⚠️ Warnings)**
- Extra columns detected (will be ignored)
- Extra properties detected (will be ignored)
- Missing `season_start` dates in some rows
- Data type issues in specific rows

**RED (✗ Failed)**
- Missing required columns/properties
- Wrong CRS
- Field names mismatch between files
- Fundamental data structure issues

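
The logic above amounts to "any error wins, otherwise any warning, otherwise green". A minimal sketch of that aggregation follows; the `overallStatus` function and the `{ name, errors, warnings }` shape of each check result are assumptions made for illustration, not the tool's actual data model.

```javascript
// Hypothetical aggregation: each check reports its own errors and warnings.
function overallStatus(checks) {
  if (checks.some((check) => check.errors.length > 0)) return "RED";      // blocking issues
  if (checks.some((check) => check.warnings.length > 0)) return "YELLOW"; // non-critical issues
  return "GREEN";                                                          // everything passed
}

const checks = [
  { name: "Excel columns", errors: [], warnings: ["Extra column: notes"] },
  { name: "CRS", errors: [], warnings: [] },
];
console.log(overallStatus(checks)); // YELLOW

// A single error anywhere turns the overall result red, regardless of warnings.
checks.push({ name: "Field names", errors: ["'Tamu' is missing from GeoJSON"], warnings: [] });
console.log(overallStatus(checks)); // RED
```
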
### CRS Explanation

From your project's geospatial analysis:
- **Original issue**: The Angata farm GeoJSON had coordinates in UTM Zone 37S but was marked as WGS84
- **Root cause**: UTM zone mismatch; the farm is actually in UTM Zone 36S
- **Solution**: Reproject to EPSG:32736 (UTM Zone 36S)
- **Why**: This matches the actual location of the Angata farm (longitude ~34.4°E, which falls in zone 36, spanning 30°E to 36°E)

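
The zone number can be checked with the standard UTM formula, `zone = floor((longitude + 180) / 6) + 1`, with WGS 84 UTM codes numbered 326xx in the northern hemisphere and 327xx in the southern. The sketch below is illustrative: `utmEpsgFor` is a hypothetical helper, the latitude of -1.0 is a placeholder for a southern-hemisphere point (only the longitude comes from this README), and the special zone exceptions around Norway and Svalbard are ignored.

```javascript
// Hypothetical helper: derive the WGS 84 / UTM zone and EPSG code for a lon/lat point.
function utmEpsgFor(longitude, latitude) {
  // Zones are 6 degrees wide, starting at 180°W; clamp so that 180°E maps to zone 60.
  const zone = Math.min(Math.floor((longitude + 180) / 6) + 1, 60);
  const isNorth = latitude >= 0;
  return {
    zone,
    hemisphere: isNorth ? "N" : "S",
    epsg: (isNorth ? 32600 : 32700) + zone,
  };
}

// Longitude from the README (~34.4°E); latitude is an illustrative placeholder.
console.log(utmEpsgFor(34.4, -1.0)); // { zone: 36, hemisphere: 'S', epsg: 32736 }

// A point a few degrees further east would land in zone 37 instead.
console.log(utmEpsgFor(37.0, -1.0).epsg); // 32737
```
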
## Troubleshooting

### "Failed to read Excel file"
- Ensure the file is in `.xlsx` format
- Close the file in Excel before uploading
- Try re-saving it in Excel 2007+ format

### "Failed to parse GeoJSON"
- Ensure the file is valid JSON
- Check for syntax errors (extra commas, missing brackets)
- Use an online JSON validator such as jsonlint.com

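
For context, a parse failure of this kind typically comes from `JSON.parse` throwing on malformed input. The sketch below shows one way such an error could be caught and reported; `parseGeoJson` is a hypothetical helper, and the additional structural check for a `FeatureCollection` is an assumption about what the tool expects.

```javascript
// Hypothetical helper: parse GeoJSON text and report problems instead of throwing.
function parseGeoJson(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (error) {
    // Syntax problems such as trailing commas or missing brackets end up here.
    return { ok: false, message: `Failed to parse GeoJSON: ${error.message}` };
  }
  const isCollection =
    data !== null &&
    typeof data === "object" &&
    data.type === "FeatureCollection" &&
    Array.isArray(data.features);
  if (!isCollection) {
    return { ok: false, message: "Parsed JSON is not a FeatureCollection with a features array" };
  }
  return { ok: true, data };
}

console.log(parseGeoJson('{"type": "FeatureCollection", "features": []}').ok);  // true
console.log(parseGeoJson('{"type": "FeatureCollection", "features": [],}').ok); // false (trailing comma)
console.log(parseGeoJson('{"type": "Feature"}').message);
// Parsed JSON is not a FeatureCollection with a features array
```
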
### "Wrong CRS detected"
- The GeoJSON must explicitly state its CRS as EPSG:32736
- Example: `"name": "urn:ogc:def:crs:EPSG::32736"`
- Reproject in QGIS or R if needed

### "Field names don't match"
- Check for typos and capitalization differences
- Remove spaces at the beginning or end of field names
- Use exactly the same field names in both files

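
To see why capitalization and stray spaces matter, consider a comparison that matches names exactly and then looks for pairs that differ only by case or surrounding whitespace. This is a sketch under assumptions, not the tool's matching code: `compareFieldNames` and its return shape are hypothetical, and only the `field` property is compared.

```javascript
// Hypothetical helper: exact comparison of field names, plus hints for near matches.
function compareFieldNames(excelRows, geojson) {
  const excelNames = new Set(excelRows.map((row) => String(row.field)));
  const geoNames = new Set(geojson.features.map((feature) => String(feature.properties.field)));

  const onlyInExcel = [...excelNames].filter((name) => !geoNames.has(name));
  const onlyInGeoJson = [...geoNames].filter((name) => !excelNames.has(name));

  // Names that become equal after trimming and lowercasing are likely typos.
  const normalize = (name) => name.trim().toLowerCase();
  const nearMatches = onlyInExcel.flatMap((excelName) =>
    onlyInGeoJson
      .filter((geoName) => normalize(geoName) === normalize(excelName))
      .map((geoName) => ({ excel: excelName, geojson: geoName }))
  );
  return { onlyInExcel, onlyInGeoJson, nearMatches };
}

const excelRows = [{ field: "kowawa" }, { field: "Tamu " }]; // note the trailing space
const geojson = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { field: "kowawa" }, geometry: null },
    { type: "Feature", properties: { field: "tamu" }, geometry: null },
  ],
};
const report = compareFieldNames(excelRows, geojson);
console.log(report.onlyInExcel);   // [ 'Tamu ' ]
console.log(report.onlyInGeoJson); // [ 'tamu' ]
console.log(report.nearMatches);   // [ { excel: 'Tamu ', geojson: 'tamu' } ]
```
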
## Future Enhancements

- [ ] Download validation report as PDF
- [ ] Batch upload multiple Excel/GeoJSON pairs
- [ ] Auto-detect and suggest field mappings
- [ ] Geometry validity checks (self-intersecting polygons)
- [ ] Area comparison between Excel and GeoJSON
- [ ] Export cleaned/standardized files

## Support

For questions about data validation requirements, contact the SmartCane team.

---

**Tool Version**: 1.0
**Last Updated**: December 2025
**CRS Reference**: EPSG:32736 (UTM Zone 36S)