SmartCane Data Validation Tool
A standalone, client-side data validation tool for validating Excel harvest data and GeoJSON field boundaries before uploading to the SmartCane system.
Features
🚦 Traffic Light System
- 🟢 GREEN: All checks passed
- 🟡 YELLOW: Warnings detected (non-critical issues)
- 🔴 RED: Errors detected (blocking issues)
✅ Validation Checks
Excel Column Validation
- Checks for all 8 required columns: field, sub_field, year, season_start, season_end, age, sub_area, tonnage_ha
- Identifies extra columns that will be ignored
- Shows missing columns that must be added
GeoJSON Properties Validation
- Checks all features have required properties: field, sub_field
- Identifies redundant properties that will be ignored
Coordinate Reference System (CRS)
- Validates correct CRS: EPSG:32736 (UTM Zone 36S)
- This CRS was validated from your Angata farm coordinates
- Explains why this specific CRS is required
Field Name Matching
- Compares field names between Excel and GeoJSON
- Shows which fields exist in only one dataset
- Highlights misspellings or missing fields
- Provides complete matching summary table
Data Type & Content Validation
- Checks column data types:
  - year: Must be integer
  - season_start, season_end: Must be valid dates
  - age, sub_area, tonnage_ha: Must be numeric (decimal)
- Identifies rows with missing season_start dates
- Flags invalid date formats and numeric values
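The data type rules listed under Data Type & Content Validation could be checked along the following lines. This is a minimal sketch, not the tool's actual code: the function names (validateRowTypes, isValidIsoDate) are hypothetical, rows are assumed to be plain objects keyed by column name with dates already converted to YYYY-MM-DD strings, and blank values in either date column are reported as missing.

```javascript
// Column groups taken from the rules above.
const INTEGER_COLUMNS = ["year"];
const DATE_COLUMNS = ["season_start", "season_end"];
const NUMERIC_COLUMNS = ["age", "sub_area", "tonnage_ha"];

// Treat undefined, null, and whitespace-only values as blank.
const isBlank = (v) => v === undefined || v === null || String(v).trim() === "";

// Accept only real calendar dates written as YYYY-MM-DD.
function isValidIsoDate(value) {
  if (typeof value !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(value + "T00:00:00Z");
  // Reject impossible dates (e.g. 2023-02-30), whether the engine
  // returns an invalid Date or silently rolls over to March.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

// Hypothetical validator: returns one issue per offending cell.
function validateRowTypes(rows) {
  const issues = [];
  rows.forEach((row, index) => {
    const rowNumber = index + 2; // +1 for the header row, +1 for 1-based numbering
    const report = (column, message) => issues.push({ row: rowNumber, column, message });
    for (const col of INTEGER_COLUMNS) {
      if (isBlank(row[col]) || !Number.isInteger(Number(row[col]))) report(col, "must be an integer");
    }
    for (const col of DATE_COLUMNS) {
      if (isBlank(row[col])) report(col, "missing date");
      else if (!isValidIsoDate(row[col])) report(col, "invalid date, expected YYYY-MM-DD");
    }
    for (const col of NUMERIC_COLUMNS) {
      if (isBlank(row[col]) || !Number.isFinite(Number(row[col]))) report(col, "must be numeric");
    }
  });
  return issues;
}
```

Each issue carries the spreadsheet row number, so results can be shown next to the offending row.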
File Requirements
Excel File (harvest.xlsx)
| field | sub_field | year | season_start | season_end | age | sub_area | tonnage_ha |
|----------|------------------|------|--------------|------------|-----|----------|-----------|
| kowawa | kowawa | 2023 | 2023-01-15 | 2024-01-14 | 1.5 | 45 | 125.5 |
| Tamu | Tamu Upper | 2023 | 2023-02-01 | 2024-01-31 | 1.0 | 30 | 98.0 |
Data Types:
- field, sub_field: Text (can be numeric as text)
- year: Integer
- season_start, season_end: Date (YYYY-MM-DD format)
- age, sub_area, tonnage_ha: Decimal/Float
Extra columns are allowed but will not be processed.
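As an illustration of the column rules above, the header row of the sheet can be compared against the required list. This is a sketch under assumptions: checkColumns is a hypothetical name, and the header row is assumed to have already been extracted as an array of strings.

```javascript
// The 8 required columns named in this README.
const REQUIRED_COLUMNS = [
  "field", "sub_field", "year", "season_start",
  "season_end", "age", "sub_area", "tonnage_ha",
];

// Hypothetical helper: reports required columns that are absent
// and extra columns that would be ignored.
function checkColumns(headerRow) {
  const present = headerRow.map((name) => String(name).trim());
  return {
    missing: REQUIRED_COLUMNS.filter((col) => !present.includes(col)),
    extra: present.filter((col) => !REQUIRED_COLUMNS.includes(col)),
  };
}

const result = checkColumns([
  "field", "sub_field", "year", "season_start",
  "age", "sub_area", "tonnage_ha", "notes",
]);
console.log(result.missing); // [ 'season_end' ]
console.log(result.extra);   // [ 'notes' ]
```

A non-empty missing list corresponds to a blocking error, while a non-empty extra list is only a warning.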
GeoJSON File (pivot.geojson)
{
  "type": "FeatureCollection",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:EPSG::32736"
    }
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "field": "kowawa",
        "sub_field": "kowawa"
      },
      "geometry": {
        "type": "MultiPolygon",
        "coordinates": [...]
      }
    }
  ]
}
Required Properties:
- field: Field identifier (must match Excel)
- sub_field: Sub-field identifier (must match Excel)
Optional Properties:
- STATUS, name, age, etc. These are allowed but not required.
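The property rules above could be checked per feature roughly as follows. This is a sketch, not the tool's implementation: checkFeatureProperties is a hypothetical name, and treating an empty or whitespace-only value as missing is an assumption.

```javascript
// Required feature properties named in this README.
const REQUIRED_PROPERTIES = ["field", "sub_field"];

// Hypothetical helper: lists features lacking a required property
// and collects the names of extra properties across all features.
function checkFeatureProperties(geojson) {
  const missing = [];
  const extra = new Set();
  (geojson.features || []).forEach((feature, featureIndex) => {
    const props = feature.properties || {};
    for (const name of REQUIRED_PROPERTIES) {
      const value = props[name];
      // Assumption: blank values count as missing.
      if (value === undefined || value === null || String(value).trim() === "") {
        missing.push({ featureIndex, property: name });
      }
    }
    for (const name of Object.keys(props)) {
      if (!REQUIRED_PROPERTIES.includes(name)) extra.add(name);
    }
  });
  return { missing, extra: [...extra] };
}
```

Reporting the feature index alongside the property name makes it easier to locate the offending feature in a GIS editor.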
CRS:
- Must be EPSG:32736 (UTM Zone 36S)
- This was determined from analyzing your Angata farm coordinates
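A CRS check of this kind can read the crs.properties.name member shown in the example file. The sketch below is illustrative only: checkCrs is a hypothetical name, and accepting the short form EPSG:32736 in addition to the URN form is an assumption rather than documented tool behaviour.

```javascript
const EXPECTED_EPSG = "32736";

// Hypothetical helper: extracts the EPSG code from the declared CRS name
// and compares it with the expected code.
function checkCrs(geojson) {
  const name =
    geojson && geojson.crs && geojson.crs.properties && geojson.crs.properties.name;
  if (typeof name !== "string") return { ok: false, found: null };
  // Matches "urn:ogc:def:crs:EPSG::32736" and "EPSG:32736".
  const match = name.match(/EPSG:{1,2}(\d+)$/i);
  const code = match ? match[1] : null;
  return { ok: code === EXPECTED_EPSG, found: code };
}

console.log(checkCrs({ crs: { type: "name", properties: { name: "urn:ogc:def:crs:EPSG::32736" } } }));
// { ok: true, found: '32736' }
```

A file with no crs member yields found: null, which lets the caller distinguish "no CRS declared" from "wrong CRS declared".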
Deployment
Local Use (Recommended for Security)
- Download the data_validation_tool folder
- Open index.html in a web browser
- Files are processed entirely client-side; no data is sent to servers
Netlify Deployment
- Connect to your GitHub repository
- Set build command: None
- Set publish directory: data_validation_tool
- Deploy
Or use Netlify CLI:
npm install -g netlify-cli
netlify deploy --dir data_validation_tool
Manual Testing
- Use the provided sample files:
  - Excel: laravel_app/storage/app/aura/Data/harvest.xlsx
  - GeoJSON: laravel_app/storage/app/aura/Data/pivot.geojson
- Open index.html
- Review validation results
Technical Details
Browser Requirements
- Modern browser with ES6 support (Chrome, Firefox, Safari, Edge)
- Must support FileReader API and JSON parsing
- Requires XLSX library for Excel parsing
Dependencies
- XLSX.js: For reading Excel files (loaded via CDN in index.html)
What Happens When You Upload
- File is read into memory (client-side only)
- Excel: Parsed using XLSX library into JSON
- GeoJSON: Parsed directly as JSON
- All validation runs in your browser
- Results displayed locally
- No files are sent to any server
Validation Rules
Traffic Light Logic
All GREEN (✓ Passed)
- All required columns/properties present
- Correct CRS
- All field names match
- All data types valid
YELLOW (⚠️ Warnings)
- Extra columns detected (will be ignored)
- Extra properties detected (will be ignored)
- Missing dates in some fields
- Data type issues in specific rows
RED (✗ Failed)
- Missing required columns/properties
- Wrong CRS
- Field names mismatch between files
- Fundamental data structure issues
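The traffic light logic above reduces to a simple precedence rule: any error gives RED, otherwise any warning gives YELLOW, otherwise GREEN. A minimal sketch, assuming each check returns an object with errors and warnings arrays (a hypothetical shape, as is the name overallStatus):

```javascript
// Hypothetical aggregator: errors outrank warnings, warnings outrank a clean pass.
function overallStatus(checks) {
  if (checks.some((check) => check.errors.length > 0)) return "RED";
  if (checks.some((check) => check.warnings.length > 0)) return "YELLOW";
  return "GREEN";
}

const status = overallStatus([
  { name: "columns", errors: [], warnings: ["Extra column: notes"] },
  { name: "crs", errors: [], warnings: [] },
]);
console.log(status); // YELLOW
```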
CRS Explanation
From your project's geospatial analysis:
- Original issue: Angata farm GeoJSON had coordinates in UTM Zone 37S but marked as WGS84
- Root cause: UTM Zone mismatch - farm is actually in UTM Zone 36S
- Solution: Reproject to EPSG:32736 (UTM Zone 36S)
- Why: This aligns with actual Angata farm coordinates (longitude ~34.4°E)
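The zone can be confirmed with the standard UTM arithmetic: zones are 6° wide and numbered from 180°W, so a longitude of about 34.4°E falls in zone 36, and WGS 84 / UTM codes in the southern hemisphere are 32700 plus the zone number. The sketch below uses hypothetical function names and an assumed latitude of -1.0 purely to select the southern hemisphere; it ignores the special zone exceptions around Norway and Svalbard.

```javascript
// Standard UTM zone number for a longitude in degrees (-180 to 180).
function utmZoneFromLongitude(lonDeg) {
  // Clamp so that exactly 180 degrees maps to zone 60 rather than 61.
  return Math.min(Math.floor((lonDeg + 180) / 6) + 1, 60);
}

// EPSG code for WGS 84 / UTM: 326xx in the north, 327xx in the south.
function wgs84UtmEpsg(lonDeg, latDeg) {
  const zone = utmZoneFromLongitude(lonDeg);
  return (latDeg < 0 ? 32700 : 32600) + zone;
}

console.log(utmZoneFromLongitude(34.4)); // 36
console.log(wgs84UtmEpsg(34.4, -1.0));   // 32736 (latitude is an assumed value)
```

Zone 37 only begins at 36°E, which is why coordinates at roughly 34.4°E cannot belong to UTM Zone 37S.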
Troubleshooting
"Failed to read Excel file"
- Ensure file is .xlsx format
- File should not be open in Excel while uploading
- Try saving as Excel 2007+ format
"Failed to parse GeoJSON"
- Ensure file is valid JSON
- Check for syntax errors (extra commas, missing brackets)
- Use an online JSON validator such as jsonlint.com
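To see how a syntax error leads to this message, the following sketch wraps JSON.parse and reports the parser's own error text. The function name parseGeoJsonText and the extra FeatureCollection check are assumptions made for illustration; the tool's actual handling may differ.

```javascript
// Hypothetical helper: parses GeoJSON text and returns either the data
// or a readable error instead of throwing.
function parseGeoJsonText(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (err) {
    // err.message usually points at the position of the syntax error.
    return { ok: false, error: "Failed to parse GeoJSON: " + err.message };
  }
  // Assumption: the tool expects a FeatureCollection, as in the example file.
  if (!data || data.type !== "FeatureCollection" || !Array.isArray(data.features)) {
    return { ok: false, error: "Failed to parse GeoJSON: expected a FeatureCollection with a features array" };
  }
  return { ok: true, data };
}

// A trailing comma is valid in JavaScript objects but not in JSON.
console.log(parseGeoJsonText('{"type": "FeatureCollection", "features": [],}').ok); // false
```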
"Wrong CRS detected"
- GeoJSON must explicitly state CRS as EPSG:32736
- Example: "name": "urn:ogc:def:crs:EPSG::32736"
- Reproject in QGIS or R if needed
"Field names don't match"
- Check for typos and capitalization differences
- Remove spaces at the beginning or end of field names
- Make sure each field name is spelled identically in both files
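A comparison that surfaces these problems might look like the sketch below. It matches names exactly, then flags pairs that differ only by capitalization or surrounding spaces as likely typos. The function name compareFieldNames and the returned shape are hypothetical.

```javascript
// Hypothetical helper: exact matching plus a hint for near-misses.
function compareFieldNames(excelNames, geojsonNames) {
  const excel = new Set(excelNames.map(String));
  const geo = new Set(geojsonNames.map(String));
  const matched = [...excel].filter((name) => geo.has(name));
  const onlyInExcel = [...excel].filter((name) => !geo.has(name));
  const onlyInGeoJson = [...geo].filter((name) => !excel.has(name));

  // Names that would match after trimming and lowercasing are probably
  // the same field entered inconsistently.
  const normalize = (name) => name.trim().toLowerCase();
  const nearMisses = [];
  for (const a of onlyInExcel) {
    for (const b of onlyInGeoJson) {
      if (normalize(a) === normalize(b)) nearMisses.push({ excel: a, geojson: b });
    }
  }
  return { matched, onlyInExcel, onlyInGeoJson, nearMisses };
}

const report = compareFieldNames(["kowawa", "Tamu "], ["kowawa", "tamu"]);
console.log(report.nearMisses); // [ { excel: 'Tamu ', geojson: 'tamu' } ]
```

Near-misses are reported rather than silently accepted, so the source files get corrected instead of the mismatch being hidden.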
Future Enhancements
- Download validation report as PDF
- Batch upload multiple Excel/GeoJSON pairs
- Auto-detect and suggest field mappings
- Geometry validity checks (self-intersecting polygons)
- Area comparison between Excel and GeoJSON
- Export cleaned/standardized files
Support
For questions about data validation requirements, contact the SmartCane team.
Tool Version: 1.0
Last Updated: December 2025
CRS Reference: EPSG:32736 (UTM Zone 36S)