# SmartCane Data Validation Tool

A standalone, client-side tool for validating Excel harvest data and GeoJSON field boundaries before uploading to the SmartCane system.

## Features

### 🚦 Traffic Light System

- **🟢 GREEN**: All checks passed
- **🟡 YELLOW**: Warnings detected (non-critical issues)
- **🔴 RED**: Errors detected (blocking issues)

### ✅ Validation Checks

1. **Excel Column Validation**
   - Checks for all 8 required columns: `field`, `sub_field`, `year`, `season_start`, `season_end`, `age`, `sub_area`, `tonnage_ha`
   - Identifies extra columns that will be ignored
   - Shows missing columns that must be added

2. **GeoJSON Properties Validation**
   - Checks that all features have the required properties: `field`, `sub_field`
   - Identifies redundant properties that will be ignored

3. **Coordinate Reference System (CRS)**
   - Validates correct CRS: **EPSG:32736 (UTM Zone 36S)**
   - This CRS was validated from your Angata farm coordinates
   - Explains why this specific CRS is required

4. **Field Name Matching**
   - Compares field names between Excel and GeoJSON
   - Shows which fields exist in only one dataset
   - Highlights misspellings or missing fields
   - Provides a complete matching summary table

5. **Data Type & Content Validation**
   - Checks column data types:
     - `year`: Must be integer
     - `season_start`, `season_end`: Must be valid dates
     - `age`, `sub_area`, `tonnage_ha`: Must be numeric (decimal)
   - Identifies rows with missing `season_start` dates
   - Flags invalid date formats and numeric values

## File Requirements

### Excel File (harvest.xlsx)

```
| field    | sub_field        | year | season_start | season_end | age | sub_area | tonnage_ha |
|----------|------------------|------|--------------|------------|-----|----------|------------|
| kowawa   | kowawa           | 2023 | 2023-01-15   | 2024-01-14 | 1.5 | 45       | 125.5      |
| Tamu     | Tamu Upper       | 2023 | 2023-02-01   | 2024-01-31 | 1.0 | 30       | 98.0       |
```

**Data Types:**

- `field`, `sub_field`: Text (can be numeric as text)
- `year`: Integer
- `season_start`, `season_end`: Date (YYYY-MM-DD format)
- `age`, `sub_area`, `tonnage_ha`: Decimal/Float

**Extra columns** are allowed but will not be processed.

### GeoJSON File (pivot.geojson)

```json
{
  "type": "FeatureCollection",
  "crs": {
    "type": "name",
    "properties": { "name": "urn:ogc:def:crs:EPSG::32736" }
  },
  "features": [
    {
      "type": "Feature",
      "properties": { "field": "kowawa", "sub_field": "kowawa" },
      "geometry": { "type": "MultiPolygon", "coordinates": [...] }
    }
  ]
}
```

**Required Properties:**

- `field`: Field identifier (must match Excel)
- `sub_field`: Sub-field identifier (must match Excel)

**Optional Properties:**

- `STATUS`, `name`, `age`, etc.
- These are allowed but not required

**CRS:**

- Must be EPSG:32736 (UTM Zone 36S)
- This was determined by analyzing your Angata farm coordinates

## Deployment

### Local Use (Recommended for Security)

1. Download the `data_validation_tool` folder
2. Open `index.html` in a web browser
3. Files are processed entirely client-side; no data is sent to any server

### Netlify Deployment

1. Connect to your GitHub repository
2. Set build command: none (leave the field empty, since there is no build step)
3. Set publish directory: `data_validation_tool`
4. Deploy

Or use the Netlify CLI:

```bash
npm install -g netlify-cli
netlify deploy --dir data_validation_tool
```

### Manual Testing

1. Use the provided sample files:
   - Excel: `laravel_app/storage/app/aura/Data/harvest.xlsx`
   - GeoJSON: `laravel_app/storage/app/aura/Data/pivot.geojson`
2. Open `index.html`
3. Upload both files
4. Review validation results

## Technical Details

### Browser Requirements

- Modern browser with ES6 support (Chrome, Firefox, Safari, Edge)
- Must support the FileReader API and JSON parsing
- Requires the XLSX library for Excel parsing

### Dependencies

- **XLSX.js**: For reading Excel files (loaded via CDN in `index.html`)

### What Happens When You Upload

1. File is read into memory (client-side only)
2. Excel: Parsed using the XLSX library into JSON
3. GeoJSON: Parsed directly as JSON
4. All validation runs in your browser
5. Results are displayed locally
6. **No files are sent to any server**

## Validation Rules

### Traffic Light Logic

**All GREEN (✓ Passed)**

- All required columns/properties present
- Correct CRS
- All field names match
- All data types valid

**YELLOW (⚠️ Warnings)**

- Extra columns detected (will be ignored)
- Extra properties detected (will be ignored)
- Missing dates in some fields
- Data type issues in specific rows

**RED (✗ Failed)**

- Missing required columns/properties
- Wrong CRS
- Field names mismatch between files
- Fundamental data structure issues

### CRS Explanation

From your project's geospatial analysis:

- **Original issue**: Angata farm GeoJSON had coordinates in UTM Zone 37S but marked as WGS84
- **Root cause**: UTM zone mismatch; the farm is actually in UTM Zone 36S
- **Solution**: Reproject to EPSG:32736 (UTM Zone 36S)
- **Why**: This aligns with the actual Angata farm coordinates (longitude ~34.4°E, which falls within Zone 36's 30°E to 36°E range)

## Troubleshooting

### "Failed to read Excel file"

- Ensure the file is in `.xlsx` format
- The file should not be open in Excel while uploading
- Try saving as Excel 2007+ format

### "Failed to parse GeoJSON"

- Ensure the file is valid JSON
- Check for syntax errors (extra commas, missing brackets)
- Use an online JSON validator such as jsonlint.com

### "Wrong CRS detected"

- GeoJSON must explicitly state the CRS as EPSG:32736
- Example: `"name": "urn:ogc:def:crs:EPSG::32736"`
- Reproject in QGIS or R if needed

### "Field names don't match"

- Check for typos and capitalization differences
- Check for spaces at the beginning or end of field names
- Use field names exactly as they appear in both files

## Future Enhancements

- [ ] Download validation report as PDF
- [ ] Batch upload multiple Excel/GeoJSON pairs
- [ ] Auto-detect and suggest field mappings
- [ ] Geometry validity checks (self-intersecting polygons)
- [ ] Area comparison between Excel and GeoJSON
- [ ] Export cleaned/standardized files

## Support

For questions about data validation requirements, contact the SmartCane team.

---

**Tool Version**: 1.0
**Last Updated**: December 2025
**CRS Reference**: EPSG:32736 (UTM Zone 36S)
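
## Appendix: Validation Logic Sketch

The column, property, CRS, and field-name checks and the traffic light rules described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names (`compareNames`, `compareFieldNames`, `validate`) and message strings are hypothetical, the Excel rows are assumed to be already parsed into plain objects (for example by SheetJS `XLSX.utils.sheet_to_json`), and accepting both `EPSG:32736` and `urn:ogc:def:crs:EPSG::32736` as CRS names is an assumption.

```javascript
// Illustrative sketch only: names below are hypothetical, not the tool's code.
const REQUIRED_COLUMNS = [
  "field", "sub_field", "year", "season_start",
  "season_end", "age", "sub_area", "tonnage_ha",
];
const REQUIRED_PROPERTIES = ["field", "sub_field"];

// Return required names that are absent and present names that are not required.
function compareNames(present, required) {
  const presentSet = new Set(present);
  return {
    missing: required.filter((name) => !presentSet.has(name)),
    extra: [...presentSet].filter((name) => !required.includes(name)),
  };
}

// Return the field names that appear in only one of the two datasets.
function compareFieldNames(excelRows, features) {
  const excel = new Set(excelRows.map((row) => String(row.field)));
  const geo = new Set(
    features
      .filter((f) => f.properties && f.properties.field != null)
      .map((f) => String(f.properties.field))
  );
  return {
    onlyInExcel: [...excel].filter((name) => !geo.has(name)),
    onlyInGeoJson: [...geo].filter((name) => !excel.has(name)),
  };
}

// Collect blocking errors and non-blocking warnings, then derive the status.
function validate(excelRows, geojson) {
  const errors = [];
  const warnings = [];
  const features = geojson.features || [];

  // Columns: use the union of keys over all rows, since parsers may omit empty cells.
  const columns = compareNames(excelRows.flatMap((row) => Object.keys(row)), REQUIRED_COLUMNS);
  if (columns.missing.length) errors.push(`Missing columns: ${columns.missing.join(", ")}`);
  if (columns.extra.length) warnings.push(`Extra columns ignored: ${columns.extra.join(", ")}`);

  // Properties: every feature must carry each required property.
  for (const name of REQUIRED_PROPERTIES) {
    const count = features.filter((f) => !(f.properties && name in f.properties)).length;
    if (count) errors.push(`${count} feature(s) missing property "${name}"`);
  }
  const props = compareNames(
    features.flatMap((f) => Object.keys(f.properties || {})),
    REQUIRED_PROPERTIES
  );
  if (props.extra.length) warnings.push(`Extra properties ignored: ${props.extra.join(", ")}`);

  // CRS: the declared name must refer to EPSG:32736 (assumed accepted spellings).
  const crsName = geojson.crs?.properties?.name ?? "";
  if (!/EPSG:{1,2}32736$/.test(crsName)) errors.push(`Wrong CRS: ${crsName || "not declared"}`);

  // Field names must match between the two files.
  const names = compareFieldNames(excelRows, features);
  if (names.onlyInExcel.length) errors.push(`Field names only in Excel: ${names.onlyInExcel.join(", ")}`);
  if (names.onlyInGeoJson.length) errors.push(`Field names only in GeoJSON: ${names.onlyInGeoJson.join(", ")}`);

  // Missing season_start dates are reported as warnings.
  const missingStart = excelRows.filter((row) => row.season_start == null || row.season_start === "").length;
  if (missingStart) warnings.push(`${missingStart} row(s) missing season_start`);

  const status = errors.length ? "RED" : warnings.length ? "YELLOW" : "GREEN";
  return { status, errors, warnings };
}

// Usage with the sample rows from this README and a GeoJSON that lacks "Tamu".
const sampleRows = [
  { field: "kowawa", sub_field: "kowawa", year: 2023, season_start: "2023-01-15",
    season_end: "2024-01-14", age: 1.5, sub_area: 45, tonnage_ha: 125.5 },
  { field: "Tamu", sub_field: "Tamu Upper", year: 2023, season_start: "2023-02-01",
    season_end: "2024-01-31", age: 1.0, sub_area: 30, tonnage_ha: 98.0 },
];
const sampleGeojson = {
  type: "FeatureCollection",
  crs: { type: "name", properties: { name: "urn:ogc:def:crs:EPSG::32736" } },
  features: [
    { type: "Feature", properties: { field: "kowawa", sub_field: "kowawa" }, geometry: null },
  ],
};
const result = validate(sampleRows, sampleGeojson);
console.log(result.status, result.errors); // status is "RED": Tamu has no matching feature
```

Keeping errors and warnings in separate lists makes the traffic light a simple function of the two: any error yields RED, warnings alone yield YELLOW, and an empty result yields GREEN. Data type checks on individual cells are omitted here for brevity.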