SmartCane Data Validation Tool

A standalone, client-side tool for validating Excel harvest data and GeoJSON field boundaries before they are uploaded to the SmartCane system.

Features

🚦 Traffic Light System

  • 🟢 GREEN: All checks passed
  • 🟡 YELLOW: Warnings detected (non-critical issues)
  • 🔴 RED: Errors detected (blocking issues)

Validation Checks

  1. Excel Column Validation

    • Checks for all 8 required columns: field, sub_field, year, season_start, season_end, age, sub_area, tonnage_ha
    • Identifies extra columns that will be ignored
    • Shows missing columns that must be added
  2. GeoJSON Properties Validation

    • Checks all features have required properties: field, sub_field
    • Identifies redundant properties that will be ignored
  3. Coordinate Reference System (CRS)

    • Validates correct CRS: EPSG:32736 (UTM Zone 36S)
    • This CRS was validated from your Angata farm coordinates
    • Explains why this specific CRS is required
  4. Field Name Matching

    • Compares field names between Excel and GeoJSON
    • Shows which fields exist in only one dataset
    • Highlights misspellings or missing fields
    • Provides complete matching summary table
  5. Data Type & Content Validation

    • Checks column data types:
      • year: Must be integer
      • season_start, season_end: Must be valid dates
      • age, sub_area, tonnage_ha: Must be numeric (decimal)
    • Identifies rows with missing season_start dates
    • Flags invalid date formats and numeric values
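The type rules above could be checked row by row roughly as follows. This is a minimal sketch, not the code in validator.js: the helper names (`isBlank`, `isNumeric`, `isValidIsoDate`, `validateRowTypes`) and the shape of the returned issue objects are assumptions, and it assumes dates arrive as YYYY-MM-DD strings rather than Excel serial numbers.

```javascript
// Sketch only: function names and the issue-object shape are assumptions.
// Dates are assumed to arrive as "YYYY-MM-DD" strings, not Excel serial numbers.
const DATE_COLUMNS = ["season_start", "season_end"];
const NUMERIC_COLUMNS = ["age", "sub_area", "tonnage_ha"];

function isBlank(value) {
  return value === null || value === undefined || String(value).trim() === "";
}

function isNumeric(value) {
  return !isBlank(value) && Number.isFinite(Number(value));
}

function isValidIsoDate(value) {
  if (typeof value !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(value + "T00:00:00Z");
  // Round-trip comparison rejects impossible dates such as 2023-02-30.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

function validateRowTypes(row, rowNumber) {
  const issues = [];
  const report = (column, message) => issues.push({ row: rowNumber, column, message });

  if (!isNumeric(row.year) || !Number.isInteger(Number(row.year))) {
    report("year", "must be an integer");
  }
  for (const column of DATE_COLUMNS) {
    if (isBlank(row[column])) report(column, "is missing");
    else if (!isValidIsoDate(row[column])) report(column, "is not a valid YYYY-MM-DD date");
  }
  for (const column of NUMERIC_COLUMNS) {
    if (!isNumeric(row[column])) report(column, "must be numeric");
  }
  return issues;
}
```

Reporting a missing date separately from a malformed one keeps the two cases distinguishable in the results, which matches the separate bullets above.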

File Requirements

Excel File (harvest.xlsx)

| field    | sub_field        | year | season_start | season_end | age | sub_area | tonnage_ha |
|----------|------------------|------|--------------|------------|-----|----------|-----------|
| kowawa   | kowawa           | 2023 | 2023-01-15   | 2024-01-14 | 1.5 | 45       | 125.5     |
| Tamu     | Tamu Upper       | 2023 | 2023-02-01   | 2024-01-31 | 1.0 | 30       | 98.0      |

Data Types:

  • field, sub_field: Text (numeric identifiers are accepted when stored as text)
  • year: Integer
  • season_start, season_end: Date (YYYY-MM-DD format)
  • age, sub_area, tonnage_ha: Decimal/Float

Extra columns are allowed but will not be processed.
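A column check along these lines would separate missing columns (blocking) from extra ones (ignored). It is a sketch under assumptions: `checkExcelColumns` is a hypothetical name, header names are compared case-sensitively after trimming, and the actual logic in validator.js may differ.

```javascript
// The eight required columns listed in this README.
const REQUIRED_COLUMNS = [
  "field", "sub_field", "year", "season_start",
  "season_end", "age", "sub_area", "tonnage_ha",
];

// Hypothetical helper: compares a header row against the required columns.
// Assumption: headers are trimmed and matched case-sensitively.
function checkExcelColumns(headers) {
  const present = new Set(headers.map((header) => String(header).trim()));
  const missing = REQUIRED_COLUMNS.filter((column) => !present.has(column));
  const extra = [...present].filter((column) => !REQUIRED_COLUMNS.includes(column));
  return { missing, extra };
}

// Usage: a sheet that lacks tonnage_ha and carries an unused "notes" column.
const columnReport = checkExcelColumns([
  "field", "sub_field", "year", "season_start", "season_end", "age", "sub_area", "notes",
]);
console.log(columnReport.missing, columnReport.extra);
```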

GeoJSON File (pivot.geojson)

{
  "type": "FeatureCollection",
  "crs": { 
    "type": "name", 
    "properties": { 
      "name": "urn:ogc:def:crs:EPSG::32736" 
    } 
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "field": "kowawa",
        "sub_field": "kowawa"
      },
      "geometry": {
        "type": "MultiPolygon",
        "coordinates": [...]
      }
    }
  ]
}

Required Properties:

  • field: Field identifier (must match Excel)
  • sub_field: Sub-field identifier (must match Excel)

Optional Properties:

  • STATUS, name, age, etc. - These are allowed but not required
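The property rules could be applied to every feature as in the sketch below. `checkGeoJsonProperties` and its return shape are hypothetical, and treating an empty string as a missing value is an assumption, not something this README states.

```javascript
const REQUIRED_PROPERTIES = ["field", "sub_field"];

// Hypothetical helper: lists features that lack a required property and
// collects the names of any additional properties found.
function checkGeoJsonProperties(geojson) {
  const incomplete = [];
  const extra = new Set();
  (geojson.features || []).forEach((feature, featureIndex) => {
    const props = feature.properties || {};
    // Assumption: null and empty-string values count as missing.
    const missing = REQUIRED_PROPERTIES.filter(
      (name) => props[name] === undefined || props[name] === null || props[name] === ""
    );
    if (missing.length > 0) incomplete.push({ featureIndex, missing });
    Object.keys(props).forEach((name) => {
      if (!REQUIRED_PROPERTIES.includes(name)) extra.add(name);
    });
  });
  return { incomplete, extra: [...extra] };
}
```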

CRS:

  • Must be EPSG:32736 (UTM Zone 36S)
  • This was determined from analyzing your Angata farm coordinates
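One way to read the EPSG code out of the `crs` member is sketched here. The helper names are hypothetical, and the pattern only handles the two spellings `EPSG:32736` and `urn:ogc:def:crs:EPSG::32736`; other valid ways of naming a CRS would need extra handling. A file without a `crs` member yields `null`, which this sketch treats as a mismatch.

```javascript
// Hypothetical helper: extracts the numeric EPSG code from a GeoJSON "crs" member.
// Returns null when the member is absent or uses an unrecognised naming scheme.
function readEpsgCode(geojson) {
  const name =
    geojson && geojson.crs && geojson.crs.properties && geojson.crs.properties.name;
  if (typeof name !== "string") return null;
  // Accepts "EPSG:32736" and "urn:ogc:def:crs:EPSG::32736".
  const match = name.match(/EPSG:{1,2}(\d+)$/i);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: true only when the declared CRS is the expected one.
function hasExpectedCrs(geojson, expectedCode = 32736) {
  return readEpsgCode(geojson) === expectedCode;
}
```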

Deployment

  1. Download the data_validation_tool folder
  2. Open index.html in a web browser
  3. Files are processed entirely client-side - no data is sent to servers

Netlify Deployment

  1. Connect to your GitHub repository
  2. Set build command: leave empty (the tool needs no build step)
  3. Set publish directory: data_validation_tool
  4. Deploy

Or use Netlify CLI:

npm install -g netlify-cli
netlify deploy --dir data_validation_tool

Manual Testing

  1. Use the provided sample files:
    • Excel: laravel_app/storage/app/aura/Data/harvest.xlsx
    • GeoJSON: laravel_app/storage/app/aura/Data/pivot.geojson
  2. Open index.html
  3. Upload both files
  4. Review validation results

Technical Details

Browser Requirements

  • Modern browser with ES6 support (Chrome, Firefox, Safari, Edge)
  • Must support FileReader API and JSON parsing
  • Must be able to load the XLSX library used for Excel parsing (see Dependencies)

Dependencies

  • XLSX.js: For reading Excel files (loaded via CDN in index.html)

What Happens When You Upload

  1. File is read into memory (client-side only)
  2. Excel: Parsed using XLSX library into JSON
  3. GeoJSON: Parsed directly as JSON
  4. All validation runs in your browser
  5. Results displayed locally
  6. No files are sent to any server

Validation Rules

Traffic Light Logic

GREEN (✓ Passed)

  • All required columns/properties present
  • Correct CRS
  • All field names match
  • All data types valid

YELLOW (⚠️ Warnings)

  • Extra columns detected (will be ignored)
  • Extra properties detected (will be ignored)
  • Missing season_start dates in some rows
  • Data type issues in specific rows

RED (✗ Failed)

  • Missing required columns/properties
  • Wrong CRS
  • Field names mismatch between files
  • Fundamental data structure issues
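The rules above reduce to a simple precedence: any error makes the result RED, otherwise any warning makes it YELLOW, otherwise it is GREEN. A minimal sketch, assuming each check reports `errors` and `warnings` arrays (a hypothetical shape, not necessarily what validator.js uses):

```javascript
// Hypothetical helper: combines per-check results into one traffic light.
// Assumes each check looks like { name, errors: [...], warnings: [...] }.
function overallStatus(checks) {
  if (checks.some((check) => check.errors.length > 0)) return "RED";
  if (checks.some((check) => check.warnings.length > 0)) return "YELLOW";
  return "GREEN";
}

// Usage: one warning and no errors.
const status = overallStatus([
  { name: "columns", errors: [], warnings: ["Extra column: notes"] },
  { name: "crs", errors: [], warnings: [] },
]);
console.log(status);
```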

CRS Explanation

From your project's geospatial analysis:

  • Original issue: Angata farm GeoJSON had coordinates in UTM Zone 37S but marked as WGS84
  • Root cause: UTM Zone mismatch - farm is actually in UTM Zone 36S
  • Solution: Reproject to EPSG:32736 (UTM Zone 36S)
  • Why: Angata farm lies at about longitude 34.4°E, which falls inside UTM Zone 36 (30°E to 36°E)
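The zone can be confirmed with the standard UTM arithmetic: zones are 6° wide and numbered from 180°W, and WGS 84 / UTM codes are 32600 plus the zone number in the northern hemisphere and 32700 plus the zone number in the southern hemisphere. The function names below are illustrative, and the sketch ignores the special zones around Norway and Svalbard.

```javascript
// Standard UTM arithmetic: zones are 6 degrees wide, numbered from 180°W.
function utmZoneFromLongitude(lonDegrees) {
  return Math.floor((lonDegrees + 180) / 6) + 1;
}

// WGS 84 / UTM EPSG codes: 326xx for northern zones, 327xx for southern zones.
function utmEpsgCode(lonDegrees, isSouthernHemisphere) {
  return (isSouthernHemisphere ? 32700 : 32600) + utmZoneFromLongitude(lonDegrees);
}

console.log(utmZoneFromLongitude(34.4)); // 36
console.log(utmEpsgCode(34.4, true));    // 32736
```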

Troubleshooting

"Failed to read Excel file"

  • Ensure file is .xlsx format
  • File should not be open in Excel while uploading
  • Try re-saving the file as an Excel Workbook (.xlsx, Excel 2007 or later)

"Failed to parse GeoJSON"

  • Ensure file is valid JSON
  • Check for syntax errors (extra commas, missing brackets)
  • Use an online JSON validator such as jsonlint.com
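A parse step that turns syntax errors into a readable message might look like the following sketch. `parseGeoJsonText` and its `{ ok, data, error }` result shape are assumptions for illustration; the wording of the underlying `JSON.parse` error varies between browsers.

```javascript
// Hypothetical helper: parses GeoJSON text and reports failures as data
// instead of throwing, so the UI can show the message next to the file.
function parseGeoJsonText(text) {
  try {
    const data = JSON.parse(text);
    if (!data || data.type !== "FeatureCollection" || !Array.isArray(data.features)) {
      return { ok: false, error: "Not a GeoJSON FeatureCollection" };
    }
    return { ok: true, data };
  } catch (err) {
    // JSON.parse throws a SyntaxError for issues such as trailing commas.
    return { ok: false, error: "Failed to parse GeoJSON: " + err.message };
  }
}
```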

"Wrong CRS detected"

  • GeoJSON must explicitly state CRS as EPSG:32736
  • Example: "name": "urn:ogc:def:crs:EPSG::32736"
  • Reproject in QGIS or R if needed

"Field names don't match"

  • Check for typos and capitalization differences
  • Remove spaces at the beginning or end of field names
  • Make sure each field name is spelled identically in both files
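A comparison like the one sketched below would surface the mismatches described above. It matches names exactly, then looks for unmatched names that differ only in capitalization or surrounding spaces so they can be flagged as likely typos. `compareFieldNames` and its return shape are hypothetical, and the real tool may match names differently.

```javascript
const normalizeName = (name) => String(name).trim().toLowerCase();

// Hypothetical helper: exact comparison plus hints for near matches.
function compareFieldNames(excelNames, geojsonNames) {
  const excel = new Set(excelNames.map(String));
  const geo = new Set(geojsonNames.map(String));
  const onlyInExcel = [...excel].filter((name) => !geo.has(name));
  const onlyInGeoJson = [...geo].filter((name) => !excel.has(name));

  // Unmatched names that agree after trimming and lowercasing are likely typos.
  const nearMatches = [];
  for (const excelName of onlyInExcel) {
    const candidate = onlyInGeoJson.find(
      (geoName) => normalizeName(geoName) === normalizeName(excelName)
    );
    if (candidate !== undefined) nearMatches.push({ excel: excelName, geojson: candidate });
  }
  return { onlyInExcel, onlyInGeoJson, nearMatches };
}
```

Keeping the exact comparison separate from the hints means a near match is still reported as a mismatch, with a pointer to the probable fix.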

Future Enhancements

  • Download validation report as PDF
  • Batch upload multiple Excel/GeoJSON pairs
  • Auto-detect and suggest field mappings
  • Geometry validity checks (self-intersecting polygons)
  • Area comparison between Excel and GeoJSON
  • Export cleaned/standardized files

Support

For questions about data validation requirements, contact the SmartCane team.


Tool Version: 1.0
Last Updated: December 2025
CRS Reference: EPSG:32736 (UTM Zone 36S)