Data Cleaning

Fix Messy CSV Files in One Command

You got a CSV from a client, a government portal, or a legacy system. It's broken. CSV Cleaner makes it not broken.

Messy Data Is Universal

Every data professional has wasted hours on files that should just work. Mixed encodings, rogue delimiters, phantom duplicates, trailing whitespace that breaks joins.

Encoding Detection

Auto-detects Shift-JIS, Latin-1, UTF-16, and 20+ other encodings. Converts everything to clean UTF-8.
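To give a feel for the idea, here is a minimal sketch of encoding detection using only the Python standard library. It is not CSV Cleaner's actual implementation: the function name `decode_best_effort` and the short candidate list are assumptions made for illustration, and a real detector would weigh many more encodings statistically.

```python
import codecs


def decode_best_effort(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding_used) for raw file bytes. Hypothetical helper."""
    # A byte-order mark is the most reliable signal, so check it first.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16"), "utf-16"
    # Strict decoding raises UnicodeDecodeError on invalid byte sequences,
    # so the first candidate that decodes cleanly wins.
    # "utf-8-sig" accepts UTF-8 with or without a leading BOM.
    for encoding in ("utf-8-sig", "shift_jis"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so it never fails.
    # That makes it a last resort, not a confident guess.
    return raw.decode("latin-1"), "latin-1"
```

Whatever the source encoding, the returned text can then be written back out as UTF-8.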

Duplicate Removal

Finds exact duplicates, with fuzzy matching available in Pro. Choose to keep first, last, or flag for review.
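As a rough illustration of the exact-match case, the sketch below treats two rows as duplicates when their cells are equal after trimming whitespace. The helper name `drop_duplicates` and its `keep` parameter are hypothetical and only mirror the keep-first and keep-last choices described above. Fuzzy matching is not covered.

```python
def drop_duplicates(rows, keep="first"):
    """Remove rows whose whitespace-trimmed cells repeat an earlier row."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Walking the rows backwards turns "keep last" into "keep first".
    ordered = rows if keep == "first" else list(reversed(rows))
    seen = set()
    kept = []
    for row in ordered:
        key = tuple(cell.strip() for cell in row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    # Restore the original row order before returning.
    return kept if keep == "first" else list(reversed(kept))
```

Comparing trimmed cells is what catches a "phantom" duplicate such as an email address with a trailing space.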

Delimiter Fixing

Handles mixed tabs, semicolons, and pipes. Normalizes to your preferred delimiter.
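One simple way to picture delimiter normalization is a per-line heuristic: pick whichever candidate delimiter appears most often on a line, parse the line with it, and rewrite it with the target delimiter. The sketch below assumes that heuristic; the names `CANDIDATES` and `normalize_delimiters` are illustrative, not part of CSV Cleaner. It counts delimiters inside quoted fields too and does not handle fields that span several lines, so treat it as a starting point only.

```python
import csv
import io

# Candidate delimiters, in tie-break order (comma wins a tie).
CANDIDATES = (",", ";", "\t", "|")


def normalize_delimiters(text, target=","):
    """Rewrite every line of text so it uses the target delimiter."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=target, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Guess this line's delimiter: the candidate that occurs most often.
        delimiter = max(CANDIDATES, key=line.count)
        row = next(csv.reader([line], delimiter=delimiter))
        # csv.writer re-quotes any field that contains the target delimiter.
        writer.writerow(row)
    return out.getvalue()
```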

Field Repair

Fixes unescaped quotes, mismatched columns, and line breaks inside fields.
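For the line-break case specifically, Python's built-in `csv` module already reads a quoted field that spans several lines as a single value. The sketch below builds on that to collapse embedded line breaks into spaces. The function name `flatten_embedded_newlines` is hypothetical, and this covers only one of the repairs listed above; unescaped quotes and mismatched columns need more careful handling.

```python
import csv
import io


def flatten_embedded_newlines(text):
    """Replace line breaks inside quoted fields with single spaces."""
    # newline="" keeps the original line endings so csv can interpret them.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # splitlines() breaks a cell at each embedded newline;
        # joining with a space puts the cell back on one line.
        writer.writerow([" ".join(cell.splitlines()) for cell in row])
    return out.getvalue()
```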

Type Inference

Detects dates, numbers, booleans, and currencies. Standardizes formats across the file.
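A small sketch of what standardizing dates and currency amounts can look like, using only the standard library. The list of date formats, the set of currency symbols, and the function names are all assumptions for illustration. Note that a value like 01/02/2024 is ambiguous between day-first and month-first; this sketch simply takes the first format that parses.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Formats are tried in order; the first one that parses wins.
DATE_FORMATS = ("%Y/%m/%d", "%Y-%m-%d", "%m-%d-%Y", "%d/%m/%Y")


def standardize_date(value):
    """Return the date as YYYY-MM-DD, or the input unchanged if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return value


def standardize_amount(value):
    """Strip currency symbols and thousands separators, keep two decimals."""
    digits = value.strip().lstrip("$¥€£").replace(",", "")
    try:
        return f"{Decimal(digits):.2f}"
    except InvalidOperation:
        return value  # leave empty or non-numeric cells untouched
```

Using `Decimal` rather than `float` avoids rounding surprises in money columns.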

Batch Mode

Point it at a directory. Clean 500 files with the same command. Ideal for pipelines.
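Conceptually, batch processing is a loop over a directory. The sketch below shows that loop in plain Python; `clean_directory` and the `clean_text` callback are hypothetical names, not CSV Cleaner's API, and the `_cleaned` suffix simply follows the output file name used in the example on this page.

```python
from pathlib import Path


def clean_directory(directory, clean_text, suffix="_cleaned"):
    """Apply clean_text to every .csv file in directory, writing new files."""
    written = []
    # sorted() lists the files up front, so outputs created during the
    # loop are not picked up as new inputs.
    for path in sorted(Path(directory).glob("*.csv")):
        if path.stem.endswith(suffix):
            continue  # skip files produced by an earlier run
        target = path.with_name(f"{path.stem}{suffix}.csv")
        cleaned = clean_text(path.read_text(encoding="utf-8"))
        target.write_text(cleaned, encoding="utf-8")
        written.append(target)
    return written
```

The original files are left in place, so a run can be repeated safely.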

Before & After

One command transforms this mess into clean, analysis-ready data.

Before: raw_data.csv

name,email,joined,revenue
"John Doe",john@example.com,2024/01/15,$1,200
Jane Smith;jane@example.com;01-15-2024;1200
"John Doe",john@example.com ,2024/01/15,"$1,200"
Bob,,2024-1-15,
"Alice ""Wonder""land",alice@co.jp,15/01/2024,¥98000

After: raw_data_cleaned.csv

name,email,joined,revenue
John Doe,john@example.com,2024-01-15,1200.00
Jane Smith,jane@example.com,2024-01-15,1200.00
Bob,,2024-01-15,
Alice Wonderland,alice@co.jp,2024-01-15,98000.00

Removed 1 duplicate, fixed 4 issues

$ pip install csv-cleaner
$ csv-cleaner fix raw_data.csv
Detecting encoding... UTF-8 (confidence: 98%)
Scanned 5 rows, 4 columns
Fixed: 1 duplicate, 1 delimiter mismatch, 2 date formats, 2 currency formats
Saved: raw_data_cleaned.csv

Free vs Pro

The free version handles 90% of use cases. Pro adds batch processing, custom rules, and advanced repair.

Feature                               Free   Pro ($19)
Encoding detection & conversion       Yes    Yes
Duplicate removal                     Yes    Yes
Delimiter normalization               Yes    Yes
Field repair (quotes, line breaks)    Yes    Yes
Date/number standardization           Yes    Yes
Dry-run / diff preview                Yes    Yes
Batch directory processing            No     Yes
Custom rule definitions (YAML)        No     Yes
Fuzzy duplicate detection             No     Yes
Schema validation & enforcement       No     Yes
JSON / Parquet output                 No     Yes
CI/CD integration (exit codes)        No     Yes
Priority support                      No     Yes

Stop wasting time on broken CSVs

Install in 10 seconds. Clean your first file in 20.