Data Cleaning

Fix Messy CSV Files in One Command

You got a CSV from a client, a government portal, or a legacy system. It's broken. CSV Cleaner makes it not broken.

Messy Data Is Universal

Every data professional has wasted hours on files that should just work. The usual culprits: mixed encodings, rogue delimiters, phantom duplicates, and trailing whitespace that breaks joins.

Encoding Detection

Auto-detects Shift-JIS, Latin-1, UTF-16, and 20+ other encodings. Converts everything to clean UTF-8.
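
For readers curious what encoding detection involves, here is a minimal sketch of one common approach: check for a byte-order mark, then try candidate encodings in order until one decodes cleanly. It is not CSV Cleaner's actual implementation. The function names and the candidate list are assumptions made for illustration, and a real detector would score candidates statistically rather than take the first match.

```python
import codecs

# Hypothetical candidate order. latin-1 accepts any byte sequence,
# so it has to come last or it would mask every other encoding.
CANDIDATES = ("utf-8", "shift_jis", "latin-1")


def decode_bytes(raw, candidates=CANDIDATES):
    """Return (text, encoding_name) for the first encoding that decodes cleanly."""
    # A byte-order mark is the strongest signal, so check it first.
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig"), "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16"), "utf-16"
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")


def to_utf8(raw):
    """Re-encode arbitrary input bytes as UTF-8 (without a BOM)."""
    text, _ = decode_bytes(raw)
    return text.encode("utf-8")
```

Trying strict encodings before permissive ones matters: UTF-8 rejects most byte sequences that are not UTF-8, so a successful decode is strong evidence, while a successful Latin-1 decode proves nothing.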

Duplicate Removal

Finds exact duplicates in the free version and fuzzy duplicates in Pro. Choose to keep first, last, or flag for review.
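
The idea behind both kinds of matching can be sketched with the standard library alone. This is an illustration under stated assumptions, not the tool's own logic: the names `normalize`, `drop_exact_duplicates`, and `is_fuzzy_duplicate` are hypothetical, rows are compared after trimming and lowercasing every cell, and the 0.9 similarity threshold is an arbitrary choice.

```python
from difflib import SequenceMatcher


def normalize(row):
    """Trim and lowercase each cell so 'john@example.com ' matches 'john@example.com'."""
    return tuple(cell.strip().lower() for cell in row)


def drop_exact_duplicates(rows, keep="first"):
    """Drop rows whose normalized form was already seen, keeping the first or last copy."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    ordered = rows if keep == "first" else list(reversed(rows))
    seen = set()
    kept = []
    for row in ordered:
        key = normalize(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    # Restore the original order when we walked the rows backwards.
    return kept if keep == "first" else kept[::-1]


def is_fuzzy_duplicate(a, b, threshold=0.9):
    """Treat two rows as near-duplicates when their joined text is similar enough."""
    left = " ".join(normalize(a))
    right = " ".join(normalize(b))
    return SequenceMatcher(None, left, right).ratio() >= threshold
```

Comparing every pair of rows this way is quadratic in the number of rows, so a fuzzy pass over a large file would need some form of blocking or indexing first.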

Delimiter Fixing

Handles mixed tabs, semicolons, and pipes. Normalizes to your preferred delimiter.
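
As a rough illustration of delimiter normalization, the sketch below guesses a delimiter for each line separately and rewrites every row with one target delimiter. It is a simplification, not the tool's algorithm: the helper names and candidate list are assumptions, the guess is simply "whichever candidate yields the most fields", and working line by line means fields with embedded line breaks are not handled.

```python
import csv
import io

# Hypothetical candidates; on a tie, the earliest entry wins.
CANDIDATES = (",", ";", "\t", "|")


def guess_delimiter(line, candidates=CANDIDATES):
    """Pick the candidate that splits this line into the most fields."""
    def field_count(delim):
        return len(next(csv.reader([line], delimiter=delim)))
    return max(candidates, key=field_count)


def normalize_delimiters(text, target=","):
    """Re-emit every non-blank line using a single delimiter."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=target, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        delim = guess_delimiter(line)
        writer.writerow(next(csv.reader([line], delimiter=delim)))
    return out.getvalue()
```

Going through `csv.reader` and `csv.writer` rather than a plain string replace keeps quoted fields intact, so a semicolon inside a quoted value is not mistaken for a separator.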

Field Repair

Fixes unescaped quotes, mismatched columns, and line breaks inside fields.
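
Mismatched columns are the easiest of these to picture. The sketch below shows one possible heuristic, assuming the header defines the expected width: short rows are padded with empty cells, and overflow in long rows is folded back into the last column. The function name `fit_to_header` is hypothetical, and the assumption that stray commas belong to the final field will not hold for every file.

```python
def fit_to_header(row, width):
    """Pad or fold a parsed row so it has exactly `width` cells."""
    if len(row) < width:
        # Too few cells: pad with empty strings.
        return row + [""] * (width - len(row))
    if len(row) > width:
        # Too many cells: assume an unquoted comma split the last field.
        return row[: width - 1] + [",".join(row[width - 1:])]
    return row
```

For example, a row that parsed as `["John Doe", "john@example.com", "2024/01/15", "$1", "200"]` against a four-column header comes back with `"$1,200"` as its last cell.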

Type Inference

Detects dates, numbers, booleans, and currencies. Standardizes formats across the file.
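
Standardizing dates and currency amounts can be sketched as follows. This is a minimal illustration, not the tool's inference engine: the format list, the set of currency symbols, and the function names are assumptions, and the first matching date format wins, so a genuinely ambiguous value such as 01/02/2024 would be read according to whichever pattern is listed first.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical list of accepted input formats, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%m-%d-%Y", "%d/%m/%Y")


def to_iso_date(value):
    """Return the date as YYYY-MM-DD, or the input unchanged if no format matches."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return value


def to_amount(value):
    """Strip currency symbols and thousands separators, then format with two decimals."""
    if not value.strip():
        return ""
    cleaned = value.strip().lstrip("$¥€£").replace(",", "")
    try:
        return f"{Decimal(cleaned):.2f}"
    except InvalidOperation:
        return value  # leave unparseable values untouched
```

Using `Decimal` instead of `float` avoids binary rounding surprises when the cleaned values are money.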

Batch Mode

Point it at a directory and clean 500 files with the same command. Ideal for pipelines. Available in Pro.
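
Conceptually, batch mode is a loop over a directory. The sketch below shows that loop in plain Python under a few assumptions: `clean_directory` and its `clean_text` callback are hypothetical names, inputs are already UTF-8, and the `_cleaned` suffix simply mirrors the output file name used in the example on this page. It does not call CSV Cleaner itself.

```python
from pathlib import Path


def clean_directory(src, dst, clean_text):
    """Apply `clean_text` to every .csv file in `src` and write results into `dst`."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort for a deterministic processing order; non-CSV files are ignored.
    for path in sorted(src.glob("*.csv")):
        cleaned = clean_text(path.read_text(encoding="utf-8"))
        target = dst / f"{path.stem}_cleaned.csv"
        target.write_text(cleaned, encoding="utf-8")
        written.append(target)
    return written
```

Writing to a separate output directory keeps the original files untouched, which makes a pipeline run safe to repeat.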

Before & After

One command transforms this mess into clean, analysis-ready data.

Before: raw_data.csv

name,email,joined,revenue
"John Doe",john@example.com,2024/01/15,$1,200
Jane Smith;jane@example.com;01-15-2024;1200
"John Doe",john@example.com ,2024/01/15,"$1,200"
Bob,,2024-1-15,
"Alice ""Wonder""land",alice@co.jp,15/01/2024,¥98000

After: raw_data_cleaned.csv

name,email,joined,revenue
John Doe,john@example.com,2024-01-15,1200.00
Jane Smith,jane@example.com,2024-01-15,1200.00
Bob,,2024-01-15,
Alice Wonderland,alice@co.jp,2024-01-15,98000.00

Removed 1 duplicate, fixed 5 other issues

      $ pip install csv-cleaner
      $ csv-cleaner fix raw_data.csv
      Detecting encoding... UTF-8 (confidence: 98%)
      Scanned 5 rows, 4 columns
      Fixed: 1 duplicate, 1 delimiter mismatch, 2 date formats, 2 currency formats
      Saved: raw_data_cleaned.csv

Free vs Pro

The free version handles 90% of use cases. Pro adds batch processing, custom rules, fuzzy duplicate detection, schema validation, and extra output formats.

Feature	Free	Pro ($19)
Encoding detection & conversion	Yes	Yes
Duplicate removal	Yes	Yes
Delimiter normalization	Yes	Yes
Field repair (quotes, line breaks)	Yes	Yes
Date/number standardization	Yes	Yes
Dry-run / diff preview	Yes	Yes
Batch directory processing	—	Yes
Custom rule definitions (YAML)	—	Yes
Fuzzy duplicate detection	—	Yes
Schema validation & enforcement	—	Yes
JSON / Parquet output	—	Yes
CI/CD integration (exit codes)	—	Yes
Priority support	—	Yes
