Groovy CSV parsing and generation with 10+ examples. Split, tokenize, OpenCSV, read/write CSV files, handle headers and special chars on Groovy 5.x.
“CSV files are the cockroaches of data formats – they survive everything and show up everywhere. Groovy gives you clean ways to deal with them.”
Last Updated: March 2026 | Tested on: Groovy 5.x, Java 17+ | Difficulty: Beginner to Intermediate | Reading Time: 22 minutes
CSV files are everywhere – data exports, spreadsheet dumps, log files, database migrations, configuration data. Despite being one of the oldest data formats around, CSV remains one of the most common ways to exchange tabular data. And Groovy makes working with CSV surprisingly pleasant.
This Groovy CSV guide covers everything from simple split() and tokenize() parsing to reliable solutions with the OpenCSV library. We’ll walk through reading CSV files, writing CSV output, handling headers, dealing with quoted fields and special characters, and converting CSV data to maps – all with tested examples.
If you’re also working with JSON data, check out our Groovy JSON Parsing with JsonSlurper guide. And for general file I/O patterns, see our Groovy File I/O Tutorial.
What Is CSV Processing in Groovy?
CSV (Comma-Separated Values) processing means reading, parsing, transforming, and writing tabular data stored in plain text files where each line is a row and commas separate the columns. Groovy offers two approaches: manual parsing with built-in string methods, and library-based parsing with OpenCSV for complex cases.
According to the Groovy Development Kit documentation, Groovy enhances Java’s I/O classes with methods like File.text, eachLine(), and splitEachLine() that make file processing concise.
Key Points:
- Simple CSV can be parsed with `split(',')` or `tokenize(',')`
- `File.splitEachLine()` is a built-in Groovy method for line-by-line CSV parsing
- For quoted fields, embedded commas, and newlines in values, use OpenCSV
- Groovy’s collection methods (`collect()`, `groupBy()`, `findAll()`) work beautifully with parsed CSV data
- Writing CSV is simple with `join(',')` and file I/O methods
- OpenCSV handles edge cases: quoted fields, escaped quotes, different delimiters, and encodings
Why Use Groovy for CSV?
- Built-in file methods – `splitEachLine()`, `eachLine()`, and `readLines()` handle file I/O natively
- Collection power – filter, transform, group, and aggregate CSV data with one-liners
- No boilerplate – compare Groovy’s 5-line CSV reader to Java’s 20+ lines
- Map conversion – easily convert CSV rows to maps using headers as keys
- Library support – OpenCSV integrates directly via `@Grab`
Basic Syntax
The simplest way to parse CSV in Groovy is with string splitting:
Basic CSV Parsing
// Method 1: split() - returns an array
def line = 'Alice,30,Engineering'
def fields = line.split(',')
println "Name: ${fields[0]}, Age: ${fields[1]}, Dept: ${fields[2]}"
// Method 2: tokenize() - returns a list
def fieldList = line.tokenize(',')
println "Name: ${fieldList[0]}, Age: ${fieldList[1]}, Dept: ${fieldList[2]}"
// Method 3: splitEachLine() - built into File
// new File('data.csv').splitEachLine(',') { fields ->
// println fields
// }
The key difference: split() returns a String[] array and keeps empty fields, while tokenize() returns a List and drops empty tokens. For CSV, split() is usually the better choice.
| Method | Returns | Empty Fields | Best For |
|---|---|---|---|
| `split(',')` | String[] | Preserved | Most CSV parsing |
| `tokenize(',')` | List | Dropped | Simple delimited data |
| `splitEachLine()` | via closure | Preserved | File-based CSV reading |
| OpenCSV | String[] | Preserved | Complex CSV with quotes |
10+ Practical Examples
Example 1: Parse a Simple CSV String
What we’re doing: Splitting CSV lines into fields and processing them.
Example 1: Simple CSV Parsing
def csvData = '''Name,Age,Department,Salary
Alice,30,Engineering,95000
Bob,25,Marketing,72000
Charlie,35,Engineering,88000
Diana,28,Sales,68000
Eve,32,Engineering,102000'''
def lines = csvData.trim().split('\n')
def headers = lines[0].split(',')
def rows = lines[1..-1].collect { it.split(',') }
println "Headers: ${headers.toList()}"
println "Rows: ${rows.size()}"
println ""
rows.each { row ->
println "${row[0].padRight(10)} | ${row[1]} | ${row[2].padRight(12)} | \$${row[3]}"
}
Output
Headers: [Name, Age, Department, Salary]
Rows: 5

Alice      | 30 | Engineering  | $95000
Bob        | 25 | Marketing    | $72000
Charlie    | 35 | Engineering  | $88000
Diana      | 28 | Sales        | $68000
Eve        | 32 | Engineering  | $102000
What happened here: We split the CSV text into lines, extracted the header row, and processed data rows separately. The lines[1..-1] range skips the header. Each row is split by comma into an array of fields.
Example 2: Convert CSV Rows to Maps
What we’re doing: Mapping header names to row values so you can access fields by name instead of index.
Example 2: CSV to Maps
def csvData = '''Name,Age,Department,Salary
Alice,30,Engineering,95000
Bob,25,Marketing,72000
Charlie,35,Engineering,88000'''
def lines = csvData.trim().split('\n')
def headers = lines[0].split(',')
// Convert each row to a map using headers as keys
def records = lines[1..-1].collect { line ->
def values = line.split(',')
[headers, values].transpose().collectEntries()
}
// Now access by name instead of index!
records.each { record ->
println "${record.Name} works in ${record.Department} and earns \$${record.Salary}"
}
println "\n--- Map structure ---"
println records[0]
// Filter and aggregate using map keys
def engineers = records.findAll { it.Department == 'Engineering' }
println "\nEngineers: ${engineers*.Name}"
def avgSalary = records.collect { it.Salary as int }.sum() / records.size()
println "Average salary: \$${avgSalary.round(0)}"
Output
Alice works in Engineering and earns $95000
Bob works in Marketing and earns $72000
Charlie works in Engineering and earns $88000

--- Map structure ---
[Name:Alice, Age:30, Department:Engineering, Salary:95000]

Engineers: [Alice, Charlie]
Average salary: $85000
What happened here: The transpose() method pairs up headers with values, and collectEntries() converts those pairs into a map. This is the most useful pattern for CSV – once rows are maps, you access fields by name and use all of Groovy’s collection methods.
Example 3: Read CSV from a File
What we’re doing: Reading a CSV file using Groovy’s built-in file methods.
Example 3: Read CSV File
// Create a sample CSV file
def csvContent = '''id,product,price,quantity
1,Laptop,999.99,10
2,Phone,699.99,25
3,Tablet,449.99,15
4,Monitor,349.99,8
5,Keyboard,79.99,50'''
def tempFile = File.createTempFile('products', '.csv')
tempFile.text = csvContent
tempFile.deleteOnExit()
// Method 1: splitEachLine (Groovy built-in)
println "--- splitEachLine ---"
def firstRow = true
tempFile.splitEachLine(',') { fields ->
if (firstRow) {
firstRow = false // Skip header
return
}
println "Product: ${fields[1]}, Price: \$${fields[2]}, Qty: ${fields[3]}"
}
// Method 2: readLines + collect for full control
println "\n--- readLines + collect ---"
def allLines = tempFile.readLines()
def headers = allLines[0].split(',')
def products = allLines[1..-1].collect { line ->
def vals = line.split(',')
[headers, vals].transpose().collectEntries()
}
// Calculate total inventory value
def totalValue = products.sum { (it.price as BigDecimal) * (it.quantity as int) }
println "Total inventory value: \$${totalValue}"
// Most expensive product
def priciest = products.max { it.price as BigDecimal }
println "Most expensive: ${priciest.product} (\$${priciest.price})"
Output
--- splitEachLine ---
Product: Laptop, Price: $999.99, Qty: 10
Product: Phone, Price: $699.99, Qty: 25
Product: Tablet, Price: $449.99, Qty: 15
Product: Monitor, Price: $349.99, Qty: 8
Product: Keyboard, Price: $79.99, Qty: 50

--- readLines + collect ---
Total inventory value: $41048.92
Most expensive: Laptop ($999.99)
What happened here: splitEachLine(delimiter) is a Groovy GDK method on File that splits each line by the delimiter and passes the result to a closure – perfect for simple CSV. For more control, readLines() gives you all lines as a list that you can process with collection methods.
Example 4: Write CSV to a File
What we’re doing: Generating CSV output from Groovy data structures and writing to a file.
Example 4: Write CSV
// Data to write
def employees = [
[name: 'Alice', age: 30, dept: 'Engineering', salary: 95000],
[name: 'Bob', age: 25, dept: 'Marketing', salary: 72000],
[name: 'Charlie', age: 35, dept: 'Engineering', salary: 88000],
[name: 'Diana', age: 28, dept: 'Sales', salary: 68000]
]
def tempFile = File.createTempFile('employees', '.csv')
tempFile.deleteOnExit()
// Write CSV with headers
def headers = ['name', 'age', 'dept', 'salary']
tempFile.withWriter('UTF-8') { writer ->
// Write header row
writer.writeLine(headers.join(','))
// Write data rows
employees.each { emp ->
def row = headers.collect { emp[it] }
writer.writeLine(row.join(','))
}
}
// Verify the output
println "Written CSV:"
println tempFile.text
// Alternative: build CSV string first
def csvString = new StringBuilder()
csvString.append(headers.join(',') + '\n')
employees.each { emp ->
csvString.append(headers.collect { emp[it] }.join(',') + '\n')
}
println "String CSV:\n${csvString}"
Output
Written CSV:
name,age,dept,salary
Alice,30,Engineering,95000
Bob,25,Marketing,72000
Charlie,35,Engineering,88000
Diana,28,Sales,68000

String CSV:
name,age,dept,salary
Alice,30,Engineering,95000
Bob,25,Marketing,72000
Charlie,35,Engineering,88000
Diana,28,Sales,68000
What happened here: Writing CSV is simple – join(',') converts a list of values into a comma-separated line. The withWriter block ensures the file is properly closed. We use the same header list to maintain column order consistency.
Example 5: Handle Quoted CSV Fields
What we’re doing: Parsing CSV where values contain commas, quotes, and newlines – which require quoting.
Example 5: Quoted Fields
// Simple CSV quoting function
def parseQuotedCsv(String line) {
def fields = []
def current = new StringBuilder()
boolean inQuotes = false
for (int i = 0; i < line.length(); i++) {
char c = line.charAt(i)
if (c == '"' as char) {
if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"' as char) {
current.append('"') // Escaped quote
i++
} else {
inQuotes = !inQuotes
}
} else if (c == ',' as char && !inQuotes) {
fields.add(current.toString())
current = new StringBuilder()
} else {
current.append(c)
}
}
fields.add(current.toString())
return fields
}
// Test with tricky data
def testLines = [
'Alice,30,Engineering', // Simple
'"Smith, John",35,"New York, NY"', // Commas in values
'"She said ""hello""",40,Sales', // Escaped quotes
'Bob,,Marketing' // Empty field
]
testLines.each { line ->
def fields = parseQuotedCsv(line)
println "Input: ${line}"
println "Fields: ${fields}"
println ""
}
// Quoting function for writing
def quoteCsvField(String value) {
if (value.contains(',') || value.contains('"') || value.contains('\n')) {
return '"' + value.replace('"', '""') + '"'
}
return value
}
println "Quoted: ${quoteCsvField('Smith, John')}"
println "Quoted: ${quoteCsvField('She said "hello"')}"
println "Quoted: ${quoteCsvField('Simple')}"
Output
Input: Alice,30,Engineering
Fields: [Alice, 30, Engineering]

Input: "Smith, John",35,"New York, NY"
Fields: [Smith, John, 35, New York, NY]

Input: "She said ""hello""",40,Sales
Fields: [She said "hello", 40, Sales]

Input: Bob,,Marketing
Fields: [Bob, , Marketing]

Quoted: "Smith, John"
Quoted: "She said ""hello"""
Quoted: Simple
What happened here: Real-world CSV often contains commas inside values (like “New York, NY”) and quotes inside values (escaped as ""). The parseQuotedCsv() function handles these cases by tracking whether we’re inside quotes. For production code, use OpenCSV instead of rolling your own parser.
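On the writing side, the same quoting rules can be folded into a single row formatter. A minimal sketch – `writeCsvRow` is a hypothetical helper name, not a Groovy built-in:

```groovy
// Hypothetical helper: quote a field only when it needs quoting, then join the row.
def writeCsvRow(List values) {
    values.collect { v ->
        def s = v.toString()
        if (s.contains(',') || s.contains('"') || s.contains('\n')) {
            '"' + s.replace('"', '""') + '"'   // wrap and double embedded quotes
        } else {
            s
        }
    }.join(',')
}

assert writeCsvRow(['Smith, John', 35, 'New York, NY']) == '"Smith, John",35,"New York, NY"'
assert writeCsvRow(['She said "hi"', 'plain']) == '"She said ""hi""",plain'
```

Rows written this way parse back correctly with the `parseQuotedCsv()` function above.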
Example 6: CSV with Different Delimiters
What we’re doing: Parsing files that use tab, semicolon, or pipe as delimiters instead of comma.
Example 6: Different Delimiters
// Tab-separated values (TSV)
def tsvData = "Name\tAge\tCity\nAlice\t30\tSpringfield\nBob\t25\tShelbyville"
println "--- TSV (tab-separated) ---"
tsvData.split('\n').each { line ->
def fields = line.split('\t')
println fields.collect { it.padRight(15) }.join('| ')
}
// Semicolon-separated (common in European locales)
def ssvData = "Name;Price;Quantity\nLaptop;999,99;10\nPhone;699,99;25"
println "\n--- SSV (semicolon-separated) ---"
ssvData.split('\n').each { line ->
def fields = line.split(';')
println fields.join(' | ')
}
// Pipe-separated
def psvData = "ID|Name|Status\n1|Alice|Active\n2|Bob|Inactive"
println "\n--- PSV (pipe-separated) ---"
psvData.split('\n').each { line ->
def fields = line.split('\\|') // Pipe needs escaping in regex
println fields.join(' - ')
}
// Generic parser function
def parseDelimited(String data, String delimiter) {
def lines = data.trim().split('\n')
def headers = lines[0].split(delimiter.replaceAll('([|\\\\^$.+*?()\\[\\]{}])', '\\\\$1'))
return lines[1..-1].collect { line ->
def vals = line.split(delimiter.replaceAll('([|\\\\^$.+*?()\\[\\]{}])', '\\\\$1'))
[headers, vals].transpose().collectEntries()
}
}
println "\n--- Generic parser ---"
def result = parseDelimited(psvData, '|')
result.each { println it }
Output
--- TSV (tab-separated) ---
Name           | Age            | City
Alice          | 30             | Springfield
Bob            | 25             | Shelbyville

--- SSV (semicolon-separated) ---
Name | Price | Quantity
Laptop | 999,99 | 10
Phone | 699,99 | 25

--- PSV (pipe-separated) ---
ID - Name - Status
1 - Alice - Active
2 - Bob - Inactive

--- Generic parser ---
[ID:1, Name:Alice, Status:Active]
[ID:2, Name:Bob, Status:Inactive]
What happened here: Not all “CSV” uses commas. Tab-separated (TSV) is common in spreadsheet exports, semicolons are standard in European Excel exports (where comma is the decimal separator), and pipes are used in database dumps. The same split() approach works – just change the delimiter. Note that pipe (|) needs escaping since split() takes a regex.
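The escaping regex inside `parseDelimited()` can be simplified with `java.util.regex.Pattern.quote()`, which turns any delimiter into a literal (non-regex) pattern. A sketch of the same parser with that change – `parseDelimitedLiteral` is my name for it, not a standard API:

```groovy
import java.util.regex.Pattern

// Treat the delimiter as a literal string, never as a regex.
def parseDelimitedLiteral(String data, String delimiter) {
    def quoted = Pattern.quote(delimiter)   // '|' becomes '\Q|\E' and is matched literally
    def lines = data.trim().split('\n')
    def headers = lines[0].split(quoted)
    lines[1..-1].collect { line ->
        [headers, line.split(quoted, -1)].transpose().collectEntries()
    }
}

def rows = parseDelimitedLiteral("ID|Name\n1|Alice\n2|Bob", '|')
assert rows*.Name == ['Alice', 'Bob']
assert rows[0] == [ID: '1', Name: 'Alice']
```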
Example 7: Filter, Sort, and Aggregate CSV Data
What we’re doing: Using Groovy’s collection methods to analyze CSV data.
Example 7: CSV Data Analysis
def csvData = '''Name,Department,Salary,Years
Alice,Engineering,95000,5
Bob,Marketing,72000,3
Charlie,Engineering,88000,7
Diana,Sales,68000,2
Eve,Engineering,102000,8
Frank,Marketing,76000,4
Grace,Sales,71000,3
Heidi,Engineering,91000,6'''
def lines = csvData.trim().split('\n')
def headers = lines[0].split(',')
def employees = lines[1..-1].collect { line ->
def vals = line.split(',')
[headers, vals].transpose().collectEntries()
}
// Sort by salary (descending)
println "--- Top Earners ---"
employees.sort { -(it.Salary as int) }.take(3).each {
println " ${it.Name}: \$${it.Salary}"
}
// Group by department
println "\n--- Department Summary ---"
employees.groupBy { it.Department }.each { dept, members ->
def avgSalary = members.collect { it.Salary as int }.sum() / members.size()
def avgYears = members.collect { it.Years as int }.sum() / members.size()
println " ${dept}: ${members.size()} people, avg salary \$${avgSalary.round(0)}, avg years ${avgYears.round(1)}"
}
// Filter: experienced and well-paid
def senior = employees.findAll { (it.Years as int) >= 5 && (it.Salary as int) >= 90000 }
println "\n--- Senior High Earners (5+ years, \$90k+) ---"
senior.each { println " ${it.Name} (${it.Years} years, \$${it.Salary})" }
// Total payroll
def totalPayroll = employees.sum { it.Salary as int }
println "\nTotal payroll: \$${totalPayroll}"
Output
--- Top Earners ---
 Eve: $102000
 Alice: $95000
 Heidi: $91000

--- Department Summary ---
 Engineering: 4 people, avg salary $94000, avg years 6.5
 Marketing: 2 people, avg salary $74000, avg years 3.5
 Sales: 2 people, avg salary $69500, avg years 2.5

--- Senior High Earners (5+ years, $90k+) ---
 Eve (8 years, $102000)
 Alice (5 years, $95000)
 Heidi (6 years, $91000)

Total payroll: $663000
What happened here: Once CSV is converted to a list of maps, you can use Groovy’s entire collection toolkit. sort() with a negative value gives descending order, groupBy() categorizes by any field, and findAll() filters with complex conditions.
Example 8: CSV to JSON Conversion
What we’re doing: Converting CSV data to JSON format – a common data transformation task.
Example 8: CSV to JSON
import groovy.json.JsonOutput
def csvData = '''id,name,email,active
1,Alice,alice@example.com,true
2,Bob,bob@example.com,true
3,Charlie,charlie@example.com,false'''
// Parse CSV to maps
def lines = csvData.trim().split('\n')
def headers = lines[0].split(',')
def records = lines[1..-1].collect { line ->
def vals = line.split(',')
def map = [headers, vals].transpose().collectEntries()
// Type coercion
map.id = map.id as int
map.active = map.active.toBoolean()
return map
}
// Convert to JSON
def json = JsonOutput.prettyPrint(JsonOutput.toJson(records))
println "CSV to JSON:"
println json
// Reverse: JSON back to CSV
println "\nJSON back to CSV:"
def jsonRecords = new groovy.json.JsonSlurper().parseText(json)
def csvHeaders = jsonRecords[0].keySet().toList()
println csvHeaders.join(',')
jsonRecords.each { record ->
println csvHeaders.collect { record[it] }.join(',')
}
Output
CSV to JSON:
[
{
"id": 1,
"name": "Alice",
"email": "alice@example.com",
"active": true
},
{
"id": 2,
"name": "Bob",
"email": "bob@example.com",
"active": true
},
{
"id": 3,
"name": "Charlie",
"email": "charlie@example.com",
"active": false
}
]
JSON back to CSV:
id,name,email,active
1,Alice,alice@example.com,true
2,Bob,bob@example.com,true
3,Charlie,charlie@example.com,false
What happened here: CSV-to-JSON conversion is a two-step process: parse CSV to maps, then serialize maps to JSON with JsonOutput.toJson(). We added type coercion to convert string values (“1”, “true”) to their proper types. The reverse (JSON to CSV) extracts keys as headers and values as rows. For more on JSON, see our JsonSlurper guide.
Example 9: Merge and Join CSV Data
What we’re doing: Joining two CSV data sets on a common key – like a SQL JOIN.
Example 9: Merge CSV Data
def employeeCsv = '''id,name,dept_id
1,Alice,101
2,Bob,102
3,Charlie,101
4,Diana,103'''
def departmentCsv = '''dept_id,dept_name,location
101,Engineering,Building A
102,Marketing,Building B
103,Sales,Building C'''
// Parse both CSVs
def parseCsv(String csv) {
def lines = csv.trim().split('\n')
def headers = lines[0].split(',')
lines[1..-1].collect { line ->
[headers, line.split(',')].transpose().collectEntries()
}
}
def employees = parseCsv(employeeCsv)
def departments = parseCsv(departmentCsv)
// Create a lookup map for departments
def deptLookup = departments.collectEntries { [it.dept_id, it] }
// Join: merge department info into each employee
def joined = employees.collect { emp ->
def dept = deptLookup[emp.dept_id]
emp + [dept_name: dept?.dept_name, location: dept?.location]
}
println "--- Joined Data ---"
joined.each { record ->
println "${record.name.padRight(10)} | ${record.dept_name?.padRight(12)} | ${record.location}"
}
// Write joined data as CSV
println "\n--- Joined CSV ---"
def joinHeaders = ['id', 'name', 'dept_name', 'location']
println joinHeaders.join(',')
joined.each { record ->
println joinHeaders.collect { record[it] }.join(',')
}
Output
--- Joined Data ---
Alice      | Engineering  | Building A
Bob        | Marketing    | Building B
Charlie    | Engineering  | Building A
Diana      | Sales        | Building C

--- Joined CSV ---
id,name,dept_name,location
1,Alice,Engineering,Building A
2,Bob,Marketing,Building B
3,Charlie,Engineering,Building A
4,Diana,Sales,Building C
What happened here: We created a lookup map from departments indexed by dept_id, then merged department data into each employee using the + operator on maps. This is essentially a LEFT JOIN. The ?. safe navigation handles cases where a department might not exist.
Example 10: Large CSV File Processing with eachLine
What we’re doing: Processing large CSV files line by line without loading everything into memory.
Example 10: Large File Processing
// Create a sample large CSV file
def tempFile = File.createTempFile('large', '.csv')
tempFile.deleteOnExit()
tempFile.withWriter { writer ->
writer.writeLine('id,value,category')
(1..100).each { i ->
def category = ['A', 'B', 'C'][i % 3]
writer.writeLine("${i},${(Math.random() * 1000).round(2)},${category}")
}
}
// Process line by line - memory efficient
def counts = [:].withDefault { 0 }
def sums = [:].withDefault { 0.0 }
def lineCount = 0
tempFile.eachLine('UTF-8') { line, lineNum ->
if (lineNum == 1) return // Skip header
def fields = line.split(',')
def category = fields[2]
def value = fields[1] as BigDecimal
counts[category]++
sums[category] += value
lineCount++
}
println "Processed ${lineCount} data rows"
println "\n--- Category Summary ---"
counts.sort().each { cat, count ->
def avg = (sums[cat] / count).round(2)
println " Category ${cat}: ${count} rows, total=${sums[cat].round(2)}, avg=${avg}"
}
// File size check
println "\nFile size: ${tempFile.length()} bytes"
println "Lines: ${tempFile.readLines().size()}"
Output
Processed 100 data rows

--- Category Summary ---
 Category A: 33 rows, total=16842.53, avg=510.38
 Category B: 34 rows, total=17156.21, avg=504.59
 Category C: 33 rows, total=15923.44, avg=482.53

File size: 1247 bytes
Lines: 101
What happened here: eachLine() processes one line at a time, so memory usage stays constant regardless of file size. We used withDefault() maps to accumulate counts and sums by category. Because the values come from Math.random(), the totals, averages, and file size will differ on each run (the category counts stay fixed at 33/34/33). This pattern handles gigabyte-sized CSV files without issues.
Example 11: Generate CSV Reports
What we’re doing: Generating formatted CSV reports from computed data.
Example 11: CSV Report Generation
def salesData = [
[month: 'January', product: 'Widget A', units: 150, revenue: 4500.00],
[month: 'January', product: 'Widget B', units: 200, revenue: 8000.00],
[month: 'February', product: 'Widget A', units: 175, revenue: 5250.00],
[month: 'February', product: 'Widget B', units: 220, revenue: 8800.00],
[month: 'March', product: 'Widget A', units: 190, revenue: 5700.00],
[month: 'March', product: 'Widget B', units: 180, revenue: 7200.00]
]
// Generate summary report
def summary = salesData.groupBy { it.month }.collect { month, sales ->
[
month: month,
total_units: sales.sum { it.units },
total_revenue: sales.sum { it.revenue },
avg_price: (sales.sum { it.revenue } / sales.sum { it.units }).round(2),
products: sales.size()
]
}
// Write summary CSV
def headers = ['month', 'total_units', 'total_revenue', 'avg_price', 'products']
println "--- Monthly Summary Report (CSV) ---"
println headers.join(',')
summary.each { row ->
println headers.collect { row[it] }.join(',')
}
// Add totals row
def totals = [
month: 'TOTAL',
total_units: summary.sum { it.total_units },
total_revenue: summary.sum { it.total_revenue },
avg_price: (summary.sum { it.total_revenue } / summary.sum { it.total_units }).round(2),
products: summary.sum { it.products }
]
println headers.collect { totals[it] }.join(',')
Output
--- Monthly Summary Report (CSV) ---
month,total_units,total_revenue,avg_price,products
January,350,12500.00,35.71,2
February,395,14050.00,35.57,2
March,370,12900.00,34.86,2
TOTAL,1115,39450.00,35.38,6
What happened here: We used groupBy() to aggregate sales by month, computed derived values (average price), and generated both detail and summary CSV rows. The totals row at the bottom is a common reporting pattern.
OpenCSV Integration
For production-quality CSV parsing with proper handling of quoted fields, embedded newlines, and custom configurations, use the OpenCSV library. It integrates directly with Groovy via @Grab.
OpenCSV Integration
// @Grab('com.opencsv:opencsv:5.9')
// import com.opencsv.CSVReader
// import com.opencsv.CSVWriter
// import com.opencsv.CSVReaderBuilder
// import com.opencsv.RFC4180Parser
// import com.opencsv.RFC4180ParserBuilder
// Reading with OpenCSV
// def reader = new CSVReader(new FileReader('data.csv'))
// def allRows = reader.readAll()
// reader.close()
// Writing with OpenCSV
// def writer = new CSVWriter(new FileWriter('output.csv'))
// writer.writeNext(['Name', 'Age', 'City'] as String[])
// writer.writeNext(['Alice', '30', 'Springfield'] as String[])
// writer.close()
// Simulated OpenCSV behavior
println "OpenCSV Usage Pattern:"
println " 1. Add @Grab('com.opencsv:opencsv:5.9')"
println " 2. Create CSVReader with FileReader"
println " 3. Use readAll() or readNext() for rows"
println " 4. Create CSVWriter with FileWriter"
println " 5. Use writeNext() for each row"
println ""
println "OpenCSV handles:"
println " - Quoted fields with commas inside"
println " - Escaped quotes (\"\")"
println " - Newlines within quoted fields"
println " - Custom separators, quote chars, escape chars"
println " - RFC 4180 compliance"
println " - Header mapping to Java beans"
Output
OpenCSV Usage Pattern:
1. Add @Grab('com.opencsv:opencsv:5.9')
2. Create CSVReader with FileReader
3. Use readAll() or readNext() for rows
4. Create CSVWriter with FileWriter
5. Use writeNext() for each row
OpenCSV handles:
- Quoted fields with commas inside
- Escaped quotes ("")
- Newlines within quoted fields
- Custom separators, quote chars, escape chars
- RFC 4180 compliance
- Header mapping to Java beans
Use manual parsing (split()) for simple, well-structured CSV. Switch to OpenCSV when you have quoted fields, embedded delimiters, or need RFC 4180 compliance.
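A minimal end-to-end sketch, assuming the OpenCSV 5.9 coordinates above resolve via `@Grab` (the first run needs network access to download the jar):

```groovy
@Grab('com.opencsv:opencsv:5.9')
import com.opencsv.CSVReader
import com.opencsv.CSVWriter

def tempFile = File.createTempFile('opencsv-demo', '.csv')
tempFile.deleteOnExit()

// CSVWriter applies quoting automatically
new CSVWriter(new FileWriter(tempFile)).withCloseable { w ->
    w.writeNext(['Name', 'City'] as String[])
    w.writeNext(['Smith, John', 'New York, NY'] as String[])
}

// CSVReader keeps the quoted commas inside a single field
new CSVReader(new FileReader(tempFile)).withCloseable { r ->
    def rows = r.readAll()
    assert rows[1][0] == 'Smith, John'   // one field, comma intact
    rows.each { println it.toList() }
}
```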
Edge Cases and Best Practices
Edge Case: Empty Fields and Trailing Commas
Empty Fields
// split() vs tokenize() with empty fields
def line = 'Alice,,Engineering,,95000'
def splitResult = line.split(',', -1) // -1 keeps trailing empties
def tokenResult = line.tokenize(',')
println "split(-1): ${splitResult.toList()} (${splitResult.length} fields)"
println "tokenize: ${tokenResult} (${tokenResult.size()} fields)"
// Trailing comma
def trailing = 'Alice,30,'
println "\nTrailing comma:"
println " split: ${trailing.split(',').toList()} (${trailing.split(',').length} fields)"
println " split(-1): ${trailing.split(',', -1).toList()} (${trailing.split(',', -1).length} fields)"
// BOM character (common in Excel exports)
def bomLine = '\uFEFFName,Age,City'
println "\nBOM detection:"
println " Has BOM: ${bomLine.startsWith('\uFEFF')}"
println " Cleaned: ${bomLine.replaceFirst('^\uFEFF', '')}"
Output
split(-1): [Alice, , Engineering, , 95000] (5 fields)
tokenize: [Alice, Engineering, 95000] (3 fields)

Trailing comma:
 split: [Alice, 30] (2 fields)
 split(-1): [Alice, 30, ] (3 fields)

BOM detection:
 Has BOM: true
 Cleaned: Name,Age,City
Best Practices
DO:
- Use `split(',', -1)` to preserve empty trailing fields
- Convert CSV rows to maps for readable, maintainable code
- Use `eachLine()` for large files to avoid loading everything into memory
- Strip BOM characters when reading files exported from Excel
- Use OpenCSV for CSV that contains quoted fields or embedded delimiters
DON’T:
- Use `tokenize()` for CSV – it drops empty fields
- Assume CSV files are always comma-delimited – check first
- Write CSV by hand when fields might contain commas – use proper quoting
- Forget to handle encoding – use `'UTF-8'` explicitly
Common Pitfalls
Pitfall 1: tokenize() Drops Empty Fields
Problem: Using tokenize() loses empty values, causing field misalignment.
Pitfall 1: tokenize vs split
def line = 'Alice,,Engineering'
// BAD - tokenize drops the empty field
def bad = line.tokenize(',')
println "tokenize: ${bad} - only ${bad.size()} fields!"
// GOOD - split preserves empty fields
def good = line.split(',', -1)
println "split: ${good.toList()} - ${good.length} fields (correct)"
Output
tokenize: [Alice, Engineering] - only 2 fields!
split: [Alice, , Engineering] - 3 fields (correct)
Solution: Always use split(',', -1) for CSV parsing. The -1 limit ensures trailing empty strings are preserved.
Pitfall 2: Commas Inside Values
Problem: Simple split(',') breaks when values contain commas.
Pitfall 2: Commas in Values
// This CSV has commas inside quoted values
def line = '"Smith, John",35,"New York, NY"'
// BAD - splits on commas inside quotes
def bad = line.split(',')
println "Naive split: ${bad.toList()} - ${bad.length} fields (wrong!)"
// GOOD - use a proper CSV parser
// For this post, see Example 5 for a quoted CSV parser
// Or use OpenCSV for production code
println "Use OpenCSV or a proper parser for quoted CSV fields"
Output
Naive split: ["Smith,  John", 35, "New York,  NY"] - 5 fields (wrong!)
Use OpenCSV or a proper parser for quoted CSV fields
Solution: If your CSV data might contain commas within values, the values must be quoted. Use OpenCSV or the custom parser from Example 5 to handle quoted fields correctly.
Pitfall 3: Type Confusion
Problem: All CSV values are strings – forgetting to convert causes unexpected behavior.
Pitfall 3: Type Confusion
def salary1 = '95000'
def salary2 = '100000'
// BAD - string comparison, not numeric
println "String compare: ${salary1 > salary2}" // true! ('9' > '1')
// GOOD - convert to numbers first
println "Number compare: ${(salary1 as int) > (salary2 as int)}" // false
// BAD - string addition
println "String add: ${salary1 + salary2}" // 95000100000
// GOOD - numeric addition
println "Number add: ${(salary1 as int) + (salary2 as int)}" // 195000
Output
String compare: true
Number compare: false
String add: 95000100000
Number add: 195000
Solution: Always convert CSV values to the appropriate type (as int, as BigDecimal, toBoolean()) before using them in comparisons or arithmetic.
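One way to automate the conversion is a small helper built on Groovy’s `isInteger()` and `isBigDecimal()` string checks – `coerce()` is a name introduced here for illustration, not a standard method:

```groovy
// Best-effort conversion of a raw CSV string to boolean, int, BigDecimal, or String.
def coerce(String raw) {
    if (raw.equalsIgnoreCase('true') || raw.equalsIgnoreCase('false')) {
        return raw.toBoolean()
    }
    if (raw.isInteger())    return raw as int
    if (raw.isBigDecimal()) return raw as BigDecimal
    return raw
}

assert coerce('95000') + coerce('100000') == 195000   // numeric, not string, addition
assert coerce('999.99') instanceof BigDecimal
assert coerce('true') == true
assert coerce('Alice') == 'Alice'
```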
Conclusion
Groovy takes most of the pain out of CSV processing. For simple, well-structured CSV, Groovy’s built-in split(), splitEachLine(), and join() methods handle parsing and generation in just a few lines. For complex CSV with quoted fields and special characters, OpenCSV provides RFC 4180 compliance with easy Groovy integration.
The key pattern is converting CSV rows to maps using the header row as keys – this unlocks Groovy’s full collection toolkit for filtering, grouping, sorting, and aggregating your data. This technique works well for data exports, report generation, and format conversion – Groovy makes CSV work feel effortless.
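Condensed, that core pattern fits in a few lines (a sketch for simple, unquoted CSV):

```groovy
def csv = '''name,dept
Alice,Engineering
Bob,Marketing'''

def lines   = csv.trim().split('\n')
def headers = lines[0].split(',')
def records = lines[1..-1].collect {
    [headers, it.split(',', -1)].transpose().collectEntries()
}

assert records*.name == ['Alice', 'Bob']
assert records.findAll { it.dept == 'Engineering' }*.name == ['Alice']
```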
For other data formats, check out our Groovy JSON Parsing guide and our Groovy File I/O Tutorial.
Summary
- Use `split(',', -1)` for CSV parsing – preserves empty fields unlike `tokenize()`
- Convert CSV rows to maps with `[headers, values].transpose().collectEntries()`
- Use `eachLine()` for memory-efficient processing of large CSV files
- Use `join(',')` for simple CSV writing; add quoting for values with commas
- Use OpenCSV (`@Grab('com.opencsv:opencsv:5.9')`) for production CSV with quoted fields
- Always convert string values to proper types before comparisons or arithmetic
If you also work with build tools, CI/CD pipelines, or cloud CLIs, check out Command Playground to practice 105+ CLI tools directly in your browser — no install needed.
Up next: Groovy YAML Processing
Frequently Asked Questions
How do I parse a CSV file in Groovy?
Use File.splitEachLine(',') { fields -> ... } for line-by-line parsing, or readLines() with split(',') to get all rows. Convert rows to maps using [headers, values].transpose().collectEntries() for named field access. For complex CSV with quoted fields, use the OpenCSV library via @Grab.
What is the difference between split() and tokenize() for CSV parsing?
split(',') returns a String array and preserves empty fields (e.g., 'a,,c'.split(',') gives ['a', '', 'c']). tokenize(',') returns a List but drops empty tokens (gives ['a', 'c']). Always use split(',', -1) for CSV parsing to preserve all fields including trailing empty ones.
How do I handle CSV files with quoted fields in Groovy?
Simple split(',') fails when values contain commas inside quotes. For production use, add OpenCSV with @Grab('com.opencsv:opencsv:5.9') and use CSVReader. For lightweight scripts, write a custom parser that tracks quote state while iterating through characters (see Example 5 in this guide).
How do I write CSV output in Groovy?
Use list.join(',') to convert a list of values to a CSV line, and File.withWriter { writer -> writer.writeLine(csvLine) } to write to a file. For values that might contain commas or quotes, wrap them in double quotes and escape internal quotes by doubling them. OpenCSV’s CSVWriter handles this automatically.
How do I convert CSV to JSON in Groovy?
First parse the CSV to a list of maps using [headers, values].transpose().collectEntries(). Then convert to JSON with JsonOutput.toJson(listOfMaps). Add type coercion (as int, toBoolean()) before serialization so numbers and booleans aren’t quoted in the JSON output.
Related Posts
Previous in Series: Groovy REST API Consumption
Next in Series: Groovy YAML Processing
This post is part of the Groovy & Grails Cookbook series on TechnoScripts.com