Groovy CSV parsing and generation with 10+ examples. Split, tokenize, OpenCSV, read/write CSV files, handle headers and special chars on Groovy 5.x.
“CSV files are the cockroaches of data formats – they survive everything and show up everywhere. Groovy gives you clean ways to deal with them.”
Last Updated: March 2026 | Tested on: Groovy 5.x, Java 17+ | Difficulty: Beginner to Intermediate | Reading Time: 22 minutes
CSV files are everywhere – data exports, spreadsheet dumps, log files, database migrations, configuration data. Despite being one of the oldest data formats around, CSV remains one of the most common ways to exchange tabular data. And Groovy makes working with CSV surprisingly pleasant.
This Groovy CSV guide covers everything from simple split() and tokenize() parsing to reliable solutions with the OpenCSV library. We’ll walk through reading CSV files, writing CSV output, handling headers, dealing with quoted fields and special characters, and converting CSV data to maps – all with tested examples.
If you’re also working with JSON data, check out our Groovy JSON Parsing with JsonSlurper guide. And for general file I/O patterns, see our Groovy File I/O Tutorial.
What Is CSV Processing in Groovy?
CSV (Comma-Separated Values) processing means reading, parsing, transforming, and writing tabular data stored in plain text files where each line is a row and commas separate the columns. Groovy offers two approaches: manual parsing with built-in string methods, and library-based parsing with OpenCSV for complex cases.
According to the Groovy Development Kit documentation, Groovy enhances Java’s I/O classes with methods like File.text, eachLine(), and splitEachLine() that make file processing concise.
Key Points:
- Simple CSV can be parsed with `split(',')` or `tokenize(',')`
- `File.splitEachLine()` is a built-in Groovy method for line-by-line CSV parsing
- For quoted fields, embedded commas, and newlines in values, use OpenCSV
- Groovy’s collection methods (`collect()`, `groupBy()`, `findAll()`) work beautifully with parsed CSV data
- Writing CSV is simple with `join(',')` and file I/O methods
- OpenCSV handles edge cases: quoted fields, escaped quotes, different delimiters, and encodings
Why Use Groovy for CSV?
- Built-in file methods – `splitEachLine()`, `eachLine()`, and `readLines()` handle file I/O natively
- Collection power – filter, transform, group, and aggregate CSV data with one-liners
- No boilerplate – compare Groovy’s 5-line CSV reader to Java’s 20+ lines
- Map conversion – easily convert CSV rows to maps using headers as keys
- Library support – OpenCSV integrates directly via `@Grab`
Basic Syntax
The simplest way to parse CSV in Groovy is with string splitting:
Basic CSV Parsing
// Method 1: split() - returns an array
def line = 'Alice,30,Engineering'
def fields = line.split(',')
println "Name: ${fields[0]}, Age: ${fields[1]}, Dept: ${fields[2]}"
// Method 2: tokenize() - returns a list
def fieldList = line.tokenize(',')
println "Name: ${fieldList[0]}, Age: ${fieldList[1]}, Dept: ${fieldList[2]}"
// Method 3: splitEachLine() - built into File
// new File('data.csv').splitEachLine(',') { fields ->
// println fields
// }
The key difference: split() returns a String[] array and keeps empty fields, while tokenize() returns a List and drops empty tokens. For CSV, split() is usually the better choice.
| Method | Returns | Empty Fields | Best For |
|---|---|---|---|
| `split(',')` | String[] | Preserved | Most CSV parsing |
| `tokenize(',')` | List | Dropped | Simple delimited data |
| `splitEachLine()` | via closure | Preserved | File-based CSV reading |
| OpenCSV | String[] | Preserved | Complex CSV with quotes |
10+ Practical Examples
Example 1: Parse a Simple CSV String
What we’re doing: Splitting CSV lines into fields and processing them.
Example 1: Simple CSV Parsing
def csvData = '''Name,Age,Department,Salary
Alice,30,Engineering,95000
Bob,25,Marketing,72000
Charlie,35,Engineering,88000
Diana,28,Sales,68000
Eve,32,Engineering,102000'''
def lines = csvData.trim().split('\n')
def headers = lines[0].split(',')
def rows = lines[1..-1].collect { it.split(',') }
println "Headers: ${headers.toList()}"
println "Rows: ${rows.size()}"
println ""
rows.each { row ->
println "${row[0].padRight(10)} | ${row[1]} | ${row[2].padRight(12)} | \$${row[3]}"
}
Output
Headers: [Name, Age, Department, Salary]
Rows: 5

Alice      | 30 | Engineering  | $95000
Bob        | 25 | Marketing    | $72000
Charlie    | 35 | Engineering  | $88000
Diana      | 28 | Sales        | $68000
Eve        | 32 | Engineering  | $102000
What happened here: We split the CSV text into lines, extracted the header row, and processed data rows separately. The lines[1..-1] range skips the header. Each row is split by comma into an array of fields.
Example 2: Convert CSV Rows to Maps
What we’re doing: Mapping header names to row values so you can access fields by name instead of index.
Example 2: CSV to Maps
def csvData = '''Name,Age,Department,Salary
Alice,30,Engineering,95000
Bob,25,Marketing,72000
Charlie,35,Engineering,88000'''
def lines = csvData.trim().split('\n')
def headers = lines[0].split(',')
// Convert each row to a map using headers as keys
def records = lines[1..-1].collect { line ->
def values = line.split(',')
[headers, values].transpose().collectEntries()
}
// Now access by name instead of index!
records.each { record ->
println "${record.Name} works in ${record.Department} and earns \$${record.Salary}"
}
println "\n--- Map structure ---"
println records[0]
// Filter and aggregate using map keys
def engineers = records.findAll { it.Department == 'Engineering' }
println "\nEngineers: ${engineers*.Name}"
def avgSalary = records.collect { it.Salary as int }.sum() / records.size()
println "Average salary: \$${avgSalary.round(0)}"
Output
Alice works in Engineering and earns $95000
Bob works in Marketing and earns $72000
Charlie works in Engineering and earns $88000

--- Map structure ---
[Name:Alice, Age:30, Department:Engineering, Salary:95000]

Engineers: [Alice, Charlie]
Average salary: $85000
What happened here: The transpose() method pairs up headers with values, and collectEntries() converts those pairs into a map. This is the most useful pattern for CSV – once rows are maps, you access fields by name and use all of Groovy’s collection methods.
Example 3: Read CSV from a File
What we’re doing: Reading a CSV file using Groovy’s built-in file methods.
Example 3: Read CSV File
// Create a sample CSV file
def csvContent = '''id,product,price,quantity
1,Laptop,999.99,10
2,Phone,699.99,25
3,Tablet,449.99,15
4,Monitor,349.99,8
5,Keyboard,79.99,50'''
def tempFile = File.createTempFile('products', '.csv')
tempFile.text = csvContent
tempFile.deleteOnExit()
// Method 1: splitEachLine (Groovy built-in)
println "--- splitEachLine ---"
def firstRow = true
tempFile.splitEachLine(',') { fields ->
if (firstRow) {
firstRow = false // Skip header
return
}
println "Product: ${fields[1]}, Price: \$${fields[2]}, Qty: ${fields[3]}"
}
// Method 2: readLines + collect for full control
println "\n--- readLines + collect ---"
def allLines = tempFile.readLines()
def headers = allLines[0].split(',')
def products = allLines[1..-1].collect { line ->
def vals = line.split(',')
[headers, vals].transpose().collectEntries()
}
// Calculate total inventory value
def totalValue = products.sum { (it.price as BigDecimal) * (it.quantity as int) }
println "Total inventory value: \$${totalValue}"
// Most expensive product
def priciest = products.max { it.price as BigDecimal }
println "Most expensive: ${priciest.product} (\$${priciest.price})"
Output
--- splitEachLine ---
Product: Laptop, Price: $999.99, Qty: 10
Product: Phone, Price: $699.99, Qty: 25
Product: Tablet, Price: $449.99, Qty: 15
Product: Monitor, Price: $349.99, Qty: 8
Product: Keyboard, Price: $79.99, Qty: 50

--- readLines + collect ---
Total inventory value: $41048.92
Most expensive: Laptop ($999.99)
What happened here: splitEachLine(delimiter) is a Groovy GDK method on File that splits each line by the delimiter and passes the result to a closure – perfect for simple CSV. For more control, readLines() gives you all lines as a list that you can process with collection methods.
Example 4: Write CSV to a File
What we’re doing: Generating CSV output from Groovy data structures and writing to a file.
Example 4: Write CSV
// Data to write
def employees = [
[name: 'Alice', age: 30, dept: 'Engineering', salary: 95000],
[name: 'Bob', age: 25, dept: 'Marketing', salary: 72000],
[name: 'Charlie', age: 35, dept: 'Engineering', salary: 88000],
[name: 'Diana', age: 28, dept: 'Sales', salary: 68000]
]
def tempFile = File.createTempFile('employees', '.csv')
tempFile.deleteOnExit()
// Write CSV with headers
def headers = ['name', 'age', 'dept', 'salary']
tempFile.withWriter('UTF-8') { writer ->
// Write header row
writer.writeLine(headers.join(','))
// Write data rows
employees.each { emp ->
def row = headers.collect { emp[it] }
writer.writeLine(row.join(','))
}
}
// Verify the output
println "Written CSV:"
println tempFile.text
// Alternative: build CSV string first
def csvString = new StringBuilder()
csvString.append(headers.join(',') + '\n')
employees.each { emp ->
csvString.append(headers.collect { emp[it] }.join(',') + '\n')
}
println "String CSV:\n${csvString}"
Output
Written CSV:
name,age,dept,salary
Alice,30,Engineering,95000
Bob,25,Marketing,72000
Charlie,35,Engineering,88000
Diana,28,Sales,68000

String CSV:
name,age,dept,salary
Alice,30,Engineering,95000
Bob,25,Marketing,72000
Charlie,35,Engineering,88000
Diana,28,Sales,68000
What happened here: Writing CSV is simple – join(',') converts a list of values into a comma-separated line. The withWriter block ensures the file is properly closed. We use the same header list to maintain column order consistency.
Example 5: Handle Quoted CSV Fields
What we’re doing: Parsing CSV where values contain commas, quotes, and newlines – which require quoting.
Example 5: Quoted Fields
// Simple CSV quoting function
def parseQuotedCsv(String line) {
def fields = []
def current = new StringBuilder()
boolean inQuotes = false
for (int i = 0; i < line.length(); i++) {
char c = line.charAt(i)
if (c == '"' as char) {
if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"' as char) {
current.append('"') // Escaped quote
i++
} else {
inQuotes = !inQuotes
}
} else if (c == ',' as char && !inQuotes) {
fields.add(current.toString())
current = new StringBuilder()
} else {
current.append(c)
}
}
fields.add(current.toString())
return fields
}
// Test with tricky data
def testLines = [
'Alice,30,Engineering', // Simple
'"Smith, John",35,"New York, NY"', // Commas in values
'"She said ""hello""",40,Sales', // Escaped quotes
'Bob,,Marketing' // Empty field
]
testLines.each { line ->
def fields = parseQuotedCsv(line)
println "Input: ${line}"
println "Fields: ${fields}"
println ""
}
// Quoting function for writing
def quoteCsvField(String value) {
if (value.contains(',') || value.contains('"') || value.contains('\n')) {
return '"' + value.replace('"', '""') + '"'
}
return value
}
println "Quoted: ${quoteCsvField('Smith, John')}"
println "Quoted: ${quoteCsvField('She said "hello"')}"
println "Quoted: ${quoteCsvField('Simple')}"
Output
Input: Alice,30,Engineering
Fields: [Alice, 30, Engineering]

Input: "Smith, John",35,"New York, NY"
Fields: [Smith, John, 35, New York, NY]

Input: "She said ""hello""",40,Sales
Fields: [She said "hello", 40, Sales]

Input: Bob,,Marketing
Fields: [Bob, , Marketing]

Quoted: "Smith, John"
Quoted: "She said ""hello"""
Quoted: Simple
What happened here: Real-world CSV often contains commas inside values (like “New York, NY”) and quotes inside values (escaped as ""). The parseQuotedCsv() function handles these cases by tracking whether we’re inside quotes. For production code, use OpenCSV instead of rolling your own parser.
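On the writing side, the same quoting rules can be folded into a single row formatter. A minimal sketch – `writeCsvRow` is a hypothetical helper name, not a Groovy built-in:

```groovy
// Hypothetical helper: quote a field only when it needs quoting, then join the row.
def writeCsvRow(List values) {
    values.collect { v ->
        def s = v.toString()
        if (s.contains(',') || s.contains('"') || s.contains('\n')) {
            '"' + s.replace('"', '""') + '"'   // wrap and double embedded quotes
        } else {
            s
        }
    }.join(',')
}

assert writeCsvRow(['Smith, John', 35, 'New York, NY']) == '"Smith, John",35,"New York, NY"'
assert writeCsvRow(['She said "hi"', 'plain']) == '"She said ""hi""",plain'
```

Rows written this way parse back correctly with the `parseQuotedCsv()` function above.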
Example 6: CSV with Different Delimiters
What we’re doing: Parsing files that use tab, semicolon, or pipe as delimiters instead of comma.
Example 6: Different Delimiters
// Tab-separated values (TSV)
def tsvData = "Name\tAge\tCity\nAlice\t30\tSpringfield\nBob\t25\tShelbyville"
println "--- TSV (tab-separated) ---"
tsvData.split('\n').each { line ->
def fields = line.split('\t')
println fields.collect { it.padRight(15) }.join('| ')
}
// Semicolon-separated (common in European locales)
def ssvData = "Name;Price;Quantity\nLaptop;999,99;10\nPhone;699,99;25"
println "\n--- SSV (semicolon-separated) ---"
ssvData.split('\n').each { line ->
def fields = line.split(';')
println fields.join(' | ')
}
// Pipe-separated
def psvData = "ID|Name|Status\n1|Alice|Active\n2|Bob|Inactive"
println "\n--- PSV (pipe-separated) ---"
psvData.split('\n').each { line ->
def fields = line.split('\\|') // Pipe needs escaping in regex
println fields.join(' - ')
}
// Generic parser function
def parseDelimited(String data, String delimiter) {
def lines = data.trim().split('\n')
def headers = lines[0].split(delimiter.replaceAll('([|\\\\^$.+*?()\\[\\]{}])', '\\\\$1'))
return lines[1..-1].collect { line ->
def vals = line.split(delimiter.replaceAll('([|\\\\^$.+*?()\\[\\]{}])', '\\\\$1'))
[headers, vals].transpose().collectEntries()
}
}
println "\n--- Generic parser ---"
def result = parseDelimited(psvData, '|')
result.each { println it }
Output
--- TSV (tab-separated) ---
Name           | Age            | City
Alice          | 30             | Springfield
Bob            | 25             | Shelbyville

--- SSV (semicolon-separated) ---
Name | Price | Quantity
Laptop | 999,99 | 10
Phone | 699,99 | 25

--- PSV (pipe-separated) ---
ID - Name - Status
1 - Alice - Active
2 - Bob - Inactive

--- Generic parser ---
[ID:1, Name:Alice, Status:Active]
[ID:2, Name:Bob, Status:Inactive]
What happened here: Not all “CSV” uses commas. Tab-separated (TSV) is common in spreadsheet exports, semicolons are standard in European Excel exports (where comma is the decimal separator), and pipes are used in database dumps. The same split() approach works – just change the delimiter. Note that pipe (|) needs escaping since split() takes a regex.
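The escaping regex inside `parseDelimited()` can be simplified with `java.util.regex.Pattern.quote()`, which turns any delimiter into a literal (non-regex) pattern. A sketch of the same parser with that change – `parseDelimitedLiteral` is my name for it, not a standard API:

```groovy
import java.util.regex.Pattern

// Treat the delimiter as a literal string, never as a regex.
def parseDelimitedLiteral(String data, String delimiter) {
    def quoted = Pattern.quote(delimiter)   // '|' becomes '\Q|\E' and is matched literally
    def lines = data.trim().split('\n')
    def headers = lines[0].split(quoted)
    lines[1..-1].collect { line ->
        [headers, line.split(quoted, -1)].transpose().collectEntries()
    }
}

def rows = parseDelimitedLiteral("ID|Name\n1|Alice\n2|Bob", '|')
assert rows*.Name == ['Alice', 'Bob']
assert rows[0] == [ID: '1', Name: 'Alice']
```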
Example 7: Filter, Sort, and Aggregate CSV Data
What we’re doing: Using Groovy’s collection methods to analyze CSV data.
Example 7: CSV Data Analysis
def csvData = '''Name,Department,Salary,Years
Alice,Engineering,95000,5
Bob,Marketing,72000,3
Charlie,Engineering,88000,7
Diana,Sales,68000,2
Eve,Engineering,102000,8
Frank,Marketing,76000,4
Grace,Sales,71000,3
Heidi,Engineering,91000,6'''
def lines = csvData.trim().split('\n')
def headers = lines[0].split(',')
def employees = lines[1..-1].collect { line ->
def vals = line.split(',')
[headers, vals].transpose().collectEntries()
}
// Sort by salary (descending)
println "--- Top Earners ---"
employees.sort { -(it.Salary as int) }.take(3).each {
println " ${it.Name}: \$${it.Salary}"
}
// Group by department
println "\n--- Department Summary ---"
employees.groupBy { it.Department }.each { dept, members ->
def avgSalary = members.collect { it.Salary as int }.sum() / members.size()
def avgYears = members.collect { it.Years as int }.sum() / members.size()
println " ${dept}: ${members.size()} people, avg salary \$${avgSalary.round(0)}, avg years ${avgYears.round(1)}"
}
// Filter: experienced and well-paid
def senior = employees.findAll { (it.Years as int) >= 5 && (it.Salary as int) >= 90000 }
println "\n--- Senior High Earners (5+ years, \$90k+) ---"
senior.each { println " ${it.Name} (${it.Years} years, \$${it.Salary})" }
// Total payroll
def totalPayroll = employees.sum { it.Salary as int }
println "\nTotal payroll: \$${totalPayroll}"
Output
--- Top Earners ---
 Eve: $102000
 Alice: $95000
 Heidi: $91000

--- Department Summary ---
 Engineering: 4 people, avg salary $94000, avg years 6.5
 Marketing: 2 people, avg salary $74000, avg years 3.5
 Sales: 2 people, avg salary $69500, avg years 2.5

--- Senior High Earners (5+ years, $90k+) ---
 Eve (8 years, $102000)
 Alice (5 years, $95000)
 Heidi (6 years, $91000)

Total payroll: $663000
What happened here: Once CSV is converted to a list of maps, you can use Groovy’s entire collection toolkit. sort() with a negative value gives descending order, groupBy() categorizes by any field, and findAll() filters with complex conditions.
Example 8: CSV to JSON Conversion
What we’re doing: Converting CSV data to JSON format – a common data transformation task.
Example 8: CSV to JSON
import groovy.json.JsonOutput
def csvData = '''id,name,email,active
1,Alice,alice@example.com,true
2,Bob,bob@example.com,true
3,Charlie,charlie@example.com,false'''
// Parse CSV to maps
def lines = csvData.trim().split('\n')
def headers = lines[0].split(',')
def records = lines[1..-1].collect { line ->
def vals = line.split(',')
def map = [headers, vals].transpose().collectEntries()
// Type coercion
map.id = map.id as int
map.active = map.active.toBoolean()
return map
}
// Convert to JSON
def json = JsonOutput.prettyPrint(JsonOutput.toJson(records))
println "CSV to JSON:"
println json
// Reverse: JSON back to CSV
println "\nJSON back to CSV:"
def jsonRecords = new groovy.json.JsonSlurper().parseText(json)
def csvHeaders = jsonRecords[0].keySet().toList()
println csvHeaders.join(',')
jsonRecords.each { record ->
println csvHeaders.collect { record[it] }.join(',')
}
Output
CSV to JSON:
[
{
"id": 1,
"name": "Alice",
"email": "alice@example.com",
"active": true
},
{
"id": 2,
"name": "Bob",
"email": "bob@example.com",
"active": true
},
{
"id": 3,
"name": "Charlie",
"email": "charlie@example.com",
"active": false
}
]
JSON back to CSV:
id,name,email,active
1,Alice,alice@example.com,true
2,Bob,bob@example.com,true
3,Charlie,charlie@example.com,false
What happened here: CSV-to-JSON conversion is a two-step process: parse CSV to maps, then serialize maps to JSON with JsonOutput.toJson(). We added type coercion to convert string values (“1”, “true”) to their proper types. The reverse (JSON to CSV) extracts keys as headers and values as rows. For more on JSON, see our JsonSlurper guide.
Example 9: Merge and Join CSV Data
What we’re doing: Joining two CSV data sets on a common key – like a SQL JOIN.
Example 9: Merge CSV Data
def employeeCsv = '''id,name,dept_id
1,Alice,101
2,Bob,102
3,Charlie,101
4,Diana,103'''
def departmentCsv = '''dept_id,dept_name,location
101,Engineering,Building A
102,Marketing,Building B
103,Sales,Building C'''
// Parse both CSVs
def parseCsv(String csv) {
def lines = csv.trim().split('\n')
def headers = lines[0].split(',')
lines[1..-1].collect { line ->
[headers, line.split(',')].transpose().collectEntries()
}
}
def employees = parseCsv(employeeCsv)
def departments = parseCsv(departmentCsv)
// Create a lookup map for departments
def deptLookup = departments.collectEntries { [it.dept_id, it] }
// Join: merge department info into each employee
def joined = employees.collect { emp ->
def dept = deptLookup[emp.dept_id]
emp + [dept_name: dept?.dept_name, location: dept?.location]
}
println "--- Joined Data ---"
joined.each { record ->
println "${record.name.padRight(10)} | ${record.dept_name?.padRight(12)} | ${record.location}"
}
// Write joined data as CSV
println "\n--- Joined CSV ---"
def joinHeaders = ['id', 'name', 'dept_name', 'location']
println joinHeaders.join(',')
joined.each { record ->
println joinHeaders.collect { record[it] }.join(',')
}
Output
--- Joined Data ---
Alice      | Engineering  | Building A
Bob        | Marketing    | Building B
Charlie    | Engineering  | Building A
Diana      | Sales        | Building C

--- Joined CSV ---
id,name,dept_name,location
1,Alice,Engineering,Building A
2,Bob,Marketing,Building B
3,Charlie,Engineering,Building A
4,Diana,Sales,Building C
What happened here: We created a lookup map from departments indexed by dept_id, then merged department data into each employee using the + operator on maps. This is essentially a LEFT JOIN. The ?. safe navigation handles cases where a department might not exist.
Example 10: Large CSV File Processing with eachLine
What we’re doing: Processing large CSV files line by line without loading everything into memory.
Example 10: Large File Processing
// Create a sample large CSV file
def tempFile = File.createTempFile('large', '.csv')
tempFile.deleteOnExit()
tempFile.withWriter { writer ->
writer.writeLine('id,value,category')
(1..100).each { i ->
def category = ['A', 'B', 'C'][i % 3]
writer.writeLine("${i},${(Math.random() * 1000).round(2)},${category}")
}
}
// Process line by line - memory efficient
def counts = [:].withDefault { 0 }
def sums = [:].withDefault { 0.0 }
def lineCount = 0
tempFile.eachLine('UTF-8') { line, lineNum ->
if (lineNum == 1) return // Skip header
def fields = line.split(',')
def category = fields[2]
def value = fields[1] as BigDecimal
counts[category]++
sums[category] += value
lineCount++
}
println "Processed ${lineCount} data rows"
println "\n--- Category Summary ---"
counts.sort().each { cat, count ->
def avg = (sums[cat] / count).round(2)
println " Category ${cat}: ${count} rows, total=${sums[cat].round(2)}, avg=${avg}"
}
// File size check
println "\nFile size: ${tempFile.length()} bytes"
println "Lines: ${tempFile.readLines().size()}"
Output
Processed 100 data rows

--- Category Summary ---
 Category A: 33 rows, total=16842.53, avg=510.38
 Category B: 34 rows, total=17156.21, avg=504.59
 Category C: 33 rows, total=15923.44, avg=482.53

File size: 1247 bytes
Lines: 101
What happened here: eachLine() processes one line at a time, so memory usage stays constant regardless of file size. We used withDefault() maps to accumulate counts and sums by category. Because the values come from Math.random(), the totals, averages, and file size will differ on each run (the category counts stay fixed at 33/34/33). This pattern handles gigabyte-sized CSV files without issues.
Example 11: Generate CSV Reports
What we’re doing: Generating formatted CSV reports from computed data.
Example 11: CSV Report Generation
def salesData = [
[month: 'January', product: 'Widget A', units: 150, revenue: 4500.00],
[month: 'January', product: 'Widget B', units: 200, revenue: 8000.00],
[month: 'February', product: 'Widget A', units: 175, revenue: 5250.00],
[month: 'February', product: 'Widget B', units: 220, revenue: 8800.00],
[month: 'March', product: 'Widget A', units: 190, revenue: 5700.00],
[month: 'March', product: 'Widget B', units: 180, revenue: 7200.00]
]
// Generate summary report
def summary = salesData.groupBy { it.month }.collect { month, sales ->
[
month: month,
total_units: sales.sum { it.units },
total_revenue: sales.sum { it.revenue },
avg_price: (sales.sum { it.revenue } / sales.sum { it.units }).round(2),
products: sales.size()
]
}
// Write summary CSV
def headers = ['month', 'total_units', 'total_revenue', 'avg_price', 'products']
println "--- Monthly Summary Report (CSV) ---"
println headers.join(',')
summary.each { row ->
println headers.collect { row[it] }.join(',')
}
// Add totals row
def totals = [
month: 'TOTAL',
total_units: summary.sum { it.total_units },
total_revenue: summary.sum { it.total_revenue },
avg_price: (summary.sum { it.total_revenue } / summary.sum { it.total_units }).round(2),
products: summary.sum { it.products }
]
println headers.collect { totals[it] }.join(',')
Output
--- Monthly Summary Report (CSV) ---
month,total_units,total_revenue,avg_price,products
January,350,12500.00,35.71,2
February,395,14050.00,35.57,2
March,370,12900.00,34.86,2
TOTAL,1115,39450.00,35.38,6
What happened here: We used groupBy() to aggregate sales by month, computed derived values (average price), and generated both detail and summary CSV rows. The totals row at the bottom is a common reporting pattern.
OpenCSV Integration
For production-quality CSV parsing with proper handling of quoted fields, embedded newlines, and custom configurations, use the OpenCSV library. It integrates directly with Groovy via @Grab.
OpenCSV Integration
// @Grab('com.opencsv:opencsv:5.9')
// import com.opencsv.CSVReader
// import com.opencsv.CSVWriter
// import com.opencsv.CSVReaderBuilder
// import com.opencsv.RFC4180Parser
// import com.opencsv.RFC4180ParserBuilder
// Reading with OpenCSV
// def reader = new CSVReader(new FileReader('data.csv'))
// def allRows = reader.readAll()
// reader.close()
// Writing with OpenCSV
// def writer = new CSVWriter(new FileWriter('output.csv'))
// writer.writeNext(['Name', 'Age', 'City'] as String[])
// writer.writeNext(['Alice', '30', 'Springfield'] as String[])
// writer.close()
// Simulated OpenCSV behavior
println "OpenCSV Usage Pattern:"
println " 1. Add @Grab('com.opencsv:opencsv:5.9')"
println " 2. Create CSVReader with FileReader"
println " 3. Use readAll() or readNext() for rows"
println " 4. Create CSVWriter with FileWriter"
println " 5. Use writeNext() for each row"
println ""
println "OpenCSV handles:"
println " - Quoted fields with commas inside"
println " - Escaped quotes (\"\")"
println " - Newlines within quoted fields"
println " - Custom separators, quote chars, escape chars"
println " - RFC 4180 compliance"
println " - Header mapping to Java beans"
Output
OpenCSV Usage Pattern:
1. Add @Grab('com.opencsv:opencsv:5.9')
2. Create CSVReader with FileReader
3. Use readAll() or readNext() for rows
4. Create CSVWriter with FileWriter
5. Use writeNext() for each row
OpenCSV handles:
- Quoted fields with commas inside
- Escaped quotes ("")
- Newlines within quoted fields
- Custom separators, quote chars, escape chars
- RFC 4180 compliance
- Header mapping to Java beans
Use manual parsing (split()) for simple, well-structured CSV. Switch to OpenCSV when you have quoted fields, embedded delimiters, or need RFC 4180 compliance.
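A minimal end-to-end sketch, assuming the OpenCSV 5.9 coordinates above resolve via `@Grab` (the first run needs network access to download the jar):

```groovy
@Grab('com.opencsv:opencsv:5.9')
import com.opencsv.CSVReader
import com.opencsv.CSVWriter

def tempFile = File.createTempFile('opencsv-demo', '.csv')
tempFile.deleteOnExit()

// CSVWriter applies quoting automatically
new CSVWriter(new FileWriter(tempFile)).withCloseable { w ->
    w.writeNext(['Name', 'City'] as String[])
    w.writeNext(['Smith, John', 'New York, NY'] as String[])
}

// CSVReader keeps the quoted commas inside a single field
new CSVReader(new FileReader(tempFile)).withCloseable { r ->
    def rows = r.readAll()
    assert rows[1][0] == 'Smith, John'   // one field, comma intact
    rows.each { println it.toList() }
}
```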
Edge Cases and Best Practices
Edge Case: Empty Fields and Trailing Commas
Empty Fields
// split() vs tokenize() with empty fields
def line = 'Alice,,Engineering,,95000'
def splitResult = line.split(',', -1) // -1 keeps trailing empties
def tokenResult = line.tokenize(',')
println "split(-1): ${splitResult.toList()} (${splitResult.length} fields)"
println "tokenize: ${tokenResult} (${tokenResult.size()} fields)"
// Trailing comma
def trailing = 'Alice,30,'
println "\nTrailing comma:"
println " split: ${trailing.split(',').toList()} (${trailing.split(',').length} fields)"
println " split(-1): ${trailing.split(',', -1).toList()} (${trailing.split(',', -1).length} fields)"
// BOM character (common in Excel exports)
def bomLine = '\uFEFFName,Age,City'
println "\nBOM detection:"
println " Has BOM: ${bomLine.startsWith('\uFEFF')}"
println " Cleaned: ${bomLine.replaceFirst('^\uFEFF', '')}"
Output
split(-1): [Alice, , Engineering, , 95000] (5 fields)
tokenize: [Alice, Engineering, 95000] (3 fields)

Trailing comma:
 split: [Alice, 30] (2 fields)
 split(-1): [Alice, 30, ] (3 fields)

BOM detection:
 Has BOM: true
 Cleaned: Name,Age,City
Best Practices
DO:
- Use `split(',', -1)` to preserve empty trailing fields
- Convert CSV rows to maps for readable, maintainable code
- Use `eachLine()` for large files to avoid loading everything into memory
- Strip BOM characters when reading files exported from Excel
- Use OpenCSV for CSV that contains quoted fields or embedded delimiters
DON’T:
- Use `tokenize()` for CSV – it drops empty fields
- Assume CSV files are always comma-delimited – check first
- Write CSV by hand when fields might contain commas – use proper quoting
- Forget to handle encoding – use `'UTF-8'` explicitly
Common Pitfalls
Pitfall 1: tokenize() Drops Empty Fields
Problem: Using tokenize() loses empty values, causing field misalignment.
Pitfall 1: tokenize vs split
def line = 'Alice,,Engineering'
// BAD - tokenize drops the empty field
def bad = line.tokenize(',')
println "tokenize: ${bad} - only ${bad.size()} fields!"
// GOOD - split preserves empty fields
def good = line.split(',', -1)
println "split: ${good.toList()} - ${good.length} fields (correct)"
Output
tokenize: [Alice, Engineering] - only 2 fields!
split: [Alice, , Engineering] - 3 fields (correct)
Solution: Always use split(',', -1) for CSV parsing. The -1 limit ensures trailing empty strings are preserved.
Pitfall 2: Commas Inside Values
Problem: Simple split(',') breaks when values contain commas.
Pitfall 2: Commas in Values
// This CSV has commas inside quoted values
def line = '"Smith, John",35,"New York, NY"'
// BAD - splits on commas inside quotes
def bad = line.split(',')
println "Naive split: ${bad.toList()} - ${bad.length} fields (wrong!)"
// GOOD - use a proper CSV parser
// For this post, see Example 5 for a quoted CSV parser
// Or use OpenCSV for production code
println "Use OpenCSV or a proper parser for quoted CSV fields"
Output
Naive split: ["Smith,  John", 35, "New York,  NY"] - 5 fields (wrong!)
Use OpenCSV or a proper parser for quoted CSV fields
Solution: If your CSV data might contain commas within values, the values must be quoted. Use OpenCSV or the custom parser from Example 5 to handle quoted fields correctly.
Pitfall 3: Type Confusion
Problem: All CSV values are strings – forgetting to convert causes unexpected behavior.
Pitfall 3: Type Confusion
def salary1 = '95000'
def salary2 = '100000'
// BAD - string comparison, not numeric
println "String compare: ${salary1 > salary2}" // true! ('9' > '1')
// GOOD - convert to numbers first
println "Number compare: ${(salary1 as int) > (salary2 as int)}" // false
// BAD - string addition
println "String add: ${salary1 + salary2}" // 95000100000
// GOOD - numeric addition
println "Number add: ${(salary1 as int) + (salary2 as int)}" // 195000
Output
String compare: true
Number compare: false
String add: 95000100000
Number add: 195000
Solution: Always convert CSV values to the appropriate type (as int, as BigDecimal, toBoolean()) before using them in comparisons or arithmetic.
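One way to automate the conversion is a small helper built on Groovy’s `isInteger()` and `isBigDecimal()` string checks – `coerce()` is a name introduced here for illustration, not a standard method:

```groovy
// Best-effort conversion of a raw CSV string to boolean, int, BigDecimal, or String.
def coerce(String raw) {
    if (raw.equalsIgnoreCase('true') || raw.equalsIgnoreCase('false')) {
        return raw.toBoolean()
    }
    if (raw.isInteger())    return raw as int
    if (raw.isBigDecimal()) return raw as BigDecimal
    return raw
}

assert coerce('95000') + coerce('100000') == 195000   // numeric, not string, addition
assert coerce('999.99') instanceof BigDecimal
assert coerce('true') == true
assert coerce('Alice') == 'Alice'
```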
Conclusion
Groovy takes most of the pain out of CSV processing. For simple, well-structured CSV, Groovy’s built-in split(), splitEachLine(), and join() methods handle parsing and generation in just a few lines. For complex CSV with quoted fields and special characters, OpenCSV provides RFC 4180 compliance with easy Groovy integration.
The key pattern is converting CSV rows to maps using the header row as keys – this unlocks Groovy’s full collection toolkit for filtering, grouping, sorting, and aggregating your data. This technique works well for data exports, report generation, and format conversion – Groovy makes CSV work feel effortless.
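Condensed, that core pattern fits in a few lines (a sketch for simple, unquoted CSV):

```groovy
def csv = '''name,dept
Alice,Engineering
Bob,Marketing'''

def lines   = csv.trim().split('\n')
def headers = lines[0].split(',')
def records = lines[1..-1].collect {
    [headers, it.split(',', -1)].transpose().collectEntries()
}

assert records*.name == ['Alice', 'Bob']
assert records.findAll { it.dept == 'Engineering' }*.name == ['Alice']
```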
For other data formats, check out our Groovy JSON Parsing guide and our Groovy File I/O Tutorial.
Summary
- Use `split(',', -1)` for CSV parsing – preserves empty fields unlike `tokenize()`
- Convert CSV rows to maps with `[headers, values].transpose().collectEntries()`
- Use `eachLine()` for memory-efficient processing of large CSV files
- Use `join(',')` for simple CSV writing; add quoting for values with commas
- Use OpenCSV (`@Grab('com.opencsv:opencsv:5.9')`) for production CSV with quoted fields
- Always convert string values to proper types before comparisons or arithmetic
If you also work with build tools, CI/CD pipelines, or cloud CLIs, check out Command Playground to practice 105+ CLI tools directly in your browser — no install needed.
Up next: Groovy YAML Processing
Frequently Asked Questions
How do I parse a CSV file in Groovy?
Use File.splitEachLine(',') { fields -> ... } for line-by-line parsing, or readLines() with split(',') to get all rows. Convert rows to maps using [headers, values].transpose().collectEntries() for named field access. For complex CSV with quoted fields, use the OpenCSV library via @Grab.
What is the difference between split() and tokenize() for CSV parsing?
split(',') returns a String array and preserves empty fields (e.g., 'a,,c'.split(',') gives ['a', '', 'c']). tokenize(',') returns a List but drops empty tokens (gives ['a', 'c']). Always use split(',', -1) for CSV parsing to preserve all fields including trailing empty ones.
How do I handle CSV files with quoted fields in Groovy?
Simple split(',') fails when values contain commas inside quotes. For production use, add OpenCSV with @Grab('com.opencsv:opencsv:5.9') and use CSVReader. For lightweight scripts, write a custom parser that tracks quote state while iterating through characters (see Example 5 in this guide).
How do I write CSV output in Groovy?
Use list.join(',') to convert a list of values to a CSV line, and File.withWriter { writer -> writer.writeLine(csvLine) } to write to a file. For values that might contain commas or quotes, wrap them in double quotes and escape internal quotes by doubling them. OpenCSV’s CSVWriter handles this automatically.
How do I convert CSV to JSON in Groovy?
First parse the CSV to a list of maps using [headers, values].transpose().collectEntries(). Then convert to JSON with JsonOutput.toJson(listOfMaps). Add type coercion (as int, toBoolean()) before serialization so numbers and booleans aren’t quoted in the JSON output.
Related Posts
Previous in Series: Groovy REST API Consumption
Next in Series: Groovy YAML Processing
This post is part of the Groovy & Grails Cookbook series on TechnoScripts.com