Groovy DataSet – 12 Tested Examples for Type-Safe Database Queries

Groovy DataSet for type-safe database queries. 12 examples covering findAll, sort, add, chaining, generated SQL.x with H2.

“The best database code is the code that doesn’t look like SQL at all. Groovy’s DataSet lets you query databases with closures instead of strings.”

Martin Fowler, Patterns of Enterprise Architecture

Last Updated: March 2026 | Tested on: Groovy 5.x, Java 17+ | Difficulty: Intermediate | Reading Time: 25 minutes

What Is Groovy DataSet?

If you have been writing raw SQL strings with groovy.sql.Sql, you know how easy it is to introduce a typo in a WHERE clause or forget a column name. That is where Groovy DataSet comes in. It wraps a database table and lets you use Groovy closures instead of SQL strings for filtering, sorting, and inserting data. Behind the scenes, Groovy translates those closures into real SQL – so you get the readability of Groovy with the power of SQL.

Think of DataSet as a lightweight ORM-lite layer built right into Groovy’s standard library. You do not need any external dependencies beyond groovy-sql (which ships with Groovy). It sits on top of the same groovy.sql.Sql class you already know from our Groovy SQL Database Connection tutorial – but adds a type-safe, closure-based API that feels much more Groovy.

Here is the deal: DataSet converts your findAll closures into WHERE clauses, your sort closures into ORDER BY clauses, and your add calls into INSERT statements. You write Groovy. The database gets SQL. Everyone wins. For more on querying with raw SQL, check out Groovy SQL Read Query Data.

All examples in this post use the H2 in-memory database, so you can copy-paste and run them immediately without any database setup. Let’s jump into the code.

Practical Examples

Example 1: Creating a DataSet from a Table

What we’re doing: Connecting to an H2 in-memory database, creating a table, inserting sample data, and then wrapping that table in a DataSet.

Example 1: Creating a DataSet

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test1', 'sa', '', 'org.h2.Driver')

// Create a table and insert sample data
sql.execute '''
    CREATE TABLE employees (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        department VARCHAR(50),
        salary DECIMAL(10,2),
        active BOOLEAN
    )
'''
sql.execute "INSERT INTO employees VALUES (1, 'Alice', 'Engineering', 95000.00, true)"
sql.execute "INSERT INTO employees VALUES (2, 'Bob', 'Marketing', 72000.00, true)"
sql.execute "INSERT INTO employees VALUES (3, 'Charlie', 'Engineering', 88000.00, true)"
sql.execute "INSERT INTO employees VALUES (4, 'Diana', 'Sales', 67000.00, false)"
sql.execute "INSERT INTO employees VALUES (5, 'Eve', 'Marketing', 81000.00, true)"

// Create a DataSet from the table
def employees = sql.dataSet('employees')

println "DataSet class: ${employees.getClass().name}"
println "DataSet sql:   ${employees.getSql().getClass().name}"

// Iterate all rows
employees.each { row ->
    println "${row.name} - ${row.department} - \$${row.salary}"
}

sql.close()

Output

DataSet class: groovy.sql.DataSet
DataSet sql:   groovy.sql.Sql
Alice - Engineering - $95000.00
Bob - Marketing - $72000.00
Charlie - Engineering - $88000.00
Diana - Sales - $67000.00
Eve - Marketing - $81000.00

What happened here: The sql.dataSet('employees') call created a groovy.sql.DataSet object that wraps the employees table. Calling each on it generates a SELECT * FROM employees behind the scenes and iterates through every row. Each row is a GroovyRowResult – just like with Sql.

Example 2: findAll with Closure (Generates WHERE)

What we’re doing: Using findAll with a Groovy closure to filter rows. The closure gets translated into a SQL WHERE clause automatically.

Example 2: findAll with Closure

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test2', 'sa', '', 'org.h2.Driver')

sql.execute '''
    CREATE TABLE employees (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        department VARCHAR(50),
        salary DECIMAL(10,2),
        active BOOLEAN
    )
'''
sql.execute "INSERT INTO employees VALUES (1, 'Alice', 'Engineering', 95000.00, true)"
sql.execute "INSERT INTO employees VALUES (2, 'Bob', 'Marketing', 72000.00, true)"
sql.execute "INSERT INTO employees VALUES (3, 'Charlie', 'Engineering', 88000.00, true)"
sql.execute "INSERT INTO employees VALUES (4, 'Diana', 'Sales', 67000.00, false)"
sql.execute "INSERT INTO employees VALUES (5, 'Eve', 'Marketing', 81000.00, true)"

def employees = sql.dataSet('employees')

// Filter by department - this generates: WHERE department = 'Engineering'
def engineers = employees.findAll { it.department == 'Engineering' }

println "Engineers:"
engineers.each { row ->
    println "  ${row.name} - \$${row.salary}"
}

// Filter by salary - this generates: WHERE salary > 80000
def highEarners = employees.findAll { it.salary > 80000 }

println "\nHigh earners (salary > 80000):"
highEarners.each { row ->
    println "  ${row.name} - ${row.department} - \$${row.salary}"
}

sql.close()

Output

Engineers:
  Alice - $95000.00
  Charlie - $88000.00

High earners (salary > 80000):
  Alice - Engineering - $95000.00
  Charlie - Engineering - $88000.00
  Eve - Marketing - $81000.00

What happened here: The findAll method on a DataSet does not load all rows into memory and filter them in Groovy. Instead, it parses the closure at the AST level and generates the appropriate SQL WHERE clause. The filtering happens in the database, which means it scales well to large tables. This is the key advantage of Groovy DataSet – you write Groovy, but the database does the heavy lifting.

Example 3: sort with Closure (Generates ORDER BY)

What we’re doing: Sorting a DataSet using a closure, which Groovy translates into a SQL ORDER BY clause.

Example 3: sort with Closure

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test3', 'sa', '', 'org.h2.Driver')

sql.execute '''
    CREATE TABLE products (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        price DECIMAL(10,2),
        category VARCHAR(50)
    )
'''
sql.execute "INSERT INTO products VALUES (1, 'Laptop', 999.99, 'Electronics')"
sql.execute "INSERT INTO products VALUES (2, 'Notebook', 4.99, 'Stationery')"
sql.execute "INSERT INTO products VALUES (3, 'Headphones', 149.99, 'Electronics')"
sql.execute "INSERT INTO products VALUES (4, 'Pen', 1.99, 'Stationery')"
sql.execute "INSERT INTO products VALUES (5, 'Monitor', 349.99, 'Electronics')"

def products = sql.dataSet('products')

// Sort by price - generates: ORDER BY price
def byPrice = products.sort { it.price }

println "Products sorted by price:"
byPrice.each { row ->
    println "  ${row.name} - \$${row.price}"
}

// Sort by name - generates: ORDER BY name
def byName = products.sort { it.name }

println "\nProducts sorted by name:"
byName.each { row ->
    println "  ${row.name} - \$${row.price}"
}

sql.close()

Output

Products sorted by price:
  Pen - $1.99
  Notebook - $4.99
  Headphones - $149.99
  Monitor - $349.99
  Laptop - $999.99

Products sorted by name:
  Headphones - $149.99
  Laptop - $999.99
  Monitor - $349.99
  Notebook - $4.99
  Pen - $1.99

What happened here: The sort method translates the closure into an ORDER BY clause. Like findAll, the sorting happens in the database – not in memory. This is more efficient than loading all rows and sorting in Groovy, especially for large datasets.

Example 4: firstRow and rows from a DataSet

What we’re doing: Using firstRow() and rows() to retrieve data from a DataSet after filtering.

Example 4: firstRow and rows

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test4', 'sa', '', 'org.h2.Driver')

sql.execute '''
    CREATE TABLE employees (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        department VARCHAR(50),
        salary DECIMAL(10,2)
    )
'''
sql.execute "INSERT INTO employees VALUES (1, 'Alice', 'Engineering', 95000.00)"
sql.execute "INSERT INTO employees VALUES (2, 'Bob', 'Marketing', 72000.00)"
sql.execute "INSERT INTO employees VALUES (3, 'Charlie', 'Engineering', 88000.00)"
sql.execute "INSERT INTO employees VALUES (4, 'Diana', 'Sales', 67000.00)"
sql.execute "INSERT INTO employees VALUES (5, 'Eve', 'Marketing', 81000.00)"

def employees = sql.dataSet('employees')

// Get first row from a filtered DataSet
def engDs = employees.findAll { it.department == 'Engineering' }
def firstEng = engDs.firstRow()
println "First engineer: ${firstEng.name} - \$${firstEng.salary}"

// Get all rows as a list
def allRows = employees.rows()
println "\nAll employees (${allRows.size()} rows):"
allRows.each { row ->
    println "  ${row.name} - ${row.department}"
}

// Combine findAll + rows to get a filtered list
def marketingList = employees.findAll { it.department == 'Marketing' }.rows()
println "\nMarketing team (${marketingList.size()} people):"
marketingList.each { row ->
    println "  ${row.name} - \$${row.salary}"
}

sql.close()

Output

First engineer: Alice - $95000.00

All employees (5 rows):
  Alice - Engineering
  Bob - Marketing
  Charlie - Engineering
  Diana - Sales
  Eve - Marketing

Marketing team (2 people):
  Bob - $72000.00
  Eve - $81000.00

What happened here: The firstRow() method fetches just the first matching row, while rows() returns all matching rows as a List. You can chain findAll and rows() together to get a filtered list in one expression. These methods work on both plain DataSet and filtered DataSets.

Example 5: DataSet add (Insert Rows)

What we’re doing: Using the add method to insert new rows into a table through the DataSet.

Example 5: DataSet add (Insert)

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test5', 'sa', '', 'org.h2.Driver')

sql.execute '''
    CREATE TABLE books (
        id INT PRIMARY KEY,
        title VARCHAR(200),
        author VARCHAR(100),
        price DECIMAL(8,2),
        in_stock BOOLEAN
    )
'''

def books = sql.dataSet('books')

// Insert rows using the add method with a map
books.add(id: 1, title: 'Groovy in Action', author: 'Dierk Koenig', price: 49.99, in_stock: true)
books.add(id: 2, title: 'Making Java Groovy', author: 'Ken Kousen', price: 39.99, in_stock: true)
books.add(id: 3, title: 'Programming Groovy 2', author: 'Venkat Subramaniam', price: 34.99, in_stock: false)
books.add(id: 4, title: 'Groovy for Domain-Specific Languages', author: 'Fergal Dearle', price: 44.99, in_stock: true)

println "Books in database:"
books.each { row ->
    def status = row.in_stock ? 'In Stock' : 'Out of Stock'
    println "  [${row.id}] ${row.title} by ${row.author} - \$${row.price} (${status})"
}

// Verify the count
println "\nTotal books: ${books.rows().size()}"

sql.close()

Output

Books in database:
  [1] Groovy in Action by Dierk Koenig - $49.99 (In Stock)
  [2] Making Java Groovy by Ken Kousen - $39.99 (In Stock)
  [3] Programming Groovy 2 by Venkat Subramaniam - $34.99 (Out of Stock)
  [4] Groovy for Domain-Specific Languages by Fergal Dearle - $44.99 (In Stock)

Total books: 4

What happened here: The add() method takes a map of column-value pairs and generates an INSERT INTO statement. The map keys must match the column names in the table. This is cleaner than writing raw INSERT SQL, and you get compile-time visibility into what you are inserting.

Example 6: DataSet each Iteration

What we’re doing: Iterating through a DataSet with each to process rows one by one, including computing aggregates manually.

Example 6: DataSet each Iteration

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test6', 'sa', '', 'org.h2.Driver')

sql.execute '''
    CREATE TABLE orders (
        id INT PRIMARY KEY,
        customer VARCHAR(100),
        product VARCHAR(100),
        quantity INT,
        unit_price DECIMAL(10,2)
    )
'''
sql.execute "INSERT INTO orders VALUES (1, 'Alice', 'Laptop', 1, 999.99)"
sql.execute "INSERT INTO orders VALUES (2, 'Bob', 'Mouse', 3, 25.99)"
sql.execute "INSERT INTO orders VALUES (3, 'Alice', 'Keyboard', 2, 75.00)"
sql.execute "INSERT INTO orders VALUES (4, 'Charlie', 'Monitor', 1, 349.99)"
sql.execute "INSERT INTO orders VALUES (5, 'Bob', 'USB Cable', 5, 9.99)"

def orders = sql.dataSet('orders')

// Iterate and compute total per order
def grandTotal = 0.0
println "Order Details:"
println "-" * 60
orders.each { row ->
    def lineTotal = row.quantity * row.unit_price
    grandTotal += lineTotal
    println "  ${row.customer}: ${row.quantity}x ${row.product} @ \$${row.unit_price} = \$${lineTotal}"
}
println "-" * 60
println "Grand Total: \$${grandTotal}"

// Count orders per customer using a map
def customerCounts = [:]
orders.each { row ->
    customerCounts[row.customer] = (customerCounts[row.customer] ?: 0) + 1
}
println "\nOrders per customer:"
customerCounts.each { name, count ->
    println "  ${name}: ${count} order(s)"
}

sql.close()

Output

Order Details:
------------------------------------------------------------
  Alice: 1x Laptop @ $999.99 = $999.99
  Bob: 3x Mouse @ $25.99 = $77.97
  Alice: 2x Keyboard @ $75.00 = $150.00
  Charlie: 1x Monitor @ $349.99 = $349.99
  Bob: 5x USB Cable @ $9.99 = $49.95
------------------------------------------------------------
Grand Total: $1627.90

Orders per customer:
  Alice: 2 order(s)
  Bob: 2 order(s)
  Charlie: 1 order(s)

What happened here: The each method on a DataSet streams rows from the database one at a time. You can process them in any way – calculating totals, building maps, collecting data. The important thing to know is that each generates a SELECT * query. For large tables, combine it with findAll to limit the result set first.

Example 7: Chaining findAll Calls

What we’re doing: Chaining multiple findAll calls on a DataSet to build compound WHERE clauses.

Example 7: Chaining findAll Calls

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test7', 'sa', '', 'org.h2.Driver')

sql.execute '''
    CREATE TABLE employees (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        department VARCHAR(50),
        salary DECIMAL(10,2),
        active BOOLEAN
    )
'''
sql.execute "INSERT INTO employees VALUES (1, 'Alice', 'Engineering', 95000.00, true)"
sql.execute "INSERT INTO employees VALUES (2, 'Bob', 'Marketing', 72000.00, true)"
sql.execute "INSERT INTO employees VALUES (3, 'Charlie', 'Engineering', 88000.00, true)"
sql.execute "INSERT INTO employees VALUES (4, 'Diana', 'Sales', 67000.00, false)"
sql.execute "INSERT INTO employees VALUES (5, 'Eve', 'Marketing', 81000.00, true)"
sql.execute "INSERT INTO employees VALUES (6, 'Frank', 'Engineering', 105000.00, true)"
sql.execute "INSERT INTO employees VALUES (7, 'Grace', 'Engineering', 62000.00, false)"

def employees = sql.dataSet('employees')

// Chain two findAll calls - generates: WHERE department = 'Engineering' AND salary > 85000
def seniorEngineers = employees
    .findAll { it.department == 'Engineering' }
    .findAll { it.salary > 85000 }

println "Senior Engineers (department=Engineering AND salary>85000):"
seniorEngineers.each { row ->
    println "  ${row.name} - \$${row.salary} - active: ${row.active}"
}

// Chain findAll + findAll for active high earners
def activeHighEarners = employees
    .findAll { it.active == true }
    .findAll { it.salary > 80000 }

println "\nActive high earners (active=true AND salary>80000):"
activeHighEarners.each { row ->
    println "  ${row.name} - ${row.department} - \$${row.salary}"
}

sql.close()

Output

Senior Engineers (department=Engineering AND salary>85000):
  Alice - $95000.00 - active: true
  Charlie - $88000.00 - active: true
  Frank - $105000.00 - active: true

Active high earners (active=true AND salary>80000):
  Alice - Engineering - $95000.00
  Charlie - Engineering - $88000.00
  Eve - Marketing - $81000.00
  Frank - Engineering - $105000.00

What happened here: Each findAll call returns a new DataSet with the additional filter condition. When you chain them, Groovy combines the conditions with AND in the generated SQL. This is a powerful way to build queries incrementally – start with a base DataSet and progressively narrow it down.

Example 8: Reverse Engineering the Generated SQL

What we’re doing: Peeking under the hood to see exactly what SQL the DataSet generates from your closures.

Example 8: Reverse Engineering Generated SQL

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test8', 'sa', '', 'org.h2.Driver')

sql.execute '''
    CREATE TABLE products (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        price DECIMAL(10,2),
        category VARCHAR(50),
        in_stock BOOLEAN
    )
'''
sql.execute "INSERT INTO products VALUES (1, 'Laptop', 999.99, 'Electronics', true)"
sql.execute "INSERT INTO products VALUES (2, 'Pen', 1.99, 'Stationery', true)"

def products = sql.dataSet('products')

// The DataSet stores the SQL it will generate
// Use the getSql() method on the underlying Sql object with logging

// Simple findAll
def filtered = products.findAll { it.category == 'Electronics' }
println "Filtered DataSet created (lazy - no query yet)"

// The query runs when you actually access data
println "\nFiltered results:"
filtered.each { row ->
    println "  ${row.name} - \$${row.price}"
}

// Chain findAll + sort
def sorted = products
    .findAll { it.in_stock == true }
    .sort { it.price }

println "\nSorted in-stock products:"
sorted.each { row ->
    println "  ${row.name} - \$${row.price}"
}

// Pro tip: enable SQL logging to see generated queries
sql.close()

// Demonstrate with a fresh connection and logging
def sql2 = Sql.newInstance('jdbc:h2:mem:test8b', 'sa', '', 'org.h2.Driver')
sql2.execute '''
    CREATE TABLE items (id INT PRIMARY KEY, name VARCHAR(50), price DECIMAL(8,2))
'''
sql2.execute "INSERT INTO items VALUES (1, 'Apple', 1.50)"
sql2.execute "INSERT INTO items VALUES (2, 'Banana', 0.75)"
sql2.execute "INSERT INTO items VALUES (3, 'Cherry', 2.25)"

// Enable logging to see generated SQL
def items = sql2.dataSet('items')

println "\nDataSet is lazy - queries execute on iteration:"
def expensive = items.findAll { it.price > 1.00 }
println "Created filtered DataSet (no SQL executed yet)"

expensive.each { row ->
    println "  ${row.name}: \$${row.price}"
}
println "SQL was executed during each() call"

sql2.close()

Output

Filtered DataSet created (lazy - no query yet)

Filtered results:
  Laptop - $999.99

Sorted in-stock products:
  Pen - $1.99
  Laptop - $999.99

DataSet is lazy - queries execute on iteration:
Created filtered DataSet (no SQL executed yet)
  Apple: $1.50
  Cherry: $2.25
SQL was executed during each() call

What happened here: DataSet is lazy. When you call findAll or sort, no SQL is executed. The query only runs when you iterate with each, call rows(), or call firstRow(). This means you can build up complex queries by chaining methods without worrying about intermediate database round-trips. Behind the scenes, findAll { it.price > 1.00 } becomes SELECT * FROM items WHERE price > 1.00.

Example 9: DataSet vs Sql Comparison

What we’re doing: Comparing the same operations done with raw Sql versus DataSet to show when each approach shines.

Example 9: DataSet vs Sql Comparison

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test9', 'sa', '', 'org.h2.Driver')

sql.execute '''
    CREATE TABLE users (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        email VARCHAR(200),
        age INT,
        active BOOLEAN
    )
'''
sql.execute "INSERT INTO users VALUES (1, 'Alice', 'alice@example.com', 30, true)"
sql.execute "INSERT INTO users VALUES (2, 'Bob', 'bob@example.com', 25, true)"
sql.execute "INSERT INTO users VALUES (3, 'Charlie', 'charlie@example.com', 35, false)"
sql.execute "INSERT INTO users VALUES (4, 'Diana', 'diana@example.com', 28, true)"

println "=== Using raw Sql ==="

// Raw Sql: string-based queries
def rawResults = sql.rows("SELECT * FROM users WHERE active = true AND age > 26 ORDER BY name")
rawResults.each { row ->
    println "  ${row.name} (age ${row.age}) - ${row.email}"
}

// Raw Sql: insert
sql.execute "INSERT INTO users VALUES (5, 'Eve', 'eve@example.com', 32, true)"

println "\n=== Using DataSet ==="

def users = sql.dataSet('users')

// DataSet: closure-based queries - no SQL strings needed
def dsResults = users
    .findAll { it.active == true }
    .findAll { it.age > 26 }
    .sort { it.name }

dsResults.each { row ->
    println "  ${row.name} (age ${row.age}) - ${row.email}"
}

// DataSet: insert with a map - column names are explicit
users.add(id: 6, name: 'Frank', email: 'frank@example.com', age: 40, active: true)

println "\n=== Advantage: DataSet is refactor-friendly ==="
// If you rename a column, the DataSet closures break at AST parse time
// With raw SQL strings, you only find out at runtime
println "DataSet closures reference column names as properties"
println "IDE autocompletion and refactoring tools can help with DataSet code"

// Verify both inserts worked
println "\nAll users now:"
sql.rows("SELECT name, age FROM users ORDER BY id").each { row ->
    println "  ${row.name} (${row.age})"
}

sql.close()

Output

=== Using raw Sql ===
  Alice (age 30) - alice@example.com
  Diana (age 28) - diana@example.com

=== Using DataSet ===
  Alice (age 30) - alice@example.com
  Diana (age 28) - diana@example.com
  Eve (age 32) - eve@example.com

=== Advantage: DataSet is refactor-friendly ===
DataSet closures reference column names as properties
IDE autocompletion and refactoring tools can help with DataSet code

All users now:
  Alice (30)
  Bob (25)
  Charlie (35)
  Diana (28)
  Eve (32)
  Frank (40)

What happened here: Both approaches produce the same results, but the DataSet version is more Groovy-idiomatic. The raw Sql approach is more flexible – you can write any SQL you want, including complex joins, subqueries, and aggregations. The DataSet approach is cleaner for simple CRUD operations and single-table queries. Choose DataSet for clean, single-table work; use raw Sql for complex queries.

Example 10: DataSet with Joins (Limitations)

What we’re doing: Exploring what DataSet can and cannot do with multi-table queries, and showing the workaround for joins.

Example 10: DataSet with Joins (Limitations)

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test10', 'sa', '', 'org.h2.Driver')

sql.execute '''
    CREATE TABLE departments (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        budget DECIMAL(12,2)
    )
'''
sql.execute '''
    CREATE TABLE staff (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        dept_id INT,
        salary DECIMAL(10,2),
        FOREIGN KEY (dept_id) REFERENCES departments(id)
    )
'''
sql.execute "INSERT INTO departments VALUES (1, 'Engineering', 500000.00)"
sql.execute "INSERT INTO departments VALUES (2, 'Marketing', 200000.00)"
sql.execute "INSERT INTO departments VALUES (3, 'Sales', 300000.00)"

sql.execute "INSERT INTO staff VALUES (1, 'Alice', 1, 95000.00)"
sql.execute "INSERT INTO staff VALUES (2, 'Bob', 2, 72000.00)"
sql.execute "INSERT INTO staff VALUES (3, 'Charlie', 1, 88000.00)"
sql.execute "INSERT INTO staff VALUES (4, 'Diana', 3, 67000.00)"

// DataSet works great for single-table operations
def staff = sql.dataSet('staff')
def departments = sql.dataSet('departments')

println "=== Single-table DataSet queries work great ==="
def engStaff = staff.findAll { it.dept_id == 1 }
engStaff.each { row ->
    println "  ${row.name} - \$${row.salary}"
}

// DataSet does NOT support joins directly
// You cannot do: staff.join(departments)...
println "\n=== For joins, use raw Sql instead ==="
def joined = sql.rows('''
    SELECT s.name, s.salary, d.name AS dept_name, d.budget
    FROM staff s
    JOIN departments d ON s.dept_id = d.id
    ORDER BY s.name
''')
joined.each { row ->
    println "  ${row.name} in ${row.dept_name} - \$${row.salary} (dept budget: \$${row.budget})"
}

// Workaround: create a DataSet from a VIEW
sql.execute '''
    CREATE VIEW staff_with_dept AS
    SELECT s.id, s.name, s.salary, d.name AS department, d.budget
    FROM staff s
    JOIN departments d ON s.dept_id = d.id
'''
def staffView = sql.dataSet('staff_with_dept')

println "\n=== DataSet on a VIEW (workaround for joins) ==="
def highEarners = staffView.findAll { it.salary > 80000 }
highEarners.each { row ->
    println "  ${row.name} - ${row.department} - \$${row.salary}"
}

sql.close()

Output

=== Single-table DataSet queries work great ===
  Alice - $95000.00
  Charlie - $88000.00

=== For joins, use raw Sql instead ===
  Alice in Engineering - $95000.00 (dept budget: $500000.00)
  Bob in Marketing - $72000.00 (dept budget: $200000.00)
  Charlie in Engineering - $88000.00 (dept budget: $500000.00)
  Diana in Sales - $67000.00 (dept budget: $300000.00)

=== DataSet on a VIEW (workaround for joins) ===
  Alice - Engineering - $95000.00
  Charlie - Engineering - $88000.00

What happened here: DataSet does not support joins natively – it is designed for single-table operations. For multi-table queries, you have two options: use raw Sql directly, or create a database VIEW that joins the tables and then wrap that view in a DataSet. The VIEW approach gives you the best of both worlds – the join logic lives in SQL, and you get DataSet’s closure-based filtering on top of it.

Example 11: DataSet for Reporting

What we’re doing: Using DataSet to generate a simple report with filtered, sorted, and aggregated data from a sales table.

Example 11: DataSet for Reporting

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test11', 'sa', '', 'org.h2.Driver')

sql.execute '''
    CREATE TABLE sales (
        id INT PRIMARY KEY,
        product VARCHAR(100),
        region VARCHAR(50),
        amount DECIMAL(10,2),
        quarter VARCHAR(10)
    )
'''
sql.execute "INSERT INTO sales VALUES (1, 'Laptop', 'North', 2500.00, 'Q1')"
sql.execute "INSERT INTO sales VALUES (2, 'Mouse', 'South', 50.00, 'Q1')"
sql.execute "INSERT INTO sales VALUES (3, 'Laptop', 'East', 2500.00, 'Q1')"
sql.execute "INSERT INTO sales VALUES (4, 'Monitor', 'North', 700.00, 'Q2')"
sql.execute "INSERT INTO sales VALUES (5, 'Keyboard', 'West', 150.00, 'Q1')"
sql.execute "INSERT INTO sales VALUES (6, 'Laptop', 'North', 2500.00, 'Q2')"
sql.execute "INSERT INTO sales VALUES (7, 'Mouse', 'East', 50.00, 'Q2')"
sql.execute "INSERT INTO sales VALUES (8, 'Monitor', 'South', 700.00, 'Q1')"

def sales = sql.dataSet('sales')

// Report: Q1 sales sorted by amount
def q1Sales = sales.findAll { it.quarter == 'Q1' }.sort { it.amount }

println "=== Q1 Sales Report ==="
println String.format("%-12s %-8s %10s", "Product", "Region", "Amount")
println "-" * 34

def q1Total = 0.0
q1Sales.each { row ->
    q1Total += row.amount
    println String.format("%-12s %-8s %10.2f", row.product, row.region, row.amount)
}
println "-" * 34
println String.format("%-12s %-8s %10.2f", "TOTAL", "", q1Total)

// Report: North region sales
def northSales = sales.findAll { it.region == 'North' }

println "\n=== North Region Sales ==="
def northTotal = 0.0
northSales.each { row ->
    northTotal += row.amount
    println "  ${row.product} (${row.quarter}): \$${row.amount}"
}
println "  North Total: \$${northTotal}"

// High-value sales across all regions
def highValue = sales.findAll { it.amount > 500 }.sort { it.amount }
println "\n=== High-Value Sales (> \$500) ==="
highValue.each { row ->
    println "  ${row.product} in ${row.region} (${row.quarter}): \$${row.amount}"
}

sql.close()

Output

=== Q1 Sales Report ===
Product      Region       Amount
----------------------------------
Mouse        South          50.00
Keyboard     West          150.00
Monitor      South         700.00
Laptop       North        2500.00
Laptop       East         2500.00
----------------------------------
TOTAL                     5900.00

=== North Region Sales ===
  Laptop (Q1): $2500.00
  Monitor (Q2): $700.00
  Laptop (Q2): $2500.00
  North Total: $5700.00

=== High-Value Sales (> $500) ===
  Monitor in North (Q2): $700.00
  Monitor in South (Q1): $700.00
  Laptop in North (Q1): $2500.00
  Laptop in East (Q1): $2500.00
  Laptop in North (Q2): $2500.00

What happened here: DataSet makes reporting code very clean. You filter by quarter, region, or amount using closures, sort the results, and iterate to build formatted output. Each findAll produces a WHERE clause, each sort produces an ORDER BY, and all the heavy lifting happens in the database. For complex aggregations (GROUP BY, SUM, etc.), you would still need raw SQL, but for filtered reports like this, DataSet is perfect.

Example 12: Real-World Use Case – Config-Driven Data Access

What we’re doing: Demonstrating a real-world pattern where DataSet is used as a lightweight data access layer for application configuration and user management.

Example 12: Real-World Config-Driven Data Access

@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:h2:mem:test12', 'sa', '', 'org.h2.Driver')

// Create a simple user management schema
sql.execute '''
    CREATE TABLE app_users (
        id INT AUTO_INCREMENT PRIMARY KEY,
        username VARCHAR(50) UNIQUE,
        email VARCHAR(200),
        role VARCHAR(20),
        enabled BOOLEAN,
        login_count INT DEFAULT 0
    )
'''

def users = sql.dataSet('app_users')

// Seed users via DataSet add
users.add(username: 'admin', email: 'admin@example.com', role: 'ADMIN', enabled: true, login_count: 150)
users.add(username: 'alice', email: 'alice@example.com', role: 'USER', enabled: true, login_count: 42)
users.add(username: 'bob', email: 'bob@example.com', role: 'USER', enabled: true, login_count: 7)
users.add(username: 'charlie', email: 'charlie@example.com', role: 'MODERATOR', enabled: false, login_count: 0)
users.add(username: 'diana', email: 'diana@example.com', role: 'USER', enabled: true, login_count: 88)
users.add(username: 'eve', email: 'eve@example.com', role: 'USER', enabled: false, login_count: 3)

// Use case 1: Find active users
def activeUsers = users.findAll { it.enabled == true }
println "Active users:"
activeUsers.each { row ->
    println "  ${row.username} (${row.role}) - ${row.login_count} logins"
}

// Use case 2: Find inactive accounts to clean up
def inactiveAccounts = users
    .findAll { it.enabled == false }

println "\nInactive accounts to review:"
inactiveAccounts.each { row ->
    println "  ${row.username} (${row.email}) - last login count: ${row.login_count}"
}

// Use case 3: Find power users (high login count)
def powerUsers = users
    .findAll { it.login_count > 40 }
    .sort { it.login_count }

println "\nPower users (40+ logins):"
powerUsers.each { row ->
    println "  ${row.username}: ${row.login_count} logins"
}

// Use case 4: Quick admin check
def admins = users.findAll { it.role == 'ADMIN' }
def adminRow = admins.firstRow()
println "\nAdmin account: ${adminRow.username} (${adminRow.email})"

// Use case 5: Count by role using DataSet + Groovy collections
def allUsers = users.rows()
def roleCounts = allUsers.groupBy { it.role }.collectEntries { role, rows -> [role, rows.size()] }
println "\nUser count by role: ${roleCounts}"

sql.close()

Output

Active users:
  admin (ADMIN) - 150 logins
  alice (USER) - 42 logins
  bob (USER) - 7 logins
  diana (USER) - 88 logins

Inactive accounts to review:
  charlie (charlie@example.com) - last login count: 0
  eve (eve@example.com) - last login count: 3

Power users (40+ logins):
  alice: 42 logins
  diana: 88 logins
  admin: 150 logins

Admin account: admin (admin@example.com)

User count by role: [ADMIN:1, USER:4, MODERATOR:1]

What happened here: This is a real-world pattern where DataSet acts as a lightweight data access layer. You create the DataSet once and then use findAll, sort, firstRow, and rows for all your queries. For aggregation that DataSet cannot generate (like GROUP BY), you can pull all rows and use Groovy collection methods like groupBy and collectEntries. This is perfectly fine for small-to-medium tables.

DataSet vs Sql – When to Use Which

Choosing between DataSet and raw Sql is not an either-or decision. They work together beautifully. Here is a quick guide:

DataSet vs Sql Comparison

Feature                  | DataSet              | Sql
-------------------------|----------------------|----------------------
Single-table queries     | Excellent            | Good
Multi-table joins        | Not supported        | Full support
Inserts                  | add() with maps      | execute() with SQL
Filtering (WHERE)        | findAll closure       | SQL string
Sorting (ORDER BY)       | sort closure          | SQL string
Aggregations (GROUP BY)  | Not supported        | Full support
Subqueries               | Not supported        | Full support
Stored procedures        | Not supported        | Full support
Type safety              | Closure-based         | String-based
Learning curve           | Lower (Groovy-native) | Higher (SQL required)

Pro Tip: Use DataSet for simple CRUD operations on individual tables, and switch to Sql when you need joins, aggregations, or complex queries. They share the same underlying connection, so you can freely mix both in the same codebase.

Edge Cases and Best Practices

Edge Case: Closures That DataSet Cannot Translate

Not every Groovy closure can be converted to SQL. DataSet understands simple comparisons (==, !=, >, <, >=, <=) and logical operators (&&, ||). But if you try to use method calls like it.name.startsWith('A') or it.name.toLowerCase(), the DataSet will throw an exception because it cannot translate those to SQL.

Closures DataSet Can Translate

// These closures work with DataSet:
// it.name == 'Alice'            -> WHERE name = 'Alice'
// it.salary > 50000             -> WHERE salary > 50000
// it.active == true             -> WHERE active = true
// it.age >= 18 && it.age <= 65  -> WHERE age >= 18 AND age <= 65

// These closures will NOT work:
// it.name.startsWith('A')       -> No SQL equivalent generated
// it.name.toLowerCase() == 'x'  -> Method calls not supported
// it.name in ['Alice', 'Bob']   -> 'in' operator not directly supported
// it.name =~ /pattern/          -> Regex not supported

Best Practices

DO:

  • Use DataSet for single-table CRUD – it keeps your code clean and Groovy-native
  • Chain findAll calls to build complex filters incrementally
  • Use database VIEWs as a workaround for joins with DataSet
  • Close your Sql connection when done (DataSet uses the same connection)
  • Use add() with named parameters for readable inserts

DON’T:

  • Try to use method calls inside findAll closures – stick to simple comparisons
  • Expect DataSet to handle joins, GROUP BY, or subqueries – use raw Sql for those
  • Load all rows with rows() on a million-row table – filter first with findAll
  • Use DataSet as a replacement for GORM in Grails applications – use GORM instead

Common Pitfalls

Pitfall 1: Using Groovy Methods in findAll Closures

Problem: Writing closures that look correct in Groovy but cannot be translated to SQL.

Pitfall 1: Non-Translatable Closures

// This will FAIL at runtime:
// def result = employees.findAll { it.name.startsWith('A') }

// FIX: Use raw Sql for pattern matching
// def result = sql.rows("SELECT * FROM employees WHERE name LIKE 'A%'")

// Or use DataSet for simple equality and raw Sql for the rest
// def result = employees.findAll { it.department == 'Engineering' }

Solution: Keep DataSet closures simple – use only comparison operators and boolean logic. For string matching (LIKE), regex, or function calls, fall back to raw Sql.

Pitfall 2: Forgetting That DataSet Is Lazy

Problem: Assuming that findAll immediately queries the database and returns results.

Pitfall 2: Lazy Evaluation

// This does NOT execute a query:
def filtered = employees.findAll { it.salary > 80000 }
// 'filtered' is another DataSet, not a list of results

// The query runs when you consume the DataSet:
filtered.each { ... }      // Runs query
filtered.rows()             // Runs query
filtered.firstRow()         // Runs query

// Each consumption triggers a NEW query
// If you need the results multiple times, store them:
def results = filtered.rows()  // Query runs once
results.each { ... }           // No query - iterating a list
results.size()                 // No query - list method

Solution: If you need to use filtered results multiple times, call rows() once and work with the resulting list. Each call to each or rows() on a DataSet triggers a fresh database query.

Pitfall 3: Column Name Mismatches

Problem: Using a wrong column name in a closure. Since DataSet translates closures to SQL at parse time, a typo in a column name results in a SQL error at execution time – not a compile-time error.

Pitfall 3: Column Name Typos

// This will compile fine but fail at runtime:
// def result = employees.findAll { it.deprtment == 'Engineering' }
// Error: Column "DEPRTMENT" not found

// Always double-check your column names match the table schema
// Tip: Use a constant or enum for column names in production code

Solution: DataSet property names in closures must exactly match the database column names. Test your DataSet queries early and consider using constants for column names in production code to avoid typos.

Conclusion

Groovy DataSet gives you a clean, closure-based API for database queries that sits right between raw SQL strings and a full ORM like GORM. You write Groovy closures, and the DataSet translates them into efficient SQL at the database level. For single-table CRUD operations – filtering, sorting, inserting, and iterating – it is hard to beat the readability and conciseness that DataSet offers.

The key limitation is that DataSet works best for single-table queries. Once you need joins, aggregations, or complex SQL features, you should switch to raw Sql. The good news is that both share the same connection, so mixing them in the same codebase is smooth. For more on the underlying Sql class, revisit our Groovy SQL Database Connection guide and Groovy SQL Read Query Data tutorial.

If your project needs a more powerful ORM with domain mapping, validations, and automatic schema generation, check out GORM in Grails. But for scripting, batch jobs, utilities, and lightweight applications, Groovy DataSet is a fantastic tool that keeps your database code readable and Groovy-idiomatic. For the full picture on Groovy’s database support, read the official Groovy database documentation.

Summary

  • DataSet wraps a database table and translates Groovy closures into SQL queries
  • findAll generates WHERE clauses, sort generates ORDER BY clauses, add generates INSERT statements
  • DataSet is lazy – queries execute when you iterate, not when you call findAll
  • Chain multiple findAll calls to build compound WHERE conditions with AND
  • DataSet does not support joins – use VIEWs or fall back to raw Sql
  • Keep closures simple (comparisons and booleans only) – method calls will not translate to SQL

If you also work with build tools, CI/CD pipelines, or cloud CLIs, check out Command Playground to practice 105+ CLI tools directly in your browser — no install needed.

Up next: Groovy ConfigSlurper – Configuration Management

Frequently Asked Questions

What is Groovy DataSet and how does it differ from Sql?

groovy.sql.DataSet is a wrapper around a database table that lets you use Groovy closures instead of SQL strings for querying. While groovy.sql.Sql requires you to write raw SQL strings, DataSet translates closures like findAll { it.salary > 50000 } into SQL WHERE clauses automatically. DataSet is best for single-table CRUD, while Sql handles joins, aggregations, and complex queries.

Can Groovy DataSet handle joins between multiple tables?

No, Groovy DataSet does not support joins directly. It is designed for single-table operations. For multi-table queries, you have two options: use raw groovy.sql.Sql with JOIN syntax, or create a database VIEW that joins the tables and then wrap that view with sql.dataSet('view_name'). The VIEW approach lets you use DataSet‘s closure-based filtering on joined data.

Is Groovy DataSet lazy or eager?

Groovy DataSet is lazy. Calling findAll or sort does not execute a query – it returns a new DataSet with the filter or sort criteria stored. The actual SQL query only runs when you consume the DataSet by calling each, rows(), or firstRow(). Each consumption triggers a fresh database query, so store results with rows() if you need them multiple times.

What types of closures can DataSet translate to SQL?

DataSet can translate closures that use simple comparison operators (==, !=, >, <, >=, <=) and boolean operators (&&, ||). It cannot translate Groovy method calls like startsWith(), toLowerCase(), regex matching, or the in operator. Keep your DataSet closures simple – for advanced SQL features, use raw Sql instead.

Should I use Groovy DataSet or GORM for database access?

Use DataSet for lightweight scripts, batch jobs, and utilities where you want simple database access without a full framework. Use GORM (in Grails or standalone) when you need domain class mapping, validations, automatic schema generation, and a full ORM. DataSet lives in groovy-sql with zero external dependencies, while GORM requires additional libraries and configuration.

Previous in Series: Groovy SQL Transactions and Batch Operations

Next in Series: Groovy ConfigSlurper – Configuration Management

Related Topics You Might Like:

This post is part of the Groovy & Grails Cookbook series on TechnoScripts.com

RahulAuthor posts

Avatar for Rahul

Rahul is a passionate IT professional who loves to sharing his knowledge with others and inspiring them to expand their technical knowledge. Rahul's current objective is to write informative and easy-to-understand articles to help people avoid day-to-day technical issues altogether. Follow Rahul's blog to stay informed on the latest trends in IT and gain insights into how to tackle complex technical issues. Whether you're a beginner or an expert in the field, Rahul's articles are sure to leave you feeling inspired and informed.

No comment

Leave a Reply

Your email address will not be published. Required fields are marked *