Groovy DataSet for type-safe database queries. 12 examples covering findAll, sort, add, chaining, generated SQL.x with H2.
“The best database code is the code that doesn’t look like SQL at all. Groovy’s DataSet lets you query databases with closures instead of strings.”
Martin Fowler, Patterns of Enterprise Architecture
Last Updated: March 2026 | Tested on: Groovy 5.x, Java 17+ | Difficulty: Intermediate | Reading Time: 25 minutes
Table of Contents
What Is Groovy DataSet?
If you have been writing raw SQL strings with groovy.sql.Sql, you know how easy it is to introduce a typo in a WHERE clause or forget a column name. That is where Groovy DataSet comes in. It wraps a database table and lets you use Groovy closures instead of SQL strings for filtering, sorting, and inserting data. Behind the scenes, Groovy translates those closures into real SQL – so you get the readability of Groovy with the power of SQL.
Think of DataSet as a lightweight ORM-lite layer built right into Groovy’s standard library. You do not need any external dependencies beyond groovy-sql (which ships with Groovy). It sits on top of the same groovy.sql.Sql class you already know from our Groovy SQL Database Connection tutorial – but adds a type-safe, closure-based API that feels much more Groovy.
Here is the deal: DataSet converts your findAll closures into WHERE clauses, your sort closures into ORDER BY clauses, and your add calls into INSERT statements. You write Groovy. The database gets SQL. Everyone wins. For more on querying with raw SQL, check out Groovy SQL Read Query Data.
All examples in this post use the H2 in-memory database, so you can copy-paste and run them immediately without any database setup. Let’s jump into the code.
Practical Examples
Example 1: Creating a DataSet from a Table
What we’re doing: Connecting to an H2 in-memory database, creating a table, inserting sample data, and then wrapping that table in a DataSet.
Example 1: Creating a DataSet
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test1', 'sa', '', 'org.h2.Driver')
// Create a table and insert sample data
sql.execute '''
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10,2),
active BOOLEAN
)
'''
sql.execute "INSERT INTO employees VALUES (1, 'Alice', 'Engineering', 95000.00, true)"
sql.execute "INSERT INTO employees VALUES (2, 'Bob', 'Marketing', 72000.00, true)"
sql.execute "INSERT INTO employees VALUES (3, 'Charlie', 'Engineering', 88000.00, true)"
sql.execute "INSERT INTO employees VALUES (4, 'Diana', 'Sales', 67000.00, false)"
sql.execute "INSERT INTO employees VALUES (5, 'Eve', 'Marketing', 81000.00, true)"
// Create a DataSet from the table
def employees = sql.dataSet('employees')
println "DataSet class: ${employees.getClass().name}"
println "DataSet sql: ${employees.getSql().getClass().name}"
// Iterate all rows
employees.each { row ->
println "${row.name} - ${row.department} - \$${row.salary}"
}
sql.close()
Output
DataSet class: groovy.sql.DataSet DataSet sql: groovy.sql.Sql Alice - Engineering - $95000.00 Bob - Marketing - $72000.00 Charlie - Engineering - $88000.00 Diana - Sales - $67000.00 Eve - Marketing - $81000.00
What happened here: The sql.dataSet('employees') call created a groovy.sql.DataSet object that wraps the employees table. Calling each on it generates a SELECT * FROM employees behind the scenes and iterates through every row. Each row is a GroovyRowResult – just like with Sql.
Example 2: findAll with Closure (Generates WHERE)
What we’re doing: Using findAll with a Groovy closure to filter rows. The closure gets translated into a SQL WHERE clause automatically.
Example 2: findAll with Closure
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test2', 'sa', '', 'org.h2.Driver')
sql.execute '''
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10,2),
active BOOLEAN
)
'''
sql.execute "INSERT INTO employees VALUES (1, 'Alice', 'Engineering', 95000.00, true)"
sql.execute "INSERT INTO employees VALUES (2, 'Bob', 'Marketing', 72000.00, true)"
sql.execute "INSERT INTO employees VALUES (3, 'Charlie', 'Engineering', 88000.00, true)"
sql.execute "INSERT INTO employees VALUES (4, 'Diana', 'Sales', 67000.00, false)"
sql.execute "INSERT INTO employees VALUES (5, 'Eve', 'Marketing', 81000.00, true)"
def employees = sql.dataSet('employees')
// Filter by department - this generates: WHERE department = 'Engineering'
def engineers = employees.findAll { it.department == 'Engineering' }
println "Engineers:"
engineers.each { row ->
println " ${row.name} - \$${row.salary}"
}
// Filter by salary - this generates: WHERE salary > 80000
def highEarners = employees.findAll { it.salary > 80000 }
println "\nHigh earners (salary > 80000):"
highEarners.each { row ->
println " ${row.name} - ${row.department} - \$${row.salary}"
}
sql.close()
Output
Engineers: Alice - $95000.00 Charlie - $88000.00 High earners (salary > 80000): Alice - Engineering - $95000.00 Charlie - Engineering - $88000.00 Eve - Marketing - $81000.00
What happened here: The findAll method on a DataSet does not load all rows into memory and filter them in Groovy. Instead, it parses the closure at the AST level and generates the appropriate SQL WHERE clause. The filtering happens in the database, which means it scales well to large tables. This is the key advantage of Groovy DataSet – you write Groovy, but the database does the heavy lifting.
Example 3: sort with Closure (Generates ORDER BY)
What we’re doing: Sorting a DataSet using a closure, which Groovy translates into a SQL ORDER BY clause.
Example 3: sort with Closure
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test3', 'sa', '', 'org.h2.Driver')
sql.execute '''
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2),
category VARCHAR(50)
)
'''
sql.execute "INSERT INTO products VALUES (1, 'Laptop', 999.99, 'Electronics')"
sql.execute "INSERT INTO products VALUES (2, 'Notebook', 4.99, 'Stationery')"
sql.execute "INSERT INTO products VALUES (3, 'Headphones', 149.99, 'Electronics')"
sql.execute "INSERT INTO products VALUES (4, 'Pen', 1.99, 'Stationery')"
sql.execute "INSERT INTO products VALUES (5, 'Monitor', 349.99, 'Electronics')"
def products = sql.dataSet('products')
// Sort by price - generates: ORDER BY price
def byPrice = products.sort { it.price }
println "Products sorted by price:"
byPrice.each { row ->
println " ${row.name} - \$${row.price}"
}
// Sort by name - generates: ORDER BY name
def byName = products.sort { it.name }
println "\nProducts sorted by name:"
byName.each { row ->
println " ${row.name} - \$${row.price}"
}
sql.close()
Output
Products sorted by price: Pen - $1.99 Notebook - $4.99 Headphones - $149.99 Monitor - $349.99 Laptop - $999.99 Products sorted by name: Headphones - $149.99 Laptop - $999.99 Monitor - $349.99 Notebook - $4.99 Pen - $1.99
What happened here: The sort method translates the closure into an ORDER BY clause. Like findAll, the sorting happens in the database – not in memory. This is more efficient than loading all rows and sorting in Groovy, especially for large datasets.
Example 4: firstRow and rows from a DataSet
What we’re doing: Using firstRow() and rows() to retrieve data from a DataSet after filtering.
Example 4: firstRow and rows
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test4', 'sa', '', 'org.h2.Driver')
sql.execute '''
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10,2)
)
'''
sql.execute "INSERT INTO employees VALUES (1, 'Alice', 'Engineering', 95000.00)"
sql.execute "INSERT INTO employees VALUES (2, 'Bob', 'Marketing', 72000.00)"
sql.execute "INSERT INTO employees VALUES (3, 'Charlie', 'Engineering', 88000.00)"
sql.execute "INSERT INTO employees VALUES (4, 'Diana', 'Sales', 67000.00)"
sql.execute "INSERT INTO employees VALUES (5, 'Eve', 'Marketing', 81000.00)"
def employees = sql.dataSet('employees')
// Get first row from a filtered DataSet
def engDs = employees.findAll { it.department == 'Engineering' }
def firstEng = engDs.firstRow()
println "First engineer: ${firstEng.name} - \$${firstEng.salary}"
// Get all rows as a list
def allRows = employees.rows()
println "\nAll employees (${allRows.size()} rows):"
allRows.each { row ->
println " ${row.name} - ${row.department}"
}
// Combine findAll + rows to get a filtered list
def marketingList = employees.findAll { it.department == 'Marketing' }.rows()
println "\nMarketing team (${marketingList.size()} people):"
marketingList.each { row ->
println " ${row.name} - \$${row.salary}"
}
sql.close()
Output
First engineer: Alice - $95000.00 All employees (5 rows): Alice - Engineering Bob - Marketing Charlie - Engineering Diana - Sales Eve - Marketing Marketing team (2 people): Bob - $72000.00 Eve - $81000.00
What happened here: The firstRow() method fetches just the first matching row, while rows() returns all matching rows as a List. You can chain findAll and rows() together to get a filtered list in one expression. These methods work on both plain DataSet and filtered DataSets.
Example 5: DataSet add (Insert Rows)
What we’re doing: Using the add method to insert new rows into a table through the DataSet.
Example 5: DataSet add (Insert)
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test5', 'sa', '', 'org.h2.Driver')
sql.execute '''
CREATE TABLE books (
id INT PRIMARY KEY,
title VARCHAR(200),
author VARCHAR(100),
price DECIMAL(8,2),
in_stock BOOLEAN
)
'''
def books = sql.dataSet('books')
// Insert rows using the add method with a map
books.add(id: 1, title: 'Groovy in Action', author: 'Dierk Koenig', price: 49.99, in_stock: true)
books.add(id: 2, title: 'Making Java Groovy', author: 'Ken Kousen', price: 39.99, in_stock: true)
books.add(id: 3, title: 'Programming Groovy 2', author: 'Venkat Subramaniam', price: 34.99, in_stock: false)
books.add(id: 4, title: 'Groovy for Domain-Specific Languages', author: 'Fergal Dearle', price: 44.99, in_stock: true)
println "Books in database:"
books.each { row ->
def status = row.in_stock ? 'In Stock' : 'Out of Stock'
println " [${row.id}] ${row.title} by ${row.author} - \$${row.price} (${status})"
}
// Verify the count
println "\nTotal books: ${books.rows().size()}"
sql.close()
Output
Books in database: [1] Groovy in Action by Dierk Koenig - $49.99 (In Stock) [2] Making Java Groovy by Ken Kousen - $39.99 (In Stock) [3] Programming Groovy 2 by Venkat Subramaniam - $34.99 (Out of Stock) [4] Groovy for Domain-Specific Languages by Fergal Dearle - $44.99 (In Stock) Total books: 4
What happened here: The add() method takes a map of column-value pairs and generates an INSERT INTO statement. The map keys must match the column names in the table. This is cleaner than writing raw INSERT SQL, and you get compile-time visibility into what you are inserting.
Example 6: DataSet each Iteration
What we’re doing: Iterating through a DataSet with each to process rows one by one, including computing aggregates manually.
Example 6: DataSet each Iteration
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test6', 'sa', '', 'org.h2.Driver')
sql.execute '''
CREATE TABLE orders (
id INT PRIMARY KEY,
customer VARCHAR(100),
product VARCHAR(100),
quantity INT,
unit_price DECIMAL(10,2)
)
'''
sql.execute "INSERT INTO orders VALUES (1, 'Alice', 'Laptop', 1, 999.99)"
sql.execute "INSERT INTO orders VALUES (2, 'Bob', 'Mouse', 3, 25.99)"
sql.execute "INSERT INTO orders VALUES (3, 'Alice', 'Keyboard', 2, 75.00)"
sql.execute "INSERT INTO orders VALUES (4, 'Charlie', 'Monitor', 1, 349.99)"
sql.execute "INSERT INTO orders VALUES (5, 'Bob', 'USB Cable', 5, 9.99)"
def orders = sql.dataSet('orders')
// Iterate and compute total per order
def grandTotal = 0.0
println "Order Details:"
println "-" * 60
orders.each { row ->
def lineTotal = row.quantity * row.unit_price
grandTotal += lineTotal
println " ${row.customer}: ${row.quantity}x ${row.product} @ \$${row.unit_price} = \$${lineTotal}"
}
println "-" * 60
println "Grand Total: \$${grandTotal}"
// Count orders per customer using a map
def customerCounts = [:]
orders.each { row ->
customerCounts[row.customer] = (customerCounts[row.customer] ?: 0) + 1
}
println "\nOrders per customer:"
customerCounts.each { name, count ->
println " ${name}: ${count} order(s)"
}
sql.close()
Output
Order Details: ------------------------------------------------------------ Alice: 1x Laptop @ $999.99 = $999.99 Bob: 3x Mouse @ $25.99 = $77.97 Alice: 2x Keyboard @ $75.00 = $150.00 Charlie: 1x Monitor @ $349.99 = $349.99 Bob: 5x USB Cable @ $9.99 = $49.95 ------------------------------------------------------------ Grand Total: $1627.90 Orders per customer: Alice: 2 order(s) Bob: 2 order(s) Charlie: 1 order(s)
What happened here: The each method on a DataSet streams rows from the database one at a time. You can process them in any way – calculating totals, building maps, collecting data. The important thing to know is that each generates a SELECT * query. For large tables, combine it with findAll to limit the result set first.
Example 7: Chaining findAll Calls
What we’re doing: Chaining multiple findAll calls on a DataSet to build compound WHERE clauses.
Example 7: Chaining findAll Calls
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test7', 'sa', '', 'org.h2.Driver')
sql.execute '''
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10,2),
active BOOLEAN
)
'''
sql.execute "INSERT INTO employees VALUES (1, 'Alice', 'Engineering', 95000.00, true)"
sql.execute "INSERT INTO employees VALUES (2, 'Bob', 'Marketing', 72000.00, true)"
sql.execute "INSERT INTO employees VALUES (3, 'Charlie', 'Engineering', 88000.00, true)"
sql.execute "INSERT INTO employees VALUES (4, 'Diana', 'Sales', 67000.00, false)"
sql.execute "INSERT INTO employees VALUES (5, 'Eve', 'Marketing', 81000.00, true)"
sql.execute "INSERT INTO employees VALUES (6, 'Frank', 'Engineering', 105000.00, true)"
sql.execute "INSERT INTO employees VALUES (7, 'Grace', 'Engineering', 62000.00, false)"
def employees = sql.dataSet('employees')
// Chain two findAll calls - generates: WHERE department = 'Engineering' AND salary > 85000
def seniorEngineers = employees
.findAll { it.department == 'Engineering' }
.findAll { it.salary > 85000 }
println "Senior Engineers (department=Engineering AND salary>85000):"
seniorEngineers.each { row ->
println " ${row.name} - \$${row.salary} - active: ${row.active}"
}
// Chain findAll + findAll for active high earners
def activeHighEarners = employees
.findAll { it.active == true }
.findAll { it.salary > 80000 }
println "\nActive high earners (active=true AND salary>80000):"
activeHighEarners.each { row ->
println " ${row.name} - ${row.department} - \$${row.salary}"
}
sql.close()
Output
Senior Engineers (department=Engineering AND salary>85000): Alice - $95000.00 - active: true Charlie - $88000.00 - active: true Frank - $105000.00 - active: true Active high earners (active=true AND salary>80000): Alice - Engineering - $95000.00 Charlie - Engineering - $88000.00 Eve - Marketing - $81000.00 Frank - Engineering - $105000.00
What happened here: Each findAll call returns a new DataSet with the additional filter condition. When you chain them, Groovy combines the conditions with AND in the generated SQL. This is a powerful way to build queries incrementally – start with a base DataSet and progressively narrow it down.
Example 8: Reverse Engineering the Generated SQL
What we’re doing: Peeking under the hood to see exactly what SQL the DataSet generates from your closures.
Example 8: Reverse Engineering Generated SQL
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test8', 'sa', '', 'org.h2.Driver')
sql.execute '''
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2),
category VARCHAR(50),
in_stock BOOLEAN
)
'''
sql.execute "INSERT INTO products VALUES (1, 'Laptop', 999.99, 'Electronics', true)"
sql.execute "INSERT INTO products VALUES (2, 'Pen', 1.99, 'Stationery', true)"
def products = sql.dataSet('products')
// The DataSet stores the SQL it will generate
// Use the getSql() method on the underlying Sql object with logging
// Simple findAll
def filtered = products.findAll { it.category == 'Electronics' }
println "Filtered DataSet created (lazy - no query yet)"
// The query runs when you actually access data
println "\nFiltered results:"
filtered.each { row ->
println " ${row.name} - \$${row.price}"
}
// Chain findAll + sort
def sorted = products
.findAll { it.in_stock == true }
.sort { it.price }
println "\nSorted in-stock products:"
sorted.each { row ->
println " ${row.name} - \$${row.price}"
}
// Pro tip: enable SQL logging to see generated queries
sql.close()
// Demonstrate with a fresh connection and logging
def sql2 = Sql.newInstance('jdbc:h2:mem:test8b', 'sa', '', 'org.h2.Driver')
sql2.execute '''
CREATE TABLE items (id INT PRIMARY KEY, name VARCHAR(50), price DECIMAL(8,2))
'''
sql2.execute "INSERT INTO items VALUES (1, 'Apple', 1.50)"
sql2.execute "INSERT INTO items VALUES (2, 'Banana', 0.75)"
sql2.execute "INSERT INTO items VALUES (3, 'Cherry', 2.25)"
// Enable logging to see generated SQL
def items = sql2.dataSet('items')
println "\nDataSet is lazy - queries execute on iteration:"
def expensive = items.findAll { it.price > 1.00 }
println "Created filtered DataSet (no SQL executed yet)"
expensive.each { row ->
println " ${row.name}: \$${row.price}"
}
println "SQL was executed during each() call"
sql2.close()
Output
Filtered DataSet created (lazy - no query yet) Filtered results: Laptop - $999.99 Sorted in-stock products: Pen - $1.99 Laptop - $999.99 DataSet is lazy - queries execute on iteration: Created filtered DataSet (no SQL executed yet) Apple: $1.50 Cherry: $2.25 SQL was executed during each() call
What happened here: DataSet is lazy. When you call findAll or sort, no SQL is executed. The query only runs when you iterate with each, call rows(), or call firstRow(). This means you can build up complex queries by chaining methods without worrying about intermediate database round-trips. Behind the scenes, findAll { it.price > 1.00 } becomes SELECT * FROM items WHERE price > 1.00.
Example 9: DataSet vs Sql Comparison
What we’re doing: Comparing the same operations done with raw Sql versus DataSet to show when each approach shines.
Example 9: DataSet vs Sql Comparison
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test9', 'sa', '', 'org.h2.Driver')
sql.execute '''
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(200),
age INT,
active BOOLEAN
)
'''
sql.execute "INSERT INTO users VALUES (1, 'Alice', 'alice@example.com', 30, true)"
sql.execute "INSERT INTO users VALUES (2, 'Bob', 'bob@example.com', 25, true)"
sql.execute "INSERT INTO users VALUES (3, 'Charlie', 'charlie@example.com', 35, false)"
sql.execute "INSERT INTO users VALUES (4, 'Diana', 'diana@example.com', 28, true)"
println "=== Using raw Sql ==="
// Raw Sql: string-based queries
def rawResults = sql.rows("SELECT * FROM users WHERE active = true AND age > 26 ORDER BY name")
rawResults.each { row ->
println " ${row.name} (age ${row.age}) - ${row.email}"
}
// Raw Sql: insert
sql.execute "INSERT INTO users VALUES (5, 'Eve', 'eve@example.com', 32, true)"
println "\n=== Using DataSet ==="
def users = sql.dataSet('users')
// DataSet: closure-based queries - no SQL strings needed
def dsResults = users
.findAll { it.active == true }
.findAll { it.age > 26 }
.sort { it.name }
dsResults.each { row ->
println " ${row.name} (age ${row.age}) - ${row.email}"
}
// DataSet: insert with a map - column names are explicit
users.add(id: 6, name: 'Frank', email: 'frank@example.com', age: 40, active: true)
println "\n=== Advantage: DataSet is refactor-friendly ==="
// If you rename a column, the DataSet closures break at AST parse time
// With raw SQL strings, you only find out at runtime
println "DataSet closures reference column names as properties"
println "IDE autocompletion and refactoring tools can help with DataSet code"
// Verify both inserts worked
println "\nAll users now:"
sql.rows("SELECT name, age FROM users ORDER BY id").each { row ->
println " ${row.name} (${row.age})"
}
sql.close()
Output
=== Using raw Sql === Alice (age 30) - alice@example.com Diana (age 28) - diana@example.com === Using DataSet === Alice (age 30) - alice@example.com Diana (age 28) - diana@example.com Eve (age 32) - eve@example.com === Advantage: DataSet is refactor-friendly === DataSet closures reference column names as properties IDE autocompletion and refactoring tools can help with DataSet code All users now: Alice (30) Bob (25) Charlie (35) Diana (28) Eve (32) Frank (40)
What happened here: Both approaches produce the same results, but the DataSet version is more Groovy-idiomatic. The raw Sql approach is more flexible – you can write any SQL you want, including complex joins, subqueries, and aggregations. The DataSet approach is cleaner for simple CRUD operations and single-table queries. Choose DataSet for clean, single-table work; use raw Sql for complex queries.
Example 10: DataSet with Joins (Limitations)
What we’re doing: Exploring what DataSet can and cannot do with multi-table queries, and showing the workaround for joins.
Example 10: DataSet with Joins (Limitations)
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test10', 'sa', '', 'org.h2.Driver')
sql.execute '''
CREATE TABLE departments (
id INT PRIMARY KEY,
name VARCHAR(100),
budget DECIMAL(12,2)
)
'''
sql.execute '''
CREATE TABLE staff (
id INT PRIMARY KEY,
name VARCHAR(100),
dept_id INT,
salary DECIMAL(10,2),
FOREIGN KEY (dept_id) REFERENCES departments(id)
)
'''
sql.execute "INSERT INTO departments VALUES (1, 'Engineering', 500000.00)"
sql.execute "INSERT INTO departments VALUES (2, 'Marketing', 200000.00)"
sql.execute "INSERT INTO departments VALUES (3, 'Sales', 300000.00)"
sql.execute "INSERT INTO staff VALUES (1, 'Alice', 1, 95000.00)"
sql.execute "INSERT INTO staff VALUES (2, 'Bob', 2, 72000.00)"
sql.execute "INSERT INTO staff VALUES (3, 'Charlie', 1, 88000.00)"
sql.execute "INSERT INTO staff VALUES (4, 'Diana', 3, 67000.00)"
// DataSet works great for single-table operations
def staff = sql.dataSet('staff')
def departments = sql.dataSet('departments')
println "=== Single-table DataSet queries work great ==="
def engStaff = staff.findAll { it.dept_id == 1 }
engStaff.each { row ->
println " ${row.name} - \$${row.salary}"
}
// DataSet does NOT support joins directly
// You cannot do: staff.join(departments)...
println "\n=== For joins, use raw Sql instead ==="
def joined = sql.rows('''
SELECT s.name, s.salary, d.name AS dept_name, d.budget
FROM staff s
JOIN departments d ON s.dept_id = d.id
ORDER BY s.name
''')
joined.each { row ->
println " ${row.name} in ${row.dept_name} - \$${row.salary} (dept budget: \$${row.budget})"
}
// Workaround: create a DataSet from a VIEW
sql.execute '''
CREATE VIEW staff_with_dept AS
SELECT s.id, s.name, s.salary, d.name AS department, d.budget
FROM staff s
JOIN departments d ON s.dept_id = d.id
'''
def staffView = sql.dataSet('staff_with_dept')
println "\n=== DataSet on a VIEW (workaround for joins) ==="
def highEarners = staffView.findAll { it.salary > 80000 }
highEarners.each { row ->
println " ${row.name} - ${row.department} - \$${row.salary}"
}
sql.close()
Output
=== Single-table DataSet queries work great === Alice - $95000.00 Charlie - $88000.00 === For joins, use raw Sql instead === Alice in Engineering - $95000.00 (dept budget: $500000.00) Bob in Marketing - $72000.00 (dept budget: $200000.00) Charlie in Engineering - $88000.00 (dept budget: $500000.00) Diana in Sales - $67000.00 (dept budget: $300000.00) === DataSet on a VIEW (workaround for joins) === Alice - Engineering - $95000.00 Charlie - Engineering - $88000.00
What happened here: DataSet does not support joins natively – it is designed for single-table operations. For multi-table queries, you have two options: use raw Sql directly, or create a database VIEW that joins the tables and then wrap that view in a DataSet. The VIEW approach gives you the best of both worlds – the join logic lives in SQL, and you get DataSet’s closure-based filtering on top of it.
Example 11: DataSet for Reporting
What we’re doing: Using DataSet to generate a simple report with filtered, sorted, and aggregated data from a sales table.
Example 11: DataSet for Reporting
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test11', 'sa', '', 'org.h2.Driver')
sql.execute '''
CREATE TABLE sales (
id INT PRIMARY KEY,
product VARCHAR(100),
region VARCHAR(50),
amount DECIMAL(10,2),
quarter VARCHAR(10)
)
'''
sql.execute "INSERT INTO sales VALUES (1, 'Laptop', 'North', 2500.00, 'Q1')"
sql.execute "INSERT INTO sales VALUES (2, 'Mouse', 'South', 50.00, 'Q1')"
sql.execute "INSERT INTO sales VALUES (3, 'Laptop', 'East', 2500.00, 'Q1')"
sql.execute "INSERT INTO sales VALUES (4, 'Monitor', 'North', 700.00, 'Q2')"
sql.execute "INSERT INTO sales VALUES (5, 'Keyboard', 'West', 150.00, 'Q1')"
sql.execute "INSERT INTO sales VALUES (6, 'Laptop', 'North', 2500.00, 'Q2')"
sql.execute "INSERT INTO sales VALUES (7, 'Mouse', 'East', 50.00, 'Q2')"
sql.execute "INSERT INTO sales VALUES (8, 'Monitor', 'South', 700.00, 'Q1')"
def sales = sql.dataSet('sales')
// Report: Q1 sales sorted by amount
def q1Sales = sales.findAll { it.quarter == 'Q1' }.sort { it.amount }
println "=== Q1 Sales Report ==="
println String.format("%-12s %-8s %10s", "Product", "Region", "Amount")
println "-" * 34
def q1Total = 0.0
q1Sales.each { row ->
q1Total += row.amount
println String.format("%-12s %-8s %10.2f", row.product, row.region, row.amount)
}
println "-" * 34
println String.format("%-12s %-8s %10.2f", "TOTAL", "", q1Total)
// Report: North region sales
def northSales = sales.findAll { it.region == 'North' }
println "\n=== North Region Sales ==="
def northTotal = 0.0
northSales.each { row ->
northTotal += row.amount
println " ${row.product} (${row.quarter}): \$${row.amount}"
}
println " North Total: \$${northTotal}"
// High-value sales across all regions
def highValue = sales.findAll { it.amount > 500 }.sort { it.amount }
println "\n=== High-Value Sales (> \$500) ==="
highValue.each { row ->
println " ${row.product} in ${row.region} (${row.quarter}): \$${row.amount}"
}
sql.close()
Output
=== Q1 Sales Report === Product Region Amount ---------------------------------- Mouse South 50.00 Keyboard West 150.00 Monitor South 700.00 Laptop North 2500.00 Laptop East 2500.00 ---------------------------------- TOTAL 5900.00 === North Region Sales === Laptop (Q1): $2500.00 Monitor (Q2): $700.00 Laptop (Q2): $2500.00 North Total: $5700.00 === High-Value Sales (> $500) === Monitor in North (Q2): $700.00 Monitor in South (Q1): $700.00 Laptop in North (Q1): $2500.00 Laptop in East (Q1): $2500.00 Laptop in North (Q2): $2500.00
What happened here: DataSet makes reporting code very clean. You filter by quarter, region, or amount using closures, sort the results, and iterate to build formatted output. Each findAll produces a WHERE clause, each sort produces an ORDER BY, and all the heavy lifting happens in the database. For complex aggregations (GROUP BY, SUM, etc.), you would still need raw SQL, but for filtered reports like this, DataSet is perfect.
Example 12: Real-World Use Case – Config-Driven Data Access
What we’re doing: Demonstrating a real-world pattern where DataSet is used as a lightweight data access layer for application configuration and user management.
Example 12: Real-World Config-Driven Data Access
@Grab('com.h2database:h2:2.2.224')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:h2:mem:test12', 'sa', '', 'org.h2.Driver')
// Create a simple user management schema
sql.execute '''
CREATE TABLE app_users (
id INT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(50) UNIQUE,
email VARCHAR(200),
role VARCHAR(20),
enabled BOOLEAN,
login_count INT DEFAULT 0
)
'''
def users = sql.dataSet('app_users')
// Seed users via DataSet add
users.add(username: 'admin', email: 'admin@example.com', role: 'ADMIN', enabled: true, login_count: 150)
users.add(username: 'alice', email: 'alice@example.com', role: 'USER', enabled: true, login_count: 42)
users.add(username: 'bob', email: 'bob@example.com', role: 'USER', enabled: true, login_count: 7)
users.add(username: 'charlie', email: 'charlie@example.com', role: 'MODERATOR', enabled: false, login_count: 0)
users.add(username: 'diana', email: 'diana@example.com', role: 'USER', enabled: true, login_count: 88)
users.add(username: 'eve', email: 'eve@example.com', role: 'USER', enabled: false, login_count: 3)
// Use case 1: Find active users
def activeUsers = users.findAll { it.enabled == true }
println "Active users:"
activeUsers.each { row ->
println " ${row.username} (${row.role}) - ${row.login_count} logins"
}
// Use case 2: Find inactive accounts to clean up
def inactiveAccounts = users
.findAll { it.enabled == false }
println "\nInactive accounts to review:"
inactiveAccounts.each { row ->
println " ${row.username} (${row.email}) - last login count: ${row.login_count}"
}
// Use case 3: Find power users (high login count)
def powerUsers = users
.findAll { it.login_count > 40 }
.sort { it.login_count }
println "\nPower users (40+ logins):"
powerUsers.each { row ->
println " ${row.username}: ${row.login_count} logins"
}
// Use case 4: Quick admin check
def admins = users.findAll { it.role == 'ADMIN' }
def adminRow = admins.firstRow()
println "\nAdmin account: ${adminRow.username} (${adminRow.email})"
// Use case 5: Count by role using DataSet + Groovy collections
def allUsers = users.rows()
def roleCounts = allUsers.groupBy { it.role }.collectEntries { role, rows -> [role, rows.size()] }
println "\nUser count by role: ${roleCounts}"
sql.close()
Output
Active users: admin (ADMIN) - 150 logins alice (USER) - 42 logins bob (USER) - 7 logins diana (USER) - 88 logins Inactive accounts to review: charlie (charlie@example.com) - last login count: 0 eve (eve@example.com) - last login count: 3 Power users (40+ logins): alice: 42 logins diana: 88 logins admin: 150 logins Admin account: admin (admin@example.com) User count by role: [ADMIN:1, USER:4, MODERATOR:1]
What happened here: This is a real-world pattern where DataSet acts as a lightweight data access layer. You create the DataSet once and then use findAll, sort, firstRow, and rows for all your queries. For aggregation that DataSet cannot generate (like GROUP BY), you can pull all rows and use Groovy collection methods like groupBy and collectEntries. This is perfectly fine for small-to-medium tables.
DataSet vs Sql – When to Use Which
Choosing between DataSet and raw Sql is not an either-or decision. They work together beautifully. Here is a quick guide:
DataSet vs Sql Comparison
Feature | DataSet | Sql -------------------------|----------------------|---------------------- Single-table queries | Excellent | Good Multi-table joins | Not supported | Full support Inserts | add() with maps | execute() with SQL Filtering (WHERE) | findAll closure | SQL string Sorting (ORDER BY) | sort closure | SQL string Aggregations (GROUP BY) | Not supported | Full support Subqueries | Not supported | Full support Stored procedures | Not supported | Full support Type safety | Closure-based | String-based Learning curve | Lower (Groovy-native) | Higher (SQL required)
Pro Tip: Use DataSet for simple CRUD operations on individual tables, and switch to Sql when you need joins, aggregations, or complex queries. They share the same underlying connection, so you can freely mix both in the same codebase.
Edge Cases and Best Practices
Edge Case: Closures That DataSet Cannot Translate
Not every Groovy closure can be converted to SQL. DataSet understands simple comparisons (==, !=, >, <, >=, <=) and logical operators (&&, ||). But if you try to use method calls like it.name.startsWith('A') or it.name.toLowerCase(), the DataSet will throw an exception because it cannot translate those to SQL.
Closures DataSet Can Translate
// These closures work with DataSet:
// it.name == 'Alice' -> WHERE name = 'Alice'
// it.salary > 50000 -> WHERE salary > 50000
// it.active == true -> WHERE active = true
// it.age >= 18 && it.age <= 65 -> WHERE age >= 18 AND age <= 65
// These closures will NOT work:
// it.name.startsWith('A') -> No SQL equivalent generated
// it.name.toLowerCase() == 'x' -> Method calls not supported
// it.name in ['Alice', 'Bob'] -> 'in' operator not directly supported
// it.name =~ /pattern/ -> Regex not supported
Best Practices
DO:
- Use
DataSetfor single-table CRUD – it keeps your code clean and Groovy-native - Chain
findAllcalls to build complex filters incrementally - Use database VIEWs as a workaround for joins with DataSet
- Close your
Sqlconnection when done (DataSet uses the same connection) - Use
add()with named parameters for readable inserts
DON’T:
- Try to use method calls inside
findAllclosures – stick to simple comparisons - Expect DataSet to handle joins, GROUP BY, or subqueries – use raw
Sqlfor those - Load all rows with
rows()on a million-row table – filter first withfindAll - Use DataSet as a replacement for GORM in Grails applications – use GORM instead
Common Pitfalls
Pitfall 1: Using Groovy Methods in findAll Closures
Problem: Writing closures that look correct in Groovy but cannot be translated to SQL.
Pitfall 1: Non-Translatable Closures
// This will FAIL at runtime:
// def result = employees.findAll { it.name.startsWith('A') }
// FIX: Use raw Sql for pattern matching
// def result = sql.rows("SELECT * FROM employees WHERE name LIKE 'A%'")
// Or use DataSet for simple equality and raw Sql for the rest
// def result = employees.findAll { it.department == 'Engineering' }
Solution: Keep DataSet closures simple – use only comparison operators and boolean logic. For string matching (LIKE), regex, or function calls, fall back to raw Sql.
Pitfall 2: Forgetting That DataSet Is Lazy
Problem: Assuming that findAll immediately queries the database and returns results.
Pitfall 2: Lazy Evaluation
// This does NOT execute a query:
def filtered = employees.findAll { it.salary > 80000 }
// 'filtered' is another DataSet, not a list of results
// The query runs when you consume the DataSet:
filtered.each { ... } // Runs query
filtered.rows() // Runs query
filtered.firstRow() // Runs query
// Each consumption triggers a NEW query
// If you need the results multiple times, store them:
def results = filtered.rows() // Query runs once
results.each { ... } // No query - iterating a list
results.size() // No query - list method
Solution: If you need to use filtered results multiple times, call rows() once and work with the resulting list. Each call to each or rows() on a DataSet triggers a fresh database query.
Pitfall 3: Column Name Mismatches
Problem: Using a wrong column name in a closure. Since DataSet translates closures to SQL at parse time, a typo in a column name results in a SQL error at execution time – not a compile-time error.
Pitfall 3: Column Name Typos
// This will compile fine but fail at runtime:
// def result = employees.findAll { it.deprtment == 'Engineering' }
// Error: Column "DEPRTMENT" not found
// Always double-check your column names match the table schema
// Tip: Use a constant or enum for column names in production code
Solution: DataSet property names in closures must exactly match the database column names. Test your DataSet queries early and consider using constants for column names in production code to avoid typos.
Conclusion
Groovy DataSet gives you a clean, closure-based API for database queries that sits right between raw SQL strings and a full ORM like GORM. You write Groovy closures, and the DataSet translates them into efficient SQL at the database level. For single-table CRUD operations – filtering, sorting, inserting, and iterating – it is hard to beat the readability and conciseness that DataSet offers.
The key limitation is that DataSet works best for single-table queries. Once you need joins, aggregations, or complex SQL features, you should switch to raw Sql. The good news is that both share the same connection, so mixing them in the same codebase is smooth. For more on the underlying Sql class, revisit our Groovy SQL Database Connection guide and Groovy SQL Read Query Data tutorial.
If your project needs a more powerful ORM with domain mapping, validations, and automatic schema generation, check out GORM in Grails. But for scripting, batch jobs, utilities, and lightweight applications, Groovy DataSet is a fantastic tool that keeps your database code readable and Groovy-idiomatic. For the full picture on Groovy’s database support, read the official Groovy database documentation.
Summary
DataSetwraps a database table and translates Groovy closures into SQL queriesfindAllgeneratesWHEREclauses,sortgeneratesORDER BYclauses,addgeneratesINSERTstatements- DataSet is lazy – queries execute when you iterate, not when you call
findAll - Chain multiple
findAllcalls to build compound WHERE conditions with AND - DataSet does not support joins – use VIEWs or fall back to raw
Sql - Keep closures simple (comparisons and booleans only) – method calls will not translate to SQL
If you also work with build tools, CI/CD pipelines, or cloud CLIs, check out Command Playground to practice 105+ CLI tools directly in your browser — no install needed.
Up next: Groovy ConfigSlurper – Configuration Management
Frequently Asked Questions
What is Groovy DataSet and how does it differ from Sql?
groovy.sql.DataSet is a wrapper around a database table that lets you use Groovy closures instead of SQL strings for querying. While groovy.sql.Sql requires you to write raw SQL strings, DataSet translates closures like findAll { it.salary > 50000 } into SQL WHERE clauses automatically. DataSet is best for single-table CRUD, while Sql handles joins, aggregations, and complex queries.
Can Groovy DataSet handle joins between multiple tables?
No, Groovy DataSet does not support joins directly. It is designed for single-table operations. For multi-table queries, you have two options: use raw groovy.sql.Sql with JOIN syntax, or create a database VIEW that joins the tables and then wrap that view with sql.dataSet('view_name'). The VIEW approach lets you use DataSet‘s closure-based filtering on joined data.
Is Groovy DataSet lazy or eager?
Groovy DataSet is lazy. Calling findAll or sort does not execute a query – it returns a new DataSet with the filter or sort criteria stored. The actual SQL query only runs when you consume the DataSet by calling each, rows(), or firstRow(). Each consumption triggers a fresh database query, so store results with rows() if you need them multiple times.
What types of closures can DataSet translate to SQL?
DataSet can translate closures that use simple comparison operators (==, !=, >, <, >=, <=) and boolean operators (&&, ||). It cannot translate Groovy method calls like startsWith(), toLowerCase(), regex matching, or the in operator. Keep your DataSet closures simple – for advanced SQL features, use raw Sql instead.
Should I use Groovy DataSet or GORM for database access?
Use DataSet for lightweight scripts, batch jobs, and utilities where you want simple database access without a full framework. Use GORM (in Grails or standalone) when you need domain class mapping, validations, automatic schema generation, and a full ORM. DataSet lives in groovy-sql with zero external dependencies, while GORM requires additional libraries and configuration.
Related Posts
Previous in Series: Groovy SQL Transactions and Batch Operations
Next in Series: Groovy ConfigSlurper – Configuration Management
Related Topics You Might Like:
This post is part of the Groovy & Grails Cookbook series on TechnoScripts.com

No comment