Groovy regular expressions come with dedicated operators and concise syntax. See 12 tested examples covering regex operators, pattern matching, find, match, and replace. A complete guide for Groovy 5.x.
“Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems. Unless they’re using Groovy – then they just have one elegant solution.”
Adapted from Jamie Zawinski
Last Updated: March 2026 | Tested on: Groovy 5.x, Java 17+ | Difficulty: Intermediate | Reading Time: 18 minutes
A Groovy regex is far less painful than its Java equivalent. Instead of double-escaping backslashes and wrapping everything in Pattern.compile(), Groovy gives you slashy strings, three dedicated regex operators, and pattern matching syntax that keeps the code around your regex short.
This post covers the core techniques for Groovy regular expressions. We’ll walk through 12 tested examples showing Groovy’s three regex operators, slashy strings, pattern matching, find vs match, capturing groups, replacements, and real-world parsing. If you need a refresher on strings first, check out our Groovy String Tutorial. And if you’ve already used regex with split(), you’ll find that the operators covered here take things much further.
Every example is tested on Groovy 5.x with exact output shown. Let’s get into it.
Why Groovy Makes Regex Better
If you’ve written regex in Java, you know the pain of code like Pattern.compile("\\d+\\.\\d+"). Those double backslashes exist because Java strings use backslash as an escape character, so you need to escape the escape. It makes patterns nearly unreadable.
Groovy gives you three things that change the game:
- Slashy strings – /pattern/ syntax where backslashes are literal, so /\d+\.\d+/ just works
- Three regex operators – =~ (find), ==~ (match), and ~/pattern/ (pattern) built into the language
- GDK enhancements – methods like replaceAll with closures, findAll, and Matcher iteration
According to the official Groovy documentation, these features make regular expressions a natural part of the language rather than a clunky library bolted on. Here are the operators in detail.
Groovy Regex Operators and Syntax
Groovy provides three dedicated regex operators. Each one serves a different purpose:
| Operator | Name | Returns | Description |
|---|---|---|---|
| ~/pattern/ | Pattern operator | java.util.regex.Pattern | Creates a compiled regex pattern |
| =~ | Find operator | java.util.regex.Matcher | Creates a Matcher that evaluates to true if the pattern is found anywhere |
| ==~ | Match operator | boolean | Tests whether the entire string matches the pattern |
Groovy Regex Operators Overview
// Pattern operator - creates a compiled Pattern
def pattern = ~/\d+/
println "Pattern type: ${pattern.getClass().name}"
// Find operator - creates a Matcher (partial match)
def matcher = "abc 123 def" =~ /\d+/
println "Matcher type: ${matcher.getClass().name}"
println "Found? ${matcher as boolean}"
// Match operator - full string match (returns boolean)
def fullMatch = "12345" ==~ /\d+/
def partialFail = "abc 123" ==~ /\d+/
println "Full match: ${fullMatch}"
println "Partial string: ${partialFail}"
Output
Pattern type: java.util.regex.Pattern
Matcher type: java.util.regex.Matcher
Found? true
Full match: true
Partial string: false
The key distinction: =~ checks if the pattern exists anywhere in the string (like Java’s Matcher.find()), while ==~ requires the entire string to match (like Matcher.matches()). This is the most common source of confusion, so keep this distinction clear.
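To make that mapping concrete, here is a small sketch that runs the same two checks through the plain java.util.regex API and through the Groovy operators. The input string and variable names are illustrative only.

```groovy
import java.util.regex.Pattern

String input = "abc 123"
Pattern digits = Pattern.compile('\\d+')

// Plain Java API: find() looks anywhere, matches() needs the whole string
boolean foundJava = digits.matcher(input).find()
boolean matchedJava = digits.matcher(input).matches()

// Groovy operators: =~ behaves like find() in a boolean context, ==~ like matches()
boolean foundGroovy = (input =~ /\d+/) as boolean
boolean matchedGroovy = input ==~ /\d+/

println "find: ${foundGroovy}, full match: ${matchedGroovy}"
assert foundJava == foundGroovy
assert matchedJava == matchedGroovy
```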
12 Practical Regex Examples
Example 1: Basic Pattern Matching with =~
What we’re doing: Using the find operator to check if a string contains a pattern, and extracting what was found.
Example 1: Find Operator =~
def text = "Order #12345 placed on 2026-03-08"
// Check if there's a number in the string
def matcher = text =~ /\d+/
if (matcher) {
println "Found a number: ${matcher[0]}"
}
// Find all matches
matcher.reset()
def allNumbers = []
while (matcher.find()) {
allNumbers << matcher.group()
}
println "All numbers: ${allNumbers}"
// Shorter: a Matcher is iterable, so collect gathers every match
def found = (text =~ /\d+/).collect { it }
println "Collected: ${found}"
Output
Found a number: 12345
All numbers: [12345, 2026, 03, 08]
Collected: [12345, 2026, 03, 08]
What happened here: The =~ operator creates a Matcher object. In a boolean context (like an if statement), Groovy calls find() on it, so it evaluates to true if the pattern is found anywhere in the string. You can index into the matcher with matcher[0] to get the first match. Because the check and the indexing move the Matcher's internal position, the example calls reset() before looping with find(). The collect call iterates over all matches – Groovy makes the Matcher iterable.
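If you only need to act on each match, the GDK's eachMatch method on strings avoids managing the Matcher and its reset() call yourself. A minimal sketch reusing the order text from above:

```groovy
def text = "Order #12345 placed on 2026-03-08"
def numbers = []

// eachMatch calls the closure once per match; with no groups it receives the matched text
text.eachMatch(/\d+/) { match -> numbers << match }

println "eachMatch: ${numbers}"
```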
Example 2: Exact Match with ==~
What we’re doing: Using the match operator to validate that an entire string conforms to a pattern.
Example 2: Match Operator ==~
// Email validation (simplified)
def emails = ["user@example.com", "bad-email", "name@domain.co.uk", "@missing.com", "test@.com"]
emails.each { email ->
def valid = email ==~ /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/
println "${email.padRight(20)} -> ${valid ? 'VALID' : 'INVALID'}"
}
println ""
// Phone number validation
def phones = ["555-123-4567", "5551234567", "(555) 123-4567", "123-45"]
phones.each { phone ->
def valid = phone ==~ /(\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4})/
println "${phone.padRight(20)} -> ${valid ? 'VALID' : 'INVALID'}"
}
Output
user@example.com     -> VALID
bad-email            -> INVALID
name@domain.co.uk    -> VALID
@missing.com         -> INVALID
test@.com            -> INVALID

555-123-4567         -> VALID
5551234567           -> VALID
(555) 123-4567       -> VALID
123-45               -> INVALID
What happened here: The ==~ operator returns a boolean – true only if the entire string matches the pattern from start to finish. This makes it perfect for validation. Notice we don’t need ^ and $ anchors because ==~ implicitly matches the whole string. The slashy string syntax keeps those regex patterns readable.
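One way to see the difference from =~ is to run the same email pattern against a string that has extra text around the address. This is an illustrative sketch; the candidate string is made up, and the anchored variant assumes single-line input.

```groovy
def emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/
def candidate = "contact: user@example.com"

// ==~ fails because "contact: " is not covered by the pattern
boolean fullMatch = candidate ==~ emailRegex
// =~ succeeds because the address appears somewhere in the string
boolean found = (candidate =~ emailRegex) as boolean
// Explicit ^ and $ anchors make =~ reject the surrounding text as well
boolean anchoredFind = (candidate =~ ('^' + emailRegex + '$')) as boolean

println "==~: ${fullMatch}, =~: ${found}, anchored =~: ${anchoredFind}"
```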
Example 3: Capturing Groups
What we’re doing: Extracting specific parts of a match using capturing groups with parentheses.
Example 3: Capturing Groups
def dateStr = "Today is 2026-03-08 and tomorrow is 2026-03-09"
// Single match with groups
def matcher = dateStr =~ /(\d{4})-(\d{2})-(\d{2})/
if (matcher) {
println "Full match: ${matcher[0][0]}"
println "Year: ${matcher[0][1]}"
println "Month: ${matcher[0][2]}"
println "Day: ${matcher[0][3]}"
}
println ""
// All matches with groups
println "All dates found:"
(0..<matcher.count).each { i ->
def (full, year, month, day) = matcher[i]
println " ${full} -> Year=${year}, Month=${month}, Day=${day}"
}
println ""
// Named groups (supported by java.util.regex since Java 7)
def log = "ERROR 2026-03-08 14:30:22 Connection failed"
def logMatcher = log =~ /(?<level>\w+)\s+(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})\s+(?<msg>.+)/
if (logMatcher) {
println "Level: ${logMatcher.group('level')}"
println "Date: ${logMatcher.group('date')}"
println "Time: ${logMatcher.group('time')}"
println "Message: ${logMatcher.group('msg')}"
}
Output
Full match: 2026-03-08
Year: 2026
Month: 03
Day: 08

All dates found:
 2026-03-08 -> Year=2026, Month=03, Day=08
 2026-03-09 -> Year=2026, Month=03, Day=09

Level: ERROR
Date: 2026-03-08
Time: 14:30:22
Message: Connection failed
What happened here: Groovy’s Matcher indexing is beautifully intuitive. matcher[0] gives the first match, and when groups are present it returns a list where index 0 is the full match and subsequent indices are the capturing groups. You can destructure these directly with def (full, year, month, day) = matcher[i]. Named groups using (?<name>...) syntax make your patterns self-documenting. In the log example, logMatcher.group('level') works because the if (logMatcher) check has already called find() on the Matcher.
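Named groups also work when you loop over every match with find(). The sketch below uses the same date text and collects each date into a map of integers. The map layout is just one illustrative choice.

```groovy
def dateStr = "Today is 2026-03-08 and tomorrow is 2026-03-09"
def m = dateStr =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

def dates = []
while (m.find()) {
    // group(String) reads a named group from the match that find() just located
    dates << [year : m.group('year').toInteger(),
              month: m.group('month').toInteger(),
              day  : m.group('day').toInteger()]
}
println dates
```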
Example 4: The Pattern Operator ~/regex/
What we’re doing: Creating precompiled Pattern objects with Groovy’s pattern operator for reuse.
Example 4: Pattern Operator ~/regex/
import java.util.regex.Pattern
// Create a compiled Pattern
def emailPattern = ~/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/
println "Type: ${emailPattern.getClass().name}"
// Reuse across multiple strings
def inputs = ["Contact us at support@example.com", "No email here", "Send to admin@tech.org please"]
inputs.each { input ->
def matcher = emailPattern.matcher(input)
if (matcher.find()) {
println "Found email in '${input}': ${matcher.group()}"
} else {
println "No email in '${input}'"
}
}
println ""
// Pattern with flags
def caseInsensitive = ~/(?i)groovy/
def texts = ["Groovy is great", "I love GROOVY", "groovy scripts", "Java only"]
texts.each { t ->
println "${t.padRight(20)} matches: ${caseInsensitive.matcher(t).find()}"
}
println ""
// Comparing approaches
def ipPattern = ~/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/
def servers = ["192.168.1.1", "10.0.0.255", "not-an-ip", "256.1.2.3"]
servers.each { s ->
println "${s.padRight(15)} -> ${(s ==~ ipPattern) ? 'IP format' : 'Not IP format'}"
}
Output
Type: java.util.regex.Pattern
Found email in 'Contact us at support@example.com': support@example.com
No email in 'No email here'
Found email in 'Send to admin@tech.org please': admin@tech.org

Groovy is great      matches: true
I love GROOVY        matches: true
groovy scripts       matches: true
Java only            matches: false

192.168.1.1     -> IP format
10.0.0.255      -> IP format
not-an-ip       -> Not IP format
256.1.2.3       -> IP format
What happened here: The ~/pattern/ operator is equivalent to Pattern.compile("pattern") but far more readable. The compiled pattern can be reused across multiple strings without recompilation – important for performance in loops. You embed flags like case-insensitivity directly in the pattern using (?i). Note that the IP pattern matches the format but doesn’t validate the range – 256.1.2.3 passes the format check even though 256 isn’t valid.
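The inline (?i) flag has a compile-time equivalent. The ~ operator is also not limited to literals: it compiles a String held in a variable too. This sketch relies only on java.util.regex and shows both features, plus Pattern.quote for runtime text that may contain regex metacharacters such as $ or the dot. The sample texts are made up.

```groovy
import java.util.regex.Pattern

def texts = ["Groovy is great", "I love GROOVY", "Java only"]

// Inline flag vs. compile-time flag: both make the match case-insensitive
Pattern inline = ~/(?i)groovy/
Pattern flagged = Pattern.compile('groovy', Pattern.CASE_INSENSITIVE)
def inlineHits = texts.findAll { inline.matcher(it).find() }
def flaggedHits = texts.findAll { flagged.matcher(it).find() }

// ~ applied to a runtime String compiles it into a Pattern
String word = 'love'
Pattern dynamic = ~word
def dynamicHits = texts.findAll { dynamic.matcher(it).find() }

// Pattern.quote treats the text literally, so "$" is not read as an end anchor
Pattern price = ~(Pattern.quote('$19.99'))
boolean priceFound = price.matcher('Item A costs $19.99').find()

println "inline=${inlineHits}, flagged=${flaggedHits}, dynamic=${dynamicHits}, price=${priceFound}"
```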
Example 5: String replaceAll with Regex
What we’re doing: Using regex-powered replaceAll and replaceFirst to transform strings.
Example 5: replaceAll with Regex
// Basic replacement
def text = "Call 555-123-4567 or 555-987-6543"
def masked = text.replaceAll(/\d{3}-\d{3}-\d{4}/, "XXX-XXX-XXXX")
println "Masked: ${masked}"
// Using backreferences
def names = "John Smith, Jane Doe, Bob Wilson"
def reversed = names.replaceAll(/(\w+)\s(\w+)/, '$2, $1')
println "Reversed: ${reversed}"
// replaceAll with a closure (Groovy GDK)
def prices = 'Item A costs $19.99 and Item B costs $5.50'
def increased = prices.replaceAll(/\$(\d+\.\d{2})/) { fullMatch, amount ->
def newPrice = (amount as BigDecimal) * 1.10
'$' + String.format('%.2f', newPrice)
}
println "10% increase: ${increased}"
// Replace first occurrence only
def sentence = "the cat sat on the mat by the hat"
def first = sentence.replaceFirst(/the/, "THE")
println "Replace first: ${first}"
// Remove all non-alphanumeric characters
def dirty = "Hello, World! How's it going? #great"
def clean = dirty.replaceAll(/[^a-zA-Z0-9\s]/, '')
println "Cleaned: ${clean}"
Output
Masked: Call XXX-XXX-XXXX or XXX-XXX-XXXX
Reversed: Smith, John, Doe, Jane, Wilson, Bob
10% increase: Item A costs $21.99 and Item B costs $6.05
Replace first: THE cat sat on the mat by the hat
Cleaned: Hello World Hows it going great
What happened here: Groovy’s replaceAll accepts regex patterns just like Java. The backreference syntax $1, $2 lets you rearrange captured groups. Keep that replacement in single quotes, because in a double-quoted GString the $ is special to Groovy. But the real power is the closure version – Groovy’s GDK lets you pass a closure to replaceAll that receives each match and its groups, so you can compute the replacement dynamically. In Java 8 the price increase example would need a manual appendReplacement loop (Java 9 added a Matcher.replaceAll overload that takes a function), while in Groovy it is clean and readable.
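The difference between a replacement string and a replacement closure shows up as soon as the replacement contains a literal dollar sign. In this sketch, which uses made-up sample text, the string form needs Matcher.quoteReplacement, while the closure form does not:

```groovy
import java.util.regex.Matcher

def text = 'Total: 20 USD'

// In a replacement *string*, $ and \ are special (group references and escapes),
// so a literal dollar sign must be quoted before appending the $1 group reference
def viaString = text.replaceAll(/(\d+) USD/, Matcher.quoteReplacement('$') + '$1')

// With a closure, Groovy inserts the returned value literally
def viaClosure = text.replaceAll(/(\d+) USD/) { full, amount -> '$' + amount }

println "${viaString} | ${viaClosure}"
```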
Example 6: Matcher Iteration and Indexing
What we’re doing: Iterating over all regex matches in a string using Groovy’s enhanced Matcher.
Example 6: Matcher Iteration
def text = 'Prices: $12.99, $5.50, $199.00, $0.99'
// Index into matches
def matcher = text =~ /\$(\d+\.\d{2})/
println "Match count: ${matcher.count}"
println "First match: ${matcher[0][0]}"
println "First price: ${matcher[0][1]}"
println "Third match: ${matcher[2][0]}"
println ""
// Iterate with each
def total = 0.0
(text =~ /\$(\d+\.\d{2})/).each { full, price ->
println "Found: ${full} -> price = ${price}"
total += price as BigDecimal
}
println "Total: ${'$'}${total}"
println ""
// Collect all matches into a list
def allPrices = (text =~ /\$(\d+\.\d{2})/).collect { it[1] as BigDecimal }
println "All prices: ${allPrices}"
println "Average: ${'$'}${String.format('%.2f', allPrices.sum() / allPrices.size())}"
Output
Match count: 4
First match: $12.99
First price: 12.99
Third match: $199.00

Found: $12.99 -> price = 12.99
Found: $5.50 -> price = 5.50
Found: $199.00 -> price = 199.00
Found: $0.99 -> price = 0.99
Total: $218.48

All prices: [12.99, 5.50, 199.00, 0.99]
Average: $54.62
What happened here: Groovy makes the Matcher class iterable, which means you can use each, collect, and other Groovy collection methods on it. The matcher.count property tells you how many matches exist. When you use each with a closure that has multiple parameters, Groovy destructures the match – the first parameter gets the full match and subsequent parameters get the capturing groups.
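Because the Matcher is iterable, other GDK iteration methods work on it too. This short sketch reuses the price text. Since the pattern has one group, each element is a list of [full match, captured price].

```groovy
def text = 'Prices: $12.99, $5.50, $199.00, $0.99'
def m = text =~ /\$(\d+\.\d{2})/

// findAll filters the matches; it[0] is the full match, it[1] the captured price
def expensive = m.findAll { (it[1] as BigDecimal) > 10 }.collect { it[0] }

// eachWithIndex pairs every match with its position
def labels = []
m.eachWithIndex { match, i -> labels << "${i + 1}:${match[1]}".toString() }

println "expensive=${expensive}, labels=${labels}"
```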
Example 7: Slashy Strings and Dollar Slashy Strings
What we’re doing: Comparing Groovy’s string types for regex and when to use each one.
Example 7: Slashy and Dollar Slashy Strings
// Regular string - must double-escape backslashes
def p1 = "\\d+\\.\\d+"
println "Regular string pattern: ${p1}"
println "Match: ${'3.14' ==~ p1}"
// Slashy string - backslashes are literal
def p2 = /\d+\.\d+/
println "Slashy string pattern: ${p2}"
println "Match: ${'3.14' ==~ p2}"
// Dollar slashy string - useful when pattern contains forward slashes
def pathPattern = $/[a-z]+/[a-z]+/\d+/$
println "Dollar slashy: ${pathPattern}"
println "Match: ${'users/profiles/123' ==~ pathPattern}"
// Slashy strings support interpolation
def digits = 3
def dynPattern = /\d{${digits}}/
println "Dynamic pattern: ${dynPattern}"
println "Match '123': ${'123' ==~ dynPattern}"
println "Match '12': ${'12' ==~ dynPattern}"
// Multi-line regex with comments using (?x) flag
def complexPattern = /(?x)
^ # Start of string
[a-zA-Z0-9._%+-]+ # Username
@ # At symbol
[a-zA-Z0-9.-]+ # Domain
\. # Dot
[a-zA-Z]{2,} # TLD
$ # End of string
/
println "Email valid: ${'user@example.com' ==~ complexPattern}"
Output
Regular string pattern: \d+\.\d+
Match: true
Slashy string pattern: \d+\.\d+
Match: true
Dollar slashy: [a-z]+/[a-z]+/\d+
Match: true
Dynamic pattern: \d{3}
Match '123': true
Match '12': false
Email valid: true
What happened here: Slashy strings /pattern/ are the workhorse for regex in Groovy – they let you write \d instead of \\d. Dollar slashy strings $/pattern/$ are useful when your pattern contains forward slashes, like file paths. Both support GString interpolation with ${}. The (?x) flag enables comments inside your regex, which is a lifesaver for complex patterns.
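Interpolation is convenient, but any regex metacharacters in the interpolated value become part of the pattern. This small sketch assumes the value comes from user input; the userInput variable is hypothetical.

```groovy
import java.util.regex.Pattern

def userInput = 'a.c'   // hypothetical user-supplied text

// Raw interpolation: the '.' acts as a wildcard, so "abc" matches too
def naive = /${userInput}/
boolean naiveMatchesAbc = 'abc' ==~ naive

// Pattern.quote wraps the text in \Q...\E so every character is literal
def safe = /${Pattern.quote(userInput)}/
boolean safeMatchesAbc = 'abc' ==~ safe
boolean safeMatchesLiteral = 'a.c' ==~ safe

println "naive: ${naiveMatchesAbc}, quoted: ${safeMatchesAbc}/${safeMatchesLiteral}"
```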
Example 8: Regex in switch Statements
What we’re doing: Using regex patterns directly in Groovy’s switch statement for pattern-based branching.
Example 8: Regex in switch
def classify(String input) {
switch (input) {
case ~/\d+/:
return "Number: ${input}"
case ~/[a-zA-Z]+@[a-zA-Z]+\.\w+/:
return "Email: ${input}"
case ~/\d{3}-\d{3}-\d{4}/:
return "Phone: ${input}"
case ~/https?:\/\/.+/:
return "URL: ${input}"
case ~/[A-Z]{2,}/:
return "Acronym: ${input}"
default:
return "Unknown: ${input}"
}
}
def inputs = ["42", "user@mail.com", "555-123-4567", "https://groovy-lang.org", "NASA", "hello world"]
inputs.each { println classify(it) }
Output
Number: 42
Email: user@mail.com
Phone: 555-123-4567
URL: https://groovy-lang.org
Acronym: NASA
Unknown: hello world
What happened here: This is one of Groovy’s most elegant features. In a switch statement, case can take a Pattern. Groovy tests it by calling the Pattern’s isCase() method, which requires the whole switch value to match, just like the ==~ operator. Cases are checked top to bottom and the first matching case wins. This gives you regex-powered dispatching with clean, readable syntax. In Java, you’d need a chain of if-else statements with Pattern.matches() calls.
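You can call the same mechanism directly. This sketch uses isCase() and the in operator, both of which rely on the Pattern's full-match test:

```groovy
import java.util.regex.Pattern

Pattern number = ~/\d+/

// isCase() is what a switch/case clause calls under the hood
boolean whole = number.isCase('42')
boolean partial = number.isCase('42 apples')   // extra text, so no full match

// The in operator also calls isCase(), with the operands swapped
boolean viaIn = '42' in number

println "whole=${whole}, partial=${partial}, in=${viaIn}"
```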
Example 9: Lookahead and Lookbehind Assertions
What we’re doing: Using zero-width assertions to match patterns based on what comes before or after them, without including that context in the match.
Example 9: Lookahead and Lookbehind
// Positive lookahead - match digits followed by "px"
def css = "font-size: 14px; margin: 20px; width: 100%; padding: 8px"
def pxValues = (css =~ /\d+(?=px)/).collect { it }
println "Pixel values: ${pxValues}"
// Negative lookahead - digits NOT followed by "px" (or by another digit, which would mean a partial number)
def nonPxValues = (css =~ /\d+(?!px|[0-9])/).collect { it }
println "Non-pixel values: ${nonPxValues}"
// Positive lookbehind - match text after a dollar sign
def pricing = "Basic \$9.99, Pro \$29.99, Enterprise \$99.99"
def amounts = (pricing =~ /(?<=\$)\d+\.\d{2}/).collect { it }
println "Amounts: ${amounts}"
// Password validation using lookaheads
def validatePassword(String pwd) {
    // Each check is a full match (==~); the lookahead scans ahead for one kind of character
    def hasUpper = pwd ==~ /(?=.*[A-Z]).*/
    def hasLower = pwd ==~ /(?=.*[a-z]).*/
    def hasDigit = pwd ==~ /(?=.*\d).*/
    def hasLength = pwd.length() >= 8
    return [upper: hasUpper, lower: hasLower, digit: hasDigit, length: hasLength]
}
["Groovy123", "short", "NoDigitsHere", "abc123def"].each { pwd ->
def result = validatePassword(pwd)
println "${pwd.padRight(15)} -> ${result}"
}
Output
Pixel values: [14, 20, 8]
Non-pixel values: [100]
Amounts: [9.99, 29.99, 99.99]
Groovy123       -> [upper:true, lower:true, digit:true, length:true]
short           -> [upper:false, lower:true, digit:false, length:false]
NoDigitsHere    -> [upper:true, lower:true, digit:false, length:true]
abc123def       -> [upper:false, lower:true, digit:true, length:true]
What happened here: Lookahead (?=...) and lookbehind (?<=...) are zero-width assertions – they check what’s around a match without including it in the result. The CSS example extracts only the numbers that are pixel values. Its negative lookahead also rejects a following digit, so it can’t match a partial number like the 1 in 14px. The pricing example extracts amounts that follow a dollar sign. These assertions are especially useful for validation – the password checker uses one lookahead per rule, so each criterion is tested and reported independently.
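The separate checks can also be folded into one pattern by chaining lookaheads at the start, a common idiom for password rules. This sketch assumes the same criteria: one uppercase letter, one lowercase letter, one digit, and at least eight characters.

```groovy
// Each (?=...) is tested from the start of the string without consuming input;
// .{8,} then enforces the minimum length and consumes the whole string for ==~
def strongPassword = /(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}/

def results = ["Groovy123", "short", "NoDigitsHere", "abc123def"].collectEntries { pwd ->
    [(pwd): pwd ==~ strongPassword]
}
println results
```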
Example 10: Regex with Groovy’s findAll and find
What we’re doing: Using Groovy’s GDK string methods that accept regex patterns for concise pattern extraction.
Example 10: GDK String Regex Methods
def text = "Server logs: ERROR at 14:30, WARN at 14:32, ERROR at 14:35, INFO at 14:40"
// findAll - returns all matches as a list of strings
def times = text.findAll(/\d{2}:\d{2}/)
println "All times: ${times}"
// findAll with a closure - the closure receives the full match, then each group
def entries = text.findAll(/(\w+) at (\d{2}:\d{2})/) { full, level, time -> "${level}@${time}" }
println "Entries: ${entries}"
// find - returns first match
def firstError = text.find(/ERROR at \d{2}:\d{2}/)
println "First error: ${firstError}"
// find with closure
def firstTime = text.find(/(\w+) at (\d{2}:\d{2})/) { full, level, time -> time }
println "First time: ${firstTime}"
// count occurrences
def errorCount = text.findAll(/ERROR/).size()
def warnCount = text.findAll(/WARN/).size()
println "Errors: ${errorCount}, Warnings: ${warnCount}"
// matches() - same as ==~
println "Is all digits: ${'12345'.matches(/\d+/)}"
println "Contains letter: ${'abc123'.matches(/.*[a-zA-Z].*/)}"
Output
All times: [14:30, 14:32, 14:35, 14:40]
Entries: [ERROR@14:30, WARN@14:32, ERROR@14:35, INFO@14:40]
First error: ERROR at 14:30
First time: 14:30
Errors: 2, Warnings: 1
Is all digits: true
Contains letter: true
What happened here: The GDK adds findAll(regex) and find(regex) directly to String. These are often cleaner than creating a Matcher explicitly. When you pass a closure, it receives the match and any groups, letting you transform the results inline. The findAll with closure pattern is particularly powerful – it replaces what would be a multi-line Matcher loop in Java with a single expressive line.
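Two details are easy to miss. Without a closure, findAll ignores capture groups and returns only the full matches. And find returns null when nothing matches. This sketch uses the same log text:

```groovy
def text = "Server logs: ERROR at 14:30, WARN at 14:32, ERROR at 14:35, INFO at 14:40"

// No closure: groups are ignored, each element is the full match
def fullMatches = text.findAll(/(\w+) at (\d{2}:\d{2})/)

// find returns null when there is no match, so pair it with the Elvis operator
def debug = text.find(/DEBUG at \d{2}:\d{2}/)
def debugLabel = debug ?: 'no DEBUG entries'

// With a closure the group is available, and countBy builds a frequency map
def counts = text.findAll(/(\w+) at \d{2}:\d{2}/) { match, level -> level }.countBy { it }

println fullMatches
println debugLabel
println counts
```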
Example 11: Multiline and DOTALL Modes
What we’re doing: Working with regex across multiple lines using flags for multiline and dotall modes.
Example 11: Multiline Regex
def multiline = """Line 1: Hello World
Line 2: Groovy Rules
Line 3: Regex Power
Line 4: Pattern Match"""
// Default - ^ and $ match start/end of entire string
def defaultMatch = multiline.findAll(/^Line \d+.*/)
println "Default: ${defaultMatch}"
// MULTILINE mode - ^ and $ match start/end of each line
def multilineMatch = multiline.findAll(/(?m)^Line \d+:.*$/)
println "Multiline: ${multilineMatch}"
// Extract specific lines
def groovyLine = multiline.find(/(?m)^.*Groovy.*$/)
println "Groovy line: ${groovyLine}"
// DOTALL mode - dot matches newlines too
def html = """<div>
<p>Hello World</p>
</div>"""
def withoutDotall = html.find(/<div>.*<\/div>/)
println "Without DOTALL: ${withoutDotall}"
def withDotall = html.find(/(?s)<div>.*<\/div>/)
println "With DOTALL: ${withDotall?.replaceAll(/\n/, '\\\\n')}"
// Combined flags
def lines = multiline.findAll(/(?mi)^line \d+:.*groovy.*$/)
println "Case insensitive multiline: ${lines}"
Output
Default: [Line 1: Hello World]
Multiline: [Line 1: Hello World, Line 2: Groovy Rules, Line 3: Regex Power, Line 4: Pattern Match]
Groovy line: Line 2: Groovy Rules
Without DOTALL: null
With DOTALL: <div>\n<p>Hello World</p>\n</div>
Case insensitive multiline: [Line 2: Groovy Rules]
What happened here: By default, ^ and $ only match the start and end of the entire string. The (?m) flag enables multiline mode where they match start and end of each line. The (?s) flag (DOTALL) makes . match newline characters too – without it, . matches everything except newlines. You can combine flags: (?mi) gives you both multiline and case-insensitive matching.
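DOTALL interacts with greediness. This sketch uses a made-up block-structured text rather than HTML. It shows that greedy .* runs to the last terminator, while lazy .*? stops at the first. It also shows that Pattern.DOTALL is the compile-time equivalent of (?s).

```groovy
import java.util.regex.Pattern

def config = "BEGIN\nalpha\nEND\nBEGIN\nbeta\ngamma\nEND"

// Greedy .* under DOTALL runs to the LAST "END", so both blocks collapse into one match
def greedyCount = config.findAll(/(?s)BEGIN.*END/).size()

// Lazy .*? stops at the first "END" it can reach, giving one match per block
def blocks = config.findAll(/(?s)BEGIN\n(.*?)\nEND/) { full, body -> body }

// The same flag passed to Pattern.compile instead of the inline (?s)
def compiled = Pattern.compile('BEGIN\\n(.*?)\\nEND', Pattern.DOTALL)
def compiledBlocks = config.findAll(compiled) { full, body -> body }

println "greedy matches: ${greedyCount}, lazy blocks: ${blocks.size()}"
```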
Example 12: Real-World Regex Parsing
What we’re doing: Practical real-world regex applications combining multiple Groovy features.
Example 12: Real-World Regex
// Parse a URL into components
def url = "https://www.example.com:8080/path/to/page?key=value&foo=bar#section"
def urlMatcher = url =~ /^(https?):\/\/([^:\/]+)(?::(\d+))?(\/[^?#]*)?(?:\?([^#]*))?(?:#(.*))?$/
if (urlMatcher) {
def (_, protocol, host, port, path, query, fragment) = urlMatcher[0]
println "Protocol: ${protocol}"
println "Host: ${host}"
println "Port: ${port ?: 'default'}"
println "Path: ${path ?: '/'}"
println "Query: ${query ?: 'none'}"
println "Fragment: ${fragment ?: 'none'}"
}
println ""
// Extract and summarize log data
def logs = """2026-03-08 14:30:22 ERROR Database connection lost
2026-03-08 14:30:25 WARN Retrying connection attempt 1
2026-03-08 14:30:28 WARN Retrying connection attempt 2
2026-03-08 14:30:31 ERROR Connection retry failed
2026-03-08 14:30:32 INFO Switching to backup database
2026-03-08 14:30:33 INFO Backup connection established"""
def logEntries = logs.findAll(/(?m)^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.+)$/) {
full, date, time, level, msg -> [date: date, time: time, level: level, message: msg]
}
def summary = logEntries.groupBy { it.level }
summary.each { level, entries ->
println "${level}: ${entries.size()} entries"
entries.each { println " [${it.time}] ${it.message}" }
}
println ""
// Markdown link extraction
def markdown = "Check [Groovy Docs](https://groovy-lang.org/documentation.html) and [String Guide](/groovy-string-tutorial-complete-guide/) for more."
def links = markdown.findAll(/\[([^\]]+)\]\(([^)]+)\)/) { full, text, href -> [text: text, href: href] }
links.each { println "Link: '${it.text}' -> ${it.href}" }
Output
Protocol: https
Host: www.example.com
Port: 8080
Path: /path/to/page
Query: key=value&foo=bar
Fragment: section

ERROR: 2 entries
 [14:30:22] Database connection lost
 [14:30:31] Connection retry failed
WARN: 2 entries
 [14:30:25] Retrying connection attempt 1
 [14:30:28] Retrying connection attempt 2
INFO: 2 entries
 [14:30:32] Switching to backup database
 [14:30:33] Backup connection established

Link: 'Groovy Docs' -> https://groovy-lang.org/documentation.html
Link: 'String Guide' -> /groovy-string-tutorial-complete-guide/
What happened here: The URL parser uses a single regex with optional groups to break a URL into its components – protocol, host, port, path, query string, and fragment. The log analyzer combines findAll with a closure to parse each line into a map, then uses Groovy’s groupBy to organize by log level. The Markdown link extractor shows how negated character classes like [^\]]+ and [^)]+ capture the text between delimiters without running on into the next link. These are patterns you’ll use in real production code.
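Hand-written URL regexes are easy to get subtly wrong, so it can be worth cross-checking them against the JDK's own parser. This sketch uses the same URL and compares the regex groups with the components reported by java.net.URI:

```groovy
def url = "https://www.example.com:8080/path/to/page?key=value&foo=bar#section"

// Same regex as above: protocol, host, optional port, path, query, fragment
def m = url =~ /^(https?):\/\/([^:\/]+)(?::(\d+))?(\/[^?#]*)?(?:\?([^#]*))?(?:#(.*))?$/
def groups = m[0]

// java.net.URI splits the same components without a custom pattern
def uri = new URI(url)
def fromUri = [uri.scheme, uri.host, uri.port.toString(), uri.path, uri.query, uri.fragment]

println "regex: ${groups.drop(1)}"
println "URI:   ${fromUri}"
```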
Regex with Groovy Collections
One of Groovy’s strengths is how naturally regex integrates with collection operations. Here are patterns that combine both:
Regex with Collections
def files = ["report_2026.pdf", "data.csv", "image_001.png", "backup_2025.tar.gz",
"notes.txt", "photo_123.jpg", "archive_2024.zip"]
// grep with regex - filter a list by pattern
def yearFiles = files.grep(~/.*\d{4}.*/)
println "Files with year: ${yearFiles}"
// findAll with regex closure
def imageFiles = files.findAll { it ==~ /.*\.(png|jpg|gif)/ }
println "Image files: ${imageFiles}"
// collect with regex extraction
def years = files.findAll { it =~ /\d{4}/ }
.collect { (it =~ /(\d{4})/)[0][1] }
println "Years found: ${years}"
// any and every with regex
def hasZip = files.any { it ==~ /.*\.zip/ }
def allText = files.every { it ==~ /.*\.txt/ }
println "Has ZIP: ${hasZip}, All TXT: ${allText}"
// Group by file extension using regex
def byExtension = files.groupBy { (it =~ /\.(\w+)$/)[0][1] }
byExtension.each { ext, names ->
println "${ext}: ${names}"
}
Output
Files with year: [report_2026.pdf, backup_2025.tar.gz, archive_2024.zip]
Image files: [image_001.png, photo_123.jpg]
Years found: [2026, 2025, 2024]
Has ZIP: true, All TXT: false
pdf: [report_2026.pdf]
csv: [data.csv]
png: [image_001.png]
gz: [backup_2025.tar.gz]
txt: [notes.txt]
jpg: [photo_123.jpg]
zip: [archive_2024.zip]
The grep method is especially elegant – it accepts a Pattern directly and filters the list to only matching elements. This works because the GDK adds an isCase() method to Pattern that performs a full match. That is the same mechanism that powers regex in switch statements, and it is why the year pattern above is wrapped in .* on both sides.
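The full-match behaviour of grep is easy to trip over. This sketch filters a shortened version of the file list both ways:

```groovy
def files = ["report_2026.pdf", "data.csv", "image_001.png", "backup_2025.tar.gz"]

// grep(Pattern) relies on isCase(), i.e. a FULL match, so a bare year pattern matches nothing
def viaGrep = files.grep(~/\d{4}/)

// findAll with =~ only needs the year somewhere in the name
def viaFind = files.findAll { it =~ /\d{4}/ }

println "grep: ${viaGrep}, findAll: ${viaFind}"
```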
Common Regex Patterns Reference
Here’s a quick reference of regex patterns you’ll use frequently in Groovy:
| Pattern | Matches | Example |
|---|---|---|
| \d+ | One or more digits | “42”, “12345” |
| \w+ | Word characters (letters, digits, underscore) | “hello”, “var_1” |
| \s+ | One or more whitespace characters | “ ”, “\t\n” |
| [a-zA-Z]+ | One or more letters | “Hello”, “abc” |
| .* | Any characters (greedy) | Matches as much as possible (except newlines) |
| .*? | Any characters (lazy/non-greedy) | Matches as little as possible |
| ^...$ | Start and end anchors | Anchor the match to the whole input |
| (?i) | Case-insensitive flag | “Hello” matches “hello” |
| (?m) | Multiline flag | ^ and $ match per line |
| (?s) | DOTALL flag | Dot matches newlines |
| (?=...) | Positive lookahead | Followed by pattern |
| (?<=...) | Positive lookbehind | Preceded by pattern |
Performance Considerations
Regex performance matters when you’re processing thousands of strings. Here are practical tips:
- Precompile patterns – use ~/pattern/ and store the Pattern object when reusing it across multiple strings
- Avoid catastrophic backtracking – patterns like (a+)+ can take exponential time on certain inputs
- Use possessive quantifiers – \d++ instead of \d+ when you don’t need backtracking
- Prefer specific over general – a class like [a-zA-Z]+ stops at the first non-letter, while .* runs to the end of the line and then has to backtrack
- Use non-capturing groups – (?:...) when you don’t need to extract the group, since capturing adds some overhead
- Consider alternatives – for simple contains checks, String.contains() is faster than regex
Performance Tips
import java.util.regex.Pattern
// Precompile for reuse
def pattern = ~/\b[A-Z][a-z]+\b/
def names = ["John went Home", "Alice met Bob", "Charlie ran Fast"]
names.each { sentence ->
def capitalized = (sentence =~ pattern).collect { it }
println "${sentence} -> Capitalized words: ${capitalized}"
}
// Use contains() for simple checks instead of regex
def text = "Hello World"
println "Contains (fast): ${text.contains('World')}"
println "Regex (slower): ${(text =~ /World/) as boolean}"
// Non-capturing groups when you don't need the group's value
def data = "2026-03-08"
def withCapture = data =~ /(\d{4})-(\d{2})-(\d{2})/
def withoutCapture = data =~ /(?:\d{4})-(?:\d{2})-(?:\d{2})/
println "With capture groups: ${withCapture[0]}"
println "Without capture groups: ${withoutCapture[0]}"
Output
John went Home -> Capitalized words: [John, Home]
Alice met Bob -> Capitalized words: [Alice, Bob]
Charlie ran Fast -> Capitalized words: [Charlie, Fast]
Contains (fast): true
Regex (slower): true
With capture groups: [2026-03-08, 2026, 03, 08]
Without capture groups: 2026-03-08
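Two of the tips above are easy to verify directly. This sketch shows that a possessive quantifier refuses to give back characters, and that non-capturing groups leave groupCount() at zero:

```groovy
// Greedy \d+ can give back a digit so the final 5 still matches
boolean greedy = '12345' ==~ /\d+5/
// Possessive \d++ keeps every digit it took, so the final 5 can never match
boolean possessive = '12345' ==~ /\d++5/

// Non-capturing groups do not count towards groupCount()
int capturing = ('2026-03-08' =~ /(\d{4})-(\d{2})-(\d{2})/).groupCount()
int nonCapturing = ('2026-03-08' =~ /(?:\d{4})-(?:\d{2})-(?:\d{2})/).groupCount()

println "greedy=${greedy}, possessive=${possessive}, groups=${capturing}/${nonCapturing}"
```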
Best Practices
DO:
- Use slashy strings /pattern/ for all regex patterns – they eliminate double-escaping
- Use ==~ for validation (full match) and =~ for searching (partial match)
- Use named groups (?<name>...) for complex patterns to improve readability
- Use the (?x) flag with comments for complex patterns
- Precompile patterns with ~/pattern/ when reusing across multiple operations
- Use GDK methods like findAll(regex), find(regex), and replaceAll(regex, closure)
DON’T:
- Confuse =~ (find/partial) with ==~ (match/full) – this is the most common regex bug in Groovy
- Use regex for simple string operations – contains(), startsWith(), endsWith() are faster and clearer
- Write patterns with nested quantifiers like (a+)+ – they cause catastrophic backtracking
- Parse HTML or XML with regex – use a proper parser instead
- Forget that . doesn’t match newlines by default – use (?s) when needed
Conclusion
We covered the key techniques for Groovy regular expressions – from the three operators (~/pattern/, =~, ==~) to capturing groups, lookaheads, multiline modes, and real-world parsing. Groovy takes Java’s powerful but verbose regex support and makes it clean, readable, and actually enjoyable to use.
The key things to remember: slashy strings eliminate escape hell, =~ finds patterns anywhere while ==~ requires a full match, and GDK methods like findAll(regex) make extraction a one-liner. Once you internalize these patterns, you’ll find yourself reaching for regex confidently instead of avoiding it.
For more on working with strings, check out our complete Groovy String tutorial and the split() guide for regex-powered splitting.
Summary
- =~ creates a Matcher (find anywhere), ==~ returns a boolean (full match)
- Slashy strings /pattern/ let you write \d instead of \\d
- ~/pattern/ creates a precompiled Pattern for reuse
- Groovy’s Matcher is iterable – use each, collect, and indexing on matches
- GDK string methods like findAll(regex) and replaceAll(regex, closure) are more concise than raw Matcher code
Up next: Groovy Assert and Power Assert – Testing Values
Frequently Asked Questions
What is the difference between =~ and ==~ in Groovy?
The =~ operator is the find operator – it creates a Matcher, which evaluates to true in a boolean context if the pattern is found anywhere in the string. The ==~ operator is the match operator – it returns a boolean that is true only if the entire string matches the pattern from start to finish. Use =~ for searching and ==~ for validation.
How do I avoid double backslashes in Groovy regex?
Use slashy strings with the /pattern/ syntax. In slashy strings, backslashes are literal, so you can write /\d+/ instead of “\\d+”. Dollar slashy strings $/pattern/$ are useful when your pattern contains forward slashes.
How do I extract all regex matches from a string in Groovy?
Use the findAll method on String: 'text'.findAll(/pattern/). For capturing groups, pass a closure: 'text'.findAll(/(\w+)@(\w+)/) { full, user, domain -> user }. You can also create a Matcher with =~ and use collect or iterate with each.
Can I use regex in a Groovy switch statement?
Yes, Groovy’s switch statement accepts Pattern objects in case clauses. Use case ~/pattern/: and Groovy tests the switch value through the Pattern’s isCase() method, which requires a full match, just like the ==~ operator. This gives you regex-powered branching with clean syntax.
How do I precompile a regex pattern in Groovy?
Use the pattern operator: def pattern = ~/your regex here/. This creates a java.util.regex.Pattern object that can be reused across multiple matching operations without recompilation. This is equivalent to Pattern.compile() but with cleaner syntax and no double-escaping needed.
Related Posts
Previous in Series: Groovy Spaceship Operator – Compare with <=>
Next in Series: Groovy Assert and Power Assert – Testing Values
Related Topics You Might Like:
- Groovy String Tutorial – The Complete Guide
- Groovy Split String – Complete Guide
- Groovy Assert and Power Assert – Testing Values
This post is part of the Groovy & Grails Cookbook series on TechnoScripts.com
