Note: This article is not only about building one particular regex builder; it is also about how to think in a SOLID way. The approaches described here can be reused to build almost anything.
Regular expressions can feel like trying to decipher an alien language—one minute you’re confident in your pattern, and the next, you’re tangled in a mess of backslashes and brackets. If you've ever spent hours debugging a regex that “almost” worked, you know the struggle. What if there was a way to build regexes as intuitively as snapping together LEGO pieces? Enter the Fluent Regex Builder.
In this article, we'll dive into creating a fluent builder for regexes that adheres to SOLID principles, making your code modular, testable, and downright fun to read. We’ll cover everything from basic building blocks to advanced features like named capturing groups, composite patterns, and even caching of built regexes.
Why a Fluent Builder?
Imagine trying to explain a complex machine by describing every bolt and gear individually. Overwhelming, right? That's what regexes often feel like. A fluent builder allows you to construct these complex patterns step-by-step, in a human-readable form.
Consider this:
var passwordRegex = RegexBuilder.Create()
    .StartAnchor()
    .HasDigit()        // (?=.*\d)
    .HasSpecialChar()  // (?=.*[!@#$%^&*])
    .HasLetter()       // (?=.*[a-zA-Z])
    .AnyCharacter()
    .AtLeast(8)
    .EndAnchor()
    .Build();
This snippet is like a recipe—clear instructions that produce a robust regex, eliminating the guesswork of manual pattern crafting.
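To make the recipe concrete, here is a small sketch of the pattern that chain is presumably meant to produce, written out by hand and checked against a few inputs. It assumes AnyCharacter() renders "." and EndAnchor() renders "$"; the lookaheads come from the comments in the snippet above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hand-written equivalent of the builder chain (an assumption, not captured builder output).
var passwordPattern = @"^(?=.*\d)(?=.*[!@#$%^&*])(?=.*[a-zA-Z]).{8,}$";
var passwordRegex = new Regex(passwordPattern);

Console.WriteLine(passwordRegex.IsMatch("s3cret!pass")); // True: digit, special char, letter, 11 chars
Console.WriteLine(passwordRegex.IsMatch("short1!"));     // False: only 7 characters
Console.WriteLine(passwordRegex.IsMatch("nodigits!!"));  // False: no digit
```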
Breaking Down the Components
Our builder works by assembling a series of components, each responsible for a tiny part of the final regex. Think of it as building with LEGO bricks, where each brick has a single responsibility:
Anchors: Define where the regex starts (^) and ends ($).
Literals & Patterns: Represent specific characters or sequences (like "http" or ".*").
Groups: Bundle parts of the regex together, including our fancy Named Capturing Groups ((?<name>...)).
Quantifiers: Specify how many times something should appear (+, ?, {n}, etc.).
Using this modular approach, you can mix and match components to create anything from a simple email validator to a URL matcher.
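Named capturing groups are worth a quick look before we build anything. The sketch below uses a hand-written pattern (not builder output) to show what (?<name>...) gives you in .NET: the matched text can be read back by name instead of by position.

```csharp
using System;
using System.Text.RegularExpressions;

// Two named groups: "scheme" and "host". The pattern is illustrative only.
var namedUrlRegex = new Regex(@"^(?<scheme>https?)://(?<host>[^/]+)");
var match = namedUrlRegex.Match("https://example.com/docs");

Console.WriteLine(match.Success);                // True
Console.WriteLine(match.Groups["scheme"].Value); // https
Console.WriteLine(match.Groups["host"].Value);   // example.com
```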
Embracing SOLID Principles
When creating a fluent builder, following SOLID principles isn’t just a buzzword—it’s a necessity for maintainable code. Here’s a quick rundown of how we applied these principles:
Single Responsibility: Each component (e.g., LiteralComponent, GroupStartComponent) handles one aspect of regex construction.
Open/Closed: Need a new feature? Add a new component without altering the existing ones.
Liskov Substitution: Our components implement a common interface, ensuring you can swap them without any headaches.
Interface Segregation: We break down behaviors into small, focused interfaces so that classes only implement what they need.
Dependency Inversion: The builder depends on abstractions (interfaces), not concrete classes, making the design flexible and testable.
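As a rough illustration of the Open/Closed point, the sketch below adds a word-boundary component without touching any existing class. WordBoundaryComponent is a hypothetical name invented for this example; IRegexComponent and PatternComponent are repeated from later in the article so the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The rendering code only knows about IRegexComponent, so the new type plugs in unchanged.
var components = new List<IRegexComponent>
{
    new WordBoundaryComponent(),
    new PatternComponent("cat"),
    new WordBoundaryComponent()
};
var wordPattern = string.Concat(components.Select(c => c.Render()));
Console.WriteLine(wordPattern); // \bcat\b

public interface IRegexComponent
{
    string Render();
}

public class PatternComponent : IRegexComponent
{
    private readonly string _pattern;
    public PatternComponent(string pattern) => _pattern = pattern;
    public string Render() => _pattern;
}

// Hypothetical new component: extension by addition, not modification.
public class WordBoundaryComponent : IRegexComponent
{
    public string Render() => @"\b";
}
```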
Define the Building Blocks
The IRegexComponent Interface
At the heart of our builder lies a simple contract. Every component in our regex must be able to “render” itself as a string. This is where the IRegexComponent interface comes in:
public interface IRegexComponent
{
    string Render();
}
Analogy:
Think of this as labeling ingredients in a recipe. Whether you’re adding salt, pepper, or a dash of lemon, every ingredient clearly states its role. In our case, every regex component tells you exactly what it represents.
The ICompositeComponent Interface
Sometimes a single ingredient isn’t enough and you need a combination of them. That’s where composite components shine. ICompositeComponent extends IRegexComponent and adds a method for checking “balance,” which is especially useful for managing groups in regexes.
public interface ICompositeComponent : IRegexComponent
{
    int Balance();
}
Analogy:
Imagine building a sandwich. Not only do you need to know what each layer is, but you also want to ensure that you have an equal number of slices of bread (or, in our case, matching opening and closing parentheses). The Balance() method makes sure you don’t end up with an extra slice!
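The article does not show a composite implementation, so here is one possible sketch. SequenceComponent is a hypothetical class, and the convention that Balance() returns the number of groups still open (0 meaning balanced) is an assumption made for this example. The group components are simplified copies of the ones defined later.

```csharp
using System;
using System.Linq;

var open = new SequenceComponent(new GroupStartComponent(), new PatternComponent("ab"));
Console.WriteLine(open.Render());    // (ab
Console.WriteLine(open.Balance());   // 1: one group is still open

var closed = new SequenceComponent(open, new GroupEndComponent());
Console.WriteLine(closed.Render());  // (ab)
Console.WriteLine(closed.Balance()); // 0: every group is closed

public interface IRegexComponent { string Render(); }
public interface ICompositeComponent : IRegexComponent { int Balance(); }

public class PatternComponent : IRegexComponent
{
    private readonly string _pattern;
    public PatternComponent(string pattern) => _pattern = pattern;
    public string Render() => _pattern;
}

// Simplified: always a capturing group.
public class GroupStartComponent : IRegexComponent { public string Render() => "("; }
public class GroupEndComponent : IRegexComponent { public string Render() => ")"; }

// Hypothetical composite: renders its children in order and counts unclosed groups.
public class SequenceComponent : ICompositeComponent
{
    private readonly IRegexComponent[] _children;
    public SequenceComponent(params IRegexComponent[] children) => _children = children;

    public string Render() => string.Concat(_children.Select(c => c.Render()));

    public int Balance()
    {
        var balance = 0;
        foreach (var child in _children)
        {
            if (child is ICompositeComponent composite) balance += composite.Balance();
            else if (child is GroupStartComponent) balance += 1;
            else if (child is GroupEndComponent) balance -= 1;
        }
        return balance;
    }
}
```

A builder could refuse to Build() while the balance is non-zero, which catches a missing EndGroup() before the pattern ever reaches the regex engine.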
Creating Concrete Components
Once we have our interfaces, we can start building concrete classes. Each class represents a piece of our regex puzzle.
Basic Components
PatternComponent: A simple wrapper for static patterns.
public class PatternComponent : IRegexComponent
{
    private readonly string _pattern;
    public PatternComponent(string pattern) => _pattern = pattern;
    public string Render() => _pattern;
}
LiteralComponent: Handles literal strings by escaping regex metacharacters.
using System.Text.RegularExpressions;

public class LiteralComponent : IRegexComponent
{
    private readonly string _literal;
    public LiteralComponent(string literal)
    {
        // Escape metacharacters such as '.' and '+' so the text is matched literally.
        _literal = Regex.Escape(literal);
    }
    public string Render() => _literal;
}
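To see why the escaping matters, here is a short check of what Regex.Escape does to a couple of inputs, and how an unescaped dot behaves differently from an escaped one. The sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var escapedHost = Regex.Escape("example.com");
var escapedSum = Regex.Escape("1+1=2?");

Console.WriteLine(escapedHost); // example\.com
Console.WriteLine(escapedSum);  // 1\+1=2\?

// Unescaped, the dot matches any character; escaped, it matches only a literal dot.
Console.WriteLine(Regex.IsMatch("exampleXcom", "example.com")); // True
Console.WriteLine(Regex.IsMatch("exampleXcom", escapedHost));   // False
```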
Special Components
Group Components:
For grouping parts of your regex, we create components like GroupStartComponent and GroupEndComponent.
public class GroupStartComponent : IRegexComponent
{
    private readonly bool _capturing;
    public GroupStartComponent(bool capturing = true) => _capturing = capturing;
    public string Render() => _capturing ? "(" : "(?:";
}
public class GroupEndComponent : IRegexComponent
{
    public string Render() => ")";
}
LookaheadComponent:
Used to assert that certain conditions exist ahead in the string.
public class LookaheadComponent : IRegexComponent
{
    private readonly IRegexComponent _inner;
    private readonly bool _isNegative;
    public LookaheadComponent(IRegexComponent inner, bool isNegative = false)
    {
        _inner = inner;
        _isNegative = isNegative;
    }
    public string Render() => _isNegative ? $"(?!{_inner.Render()})" : $"(?={_inner.Render()})";
}
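The negative flavour is easy to overlook, so here is a small sketch that uses it to reject any input containing whitespace. The component classes are copied from above so the snippet runs on its own; the surrounding "^" and ".+$" are concatenated by hand purely for the demo.

```csharp
using System;
using System.Text.RegularExpressions;

var noWhitespace = new LookaheadComponent(new PatternComponent(@".*\s"), isNegative: true);
var noSpacePattern = "^" + noWhitespace.Render() + ".+$";

Console.WriteLine(noSpacePattern);                             // ^(?!.*\s).+$
Console.WriteLine(Regex.IsMatch("nospaces", noSpacePattern));  // True
Console.WriteLine(Regex.IsMatch("has space", noSpacePattern)); // False

public interface IRegexComponent { string Render(); }

public class PatternComponent : IRegexComponent
{
    private readonly string _pattern;
    public PatternComponent(string pattern) => _pattern = pattern;
    public string Render() => _pattern;
}

public class LookaheadComponent : IRegexComponent
{
    private readonly IRegexComponent _inner;
    private readonly bool _isNegative;
    public LookaheadComponent(IRegexComponent inner, bool isNegative = false)
    {
        _inner = inner;
        _isNegative = isNegative;
    }
    public string Render() => _isNegative ? $"(?!{_inner.Render()})" : $"(?={_inner.Render()})";
}
```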
QuantifierComponent:
Adds quantifiers like {n,}, +, or ? to previous components.
public class QuantifierComponent : IRegexComponent
{
    private readonly string _quantifier;
    public QuantifierComponent(string quantifier) => _quantifier = quantifier;
    public string Render() => _quantifier;
}
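One caveat follows from this design: because the quantifier is simply appended as text, it binds to the last atom of whatever came before it, not to the whole previous component. The sketch below shows the difference with hand-written patterns; wrapping the text in a non-capturing group is one common way to quantify it as a unit.

```csharp
using System;
using System.Text.RegularExpressions;

// "http" followed by "?" makes only the final "p" optional.
var lastCharOptional = Regex.IsMatch("htt", "^http?$");
// A non-capturing group makes the whole word optional instead.
var wholeWordOptional = Regex.IsMatch("htt", "^(?:http)?$");
var emptyAllowed = Regex.IsMatch("", "^(?:http)?$");

Console.WriteLine(lastCharOptional);  // True
Console.WriteLine(wholeWordOptional); // False
Console.WriteLine(emptyAllowed);      // True
```

This is why chaining a quantifier directly after a single-character literal is safe, while a multi-character literal needs a group around it first.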
Assembling the Fluent Builder
Now that we have our components, let’s create the fluent builder class. The builder’s job is to let you chain these components together in a logical, readable manner.
The Basic Fluent Methods
Here’s how you might implement methods like StartAnchor(), HasSpecialChar(), and AtLeast():
using System.Collections.Generic;
using System.Text;

public class RegexBuilder
{
    private readonly List<IRegexComponent> _components = new List<IRegexComponent>();

    public static RegexBuilder Create() => new RegexBuilder();

    public RegexBuilder StartAnchor()
    {
        _components.Add(new PatternComponent("^"));
        return this;
    }

    public RegexBuilder HasSpecialChar()
    {
        // A lookahead to ensure at least one special character
        _components.Add(new LookaheadComponent(new PatternComponent(".*[!@#$%^&*]")));
        return this;
    }

    public RegexBuilder AtLeast(int min)
    {
        // Renders as {min,} and applies to whatever was added just before it
        _components.Add(new QuantifierComponent("{" + min + ",}"));
        return this;
    }

    // Additional fluent methods would follow a similar pattern...

    public string Build()
    {
        // Concatenate every component's rendered fragment in insertion order
        var sb = new StringBuilder();
        foreach (var component in _components)
            sb.Append(component.Render());
        return sb.ToString();
    }
}
Analogy:
Think of the builder as your master chef. You call methods like StartAnchor() to add a base ingredient, HasSpecialChar() to sprinkle in some spice, and AtLeast(8) to ensure you have enough servings. The final dish is the regex pattern that’s built piece by piece.
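The listing above leaves the remaining fluent methods to the reader. Here is one compact way they might look, enough to run the password example from the start of the article. The private Add and AddLookahead helpers are inventions of this sketch, the lookahead is simplified to the positive case, and the fragments rendered by HasDigit(), HasLetter(), AnyCharacter(), and EndAnchor() are assumptions based on the comments in that example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var builtPattern = RegexBuilder.Create()
    .StartAnchor()
    .HasDigit()
    .HasSpecialChar()
    .HasLetter()
    .AnyCharacter()
    .AtLeast(8)
    .EndAnchor()
    .Build();

Console.WriteLine(builtPattern); // ^(?=.*\d)(?=.*[!@#$%^&*])(?=.*[a-zA-Z]).{8,}$

public interface IRegexComponent { string Render(); }

public class PatternComponent : IRegexComponent
{
    private readonly string _pattern;
    public PatternComponent(string pattern) => _pattern = pattern;
    public string Render() => _pattern;
}

// Simplified: positive lookahead only.
public class LookaheadComponent : IRegexComponent
{
    private readonly IRegexComponent _inner;
    public LookaheadComponent(IRegexComponent inner) => _inner = inner;
    public string Render() => $"(?={_inner.Render()})";
}

public class RegexBuilder
{
    private readonly List<IRegexComponent> _components = new List<IRegexComponent>();
    public static RegexBuilder Create() => new RegexBuilder();

    // Hypothetical helpers that keep each fluent method to a single line.
    private RegexBuilder Add(string pattern)
    {
        _components.Add(new PatternComponent(pattern));
        return this;
    }

    private RegexBuilder AddLookahead(string pattern)
    {
        _components.Add(new LookaheadComponent(new PatternComponent(pattern)));
        return this;
    }

    public RegexBuilder StartAnchor() => Add("^");
    public RegexBuilder EndAnchor() => Add("$");
    public RegexBuilder AnyCharacter() => Add(".");
    // Uses PatternComponent for brevity; the article uses QuantifierComponent here.
    public RegexBuilder AtLeast(int min) => Add("{" + min + ",}");
    public RegexBuilder HasDigit() => AddLookahead(@".*\d");
    public RegexBuilder HasLetter() => AddLookahead(".*[a-zA-Z]");
    public RegexBuilder HasSpecialChar() => AddLookahead(".*[!@#$%^&*]");

    public string Build() => string.Concat(_components.Select(c => c.Render()));
}
```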
Implementing Caching and Named Regexes
One of the cooler features of our builder is the ability to cache regexes for reuse. This is especially handy for patterns like email or URL validators that you might use across your application.
The Repository Pattern
We create a thread-safe repository to store named regexes. This repository uses a ConcurrentDictionary to ensure that access is safe even when multiple threads are involved.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class NamedRegexRepository : INamedRegexRepository
{
    private readonly ConcurrentDictionary<string, Regex> _regexes = new ConcurrentDictionary<string, Regex>();

    public void Register(string name, RegexBuilder builder, bool compile = false)
    {
        var pattern = builder.Build();
        var regex = new Regex(pattern, compile ? RegexOptions.Compiled : RegexOptions.None);
        // TryAdd never overwrites, so a duplicate name is reported instead of silently replaced
        if (!_regexes.TryAdd(name, regex))
            throw new InvalidOperationException($"A regex with the name '{name}' is already registered.");
    }

    public Regex Get(string name)
    {
        if (_regexes.TryGetValue(name, out var regex))
            return regex;
        throw new KeyNotFoundException($"No regex registered with the name '{name}'.");
    }

    public bool TryGet(string name, out Regex regex)
    {
        return _regexes.TryGetValue(name, out regex);
    }
}
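The duplicate check in Register() relies on how ConcurrentDictionary.TryAdd behaves, which this small standalone sketch demonstrates: the second add with the same key returns false and leaves the first regex in place. The key and patterns are made up for the demo.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

var regexes = new ConcurrentDictionary<string, Regex>();

var firstAdd = regexes.TryAdd("digits", new Regex(@"^\d+$"));
var secondAdd = regexes.TryAdd("digits", new Regex(@"^\d*$"));

Console.WriteLine(firstAdd);  // True
Console.WriteLine(secondAdd); // False: the name is already taken

// The original ^\d+$ is still stored, so an empty string does not match.
Console.WriteLine(regexes["digits"].IsMatch("")); // False
```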
Using Named Regexes in the Builder
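In the builder, you add methods like Build(string name) or BuildOrGet(string name), which either register a new regex or return an existing one.
public string Build(string name)
{
    // _repository is assumed to be an INamedRegexRepository field held by the builder.
    // Register() throws if the name is already taken.
    string pattern = Build();
    _repository.Register(name, this);
    return pattern;
}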
Analogy:
Imagine you’re at a coffee shop that remembers your regular order. The first time, you tell the barista your unique order (registering the regex). Next time, you simply say “my usual,” and they fetch it from the cache—fast and efficient.
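BuildOrGet is mentioned but not shown, so here is one way the “my usual” behaviour could be sketched with ConcurrentDictionary.GetOrAdd. RegexCache and its Func<string> parameter are hypothetical stand-ins for the builder and repository, chosen so the snippet stays self-contained. Note that under contention GetOrAdd may invoke the factory more than once, although only one result is ever stored.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

var cache = new RegexCache();

// First call builds and stores the regex.
var first = cache.BuildOrGet("vin", () => "^[A-HJ-NPR-Z0-9]{17}$");
// Second call is a cache hit, so the factory is never invoked.
var second = cache.BuildOrGet("vin", () => throw new InvalidOperationException("not called on a cache hit"));

Console.WriteLine(ReferenceEquals(first, second));      // True
Console.WriteLine(second.IsMatch("1HGCM82633A004352")); // True

// Hypothetical cache: builds the pattern only when the name is not registered yet.
public class RegexCache
{
    private readonly ConcurrentDictionary<string, Regex> _regexes = new ConcurrentDictionary<string, Regex>();

    public Regex BuildOrGet(string name, Func<string> buildPattern) =>
        _regexes.GetOrAdd(name, _ => new Regex(buildPattern()));
}
```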
Let's Try It Out
Here’s how you can use our builder to create regexes for everyday tasks:
Email Validation
var emailRegex = RegexBuilder.Create()
    .StartAnchor()
    .CharacterClass("a-zA-Z0-9._%+-").OneOrMore()
    .Literal("@")
    .CharacterClass("a-zA-Z0-9.-").OneOrMore()
    .Literal(".")
    .CharacterClass("a-zA-Z").AtLeast(2)
    .EndAnchor()
    .ToRegex();
// Test against valid and invalid emails
Console.WriteLine(emailRegex.IsMatch("test@example.com")); // True
Console.WriteLine(emailRegex.IsMatch("invalid-email")); // False
URL Matching
var urlRegex = RegexBuilder.Create()
    .StartAnchor()
    .BeginGroup()
        .Literal("http")
        .Literal("s").Optional()
    .EndGroup()
    .Literal("://")
    .CharacterClass("a-zA-Z0-9\\-_").OneOrMore()
    .BeginGroup()
        .Literal(".")
        .CharacterClass("a-zA-Z0-9\\-_").OneOrMore()
    .EndGroup().OneOrMore()
    .AnyCharacter().ZeroOrMore()
    .EndAnchor()
    .ToRegex();
// Test against valid and invalid URLs
Console.WriteLine(urlRegex.IsMatch("https://example.com")); // True
Console.WriteLine(urlRegex.IsMatch("ftp://example.com")); // False
VIN Code Checker
var vinRegex = RegexBuilder.Create()
    .StartAnchor()
    .CharacterClass("A-HJ-NPR-Z0-9").Exactly(17)
    .EndAnchor()
    .ToRegex();
// Test against valid and invalid VINs
Console.WriteLine(vinRegex.IsMatch("1HGCM82633A004352")); // True
Console.WriteLine(vinRegex.IsMatch("1HGCM82633A00435")); // False
Conclusion
Building a fluent regex builder is like constructing a custom, modular kitchen where every tool and ingredient has its place. By defining clear interfaces like IRegexComponent and ICompositeComponent, and then creating concrete classes for various regex pieces, we break down a daunting task into manageable parts.
With our builder, you can piece together complex regexes using a fluent, readable syntax, and even cache frequently used patterns for optimal performance. Whether you’re validating emails, matching URLs, or checking VIN codes, this approach makes regex construction both intuitive and robust.
So next time you’re faced with a gnarly regex, remember: start with your building blocks, assemble them like a pro, and let your regex shine!
All the code, with unit tests, is available in the GitHub repo: https://github.com/Forevka/RegEx.Fluent