Regular expressions (regex) often intimidate beginners. The cryptic syntax looks like random symbols, and the documentation can be overwhelming. Yet regex is one of the most powerful tools in programming, capable of solving complex text processing tasks in a single line.
This regex tutorial for beginners walks you through pattern matching from the ground up with practical examples you can use immediately. By the end, you'll understand how to read, write, and test your own regex patterns.
What Are Regular Expressions?
A regular expression is a sequence of characters that defines a search pattern. Think of it as a mini-language for finding and manipulating text.
Simple example:
Pattern: cat
Matches: "cat", "category", "concatenate"
More complex:
Pattern: \b\d{3}-\d{3}-\d{4}\b
Matches: "555-123-4567" (phone number format)
Regex is supported in virtually all programming languages and many text editors, making it a universally valuable skill.
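As a quick illustration, both patterns above can be tried in JavaScript. This is a minimal sketch; the variable names and sample strings are made up for the example.

```javascript
// Literal pattern: matches wherever the letters "cat" appear in sequence.
const literalPattern = /cat/;
const sampleWords = ["cat", "category", "concatenate", "dog"];
const wordsWithCat = sampleWords.filter((word) => literalPattern.test(word));
console.log(wordsWithCat); // ["cat", "category", "concatenate"]

// Phone-style pattern: 3 digits, dash, 3 digits, dash, 4 digits.
const phonePattern = /\b\d{3}-\d{3}-\d{4}\b/;
const phoneMatch = "Call 555-123-4567 today".match(phonePattern);
console.log(phoneMatch[0]); // "555-123-4567"
```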
Why Learn Regex?
Validation: Check if input matches expected format (email, phone, credit card)
Extraction: Pull specific data from text (URLs from HTML, prices from documents)
Search and replace: Find and modify patterns (change date formats, clean data)
Data parsing: Split and process structured text
Text analysis: Count occurrences, find patterns in logs
A well-chosen regex pattern can often replace dozens of lines of manual string manipulation code.
Basic Regex Syntax
Literal Characters
The simplest regex pattern is literal characters:
Pattern: hello
Matches: "hello", "hello world", "say hello"
Metacharacters
Special characters with specific meanings:
. (dot): Matches any single character except newline
Pattern: c.t
Matches: "cat", "cut", "c9t"
Does NOT match: "cart" (too many characters)
^: Matches start of string
Pattern: ^hello
Matches: "hello world"
Does NOT match: "say hello"
$: Matches end of string
Pattern: world$
Matches: "hello world"
Does NOT match: "world peace"
*: Matches 0 or more of preceding element
Pattern: ca*t
Matches: "ct", "cat", "caat", "caaaat"
+: Matches 1 or more
Pattern: ca+t
Matches: "cat", "caat", "caaaat"
Does NOT match: "ct"
?: Matches 0 or 1 (makes preceding element optional)
Pattern: colou?r
Matches: "color", "colour"
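The metacharacter examples above can be checked with a small table of cases. This is only a sketch: the array name and layout are assumptions made for illustration, and each row records whether the pattern is expected to match.

```javascript
// Each row: [pattern, input, whether the pattern is expected to match]
const metacharacterCases = [
  [/c.t/, "cut", true],
  [/c.t/, "cart", false],
  [/^hello/, "hello world", true],
  [/^hello/, "say hello", false],
  [/world$/, "hello world", true],
  [/world$/, "world peace", false],
  [/ca*t/, "ct", true],
  [/ca+t/, "ct", false],
  [/colou?r/, "colour", true],
];

// Run every pattern against its input and print the outcome.
const metacharacterResults = metacharacterCases.map(([pattern, input]) => pattern.test(input));
metacharacterCases.forEach(([pattern, input], i) => {
  console.log(`${pattern} on "${input}": ${metacharacterResults[i]}`);
});
```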
Character Classes
[...]: Matches any character inside brackets
Pattern: [aeiou]
Matches: any vowel
[^...]: Matches any character NOT in brackets
Pattern: [^0-9]
Matches: any non-digit
Ranges:
[a-z] lowercase letters
[A-Z] uppercase letters
[0-9] digits
[a-zA-Z] any letter
Shorthand Character Classes
\d: Any digit (equivalent to [0-9])
\D: Any non-digit
\w: Word character (letters, digits, underscore)
\W: Non-word character
\s: Whitespace (space, tab, newline)
\S: Non-whitespace
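Here is a short sketch of character classes and shorthands at work in JavaScript. The sample sentence is invented for the example.

```javascript
const sample = "Order #42 shipped to Unit 7B";

// [aeiou] is case-sensitive, so the capital O and U are skipped.
const lowercaseVowels = sample.match(/[aeiou]/g);
// \d picks out individual digits.
const digits = sample.match(/\d/g);
// \w+ collects runs of letters, digits, and underscores.
const wordRuns = sample.match(/\w+/g);
// \s+ is a handy separator for splitting on whitespace.
const pieces = sample.split(/\s+/);

console.log(lowercaseVowels); // ["e", "i", "e", "o", "i"]
console.log(digits);          // ["4", "2", "7"]
console.log(wordRuns);        // ["Order", "42", "shipped", "to", "Unit", "7B"]
console.log(pieces);          // ["Order", "#42", "shipped", "to", "Unit", "7B"]
```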
Quantifiers
{n}: Exactly n occurrences
Pattern: \d{3}
Matches: "123", "456"
{n,}: n or more
Pattern: \d{3,}
Matches: "123", "1234", "123456"
{n,m}: Between n and m occurrences
Pattern: \d{3,5}
Matches: "123", "1234", "12345"
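One detail worth seeing in code: without anchors, a quantifier such as \d{3} will happily match the first three digits of a longer number. The sketch below adds ^ and $ to test whole strings; the constant names are assumptions for the example.

```javascript
// Anchored versions test the whole string, not just a part of it.
const exactlyThree = /^\d{3}$/;
const threeOrMore = /^\d{3,}$/;
const threeToFive = /^\d{3,5}$/;

const quantifierChecks = {
  exactOk: exactlyThree.test("123"),       // true
  exactTooLong: exactlyThree.test("1234"), // false
  minTooShort: threeOrMore.test("12"),     // false
  minLong: threeOrMore.test("123456"),     // true
  rangeOk: threeToFive.test("12345"),      // true
  rangeTooLong: threeToFive.test("123456"), // false
};
console.log(quantifierChecks);

// Unanchored: \d{3} finds the first three digits inside a longer run.
const partial = "123456".match(/\d{3}/)[0];
console.log(partial); // "123"
```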
Grouping and Alternation
(...): Grouping
Pattern: (abc)+
Matches: "abc", "abcabc", "abcabcabc"
|: Alternation (OR)
Pattern: cat|dog
Matches: "cat" or "dog"
Combined:
Pattern: (cat|dog)s?
Matches: "cat", "cats", "dog", "dogs"
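Groups do more than bundle characters together: parentheses also capture what they matched. The following sketch (sample sentence and variable names are illustrative) uses matchAll to read both the full match and the captured animal name.

```javascript
const petPattern = /(cat|dog)s?/g;
const sentence = "Two cats chased a dog past three dogs";

// m[0] is the full match, m[1] is what the first group captured.
const petMatches = [...sentence.matchAll(petPattern)].map((m) => ({
  text: m[0],
  animal: m[1],
}));
console.log(petMatches);
// [{ text: "cats", animal: "cat" }, { text: "dog", animal: "dog" }, { text: "dogs", animal: "dog" }]

// A quantifier after a group repeats the whole group.
const repeatedGroup = /^(abc)+$/;
console.log(repeatedGroup.test("abcabc")); // true
console.log(repeatedGroup.test("abcab"));  // false
```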
Common Patterns Explained
Email Address
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breaking it down:
^ - start of string
[a-zA-Z0-9._%+-]+ - one or more valid email characters before @
@ - literal @ symbol
[a-zA-Z0-9.-]+ - domain name
\. - literal dot (escaped)
[a-zA-Z]{2,} - domain extension (at least 2 letters)
$ - end of string
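Wrapped in a function, the email pattern might be used like this. It is a sketch, not a complete validator: the function name looksLikeEmail is hypothetical, and the pattern still accepts some addresses that real mail systems reject (for example, consecutive dots in the local part).

```javascript
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Returns true when the trimmed input has the general shape of an email address.
function looksLikeEmail(value) {
  return EMAIL_PATTERN.test(value.trim());
}

console.log(looksLikeEmail("support@example.com")); // true
console.log(looksLikeEmail("user@localhost"));      // false (no dot + extension)
console.log(looksLikeEmail("a@b.c"));               // false (extension too short)
```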
Phone Number (US Format)
^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
Breaking it down:
\(? - optional opening parenthesis
\d{3} - three digits (area code)
\)? - optional closing parenthesis
[-.\s]? - optional separator (dash, dot, or space)
\d{3} - three digits
[-.\s]? - optional separator
\d{4} - four digits
Matches:
- (555) 123-4567
- 555-123-4567
- 555.123.4567
- 5551234567
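A common follow-up is to validate the input and then store it in one canonical form. The sketch below does that under the assumption that a bare 10-digit string is the desired format; normalizeUsPhone is a hypothetical helper name.

```javascript
const US_PHONE = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Returns the 10 digits of a matching phone number, or null when the input
// does not fit the pattern.
function normalizeUsPhone(input) {
  if (!US_PHONE.test(input)) return null;
  return input.replace(/\D/g, ""); // strip everything that is not a digit
}

console.log(normalizeUsPhone("(555) 123-4567")); // "5551234567"
console.log(normalizeUsPhone("555.123.4567"));   // "5551234567"
console.log(normalizeUsPhone("555-1234"));       // null
```

Because each parenthesis is optional on its own, the pattern also accepts unbalanced input such as "(555-123-4567". Tighten it with alternation if that matters for your data.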
URL
https?://(?:www\.)?[\w.-]+\.[a-z]{2,}
Breaking it down:
https? - http or https
:// - literal ://
(?:www\.)? - optional www. (non-capturing group)
[\w.-]+ - domain name
\. - literal dot
[a-z]{2,} - top-level domain
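To see what this pattern does and does not capture, here is a small sketch with an invented sample sentence. Note that the forward slashes must be escaped inside a JavaScript regex literal, and that the match stops at the domain, so paths are left out.

```javascript
// Same pattern as above, with / escaped for the literal and g to find all matches.
const URL_PATTERN = /https?:\/\/(?:www\.)?[\w.-]+\.[a-z]{2,}/g;

const linkText = "Docs live at https://www.example.com and http://blog.example.org/posts.";
const foundUrls = linkText.match(URL_PATTERN);

console.log(foundUrls);
// ["https://www.example.com", "http://blog.example.org"]
```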
Password Strength
Requires: 8+ characters, uppercase, lowercase, digit, special char. (Need to generate passwords that meet these rules? Try our Password Generator.)
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Breaking it down:
^ - start
(?=.*[a-z]) - lookahead: contains lowercase
(?=.*[A-Z]) - lookahead: contains uppercase
(?=.*\d) - lookahead: contains digit
(?=.*[@$!%*?&]) - lookahead: contains special char
[A-Za-z\d@$!%*?&]{8,} - at least 8 valid characters
$ - end
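The single pattern answers only yes or no. If you want to tell users which rule they missed, one option is to split the lookaheads into separate checks. The sketch below is one way to do that; passwordRules and missingRequirements are hypothetical names, and the rule list is intended to mirror the combined pattern rather than replace it.

```javascript
// One small pattern per requirement, so failures can be reported individually.
const passwordRules = [
  { label: "at least 8 characters", pattern: /^.{8,}$/ },
  { label: "a lowercase letter", pattern: /[a-z]/ },
  { label: "an uppercase letter", pattern: /[A-Z]/ },
  { label: "a digit", pattern: /\d/ },
  { label: "a special character (@$!%*?&)", pattern: /[@$!%*?&]/ },
  { label: "only letters, digits, and @$!%*?&", pattern: /^[A-Za-z\d@$!%*?&]+$/ },
];

// Returns the labels of every rule the password fails (empty array = valid).
function missingRequirements(password) {
  return passwordRules
    .filter((rule) => !rule.pattern.test(password))
    .map((rule) => rule.label);
}

const combinedPattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

console.log(missingRequirements("Passw0rd!")); // []
console.log(combinedPattern.test("Passw0rd!")); // true
console.log(missingRequirements("Sh0rt!"));    // ["at least 8 characters"]
```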
Date (YYYY-MM-DD)
^\d{4}-\d{2}-\d{2}$
Simple version matches format but not validity.
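More sophisticated (restricts the year to 2000-2099 and checks the month and day ranges, though it still accepts impossible dates such as 2023-02-31):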
^(20\d{2})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
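Regex alone cannot know how many days a given month has. A common approach, sketched here, is to let the regex check the shape and let the built-in Date object check the calendar. The function name isRealDate is an assumption for this example.

```javascript
const DATE_SHAPE = /^(\d{4})-(\d{2})-(\d{2})$/;

// True only when the string is YYYY-MM-DD and names a date that exists.
function isRealDate(value) {
  const match = DATE_SHAPE.exec(value);
  if (!match) return false;

  const [year, month, day] = match.slice(1).map(Number);
  // Date rolls invalid values over (Feb 30 becomes Mar 1 or 2), so compare
  // the parts after construction to detect that.
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isRealDate("2024-02-29")); // true (leap year)
console.log(isRealDate("2023-02-29")); // false
console.log(isRealDate("2023-13-01")); // false
```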
HTML Tags
<([a-z]+)[^>]*>(.*?)</\1>
Breaking it down:
<([a-z]+) - opening tag, capture tag name
[^>]* - any attributes
> - close opening tag
(.*?) - content (non-greedy)
</\1> - closing tag (backreference to captured tag name)
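The backreference is the interesting part here, so the sketch below shows it accepting a matched pair and rejecting a mismatched one. The sample markup is invented, and the pattern remains a teaching example: it does not handle nested tags of the same name.

```javascript
// The closing slash is escaped because / delimits the regex literal.
const TAG_PAIR = /<([a-z]+)[^>]*>(.*?)<\/\1>/;

const tagMatch = '<p class="note">Hello</p>'.match(TAG_PAIR);
console.log(tagMatch[1]); // "p"     (captured tag name)
console.log(tagMatch[2]); // "Hello" (captured content)

// \1 must repeat the captured name, so a mismatched closing tag fails.
const mismatched = TAG_PAIR.test("<b>bold</i>");
console.log(mismatched); // false
```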
Practical Examples
Extract Email from Text
const text = "Contact us at support@example.com for help";
const email = text.match(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/);
console.log(email[0]); // "support@example.com"
Validate Username
function isValidUsername(username) {
return /^[a-zA-Z0-9_]{3,16}$/.test(username);
}
// 3-16 characters, letters/numbers/underscore only
Replace Multiple Spaces with Single Space
const text = "Too    many     spaces";
const fixed = text.replace(/\s+/g, " ");
// "Too many spaces"
Extract Numbers from String
const text = "I have 3 cats and 2 dogs";
const numbers = text.match(/\d+/g);
// ["3", "2"]
Validate Hex Color
function isHexColor(color) {
return /^#[0-9A-Fa-f]{6}$/.test(color);
}
Flags and Modifiers
g: Global (find all matches, not just first)
"cat cat cat".match(/cat/g); // ["cat", "cat", "cat"]
i: Case-insensitive
/cat/i.test("CAT"); // true
m: Multiline (^ and $ match line starts/ends)
/^line/m.test("first\nline two"); // true
s: Dot matches newline
/one.line/s.test("line one\nline two"); // true (false without the s flag)
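To compare the flags side by side, here is a sketch that runs several patterns against the same three-line string. The log text is made up for the example.

```javascript
const logSample = "Error: disk full\nerror: retrying\nINFO: done";

const caseSensitive = logSample.match(/error/g);  // g only: exact case
const anyCase = logSample.match(/error/gi);       // g + i: ignore case
const lineStarts = logSample.match(/^\w+/gm);     // g + m: ^ works per line
const dotWithoutS = /full.error/.test(logSample); // . stops at the newline
const dotWithS = /full.error/s.test(logSample);   // s lets . cross the newline

console.log(caseSensitive); // ["error"]
console.log(anyCase);       // ["Error", "error"]
console.log(lineStarts);    // ["Error", "error", "INFO"]
console.log(dotWithoutS);   // false
console.log(dotWithS);      // true
```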
Lookaheads and Lookbehinds
Positive Lookahead (?=...)
Matches if followed by pattern:
Pattern: \d+(?= dollars)
Text: "50 dollars and 30 cents"
Matches: "50" (followed by " dollars")
Negative Lookahead (?!...)
Matches if NOT followed by pattern:
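Pattern: \d+\b(?! dollars)
Text: "50 dollars and 30 cents"
Matches: "30" (not followed by " dollars"). The \b stops the engine from settling for part of a number: without it, \d+(?! dollars) would first match "5", because "5" is followed by "0" rather than " dollars".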
Positive Lookbehind (?<=...)
Matches if preceded by pattern:
Pattern: (?<=\$)\d+
Text: "$100 and €50"
Matches: "100" (preceded by $)
Negative Lookbehind (?<!...)
Matches if NOT preceded by pattern:
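Pattern: (?<![$\d])\d+
Text: "$100 and €50"
Matches: "50" (not preceded by $). The \d inside the lookbehind matters: plain (?<!\$)\d+ would also match "00", because those digits are preceded by "1", not by $.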
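Negative lookarounds are easy to get subtly wrong, because the engine will shorten or shift a match until the assertion passes. The sketch below runs all four lookaround types in JavaScript and includes the unguarded negative versions for comparison; variable names are illustrative.

```javascript
const prices = "50 dollars and 30 cents";
const money = "$100 and €50";

// Positive lookahead and lookbehind
const beforeDollars = prices.match(/\d+(?= dollars)/g); // ["50"]
const afterDollarSign = money.match(/(?<=\$)\d+/g);     // ["100"]

// Negative lookahead: \b forces the whole number to be considered.
const naiveNotDollars = prices.match(/\d+(?! dollars)/g);      // ["5", "30"]
const guardedNotDollars = prices.match(/\d+\b(?! dollars)/g);  // ["30"]

// Negative lookbehind: also exclude a preceding digit to avoid partial numbers.
const naiveNotAfterSign = money.match(/(?<!\$)\d+/g);      // ["00", "50"]
const guardedNotAfterSign = money.match(/(?<![$\d])\d+/g); // ["50"]

console.log(beforeDollars, afterDollarSign);
console.log(naiveNotDollars, guardedNotDollars);
console.log(naiveNotAfterSign, guardedNotAfterSign);
```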
Common Pitfalls
Greedy vs Non-Greedy
Greedy (default): Match as much as possible
Pattern: <.*>
Text: "<div>content</div>"
Matches: "<div>content</div>" (entire string)
Non-greedy: Add ? after quantifier
Pattern: <.*?>
Matches: "<div>" and "</div>" (separately)
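Here is the same comparison as runnable JavaScript. The third pattern is an addition for illustration: a negated character class often gives the same result as the non-greedy version and makes the intent explicit.

```javascript
const markup = "<div>content</div>";

const greedyTags = markup.match(/<.*>/g);     // .* runs to the last >
const lazyTags = markup.match(/<.*?>/g);      // .*? stops at the first >
const negatedTags = markup.match(/<[^>]*>/g); // "anything except >"

console.log(greedyTags);  // ["<div>content</div>"]
console.log(lazyTags);    // ["<div>", "</div>"]
console.log(negatedTags); // ["<div>", "</div>"]
```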
Not Escaping Special Characters
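Wrong: . (unescaped, matches any character)
Right (literal dot): \.
Wrong: $50 ($ is the end-of-string anchor, so this cannot match the text "$50")
Right: \$50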
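Escaping matters most when a pattern is built from text you do not control, such as user input. A widely used approach is a small helper that backslash-escapes every metacharacter. The sketch below assumes a helper named escapeRegExp, which is not a built-in function.

```javascript
// Prefix every regex metacharacter with a backslash so it is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const rawPrice = "$50.00";
const escapedSource = escapeRegExp(rawPrice);
const pricePattern = new RegExp(escapedSource);

console.log(escapedSource);                       // \$50\.00
console.log(pricePattern.test("Total: $50.00"));  // true
console.log(pricePattern.test("Total: $50x00"));  // false (dot is literal now)
console.log(/$50/.test("$50"));                   // false (unescaped $ is an anchor)
```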
Performance Issues
Catastrophic backtracking with nested quantifiers:
Dangerous: (a+)+b
Better: a+b
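The two patterns accept exactly the same strings; the difference is how much work the engine does when a match fails. The sketch below only checks that they agree on short inputs. It deliberately avoids long inputs, since in a backtracking engine the nested version can take exponentially longer on a long run of "a" with no trailing "b".

```javascript
const nestedQuantifier = /^(a+)+b$/; // many ways to split a run of a's
const flatQuantifier = /^a+b$/;      // only one way to consume the a's

// Short inputs only: safe to run with either pattern.
const shortInputs = ["ab", "aaab", "b", "aaa", "aaba"];
const patternsAgree = shortInputs.every(
  (input) => nestedQuantifier.test(input) === flatQuantifier.test(input)
);

console.log(patternsAgree); // true
```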
Overcomplicating
Regex isn't always the answer. For complex parsing (HTML, JSON), use proper parsers — for example, a JSON formatter is far more reliable than regex for validating JSON structure.
Testing and Debugging Regex
Online Testers
- regex101.com: Excellent explanations and testing
- regexr.com: Visual representation
- RegexBuddy: Comprehensive desktop app
Try our Regex Tester to test patterns with your own text—all processing happens in your browser for privacy.
Tips for Creating Patterns
- Start simple: Build patterns incrementally
- Test with various inputs: Include edge cases
- Use comments: Many regex flavors support comments inside the pattern (JavaScript is a notable exception)
- Break complex patterns into parts: Test each piece
Language-Specific Considerations
JavaScript
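// Literal notation (flags such as g, i, or m follow the closing slash)
const regex = /pattern/gi;
// Constructor (useful when the pattern is built from a string at runtime)
const regexFromString = new RegExp("pattern", "gi");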
// Methods
string.match(regex)
string.search(regex)
string.replace(regex, replacement)
regex.test(string)
regex.exec(string)
Python
import re
# Search
re.search(r'pattern', text)
# Match from start
re.match(r'pattern', text)
# Find all
re.findall(r'pattern', text)
# Replace
re.sub(r'pattern', 'replacement', text)
Note the r prefix for raw strings (prevents backslash escape issues).
Other Languages
- PHP: preg_match(), preg_replace()
- Java: Pattern and Matcher classes
- Ruby: /pattern/ literal, String methods
- Perl: Built-in regex support (Perl popularized much of the syntax that modern regex flavors share)
Syntax is largely consistent, but check documentation for language-specific features.
Best Practices
1. Keep it simple: Complex regex is hard to maintain
2. Comment complex patterns:
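// JavaScript has no x (extended) flag, so build the pattern from commented string pieces.
// Flavors such as Python (re.VERBOSE), PCRE, and Perl allow comments inside the pattern itself.
const regex = new RegExp(
  "^" +      // Start of string
  "\\d{3}" + // Three digits
  "-" +      // Literal dash
  "\\d{2}" + // Two digits
  "$"        // End of string
);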
3. Validate, don't parse: Use regex for validation; use proper parsers for complex structures
4. Test thoroughly: Include edge cases
5. Consider readability: Sometimes simple string methods are clearer than regex
6. Cache compiled patterns: In loops, compile regex once
7. Be specific: \d+ is better than .+ when you expect digits
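Practice 6 can be sketched as follows: define the pattern once at module level and reuse it inside the loop. The function name countNumericIds is hypothetical. The second half shows a related JavaScript quirk worth knowing when reusing patterns: a regex with the g flag remembers where its last test() ended.

```javascript
// Compiled once and reused on every iteration.
const DIGITS_ONLY = /^\d+$/;

function countNumericIds(values) {
  let count = 0;
  for (const value of values) {
    if (DIGITS_ONLY.test(value)) count += 1;
  }
  return count;
}

const numericCount = countNumericIds(["42", "7a", "1001", ""]);
console.log(numericCount); // 2

// Caution: with the g flag, test() is stateful through lastIndex.
const statefulPattern = /\d+/g;
const firstCall = statefulPattern.test("42");  // true, lastIndex is now 2
const secondCall = statefulPattern.test("42"); // false, search resumed at index 2
console.log(firstCall, secondCall);
```

For reusable validation patterns, leaving off the g flag avoids this surprise.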
Real-World Use Cases
Form Validation
const patterns = {
email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
phone: /^\d{3}-\d{3}-\d{4}$/,
zipCode: /^\d{5}(-\d{4})?$/,
creditCard: /^\d{4}\s\d{4}\s\d{4}\s\d{4}$/
};
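One way to put such a pattern table to work is a helper that returns the names of the fields that fail. This is a sketch under assumptions: validateForm and the sample form data are invented for the example, and fields that are absent from the input are simply skipped.

```javascript
const formPatterns = {
  email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
  phone: /^\d{3}-\d{3}-\d{4}$/,
  zipCode: /^\d{5}(-\d{4})?$/,
  creditCard: /^\d{4}\s\d{4}\s\d{4}\s\d{4}$/,
};

// Returns the names of supplied fields whose values do not match their pattern.
function validateForm(fields) {
  return Object.keys(formPatterns).filter(
    (name) => name in fields && !formPatterns[name].test(fields[name])
  );
}

const invalidFields = validateForm({
  email: "a@b.co",
  phone: "555-123-4567",
  zipCode: "1234", // too short
});
console.log(invalidFields); // ["zipCode"]
```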
Data Cleaning
Regex powers many text processing workflows. If you handle messy text regularly, our Text Cleaner automates common cleanup tasks. Here are some regex patterns for custom cleaning:
// Remove HTML tags
text.replace(/<[^>]+>/g, '');
// Standardize phone numbers
phone.replace(/\D/g, ''); // Keep only digits
// Fix multiple spaces
text.replace(/\s+/g, ' ');
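These replacements chain naturally. The sketch below combines them into two small helpers; cleanText and digitsOnly are hypothetical names. One design choice differs from the snippet above: tags are replaced with a space rather than an empty string, so that words in adjacent elements do not run together.

```javascript
// Strip tags, collapse whitespace, and trim the ends.
function cleanText(raw) {
  return raw
    .replace(/<[^>]+>/g, " ") // tags become spaces so neighbors stay separated
    .replace(/\s+/g, " ")     // collapse any run of whitespace
    .trim();
}

// Keep only the digits of a phone number.
function digitsOnly(phone) {
  return phone.replace(/\D/g, "");
}

console.log(cleanText("<p>Hello   <b>world</b></p>\n")); // "Hello world"
console.log(cleanText("<p>one</p><p>two</p>"));          // "one two"
console.log(digitsOnly("(555) 123-4567"));               // "5551234567"
```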
Log Parsing
// Extract IP addresses from logs
const ips = logText.match(/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g);
// Find error lines
const errors = logText.match(/^.*ERROR.*$/gm);
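The IP pattern checks only the shape, so it also matches values like 999.1.1.1. A possible refinement, sketched below, is to filter the candidates by octet range afterwards. The function name extractIps and the sample log are assumptions for the example.

```javascript
const IP_CANDIDATE = /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g;

// Find IP-shaped strings, then keep only those whose octets are 0-255.
function extractIps(text) {
  const candidates = text.match(IP_CANDIDATE) || []; // match() returns null when nothing is found
  return candidates.filter((ip) => ip.split(".").every((octet) => Number(octet) <= 255));
}

const sampleLog = "10.0.0.1 GET /\n999.1.1.1 ERROR bad host\n192.168.1.20 ERROR timeout";

const validIps = extractIps(sampleLog);
const errorLines = sampleLog.match(/^.*ERROR.*$/gm) || [];

console.log(validIps);   // ["10.0.0.1", "192.168.1.20"]
console.log(errorLines); // ["999.1.1.1 ERROR bad host", "192.168.1.20 ERROR timeout"]
```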
URL Manipulation
When working with URLs, regex handles extraction and pattern matching, while a URL Encoder handles the encoding side:
// Extract domain (undefined if the URL does not match)
const domain = url.match(/https?:\/\/([^\/]+)/)?.[1];
// Change protocol
const httpsUrl = url.replace(/^http:/, 'https:');
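Since match() returns null when nothing matches, it is safer to guard the result before indexing into it. The sketch below wraps both operations in small functions; extractDomain and forceHttps are hypothetical names, and the captured "domain" includes a port if one is present.

```javascript
// Returns the host portion of an http(s) URL, or null if the input does not match.
function extractDomain(url) {
  const match = url.match(/^https?:\/\/([^\/]+)/);
  return match ? match[1] : null;
}

// Upgrades http: to https: and leaves every other input untouched.
function forceHttps(url) {
  return url.replace(/^http:/, "https:");
}

console.log(extractDomain("https://example.com/path?q=1")); // "example.com"
console.log(extractDomain("not a url"));                    // null
console.log(forceHttps("http://example.com"));              // "https://example.com"
console.log(forceHttps("https://example.com"));             // "https://example.com"
```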
Learning Path
1. Master basics: Literals, character classes, quantifiers
2. Practice common patterns: Email, phone, URL
3. Learn anchors and boundaries: ^, $, \b
4. Understand grouping: Capture groups, non-capturing groups
5. Explore advanced features: Lookaheads, backreferences
6. Study real-world examples: Read regex in open-source code
7. Build your own patterns: Practice with actual problems
Conclusion
Regular expressions are powerful once you understand the fundamentals. Start with simple patterns and build complexity gradually. The syntax becomes intuitive with practice.
Key takeaways:
- Regex defines search patterns using special syntax
- Master basic metacharacters (., *, +, ?, ^, $)
- Use character classes ([...]) and shorthands (\d, \w, \s)
- Quantifiers control repetition ({n}, {n,m})
- Test patterns thoroughly with various inputs
- Keep patterns simple and readable
With regex in your toolkit, you can solve complex text processing tasks efficiently. Whether validating user input, parsing data, or cleaning text, regex provides a concise, powerful solution.
Ready to test your regex patterns? Use our Regex Tester to experiment with patterns and see matches highlighted in real-time—all processing happens securely in your browser.
Related Reading
- Password Security Guide — see how regex validation fits into a broader password security strategy
- JSON vs XML: Choosing the Right Data Format — when to use proper parsers instead of regex for structured data
- Base64 Encoding Explained — another foundational concept every developer should understand
Regular expressions might seem cryptic at first, but they're an investment that pays dividends throughout your programming journey. Start practicing today, and soon you'll wonder how you ever lived without them.