A regular expression (or "regex" for short) is a special way to search for patterns in text. Think of it like the "find" feature in your word processor, but much more powerful!
Instead of searching for exact words, regex lets you search for patterns like "any 5-digit number" or "any email address."
Regex helps you:
- Check if information (like emails or phone numbers) is typed correctly
- Find specific parts in large text
- Replace or change many pieces of text at once
- Save time by automating text tasks
Let's say you want to make rules for usernames in your app. You want usernames to:
- Only use letters, numbers, underscores, and hyphens
- Be between 3-16 characters long
Here's a regex pattern for that:
^[a-zA-Z0-9_-]{3,16}$
This might look confusing, but let's break it down:
- `^` means "start of the text"
- `[a-zA-Z0-9_-]` means "any letter, number, underscore, or hyphen"
- `{3,16}` means "between 3 and 16 of these characters"
- `$` means "end of the text"
So username123, john_doe, and cool-name would all be accepted!
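As a rough sketch, this is how the same check might look in Python with the standard `re` module. The function name `is_valid_username` is made up for this illustration, and `fullmatch` is used so that the whole input has to fit the pattern.

```python
import re

# The username pattern from above, compiled once so it can be reused.
USERNAME_RE = re.compile(r"^[a-zA-Z0-9_-]{3,16}$")

def is_valid_username(name: str) -> bool:
    """Return True if name is 3-16 letters, digits, underscores, or hyphens."""
    return USERNAME_RE.fullmatch(name) is not None

for candidate in ["username123", "john_doe", "cool-name", "ab", "bad name!"]:
    print(candidate, is_valid_username(candidate))
```

The first three candidates are accepted; "ab" is too short and "bad name!" contains characters outside the allowed list.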
Letters, numbers, and most symbols just match themselves:
- `cat` matches the word "cat"
- `42` matches the number "42"
Some characters have special powers in regex. If you want to use these as normal characters, put a backslash (`\`) before them:
- `.` (dot) - Matches any character (except a line break, by default)
- `*`, `+`, `?` - Special repeat symbols
- `^`, `$` - Position markers
- `\` - The escape character
- `[]`, `()`, `{}` - Bracket symbols (character lists, groups, and repeat counts)
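Here is a small Python sketch of why escaping matters. The sample strings are invented for the example; `re.escape` is the standard-library helper that adds the backslashes for you.

```python
import re

# An unescaped dot is a wildcard, so "3.14" also matches "3x14".
print(bool(re.search(r"3.14", "3x14")))         # True
# Escaping the dot makes it match only a literal ".".
print(bool(re.search(r"3\.14", "3x14")))        # False
print(bool(re.search(r"3\.14", "pi is 3.14")))  # True

# re.escape() escapes every special character in a piece of plain text.
price = "$5 (approx.)"
pattern = re.escape(price)
print(bool(re.search(pattern, "It costs $5 (approx.) today")))  # True
```

Escaping by hand is fine for short patterns; `re.escape` is handy when the text to search for comes from a user or a file.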
Square brackets `[ ]` let you match any ONE character from a list:
- `[aeiou]` matches any lowercase vowel
- `[0-9]` matches any digit
- `[a-zA-Z]` matches any letter (upper or lowercase)
Common patterns have shortcuts:
- `\d` matches any digit (same as `[0-9]`)
- `\w` matches any "word character" (letters, numbers, underscore)
- `\s` matches any space, tab, or line break
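To see character lists and shortcuts in action, here is a short Python sketch. The sample sentence is made up; `re.findall` returns every match and `re.split` cuts the text wherever the pattern matches.

```python
import re

sample = "Room 42, Floor 7: call_me at 9am"

vowels = re.findall(r"[aeiou]", "regex")          # every lowercase vowel
digits = re.findall(r"\d", sample)                # every single digit
words = re.findall(r"\w+", sample)                # runs of word characters
parts = re.split(r"\s+", "one two\tthree\nfour")  # split on any whitespace

print(vowels)  # ['e', 'e']
print(digits)  # ['4', '2', '7', '9']
print(words)   # ['Room', '42', 'Floor', '7', 'call_me', 'at', '9am']
print(parts)   # ['one', 'two', 'three', 'four']
```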
Repeat symbols say how many times the item right before them may appear:
- `*` means "zero or more" (can appear any number of times or not at all)
- `+` means "one or more" (must appear at least once)
- `?` means "zero or one" (optional, appears once or not at all)
- `{n}` means exactly n times
- `{n,m}` means between n and m times
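The repeat symbols are easiest to understand by trying them. In this Python sketch, `whole_match` is a hypothetical helper that reports whether the entire text fits a pattern.

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """Return True only if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

print(whole_match(r"ab*c", "ac"))         # True: zero "b" is allowed
print(whole_match(r"ab+c", "ac"))         # False: at least one "b" is required
print(whole_match(r"colou?r", "color"))   # True: the "u" is optional
print(whole_match(r"colou?r", "colour"))  # True
print(whole_match(r"\d{4}", "2023"))      # True: exactly four digits
print(whole_match(r"\d{2,4}", "12345"))   # False: five digits is too many
```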
Pattern: ^\d{10}$
Example: 1234567890
Explanation:
- `^` ensures the pattern starts at the beginning of the string
- `\d{10}` matches exactly 10 digits (0-9)
- `$` ensures the pattern ends at the end of the string
- This pattern works for simple 10-digit phone numbers without any separators
For more complex phone formats with separators:
Pattern: ^(\+\d{1,3}[ -])?\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}$
Example: +1-123-456-7890 or (123) 456-7890
Explanation:
- `(\+\d{1,3}[ -])?` optionally matches a country code: `+` followed by 1-3 digits and a space or hyphen
- `\(?\d{3}\)?` matches 3 digits for the area code, optionally surrounded by parentheses
- `[ -]?` optionally matches a space or hyphen as a separator
- `\d{3}[ -]?\d{4}` matches the remaining 7 digits with an optional separator in the middle
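Both phone patterns can be tried side by side in Python. This is only a sketch: the names `SIMPLE_PHONE_RE`, `FLEXIBLE_PHONE_RE`, and `is_phone_number` are invented here, and the test numbers are placeholders.

```python
import re

SIMPLE_PHONE_RE = re.compile(r"^\d{10}$")
FLEXIBLE_PHONE_RE = re.compile(
    r"^(\+\d{1,3}[ -])?\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}$"
)

def is_phone_number(text: str) -> bool:
    """Accept either the plain 10-digit form or the form with separators."""
    return bool(SIMPLE_PHONE_RE.fullmatch(text) or FLEXIBLE_PHONE_RE.fullmatch(text))

print(is_phone_number("1234567890"))       # True
print(is_phone_number("+1-123-456-7890"))  # True
print(is_phone_number("(123) 456-7890"))   # True
print(is_phone_number("123-45-67890"))     # False: digits grouped wrongly
print(is_phone_number("(123 456-7890"))    # True: unbalanced bracket slips through
```

The last line shows a limit of the flexible pattern: each parenthesis is optional on its own, so an opening bracket without a closing one is still accepted.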
For 6-digit Indian PIN codes:
Pattern: ^[1-9][0-9]{5}$
Example: 400001
Explanation:
- `^[1-9]` ensures the PIN code starts with a digit from 1-9 (not 0)
- `[0-9]{5}$` ensures the remaining 5 characters are digits from 0-9
- This pattern follows the Indian PIN code format, which is always a 6-digit number not starting with 0
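A minimal Python sketch of the PIN code check follows. `is_valid_pin` is a hypothetical name, and the check only looks at the shape of the code, not whether that PIN is actually assigned to a place.

```python
import re

PIN_RE = re.compile(r"^[1-9][0-9]{5}$")

def is_valid_pin(text: str) -> bool:
    """Return True for six digits where the first digit is not 0."""
    return PIN_RE.fullmatch(text) is not None

print(is_valid_pin("400001"))   # True
print(is_valid_pin("012345"))   # False: starts with 0
print(is_valid_pin("40001"))    # False: only five digits
print(is_valid_pin("4000011"))  # False: seven digits
```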
Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Examples: [email protected], [email protected]
Explanation:
- `^[a-zA-Z0-9._%+-]+` matches one or more characters that could be letters, numbers, dots, underscores, percent signs, plus signs, or hyphens (valid username characters)
- `@` matches the @ symbol literally
- `[a-zA-Z0-9.-]+` matches one or more characters for the domain name (letters, numbers, dots, or hyphens)
- `\.` matches a dot literally (escaped because dot is a special character in regex)
- `[a-zA-Z]{2,}$` matches at least 2 letters for the top-level domain (like com, org, etc.)
The pattern works for both complex emails with special characters like [email protected] and simpler common formats like [email protected].
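Here is how the email pattern might be used in Python. The addresses below are made-up examples on reserved example domains, and `is_email_like` is a hypothetical helper name.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_email_like(text: str) -> bool:
    """Return True if text has the general shape of an email address."""
    return EMAIL_RE.fullmatch(text) is not None

print(is_email_like("user.name+tag@example.co.uk"))  # True
print(is_email_like("simple@example.com"))           # True
print(is_email_like("no-at-sign.example.com"))       # False: no @
print(is_email_like("user@localhost"))               # False: no dot and TLD
print(is_email_like("user@example..com"))            # True: a known gap
```

The last result shows that the pattern checks the general shape only: it lets two dots in a row through, so it is best treated as a quick first filter rather than proof that an address exists.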
For a password that requires at least 8 characters, one uppercase letter, one lowercase letter, one number, and one special character:
Pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Example: Password1!
Explanation:
- `^` ensures the pattern starts at the beginning of the string
- `(?=.*[a-z])` is a positive lookahead that ensures there is at least one lowercase letter
- `(?=.*[A-Z])` ensures there is at least one uppercase letter
- `(?=.*\d)` ensures there is at least one digit
- `(?=.*[@$!%*?&])` ensures there is at least one of the special characters
- `[A-Za-z\d@$!%*?&]{8,}$` ensures the password is at least 8 characters long and only contains allowed characters
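The lookaheads can feel abstract, so this Python sketch runs the full pattern and also checks each rule separately. The helper names `is_strong_password` and `failed_rules`, and the rule labels, are invented for illustration; splitting the rules up is just one possible way to give users a more helpful error message.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

# The same rules as separate checks, one per lookahead.
CHECKS = [
    ("a lowercase letter", re.compile(r"[a-z]")),
    ("an uppercase letter", re.compile(r"[A-Z]")),
    ("a digit", re.compile(r"\d")),
    ("a special character", re.compile(r"[@$!%*?&]")),
]
ALLOWED_RE = re.compile(r"[A-Za-z\d@$!%*?&]{8,}")

def is_strong_password(password: str) -> bool:
    return PASSWORD_RE.fullmatch(password) is not None

def failed_rules(password: str) -> list:
    """List the rules the password breaks (empty list means it passes)."""
    failed = [label for label, rule in CHECKS if not rule.search(password)]
    if not ALLOWED_RE.fullmatch(password):
        failed.append("8+ characters from the allowed set")
    return failed

print(is_strong_password("Password1!"))  # True
print(is_strong_password("password1!"))  # False
print(failed_rules("password"))
# ['an uppercase letter', 'a digit', 'a special character']
```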
Pattern: ^(https?:\/\/)?(www\.)?[a-zA-Z0-9]+\.[a-zA-Z]{2,}(\/\S*)?$
Example: https://www.example.com/path
Explanation:
- `^(https?:\/\/)?` optionally matches http:// or https://
- `(www\.)?` optionally matches "www."
- `[a-zA-Z0-9]+\.` matches one or more alphanumeric characters followed by a dot (domain name)
- `[a-zA-Z]{2,}` matches at least 2 letters (top-level domain)
- `(\/\S*)?$` optionally matches a slash followed by any non-whitespace characters (URL path)
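Trying the URL pattern on a few inputs in Python shows both what it accepts and where it is strict. `is_simple_url` is a hypothetical name, and the URLs are placeholders.

```python
import re

URL_RE = re.compile(r"^(https?:\/\/)?(www\.)?[a-zA-Z0-9]+\.[a-zA-Z]{2,}(\/\S*)?$")

def is_simple_url(text: str) -> bool:
    """Return True for simple URLs such as https://www.example.com/path."""
    return URL_RE.fullmatch(text) is not None

print(is_simple_url("https://www.example.com/path"))  # True
print(is_simple_url("example.org"))                   # True: scheme and www are optional
print(is_simple_url("ftp://example.com"))             # False: only http and https
print(is_simple_url("https://blog.example.com"))      # False: subdomains are not covered
print(is_simple_url("https://my-site.com"))           # False: hyphens are not covered
```

The last two results are real-world URLs that this pattern rejects, so it is best suited to simple cases. For general URL handling in Python, `urllib.parse` from the standard library is usually a safer starting point.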
Pattern: ^(0[1-9]|[12][0-9]|3[01])\/(0[1-9]|1[0-2])\/\d{4}$
Example: 25/12/2023
Explanation:
- `^(0[1-9]|[12][0-9]|3[01])` matches valid day values from 01-31
  - `0[1-9]` matches days 01-09
  - `[12][0-9]` matches days 10-29
  - `3[01]` matches days 30-31
- `\/` matches a forward slash literally
- `(0[1-9]|1[0-2])` matches valid month values from 01-12
  - `0[1-9]` matches months 01-09
  - `1[0-2]` matches months 10-12
- `\/\d{4}$` matches a forward slash followed by exactly 4 digits for the year
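One thing worth knowing: this pattern checks the shape of a DD/MM/YYYY date, not the calendar, so 31/02/2023 passes. The Python sketch below pairs the regex with `datetime.strptime` to catch impossible dates; the function names are made up for the example.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"^(0[1-9]|[12][0-9]|3[01])\/(0[1-9]|1[0-2])\/\d{4}$")

def looks_like_date(text: str) -> bool:
    """Shape check only: two-digit day, two-digit month, four-digit year."""
    return DATE_RE.fullmatch(text) is not None

def is_real_date(text: str) -> bool:
    """Shape check plus a calendar check (month lengths, leap years)."""
    if not looks_like_date(text):
        return False
    try:
        datetime.strptime(text, "%d/%m/%Y")
    except ValueError:
        return False
    return True

print(looks_like_date("25/12/2023"))  # True
print(looks_like_date("31/02/2023"))  # True: right shape, impossible date
print(is_real_date("31/02/2023"))     # False
print(is_real_date("29/02/2024"))     # True: 2024 is a leap year
```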
Pattern: ^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
Example: 192.168.1.1
Explanation:
- The complex pattern `(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)` matches a valid octet in an IP address (0-255)
  - `25[0-5]` matches 250-255
  - `2[0-4][0-9]` matches 200-249
  - `[01]?[0-9][0-9]?` matches 0-199
- `\.` matches a dot literally
- The octet-plus-dot group repeats 3 times (`{3}`) and is followed by one final octet, so all four octets of an IP address are matched
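Because the octet part appears twice, the pattern is easier to read when it is built from a named piece. This Python sketch assembles exactly the same pattern; `OCTET`, `IPV4_RE`, and `is_ipv4` are names chosen for the example.

```python
import re

# One octet (0-255), written once and reused.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Three "octet + dot" groups, then a final octet.
IPV4_RE = re.compile(rf"^(?:{OCTET}\.){{3}}{OCTET}$")

def is_ipv4(text: str) -> bool:
    return IPV4_RE.fullmatch(text) is not None

print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False: 256 is out of range
print(is_ipv4("1.2.3"))            # False: only three octets
print(is_ipv4("01.2.3.4"))         # True: leading zeros are accepted
```

Note that the pattern accepts leading zeros such as "01"; whether that is acceptable depends on where the address will be used.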
For a username with 3-16 characters, allowing letters, numbers, underscores, and hyphens:
Pattern: ^[a-zA-Z0-9_-]{3,16}$
Example: user_name123
Explanation:
- `^[a-zA-Z0-9_-]` ensures the username only contains letters, numbers, underscores, or hyphens
- `{3,16}$` ensures the username is between 3 and 16 characters long
- These restrictions are common for usernames to ensure they're easy to type and remember
Pattern: ^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9][0-9])[0-9]{12})$
Example: 4111111111111111 (Visa)
Explanation:
- This pattern validates the most common credit card formats:
  - `4[0-9]{12}(?:[0-9]{3})?` matches Visa cards (13 or 16 digits starting with 4)
  - `5[1-5][0-9]{14}` matches MasterCard (16 digits starting with 51-55)
  - `3[47][0-9]{13}` matches American Express (15 digits starting with 34 or 37)
  - `6(?:011|5[0-9][0-9])[0-9]{12}` matches Discover cards (16 digits starting with 6011 or 65)
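Splitting the big pattern into its four branches makes it possible to say which card type matched. The Python sketch below does that; `card_type` is a hypothetical helper, stripping spaces and hyphens first is an added assumption, and the numbers are well-known test values rather than real cards.

```python
import re

# One entry per branch of the combined pattern above.
CARD_PATTERNS = {
    "Visa": r"4[0-9]{12}(?:[0-9]{3})?",
    "MasterCard": r"5[1-5][0-9]{14}",
    "American Express": r"3[47][0-9]{13}",
    "Discover": r"6(?:011|5[0-9][0-9])[0-9]{12}",
}

def card_type(number: str):
    """Return the matching card brand, or None if no format fits."""
    digits = re.sub(r"[ -]", "", number)  # assumption: ignore spaces and hyphens
    for brand, pattern in CARD_PATTERNS.items():
        if re.fullmatch(pattern, digits):
            return brand
    return None

print(card_type("4111111111111111"))     # Visa
print(card_type("4111 1111 1111 1111"))  # Visa
print(card_type("378282246310005"))      # American Express
print(card_type("1234567890123456"))     # None
```

This is a format check only: it confirms the prefix and length, not that the number passes a checksum or belongs to an active card.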
Pattern: ^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$
Example: <div class="example">Content</div>
Explanation:
- `^<([a-z]+)` matches the opening of an HTML tag and captures the tag name
- `([^<]+)*` matches any attributes in the tag
- `(?:>(.*)<\/\1>|\s+\/>)$` matches either:
  - A closing `>`, any content, and then a closing tag with the same name as the opening tag (`\1` refers to the first captured group)
  - OR a self-closing tag like `<img src="example.jpg" />`
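The Python sketch below shows the captured groups and the `\1` backreference at work. `describe_tag` is a made-up helper, and the inputs are deliberately short single tags: a pattern like this is fine for simple snippets, but for real HTML documents a proper parser (for example Python's `html.parser`) is usually the better tool.

```python
import re

TAG_RE = re.compile(r"^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$")

def describe_tag(html: str):
    """Return (tag name, inner content), or None if the text does not match.

    The content is None for self-closing tags.
    """
    match = TAG_RE.fullmatch(html)
    if match is None:
        return None
    return match.group(1), match.group(3)

print(describe_tag('<div class="example">Content</div>'))  # ('div', 'Content')
print(describe_tag('<img src="example.jpg" />'))           # ('img', None)
print(describe_tag('<div>text</span>'))                    # None: names differ
```

The third call fails because `\1` requires the closing tag to repeat whatever the first group captured, and "span" is not "div".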
Pattern: ^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
Example: #FF5733 or #F73
Explanation:
- `^#` matches the hash symbol at the start
- `([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$` matches either:
  - Exactly 6 hexadecimal characters (0-9, A-F, a-f) for full hex color codes like #FF5733
  - OR exactly 3 hexadecimal characters for shorthand hex color codes like #F73
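Once a color code has been validated, the captured group can be put to use. This Python sketch converts a hex color into red, green, and blue values; `hex_to_rgb` is a hypothetical helper, and expanding the shorthand by doubling each character follows the usual CSS convention.

```python
import re

HEX_COLOR_RE = re.compile(r"^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$")

def hex_to_rgb(color: str):
    """Return (red, green, blue) for a valid hex color, or None otherwise."""
    match = HEX_COLOR_RE.fullmatch(color)
    if match is None:
        return None
    digits = match.group(1)
    if len(digits) == 3:
        # Shorthand: "F73" stands for "FF7733".
        digits = "".join(ch * 2 for ch in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#FF5733"))  # (255, 87, 51)
print(hex_to_rgb("#F73"))     # (255, 119, 51)
print(hex_to_rgb("#GG0000"))  # None: G is not a hex digit
print(hex_to_rgb("FF5733"))   # None: missing the leading #
```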
Pattern: ^([01]?[0-9]|2[0-3]):[0-5][0-9]$
Example: 13:45
Explanation:
- `^([01]?[0-9]|2[0-3])` matches valid hour values from 0-23
  - `[01]?[0-9]` matches hours 0-19
  - `2[0-3]` matches hours 20-23
- `:` matches the colon literally
- `[0-5][0-9]$` matches valid minute values from 00-59
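As a small usage sketch in Python, the time pattern can guard a conversion to "minutes since midnight". The function name `minutes_since_midnight` is invented for this example.

```python
import re

TIME_RE = re.compile(r"^([01]?[0-9]|2[0-3]):[0-5][0-9]$")

def minutes_since_midnight(text: str):
    """Return the time as minutes after 00:00, or None if it is not valid."""
    match = TIME_RE.fullmatch(text)
    if match is None:
        return None
    hours = int(match.group(1))  # the only captured group is the hour
    minutes = int(text[-2:])     # the minutes are always the last two characters
    return hours * 60 + minutes

print(minutes_since_midnight("13:45"))  # 825
print(minutes_since_midnight("9:05"))   # 545
print(minutes_since_midnight("24:00"))  # None: hours stop at 23
print(minutes_since_midnight("12:60"))  # None: minutes stop at 59
```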
Character | Description | Example |
---|---|---|
`.` | Matches any single character | `a.c` matches "abc", "adc", etc. |
`^` | Matches start of a string | `^hello` matches strings that start with "hello" |
`$` | Matches end of a string | `world$` matches strings that end with "world" |
`*` | Matches 0 or more occurrences | `ab*c` matches "ac", "abc", "abbc", etc. |
`+` | Matches 1 or more occurrences | `ab+c` matches "abc", "abbc", but not "ac" |
`?` | Matches 0 or 1 occurrence | `ab?c` matches "ac" and "abc" only |
`\` | Escapes special characters | `a\.c` matches "a.c" literally |
`\d` | Matches any digit (0-9) | `\d{3}` matches "123", "456", etc. |
`\w` | Matches any word character (a-z, A-Z, 0-9, _) | `\w+` matches "abc_123" |
`\s` | Matches any whitespace character | `a\sb` matches "a b" |
`[...]` | Matches any one character in brackets | `[abc]` matches "a", "b", or "c" |
`[^...]` | Matches any one character NOT in brackets | `[^abc]` matches any character except "a", "b", or "c" |
`{n}` | Matches exactly n occurrences | `a{3}` matches "aaa" |
`{n,}` | Matches n or more occurrences | `a{2,}` matches "aa", "aaa", etc. |
`{n,m}` | Matches between n and m occurrences | `a{1,3}` matches "a", "aa", "aaa" |
`()` | Groups expressions and remembers matched text | `(ab)+` matches "ab", "abab", etc. |
`\|` | Acts like OR operator | `cat\|dog` matches "cat" or "dog" |
You can test your regex patterns on these websites:
- RegExr: https://regexr.com/
- Regex101: https://regex101.com/
- Start simple: Begin with basic patterns and gradually add complexity
- Test thoroughly: Always test your regex with various input strings
- Use anchors: `^` and `$` ensure the entire string matches your pattern
- Be specific: Make your patterns as specific as possible to avoid false matches
- Use online tools: Regex testing websites help visualize how your pattern works
- Break it down: Complex patterns can be understood by breaking them into smaller parts
- Comment your regex: In code, comment complex regex to explain what it does
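For the last tip, many regex engines offer a way to put comments inside the pattern itself. In Python this is the `re.VERBOSE` flag, which ignores whitespace in the pattern and treats `#` as the start of a comment. The sketch below rewrites the DD/MM/YYYY date pattern that way; the name `DATE_RE` is just for the example.

```python
import re

DATE_RE = re.compile(
    r"""
    ^
    (0[1-9]|[12][0-9]|3[01])   # day: 01-31
    \/                         # literal slash
    (0[1-9]|1[0-2])            # month: 01-12
    \/                         # literal slash
    \d{4}                      # year: exactly four digits
    $
    """,
    re.VERBOSE,
)

match = DATE_RE.fullmatch("25/12/2023")
print(match.group(1), match.group(2))             # 25 12
print(DATE_RE.fullmatch("32/12/2023") is None)    # True: 32 is not a valid day
```

The commented version matches the same inputs as the one-line pattern, but each part can be read and changed on its own.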
Remember, regex is powerful but can be complex. Take your time to understand each part of a pattern before using it in your projects.
Happy pattern matching!