Regex
We need to understand a bit about "regular expressions", or "regex" for short. Regex is a programming-language-agnostic way of searching for patterns in text.
To get really good at using regex, we'd need a full course on the topic. For now, let's just cover the basics. In Python, we can use the re module to work with regex. It has a findall function that returns a list of all the matches in a string. See examples below.
Regex for a Phone Number
import re
text = "My phone number is 555-555-5555 and my friend's number is 555-555-5556"
matches = re.findall(r"\d{3}-\d{3}-\d{4}", text)
print(matches) # ['555-555-5555', '555-555-5556']
\d matches any digit
{3} means "exactly three of the preceding character"
- is just a literal - that we want to match
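To see what each piece of the phone number pattern requires, here is a minimal sketch run against a hypothetical sample string (the numbers below are made up for illustration and are not part of the lesson). Only the number with three digits, a hyphen, three digits, a hyphen, and four digits should match.

```python
import re

# Hypothetical sample text: one well-formed number and two malformed ones
text = "Call 555-123-4567, not 5551234567 or 55-123-4567."

pattern = r"\d{3}-\d{3}-\d{4}"
matches = re.findall(pattern, text)

# The second number has no hyphens and the third starts with only two digits,
# so neither satisfies the pattern
print(matches)  # ['555-123-4567']
```

Note that this pattern does not check what surrounds the match, so it can also match inside a longer run of digits and hyphens.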
Regex for Text Between Parentheses
text = "I have a (cat) and a (dog)"
matches = re.findall(r"\((.*?)\)", text)
print(matches) # ['cat', 'dog']
\( and \) are escaped parentheses that we want to match
( and ) form a capture group, meaning they group the matched text, allowing us to reference or extract it separately
.*? matches any number of characters (except for line terminators) between the parentheses, as few as possible
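The ? in .*? matters more than it might look. As a small sketch using the same sample sentence, compare the lazy pattern with a greedy variant that drops the ? (the variable names lazy and greedy are just for illustration):

```python
import re

text = "I have a (cat) and a (dog)"

# Lazy: .*? stops at the first closing parenthesis it can
lazy = re.findall(r"\((.*?)\)", text)
print(lazy)  # ['cat', 'dog']

# Greedy: .* grabs as much as possible, running to the last closing parenthesis
greedy = re.findall(r"\((.*)\)", text)
print(greedy)  # ['cat) and a (dog']
```

Because the pattern contains one capture group, findall returns only the captured text, not the surrounding parentheses.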
Regex for Emails (Multiple Capture Groups)
text = "My email is lane@example.com and my friend's email is hunter@example.com"
matches = re.findall(r"(\w+)@(\w+\.\w+)", text)
print(matches) # [('lane', 'example.com'), ('hunter', 'example.com')]
\w matches any word character (alphanumeric characters and underscores)
+ means "one or more of the preceding character"
@ is just a literal @ symbol that we want to match
\. is a literal . that we want to match (the . is a special character in regex, so we escape it with a leading backslash)
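When a pattern has more than one capture group, findall returns a list of tuples, one tuple per match. A minimal sketch of how those tuples might be unpacked, reusing the email example (the names pairs, username, domain, and whole are illustrative choices):

```python
import re

text = "My email is lane@example.com and my friend's email is hunter@example.com"

# Two capture groups: each match becomes a (username, domain) tuple
pairs = re.findall(r"(\w+)@(\w+\.\w+)", text)

for username, domain in pairs:
    print(f"{username} is at {domain}")

# With no capture groups, findall returns each whole match as a string instead
whole = re.findall(r"\w+@\w+\.\w+", text)
print(whole)  # ['lane@example.com', 'hunter@example.com']
```

This pattern is deliberately simple. Since \w does not include characters such as . or -, it would not capture every valid email address in full.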
Regex Examples
The findall function returns a list of all the matches in a string.
import re
text = "I'm a little teapot, short and stout. Here is my handle, here is my spout."
matches = re.findall(r"teapot", text)
print(matches) # ['teapot']
text = "My email is lane@example.com and my friend's email is hunter@example.com"
matches = re.findall(r"(\w+)@(\w+\.\w+)", text)
print(matches) # [('lane', 'example.com'), ('hunter', 'example.com')]
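Two behaviors of findall are worth checking by hand: it returns an empty list when nothing matches, and it is case-sensitive unless told otherwise. The sketch below reuses the teapot sentence. The search words and variable names are illustrative, and re.IGNORECASE is a standard flag from the re module that the examples above do not use.

```python
import re

text = "I'm a little teapot, short and stout. Here is my handle, here is my spout."

# No match: findall returns an empty list rather than None
no_match = re.findall(r"kettle", text)
print(no_match)  # []

# Matching is case-sensitive by default, so "Here" is skipped
case_sensitive = re.findall(r"here", text)
print(case_sensitive)  # ['here']

# The re.IGNORECASE flag matches both spellings
any_case = re.findall(r"here", text, re.IGNORECASE)
print(any_case)  # ['Here', 'here']
```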
Testing Regexes
Use regexr.com for interactive regex testing. It breaks down each part of the pattern and explains what it does.