Regex

We need to understand a bit about regular expressions, or "regex" for short. Regex is a programming-language-agnostic way of searching for patterns in text.

To get really good at using regex, we'd need a full course on the topic. For now, let's just cover the basics. In Python, we can use the re module to work with regex. Its findall function returns a list of all the non-overlapping matches in a string. See examples below.

Regex for Phone Numbers

import re

text = "My phone number is 555-555-5555 and my friend's number is 555-555-5556"
matches = re.findall(r"\d{3}-\d{3}-\d{4}", text)
print(matches)  # ['555-555-5555', '555-555-5556']
  • \d matches any digit

  • {3} means "exactly three of the preceding token", so \d{3} matches exactly three digits

  • - is just a literal - that we want to match
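
As a minimal sketch of how these three pieces work together, the snippet below runs the same pattern against a string containing a few differently formatted numbers. The sample text is made up for illustration; only the pattern comes from the example above.

```python
import re

# Hypothetical sample text: only the first number uses the 3-3-4 hyphenated layout.
sample = "Call 555-555-5555, not 555.555.5556 or 55-555-5557"
pattern = r"\d{3}-\d{3}-\d{4}"

phone_matches = re.findall(pattern, sample)
print(phone_matches)  # ['555-555-5555']
```

The second number fails because . is not a literal -, and the third fails because its first group has only two digits.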

Regex for Text Between Parentheses

import re

text = "I have a (cat) and a (dog)"
matches = re.findall(r"\((.*?)\)", text)
print(matches)  # ['cat', 'dog']
  • \( and \) are escaped parentheses that we want to match

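  • ( and ) form a capture group, which groups the matched text so we can reference or extract it separately. When a pattern has a capture group, findall returns only the captured text, which is why the output leaves out the parentheses.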

  • .*? matches as few characters as possible (any character except line terminators) between the parentheses. The ? makes the match lazy, so it stops at the first closing parenthesis.
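
To see why the ? in .*? matters, here is a small sketch comparing the lazy pattern from above with a greedy variant. The greedy pattern is not part of the original example; it is included only for contrast.

```python
import re

text = "I have a (cat) and a (dog)"

# Lazy: .*? stops at the first closing parenthesis it can.
lazy = re.findall(r"\((.*?)\)", text)
print(lazy)  # ['cat', 'dog']

# Greedy: .* runs all the way to the last closing parenthesis on the line.
greedy = re.findall(r"\((.*)\)", text)
print(greedy)  # ['cat) and a (dog']
```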

Regex for Emails (Multiple Capture Groups)

import re

text = "My email is lane@example.com and my friend's email is hunter@example.com"
matches = re.findall(r"(\w+)@(\w+\.\w+)", text)
print(matches)  # [('lane', 'example.com'), ('hunter', 'example.com')]
  • \w matches any word character (alphanumeric characters and underscores)

  • + means "one or more of the preceding token", so \w+ matches one or more word characters

  • @ is just a literal @ symbol that we want to match

  • \. is a literal . that we want to match (The . is a special character in regex, so we escape it with a leading backslash)
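
With more than one capture group, each match is a tuple, so it can be unpacked directly in a loop. The sketch below shows that, plus one limitation of this simple pattern. The address first.last@example.com is a made-up input for illustration.

```python
import re

text = "My email is lane@example.com and my friend's email is hunter@example.com"

# Each match is a (username, domain) tuple, one item per capture group.
for username, domain in re.findall(r"(\w+)@(\w+\.\w+)", text):
    print(username, domain)
# lane example.com
# hunter example.com

# Assumed edge case: \w does not match ".", so only part of the username is captured.
partial = re.findall(r"(\w+)@(\w+\.\w+)", "first.last@example.com")
print(partial)  # [('last', 'example.com')]
```

This pattern is fine for learning, but it is not a complete email matcher.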

Regex Examples

The findall function returns a list of all the matches in a string.

import re
text = "I'm a little teapot, short and stout. Here is my handle, here is my spout."
matches = re.findall(r"teapot", text)
print(matches) # ['teapot']

text = "My email is lane@example.com and my friend's email is hunter@example.com"
matches = re.findall(r"(\w+)@(\w+\.\w+)", text)
print(matches)  # [('lane', 'example.com'), ('hunter', 'example.com')]
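
The shape of the list that findall returns depends on how many capture groups the pattern has, which explains the difference between the two outputs above. A short sketch, using a made-up sample string:

```python
import re

sample = "lane@example.com"

# No capture groups: a list of the full matched strings.
no_groups = re.findall(r"\w+@\w+\.\w+", sample)
print(no_groups)  # ['lane@example.com']

# One capture group: a list of just the captured text.
one_group = re.findall(r"(\w+)@\w+\.\w+", sample)
print(one_group)  # ['lane']

# Two or more capture groups: a list of tuples.
two_groups = re.findall(r"(\w+)@(\w+\.\w+)", sample)
print(two_groups)  # [('lane', 'example.com')]

# No match at all: an empty list, not None.
none_found = re.findall(r"\d+", sample)
print(none_found)  # []
```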

Testing Regexes

Use regexr.com for interactive regex testing. It breaks down each part of the pattern and explains what it does.
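
For a similar part-by-part breakdown inside Python itself, one option is the re.VERBOSE flag, which lets a pattern span several lines with a comment on each piece. This is only a sketch and is unrelated to regexr.com; the variable name phone_pattern is chosen for illustration.

```python
import re

# re.VERBOSE ignores whitespace and "#" comments inside the pattern.
phone_pattern = re.compile(
    r"""
    \d{3}   # three digits
    -       # a literal hyphen
    \d{3}   # three more digits
    -       # another literal hyphen
    \d{4}   # four digits
    """,
    re.VERBOSE,
)

text = "My phone number is 555-555-5555"
print(phone_pattern.findall(text))  # ['555-555-5555']
```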
