Checking if a Python string contains a substring is a common task in Python programming. Knowing how to do this effectively opens up possibilities for various text manipulation and decision-making within your code. Python offers several methods for substring checks, each with its own strengths:
- Use the in operator for a straightforward and readable approach.
- Use the __contains__() method for a direct but less common approach.
- Use the find() or index() methods to get the position of the substring.
- Use the count() method to determine the number of occurrences of a substring.
- Use the startswith() and endswith() methods to check if a string starts or ends with a substring.
- Use list comprehension for checking multiple substrings simultaneously.
- Use regular expressions for complex pattern matching searches.
- Use Pandas for substring searches in tabular data (like CSV or Excel files).
Let’s explore each of these methods in more detail!
Using the in Operator (Recommended)
The most Pythonic and recommended way to check if a string contains a specific substring is to use Python’s membership operator, in. This operator returns True if the substring exists in the string and False otherwise. This approach is ideal for situations where you simply need to confirm the presence or absence of a substring.
text = "The light from the lighthouse lights the way."
print("light" in text)
# Output: True
To check if a substring is not present, simply use the not in operator:
text = "The light from the lighthouse lights the way."
print("darkness" not in text)
# Output: True
The in operator returns a Boolean value (either True or False) indicating whether the substring was found. This makes it perfect for use in conditional statements within your code:
text = "The light from the lighthouse lights the way."
if "light" in text:
    print("Found!")
While it might seem counterintuitive, Python always considers an empty string "" to be a substring of any other string. So checking for an empty string within another string will always return True.
Performing Case-Insensitive Searches
It’s important to remember that string comparisons in Python are case-sensitive. This means that “hello” and “Hello” are considered different substrings.
If you need to perform a case-insensitive search, you can convert both the main string and the substring to lowercase before your check. Here’s how you would do this using the lower() method:
text = "The Light From The Lighthouse Lights The Way."
substring = "light"
print(substring.lower() in text.lower())
# Output: True
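For text that may contain non-ASCII characters, the casefold() method can be a more robust choice than lower(), because it applies more aggressive case folding. The sketch below uses a made-up German sentence to show one case where the two methods differ:

```python
# casefold() folds more aggressively than lower(); for example,
# the German letter "ß" becomes "ss".
text = "Die Straße ist lang."
substring = "STRASSE"

lower_result = substring.lower() in text.lower()
casefold_result = substring.casefold() in text.casefold()

print(lower_result)     # Output: False
print(casefold_result)  # Output: True
```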
Using the __contains__() Method
Under the hood, the in operator uses the __contains__() method of the string object. This method is called implicitly when you use the in operator to check for a substring.
While you generally won’t call __contains__() directly, as it is less readable than using the in operator, it remains an option.
text = "The light from the lighthouse lights the way."
print(text.__contains__("light"))
# Output: True
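The same mechanism is available to your own classes: defining __contains__() lets the in operator work on them. The Playlist class below is a hypothetical example made up for illustration, and it assumes a case-insensitive title match is what you want:

```python
class Playlist:
    """Hypothetical container used only to illustrate __contains__()."""

    def __init__(self, titles):
        self.titles = list(titles)

    def __contains__(self, title):
        # Python calls this method when it evaluates: title in playlist
        return title.lower() in (t.lower() for t in self.titles)


playlist = Playlist(["Lighthouse", "The Way"])
print("lighthouse" in playlist)  # Output: True
print("Darkness" in playlist)    # Output: False
```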
Using the find() Method
The in operator is suitable for determining if a substring exists within a string. However, if you need to know the location of the substring within the string, you can use the find() method.
The find() method scans the string and returns the starting index (position) of the first occurrence of the substring. If it can’t find the substring, it returns -1.
text = "The light from the lighthouse lights the way."
print(text.find("light"))
# Output: 4
print(text.find("darkness"))
# Output: -1
Limit the find() Search
The find(substring, start, end) method allows you to narrow your search to a specific section of the string using the optional start and end parameters.
- start: This parameter indicates the index where the search should begin. If you need to find a substring starting from a position other than the beginning of the string, use the start parameter.
- end: This parameter determines the index where the search should stop (this index is not included in the search itself). Use the end parameter to limit the search to a specific range of characters within the string.
text = "The light from the lighthouse lights the way."
# Find "light" starting at position 20
print(text.find("light", 20))
# Output: 30
# Find "light" in between indices 10 and 30
print(text.find("light", 10, 30))
# Output: 19
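The start parameter also makes it possible to collect the position of every occurrence by repeatedly resuming the search after the previous match. The helper name find_all below is an assumption chosen for this sketch, not a built-in string method:

```python
def find_all(text, substring):
    """Return the start index of every non-overlapping occurrence."""
    if not substring:
        return []  # an empty substring would otherwise match at every index
    positions = []
    start = text.find(substring)
    while start != -1:
        positions.append(start)
        # Resume the search just past the end of the previous match.
        start = text.find(substring, start + len(substring))
    return positions


text = "The light from the lighthouse lights the way."
print(find_all(text, "light"))     # Output: [4, 19, 30]
print(find_all(text, "darkness"))  # Output: []
```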
Using the index() Method
Similar to the find() method, the index() method searches for the first occurrence of a substring within a string and returns its starting index.
However, the key difference is how it handles cases where the substring is not found. Instead of returning -1, the index() method raises a ValueError exception. This behavior can be useful if you explicitly want your code to signal an error when a substring you expect to be present is not found in the string.
text = "The light from the lighthouse lights the way."
print(text.index("light"))
# Output: 4
print(text.index("darkness"))
# ValueError: substring not found
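Because index() raises an exception, calls to it are often wrapped in try/except when a missing substring is a possibility. Here is a minimal sketch; the helper name position_of and the choice to return None are assumptions made for illustration:

```python
def position_of(text, substring):
    """Hypothetical helper: return the index of substring, or None if absent."""
    try:
        return text.index(substring)
    except ValueError:
        # index() raises ValueError when the substring is not found.
        return None


text = "The light from the lighthouse lights the way."
print(position_of(text, "light"))     # Output: 4
print(position_of(text, "darkness"))  # Output: None
```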
Similar to find(), the index(substring, start, end) method lets you specify start and end indices to limit the search to a specific section of your string.
# Find "light" starting at position 20
text = "The light from the lighthouse lights the way."
print(text.index("light", 20))
# Output: 30
# Find "light" in between indices 10 and 30
text = "The light from the lighthouse lights the way."
print(text.index("light", 10, 30))
# Output: 19
Using the count() Method
If you are interested in how many times a substring appears, not just its presence or location, you can use the count() method. It simply returns the number of occurrences of a substring in the given string.
text = "The light from the lighthouse lights the way."
print(text.count("light"))
# Output: 3
print(text.count("darkness"))
# Output: 0
Similar to find() and index(), the count() method also allows you to refine your search using the optional start and end parameters.
# Count "light" starting at position 15
text = "The light from the lighthouse lights the way."
print(text.count("light", 15))
# Output: 2
# Count "light" in between indices 10 and 30
text = "The light from the lighthouse lights the way."
print(text.count("light", 10, 30))
# Output: 1
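One detail worth knowing is that count() counts non-overlapping occurrences only. If overlapping matches matter for your use case, one possible approach is to advance the search by a single character at a time. The function name count_overlapping is an assumption for this sketch:

```python
def count_overlapping(text, substring):
    """Count occurrences of substring, including overlapping ones."""
    if not substring:
        return 0
    total = 0
    start = text.find(substring)
    while start != -1:
        total += 1
        # Move forward by one character so overlapping matches are seen.
        start = text.find(substring, start + 1)
    return total


print("aaaa".count("aa"))               # Output: 2
print(count_overlapping("aaaa", "aa"))  # Output: 3
```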
Using startswith() and endswith() Methods
The startswith() and endswith() methods are used to check if a string starts or ends with a specific substring, respectively.
text = "Hello, world!"
# Check if the text starts with 'Hello'
print(text.startswith("Hello"))
# Output: True
# Check if the text ends with 'world!'
print(text.endswith("world!"))
# Output: True
Although they are not designed to find substrings that occur in the middle of a string, they are useful for tasks such as checking a file extension or a URL prefix.
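Both methods also accept a tuple of candidates and return True if any one of them matches. The file names and URL below are made up for illustration:

```python
filenames = ["report.csv", "notes.txt", "budget.xlsx"]

# Passing a tuple checks several suffixes in a single call.
spreadsheets = [name for name in filenames if name.endswith((".csv", ".xlsx"))]
print(spreadsheets)
# Output: ['report.csv', 'budget.xlsx']

url = "https://example.com"
is_web_url = url.startswith(("http://", "https://"))
print(is_web_url)
# Output: True
```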
Checking for Multiple Substrings
When you need to determine which of several substrings exist within a larger string, a list comprehension combined with the in operator provides an elegant solution. Here’s how it works:
text = "The light from the lighthouse lights the way."
substrings = ["light", "way", "darkness"]
# Check which substrings are found in 'text'
found_substrings = [s for s in substrings if s in text]
print("Found substrings:", found_substrings)
# Output: Found substrings: ['light', 'way']
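If you only need a yes/no answer rather than the list of matches, the built-in any() and all() functions can be combined with the in operator. A short sketch using the same example data:

```python
text = "The light from the lighthouse lights the way."
substrings = ["light", "way", "darkness"]

# True if at least one substring is present (stops at the first hit).
contains_any = any(s in text for s in substrings)

# True only if every substring is present.
contains_all = all(s in text for s in substrings)

print(contains_any)  # Output: True
print(contains_all)  # Output: False
```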
Using Regular Expressions (Advanced)
Regular expressions (or “regex”) offer a powerful way to perform complex pattern matching within text. Python’s re module offers several functions for working with regular expressions.
One of the core functions in the re module is search(). This function scans your text for a match to the regular expression pattern you provide. If a match is found, it returns a “match object” containing details about the match; if no match is found, it returns None. This makes search() convenient for use in conditional statements.
The example below uses a regular expression to check the string for the text ‘light’ followed by one or more word characters (letters, digits, or underscores):
import re
text = "The light from the lighthouse lights the way."
pattern = r"light\w+" # "light" followed by one or more word characters
if re.search(pattern, text):
    print("Found!")
else:
    print("Not found.")
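When the text you are searching for contains characters that have a special meaning in regular expressions (such as $ or .), re.escape() can be used to treat it as a literal substring. The helper contains_literal and the sample string below are assumptions made for this sketch; it also shows the re.IGNORECASE flag for case-insensitive matching:

```python
import re


def contains_literal(text, needle, ignore_case=False):
    """Hypothetical helper: regex search for needle as literal text."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape() backslash-escapes regex metacharacters in needle.
    return re.search(re.escape(needle), text, flags) is not None


price_note = "Total: $5.00 (tax included)"

print(contains_literal(price_note, "$5.00"))
# Output: True

# Without escaping, "$" is read as an end-of-string anchor, so nothing matches.
print(re.search("$5.00", price_note) is not None)
# Output: False

print(contains_literal(price_note, "TOTAL", ignore_case=True))
# Output: True
```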
As mentioned earlier, the re.search() function returns a special “match object”. This object holds valuable information about the match: the matched substring and its starting and ending index positions within the original string.
import re
text = "The light from the lighthouse lights the way."
pattern = r"light\w+"
print(re.search(pattern, text))
# Output: <re.Match object; span=(19, 29), match='lighthouse'>
You can access these details through the group() and span() methods on the “match object”:
import re
text = "The light from the lighthouse lights the way."
pattern = r"light\w+"
match = re.search(pattern, text)
# Get the matched substring
print(match.group()) # Output: lighthouse
# Get the starting and ending index positions
print(match.span()) # Output: (19, 29)
re.search() locates only the first match for your pattern within the string. If you need to find all occurrences of a pattern, you should use re.findall(). This function scans the string and returns a list of all substrings that match your pattern.
import re
text = "The light from the lighthouse lights the way."
pattern = r"light\w+" # Pattern for words starting with "light"
print(re.findall(pattern, text))
# Output: ['lighthouse', 'lights']
While re.findall() is a convenient way to extract all matching substrings, it only returns a list of the matches themselves. This means you’ve lost the index positions that you had access to when you were using re.search().
If you need index information for all matches, you should use re.finditer(). This function returns an iterator that yields match objects for each occurrence of your pattern, containing the same information as re.search(): the matched substring and its starting and ending index positions.
import re
text = "The light from the lighthouse lights the way."
pattern = r"light\w+"
for match in re.finditer(pattern, text):
    print(match)
# Output:
# <re.Match object; span=(19, 29), match='lighthouse'>
# <re.Match object; span=(30, 36), match='lights'>
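Regular expressions can also express something plain substring checks cannot: matching a whole word only. The sketch below uses the \b word-boundary anchor so that "light" matches as a standalone word but not as part of "lighthouse" or "lights":

```python
import re

text = "The light from the lighthouse lights the way."

# \b marks a boundary between a word character and a non-word character.
whole_word = r"\blight\b"

positions = [m.start() for m in re.finditer(whole_word, text)]
print(positions)
# Output: [4]

# For comparison, a plain substring count also includes partial-word hits.
print(text.count("light"))
# Output: 3
```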
Using Pandas for Substring Searches in Tabular Data
The techniques we discussed earlier are powerful for working with unstructured plain text. However, when your data is organized in a tabular format, such as a CSV file or an Excel sheet, using pandas offers a more streamlined and efficient approach for finding substrings within columns.
Assume you have a CSV file named “employees.csv” with the following structure:
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
Amy,20,Developer,Houston
When working with tabular data in Python, it’s best to first load it into a pandas DataFrame using functions like read_csv() or read_excel():
import pandas as pd
employees = pd.read_csv("employees.csv")
To verify if the data has been properly loaded into the DataFrame, you can check the dimensions of your DataFrame with the shape attribute or use head() to get a preview of the first few rows.
print(employees.shape)
# Output: (3, 4)
print(employees.head())
# Output:
#    name  age        job      city
# 0   Bob   25    Manager   Seattle
# 1   Sam   30  Developer  New York
# 2   Amy   20  Developer   Houston
Now, you can query the whole pandas column to filter for entries that contain a specific substring by using the .str.contains() method on that column and passing the substring as an argument.
print(employees.job.str.contains("Developer"))
# Output:
# 0 False
# 1 True
# 2 True
# Name: job, dtype: bool
For more complex pattern matching, str.contains() works with regular expressions: by default, it interprets its argument as a regular expression pattern (pass regex=False to match the text literally).
print(employees.job.str.contains(r"Dev\w+"))
# Output:
# 0 False
# 1 True
# 2 True
# Name: job, dtype: bool
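The Boolean Series returned by str.contains() is typically used as a mask to select the matching rows. The sketch below rebuilds the example data inline instead of reading employees.csv so that it runs on its own, and it assumes you want a literal, case-insensitive match that treats missing values as non-matches:

```python
import pandas as pd

# Same data as employees.csv, constructed inline for a self-contained example.
employees = pd.DataFrame(
    {
        "name": ["Bob", "Sam", "Amy"],
        "age": [25, 30, 20],
        "job": ["Manager", "Developer", "Developer"],
        "city": ["Seattle", "New York", "Houston"],
    }
)

# case=False ignores case, regex=False matches the text literally,
# and na=False counts missing values as "no match".
mask = employees["job"].str.contains("developer", case=False, regex=False, na=False)

developers = employees[mask]
developer_names = developers["name"].tolist()
print(developer_names)
# Output: ['Sam', 'Amy']
```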