How to Check if a Python String Contains a Substring

Checking whether a string contains a substring is one of the most common tasks in Python, and it underpins text filtering, input validation, and conditional logic. Python offers several methods for substring checks, each with its own strengths:

  • Use the in operator for a straightforward and readable approach.
  • Use the __contains__() method for a direct but less common approach.
  • Use find() or index() methods to get the position of the substring.
  • Use the count() method to determine the number of occurrences of a substring.
  • Use startswith() and endswith() methods to check if a string starts or ends with a substring.
  • Use a list comprehension to check for multiple substrings at once.
  • Use regular expressions for complex pattern matching searches.
  • Use Pandas for substring searches in tabular data (like CSV or Excel files).

Let’s explore each of these methods in more detail!

Using the in Operator

The most Pythonic and recommended way to check if a string contains a specific substring is to use Python’s membership operator, in. This operator returns True if the substring exists in the string and False otherwise. This approach is ideal for situations where you simply need to confirm the presence or absence of a substring.

text = "The light from the lighthouse lights the way."

print("light" in text)
# Output: True

To check if a substring is not present, simply use the not in operator:

text = "The light from the lighthouse lights the way."

print("darkness" not in text)
# Output: True

Because the in operator evaluates to a Boolean value, it fits naturally in conditional statements:

text = "The light from the lighthouse lights the way."

if "light" in text:
    print("Found!")

While it might seem counterintuitive, Python always considers an empty string "" to be a substring of any other string. So checking for an empty string within another string will always return True.
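
A quick check makes the empty-string rule concrete. The sketch below also shows one possible way to guard against it when the substring comes from user input. Note that contains_nonempty is a hypothetical helper name used for illustration, not a built-in.

```python
text = "The light from the lighthouse lights the way."

# The empty string is reported as a substring of any string, even an empty one
empty_in_text = "" in text
empty_in_empty = "" in ""
print(empty_in_text, empty_in_empty)  # Output: True True


def contains_nonempty(haystack, needle):
    """Hypothetical helper: treat an empty needle as 'not found'."""
    return bool(needle) and needle in haystack


print(contains_nonempty(text, ""))       # Output: False
print(contains_nonempty(text, "light"))  # Output: True
```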

Performing Case-Insensitive Searches

String comparisons in Python are case-sensitive, so “hello” and “Hello” are treated as different substrings.

If you need to perform a case-insensitive search, you can convert both the main string and the substring to lowercase before your check. Here’s how you would do this using the lower() method:

text = "The Light From The Lighthouse Lights The Way."
substring = "light"

print(substring.lower() in text.lower())
# Output: True
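
For text that may contain non-English characters, lower() is not always enough. The sketch below, using an illustrative German sentence that is not part of the original examples, shows the built-in casefold() method, which is designed for caseless comparison.

```python
text = "Die Straße ist lang."
substring = "STRASSE"

# lower() leaves "ß" unchanged, so the lowercased strings do not line up
with_lower = substring.lower() in text.lower()

# casefold() maps "ß" to "ss", so the comparison succeeds
with_casefold = substring.casefold() in text.casefold()

print(with_lower)     # Output: False
print(with_casefold)  # Output: True
```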

Using the __contains__() Method

Under the hood, the in operator uses the __contains__() method of the string object. This method is called implicitly when you use the in operator to check for a substring.

While you generally won’t call __contains__() directly, as it is less readable than using the in operator, it remains an option.

text = "The light from the lighthouse lights the way."

print(text.__contains__("light"))
# Output: True
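
Where __contains__() becomes genuinely useful is in your own classes: defining it lets instances work with the in operator. The following is a minimal sketch; CaseInsensitiveText is a hypothetical class invented for this illustration.

```python
class CaseInsensitiveText:
    """Hypothetical wrapper whose membership test ignores case."""

    def __init__(self, value):
        self.value = value

    def __contains__(self, substring):
        # Called implicitly by: substring in instance
        return substring.casefold() in self.value.casefold()


wrapped = CaseInsensitiveText("The Light From The Lighthouse")

found_upper = "LIGHT" in wrapped
found_missing = "darkness" in wrapped
print(found_upper)    # Output: True
print(found_missing)  # Output: False
```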

Using the find() Method

The in operator is suitable for determining if a substring exists within a string. However, if you need to know the location of the substring within the string, you can use the find() method.

The find() method scans the string and returns the starting index (position) of the first occurrence of the substring. If it can’t find the substring, it returns -1.

text = "The light from the lighthouse lights the way."

print(text.find("light"))
# Output: 4

print(text.find("darkness"))
# Output: -1
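
One pitfall worth illustrating: the return value of find() should not be used directly as a condition, because -1 is truthy and 0 is falsy. The sketch below shows the problem and the explicit comparison that avoids it.

```python
text = "The light from the lighthouse lights the way."

starts_with_the = text.find("The")  # 0: found at the very start
missing = text.find("darkness")     # -1: not found

# Used as conditions, these results are misleading
print(bool(starts_with_the))  # Output: False (even though "The" was found)
print(bool(missing))          # Output: True (even though nothing was found)

# Compare against -1 explicitly instead
found_the = text.find("The") != -1
found_darkness = text.find("darkness") != -1
print(found_the, found_darkness)  # Output: True False
```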

The find(substring, start, end) method allows you to narrow your search to a specific section of the string using the optional start and end parameters.

  • start: This parameter indicates the index where the search should begin. If you need to find a substring starting from a position other than the beginning of the string, use the start parameter.
  • end: This parameter determines the index where the search should stop (this index is not included in the search itself). Use the end parameter to limit the search to a specific range of characters within the string.

text = "The light from the lighthouse lights the way."

# Find "light" starting at position 20 
print(text.find("light", 20))
# Output: 30

# Find "light" in between indices 10 and 30
print(text.find("light", 10, 30))
# Output: 19
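
The start parameter also makes it possible to collect every position of a substring by searching repeatedly. This is one possible sketch; find_all is a hypothetical helper name, not a string method.

```python
def find_all(text, substring):
    """Hypothetical helper: return the start index of every occurrence."""
    positions = []
    start = 0
    while True:
        position = text.find(substring, start)
        if position == -1:
            break
        positions.append(position)
        # Resume one character later so overlapping matches are also reported
        start = position + 1
    return positions


text = "The light from the lighthouse lights the way."
print(find_all(text, "light"))     # Output: [4, 19, 30]
print(find_all(text, "darkness"))  # Output: []
```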

Using the index() Method

Similar to the find() method, the index() method searches for the first occurrence of a substring within a string and returns its starting index.

However, the key difference is how it handles cases where the substring is not found. Instead of returning -1, the index() method raises a ValueError exception. This behavior can be useful if you explicitly want your code to signal an error when a substring you expect to be present is not found in the string.

text = "The light from the lighthouse lights the way."

print(text.index("light"))
# Output: 4

print(text.index("darkness"))
# ValueError: substring not found
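
If the program should keep running when the substring is missing, the exception can be caught with try and except. A minimal sketch, with None chosen here as an illustrative fallback value:

```python
text = "The light from the lighthouse lights the way."

try:
    position = text.index("darkness")
except ValueError:
    # index() raised because the substring is absent
    position = None

print(position)  # Output: None
```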

Similar to find(), the index(substring, start, end) method lets you specify start and end indices to limit the search to a specific section of your string.

text = "The light from the lighthouse lights the way."

# Find "light" starting at position 20
print(text.index("light", 20))
# Output: 30

# Find "light" between indices 10 and 30
print(text.index("light", 10, 30))
# Output: 19

Using the count() Method

If you are interested in how many times a substring appears, not just its presence or location, you can use the count() method. It returns the number of non-overlapping occurrences of a substring in the given string.

text = "The light from the lighthouse lights the way."

print(text.count("light"))
# Output: 3

print(text.count("darkness"))
# Output: 0

Similar to find() and index(), the count() method also allows you to refine your search using the optional start and end parameters.

text = "The light from the lighthouse lights the way."

# Count "light" starting at position 15
print(text.count("light", 15))
# Output: 2

# Count "light" between indices 10 and 30
print(text.count("light", 10, 30))
# Output: 1
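
Because count() only counts non-overlapping occurrences, it can undercount when a substring overlaps with itself. The sketch below uses an illustrative string to show the difference, with a regular-expression lookahead as one possible way to count overlapping matches.

```python
import re

text = "aaaa"
substring = "aa"

# count() skips past each match, so "aa" is counted at indices 0 and 2 only
non_overlapping = text.count(substring)

# A zero-width lookahead matches at every index where "aa" begins: 0, 1, and 2
overlapping = len(re.findall(f"(?={re.escape(substring)})", text))

print(non_overlapping)  # Output: 2
print(overlapping)      # Output: 3
```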

Using startswith() and endswith() Methods

The startswith() and endswith() methods are used to check if a string starts or ends with a specific substring, respectively.

text = "Hello, world!"

# Check if the text starts with 'Hello'
print(text.startswith("Hello"))
# Output: True

# Check if the text ends with 'world!'
print(text.endswith("world!"))
# Output: True

These methods cannot find a substring in the middle of a string, but they are the clearest choice for checking prefixes and suffixes, such as URL schemes or file extensions.
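
Both methods also accept a tuple of candidates and an optional start position. The sketch below uses made-up file names to illustrate.

```python
filenames = ["report.csv", "notes.txt", "budget.xlsx", "photo.png"]

# A tuple of suffixes matches if the string ends with any one of them
spreadsheets = [name for name in filenames if name.endswith((".csv", ".xlsx"))]
print(spreadsheets)  # Output: ['report.csv', 'budget.xlsx']

# The optional start argument checks for a prefix at a given position
text = "Hello, world!"
world_at_7 = text.startswith("world", 7)
print(world_at_7)  # Output: True
```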

Checking for Multiple Substrings

When you need to find out which of several substrings appear in a larger string, a list comprehension combined with the in operator is a concise solution. Here’s how it works:

text = "The light from the lighthouse lights the way."
substrings = ["light", "way", "darkness"]

# Check which substrings are found in 'text'
found_substrings = [s for s in substrings if s in text]

print("Found substrings:", found_substrings)  
# Output: Found substrings: ['light', 'way']
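
If you only need a yes-or-no answer rather than the list of matches, the built-in any() and all() functions can be combined with the in operator. A short sketch using the same sample data:

```python
text = "The light from the lighthouse lights the way."
substrings = ["light", "way", "darkness"]

# True if at least one substring is present (stops at the first hit)
contains_any = any(s in text for s in substrings)

# True only if every substring is present
contains_all = all(s in text for s in substrings)

print(contains_any)  # Output: True
print(contains_all)  # Output: False
```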

Using Regular Expressions (Advanced)

Regular expressions (or “regex”) offer a powerful way to perform complex pattern matching within text. Python’s re module offers several functions for working with regular expressions.

One of the core functions in the re module is search(). This function scans your text for a match to the regular expression pattern you provide. If a match is found, it returns a “match object” containing details about the match; if no match is found, it returns None. This makes search() convenient for use in conditional statements.

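The example below uses a regular expression to check the string for “light” followed by one or more word characters (letters, digits, or underscores). Because the pattern is not anchored to a word boundary, it would also match inside a longer word such as “flashlights”: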

import re

text = "The light from the lighthouse lights the way."
pattern = r"light\w+"  # "light" followed by one or more word characters

if re.search(pattern, text):
    print("Found!")
else:
    print("Not found.")
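
Regular expressions can also express something the in operator cannot: matching a substring only when it stands alone as a whole word. The sketch below uses the \b word-boundary anchor; the second sample sentence is made up for illustration.

```python
import re

pattern = r"\blight\b"  # "light" as a whole word only

whole_word = re.search(pattern, "The light from the lighthouse lights the way.")
no_whole_word = re.search(pattern, "The lighthouse lights the way.")

print(whole_word is not None)  # Output: True
print(no_whole_word is None)   # Output: True
```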

As mentioned earlier, the re.search() function returns a special “match object”. This object holds valuable information about the match—the matched substring and its starting and ending index positions within the original string.

import re

text = "The light from the lighthouse lights the way."
pattern = r"light\w+"

print(re.search(pattern, text)) 
# Output: <re.Match object; span=(19, 29), match='lighthouse'>

You can access these details through the group() and span() methods on the “match object”:

import re

text = "The light from the lighthouse lights the way."
pattern = r"light\w+"
match = re.search(pattern, text)

# Get the matched substring
print(match.group())         # Output: lighthouse

# Get the starting and ending index positions
print(match.span())          # Output: (19, 29)
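
Since re.search() returns None when nothing matches, it is safer to check the result before calling group() or span(). A minimal sketch with a pattern that does not occur in the text:

```python
import re

text = "The light from the lighthouse lights the way."
match = re.search(r"dark\w+", text)

# match is None here, so calling match.group() would raise AttributeError
if match is not None:
    result = match.group()
else:
    result = "no match"

print(result)  # Output: no match
```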

re.search() locates only the first match for your pattern within the string. If you need to find all occurrences of a pattern, you should use re.findall(). This function scans the string and returns a list of all substrings that match your pattern.

import re

text = "The light from the lighthouse lights the way."
pattern = r"light\w+"  # "light" followed by one or more word characters

print(re.findall(pattern, text)) 
# Output: ['lighthouse', 'lights']

While re.findall() is a convenient way to extract all matching substrings, it only returns a list of the matches themselves. This means you’ve lost the index positions that you had access to when you were using re.search().

If you need index information for all matches, you should use re.finditer(). This function returns an iterator that yields match objects for each occurrence of your pattern, containing the same information as re.search()—the matched substring and its starting and ending index positions.

import re

text = "The light from the lighthouse lights the way."
pattern = r"light\w+"

for match in re.finditer(pattern, text):
    print(match)

# Output:
# <re.Match object; span=(19, 29), match='lighthouse'>
# <re.Match object; span=(30, 36), match='lights'>
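
When the text to search for is a plain substring rather than a pattern, especially one that comes from user input, re.escape() keeps special characters from being interpreted as regex syntax. The sketch below combines it with re.finditer() to list the start positions of a literal substring.

```python
import re

text = "The light from the lighthouse lights the way."
substring = "light"

# re.escape() makes the substring safe to use as a literal pattern
positions = [m.start() for m in re.finditer(re.escape(substring), text)]
print(positions)  # Output: [4, 19, 30]

# Without escaping, "." would match any character, not just a period
period_positions = [m.start() for m in re.finditer(re.escape("."), text)]
print(period_positions)  # Output: [44]
```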

Using Pandas for Substring Searches in Tabular Data

The techniques we discussed earlier are powerful for working with unstructured plain text. However, when your data is organized in a tabular format, such as a CSV file or an Excel sheet, using pandas offers a more streamlined and efficient approach for finding substrings within columns.

Assume you have a CSV file named “employees.csv” with the following structure:

employees.csv
name,age,job,city
Bob,25,Manager,Seattle
Sam,30,Developer,New York
Amy,20,Developer,Houston

When working with tabular data in Python, it’s best to first load it into a pandas DataFrame using functions like read_csv() or read_excel():

import pandas as pd
employees = pd.read_csv("employees.csv")

To verify if the data has been properly loaded into the DataFrame, you can check the dimensions of your DataFrame with the shape attribute or use head() to get a preview of the first few rows.

print(employees.shape)
# Output: (3, 4)

print(employees.head())
# Output:
#   name  age        job      city
# 0  Bob   25    Manager   Seattle
# 1  Sam   30  Developer  New York
# 2  Amy   20  Developer   Houston

Now you can test every entry in a column for a specific substring by calling the .str.contains() method on that column and passing the substring as an argument. The result is a Boolean Series with one value per row.

print(employees.job.str.contains("Developer"))
# Output:
# 0    False
# 1     True
# 2     True
# Name: job, dtype: bool

str.contains() treats its argument as a regular expression by default, so more complex pattern matching needs no extra setup: simply pass a valid regular expression pattern. To match special characters literally instead, pass regex=False.

print(employees.job.str.contains(r"Dev\w+"))
# Output:
# 0    False
# 1     True
# 2     True
# Name: job, dtype: bool
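
The Boolean Series returned by str.contains() is typically used as a mask to select the matching rows. The following is a minimal sketch under two assumptions: pandas is installed, and the DataFrame is built inline with the same rows as employees.csv so the example runs without the file.

```python
import pandas as pd

# Same rows as employees.csv, built inline so the sketch runs without the file
employees = pd.DataFrame(
    {
        "name": ["Bob", "Sam", "Amy"],
        "age": [25, 30, 20],
        "job": ["Manager", "Developer", "Developer"],
        "city": ["Seattle", "New York", "Houston"],
    }
)

# case=False ignores case; na=False treats missing values as non-matches
mask = employees["job"].str.contains("developer", case=False, na=False)

# Indexing with the mask keeps only the rows where it is True
developers = employees[mask]
print(developers["name"].tolist())  # Output: ['Sam', 'Amy']
```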