Python is a great language. It is relatively easy to learn and has an intuitive syntax. The rich selection of libraries also contributes to the popularity and success of Python.
However, it is not just about third-party libraries. Base Python also provides numerous methods and functions that speed up and simplify typical tasks in data science.
In this article, we will go over 15 built-in string methods in Python. You might already be familiar with some of them, but we will also see some of the less common ones.
The methods are quite self-explanatory, so I will focus on examples that demonstrate how to use them rather than on explaining what they do.
1. Capitalize
It makes the first letter uppercase and the remaining letters lowercase.
txt = "python is awesome!"
txt.capitalize()
'Python is awesome!'
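The effect on the rest of the string is easier to see with mixed-case input. Here is a small sketch; the sample string is made up for illustration.

```python
# capitalize() uppercases the first character and lowercases all the others
mixed_case = "pYTHON is AWESOME!"
capitalized = mixed_case.capitalize()
print(capitalized)  # Python is awesome!
```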
2. Upper
It makes all the letters uppercase.
txt = "Python is awesome!"
txt.upper()
'PYTHON IS AWESOME!'
3. Lower
It makes all the letters lowercase.
txt = "PYTHON IS AWESOME!"
txt.lower()
'python is awesome!'
4. Isupper
It checks if all the letters are uppercase.
txt = "PYTHON IS AWESOME!"
txt.isupper()
True
5. Islower
It checks if all the letters are lowercase.
txt = "PYTHON IS AWESOME!"
txt.islower()
False
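Both checks only look at letters, so digits, spaces, and punctuation do not affect the result. The sketch below illustrates this with made-up strings, including the edge case of a string that contains no letters at all.

```python
# Digits, spaces and punctuation are ignored by isupper() and islower()
with_digits = "PYTHON 3 IS AWESOME!"
upper_result = with_digits.isupper()
print(upper_result)  # True

# A string without any letters fails both checks
no_letters = "2021"
no_letters_upper = no_letters.isupper()
no_letters_lower = no_letters.islower()
print(no_letters_upper, no_letters_lower)  # False False
```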
The following three methods are similar, so the examples cover all of them together.
6. Isnumeric
It checks if all the characters are numeric.
7. Isalpha
It checks if all the characters are letters.
8. Isalnum
It checks if all the characters are alphanumeric (i.e. letters or numbers).
# Example 1
txt = "Python"
print(txt.isnumeric())
False
print(txt.isalpha())
True
print(txt.isalnum())
True
# Example 2
txt = "2021"
print(txt.isnumeric())
True
print(txt.isalpha())
False
print(txt.isalnum())
True
# Example 3
txt = "Python2021"
print(txt.isnumeric())
False
print(txt.isalpha())
False
print(txt.isalnum())
True
# Example 4
txt = "Python-2021"
print(txt.isnumeric())
False
print(txt.isalpha())
False
print(txt.isalnum())
False
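A few edge cases are worth keeping in mind when using these checks to validate input. The following sketch uses made-up strings to show that signs, decimal points, spaces, and empty strings all fail the checks.

```python
# A decimal point or a minus sign is not a numeric character
decimal_txt = "3.14"
negative_txt = "-5"
print(decimal_txt.isnumeric())   # False
print(negative_txt.isnumeric())  # False

# A space is neither a letter nor a number
phrase = "Data science"
print(phrase.isalpha())  # False
print(phrase.isalnum())  # False

# An empty string fails all three checks
empty = ""
print(empty.isnumeric(), empty.isalpha(), empty.isalnum())  # False False False
```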
9. Count
It counts the number of occurrences of the given character or sequence of characters in a string.
txt = "Data science"
txt.count("e")
2
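The count method also works with longer substrings. It is case-sensitive and counts non-overlapping matches, as the following sketch with made-up strings shows.

```python
sentence = "Data science and data engineering"

# The match is case-sensitive, so "Data" is not counted
exact_count = sentence.count("data")
print(exact_count)  # 1

# Lowercasing first gives a case-insensitive count
insensitive_count = sentence.lower().count("data")
print(insensitive_count)  # 2

# Matches do not overlap: "aaaa" contains "aa" twice, not three times
overlap_count = "aaaa".count("aa")
print(overlap_count)  # 2
```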
10. Find
It returns the index of the first occurrence of the given character in a string.
txt = "Data science"
txt.find("a")
1
We can also find the second or later occurrences of a character by passing the index at which the search should start.
txt.find("a", 2)
3
If we pass a sequence of characters, the find method returns the index where the sequence starts.
txt.find("sci")
5
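When the given character or sequence does not occur in the string, find returns -1 instead of raising an error. One possible way to use this, sketched below, is to collect the index of every occurrence by restarting the search after each hit.

```python
txt = "Data science"

# find() returns -1 when there is no match
missing = txt.find("z")
print(missing)  # -1

# Collect every index of "e" by searching again from just after each hit
positions = []
start = txt.find("e")
while start != -1:
    positions.append(start)
    start = txt.find("e", start + 1)
print(positions)  # [8, 11]
```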
11. Startswith
It checks if a string starts with the given character or sequence of characters. We can use this method as a filter in a list comprehension.
mylist = ["John", "Jane", "Emily", "Jack", "Ashley"]
j_list = [name for name in mylist if name.startswith("J")]
j_list
['John', 'Jane', 'Jack']
12. Endswith
It checks if a string ends with the given character or sequence of characters.
txt = "Python"
txt.endswith("n")
True
Both the endswith and startswith methods are case-sensitive.
txt = "Python"
txt.startswith("p")
False
txt.startswith("P")
True
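Both methods also accept a tuple of options, and a common workaround for the case sensitivity is to lowercase the string before checking. A minimal sketch of both ideas, reusing the list of names from above:

```python
mylist = ["John", "Jane", "Emily", "Jack", "Ashley"]

# Passing a tuple matches any of the given prefixes
je_list = [name for name in mylist if name.startswith(("J", "E"))]
print(je_list)  # ['John', 'Jane', 'Emily', 'Jack']

# Lowercasing first makes the check case-insensitive
txt = "Python"
starts_with_p = txt.lower().startswith("p")
print(starts_with_p)  # True
```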
13. Replace
It replaces a string or a part of it with the given set of characters.
txt = "Python is awesome!"
txt = txt.replace("Python", "Data science")
txt
'Data science is awesome!'
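Strings are immutable, so replace returns a new string rather than changing the original, which is why the result is assigned back to txt above. By default every occurrence is replaced, and an optional third argument limits the number of replacements. A short sketch with a made-up sentence:

```python
sentence = "Python and data science and machine learning"

# By default, all occurrences are replaced
all_replaced = sentence.replace("and", "&")
print(all_replaced)  # Python & data science & machine learning

# The third argument limits how many occurrences are replaced
first_replaced = sentence.replace("and", "&", 1)
print(first_replaced)  # Python & data science and machine learning

# The original string is left untouched
print(sentence)  # Python and data science and machine learning
```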
14. Split
It splits a string at the occurrences of the specified character and returns a list that contains each part after splitting.
txt = 'Data science is awesome!'
txt.split()
['Data', 'science', 'is', 'awesome!']
By default, it splits at whitespace, but we can split on any character or sequence of characters by passing it as an argument.
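As a quick illustration of splitting on something other than whitespace, the sketch below uses a comma and a multi-character separator. The sample strings are made up for this example.

```python
# Split on a single character
csv_line = "Jane,John,Matt,James"
names = csv_line.split(",")
print(names)  # ['Jane', 'John', 'Matt', 'James']

# Split on a sequence of characters
sentence = "Python and data science and machine learning"
topics = sentence.split(" and ")
print(topics)  # ['Python', 'data science', 'machine learning']
```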
15. Partition
It partitions a string into 3 parts and returns a tuple that contains these parts.
txt = "Python is awesome!"
txt.partition("is")
('Python ', 'is', ' awesome!')
txt = "Python is awesome and it is easy to learn."
txt.partition("and")
('Python is awesome ', 'and', ' it is easy to learn.')
The partition method always returns exactly 3 parts. If the separator occurs more than once, the string is partitioned at the first occurrence.
txt = "Python and data science and machine learning"
txt.partition("and")
('Python ', 'and', ' data science and machine learning')
We can also do a similar operation with the split method by limiting the number of splits. However, there are some differences.
- The split method returns a list
- The returned list does not include the characters used for splitting
txt = "Python and data science and machine learning"
txt.split("and", 1)
['Python ', ' data science and machine learning']
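Another difference shows up when the separator does not occur in the string at all. The sketch below compares the two methods in that case.

```python
txt = "Python is awesome!"

# partition() still returns 3 parts; the last two are empty strings
parts = txt.partition("and")
print(parts)  # ('Python is awesome!', '', '')

# split() returns a list with the whole string as its only item
pieces = txt.split("and", 1)
print(pieces)  # ['Python is awesome!']
```

Because partition always returns three items, its result can be unpacked into three variables without checking the length first.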
Bonus
Thanks to Matheus Ferreira for reminding me of one of the greatest string methods: join. I use the join method often but forgot to add it here. It deserves a place in the list as a bonus.
The join method combines the strings in a collection into a single string.
mylist = ["Jane", "John", "Matt", "James"]
"-".join(mylist)
'Jane-John-Matt-James'
Let’s do an example with a tuple as well.
mytuple = ("Data science", "Machine learning")
" and ".join(mytuple)
'Data science and Machine learning'
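The join method expects every item in the collection to be a string and raises a TypeError otherwise. One way to handle other types, sketched below with a made-up list of years, is to convert each item with str first.

```python
years = [2019, 2020, 2021]

# Joining integers directly raises a TypeError
try:
    ", ".join(years)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)  # True

# Converting each item to a string first makes the join work
joined = ", ".join(str(year) for year in years)
print(joined)  # 2019, 2020, 2021
```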
Conclusion
When doing data science, we deal with textual data a lot. Moreover, textual data requires much more preprocessing than plain numbers. Thankfully, Python’s built-in string methods are capable of performing such tasks efficiently and smoothly.
Thank you for reading. Please let me know if you have any feedback.
Bio: Soner Yıldırım is a Junior Data Scientist at Invent Analytics and blogger.
Original. Reposted with permission.