10 Advanced Python Tricks for Data Scientists

Master cleaner, faster code with these essential techniques to supercharge your data workflows.




 

As a data scientist, you probably spend a lot of time writing Python code. Python is easy to learn and versatile enough for almost any task, but knowing a few advanced tricks can boost your productivity and make your code cleaner and faster.

This article reviews 10 Python tricks every data professional should have in their toolbox. From simplifying iterations to automating workflows, they will help you write better Python code and improve your data science projects.

 

1. pandas_profiling: Generate a Detailed Report of Your Dataset

 
Getting a first understanding of the data we are dealing with is vital, which is why we tend to repeat the very same exploratory steps with every new dataset. This is where pandas_profiling shines: it generates an extensive data summary with a single line of code. (Recent releases of the library are distributed under the name ydata-profiling; the code below uses the original package name.)

Why It Matters: It saves time and provides insights into the data’s structure, missing values, and correlations.

import pandas as pd
import pandas_profiling

# Load dataset
data = pd.read_csv('your-dataset.csv')

# Generate report
profile = pandas_profiling.ProfileReport(data)
profile.to_notebook_iframe()

 

This produces a comprehensive, interactive report covering statistics, distributions, and visualizations.
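For context, here is a minimal sketch of the kind of manual checks such a report bundles together, written with plain pandas. The toy DataFrame and its column names are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
data = pd.DataFrame({
    "age": [25, 32, None, 41],
    "income": [40000, 52000, 61000, None],
    "city": ["Paris", "Rome", "Rome", None],
})

summary = data.describe()                     # summary statistics for numeric columns
missing = data.isna().sum()                   # number of missing values per column
correlations = data[["age", "income"]].corr() # pairwise correlation of numeric columns

print(summary)
print(missing)
print(correlations)
```

A profiling report gathers these pieces, plus distributions and plots, into one document, which is where the time saving comes from.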

 

2. F-Strings: Fast and Clean String Formatting

 
Introduced in Python 3.6, f-strings provide a simple way to format strings by embedding variables directly.

Why It Matters: Cleaner, faster, and more readable than older methods like .format() or concatenation.

name = 'Alice'
age = 25
print(f'Hello, my name is {name} and I am {age} years old.')
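F-strings also accept format specifiers and arbitrary expressions inside the braces, which is handy when reporting metrics. The following is a small sketch; the variable names and values are illustrative only.

```python
accuracy = 0.91234
n_rows = 1234567
name = "Alice"

accuracy_line = f"Accuracy: {accuracy:.2%}"           # percentage with two decimals
rows_line = f"Rows: {n_rows:,}"                       # thousands separator
repr_line = f"{name!r} has {len(name)} characters"    # repr conversion and an expression

print(accuracy_line)  # Accuracy: 91.23%
print(rows_line)      # Rows: 1,234,567
print(repr_line)      # 'Alice' has 5 characters
```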

 

3. Lambda Functions: Inline and Anonymous Functions

 
Lambda functions allow you to create small, single-use functions on the fly.

Why It Matters: Perfect for quick calculations and when combined with map(), filter(), or sorted().

points = [(1, 2), (3, 1), (5, -1)]
points.sort(key=lambda x: x[1])
print(points)
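The same idea carries over to sorted(), filter(), and max(), which all accept a function argument. A short sketch reusing the points list from above:

```python
points = [(1, 2), (3, 1), (5, -1)]

# Sort by the second coordinate, largest first, without modifying the list
by_y_desc = sorted(points, key=lambda p: p[1], reverse=True)

# Keep only the points whose second coordinate is positive
positive_y = list(filter(lambda p: p[1] > 0, points))

# Pick the point farthest from the origin (squared distance as the key)
farthest = max(points, key=lambda p: p[0] ** 2 + p[1] ** 2)

print(by_y_desc)   # [(1, 2), (3, 1), (5, -1)]
print(positive_y)  # [(1, 2), (3, 1)]
print(farthest)    # (5, -1)
```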

 

4. zip: Combine Multiple Lists Simultaneously

 
zip() aggregates elements from multiple lists, making it easy to iterate through them in parallel.

Why It Matters: Simplifies tasks where multiple lists need to be processed together.

list1 = ['Data', 'Machine', 'Deep']
list2 = ['Science', 'Learning', 'Learning']
list3 = ['Fundamentals', 'Models', 'Neural Networks']

for word1, word2, word3 in zip(list1, list2, list3):
    print(word1, word2, word3)
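One detail worth knowing is that zip() stops at the shortest input. The sketch below, with made-up lists, shows that behavior alongside itertools.zip_longest, building a dictionary from two lists, and "unzipping" pairs again.

```python
from itertools import zip_longest

names = ["alpha", "beta", "gamma"]
scores = [0.9, 0.8]  # deliberately one element shorter

pairs = list(zip(names, scores))             # stops at the shortest list
padded = list(zip_longest(names, scores))    # pads missing values with None
lookup = dict(zip(names, scores))            # build a mapping from two lists
unzipped_names, unzipped_scores = zip(*pairs)  # reverse the pairing

print(pairs)   # [('alpha', 0.9), ('beta', 0.8)]
print(padded)  # [('alpha', 0.9), ('beta', 0.8), ('gamma', None)]
print(lookup)  # {'alpha': 0.9, 'beta': 0.8}
```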

 

5. itertools: Advanced Iterations Made Easy

 
The itertools module provides powerful tools for complex iteration tasks such as permutations, combinations, and infinite iterators.

Why It Matters: Essential for tasks requiring unique patterns or combinations of data.

from itertools import combinations
items = ['A', 'B', 'C']
print(list(combinations(items, 2)))
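A few other itertools functions follow the same pattern. The sketch below shows permutations, a Cartesian product (as one might use for a small parameter grid), and a slice of an infinite counter; the grid values are hypothetical.

```python
from itertools import count, islice, permutations, product

items = ["A", "B", "C"]

# Ordered pairs: unlike combinations, ('A', 'B') and ('B', 'A') both appear
ordered_pairs = list(permutations(items, 2))

# Cartesian product of two hypothetical parameter lists
grid = list(product([0.01, 0.1], [3, 5]))

# count() never ends on its own, so islice() takes only the first five values
first_evens = list(islice(count(start=0, step=2), 5))

print(len(ordered_pairs))  # 6
print(grid)                # [(0.01, 3), (0.01, 5), (0.1, 3), (0.1, 5)]
print(first_evens)         # [0, 2, 4, 6, 8]
```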

 

6. NumPy Broadcasting: Eliminate Explicit Loops

 
Broadcasting in NumPy allows you to perform operations between arrays of different shapes without writing loops.

Why It Matters: Improves efficiency for numerical operations on large arrays.

import numpy as np

arr = np.array([1, 2, 3])
print(arr + 10)
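Broadcasting becomes more useful with two-dimensional data. As a minimal sketch with made-up values, the code below centers each column of a matrix and combines a column vector with a row vector, both without loops.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])        # shape (3, 2)

col_means = X.mean(axis=0)         # shape (2,)
centered = X - col_means           # the (2,) vector is broadcast across all 3 rows

col = np.array([[1], [2], [3]])    # shape (3, 1)
row = np.array([10, 20])           # shape (2,)
table = col * row                  # shapes (3, 1) and (2,) broadcast to (3, 2)

print(centered)
print(table)
```

NumPy compares shapes from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.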

 

7. Matplotlib subplots: Organize Multiple Visualizations

 
subplots() allows you to display multiple plots in a structured grid within a single figure.

Why It Matters: Ideal for comparing visualizations side-by-side.

import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2)

axes[0, 0].plot([1, 2, 3], [1, 4, 9])  # Top-left
axes[0, 1].plot([1, 2, 3], [1, 3, 6])  # Top-right
axes[1, 0].plot([1, 2, 3], [3, 2, 1])  # Bottom-left
axes[1, 1].plot([1, 2, 3], [2, 2, 2])  # Bottom-right

plt.show()
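When the grid grows, indexing every cell by hand gets repetitive. One possible variation is to loop over axes.flat, as sketched below. The series names are invented for illustration, and the Agg backend is an assumption made so the sketch runs in a script without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3]
# Hypothetical series, keyed by the title each panel should get
series = {
    "Squares": [1, 4, 9],
    "Triangular": [1, 3, 6],
    "Decreasing": [3, 2, 1],
    "Constant": [2, 2, 2],
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True)

# axes.flat walks the 2x2 grid row by row, so it pairs up with the dict items
for ax, (title, y) in zip(axes.flat, series.items()):
    ax.plot(x, y)
    ax.set_title(title)

fig.tight_layout()  # reduce overlap between titles and tick labels
```

In an interactive session, finish with plt.show() as in the example above.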

 

8. apply(): Optimize DataFrame Transformations

 
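The apply() method in pandas lets you run a function over every row or column of a DataFrame without writing an explicit loop. It is concise rather than fast: pandas still calls the function once per row or column, so built-in vectorized operations are usually quicker for simple arithmetic.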

Why It Matters: Avoids manual loops when transforming large DataFrames.

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df['sum'] = df.apply(lambda row: row['a'] + row['b'], axis=1)
print(df)
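For a sum like this one, plain column arithmetic gives the same result and is usually faster on large DataFrames, so apply() is best kept for logic that has no vectorized equivalent. A small sketch comparing the two, plus a column-wise apply(); the new column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Row-wise apply: the lambda is called once per row
df["sum_apply"] = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Vectorized alternative: one operation on whole columns
df["sum_vectorized"] = df["a"] + df["b"]

# Column-wise apply (the default axis=0): range of each column
col_ranges = df[["a", "b"]].apply(lambda col: col.max() - col.min())

print(df)
print(col_ranges)
```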

 

9. map(): Fast and Clean Data Transformations

 
The map() function applies a specified function to each element of an iterable, like a list or tuple.

Why It Matters: It simplifies element-wise transformations and keeps your code cleaner than a manual loop. In Python 3, map() returns a lazy iterator, which is why the example wraps it in list().

numbers = [1, 2, 3, 4]
squares = list(map(lambda x: x**2, numbers))
print(squares)  # Output: [1, 4, 9, 16]
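map() also works with existing functions and with several iterables at once, and a list comprehension is an equivalent spelling for the simple case. A brief sketch with made-up inputs:

```python
numbers = [1, 2, 3, 4]

squares_map = list(map(lambda x: x**2, numbers))
squares_comp = [x**2 for x in numbers]            # equivalent list comprehension

# Passing an existing function avoids the lambda entirely
as_floats = list(map(float, ["1.5", "2", "3.25"]))

# With two iterables, the function receives one element from each
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))

print(squares_map == squares_comp)  # True
print(as_floats)                    # [1.5, 2.0, 3.25]
print(sums)                         # [11, 22, 33]
```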

 

10. Scikit-learn Pipelines: Automate Your Workflow

 
Pipelines in scikit-learn chain preprocessing steps and a model into a single workflow, ensuring that the same transformations are applied during training and testing.

Why It Matters: Simplifies machine learning workflows and reduces code redundancy.

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assuming `data` is a DataFrame that contains the numeric columns listed below
# Keep only the numeric feature columns for the pipeline
X = data[['Temperature', 'CO2 Emissions', 'Sea Level Rise', 'Precipitation', 'Humidity', 'Wind Speed']]

# Create a target column for demonstration purposes
# Example: Classify if Temperature > 15 as 1, otherwise 0
data['High_Temperature'] = (data['Temperature'] > 15).astype(int)
y = data['High_Temperature']

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),         # Scale the features
    ('classifier', LogisticRegression())  # Apply logistic regression
])

# Fit the pipeline on the training data
pipeline.fit(X_train, y_train)

# Make predictions
predictions = pipeline.predict(X_test)

print("Predictions:", predictions)
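Because the example above depends on a specific dataset, here is a self-contained sketch of the same pipeline on synthetic data. The data comes from make_classification and is an assumption made purely so the code runs on its own; it is not the dataset used in the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 samples, 6 numeric features, binary target
X, y = make_classification(n_samples=200, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])

# fit() learns the scaling parameters from the training data only
pipeline.fit(X_train, y_train)

# predict() and score() reuse that fitted scaler before calling the classifier
predictions = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)

print(f"Test accuracy: {accuracy:.2f}")
```

Keeping the scaler inside the pipeline means the test set never influences the scaling statistics, which is the consistency benefit described above.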

 

Wrapping Up

 
These 10 advanced Python tricks can boost your efficiency and performance as a data scientist, so keep them in mind the next time you work with a new dataset.

You can find all of the code in this accompanying notebook.
 
 

Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and is currently working in the data science field applied to human mobility. He is a part-time content creator focused on data science and technology. Josep writes on all things AI, covering the application of the ongoing explosion in the field.

