5 Useful Python Scripts to Automate Boring Excel Tasks

Merging spreadsheets, cleaning exports, and splitting reports are necessary-but-boring tasks. These Python scripts handle the repetitive parts so you can focus on the actual work.



5 Useful Python Scripts to Automate Boring Excel Tasks
Image by Author

 

Introduction

 
Excel remains relevant for data work, but a significant portion of the time spent using it is purely mechanical. Tasks like combining files from multiple sources, tracking down duplicate records, reformatting inconsistent exports, and splitting a master sheet into separate files are not complex, but they are time-consuming and prone to human error.

These five Python scripts help automate those tasks. Each one is self-contained, configurable, and designed to work with messy real-world data.

You can find all the scripts on GitHub.

 

Merging Multiple Excel Files

 

// The Pain Point

When consolidating data from multiple Excel or comma-separated values (CSV) files, the manual process — opening each file, copying the data, and pasting into a master sheet — is slow and prone to misalignment errors, especially when column orders differ between files.

 

// What the Script Does

This script scans a folder for .xlsx and .csv files, stacks all their data into a single unified sheet, and writes a clean merged output file. It can optionally add a source column so you always know which row originated from which file, and it handles mismatched column orders automatically.

 

// How It Works

The script uses pandas to read every file in a target directory, aligns columns by name rather than position, and concatenates everything into one DataFrame. A configurable add_source_column flag appends the original filename to each row. Column mismatches are logged so you know if some files had extra or missing fields. The output is written with openpyxl and includes a summary tab showing file-by-file row counts.

 
Get the Excel files merger script

 

Finding and Flagging Duplicate Rows

 

// The Pain Point

Duplicate records are common in datasets that have been exported and re-imported across systems. Exact matches are easy to find, but near-duplicates — same record, slightly different formatting or spacing — are harder to catch manually at scale.

 

// What the Script Does

This script scans an Excel file for duplicate rows based on columns you define, flags exact duplicates and near-duplicates through fuzzy matches on string fields, and writes an annotated output file highlighting every suspected duplicate group with color coding and a confidence score.

 

// How It Works

The script uses pandas for exact duplicate detection and RapidFuzz for fuzzy string matching on configurable key columns. Each row is assigned a duplicate group ID and a match confidence percentage. The output Excel file uses openpyxl formatting to highlight duplicate clusters. A separate summary sheet shows total duplicates found, broken down by match type.

 
Get the duplicate finder script

 

Cleaning and Standardizing Messy Exported Data

 

// The Pain Point

Data exported from external systems often arrives inconsistently formatted with mixed date formats, inconsistent capitalization, phone numbers with varying separators, and trailing whitespaces. Cleaning this manually before any analysis adds up quickly.

 

// What the Script Does

This script applies a configurable set of cleaning rules to an Excel or CSV file. These include standardizing dates, trimming whitespace, fixing capitalization, normalizing phone numbers and postcodes, removing blank rows, and flagging cells that appear incorrect. It outputs a cleaned file and a change log showing exactly what was modified.
 

// How It Works

The script reads a configuration file that maps column names to cleaning operations: date_format, title_case, strip_whitespace, phone_normalize, remove_blank_rows, and others. Each operation is applied in sequence. A side-by-side change log is written to a second sheet in the output, showing original versus cleaned values for every modified cell. Nothing is silently discarded. If a value cannot be parsed, it is flagged in a _clean_errors column.

 
Get the data cleaner script

 

Splitting One Sheet into Separate Files by Column Value

 

// The Pain Point

A master dataset often needs to be distributed as separate files — such as one per region, department, or category. Doing this manually involves filtering, copying, and saving repeatedly, with a high risk of mixing up data between files.

 

// What the Script Does

This script reads a single Excel sheet and splits it into separate output files — one per unique value in a specified column. Each output file contains only the rows for that value, with the original formatting preserved. Filenames are generated automatically from the column values. Optionally, it can send each file as an email attachment using a name-to-email mapping you provide.

 

// How It Works

The script groups the DataFrame by the target column using pandas, then writes each group to its own .xlsx file using openpyxl. A naming template, like Sales_Report_{value}_{date}.xlsx, allows you to control the output filename format. Column headers, data types, and basic formatting are preserved in each output file. An optional email mode reads a CSV mapping of {value} → {email address} and sends each file via the Simple Mail Transfer Protocol (SMTP).

 
Get the sheet splitter script

 

Generating a Summary Pivot Report from Raw Data

 

// The Pain Point

Producing a summary report from raw data — totals by category, monthly trends, or top performers — involves building pivot tables, formatting them, and copying results to a presentable layout. When the source data updates regularly, this process is repeated from scratch each time.

 

// What the Script Does

This script reads a raw data Excel file, builds configurable pivot summaries, and writes a formatted multi-tab summary report. Charts are generated and embedded in the output file. You can re-run it any time the source data changes.

 

// How It Works

A configuration file defines the date field, the value field, grouping columns, and specific aggregations to run. The script uses pandas for all aggregation logic and openpyxl with Matplotlib for chart generation. Each summary type is given its own tab. Conditional formatting highlights the highest and lowest values. The report is designed for on-demand regeneration, and running the script again overwrites the previous output cleanly.

 
Get the pivot report generator script

 

Wrapping Up

 
These five scripts cover common Excel tasks that are straightforward to automate but tedious to perform manually. Choose whichever one addresses the most frequent task in your workflow and start there. Here is a quick overview:

 

Script Name Purpose Key Features Best Use Case
Excel Files Merger Combine multiple Excel/CSV files Column alignment, source tracking, summary sheet Consolidating data from multiple sources
Duplicate Finder Identify exact and fuzzy duplicates Fuzzy matching, confidence scores, color highlighting Cleaning datasets with repeated records
Data Cleaner Standardize messy exported data Formatting rules, normalization, change log Preprocessing raw external data
Sheet Splitter Split one sheet into multiple files Auto file naming, grouping, optional email sending Distributing reports by category/region
Pivot Report Generator Create summary reports from raw data Automated pivots, charts, multi-tab output Recurring reporting and dashboards

 

Happy automating!
 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


Get the FREE ebook 'KDnuggets Artificial Intelligence Pocket Dictionary' along with the leading newsletter on Data Science, Machine Learning, AI & Analytics straight to your inbox.

By subscribing you accept KDnuggets Privacy Policy


Get the FREE ebook 'KDnuggets Artificial Intelligence Pocket Dictionary' along with the leading newsletter on Data Science, Machine Learning, AI & Analytics straight to your inbox.

By subscribing you accept KDnuggets Privacy Policy

Get the FREE ebook 'KDnuggets Artificial Intelligence Pocket Dictionary' along with the leading newsletter on Data Science, Machine Learning, AI & Analytics straight to your inbox.

By subscribing you accept KDnuggets Privacy Policy

No, thanks!