Overcoming the Simplicity Illusion with Data Migration
What’s the key to a smooth data migration experience? It comes down to this primary issue: whether or not you can rapidly determine your dataset composition.
By Yancy Blum, Senior Systems Engineer, Datadobi
Here’s a common conundrum for IT staff: being tasked with a data migration challenge and having no idea where to begin. It often happens when IT administrators responsible for file storage take on migrations that look easy on the surface, only to discover that the job is far harder than it first appeared.
There are countless potential migration pitfalls, ranging from moving complicated metadata to suddenly realizing that multiple protocols are required. What’s the key to avoiding such trouble and ensuring a smooth, rather than disastrous, migration experience? It comes down to one primary issue: whether or not you can rapidly determine your dataset composition.
Toolset Upgrades
The need for file migration obviously isn’t new; companies have needed the ability to copy over their data for decades. What has changed is the reliability and capability of data migration toolsets. Legacy tools have historically demanded a large outlay of time from IT staff, who needed to monitor them constantly. Even then, there was no guarantee of a successful migration, one in which the intended data is accurately relocated from one system to another. Outdated tools offer no insight into, or analysis of, the data landscape of the source system.
Both data and metadata are expanding exponentially, which means that IT needs a solution to quickly illuminate:
- What’s being moved
- How long it will take
- If/how file integrity will be affected while migrating (see the verification sketch after this list)
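On the integrity point, one common approach is to compare cryptographic checksums of each file on the source and target after the copy completes. The following is a minimal Python sketch of that idea; it is not any particular vendor’s implementation, and the example paths are hypothetical:

```python
# A minimal sketch (not any vendor's implementation) of post-copy integrity
# verification: stream each file, hash it, and compare source vs. target.
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, target: Path) -> bool:
    """Return True only if the migrated file matches the original bit for bit."""
    return sha256_of(source) == sha256_of(target)


# Hypothetical example paths:
# verify_copy(Path("/mnt/old_nas/projects/report.docx"),
#             Path("/mnt/new_nas/projects/report.docx"))
```

Hashing every byte twice is expensive at scale, which is one reason this kind of verification needs to be built into the migration tool itself rather than bolted on afterward.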
Yet migration tools have not traditionally been purpose-built for file data specifically, which has left organizations struggling without a true enterprise migration solution. To be classified as such, a tool must address both spatial acuity and archiving. It isn’t easy for even a veteran IT administrator to get an accurate read on file relevancy or untangle which storage system houses a particular dataset, much less get to the bottom of whether those datasets are still in use.
Spatial Acuity and Spring Cleaning
In terms of spatial acuity, to plan a seamless migration, IT needs the ability to visualize not only layout complexity, but also file and filesystem characteristics. And when it comes to data “spring cleaning,” it’s hard for even seasoned experts to sort out what’s important from what’s become irrelevant in the morass of files, some of which are critical and others of which have faded into obscurity. Particularly when you’re managing massive datasets, it’s rarely a no-brainer to know, without the proper tool to help you see it, when it’s time to retire certain files to archive, tape, or Amazon Glacier. If a file hasn’t been modified in years, then it’s probably time for a file catharsis.
A reporting module can help with both spatial acuity and spring-cleaning issues, since graphs and charts can quickly and effectively convey complex data. Such visuals also allow for a fast peek at data based on the last time a file was modified and other attributes. For example, if you’re able to see such information during a data migration, then you can relocate older untouched files to an appropriate storage tier.
Using a Reporting Module to Determine File Age
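As a concrete illustration of what such a report might contain, here is a minimal Python sketch that walks a directory tree, buckets files by last-modified age, and tallies the count and total size in each bucket. The bucket boundaries and the mount point are illustrative assumptions, not a recommendation from any particular product:

```python
# A minimal sketch of an age report: walk a tree, bucket files by
# last-modified age, and tally file count and total bytes per bucket.
# The bucket boundaries and the mount point below are illustrative assumptions.
import os
import time
from collections import defaultdict

AGE_BUCKET_DAYS = (30, 180, 365, 3 * 365)  # < 1 month, < 6 months, < 1 year, < 3 years


def bucket_for(age_days: float) -> str:
    for limit in AGE_BUCKET_DAYS:
        if age_days < limit:
            return f"modified < {limit} days ago"
    return f"modified >= {AGE_BUCKET_DAYS[-1]} days ago"


def age_report(root: str) -> dict:
    """Return {bucket: {'files': n, 'bytes': total_size}} for every file under root."""
    now = time.time()
    report = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or are unreadable mid-scan
            stats = report[bucket_for((now - st.st_mtime) / 86400)]
            stats["files"] += 1
            stats["bytes"] += st.st_size
    return dict(report)


# Hypothetical usage:
# for bucket, stats in age_report("/mnt/old_nas").items():
#     print(f"{bucket}: {stats['files']} files, {stats['bytes'] / 1e9:.1f} GB")
```

A real reporting module renders this kind of data as charts and drill-downs, but even a simple tally like this makes it obvious which slices of the dataset are candidates for an archive tier.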
Any large-scale migration can be brought to its knees by small files (generally classified as less than 128 KB) in large volumes. I like to think of this situation as “death by a thousand paper cuts,” since small files consume considerably more resources per byte moved than larger files of 256 KB or more. The result is that you never hit the optimal throughput of your infrastructure.
Here’s why: regardless of file size, every network file transfer carries a fixed amount of per-file protocol overhead, such as opening the file, transferring its attributes, and closing it. A small file certainly takes less time to migrate than a larger one, but that fixed overhead dominates the transfer, so small files never achieve optimal throughput or speed.
When you consider the millions or even billions of times this process repeats, it becomes clear how much less efficient it is than moving larger files. As a simple example, compare moving 1 KB files with moving a single 1 GB file. To transfer the same 1 GB of data, the 1 KB files must make roughly a million trips, and each trip pays the per-file overhead again. Since the 1 GB file achieves optimal throughput and the 1 KB files never do, the cost-benefit of moving larger files rather than many smaller ones becomes clear. It also illustrates how smart, purpose-built data migration tools can help administrators tackle multiple challenges and greatly simplify the planning and execution of a migration.
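To make the arithmetic concrete, here is a back-of-the-envelope Python sketch. The link speed and per-file overhead figures are assumptions chosen for illustration, not measurements of any specific protocol or product:

```python
# Back-of-the-envelope arithmetic for the small-file penalty. Both numbers
# below are illustrative assumptions: a ~10 Gbit/s link (about 1.25 GB/s)
# and ~1 ms of fixed per-file protocol overhead (open, attributes, close).
LINK_BYTES_PER_SEC = 1.25e9    # assumed usable line rate
PER_FILE_OVERHEAD_SEC = 1e-3   # assumed fixed cost paid once per file


def effective_throughput(file_size_bytes: float) -> float:
    """Effective bytes/sec once the per-file overhead is amortized over the file."""
    transfer_time = file_size_bytes / LINK_BYTES_PER_SEC
    return file_size_bytes / (transfer_time + PER_FILE_OVERHEAD_SEC)


for label, size in (("1 KB file", 1e3), ("1 GB file", 1e9)):
    print(f"{label}: {effective_throughput(size) / 1e6:,.1f} MB/s effective")

# Moving 1 GB as a million 1 KB files spends ~1,000 seconds on overhead alone,
# while a single 1 GB file needs well under a second of transfer time.
```

With these assumed numbers, a stream of 1 KB files crawls along at roughly 1 MB/s of effective throughput, while a single 1 GB file runs at nearly line rate, which is exactly the “death by a thousand paper cuts” effect described above.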
Bio: Yancy Blum is a Senior Pre-Sales and Systems Engineer at Datadobi, where he collaborates with the Datadobi sales support and account teams, serving as a technical expert for customer presentations and identifying opportunities within strategic partner and customer accounts. In addition, Blum helps respond to requests for information and requests for proposals (RFPs) from customers, supplying the technical details of proposed solutions. Prior to Datadobi, Blum served as a Senior Systems Engineer with Tintri, Pre-Sales/SE with Eastern Computer Exchange, Technical Consultant with CDI LLC, and Customer Engineer with EMC. He holds a Bachelor of Science (B.S.) degree in Electrical and Electronics Engineering from DeVry University.