Maximize Your Productivity as a Data Scientist by Organizing
Read some tips on getting organized when it comes to working with data.
Sometimes it feels like there are not enough hours in the day. Whilst you're in the process of completing one task, another request comes through the door. If you work with data, something that you hope would take you a couple of hours may turn into a couple of days.
Your 8-hour workday may turn into 9 hours, 10 hours, sometimes even more. You spend so much time at work that you start to neglect your personal life.
There are many articles out there explaining how to increase your productivity at work; however, the life of a Data Scientist is quite different. Getting organised will save you a lot of time and energy, making your workday smoother and allowing you to finish on time.
We all fall victim to this: most days we're doing well, and then some days we get so lost in work that we fall out of our routine and organisation goes out the window. When it comes to organising yourself, I always tell people: if it takes less than 10 minutes, do it now.
When it comes to organising yourself in a data-orientated environment, it can get overwhelming. There are so many files, notebooks, and documents that things can get messy very quickly. Below are some tips to help you organise yourself when working with data.
Give Your Files Descriptive Names
This will save you a lot of time when you're sifting through thousands of datasets and other files. Files often get lost because team members don't remember their names; if you give your files descriptive names, a quick search on your Mac or Windows machine can eliminate that problem and save you time.
Below are some points to consider when naming your files, whether before a project starts or when designing a new system.
File names need to be:
- Meaningful to you and your colleagues
- Easily accessible
Before starting a new project or rethinking your current system, it would be beneficial if your team agreed upon the following aspects:
- Vocabulary – ensuring everyone uses a common language
- Punctuation – decide how to use capital letters and symbols such as hyphens and spaces, and think carefully about whether each one is actually useful.
- Dates – agree on a single date format, e.g. YYYY-MM-DD. This will help you differentiate projects and refer back to old ones, and dates in this format sort chronologically.
- Order – agree on which element should go first so that files can be found easily. For example, you might order file names by Date, Project Code, and Client.
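Conventions like these are easier to follow when nobody has to apply them by hand. The sketch below is a minimal example, assuming a team that has agreed on the order Date, Project Code, Client, then a short description, with underscores between elements and lowercase hyphenated words within them. The function name `build_file_name` and these separators are hypothetical choices, not a standard.

```python
import re
from datetime import date


def build_file_name(
    created: date,
    project_code: str,
    client: str,
    description: str,
    extension: str,
) -> str:
    """Return a name ordered as Date_ProjectCode_Client_description.ext.

    The element order and separators are example team conventions.
    """

    def slug(text: str) -> str:
        # Lowercase, then collapse each run of spaces/punctuation into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [
        created.isoformat(),    # ISO 8601 date, YYYY-MM-DD
        project_code.upper(),   # project codes kept in capitals
        slug(client),
        slug(description),
    ]
    return "_".join(parts) + "." + extension.lstrip(".").lower()


if __name__ == "__main__":
    print(build_file_name(date(2021, 3, 9), "prj042", "Acme Ltd",
                          "Customer churn, raw export", "CSV"))
    # 2021-03-09_PRJ042_acme-ltd_customer-churn-raw-export.csv
```

Because the ISO date comes first, sorting these names alphabetically also sorts them chronologically, and a search for the client or project code finds every related file.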
Pouring all your datasets, notebooks, and outputs into one place can become messy very quickly. Creating a folder for each project, with sub-folders inside it, will help you tell things apart and find files easily. Explaining this structure to the rest of the team and making sure everybody uses the same system will resolve issues such as missing datasets and notebooks.
If team members want to revisit an old project, they will be able to do so easily because each project has its own folder and descriptive file names. As long as everybody adheres to the procedure, they will not need to ask other team members where files are.
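One way to make sure everybody uses the same folder structure is to create it with a script rather than by hand. This is a minimal sketch using Python's standard `pathlib`; the function `create_project` and the sub-folder names in `SUBFOLDERS` are hypothetical examples, so substitute whatever layout your team agrees on.

```python
import tempfile
from pathlib import Path

# Hypothetical sub-folder layout; agree on your own with the team.
SUBFOLDERS = (
    "data/raw",
    "data/processed",
    "notebooks",
    "outputs",
    "docs",
)


def create_project(root: Path, project_name: str) -> Path:
    """Create <root>/<project_name> with the agreed sub-folders.

    exist_ok=True makes this safe to re-run on an existing project.
    """
    project_dir = root / project_name
    for sub in SUBFOLDERS:
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    return project_dir


if __name__ == "__main__":
    # Demonstrate in a throwaway directory that is deleted afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project(Path(tmp), "2021-03-09_PRJ042_acme-ltd")
        for path in sorted(p.relative_to(project) for p in project.rglob("*")):
            print(path.as_posix())
```

Keeping the layout in one shared script means a change to the structure only has to be agreed and made once.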
As a Data Scientist, you work with many different datasets every day. Partway through a project, you may find yourself struggling with an issue similar to one you had on a past project, yet you can't remember whether it was the same issue or how you solved it.
Document, document, document. Our brains are amazing, but we can't expect to remember everything. If we write everything down, at least we have something to refer to.
It is good practice to document your data at the beginning of each project; this could include research or issues that may affect the project. Continuing to add information as the project progresses helps you understand the problems and their solutions, and records what not to do next time. This is an important part of improving an organisation's workflow.
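A running project log does not need special tooling. The sketch below assumes one plain-text log file per project; the file name `project_log.md`, the Issue/Solution/Avoid next time fields, and the helpers `add_log_entry` and `search_log` are all hypothetical choices. It appends dated entries and lets you search past entries for a keyword, which helps when a familiar-looking issue comes back.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import List, Optional


def add_log_entry(
    log_file: Path,
    issue: str,
    solution: str = "TBD",
    avoid_next_time: str = "",
    when: Optional[date] = None,
) -> None:
    """Append one dated entry to a plain-text project log."""
    if when is None:
        when = date.today()
    lines = [f"## {when.isoformat()}", f"Issue: {issue}", f"Solution: {solution}"]
    if avoid_next_time:
        lines.append(f"Avoid next time: {avoid_next_time}")
    # Append mode keeps the full history; the file is created on first use.
    with log_file.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n\n")


def search_log(log_file: Path, keyword: str) -> List[str]:
    """Return past entries that mention the keyword (case-insensitive)."""
    if not log_file.exists():
        return []
    # Entries are separated by a blank line.
    entries = log_file.read_text(encoding="utf-8").split("\n\n")
    return [e.strip() for e in entries if e.strip() and keyword.lower() in e.lower()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "project_log.md"
        add_log_entry(log, "Date column parsed as text",
                      "Passed an explicit date format when loading",
                      when=date(2021, 3, 9))
        add_log_entry(log, "Duplicate customer IDs after merge", when=date(2021, 3, 10))
        for entry in search_log(log, "date column"):
            print(entry)
```

Unresolved issues can be logged with the default "TBD" solution and updated later, so the log captures problems as they happen rather than only once they are fixed.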
There are several ways you can add documentation to your data:
- Embedded documentation – documentation, normally a text file but sometimes a binary, that is embedded within another file, such as the data file it describes.
- Supporting documentation – information in separate files that accompany the data, providing context, explanation, or instructions on how to use it.
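To make the two approaches concrete, here is a minimal sketch for a CSV dataset using only the standard library. It assumes two hypothetical conventions: embedded notes are written as `#`-prefixed lines at the top of the CSV, and the supporting documentation is a data dictionary in a JSON file named `<dataset>.dictionary.json` next to it. The helper names are illustrative too.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_documented_csv(
    path: Path,
    rows: List[Dict[str, str]],
    notes: List[str],
    data_dictionary: Dict[str, str],
) -> Path:
    """Write a CSV with embedded notes plus a separate data-dictionary file."""
    with path.open("w", newline="", encoding="utf-8") as f:
        # Embedded documentation: '#'-prefixed lines above the header row.
        for note in notes:
            f.write(f"# {note}\n")
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

    # Supporting documentation: column descriptions in a sidecar JSON file.
    sidecar = path.with_name(path.stem + ".dictionary.json")
    sidecar.write_text(json.dumps(data_dictionary, indent=2), encoding="utf-8")
    return sidecar


def read_documented_csv(path: Path) -> List[Dict[str, str]]:
    """Read the rows back, skipping the embedded '#' note lines."""
    with path.open(newline="", encoding="utf-8") as f:
        data_lines = (line for line in f if not line.startswith("#"))
        return list(csv.DictReader(data_lines))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "sales.csv"
        rows = [{"region": "north", "units": "12"}, {"region": "south", "units": "7"}]
        sidecar = write_documented_csv(
            csv_path, rows,
            notes=["Source: example export", "Units are whole items sold"],
            data_dictionary={"region": "Sales region", "units": "Units sold"},
        )
        print(read_documented_csv(csv_path))
        print(sidecar.read_text(encoding="utf-8"))
```

If you load data with pandas, `pd.read_csv(path, comment="#")` also skips the embedded note lines, but be aware that it treats a `#` anywhere in a line as the start of a comment.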
Once these elements of your Data Science workflow are agreed on and adhered to by all team members, you will start to see your productivity improve, freeing you to focus on other pressing tasks such as data wrangling and fixing bugs.
Nisha Arya is a Data Scientist and freelance Technical Writer. She is particularly interested in providing Data Science career advice or tutorials and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence can benefit the longevity of human life. A keen learner, she seeks to broaden her tech knowledge and writing skills whilst helping guide others.