How to Ace the Data Science Assessment Test by Using Automatic EDA Tools

With a few lines of code, you can understand the key aspects of a given dataset. These tools helped me answer business-related questions during Alooba's data assessment test.



data_science_assessment_test
Image by author | Canva Pro

 

Generally, the assessment test is divided into five parts: statistics, business analytics, coding, SQL, and hands-on data analytics. In the hands-on part, you are given a dataset and 20 minutes to answer three to four business-related questions. Even if you are an expert in data analysis, you cannot do data ingestion, data analytics, and reporting in such a limited time. You need tools that automate the data analysis so that you can focus on answering the questions from the business case study.

 

ace_data_science_assessment_test_automatic_eda_tools
Image by author

 

In this blog, we will learn about Auto-EDA tools that can help us pass the data analytics part: Deepnote, AutoViz, Pandas Profiling, and SweetViz. Each of them requires only a few lines of code to display critical information about the data.

 

Deepnote

 
Deepnote is a free cloud data science notebook that supports multiple third-party integrations and programming languages. Recently, the platform introduced a new way to display Pandas DataFrames. To try it out, we will use the New Year’s resolution dataset, which is available on Kaggle under the CC BY-SA 4.0 license. The data was gathered through a survey asking respondents about their New Year’s resolutions for 2022.

When the DataFrame is displayed in Deepnote, it shows the distribution of categorical and numerical features, the min-max range of numerical features, and the percentage of missing values.

import pandas as pd

# Load the survey data; Deepnote renders the DataFrame with per-column summaries
data = pd.read_csv("nyr_data.csv")
data
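Outside Deepnote, a similar at-a-glance summary can be approximated in plain pandas. The sketch below uses a tiny made-up DataFrame (the column names `gender`, `age`, and `resolution` are assumptions, not the real columns of nyr_data.csv) and reports the percentage of missing values, the min-max range of numeric columns, and the share of each category.

```python
import pandas as pd

# Toy stand-in for the survey data; the column names are hypothetical.
data = pd.DataFrame({
    "gender": ["Female", "Male", "Female", None],
    "age": [25, 40, None, 31],
    "resolution": ["Health", "Career", "Health", "Finance"],
})

def quick_summary(df):
    """Return missing-value percentages, numeric ranges, and category shares."""
    # Percentage of missing values per column
    missing_pct = (df.isna().mean() * 100).round(1)
    # Min and max of every numeric column (one row per column)
    ranges = df.select_dtypes(include="number").agg(["min", "max"]).T
    # Relative frequency of each value in every non-numeric column
    categorical = df.select_dtypes(exclude="number")
    shares = {
        col: categorical[col].value_counts(normalize=True).round(2)
        for col in categorical.columns
    }
    return missing_pct, ranges, shares

missing_pct, ranges, shares = quick_summary(data)
print(missing_pct)
print(ranges)
print(shares["resolution"])
```

This is only a rough equivalent of the Deepnote display, but it is handy when the test platform gives you a plain Jupyter environment.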


 


 

We can also use the Filter option to display a specific category or value. Finally, the Visualize option helps us create simple data visualizations without writing a single line of code.
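If the Filter option is not available, the same kind of filtering can be written with a boolean mask in pandas. This is a minimal sketch on made-up rows; the `gender` and `age` columns are assumed for illustration and may not match the real survey file.

```python
import pandas as pd

# Hypothetical columns; the real survey file may use different names.
data = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male"],
    "age": [25, 40, 52, 31],
})

# Keep a single category, similar to filtering a column by value in the UI
women = data[data["gender"] == "Female"]

# Combine conditions with & (and) or | (or); each condition needs parentheses
men_over_35 = data[(data["gender"] == "Male") & (data["age"] > 35)]

print(len(women), len(men_over_35))
```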

 

deepnote analytics
Image by author

 

The Visualize option will ask for the Pandas DataFrame, type of chart, X-axis, Y-axis, and Color to display an interactive visualization.

 

deepnote plot
Image by author

 

We used the Visualize option on the Pandas DataFrame to display the distribution of gender in the survey as a bar chart.
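For comparison, the numbers behind such a bar chart come from a simple value count. The sketch below assumes a `gender` column and uses made-up answers rather than the real survey responses.

```python
import pandas as pd

# Made-up answers; "gender" is an assumed column name.
data = pd.DataFrame({"gender": ["Female", "Male", "Female", "Female", "Male"]})

# Count respondents per category; dropna=False keeps missing answers visible
gender_counts = data["gender"].value_counts(dropna=False)
print(gender_counts)

# In a notebook with matplotlib installed, this draws the bar chart:
# gender_counts.plot(kind="bar")
```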

 


 

Deepnote is always my first line of defense for time-constrained projects and assessment tests. I can easily produce a simple data report within 20 minutes. Deepnote has also helped me pass multiple data science and machine learning assessment tests. If you are interested in my Deepnote projects, check out my profile.

 

AutoViz

 

AutoViz creates automatic visualizations of tabular data with a single line of code. It displays all combinations of charts based on the feature type: if the feature is text, it displays a word cloud, and if it is categorical, it displays a combination of bar charts. AutoViz comes with four output options: SVG, Bokeh, Server, and HTML.

  • SVG/PNG/JPG: generates matplotlib plots, which can be stored locally or displayed in a Jupyter notebook.
  • Bokeh: generates interactive charts within Jupyter notebooks.
  • Server: launches a browser-based dashboard containing all the charts.
  • HTML: silently creates Bokeh charts and saves them locally as HTML files.

With a few lines of code, we can produce detailed information about the dataset along with pairwise scatter plots, distplots, boxplots, probability plots, histograms, violin plots, heatmaps, and bar charts for each category. AutoViz saved us half an hour of coding and reporting results.

 

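from autoviz.AutoViz_Class import AutoViz_Class
# Notebook magic: render matplotlib charts inline
%matplotlib inline
AV = AutoViz_Class()
# Pass the CSV path; AutoViz loads the file, plots it, and returns the DataFrame
df = AV.AutoViz("nyr_data.csv")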
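The output options listed above are selected through the `chart_format` argument of `AutoViz`. The wrapper below is a hedged sketch, not part of the library: `run_autoviz` and `CHART_FORMATS` are names made up for this example, and the check simply rejects values that are not in the list before the library is loaded.

```python
# Formats named in the list above; treated here as the accepted values.
CHART_FORMATS = ("svg", "png", "jpg", "bokeh", "server", "html")

def run_autoviz(csv_path, chart_format="svg", target=""):
    """Run AutoViz on a CSV file with an explicit chart format."""
    if chart_format not in CHART_FORMATS:
        raise ValueError(f"unknown chart_format: {chart_format!r}")
    # Imported lazily so a typo in chart_format fails before the heavy import
    from autoviz.AutoViz_Class import AutoViz_Class
    av = AutoViz_Class()
    # depVar is the optional target column; an empty string means no target
    return av.AutoViz(csv_path, depVar=target, chart_format=chart_format, verbose=0)

# Example (needs autoviz installed and the CSV on disk):
# df = run_autoviz("nyr_data.csv", chart_format="html")
```

The HTML format is useful during a timed test when you want the charts saved to files that you can reopen later instead of rerunning the analysis.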


 


 

Pandas Profiling

 
Pandas Profiling generates detailed data reports from a Pandas DataFrame. The report covers variable types, the shape of the data, unique values, histograms, statistical analysis, text analysis, and missing values.

The image below shows the summary of data profiles. The data summary also includes alerts highlighting the highly correlated variables and the frequency of missing values in a particular variable.

 

XXXXX
Image by author
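To make the correlation alert mentioned above more concrete, here is a rough pandas sketch of the idea: list the numeric column pairs whose absolute correlation crosses a threshold. The function name, the 0.9 threshold, and the toy numbers are all assumptions for illustration; the library's own alert logic may differ.

```python
import pandas as pd

def high_correlations(df, threshold=0.9):
    """List numeric column pairs whose absolute Pearson correlation meets the threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle so each pair is reported once
    for i, left in enumerate(cols):
        for right in cols[i + 1:]:
            if corr.loc[left, right] >= threshold:
                pairs.append((left, right))
    return pairs

# Made-up numbers: "spend" is exactly twice "budget", "age" is unrelated to both
toy = pd.DataFrame({
    "budget": [1, 2, 3, 4, 5],
    "spend": [2, 4, 6, 8, 10],
    "age": [30, 22, 45, 22, 30],
})
print(high_correlations(toy))
```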

 

The final report contains comprehensive information about variable distributions, the correlation matrix, missing values, and data samples. In my experience, this information is enough to answer roughly 60% of the assessment test questions.

 

from pandas_profiling import ProfileReport
# Build the report from the DataFrame loaded earlier
profile = ProfileReport(data, title="Pandas Profiling Report")
# Display the report in the notebook
profile
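When time is short, it can help to write the report to a file and skip the slowest computations. The helper below is a sketch under a few assumptions: `save_profile` is a name made up for this example, and the fallback import reflects that, as far as I know, newer releases of the package are published as ydata-profiling while older ones use the pandas_profiling name shown above.

```python
def save_profile(df, path="profile_report.html", minimal=True):
    """Write a profiling report for df to an HTML file and return the path."""
    try:
        # Package name used by newer releases
        from ydata_profiling import ProfileReport
    except ImportError:
        # Package name used in this article
        from pandas_profiling import ProfileReport
    # minimal=True switches off the most expensive computations
    report = ProfileReport(df, title="Pandas Profiling Report", minimal=minimal)
    report.to_file(path)
    return path

# Example (needs the profiling package installed):
# save_profile(data, "nyr_profile.html")
```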


 


 

SweetViz

 
Sweetviz is an open-source Python library that creates high-density visualizations to support your exploratory data analysis. The user interface is interactive and easy to navigate. With a couple of lines of code, you can produce a professional data analysis report.

The report includes the shape of the dataset, types of features, correlations, missing values, and distributions shown as bar charts. It is similar to Pandas Profiling but is much cleaner and easier to navigate.

 

sweetviz
Image by author

 

SweetViz takes a Pandas DataFrame and generates beautiful HTML-based data reports. We can save the HTML reports locally or display them directly in a Jupyter notebook using my_report.show_notebook().

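import sweetviz as sv

# Analyze the DataFrame and build the report
my_report = sv.analyze(data)
# Render inside the notebook; use my_report.show_html() to write an HTML file instead
my_report.show_notebook()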
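Saving the report locally, as mentioned above, could look like the sketch below. The function name `save_sweetviz_report` and the default file name are made up for this example; `target_feat` is optional and should name a column the questions focus on, if there is one.

```python
def save_sweetviz_report(df, path="sweetviz_report.html", target=None):
    """Analyze df with Sweetviz and write the report to an HTML file."""
    # Imported lazily; requires sweetviz to be installed
    import sweetviz as sv
    # target_feat highlights how other features relate to one column
    report = sv.analyze(df, target_feat=target)
    # open_browser=False only writes the file instead of opening a new tab
    report.show_html(filepath=path, open_browser=False)
    return path

# Example (needs sweetviz installed):
# save_sweetviz_report(data, "nyr_sweetviz.html")
```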


 


 

Conclusion

 
The tools we have discussed are great for generating exploratory data analysis reports. In the end, though, it comes down to your understanding of the subject matter and your experience in the data science sector. The assessment questions usually revolve around a business case study, for example, historical sales data of a digital camera company. These tools can help you understand the dataset, but without subject matter expertise, it will be hard to answer the questions.

In this blog, we have learned about Deepnote, AutoViz, Pandas Profiling, and SweetViz. The automatic exploratory data analysis tools can help you understand the problem statement faster with a few lines of code.

 
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.