Top Python Data Science Interview Questions

Six must-know technical concepts and two types of questions to test them.




 

If you want to have a career in data science, knowing Python is a must. Python is the most popular programming language in data science, especially when it comes to machine learning and artificial intelligence.

To help you in your data science career, I’ve outlined the main Python concepts tested in data science interviews. I’ll then discuss the two main interview question types that cover these concepts, show you several example questions, and give you solutions to push you in the right direction.

 

Technical Concepts of Python Interview Questions

 
This guide is not company-specific. So if you have some data science interviews lined up, I strongly advise you to use this guide as a starting point of what might come up in the interview. Additionally, you should also try to find some company-specific questions and try to solve them too. Knowing general concepts and practicing them on real-life questions is a winning combination.

I won’t bother you with theoretical questions. They can come up in the interview, but they cover the same technical concepts found in the coding questions. After all, if you know how to use the concepts I’ll be talking about, you probably know how to explain them too.

The technical Python concepts tested in data science job interviews are:

  1. Data types
  2. Built-in data structures
  3. User-defined data structures
  4. Built-in functions
  5. Loops and conditionals
  6. External libraries (Pandas)

 

1. Data Types

 
Data types are a concept you should be familiar with. You should know the most commonly used data types in Python, the differences between them, and when and how to use them. These are integers (int), floating-point numbers (float), complex numbers (complex), strings (str), booleans (bool), and the null value (None).
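As a quick illustration of the types listed above, here is a minimal sketch. The sample values and variable names are made up for this example.

```python
# One example value for each of the data types mentioned above
samples = [42, 3.14, 2 + 3j, "pandas", True, None]

# type() returns the class of a value; __name__ is its readable name
type_names = [type(value).__name__ for value in samples]
for value, name in zip(samples, type_names):
    print(repr(value), "->", name)

# bool is a subclass of int, so True behaves like 1 in arithmetic
bool_is_int = isinstance(True, int)
total = True + True

# Dividing two ints with / gives a float; // gives the floored int
ratio = 7 / 2
floored = 7 // 2

# None is the only value of NoneType and is compared with `is`
missing = None
is_missing = missing is None
```

Interviewers often probe exactly these edge cases, for example that bool counts as int or that / and // return different types.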

 

2. Built-in Data Structures

 
These are lists, dictionaries, tuples, and sets. Knowing these four built-in data structures will help you organize and store data in a way that allows easier access and modification.
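The four structures can be compared side by side in a short sketch. All names and values below are hypothetical and chosen only for illustration.

```python
# list: ordered and mutable
salaries = [5200, 4100, 6100]
salaries.append(4800)

# tuple: ordered and immutable, handy for fixed records
employee = ("Ana", "engineering", 6100)
name, department, salary = employee  # tuple unpacking

# dict: maps unique keys to values, with fast lookup by key
max_salary = {"engineering": 6100, "marketing": 5200}
max_salary["sales"] = 4800

# set: unordered collection of unique items, duplicates are dropped
departments = {"engineering", "marketing", "engineering"}

difference = max_salary["engineering"] - max_salary["marketing"]
```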

 

3. User-defined Data Structures

 
On top of using the built-in data structures, you should also be able to define and use some user-defined data structures. These are arrays, stacks, queues, trees, linked lists, graphs, and hash maps.
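To give a feel for what "user-defined" means here, below is a minimal sketch of three of these structures: a stack, a queue, and a singly linked list. The class and function names are my own choices for illustration, and this is only one of several reasonable ways to write them.

```python
from collections import deque


class Stack:
    """Last in, first out (LIFO), backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


class Queue:
    """First in, first out (FIFO), backed by collections.deque."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        return self._items.popleft()


class Node:
    """One element of a singly linked list."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def linked_list_to_list(head):
    """Walk the linked list from head and collect the values."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


stack, queue = Stack(), Queue()
for number in (1, 2, 3):
    stack.push(number)
    queue.enqueue(number)

head = Node(1, Node(2, Node(3)))
```

With the same three numbers pushed in, the stack hands back the last one first and the queue hands back the first one first, which is the difference interviewers usually want you to articulate.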

 

4. Built-in Functions

 
Python has over 60 built-in functions. You don’t need to know them all, although, of course, the more you know, the better. The built-in functions you can’t avoid are abs(), isinstance(), len(), list(), min(), max(), pow(), range(), round(), sorted(), and type(), along with the string method split().
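Here is a short sketch that exercises each of the functions named above on a made-up list of scores (the data is hypothetical).

```python
scores = [88.456, -12.5, 40, 73.2]

absolute = abs(scores[1])                  # distance from zero
count = len(scores)                        # number of items
lowest, highest = min(scores), max(scores)
squared = pow(3, 2)                        # same as 3 ** 2
rounded = round(scores[0], 1)              # one decimal place
ordered = sorted(scores, reverse=True)     # new list, original unchanged
indexes = list(range(count))               # range is lazy, list() expands it
is_number = isinstance(scores[2], (int, float))
kind = type(scores[2]).__name__

# split() is a method of str objects rather than a built-in function
words = "data science interview".split()
```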

 

5. Loops and Conditionals

 
Loops are used for repetitive tasks: they run the same piece of code over and over again. They do that until a conditional (a true/false test) tells them to stop.
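A minimal sketch of how loops and conditionals work together, using made-up numbers:

```python
numbers = [4, 7, 10, 13, 16]

# for loop with if/elif/else: label each number
labels = []
for n in numbers:
    if n % 2 == 0 and n > 8:
        labels.append("large even")
    elif n % 2 == 0:
        labels.append("small even")
    else:
        labels.append("odd")

# while loop: keep halving until the condition becomes false
value, steps = 100, 0
while value > 1:
    value //= 2
    steps += 1

# break stops a loop early, as soon as the test is true
first_over_ten = None
for n in numbers:
    if n > 10:
        first_over_ten = n
        break
```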

 

6. External Libraries (Pandas)

 
While there are several external libraries used, Pandas is probably the most popular. It is designed for practical data analysis in finance, social sciences, statistics, and engineering.
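The Pandas operations that come up most often in interview questions are filtering, grouping, and sorting. The sketch below shows them on a small DataFrame whose contents are invented for this illustration.

```python
import pandas as pd

# Hypothetical data, made up for this illustration
df = pd.DataFrame({
    "department": ["engineering", "marketing", "engineering", "marketing"],
    "employee": ["Ana", "Ben", "Cy", "Dee"],
    "salary": [6100, 5200, 5800, 4900],
})

# Boolean filtering: rows where salary is above 5000
well_paid = df[df["salary"] > 5000]

# Grouping and aggregating: highest salary per department
max_by_dept = df.groupby("department")["salary"].max()

# Sorting by a column, highest salary first
ranked = df.sort_values("salary", ascending=False).reset_index(drop=True)
```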

 

Python Interview Types of Questions

 
All six technical concepts are mainly tested by only two types of interview questions. Those are:

  1. Data manipulation and analysis
  2. Algorithms

Let’s have a closer look at each of them.

 

1. Data Manipulation and Analysis

 
These questions are designed to test the above technical concepts by solving ETL (extract, transform, load) problems and performing some data analysis.

Here’s one such example from Facebook:

QUESTION: Facebook sends SMS texts when users attempt to 2FA (2-factor authenticate) into the platform to log in. In order to successfully 2FA they must confirm they received the SMS text message. Confirmation texts are only valid on the date they were sent. Unfortunately, there was an ETL problem with the database where friend requests and invalid confirmation records were inserted into the logs, which are stored in the 'fb_sms_sends' table. These message types should not be in the table. Fortunately, the 'fb_confirmers' table contains valid confirmation records so you can use this table to identify SMS text messages that were confirmed by the user.

Calculate the percentage of confirmed SMS texts for August 4, 2020.

ANSWER: 

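import pandas as pd

# Keep only genuine SMS messages: drop the confirmation and friend-request
# records that the ETL problem inserted into the logs
sends = fb_sms_sends[["ds", "type", "phone_number"]]
valid_sends = sends[~sends["type"].isin(["confirmation", "friend_request"])]

# Count the valid messages sent on each date, then keep August 4, 2020
sent_counts = valid_sends.groupby("ds")["phone_number"].count().reset_index(name="count")
sent_0804 = sent_counts[sent_counts["ds"] == "08-04-2020"]

# Match every message to a confirmation for the same phone number and date
confirmers = fb_confirmers[["date", "phone_number"]]
matched = pd.merge(valid_sends, confirmers, how="left", left_on=["phone_number", "ds"], right_on=["phone_number", "date"])

# Unmatched messages have no 'date', so grouping by it counts only confirmed ones
confirmed_counts = matched.groupby("date")["phone_number"].count().reset_index(name="confirmed_count")
confirmed_0804 = confirmed_counts[confirmed_counts["date"] == "08-04-2020"]

# Percentage of confirmed messages on August 4, 2020
result = confirmed_0804["confirmed_count"].iloc[0] / sent_0804["count"].iloc[0] * 100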
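To check the logic without access to the real tables, you can run the same idea on a few invented rows. The sketch below is a compact variant of the answer, not the original code. It assumes the column names used in the answer, a 'message' type value for valid texts, dates stored as strings, and at most one confirmation per phone number and date. The function name confirmed_percentage and all rows are hypothetical.

```python
import pandas as pd


def confirmed_percentage(sends, confirmers, day):
    """Percentage of valid SMS messages sent on `day` that were confirmed."""
    # Drop the record types that should not be in the table
    valid = sends[~sends["type"].isin(["confirmation", "friend_request"])]
    valid = valid[valid["ds"] == day]
    merged = valid.merge(
        confirmers,
        how="left",
        left_on=["phone_number", "ds"],
        right_on=["phone_number", "date"],
    )
    # 'date' is NaN for messages with no confirmation on the same day
    return merged["date"].notna().mean() * 100


# Invented sample data: four valid messages on August 4, two confirmed that day
fb_sms_sends = pd.DataFrame({
    "ds": ["08-04-2020"] * 5 + ["08-05-2020"],
    "type": ["message", "message", "message", "message",
             "friend_request", "message"],
    "phone_number": [111, 222, 333, 444, 555, 111],
})
fb_confirmers = pd.DataFrame({
    "date": ["08-04-2020", "08-05-2020", "08-04-2020"],
    "phone_number": [111, 222, 333],
})

pct = confirmed_percentage(fb_sms_sends, fb_confirmers, "08-04-2020")
```

Phone number 222 confirmed a day late, so it does not count, which reflects the rule that confirmation texts are only valid on the date they were sent.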


One of the questions asked to test your data analysis skills is this one from Dropbox:

QUESTION: Write a query that calculates the difference between the highest salaries found in the marketing and engineering departments. Output just the difference in salaries.

ANSWER: 

import pandas as pd

# Attach the department name to each employee
df = pd.merge(db_employee, db_dept, how="left", left_on="department_id", right_on="id")

# Highest salary in each of the two departments
max_eng = df.loc[df["department"] == "engineering", "salary"].max()
max_mkt = df.loc[df["department"] == "marketing", "salary"].max()

# Output just the difference in salaries
result = pd.DataFrame({"salary_difference": [max_mkt - max_eng]})
result


 

2. Algorithms

 
Python algorithm interview questions test your problem-solving skills using algorithms. Since algorithms are not limited to one programming language, these questions test your logic and thinking as well as your Python coding.

For example, you could get this question:

QUESTION: Given a string containing digits from 2-9 inclusive, return all possible letter combinations that the number could represent. Return the answer in any order.

A mapping of digits to letters (just like on telephone buttons) is given below: 2 = abc, 3 = def, 4 = ghi, 5 = jkl, 6 = mno, 7 = pqrs, 8 = tuv, 9 = wxyz. Note that 1 does not map to any letters.

ANSWER:

from typing import List

class Solution:
    def letterCombinations(self, digits: str) -> List[str]:
        # If the input is empty, immediately return an empty answer array
        if len(digits) == 0:
            return []

        # Map all the digits to their corresponding letters
        letters = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
                   "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

        def backtrack(index, path):
            # If the path is the same length as digits, we have a complete combination
            if len(path) == len(digits):
                combinations.append("".join(path))
                return  # Backtrack

            # Get the letters that the current digit maps to, and loop through them
            possible_letters = letters[digits[index]]
            for letter in possible_letters:
                # Add the letter to our current path
                path.append(letter)
                # Move on to the next digit
                backtrack(index + 1, path)
                # Backtrack by removing the letter before moving onto the next
                path.pop()

        # Initiate backtracking with an empty path and starting index of 0
        combinations = []
        backtrack(0, [])
        return combinations
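If the interviewer allows the standard library, the same result can be sketched more compactly with itertools.product, which builds the Cartesian product of the letter groups. This is an alternative to the backtracking answer above, not a replacement for understanding it. The names LETTERS and letter_combinations are my own.

```python
from itertools import product

# Same digit-to-letter mapping as in the backtracking answer
LETTERS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
           "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}


def letter_combinations(digits):
    """Return every letter combination the digit string could represent."""
    if not digits:
        return []
    # One group of candidate letters per digit, e.g. "23" -> ["abc", "def"]
    groups = [LETTERS[digit] for digit in digits]
    # product(*groups) yields one tuple per combination of one letter per group
    return ["".join(combo) for combo in product(*groups)]


combos = letter_combinations("23")
print(combos)
```

For the input "23" this gives the nine combinations from "ad" through "cf". Be ready to explain the backtracking version as well, since interviewers often ask for the algorithm itself.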


Or it could get even more difficult with the following question:

QUESTION: “Write a program to solve a Sudoku puzzle by filling the empty cells. A sudoku solution must satisfy all of the following rules:

  1. Each of the digits 1-9 must occur exactly once in each row.
  2. Each of the digits 1-9 must occur exactly once in each column.
  3. Each of the digits 1-9 must occur exactly once in each of the 9 3x3 sub-boxes of the grid.

The '.' character indicates empty cells.”

ANSWER:

from collections import defaultdict

class Solution:
    def solveSudoku(self, board):
        """
        :type board: List[List[str]]
        :rtype: void Do not return anything, modify board in-place instead.
        """
        def could_place(d, row, col):
            """
            Check if one could place a number d in (row, col) cell
            """
            return not (d in rows[row] or d in columns[col] or
                        d in boxes[box_index(row, col)])

        def place_number(d, row, col):
            """
            Place a number d in (row, col) cell
            """
            rows[row][d] += 1
            columns[col][d] += 1
            boxes[box_index(row, col)][d] += 1
            board[row][col] = str(d)

        def remove_number(d, row, col):
            """
            Remove a number which didn't lead
            to a solution
            """
            del rows[row][d]
            del columns[col][d]
            del boxes[box_index(row, col)][d]
            board[row][col] = '.'

        def place_next_numbers(row, col):
            """
            Call backtrack function in recursion
            to continue to place numbers
            till the moment we have a solution
            """
            # if we're in the last cell
            # that means we have the solution
            if col == N - 1 and row == N - 1:
                nonlocal sudoku_solved
                sudoku_solved = True
            # if not yet
            else:
                # if we're in the end of the row
                # go to the next row
                if col == N - 1:
                    backtrack(row + 1, 0)
                # go to the next column
                else:
                    backtrack(row, col + 1)

        def backtrack(row = 0, col = 0):
            """
            Backtracking
            """
            # if the cell is empty
            if board[row][col] == '.':
                # iterate over all numbers from 1 to 9
                for d in range(1, 10):
                    if could_place(d, row, col):
                        place_number(d, row, col)
                        place_next_numbers(row, col)
                        # if sudoku is solved, there is no need to backtrack
                        # since the single unique solution is promised
                        if not sudoku_solved:
                            remove_number(d, row, col)
            else:
                place_next_numbers(row, col)

        # box size
        n = 3
        # row size
        N = n * n
        # lambda function to compute box index
        box_index = lambda row, col: (row // n) * n + col // n

        # init rows, columns and boxes
        rows = [defaultdict(int) for i in range(N)]
        columns = [defaultdict(int) for i in range(N)]
        boxes = [defaultdict(int) for i in range(N)]
        for i in range(N):
            for j in range(N):
                if board[i][j] != '.':
                    d = int(board[i][j])
                    place_number(d, i, j)

        sudoku_solved = False
        backtrack()
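Because the solver modifies the board in place, it helps to have a quick way to confirm that the result obeys the three rules from the question. The checker below is a minimal sketch of my own, not part of the original answer. The function name is_valid_solution and the pattern used to build a sample solved board are assumptions for illustration.

```python
def is_valid_solution(board):
    """Return True if a filled 9x9 board satisfies the three Sudoku rules."""
    digits = set("123456789")
    # Each row, column, and 3x3 box is reduced to the set of its cells
    rows = [set(row) for row in board]
    columns = [set(column) for column in zip(*board)]
    boxes = [
        {board[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    # Every group must contain each digit 1-9 exactly once
    return all(group == digits for group in rows + columns + boxes)


# A sample solved board built from a shifting pattern (for illustration only)
solved = [[str((3 * (r % 3) + r // 3 + c) % 9 + 1) for c in range(9)]
          for r in range(9)]

# Swapping two cells in one row keeps the row valid but breaks two columns
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]

print(is_valid_solution(solved), is_valid_solution(broken))
```

Calling is_valid_solution(board) after solveSudoku(board) is a cheap sanity check before you walk the interviewer through your solution.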


This is quite a complex algorithm, and good for you if you know how to solve it!

 

Conclusion

 
For a data science interview, the six technical concepts I’ve mentioned are a must. Of course, it’s recommended that you dive even deeper into Python and broaden your knowledge, not only theoretically but also by solving as many data manipulation and analysis questions and algorithm questions as possible.

For the former, there are plenty of examples on StrataScratch, where you can probably find questions from the company you’ve applied to. LeetCode is a good choice when you want to practice writing Python algorithms before your interviews.

Read my other articles on data science SQL Interview Questions and Answers and Most Common Data Science Interview Questions!

 
Bio: Nate Rosidi is a Data Scientist & Product Manager.

Original. Reposted with permission.
