Novice to Ninja: Why Your Python Skills Matter in Data Science

As a data scientist, is it worthwhile leveling up your Python skills? Dive into code comparisons across expertise levels & discover if "good enough" is really enough.



Image created by Author with DALL·E 3

 

Introduction

 

We know that programming is a useful (essential?) skill for data scientists to possess. But what level of programming skill is necessary? Should a data scientist aim to be "good enough," or instead desire to become an expert level programmer? Should we aspire to be coding ninjas?

If we are going to explore this topic, we should first get an idea of what beginner, intermediate, and expert level programmers look like, or at least what their code looks like.

Below you will find two programming tasks, each with three code snippets: one each for potential beginner, intermediate, and expert level approaches to completing that task, with some explanation of the differences. This should give us a foundation on which to build a discussion about the importance of programming ability.

Remember, these are concocted approaches meant to imitate programming at these different levels. All the scripts are functional and get the job done, but they do so with varying degrees of elegance, efficiency, and Pythonic-ness.

 

Task: Find the Factorial of a Number

 

Let's first take a task that is simple but can be approached in multiple ways: finding the factorial of a given number. Let's implement this task for hypothetical beginner, intermediate, and expert Python programmers, and compare the differences in the code.

 

Beginner's Approach

 

A beginner may use a straightforward approach using a for loop to calculate the factorial. Here's how they might do it.

n = int(input("Enter a number to find its factorial: "))
factorial = 1

if n < 0:
    print("Factorial does not exist for negative numbers")
elif n == 0:
    print("The factorial of 0 is 1")
else:
    for i in range(1, n + 1):
        factorial *= i
    print(f"The factorial of {n} is {factorial}")

 

Intermediate's Approach

 

An intermediate programmer might use a function to improve code reuse and readability, and also use the math library's prod function (available in Python 3.8 and later) to compute the product.

import math

def factorial(n):
    if n < 0:
        return "Factorial does not exist for negative numbers"
    elif n == 0:
        return 1
    else:
        return math.prod(range(1, n + 1))

n = int(input("Enter a number to find its factorial: "))
result = factorial(n)
print(f"The factorial of {n} is {result}")

 

Expert's Approach

 

An expert programmer might use recursion and add type hints for better maintainability. They may also make use of Python's terse and expressive syntax.

from typing import Union

def factorial(n: int) -> Union[int, str]:
    return 1 if n == 0 else n * factorial(n - 1) if n > 0 else "Factorial does not exist for negative numbers"

n = int(input("Enter a number to find its factorial: "))
print(f"The factorial of {n} is {factorial(n)}")

 

Summary

 

Let's have a look at the differences in code and what stands out most between the levels of expertise.

  • Beginner: Longer code overall, no functions or libraries, straightforward logic
  • Intermediate: Uses a function for better structure, uses math.prod for calculating the product
  • Expert: Uses recursion for elegance, adds type hints, and uses Python's conditional expression for conciseness
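All three versions above signal a negative input with a printed or returned message, and the recursive version is bounded by Python's recursion limit for large inputs. As a hedged alternative, the sketch below raises ValueError instead and uses an iterative loop. The name safe_factorial is hypothetical and not part of the examples above; the standard library's math.factorial already offers this behavior for non-negative integers and is used here only as a cross-check.

```python
import math


def safe_factorial(n: int) -> int:
    """Return n! for a non-negative integer n (hypothetical helper)."""
    if n < 0:
        # Raising keeps the return type a plain int instead of int-or-str.
        raise ValueError("Factorial does not exist for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


# The standard library agrees with the hand-written loop.
assert safe_factorial(5) == math.factorial(5) == 120
```

Raising an exception lets the caller decide how to report the problem, rather than having an error message end up inside a formatted result string.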

 

Task: Generate Fibonacci Numbers

 

For a second example, let's consider the task of generating the first n numbers of the Fibonacci sequence. Here's how programmers at different levels might tackle this task.

 

Beginner's Approach

 

A beginner might use a basic for loop and a list to collect the Fibonacci numbers.

n = int(input("How many Fibonacci numbers to generate? "))
fibonacci_sequence = []

if n <= 0:
    print("Please enter a positive integer.")
elif n == 1:
    print([0])
else:
    fibonacci_sequence = [0, 1]
    for i in range(2, n):
        next_number = fibonacci_sequence[-1] + fibonacci_sequence[-2]
        fibonacci_sequence.append(next_number)
    print(fibonacci_sequence)

 

Intermediate's Approach

 

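An intermediate programmer might use a list comprehension to shorten the loop. Note that the comprehension here is run only for its side effect of appending to the list, a pattern that is concise but that many style guides discourage.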

n = int(input("How many Fibonacci numbers to generate? "))

if n <= 0:
    print("Please enter a positive integer.")
else:
    fibonacci_sequence = [0, 1]
    [fibonacci_sequence.append(fibonacci_sequence[-1] + fibonacci_sequence[-2]) for _ in range(n - 2)]
    print(fibonacci_sequence[:n])

 

Expert's Approach

 

An expert might use a generator for a more memory-efficient approach, along with Python's tuple unpacking to update both variables in a single line.

def generate_fibonacci(n: int):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

n = int(input("How many Fibonacci numbers to generate? "))
if n <= 0:
    print("Please enter a positive integer.")
else:
    print(list(generate_fibonacci(n)))

 

Summary

 

Let's see what the major programmatic differences are between the expertise levels.

  • Beginner: Uses basic control structures and lists, straightforward but a bit verbose
  • Intermediate: Uses a list comprehension in place of the explicit loop for a more concise solution
  • Expert: Employs a generator for a memory-efficient solution and uses tuple unpacking to update both variables at once
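One practical consequence of the generator approach is that a caller can take only as many values as needed. The sketch below is an unbounded variant of the expert's generator, sliced with itertools.islice from the standard library. The name fibonacci_stream is hypothetical and does not appear in the code above.

```python
from itertools import islice
from typing import Iterator


def fibonacci_stream() -> Iterator[int]:
    """Yield Fibonacci numbers indefinitely (hypothetical unbounded variant)."""
    a, b = 0, 1
    while True:
        yield a
        # Tuple unpacking updates both values in one step.
        a, b = b, a + b


# Take only the first ten values; nothing beyond them is computed.
first_ten = list(islice(fibonacci_stream(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```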

 

The Benefits of "Ninja" Coding

 

If all of the example code works and ultimately gets the job done, why should we strive to become the best coders that we can be? Great question!

Becoming a proficient programmer is about more than just getting code to work. Here are some reasons why striving to be a better coder is beneficial:

 

1. Efficiency

 

  • Time: Writing more efficient code means tasks are completed faster, which is beneficial both for the programmer and for anyone using the software
  • Resource Utilization: Efficient code uses less CPU and memory, which can be crucial for applications running on limited resources or at a large scale
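The resource point can be made concrete with the generator idea from the Fibonacci task. The sketch below is an illustration, not a benchmark: it compares a fully built list with a generator object using sys.getsizeof, which reports only the size of the object itself (for a list, its array of references, not the integers it refers to). Exact byte counts vary by Python version and platform, so none are shown, and the variable names are chosen for this example only.

```python
import sys

n = 100_000

squares_list = [i * i for i in range(n)]   # all values held in memory at once
squares_gen = (i * i for i in range(n))    # values produced one at a time

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

print(f"list container: {list_bytes} bytes")
print(f"generator object: {gen_bytes} bytes")

# Both produce the same total, but only the list stores every element.
assert sum(squares_list) == sum(squares_gen)
```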

 

2. Readability and Maintainability

 

  • Collaboration: Code is often written and maintained by teams. Clean, well-structured, and well-commented code is much easier for others to understand and collaborate on
  • Longevity: As projects grow or evolve, maintainable code is easier to extend, debug, and refactor, saving time and effort in the long run

 

3. Reusability

 

  • Modularity: Writing functions or modules that solve a problem well means that you can easily reuse that code in other projects or contexts
  • Community Contributions: High-quality code can be open-sourced and benefit a wider community of developers

 

4. Robustness and Reliability

 

  • Error Handling: Advanced programmers often write code that can not only solve problems but also handle errors gracefully, making the software more reliable
  • Testing: Understanding how to write testable code and actual tests ensures that the code works as expected in various scenarios
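As a sketch of both points, the snippet below moves the int(input(...)) pattern used in the earlier examples into a function that handles bad input, then checks it with plain assert statements. The function name parse_count is hypothetical, and a real project would more likely use a test framework such as pytest or unittest than bare asserts.

```python
from typing import Optional


def parse_count(text: str) -> Optional[int]:
    """Return a positive integer parsed from text, or None if invalid.

    Hypothetical helper: separating parsing from input() makes it testable.
    """
    try:
        value = int(text.strip())
    except ValueError:
        # Non-numeric input such as "abc" or "3.5" ends up here.
        return None
    return value if value > 0 else None


# Minimal tests covering valid, non-numeric, and out-of-range input.
assert parse_count("10") == 10
assert parse_count(" 7 ") == 7
assert parse_count("abc") is None
assert parse_count("3.5") is None
assert parse_count("-2") is None
assert parse_count("0") is None
```

Because the function takes a string rather than calling input() itself, it can be tested without simulating keyboard input.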

 

5. Skill Recognition

 

  • Career Advancement: Being recognized as a skilled coder can lead to promotions, job opportunities, and higher pay
  • Personal Satisfaction: There's a sense of accomplishment and pride in knowing that you're capable of writing high-quality code

 

6. Adaptability

 

  • New Technologies: Strong foundational skills make it easier to adapt to new languages, libraries, or paradigms
  • Problem-Solving: A deeper understanding of programming concepts enhances your ability to approach problems creatively and effectively

 

7. Cost-Effectiveness

 

  • Less Debugging: Well-written code is often less prone to bugs, reducing the amount of time and resources spent on debugging
  • Scalability: Good code can be more easily scaled up or down, making it more cost-effective in the long run

 
So, while getting the job done is certainly important, how you get it done can have wide-ranging implications for your personal development, your team, and your organization. We should all strive to become the best programmers that we can be, and that goes for data scientists as well.
 
 

Matthew Mayo (@mattmayo13) holds a Master's degree in computer science and a graduate diploma in data mining. As Managing Editor, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.