How To “Ultralearn” Data Science: Optimizing Learning, Part 3

This third part of a series on how to "ultralearn" data science shows you how to optimize your learning with five valuable techniques.

By Benthecoder, Developer, Writer, Machine Learning and AI enthusiast.

Photo by Safar Safarov on Unsplash.


By now, I hope you have a solid foundation in ultralearning and learning how to learn, as well as hacks for deep work and focused learning. With that, you’re ready to learn. But just as you can optimize a machine learning model to make it better, there are a few ways to optimize your learning methodology and build an effective system for learning.

In this segment, it’s all about optimizing the learning process for data science. The main focus will be on directness, drilling, retrieval, feedback, and retention.

“Intellectual growth should commence at birth and cease only at death.” -Albert Einstein


1. Directness

Photo by Moritz Mentges on Unsplash.

Data science is a broad field with many subfields, and it is implausible to learn every necessary skill through formal education alone. Imagine finishing a bachelor’s degree in data science, looking for an internship or a job, and realizing you have no real-world experience at all: everything you studied in college over the past four years did not prepare you for the real world. This is one reason some data scientists hold a Ph.D. in theoretical physics or a master’s in statistics; they had a few more years before diving into the real world. This situation is common, it applies to almost every other major as well, and it’s called a failure of transfer.

Transfer is the process of learning something in one context (statistics and programming) and then applying it in another (predicting the temperature rise of Earth in the next 20 years). Despite its importance, formal education often fails to optimize transfer. Fittingly, a related idea called transfer learning exists in deep learning, where a model trained on one task is reused as the starting point for another.

The problem with formal education is that it sets up an indirect path between the learning context and the target environment, the context in which learned skills and knowledge are applied. For example, you learn linear algebra in college. You spend hours on practice questions and past exam papers. However, when it comes to applying it in data science, you fail to transfer it, because you don’t have an underlying understanding of what concepts such as determinants, invertible matrices, eigenvectors, and the Gram-Schmidt process actually mean. In other words, you didn’t grasp the essence of linear algebra. (Check out 3Blue1Brown’s Essence of Linear Algebra YouTube playlist.)

Rote-learning statistics and probability in a college classroom is a far cry from applying them in the real world, such as using statistical methods to make the right decision in the face of uncertainty.

Ultralearners know to keep the path between their learning environment and their target environment as direct as possible. By doing this, they cultivate a quality of ultralearning called directness.

The most direct way to learn something is to do it. The most effective way to learn to code is to write code. The most effective way to learn data science is to engage in data science projects and solve real-world problems.

“For the things we have to learn before we can do them, we learn by doing them.” ― Aristotle

This learning-by-doing approach is called project-based learning.

It situates the skill you’re learning directly in your target environment — no transfer necessary!

One of the most extreme but effective modes of project-based learning is immersive learning: total immersion in the target environment. Applied to data science, this could mean a three-month internship, data science competitions, a 100 Days of Data Science challenge, and so on.

Of course, not everyone has time for immersive learning. Moreover, some skills don’t lend themselves to this approach. There’s a reason that trainee pilots don’t immerse themselves by flying Boeings on their first day of training. Instead, they learn in flight simulators.

If immersive learning isn’t within your reach, use the flight simulator method by replicating the conditions and pressures of your target environment as closely as possible.

If you can’t get an internship, enter data science competitions or build your own data science projects that solve problems you are passionate about. To replicate the conditions and pressures of real data science work, set a time limit for your project and present your findings to friends and family (making sure they understand them).

Whatever you’re learning, establish a direct path between your learning context and your target environment.

Once you’ve done that, it’s time to drill down and perfect your technique.


2. Drilling

Photo by Matt Antonioli on Unsplash.

What do elite athletes, piano prodigies, and successful ultralearners have in common? They all rely on drilling to perfect their techniques and maintain their competitive edge. So, how can you drill strategically to achieve the best results?

Crucially, you should never begin your project by drilling. Instead, use the direct-then-drill approach. To do this, start with direct practice, whether you’re writing code or solving business problems. Use this direct practice to identify the areas where you wish to drill. After drilling, go back to direct practice until it becomes necessary to drill again.

To make the most out of your drilling, apply it to a rate-determining step.

In chemistry, the rate-determining step is the slowest step in a reaction, the one that limits how fast the overall reaction can proceed; in ultralearning, it’s the skill that is holding back your progress, the one that, once improved, opens up the next level of knowledge or the broadest range of applications.

For example, you may have a great grasp of the concepts of machine learning but lack the programming expertise to put these concepts into practice. In that case, learning Python would be your rate-determining step, so you’d focus your drilling in this area.

How should you design your drills? That depends on the area you want to drill. Can it be easily isolated from the rest of your project? If so, try time-slicing: isolate one step in a more involved process and repeat that step until you’ve perfected it. If you want to perfect your data wrangling, for example, you could time-slice by repeatedly drilling just the data-cleaning step. Alternatively, separate your desired skill into its cognitive components and drill each one separately. For example, within Python for data science, you could drill pandas, scikit-learn, and PyTorch one at a time.
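To make time-slicing concrete, here is a minimal sketch in Python, assuming the step you’ve isolated is cleaning a messy price column. The names clean_price, DRILL_CASES, and run_drill are hypothetical, invented for illustration. The test cases stay fixed while you rewrite the cleaning step from scratch on each repetition until every case passes.

```python
import re


# Hypothetical drill target: one data-cleaning step, isolated from a larger project.
def clean_price(raw):
    """Turn messy price strings like ' $1,299.00 ' into floats; None if unparseable."""
    if raw is None:
        return None
    text = re.sub(r"[^\d.\-]", "", str(raw))  # keep only digits, dots, and minus signs
    try:
        return float(text)
    except ValueError:  # e.g. "n/a" leaves an empty string
        return None


# Hypothetical fixed cases: each drill repetition is one attempt at passing them all.
DRILL_CASES = [
    (" $1,299.00 ", 1299.0),
    ("USD 45", 45.0),
    ("12.5", 12.5),
    ("n/a", None),
    (None, None),
]


def run_drill(fn, cases):
    """Run one drill repetition; return (number passed, list of failures)."""
    failures = []
    for raw, expected in cases:
        got = fn(raw)
        if got != expected:
            failures.append((raw, expected, got))
    return len(cases) - len(failures), failures


passed, failures = run_drill(clean_price, DRILL_CASES)
print(f"{passed}/{len(DRILL_CASES)} cases passed")
```

Once you can write a version that passes every case without looking anything up, add harder cases (negative amounts, European decimal commas) and drill again.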

If you’re working on a more creative or complex project, you might find it challenging to drill in isolation; it’s hard to drill problem-solving, for example. In that case, try the copycat method instead. Choose a successful person you admire, whether it’s billionaire investor Warren Buffett or genius entrepreneur Bill Gates, and emulate the way they solve problems as closely as you can.

Pop quiz! Why is transfer important? What is directness? What’s the rate-determining step? If you had trouble answering these questions, you may need to work on retrieval, which is the next topic.


3. Retrieval

Photo by David Travis on Unsplash.

Learning statistics is a great way to improve your problem-solving skills, but only if your hard-won knowledge doesn’t desert you when you sit down to work with real data. There’s no point in learning new skills, concepts, and procedures if you can’t retrieve them quickly and efficiently. As a data scientist, you must be able to understand data down to the fundamental level and to clean, model, and present it in the right form. After that, you have to tell a story about your data, translating your analysis into layman’s terms. To make sure you’re always ready to work with data and explain it, there are two methods you can use to improve your retrieval rate.

The first is review: going back over the materials you’ve just studied.

The second is recall: trying to recall facts and concepts from memory.

A 2011 study from Purdue University shows recall is significantly more effective for long-term learning retention, yet most learners opt for review strategies over recall strategies when trying to consolidate their learning.

The reason we prefer review over recall comes down to a concept called the judgment of learning. Essentially, we believe we have learned a concept when we can process it without any difficulty. In college, students reread their notes over and over, creating the impression that they have grasped the information. That’s why we gravitate toward passive review strategies: they confirm our perception that we’re learning successfully.

But perception isn’t everything. Struggling to recall something in the short term means you’re far more likely to remember it in the long term. Experts call this desirable difficulty — the difficulty posed by the recall is ultimately desirable, as it maximizes our chances of retaining what we’ve learned.

Here are three ways to apply an active recall strategy to your learning.

I) Pose questions

During your study session on neural networks, for example, pose questions that force you to recall the answer. Write “How are neural networks applied in real life?” instead of “Neural networks are used for image classification, object detection, …” Every time you go over your notes, you’ll be forced to recall what you’ve learned.

“It is very important for young people to keep their sense of wonder and keep asking why.” — Stephen Hawking
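If your notes live in a file or a notebook, the question-first approach can be sketched in Python as below. This is only an illustrative sketch: NOTES and quiz are hypothetical names, and the two questions are examples rather than a curated deck. Each answer stays hidden until you’ve attempted the question, and the order is shuffled so that a question’s position in your notes gives no hints.

```python
import random

# Hypothetical study notes, phrased as questions rather than statements.
NOTES = {
    "How are neural networks applied in real life?":
        "Image classification, object detection, ...",
    "What does an activation function add to a neural network?":
        "Non-linearity, so the network can model non-linear relationships.",
}


def quiz(notes, ask=input, seed=None):
    """Quiz yourself on every question in random order.

    The answer is printed only after you have tried to recall it.
    Returns the questions you marked as missed, to revisit next session.
    """
    questions = list(notes)
    random.Random(seed).shuffle(questions)  # shuffled, so note order gives no hints
    missed = []
    for question in questions:
        ask(f"{question}\nYour answer from memory: ")  # recall first...
        print(f"Your notes say: {notes[question]}")     # ...then check
        if ask("Did you get it? [y/n] ").strip().lower() != "y":
            missed.append(question)
    return missed
```

Call quiz(NOTES) in an interactive session; the returned list is what to revisit next time. The ask parameter defaults to input but can be replaced with any function that takes a prompt and returns a string.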

II) Free recall

After the study session, sit down with a piece of blank paper and write down everything you can remember from what you’ve learned, in as much detail as possible.
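One crude way to score a free-recall session, sketched in Python under the assumption that you keep a short list of key terms from your notes, is to check which of those terms appear in what you wrote from memory. recall_coverage is a hypothetical helper, and plain substring matching will miss synonyms and paraphrases, so treat the score as a rough signal only.

```python
def recall_coverage(recalled_text, key_terms):
    """Return (fraction of key terms you mentioned, list of terms you missed).

    Crude check: case-insensitive substring matching, so synonyms and
    paraphrases are not recognised.
    """
    if not key_terms:
        raise ValueError("key_terms must not be empty")
    text = recalled_text.lower()
    missed = [term for term in key_terms if term.lower() not in text]
    return 1 - len(missed) / len(key_terms), missed


# Example: key terms pulled from your own notes on neural networks.
key_terms = ["activation function", "backpropagation", "gradient descent", "loss function"]
recalled = (
    "Networks stack layers, and each layer applies an activation function. "
    "Training uses backpropagation to compute gradients."
)
score, missed = recall_coverage(recalled, key_terms)
print(f"Covered {score:.0%}; revisit: {missed}")
```

Here the page mentions two of the four terms, so "gradient descent" and "loss function" are the gaps to revisit before your next session.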

III) Test everything that you have learned

Finally, for a more concrete recall-based challenge, set yourself a task that will test everything you’ve learned in your data science project so far. The advantage of this approach is that you don’t need to waste time recalling general aspects of your subject that don’t apply directly to your intended project; rather, you’ll recall specific skills and concepts in a targeted way as you need to use them.

Mastered retrieval? It’s time to get comfortable with feedback.


4. Feedback

Photo by Adam Jang on Unsplash.

No matter what level of expertise you’re at, you need to seek out feedback on your progress if you want to improve. Moreover, you need to learn how to distinguish between different levels of feedback and acquire strategies for eliciting feedback.

Almost all feedback is useful, but not all feedback is created equal. It’s helpful to divide feedback into three different categories.

Outcome feedback

This feedback validates that you’ve reached the desired outcome. Imagine you’re giving a presentation of your data, and your clients fully comprehend the result and applaud you for your work. That’s outcome feedback. It can be encouraging, but it’s hard to glean any more information from this type of feedback.

Informational feedback

This feedback gives you more to work with, as it alerts you to your mistakes. Imagine you made a mistake, your analysis turned out to be entirely wrong, and the lead data scientist pulled you off the project and handed it to someone else. This kind of feedback tells you that something went wrong without telling you how to fix it, which still makes it useful for highlighting problem areas and isolating your mistakes.

Corrective feedback

This is the best feedback, as it tells you what you’re doing wrong and how to fix it. This is where the lead data scientist gives you notes on what went well, what didn’t land, and how you can improve. In this scenario, you are given corrective feedback that’s constructive and helps you develop and grow.

“I will take every constructive criticism, make it my own, learn from my mistake, and go forward” — Julie Payette

When sorting through your feedback, focus on corrective feedback over informational feedback and informational feedback over outcome feedback.

How do you ensure you’re receiving enough feedback in the first place? Start by remembering to fail for feedback. If you’re not extending yourself to the point where you fail, you stop yourself from getting useful informational or corrective feedback. Pushing beyond your limits will elicit helpful feedback. Acting on that feedback will, in turn, extend your limits.

Don’t neglect to seek meta-feedback, either. It’s important to seek feedback on how well your learning methods are working. A simple way to test your learning methods is to track your learning rate — try timing how long it takes you to clean your data, for example. If your learning rate isn’t tracking upward, act on this negative feedback by revisiting your learning methods.
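Tracking your learning rate can be as simple as logging how many minutes a repeated task takes in each session and checking the trend. The sketch below, in Python, fits a least-squares slope to those times; trend_per_session and is_improving are hypothetical helpers, and a negative slope (the task getting faster) is taken as the sign that your learning rate is going up.

```python
from statistics import mean


def trend_per_session(minutes):
    """Least-squares slope of task time against session number (minutes per session)."""
    n = len(minutes)
    if n < 2:
        raise ValueError("need at least two sessions to see a trend")
    xs = range(n)  # session numbers 0, 1, 2, ...
    x_bar, y_bar = mean(xs), mean(minutes)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, minutes))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def is_improving(minutes, threshold=0.0):
    """True if the task is getting faster by more than `threshold` minutes per session."""
    return trend_per_session(minutes) < -threshold
```

For example, logging 60, 52, 47, and 40 minutes over four data-cleaning sessions gives a slope of -6.5 minutes per session. A slope near zero over several sessions is the kind of negative feedback that should prompt you to revisit your learning methods.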

By eliciting feedback and prioritizing corrective and informational feedback, you can constantly adjust and improve your performance.


5. Retention

Photo by Robina Weermeijer on Unsplash.

Your ultralearning journey to becoming a data scientist may not require much memorization, since almost anything can be looked up these days, but memorizing some facts, formulas, and procedures will still speed up your work and make it more effective.

So, to make what you learn stick, the most productive strategy is to settle on a memorization system and use it at regular, spaced intervals throughout your journey. The key is to use a memorization system that’s both easy to integrate into your project and well suited to the type of project. For data science, for example, a semantic-network style of memorization works well, because the field is diverse and draws its concepts from many different subjects. A semantic network lays the foundation of each complex topic and lets you attach new information to it as you learn, which strengthens retention because the foundation is already there.
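To make the semantic-network idea concrete, here is a small Python sketch that stores concepts as nodes, links each new concept to ones you already know, and uses breadth-first search to show how a topic from one subject connects to another. ConceptNetwork and its methods are hypothetical names for illustration, not an established tool.

```python
from collections import deque


class ConceptNetwork:
    """A tiny semantic network: concepts are nodes, related concepts are linked."""

    def __init__(self):
        self.links = {}  # concept -> set of directly related concepts

    def add(self, concept, related_to=()):
        """Add a concept and link it (in both directions) to concepts you already know."""
        self.links.setdefault(concept, set())
        for other in related_to:
            self.links[concept].add(other)
            self.links.setdefault(other, set()).add(concept)

    def neighbors(self, concept):
        return sorted(self.links.get(concept, ()))

    def path(self, start, goal):
        """Breadth-first search for the shortest chain of links from start to goal."""
        if start not in self.links or goal not in self.links:
            return None
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                chain = []
                while node is not None:  # walk back to the start
                    chain.append(node)
                    node = previous[node]
                return chain[::-1]
            for nxt in sorted(self.links[node]):
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        return None


net = ConceptNetwork()
net.add("linear algebra")
net.add("statistics")
net.add("eigenvectors", related_to=["linear algebra"])
net.add("variance", related_to=["statistics"])
net.add("PCA", related_to=["eigenvectors", "variance"])
print(" -> ".join(net.path("linear algebra", "statistics")))
```

In this example the path runs from linear algebra to statistics through PCA, which builds on both eigenvectors and variance. Each concept you add gives later ones more to attach to.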


Action Plan

  1. Fast-track from theory to practice by doing and immersing yourself in the learning process.
  2. Use drilling to hone your skills to perfection or the copycat method to learn from the best.
  3. Use recall strategies to retrieve the information you’ve learned, constantly posing questions and testing yourself throughout your ultralearning journey.
  4. Focus on corrective feedback that helps you identify weaknesses and improve your performance.
  5. Apply smart, strategically spaced memorization sessions using a semantic network of memory that ensures what you learn sticks.

Original. Reposted with permission.