Data Science 101: How to get good at R
Everybody talks about R programming: how to learn it and how to get good at it. In this article, Ari Lamstein shares why and how he started with R, and how he went on to publish, market and monetise an R project.
By Ari Lamstein.
Recently a few people in my membership site have asked me the following question:
How can I get good at R?
This has come up enough times for me to outline my thoughts on the subject. That way I can simply forward people to this post the next time the question comes up.
My advice is geared towards people who want to build an online portfolio that improves their career. This is the same path that I took, and in some ways I’m just saying what worked for me.
My advice is to stop thinking about your aptitude with R at all. Instead, just view it as a tool, a means to an end. This will allow you to shift your focus to:
- Picking a project that you care about
- Publishing your results from that research
- Communicating directly with people who value your results
In my experience, this change in focus changes everything.
Each of these points could fill an entire book. Today I will simply share how they relate to my own journey learning R. I’ll outline how I went from “R as a side project” to “Career as an R developer”. That’s often what people who ask me this question really want to know.
Why and How I Learned R
I did not set out to learn R.
Several years ago I was working as a software engineer at a real estate company. I was working with sales leads and needed to analyze the data.
MySQL was a great place to start my analyses, but I needed to do more. I started learning R in the hopes that it could help me better analyze the data. It did.
As for how I learned R, I read just about everything I could find. There was no single resource that told me exactly what I needed, and much of the material detailed techniques that were not useful to me at all. In retrospect, Winston Chang’s R Graphics Cookbook and the Coursera Data Science courses are the only resources that I can remember by name now.
At this stage, the three points above were already met: I had a full-time job, I was analyzing work-related data, and I was sharing the results with my team.
At some point I realized that I had pretty custom analytical needs. My primary analytical unit was the zip code, which is itself unusual. And I was interested in mashing up our internal data with demographic data from external sources, such as the US Census Bureau. I couldn’t find anything in the R ecosystem that could do exactly what I needed, so I built it myself. This project wound up becoming choroplethr.
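To make the idea of a zip-code "mash-up" concrete, here is a minimal sketch in base R. It is not how choroplethr works internally. The data frames, column names and numbers are all made up for illustration and are not real Census figures.

```r
# Hypothetical internal data: one row per sales lead, keyed by zip code.
# Zip codes are stored as character strings so leading zeros are preserved.
leads <- data.frame(
  zip   = c("94103", "94103", "94110", "10001"),
  price = c(500, 700, 650, 900),
  stringsAsFactors = FALSE
)

# Hypothetical external demographics (made-up values, not real Census data).
demographics <- data.frame(
  zip        = c("94103", "94110", "60601"),
  population = c(1000, 2000, 3000),
  stringsAsFactors = FALSE
)

# Aggregate the internal data to the analytical unit: one row per zip code.
leads_by_zip <- aggregate(price ~ zip, data = leads, FUN = mean)
names(leads_by_zip)[2] <- "mean_price"

# Left join: keep every zip code that has leads and attach demographics
# where they exist. Zip codes without a match get NA for population.
combined <- merge(leads_by_zip, demographics, by = "zip", all.x = TRUE)

print(combined)
```

Using `all.x = TRUE` keeps zip codes that have internal data but no demographic match, which makes gaps in the external source easy to spot.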
Since I’ve been a software engineer for most of my career, everything up until now was, in a sense, “normal” for me. It’s the following sections where I really stepped out of my comfort zone.
Publishing the Project
Despite working as a software engineer for 10+ years in San Francisco, I had never created an open source project before. R has a rich tradition of this through its package system, and it seemed like a great opportunity for me to try something new.
Moving the project from internal use to something that others could use “off the shelf” was a lot of work. I genuinely wasn’t sure if anyone else would use the project. But at the time I thought “If this winds up helping at least one other person, it will have been worth it.”
There isn’t an accurate way to measure package installs in R. But at the time of this writing, and according to the metacran/cranlogs app, the main choroplethr package has been installed 39,000 times.
Marketing the Project
When I first published choroplethr, marketing was a four-letter word to me. Now I’ve come to embrace it.
Marketing means different things to different people. But for me it mostly means:
- Finding people who are either using choroplethr, or might consider using it
- Finding out why they are using it. How does it fit into the broader context of their project?
- Finding out what other problems they have that I might be able to help with.
I mostly use content marketing for choroplethr. In practice this means that I have a blog, and a way for people to subscribe to my email list.
The email list is important. Most people who visit a website leave and never come back. If you have their email address, you can talk with them about the above points at a later date.
My primary email opt-in is Learn to Map Census Data in R, my free email course on how to use choroplethr. At this point it has been taken by several thousand people.
Monetizing the Project
Before launching a product, it’s best to have potential customers lined up. Ideally you’ve had a few interviews with them via Skype, surveyed them or at least been in touch with them via email. In practice, these prospective customers will already be on your email list. This is why I list “Market” before “Monetize”.
I have monetized choroplethr in two ways:
- Creating courses on how to use it, and general education around the broader topic
- Providing training and consulting services around the project and broader topic
I found that creating my first course changed everything for me. In addition to direct sales from the course, having a paid, educational product resulted in higher quality consulting leads.
The first course, which I wasn’t sure if anyone would buy, grossed several thousand dollars.
Advice to Beginners
I share my own story only so that it can help the beginners who ask me “How can I get good at R?”
My answer is that it’s a long process. But two things you can do today are:
- Pick a topic that interests you.
- Create a blog, and write your first post. That post can introduce you and say what you are planning to research.
The topic, of course, can change. But I recommend trying to stick with a single topic for a minimum of 3 months. Having deep knowledge about a specific niche will make you more valuable to people who are seeking information about that niche.
If you have any questions on what I outline above, or would like me to write future articles about a particular part of it in depth, feel free to contact me through the “Contact” button below.
If you’d like me to give you personalized help applying this material to your own situation, consider joining my membership site.
Original post. Reposted with permission.
Bio: Ari Lamstein is a Software Engineer and Data Analyst. He helps clients with software engineering and data analysis projects, writes open source software and runs training workshops. See arilamstein.com for details.