Emacs for Data Science
Data science nowadays demands a polyglot developer, and choosing the right code editor is a worthy investment. Here we describe important features of Emacs and its advantages over other editors.
By Robert Vesco.
If you want an editor that works with R, Python, SAS, Stata, SQL and almost any other data science language; if you want an editor with IDE-like features; if you want an editor that works on any platform as well as in the terminal; if you're a fan of literate programming; if you want an editor that is highly customizable and will be around after most editors have come and gone, then you'd be hard-pressed to find anything better than Emacs.
If you work exclusively in R, you might want to work in RStudio. If you work in Python, you might be tempted by Spyder. Chances are there is a specialized IDE for whatever language you typically work in. But that's the rub. What if you want to work in another language? Or combine languages? You end up using several IDEs without knowing any of them well. Plus, once they fall out of favor or stop being updated, your hard-gained knowledge is lost. At the other end of the spectrum there are text editors like Notepad++ and Sublime Text. These work with just about any language you can imagine, and with some add-ons you can get additional features, but they tend to be limited to certain platforms and customization is often non-trivial.
A modern data scientist often has to work on multiple platforms with multiple languages. Some projects may be in R, others in Python. Or perhaps you have to work on a cluster with no GUI. Or maybe you need to write papers with LaTeX. You can do all that with Emacs and customize it to do whatever you like. I won't lie, though: the learning curve can be steep, but I think the investment is worth it.
Below are some key features that I think make Emacs an excellent editor for any data scientist.
For most programming languages, you get out-of-the-box syntax highlighting. Packages like ESS and Elpy provide additional features like autocompletion, documentation and debugging capabilities. The number of IDE features available will vary by language, but at minimum there is probably syntax highlighting and some form of autocompletion.
Figure 1: "Autocompletion"
One of the things that I enjoy is easy access to help and function parameters … which often also come with autocomplete.
Figure 2: "Help for Functions"
Figure 3: "Parameter help for Function"
Enough with the print statements already. Debug that R and Python code!
Figure 4: "Interactive debugging with conditional breakpoint"
One of the features that first sold me on Emacs was interactive commands. With a keyboard shortcut you can send a buffer, function, paragraph or line to the interpreter. Let me be clear: you don't even have to highlight the code. This saves you a ton of time when you're doing statistical analysis [1].
Figure 5: "Interactive Commands"
Do you work with databases? Many of the same benefits mentioned above also apply to SQL. Work with SQLite, PostgreSQL, MySQL and other databases interactively. Do you have a long SQL statement you are debugging? No problem. Iterate quickly.
Figure 6: "Interactive SQL"
Org mode / Literate Programming
Do you write publications? Do you want to keep your code and paper together? Are you a believer in reproducible research? With Emacs you can put any language you want in your document. While RStudio allows this also, you're limited to just R and LaTeX.
Figure 7: "Literate Programming: Code & Stata"
Do you need LaTeX? No problem.
The key to this magic is a monster package called Org mode. It is one of Emacs's killer features. You can also use this to organize your code... or your life.
Sometimes you need to remote into a server. Or perhaps you are working on a cluster with no GUI and you need to interactively debug your scripts.
Figure 8: "Works in the terminal just as well"
Interacting with the shell
Is there a terminal command you wish you could run? In Emacs you can run terminal commands easily. But what makes this feature super cool is that it can operate on your text. You can select a region of code, send it to a terminal command and have the command's stdout replace the text in your buffer!
Figure 9: "Using SED to find and replace text in the buffer"
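To make the idea concrete, here is a minimal Python sketch of what "send a region to a command and replace it with stdout" amounts to. It is only an illustration, not how Emacs implements it (in Emacs the built-in command is shell-command-on-region, bound to M-|, and a prefix argument makes the output replace the region). The function name filter_region and its parameters are made up for this sketch.

```python
import subprocess


def filter_region(buffer: str, start: int, end: int, command: list[str]) -> str:
    """Pipe buffer[start:end] through `command` and splice its stdout back in.

    `start` and `end` are character offsets into the buffer, and `command`
    is an argument list such as ["sed", "s/foo/bar/g"].
    """
    region = buffer[start:end]
    # Feed the region to the command on stdin and capture what it prints.
    result = subprocess.run(
        command,
        input=region,
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    # Text outside the region is left untouched.
    return buffer[:start] + result.stdout + buffer[end:]
```

With sed installed, calling filter_region(text, 0, len(text), ["sed", "s/foo/bar/g"]) would run the substitution over the whole text, while a narrower start and end would restrict it to the selected region.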
Data scientists often work with tabular data. Sometimes you may want to delete or move a column around. Or perhaps there is a block of whitespace you need to change.
Figure 10: "Using rectangle mode to alter blocks of text"
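For readers who have not seen rectangle editing before, the operation is roughly the following, sketched in Python under the assumption that the text is a list of lines and columns are counted from zero. In Emacs itself this is a single keystroke on a marked rectangle (C-x r k runs kill-rectangle). The function name kill_rectangle_sketch and its parameters are invented for this illustration.

```python
def kill_rectangle_sketch(lines, first_line, last_line, left_col, right_col):
    """Remove columns left_col..right_col-1 from lines first_line..last_line.

    Line indices are inclusive; the column range is half-open.
    Lines outside the rectangle are returned unchanged.
    """
    result = []
    for index, line in enumerate(lines):
        if first_line <= index <= last_line:
            # Keep everything left and right of the rectangle on this line.
            result.append(line[:left_col] + line[right_col:])
        else:
            result.append(line)
    return result


table = [
    "id  name  age",
    "1   ann   31 ",
    "2   bob   45 ",
]
# Drop the "name" column (columns 4 through 9) from every row.
without_name = kill_rectangle_sketch(table, 0, 2, 4, 10)
```

The point of the Emacs feature is that you get this without writing any code: mark two corners and the rectangle between them is what gets cut, pasted or overwritten.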
Everything at your fingertips
Emacs has numerous packages that allow you to search and find files, functions and anything else that you can imagine. But by far the best is Helm. With just a few keys you can instantly find what you are looking for. I couldn't do it justice, but this demo gives you a taste of the amazing things it can do.
Any feature you want
Perhaps you're wedded to Sublime's multiple cursors? You can get them: http://emacsrocks.com/. Or perhaps you're a longtime Vim user? Evil Mode gives you the editing power of Vim with the utility of Emacs. If you're a git user, Emacs has Magit, which makes working with git a joy. If there is something it doesn't have, check for packages; failing that, Emacs is the most customizable editor you will find. Almost everything about it can be made to fit your particular workflow.
30+ years old and a large user base
Emacs has been around a long time. Code that was written a decade ago mostly still works, and every year it gets better. Emacs 24 in particular is amazing. If you tried Emacs years ago, you should give it another try. It now has package management built in, so you can easily add and try out packages. Importantly, there is no sign that Emacs is going away anytime soon, and it's free. It will likely be around for at least another decade, if not more.
So what are the downsides?
Legacy code on the intertubes confuses people
Emacs has been around a long time. Emacs 24 was a huge improvement, but it also broke a lot of things. The same goes for Org mode between versions 7 and 8. A lot of older material on the intertubes will lead you astray and leave you frustrated if you're not aware of this.
Emacs-lisp for customization
I actually enjoy working with Lisp because it is so different from the other languages I work with. However, many others would prefer using a language like Python.
Not bro/noob friendly
Emacs is not like the cheerful, always smiling bro with the Abercrombie face. First encounters can be painful and awkward. It's not Sublime. That said, there are several starter packages that enable useful features out-of-the-box. For scientists, Kieran Healy's starter package might be useful: http://kieranhealy.org/resources/emacs-starter-kit/
Another useful package is Prelude: https://github.com/bbatsov/prelude
If you're on a Mac, I've heard Aquamacs will keep you warm and comfy: http://aquamacs.org/
Most of these will give you the power of Emacs, quickly. Personally, I prefer to build my Emacs up from scratch so it does what I want it to do and no more. But these packages are great ways to get a feel for its power.
If you decide you want to work with Python, be prepared to experiment with lots of different Python packages. While Emacs has basic Python support, you probably want linting, refactoring or other useful features. Many packages have tried to implement these features, some better than others. Personally, I like Elpy, but it's not perfect. The downside of having options is that you have to wade through them. It can be painful sometimes.
What am I missing?
While I tried to include most of the features that I think would appeal to data scientists, let me know if I missed any killer feature and I'll try to include it here. https://twitter.com/robertvesco
[1] Like many other features, this will depend on the package you install. That said, it's easy to implement this feature for your favorite language.
Bio: Robert Vesco is a doctoral candidate transitioning back to industry.