Gold Mine or Blind Alley? Functional Programming for Big Data & Machine Learning

Functional programming is touted as a solution for big data problems. Why is it advantageous? Why might it not be? And who is using it now?

I have not yet read James Joyce's Ulysses. The majority of the population also has not. However, of those who have, it seems that most laud it. It is so universally praised that even people who have never read the book reference it as an archetypal masterpiece. Presumably, reading it doesn't require a special skill set beyond reasonable literacy. And yet the set of people who have read it remains small relative to those who know of it and could potentially read it.

Periodically, technologies emerge which fit this pattern. A small fervent group of devotees evangelize it, while a far broader population knows it by name. This broader group occasionally feigns familiarity, and may possess a loose sense that it is important, but really has no idea why.

One such technology is functional programming. Functional programming languages like Haskell and OCaml boast several battalions of devoted acolytes. The broader population possesses a loose notion that functional programming is useful for big data, but mostly doesn't use it In this article I'll provide an overview of the advantages and disadvantages of functional programming for machine learning and data science. Then I'll consider whether its purported benefits are currently exploited by practitioners and academics in these areas.

What is Functional Programming

A thorough treatment of any specific functional language is outside of the scope of this post. Instead I'll introduce some of the most important aspects that most such languages share.

Functional programming is a paradigm which rejects several aspects of its opposite, imperative programming, and introduces a few new ideas, several of which have been subsequently adopted by many popular imperative programming languages.

What You Can Do

First-Order Functions
Functional programming supports first order functions. These functions can be passed as arguments to other functions, can be spun up anonymously, and can be composed of other functions. This can be convenient for machine learning algorithms when you may want to pass functions as parameters such as different regularizers, update rules, or even learning algorithms altogether. One simple but useful application of composing functions from other functions might be quickly defining a mixed-norm regularization function. Another might be defining the computation that takes place at any node in a neural network. While first-order functions are useful, it is widely available in many other languages which are not purely functional. Scala, Julia, and Python all support first-order functions.

Type System
The functional programming community has created rich and powerful type systems. Unlike the simpler systems in lower-level languages like C, or invisible type systems in languages like Python, functional languages like Haskell provide an easy way to create new types, alias types, and even effect polymorphism via types. This can be especially useful when implementing machine learning algorithms. Consider a complicated neural network architecture in which each example has real valued features (floating point numbers), each edge in the graph has a weight (also a float), each node has both an input (float) and activation (float between 0 and 1). Then when back-propagating the error, each node has a delta value (float), and each node an approximate gradient component (float). Unfortunately, it is exceedingly easy to mess up, when implementing such an algorithm, taking input to a layer, instead of activation for example, and this might never create an error in the C language as long as no out of bounds errors arise. A language like Haskell, however, makes it easy to define separate types for gradients, activations, deltas, weights, etc. and to guarantee that a function operates only on the correct types. As a result, many programming errors that would otherwise seep through and require manual debugging can be caught by the compiler. A data science toolbox end-user may not benefit directly, but anyone implementing algorithms themselves clearly would. More expressive type systems, inspired by functional languages have grown in popularity. Julia, for example offers a rich type system in an otherwise imperative language.

What You Cannot Do

Equally as important as what functional programming languages enable is what they expressly prohibit. So-called pure functional languages forbid state and traditional iteration structures. Absent side effects, all computation is done by evaluating functions, and these functions should behave the same way any time they are called with precisely the same arguments. Functional programs forbid assignment. It would be impossible to write the following program in a purely functional language:

x = 5

x = 10

It is therefore impossible to write loops in the iterative way (which involve incrementing a counter or repeatedly testing a condition). Instead all loops must be implemented with recursion. This might seem absurdly restrictive. Why would we want to give up assignment? Further, the real world, and the models we are training presumably do have evolving state. Who are we trying to fool by pretending that they do not?