Silver BlogStatistics with Julia: The Free eBook

This free eBook is a draft copy of the upcoming Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence. Interested in learning Julia for data science? This might be the best intro out there.



The majority of debates, discussions, and flame wars regarding the programming languages for data science tend to focus on Python and R. While these may be the 2 most used languages in the space, that doesn't mean that they are the only options available, nor does it mean that they are even necessarily the "best" choices. One additional option, among many, is Julia, a fast, dynamic, open source general purpose programming language which is seen as a desirable data science skill to consider adopting.

One of the best introductions that I have encountered for using the language for data science is the book Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence, written by Yoni Nazarathy and Hayden Klok, and currently in draft form. The book's website can be found here, while its accompanying code examples are available in this GitHub repository.



The first question you may have at this point is "why Julia?", in the context of there being other more widely embraced options out there. It's best to address this before moving on, and we do so with this excerpt from the book's first chapter.

Julia is first and foremost a scientific programming language. It is perfectly suited for statistics, machine learning, data science, as well as for light and heavy numerical computational tasks. It can also be integrated in user-level applications, however one would not typically use it for front-end interfaces, or game creation. It is an open-source language and platform, and the Julia community brings together contributors from the scientific computing, statistics, and data-science worlds. This puts the Julia language and package system in a good place for combining mainstream statistical methods with methods and trends of the scientific computing world.

With that out of the way, we can move on to why this book in particular is a great way to learn Julia for the application of data science. Again, from chapter one of the book:

Question: Do I need to have any statistics or probability knowledge to read this book?
Answer: Statistics or probability knowledge is not pre-assumed. Hence, this book is a self-contained guide for the core principles of probability, statistics, machine learning, data science, and artificial intelligence. It is ideally suited for engineers, data-scientists, or science professionals, wishing to strengthen their core probability, statistics, and data science knowledge while exploring the Julia language. However, general mathematical notation and results including basics from linear algebra, calculus, and discrete mathematics are used.

Question: What experience in programming is needed in-order to use this book?
Answer: While this book is not an introductory programming book, it does not assume that the reader is a professional software developer. Any reader that has coded in some other language at a basic level, will be able to follow the code examples and their descriptions.

That answers the 2 usual questions you might have before starting a book on programming for statistics: is knowledge of programming a prerequisite; is knowledge of statistics a prerequisite? The answer for both of these ends up being no, making this a resource truly befitting a beginner.

The book's table of contents, including appendices:

  1. Introducing Julia
  2. Basic Probability
  3. Probability Distributions
  4. Processing and Summarizing Data
  5. Statistical Inference Concepts
  6. Confidence Intervals
  7. Hypothesis Testing
  8. Linear Regression and Extensions
  9. Machine Learning Basics
  10. Simulation of Dynamic Models
  1. How-to in Julia
  2. Additional Language Features
  3. Additional Packages


The book's section 1.3, Crash Course by Example, is a great place to start your Julia voyage if you are inexperienced with the language. After a quick once-over of the linguistic basics in the previous section, you jump right into coding with some non-trivial examples, including bubble sort, string manipulation, JSON parsing, and Markov chain steady states.

As this chapter marches on, plots and graphics are covered, as is random number generation and Monte Carlo simulations, followed by integrating Julia with other languages. The book gets into statistical concepts in the next chapter, and from that point onward the concepts build upon one another, leading up to more advanced topics such as statistical inference, confidence intervals, hypothesis testing, linear regression, machine learning, and more.

This is the resource I've been waiting for to effectively learn Julia for data science the way I have been wanting to learn it. I hope you're as excited as I am to get moving on your journey.