Contributing to PyTorch: By someone who doesn’t know a ton about PyTorch
By Cami Williams, Open Source Developer Advocate Lead at Facebook
Hello, friends. My name is Cami. Welcome to my mind dump.
For those of you who don’t know me, I am an ex-software engineer turned developer advocate. I recently started a new role at Facebook as the pillar lead Developer Advocate for ML/AI (I know, I am fancy). For those of you who don’t care about titles, basically I have joined FB to help with the PyTorch developer community.
At this point, it is worthwhile saying that up until now I have never used PyTorch. Ever. In my life. Sure I had heard about it, watched some videos about it, but experimenting? No. Reading the source code? Nah. Contributing to it? You already know the answer.
That is, until .✫*ﾟ･ﾟ｡.☆.*｡･ﾟ✫* LAST WEEK .✫*ﾟ･ﾟ｡.☆.*｡･ﾟ✫*.
I got the privilege to go sit with some of the engineers and researchers working on PyTorch to help me get ramped up. Beforehand, I had started the PyTorch Udacity class online. By this point I had only made it through lesson 2, which gives you a nice overall understanding of neural networks. By the end of my week with the team, I managed to proudly cut two PRs on GitHub. I decided that I would write a blog post to knowledge share, not just to show that YES, you can too, but also so that I wouldn’t forget everything that I have learned over the Labor Day weekend.
Things to know about PyTorch
One of the first meetings I had with the eng team was to give me a brief overview of PyTorch and how it works. What resulted was a beautiful whiteboard:
By this figure you can have a clear understanding of PyTorch. Now let’s move onto the code…
Just kidding. You are probably wondering what all the circles, buzzwords, and that random AWS means. Allow me to give you a brief overview based upon my understanding from that:
- PyTorch is a deep learning framework for fast, flexible experimentation. It’s a Python-based scientific computing package living on top of a C++ backend API.
- The Python “front-end” of PyTorch has three distinct sections:
torch: A package containing data structures for multi-dimensional tensors and mathematical operations on them.
torch.nn: Building blocks for creating and training neural networks. The data passed into these modules takes the form of tensors. For example: when training a “convolutional neural network” for images (ok if you don’t know what that is!) you can use the “nn.Conv2d” module (I got this example from my coworker Seth).
torch.optim: Optimization algorithms used for training the neural networks. For example: algorithms such as SGD (for beginners) or more advanced ones such as Adam (another great example from good ol’ Seth).
- Tensors in PyTorch are similar to NumPy arrays. The thing with tensors though is they can range from being a single number, to a 1-D array (vector), to an N-dimensional structure that is full of crazy data.
- Taking the gradient of something is taking the derivative of something (that’s right kids, machine learning is calculus). You take the gradient of a tensor to help you figure out what you need to do to minimize error.
- Gradient descent is an algorithm that allows us to minimize error efficiently. The error is determined by our data. We have data that is properly classified and improperly classified. We take the gradient to decrease the number of improperly classified items.
- The most important thing to know about tensors is that they can keep track of gradients automatically (when created with requires_grad=True). The data within a tensor can represent, for example, the weights on the connecting edges in a neural network. Tensors have 3 properties worth mentioning:
rank: Identifies the number of dimensions of the tensor. Ex: a vector has a rank of 1.
shape: The number of rows and columns it has. Comes back as a
type: Data type assigned to the tensor’s elements.
- The training loop for your neural network in PyTorch starts with loading a batch of data, running it through the neural network, comparing the result against the expected answers to measure the error, taking the gradient of that error, and then using an optimizer to update the network’s parameters so the error shrinks. Repeat for the entire dataset you want to train your neural network on.
- The C++ “backend” has five distinct sections:
Autograd: Records operations on tensors to form an autograd graph. Basically calculates derivatives of the operations performed on a gradient-enabled tensor.
ATen: The tensor and mathematical operation library.
TorchScript: An interface to the TorchScript JIT compiler and interpreter.
C++ Frontend: High level constructs for training and evaluation of machine learning models.
C++ Extensions: A means of extending the Python API with custom C++ and CUDA routines.
- CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by Nvidia. Okay, I got that part from Wikipedia, but basically CUDA is used to do things on the GPU.
- A GPU is a graphics processing unit. Think of it as just a big computer we can compute big things on.
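To make the tensor and gradient bullets above a little more concrete, here is a minimal sketch (assuming a working PyTorch install; the variable names are just for illustration). It prints the rank, shape, and data type of a few tensors, then lets autograd take the derivative of a toy error and uses it for one step of gradient descent.

```python
import torch

# A 0-dim tensor (a single number), a 1-D tensor (vector), and a 2-D tensor.
scalar = torch.tensor(3.0)
vector = torch.tensor([1.0, 2.0, 3.0])
matrix = torch.zeros(2, 3)

for t in (scalar, vector, matrix):
    # dim() is the rank, shape describes each dimension, dtype is the element type.
    print(t.dim(), t.shape, t.dtype)

# Ask autograd to track operations on x so it can compute derivatives.
x = torch.tensor(2.0, requires_grad=True)
error = (x - 5.0) ** 2   # a toy "error" that is smallest at x = 5
error.backward()         # compute d(error)/dx
print(x.grad)            # 2 * (x - 5) = -6 at x = 2

# One step of gradient descent: move x against the gradient.
with torch.no_grad():
    x -= 0.1 * x.grad    # x moves from 2.0 toward 5.0
print(x)
```

The learning rate of 0.1 is an arbitrary choice for the sketch; the point is only that the gradient tells you which direction reduces the error.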
“That wasn’t brief, Cami, my comprehension tubes are clogged.” I know, I am sorry, but trust me reading that will help. Here is an obligatory I am learning something gif.
Also, this post was not intended to help you understand the ins and outs of PyTorch. I am sharing this to help you get started contributing to the PyTorch open source repo on GitHub.
Within this repo, you can already deduce what a lot of the folders contain based upon the information above. There is also a great README that gives more information on neural networks, tensors, etc.
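Before moving on to setup, the training loop described in the list above can be sketched in a few lines. This is a toy example under stated assumptions (a made-up dataset for y = 2x + 1, a one-neuron model, and an arbitrary learning rate), not how any particular project trains its models, but it touches torch, torch.nn, and torch.optim in the same order the bullets do.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)  # make the run repeatable

# Toy dataset: learn y = 2x + 1 from 20 points.
inputs = torch.linspace(-1, 1, 20).unsqueeze(1)
targets = 2 * inputs + 1

model = nn.Linear(1, 1)                            # a one-neuron "network"
loss_fn = nn.MSELoss()                             # how we measure error
optimizer = optim.SGD(model.parameters(), lr=0.1)  # plain gradient descent

losses = []
for epoch in range(200):
    optimizer.zero_grad()                 # clear gradients from the last step
    predictions = model(inputs)           # run the data through the network
    loss = loss_fn(predictions, targets)  # compare against the known answers
    loss.backward()                       # autograd computes the gradients
    optimizer.step()                      # update the parameters
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

Swapping `optim.SGD` for `optim.Adam` is a one-line change, which is a nice way to see why the optimizers live in their own package.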
Setting up your development environment
To get started, read CONTRIBUTING.md. Before you contribute anything, read this. Read it. Do not skim. I skimmed it at first, which led to many headaches and tears.
If you can, whether you are working on a Python or C++ issue, you should work on a machine with a GPU. There are other avenues you can go down (ex: AWS DLAMI or Nvidia Docker containers) too. These will help you to run things a lot faster. My first rookie mistake when I tried contributing was that I did everything locally on my computer. This resulted in many segfaults and errors that were confusing but ultimately a result of not having a powerful enough machine. I ended up using SSH/Mosh/tmux to connect to a GPU machine I reserved and it worked really nicely. If you want help on how to set yours up, check out a couple of these posts:
- Introduction to GPU computing on HPC: GPU and SSH setup
- Make Sure that PyTorch Using GPU to Compute
- Reserving GPU memory
- Installing PyTorch and Tensorflow with CUDA enabled GPU
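Once PyTorch is installed on that machine, a quick way to confirm it can actually see the GPU is a check along these lines. It is a minimal sketch that falls back to the CPU when no GPU is visible, so it runs either way.

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# Any tensor created on that device does its math there.
x = torch.ones(3, device=device)
print((x * 2).sum().item())
```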
Now let’s dive into setting up your environment for PyTorch. When you have SSHed into your GPU, you need to do a couple housekeeping items:
- Link your GitHub account. Be sure to create an SSH key on your GPU and add it to your GitHub account.
- There is a good chance you will already have Python on your machine. Make sure it is Python 3. You probably already also have pip. If not, install both.
- Install Miniconda. It is a lot like pip. Their website doesn’t do a great job on documenting how to install it from your terminal, so this is what I did:
- Restart the terminal and check that this worked by typing
- Run `conda config --set auto_activate_base false` to prevent Conda from automatically activating (this can mess up other work on your machine).
- Run `conda activate` to activate Conda. Before you start developing, always check that conda is activated. It will say `(base)` at the beginning of your terminal line.
- Install ccache. If these steps don’t work, try doing `conda install ccache` when conda is enabled. If that doesn’t work, try
- Check that this worked by typing `ccache` and ensure it appears in your install list by typing
- Install a faster linker. Don’t do the `ln -s /path/to/downloaded/ld.lld /usr/local/bin/ld` command until you need it. I ran into issues and ultimately deleted these files because I didn’t even need them.
- Install ghstack: `pip install ghstack` or `conda install ghstack`.
- Install ninja: `pip install ninja` or `conda install ninja`.
- Set up flake8 which will be your linter by doing `pip install flake8 flake8-mypy flake8-bugbear flake8-comprehensions flake8-executable flake8-pyi mccabe pycodestyle pyflakes`. When you want to lint a file, do
- Set up an IDE that allows you to SSH into your GPU. I use Atom. See remote-ssh for instructions.
- AND FINALLY, set up PyTorch:
`python setup.py develop` is what you use to build your PyTorch code. It takes a while. After you build it once, there are ways to make this run faster. You have to build your PyTorch code whenever you edit C++, but not when you edit Python.
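With that many moving parts, a quick sanity check of the checklist above can save some tears. The script below is a sketch, not an official tool: the `check_environment` helper is hypothetical, and it only looks for the command names mentioned in the list on your PATH.

```python
import shutil
import sys

# Command-line tools named in the checklist above.
TOOLS = ["conda", "ccache", "ninja", "ghstack", "flake8"]

def check_environment(tools=TOOLS):
    """Hypothetical helper: map each tool name to its path, or None if not found."""
    return {tool: shutil.which(tool) for tool in tools}

if sys.version_info.major < 3:
    print("Python 3 is required, found", sys.version.split()[0])

for tool, path in check_environment().items():
    status = path if path else "MISSING"
    print(f"{tool:10s} {status}")
```

Anything reported as MISSING while conda is activated is worth fixing before you kick off a long build.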
IT TAKES A LOT OF GUSTO TO GET YOUR ENVIRONMENT SET UP PROPERLY. If you have done it, pat yourself on the back and get some ice cream (sorry lactose-intolerant friends, try this leaf instead: 🌿).
If you have issues, never fear, I had many. Check out “Common Errors” down below or comment on this post and I can try to help. This took me a lot of time to finally understand and get set up, so I truly wish you the best.
Time to Contribute
Now that you have set up your environment, you can start contributing. If you haven’t ever contributed to Open Source before, you should skim the Code of Conduct. A lot of repos have them; in short, be a good person and contribute good things.
Take a look at the Issues tab in GitHub. These are mostly community-member filed issues. Filter by the label bootcamp to see beginner-level issues you can work on. I recommend before starting on an issue to just read through some of these to get a sense of where certain things live. You can also update filters to look at closed bootcamp issues. If you are feeling ambitious, you can also take a look at the Pull Requests tab. Not all PRs have related issues to give more insight, so I find it easier to get the lay of the land by looking at issues.
When I first looked through the issues, the imposter syndrome immediately crept in. “I don’t have a PhD in ML… I can’t do this.” Ssshh, of course you can.
You don’t need to know everything about everything to help out. All you need is direction.
If you find an issue you want to work on, don’t be shy about commenting with questions. Even just a, “I want to take this, do you have a recommendation on where to begin?” The reviewers on the issues are there to help, and will be elated that someone has an interest in solving the issue. I did this in-person, and was directed to a couple of issues that gave me a good foundation.
Let’s talk about how I solved issues without knowing what I was solving for!
Fortunately for all of us, the people who submit issues to the PyTorch repo tend to be really thorough and know what they are talking about from previous experience. Looking at this one in particular, with the background knowledge I gained from the whiteboarding session, I knew that this lived somewhere in the Python front-end since it was an issue with accepting input. It was a little sneaky at first, because it mentioned autograd, which we know is on the C++ side.
I created a new branch, and navigated to the `torch` folder in the top-level directory since, in the example, the function in question is being called by `torch`. Wouldn’t you know: one of the first folders is labeled `autograd`. Things are intuitively named? Huzzah!
Stepping into the `autograd` folder, I used `git grep` on the phrase `grad(` to try and find the function definition. Sure enough, one of the first hits was in the `grad` function, and there were also *gasp* intuitively named variables! Could it be that all I had to do was a simple condition? Let’s try it. I knew that the objects `grad_outputs` had to have the same shape. I also knew that with NumPy there was a way to easily compare the shapes.
Boom. Now, as a good software engineer, I know that my PR would surely be accepted if I also had a test for the new code. But first I wanted to ensure that this wasn’t breaking anything. Within the CONTRIBUTING.md file, it says to run
Ran it aaaaand, it failed. Why? Well actually PyTorch doesn’t use NumPy, it RECREATES NumPy. There are just functions in NumPy that also exist in PyTorch. I found this GitHub repo that shows the translation.
Also looking through this, I am using `shape()` wrong. Reading the shape needs to happen on one tensor, not many. So I revised my code back in the `grad()` function in `__init__.py` to be:
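For a sense of what such a check can look like, here is a minimal sketch. It is not the code from the PR: the `check_grad_shapes` helper and its error message are made up for illustration, and it assumes the `outputs` and `grad_outputs` arguments that `torch.autograd.grad` accepts. The key idea from above is that the shape is read off one tensor at a time and compared pair by pair.

```python
import torch

def check_grad_shapes(outputs, grad_outputs):
    """Hypothetical helper: raise if a grad_output's shape differs from its output's."""
    for index, (out, grad) in enumerate(zip(outputs, grad_outputs)):
        # Entries of grad_outputs may be None; only compare real tensors.
        if isinstance(grad, torch.Tensor) and out.shape != grad.shape:
            raise RuntimeError(
                f"Mismatch in shape: grad_output[{index}] has a shape of "
                f"{grad.shape} and output[{index}] has a shape of {out.shape}."
            )

outputs = (torch.ones(2, 3),)
check_grad_shapes(outputs, (torch.ones(2, 3),))  # same shape: passes silently

try:
    check_grad_shapes(outputs, (torch.ones(3, 2),))
except RuntimeError as err:
    print(err)
```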
I ran the tests and it worked! No breaking changes. Now, a test to confirm the error was thrown. I looked through the `test` folder and there was a `test_autograd.py` file with a `test_grad` function. Once again, it was straightforward.
Before I nailed down what my tests should look like, I messed around by adding some print statements. Print statements are your best friend in Python. Printing random variables can also just give you more insight into what tensors actually look like and how they compare programmatically to gradient tensors. It is also good in general to read the other tests to learn how objects are created and tested on within this project. Ultimately, I ended up with:
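A test in that spirit might look like the sketch below. It uses plain `unittest` rather than the project's own test utilities, and the class and method names are hypothetical. It assumes a PyTorch build where `torch.autograd.grad` rejects a `grad_outputs` tensor whose shape does not match its output.

```python
import unittest
import torch

class TestGradShapeMismatch(unittest.TestCase):  # hypothetical test class name
    def test_grad_outputs_shape_mismatch(self):
        x = torch.ones(2, 3, requires_grad=True)
        y = x * 2
        # grad_outputs has the wrong shape, so grad() should refuse to run.
        bad_grad_outputs = torch.ones(3, 2)
        with self.assertRaises(RuntimeError):
            torch.autograd.grad(y, x, grad_outputs=bad_grad_outputs)

    def test_grad_outputs_matching_shape(self):
        x = torch.ones(2, 3, requires_grad=True)
        y = x * 2
        (grad_x,) = torch.autograd.grad(y, x, grad_outputs=torch.ones(2, 3))
        # d(2x)/dx = 2 everywhere, weighted by a grad_output of ones.
        self.assertTrue(torch.equal(grad_x, torch.full((2, 3), 2.0)))

# Run the two tests explicitly and keep the result object around.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGradShapeMismatch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Testing the happy path next to the failure path is a cheap way to show reviewers the new check doesn't reject valid input.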
Ran the tests, and nothing failed. I linted using flake8 and pushed my changes to GitHub. Then I created the PR, made sure it had the issue tagged and it had similar labels to the issue, and submitted! From there, I was fortunate to get a review pretty quickly. The commenters were very explicit in their suggestions, which made them easy to implement.
BAM. First PR done. Thankfully it was all in Python because WHO DARE edit C++ code? I. I dare. Next issue.
At first, I didn’t realize that this issue was on the C++ API. Reading it though (again, kudos to the people who write these issues), it was very detailed. Don’t be turned off by discussions happening on the issues, you can also gain a lot of information from them. Make sure before taking an issue that the discussion has had a resolution. This issue was with something that is returned. The other was with something that was inputted. Input is “front-end”, AKA Python, output is “back-end”, AKA C++.
Once again, created a new branch, and went for it. That’s what a lot of programming is… just going for it and hoping for the best. The first thing I wanted to learn is what the `flatten()` function actually does. Time for some good ol’ print statements.
I navigated back to the `test` folder and did a `git grep` on `flatten(`. Fortunately after looking through everything there was a `test_flatten` function in the `test_torch.py` file. Stepping in, I read the assertions, got a sense of what to look for, and dumped print statements:
Here was my output:
What came back basically verified the issue. A 0-dim tensor, after being flattened, should be equivalent to a 1-dim tensor. That equivalence, according to these print statements, is determined by the `torch.Size()`. A 0-dim tensor flattened is the same as a 1-dim tensor flattened.
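A minimal sketch of that kind of print-statement experiment, assuming a recent PyTorch build in which this behavior is in place (the variable names are just for illustration):

```python
import torch

zero_dim = torch.tensor(7.0)    # a single number: 0 dimensions
one_dim = torch.tensor([7.0])   # the same number as a 1-element vector

# Print the shape before and after flattening each tensor.
print(zero_dim.shape, zero_dim.flatten().shape)
print(one_dim.shape, one_dim.flatten().shape)
```

Both flattened shapes come back as `torch.Size([1])`, which is the equivalence described above.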
Now to find it in the code. I went to GitHub and searched for instances of “flatten” within the repo. In Python, they had `Flatten` modules. This was my first indication that I would have to do some C++ programming: typically you don’t want to edit a module, and there is some logic that will happen behind the scenes. There is also something called `caffe2`. You can look up Caffe2, but it basically says that it’s deprecated and a part of PyTorch, which signals to me that I probably shouldn’t edit any of the Caffe2 code.
I ended up on the file `tensor_flatten.cpp`. I thought that this had to be it: it was C++, it looked scary, and I couldn’t read it without putting my hands on my face and exhaling loudly. I probably stared at this file for an hour and went through many more cycles of imposter syndrome before asking for help.
I do not recommend this as a coping strategy. If I wasn’t in the office, I would have just commented on the issue asking if this was a good place to start. But fortunately for me, I had some nice trusty engineers sitting a few feet away from me who told me “heck no, get out of there”. Thank goodness.
After some direction, I was told that this was actually something that lived in `aten`. AH! Yes! Because ATen was the tensor and mathematical library and flatten does something ~ mathematical ~. Sure. I didn’t get it right away either. But if you don’t get help right away and are at a loss for where to start, search the tech docs for PyTorch. This indicated to me that I needed to find a function named `flatten` with the parameters `(input, start_dim, end_dim)`, returning a `Tensor`. Based upon this definition, it was clear I was looking in the wrong place.
I decided to `git grep` at the top level for `Tensor flatten(`. This directed me to the appropriate place under `TensorShape.cpp`. Wow! Readable code! This definitely feels right. I read through the code, and for complete understanding, I made sure to step into some of the functions called within the logic.
How about that, this code was also straightforward: I just needed to mimic what `flatten` does for when there was a tensor of 1-dimension, versus just returning itself:
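The real change was in C++, so what follows is only a Python paraphrase of the idea, working on shapes (plain tuples) instead of tensors. The `flattened_shape` helper is hypothetical and skips most argument validation. It just shows the special case: a 0-dim input comes out as a 1-element vector instead of being returned untouched.

```python
from functools import reduce
from operator import mul

def flattened_shape(shape, start_dim=0, end_dim=-1):
    """Hypothetical helper: the shape that flattening start_dim..end_dim would produce."""
    if len(shape) == 0:
        # A 0-dim tensor becomes a 1-element vector rather than staying 0-dim.
        return (1,)
    # Map negative indices onto positions, as Python sequences do.
    start = start_dim % len(shape)
    end = end_dim % len(shape)
    if start > end:
        raise ValueError("start_dim cannot come after end_dim")
    # Collapse dimensions start..end into a single dimension.
    merged = reduce(mul, shape[start:end + 1], 1)
    return tuple(shape[:start]) + (merged,) + tuple(shape[end + 1:])

print(flattened_shape(()))            # (1,)
print(flattened_shape((4,)))          # (4,)
print(flattened_shape((2, 3, 4)))     # (24,)
print(flattened_shape((2, 3, 4), 1))  # (2, 12)
```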
I updated my tests to include the appropriate results, and WOW! This also worked right out of the box. Linted, made a PR, got suggestions and holy heck I did it. I contributed to PyTorch.
What have I learned
It has been a while since I have contributed to open source, so I needed to remind myself of a few things:
- Don’t be afraid of the code. Smart people wrote this stuff, and (especially for a widely used project) a lot of it makes a lot of sense. It’s good to look under the hood of a project, even if you aren’t planning on contributing, to see how it’s made.
- Don’t be afraid to ask for help. The open source community was made for mind-sharing and thought leadership. Chances are you will find people excited about your involvement and willing to give you guidance.
- You are not an imposter. You don’t have to understand something completely, you just have to go for it and be willing to learn. With that mentality, you are adding value to the project.
- Read the documentation. Look to READMEs, tech docs, and forums for guidance.
- Just go for it. Even if you aren’t sure whether or not something will work, fully solves the problem, or offers a well-rounded feature, you can still try!
Now I leave you by saying, best of luck with your contributions, whether it is to PyTorch or elsewhere!
Aaaaaaand also some common errors. Cheers!
Import not found, or some import not connecting
- Make sure the import exists
- Make sure
- Rebuild PyTorch
NumPy functions don’t work
- Check to see that there is a PyTorch equivalent to what you are looking for first
- If there isn’t, create an issue to get your desired functionality into PyTorch! You can even try to build it yourself!
Not all the tests on my PR are passing
- That is okay. Breathe. You are smart.
- Sometimes the errors that appear aren’t your fault. Check the logs and see if anything comes up that seems to be from you. If there is, revisit your code.
- You don’t merge in your own PRs, someone from the PyTorch team does. They will take a look at it and probably let you know because they are nice.
Wow you read this far. Thank you so much. Nothing else to do but throw in one more gif and ask you to follow me on Twitter (@cwillycs).
Bio: Cami Williams currently works as an Open Source Evangelist at Facebook in Seattle focused on Machine Learning. She has spoken on behalf of the technical community at various conferences and workshops, including ReInvent, the Grace Hopper Celebration for Women in Computing, Consumer Electronics Show, and Game Developers Conference. Cami is proud to be an advocate for diversity and inclusion in the tech industry. During her free time, Cami loves to play board games with friends, build mechanical keyboards, and quote the Office.
Original. Reposted with permission.