Contributing to PyTorch: By someone who doesn’t know a ton about PyTorch


By Cami Williams, Open Source Developer Advocate Lead at Facebook

Hello, friends. My name is Cami. Welcome to my mind dump.

For those of you who don’t know me, I am an ex-software engineer turned developer advocate. I recently started a new role at Facebook as the pillar lead Developer Advocate for ML/AI (I know, I am fancy). For those of you who don’t care about titles, basically I have joined FB to help with the PyTorch developer community.

At this point, it is worthwhile saying that up until now I have never used PyTorch. Ever. In my life. Sure I had heard about it, watched some videos about it, but experimenting? No. Reading the source code? Nah. Contributing to it? You already know the answer.

That is, until .✫*゚・゚。.☆.*。・゚✫* LAST WEEK .✫*゚・゚。.☆.*。・゚✫*.

I got the privilege to go sit with some of the engineers and researchers working on PyTorch to help me get ramped up. Beforehand, I had started the PyTorch Udacity class online. By this point I had only made it through lesson 2, which gives you a nice overall understanding of neural networks. By the end of my week with the team, I managed to proudly cut two PRs on GitHub. I decided that I would write a blog post to knowledge share, not just to show that YES, you can too, but also so that I wouldn’t forget everything that I have learned over the Labor Day weekend.


Things to know about PyTorch

One of the first meetings I had with the eng team was to give me a brief overview of PyTorch and how it works. What resulted was a beautiful whiteboard:


Ahh yes. Scribbles.


By this figure you can have a clear understanding of PyTorch. Now let’s move onto the code…

Just kidding. You are probably wondering what all the circles, buzzwords, and that random AWS means. Allow me to give you a brief overview based upon my understanding from that:

  • PyTorch is a deep learning framework for fast, flexible experimentation. It’s a Python-based scientific computing package living on top of a C++ backend API.
  • The Python “front-end” of PyTorch has three distinct sections:

Torch: A package containing data structures for multi-dimensional tensors and mathematical operations.

Torch.nn: Modules for creating and training neural networks. The data passed into these modules comes in the form of tensors. For example: when training a “convolutional neural network” for images (it’s okay if you don’t know what that is!) you can use the nn.Conv2d module (I got this example from my coworker Seth).

Torch.optim: Optimization algorithms used for training neural networks. For example: simpler algorithms such as SGD, for beginners, or more advanced ones such as Adam (another great example from good ol’ Seth).

  • Tensors in PyTorch are similar to NumPy arrays. The thing with tensors, though, is they can range from being a single number, to a 1-D matrix (a vector), to an N-dimensional structure that is full of crazy data.
  • Taking the gradient of something is taking the derivative of something (that’s right kids, machine learning is calculus). You take the gradient of a tensor to help you figure out what you need to do to minimize error.
  • Gradient descent is an algorithm that allows us to minimize error efficiently. The error is determined by our data. We have data that is properly classified and improperly classified. We take the gradient to decrease the number of improperly classified items.
  • The most important thing to know about tensors is that they keep track of gradients automatically. The data within a tensor represents connecting edges in a neural network. Tensors have 3 properties worth mentioning:

rank: Identifies the number of dimensions of the tensor. Ex: a vector has a rank of 1.
shape: The size of each dimension (e.g., rows and columns). Comes back as a torch.Size([X]).
type: Data type assigned to the tensor’s elements.
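
If you want to poke at those three properties (and the gradient tracking) yourself, here is a quick sketch; the variable names are my own:

```python
import torch

t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

print(t.dim())    # rank: number of dimensions -> 2
print(t.shape)    # shape -> torch.Size([2, 3])
print(t.dtype)    # type of the elements -> torch.float32

# Gradient tracking: mark a tensor as requiring gradients, then backpropagate
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2     # dy/dx = 2x
y.backward()   # autograd computes the derivative
print(x.grad)  # tensor(4.), the gradient of x^2 at x = 2
```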

  • The training loop for your neural network in PyTorch starts with defining your data; you then apply the data to the neural network, perform a gradient operation to minimize error, and apply the updated parameters back to the neural network. Repeat for the entire dataset you want to train your neural network on.
  • The C++ “backend” has five distinct sections:

Autograd: Records operations on tensors to form an autograd graph. Basically, it calculates derivatives of operations performed on gradient-enabled tensors.

ATen: The tensor and mathematical operation library

TorchScript: An interface to the TorchScript JIT compiler and interpreter (JIT = “Just in time”)

C++ Frontend: High level constructs for training and evaluation of machine learning models.

C++ Extensions: A means of extending the Python API with custom C++ and CUDA routines.

  • CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by Nvidia. Okay, I got that part from Wikipedia, but basically CUDA is used to do things on the GPU.
  • A GPU is a graphics processing unit. Think of it as just a big computer we can compute big things on.
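
To tie those front-end pieces together, here is a tiny made-up training loop wiring together torch, torch.nn, and torch.optim; the model, data, and learning rate are all placeholder choices of mine, not anything from the PyTorch docs:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder data: learn y = 2x from four points
xs = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
ys = torch.tensor([[2.0], [4.0], [6.0], [8.0]])

model = nn.Linear(1, 1)                      # a tiny "network": one linear layer
loss_fn = nn.MSELoss()                       # the error we want to minimize
opt = optim.SGD(model.parameters(), lr=0.1)  # gradient descent

for epoch in range(200):
    opt.zero_grad()           # clear gradients from the last step
    pred = model(xs)          # forward pass through the network
    loss = loss_fn(pred, ys)  # how wrong are we?
    loss.backward()           # autograd computes the gradients
    opt.step()                # take a gradient descent step

print(loss.item())  # close to zero once training has converged
```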

“That wasn’t brief, Cami, my comprehension tubes are clogged.” I know, I am sorry, but trust me reading that will help. Here is an obligatory I am learning something gif.

If you want to get a really deep, well-rounded understanding of everything, take the Udacity course I mentioned before, and read through the PyTorch docs. Both do a great job of explaining things.

Also, this post was not intended to help you understand the ins and outs of PyTorch. I am sharing this to help you get started contributing to the PyTorch open source repo on GitHub.

Within this repo, you can already deduce what a lot of the folders contain based upon the information above. There is also a great README that gives more information on neural networks, tensors, etc.


Setting up your development environment

To get started, read Before you contribute anything, read this. Read it. Do not skim. I skimmed it at first, which led to many headaches and tears.

Whether you are working on a Python or C++ issue, you should work on a GPU if you can. There are other avenues you can go down (ex: AWS DLAMI or Nvidia Docker containers) too. These will help you to run things a lot faster. My first rookie mistake when I tried contributing was that I did everything locally on my computer. This resulted in many segfaults and errors that were confusing but ultimately resulted from not having a powerful enough machine. I ended up using SSH/MOSH/TMUX to get into a GPU I reserved, and it worked really nicely. If you want help on how to set yours up, check out a couple of these posts:

Now let’s dive into setting up your environment for PyTorch. When you have SSHed into your GPU, you need to do a couple housekeeping items:

  • Link your GitHub account. Be sure to create an SSH key on your GPU and add it to your GitHub account.
  • There is a good chance you will already have Python on your machine. Make sure it is Python 3. You probably already also have pip. If not, install both.
  • Install Miniconda. It is a lot like pip. Their website doesn’t do a great job on documenting how to install it from your terminal, so this is what I did:
$ with-proxy wget ""
$ chmod +x
$ bash

  • Restart the terminal and check that this worked by typing conda.
  • Run conda config --set auto_activate_base false to prevent Conda from automatically activating (this can mess up other work on your machine).
  • Run conda activate to activate Conda. Before you start developing, always check that conda is activated. It will say (base) at the beginning of your terminal line.
  • Install ccache. If these steps don’t work, try doing conda install ccache when conda is enabled. If that doesn’t work, try wget "" .
  • Check that this worked by typing ccache and ensure it appears in your install list by typing conda list.
  • Install a faster linker. Don’t do the ln -s /path/to/downloaded/ld.lld /usr/local/bin/ld command until you need it. I ran into issues and ultimately deleted these files because I didn’t even need them.
  • Run pip install ghstack or conda install ghstack.
  • Run pip install ninja or conda install ninja.
  • Set up flake8 which will be your linter by doing pip install flake8 flake8-mypy flake8-bugbear flake8-comprehensions flake8-executable flake8-pyi mccabe pycodestyle pyflakes. When you want to lint a file, do flake8 <FILENAME>.
  • Set up an IDE that allows you to SSH into your GPU. I use Atom. See remote-ssh for instructions.
  • AND FINALLY, set up PyTorch:
$ git clone
$ cd pytorch
$ git pull --rebase
$ git submodule sync --recursive
$ git submodule update --init --recursive
$ python setup.py develop

python setup.py develop is what you use to build your PyTorch code. It takes a while. After you build it once, there are ways to make this run faster. You have to build your PyTorch code whenever you edit C++, but not when you edit Python.

IT TAKES A LOT OF GUSTO TO GET YOUR ENVIRONMENT SET UP PROPERLY. If you have done it, pat yourself on the back and get some ice cream (sorry lactose-intolerant friends, try this leaf instead: 🍃).

If you have issues, never fear, I had many. Check out “Common Errors” down below or comment on this post and I can try to help. This took me a lot of time to finally understand and get set up, so I truly wish you the best.


Time to Contribute

Now that you have set up your environment, you can start contributing. If you haven’t ever contributed to Open Source before, you should skim the Code of Conduct. A lot of repos have them, in short be a good person and contribute good things.

Take a look at the Issues tab in GitHub. These are mostly community-member-filed issues. Filter by the label bootcamp to see beginner-level issues you can work on. Before starting on an issue, I recommend just reading through some of these to get a sense of where certain things live. You can also update the filters to look at closed bootcamp issues. If you are feeling ambitious, you can also take a look at the Pull Requests tab. Not all PRs have related issues to give more insight, so I find it easier to get the lay of the land by looking at issues.

When I first looked through the issues, the imposter syndrome immediately crept in. “I don’t have a PhD in ML… I can’t do this.” Ssshh, of course you can.

You don’t need to know everything about everything to help out. All you need is direction.

If you find an issue you want to work on, don’t be shy about commenting with questions. Even just a, “I want to take this, do you have a recommendation on where to begin?” The reviewers on the issues are there to help, and will be elated that someone has an interest in solving the issue. I did this in-person, and was directed to a couple of issues that gave me a good foundation.

Let’s talk about how I solved issues without knowing what I was solving for!


Issue #17893: Check that outputs and grad_outputs have same shape in torch.autograd.grad()

Fortunately for all of us, the people who submit issues to the PyTorch repo tend to be really thorough and know what they are talking about from previous experience. Looking at this one in particular, with the background knowledge I gained from the whiteboarding session, I knew that this lived somewhere in the Python front-end since it was an issue with accepting input. It was a little sneaky at first, because it mentioned autograd, which we know is on the C++ side.

I created a new branch and navigated to the torch folder in the top-level directory since, in the example, the function in question was being called from torch. Wouldn’t you know: one of the first folders is labeled autograd. Things are intuitively named? Huzzah!

Stepping into the autograd folder, I used git grep on the phrase grad( to try and find the function definition. Sure enough, one of the first hits was in the file.

Within the grad function, there were also *gasp* intuitively named variables! Could it be that all I had to do was add a simple condition? Let’s try it. I knew that the objects outputs and grad_outputs had to have the same shape. I also knew that with NumPy there was a way to easily compare shapes.

import numpy as np
...
if np.shape(outputs) != np.shape(grad_outputs):
    raise RuntimeError("grad_outputs and outputs do not have the same shape")

Boom. Now, as a good software engineer, I knew that my PR would be far more likely to be accepted if I also had a test for the new code. But first I wanted to make sure that this wasn’t breaking anything. Within the file, it says to run python test/

Ran it aaaaand, it failed. Why? Well, actually, PyTorch doesn’t use NumPy; it RECREATES NumPy. There are just functions in NumPy that also exist in PyTorch. I found this GitHub repo that shows the translation.
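
For instance, the shape lookup I wanted exists in both libraries, just spelled differently; a quick side-by-side sketch:

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4]])
t = torch.tensor([[1, 2], [3, 4]])

print(arr.shape)  # (2, 2) -- the NumPy version
print(t.shape)    # torch.Size([2, 2]) -- the PyTorch counterpart
```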

Also, looking through that repo, I realized I was using shape() wrong. Reading the shape needs to happen on one tensor, not many. So I revised my code back in the grad() function to be:

for out, grad in zip(outputs, grad_outputs):
    if out.shape != grad.shape:
        raise RuntimeError("grad_outputs and outputs do not have the same shape")

I ran the tests and it worked! No breaking changes. Now, a test to confirm the error was thrown. Looking through the test folder, there was a file with a test_grad function. Once again, it was straightforward.

Before I nailed down what my tests should look like, I messed around by adding some print statements. Print statements are your best friend in Python. Printing random variables can also just give you more insight into what tensors actually look like and how they compare programmatically to gradient tensors. It is also good in general to read the other tests to learn how objects are created and tested on within this project. Ultimately, I ended up with:

grad_out = torch.ones(2)
try:
    torch.autograd.grad(
        outputs=[grad_sum], grad_outputs=[grad_out],
        inputs=[x], create_graph=True)
except RuntimeError as error:
    self.assertEqual(str(error), "grad_outputs and outputs do not have the same shape")

Ran the tests, and nothing failed. I linted using flake8 and pushed my changes to GitHub. Then I created the PR, made sure it had the issue tagged and it had similar labels to the issue, and submitted! From there, I was fortunate to get a review pretty quickly. The commenters were very explicit in their suggestions, which made them easy to implement.
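
If you are curious, you can watch this kind of validation fire in any recent PyTorch build; a small sketch with a deliberately mismatched grad_outputs (the exact error text in today’s PyTorch may differ from the message in my PR):

```python
import torch

x = torch.ones(2, requires_grad=True)
out = x * 2               # the output has shape [2]
bad_grad = torch.ones(3)  # deliberately the wrong shape

try:
    torch.autograd.grad(outputs=[out], inputs=[x], grad_outputs=[bad_grad])
    raised = False
except RuntimeError:
    raised = True

print(raised)  # True: the shape mismatch is rejected
```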

BAM. First PR done. Thankfully it was all in Python because WHO DARE edit C++ code? I. I dare. Next issue.


Issue #22963: Torch.flatten() returns a 0-dim (not 1-dim) tensor for 0-dim tensors

At first, I didn’t realize that this issue was on the C++ API. Reading it, though (again, kudos to the people who write these issues), it was very detailed. Don’t be turned off by discussions happening on the issues; you can gain a lot of information from them. Just make sure, before taking an issue, that the discussion has reached a resolution. This issue was with something that is returned; the other was with something that was input. Input is “front-end”, AKA Python; output is “back-end”, AKA C++.

Once again, I created a new branch and went for it. That’s what a lot of programming is… just going for it and hoping for the best. The first thing I wanted to learn was what the flatten() function actually does. Time for some good ol’ print statements.

I navigated back to the test folder and did a git grep on flatten(. Fortunately, after looking through everything, there was a test_flatten function in the file. Stepping in, I read the assertions, got a sense of what to look for, and dumped in print statements:

# Test that flatten returns 1-dim tensor when given a 0-dim tensor
zero_dim_tensor = torch.tensor(123)
one_dim_tensor = torch.tensor([123])
cool_dim_tensor = torch.tensor([1, 2, 3])
flat0 = zero_dim_tensor.flatten()
flat1 = one_dim_tensor.flatten()
flat2 = cool_dim_tensor.flatten()

print("--zero dim tensor--")
print(zero_dim_tensor)
print(flat0)
print("--one dim tensor--")
print(one_dim_tensor)
print(flat1)
print("--cool dim tensor--")
print(cool_dim_tensor)
print(flat2)

Here was my output:

--zero dim tensor--
--one dim tensor--
--cool dim tensor--
tensor([1, 2, 3]) 
tensor([1, 2, 3])

What came back basically verified the issue. A 0-dim tensor, after being flattened, should be equivalent to a 1-dim tensor. That equivalence, according to these print statements, is determined by the shape being torch.Size([1]). A 0-dim tensor flattened is the same as a 1-dim tensor flattened.

Now to find it in the code. I went to GitHub and searched for instances of “flatten” within the repo. In Python, there were Flatten modules. This was my first indication that I would have to do some C++ programming: typically you don’t want to edit a module, and there is some logic happening behind the scenes. There is also something called caffe2. You can look up Caffe2, but it basically says that it’s deprecated and part of PyTorch, which signaled to me that I probably shouldn’t edit any of the Caffe2 code.

I ended up on the file tensor_flatten.cpp. I thought that this had to be it: it was C++, it looked scary, and I couldn’t read it without putting my hands on my face and exhaling loudly. I probably stared at this file for an hour and went through many more cycles of imposter syndrome before asking for help.

I do not recommend this as a coping strategy. If I wasn’t in the office, I would have just commented on the issue asking if this was a good place to start. But fortunately for me, I had some nice trusty engineers sitting a few feet away from me who told me “heck no, get out of there”. Thank goodness.

After some direction, I was told that this was actually something that lived in aten. AH! Yes! Because ATen was the tensor and mathematical operation library, and flatten does something ~ mathematical ~. Sure. I didn’t get it right away either. But if you don’t get help right away and are at a loss for where to start, search the tech docs for PyTorch. They indicated to me that I needed to find a function named flatten with the parameters (input, start_dim, end_dim), returning a Tensor. Based upon this definition, it was clear I was looking in the wrong place.

I decided to git grep at the top level for Tensor flatten(. This directed me to the appropriate place, under aten in TensorShape.cpp. Wow! Readable code! This definitely felt right. I read through the code, and for complete understanding, I made sure to step into some of the functions called within the logic.

How about that: this code was also straightforward. I just needed to mimic what flatten does when given a 1-dimensional tensor, instead of just returning the tensor itself:

if (self.dim() == 0) {
    return self.reshape(shape);
}
I updated my tests to include the appropriate results, and WOW! This also worked right out of the box. Linted, made a PR, got suggestions and holy heck I did it. I contributed to PyTorch.
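
If you want to confirm the behavior yourself, any recent PyTorch build (which includes this fix) now does the right thing; a quick check:

```python
import torch

zero_dim = torch.tensor(123)
print(zero_dim.dim())            # 0
print(zero_dim.flatten().dim())  # 1 -- flatten now promotes to 1-dim
print(zero_dim.flatten().shape)  # torch.Size([1])
```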


What have I learned

It has been a while since I have contributed to open source, so I needed to remind myself of a few things:

  1. Don’t be afraid of the code. Smart people wrote this stuff, and (especially for a widely used project) a lot of it makes a lot of sense. It’s good to look under the hood of a project, even if you aren’t planning on contributing, to see how it’s made.
  2. Don’t be afraid to ask for help. The open source community was made for mind-sharing and thought leadership. Chances are you will find people excited about your involvement and willing to give you guidance.
  3. You are not an imposter. You don’t have to understand something completely, you just have to go for it and be willing to learn. With that mentality, you are adding value to the project.
  4. Read the documentation. Look to READMEs, tech docs, and forums for guidance.
  5. Just go for it. Even if you aren’t sure whether or not something will work, fully solves the problem, or offers a well-rounded feature, you can still try!

Now I leave you by saying, best of luck with your contributions, whether it is to PyTorch or elsewhere!

Aaaaaaand also some common errors. Cheers!


Common Errors

Import not found, or some import not connecting

  1. Make sure the import exists
  2. Make sure conda is activated
  3. Rebuild PyTorch

NumPy functions don’t work

  1. Check to see that there is a PyTorch equivalent to what you are looking for first
  2. If there isn’t, create an issue to get your desired functionality into PyTorch! You can even try to build it yourself!

Not all the tests on my PR are passing

  1. That is okay. Breathe. You are smart.
  2. Sometimes the errors that appear aren’t your fault. Check the logs and see if anything comes up that seems to be from you. If there is, revisit your code.
  3. You don’t merge in your own PRs, someone from the PyTorch team does. They will take a look at it and probably let you know because they are nice.


Wow you read this far. Thank you so much. Nothing else to do but throw in one more gif and ask you to follow me on Twitter (@cwillycs).

Bio: Cami Williams currently works as an Open Source Evangelist at Facebook in Seattle focused on Machine Learning. She has spoken on behalf of the technical community at various conferences and workshops, including ReInvent, the Grace Hopper Celebration for Women in Computing, Consumer Electronics Show, and Game Developers Conference. Cami is proud to be an advocate for diversity and inclusion in the tech industry. During her free time, Cami loves to play board games with friends, build mechanical keyboards, and quote the Office.

Original. Reposted with permission.