StarCoder: The Coding Assistant That You Always Wanted

Let advanced AI take care of code completion, formatting, translation, and bug fixing. You can also chat with a StarChat and use VSCode extensions for work.

StarCoder: The Coding Assistant That You Always Wanted
Image by Author


What is a StarCoder?


The StarCoder is a cutting-edge large language model designed specifically for code. With an impressive 15.5B parameters and an extended context length of 8K, it excels in infilling capabilities and facilitates fast large-batch inference through multi-query attention.

StarCoderBase was trained on a vast dataset of 1 trillion tokens derived from The Stack. This collection consists of permissively licensed GitHub repositories, accompanied by inspection tools and an opt-out process for privacy-conscious developers. To further enhance its performance, the BigCode team meticulously fine-tuned StarCoderBase using 35B Python tokens.

As a result, StarCoder emerges as a powerful and refined language model equipped to handle a wide range of coding tasks with remarkable proficiency.

StarCoder: The Coding Assistant That You Always Wanted
Image from StarCoder Paper


StarCoderBase surpasses all existing open-source code language models that offer support for multiple programming languages and demonstrates exceptional performance, even outshining the renowned OpenAI code-cushman-001 model in terms of quality and results. Moreover, StarCoder can be prompted to achieve 40% pass@1 on HumanEval. It outperforms LaMDA, LLaMA, and PaLM models. 

Read the research paper to learn more about model evaluation. 


StartCoder Code Completion


BigCode - StarCoder code completion playground is a great way to test the model's capabilities. You can play around with various model formats, prefixes, and fill-ins to get the full experience.

In my opinion, it is a great tool for code completion, especially for Python code. However, it does have some drawbacks, such as outdated APIs, hallucinations, displaying Jupyter Notebook metadata, and incomplete code.

The best way to generate code with StarCoder is to use well-explained comments. It will help the model to better understand what you are trying to do and generate more accurate results.


StarCoder: The Coding Assistant That You Always Wanted
Image from StartCoder Code Completion


StarChat Playground


If you are used to the ChatGPT style of generating code, then you should try StarChat to generate and optimize the code. 

StarChat is a specialized version of StarCoderBase that has been fine-tuned on the Dolly and OpenAssistant datasets, resulting in a truly invaluable coding assistant. It is a 16-billion parameter model that was pre-trained on one trillion tokens sourced from 80+ programming languages, GitHub issues, Git commits, and Jupyter notebooks.

You can provide the instruction to StarChat, and it will produce the code with the explanation. You can also use follow-up prompts to modify the code. 


StarCoder: The Coding Assistant That You Always Wanted
Image from StarChat Playground


HF Code Autocomplete


HF Code Autocomplete is a free and open-source alternative to GitHub Copilot that is powered by StarCoder. I have been using it since its launch and I am quite impressed with its speed and accuracy. 


StarCoder: The Coding Assistant That You Always Wanted
HF Code Autocomplete VSCode Extension


It works with Jupyter Notebook and all kinds of files in VSCode. You just have to install the extension from the marketplace and add the Hugging Face API. 


StarCoder: The Coding Assistant That You Always Wanted
Image by Author | VSCode




We are in constant need of advanced code assistants in our workplace, ones that can effectively handle repetitive scripts while assisting in the creation of more complex systems. 

In this blog, we have thoroughly explored StarCoder and its diverse range of applications. It is worth noting that the open-source community is tirelessly dedicated to pushing the boundaries of code assistance, constantly striving to deliver groundbreaking solutions that enhance our coding experience and productivity.

I hope you enjoyed reading this blog and found it informative and insightful. Follow me on LinkedIn if you want to know more about the latest AI technology.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.