Use (Almost) Any Language Model Locally with Ollama and Hugging Face Hub
You can now run any GGUF model from Hugging Face's model hub with Ollama using a single command. Learn how here.
Image source: Hugging Face
Ollama, an application built on llama.cpp, now offers easy integration with a huge vault of GGUF format language models hosted on Hugging Face. This new feature allows users to run any of the 45,000+ public GGUF checkpoints on their local machines using a single command, with no manual download, conversion, or configuration steps required. The integration provides flexibility in model selection, quantization schemes, and customization options, making this arguably the easiest way to acquire and run language models on your local machine.
The new functionality extends beyond model compatibility, offering users the ability to fine-tune (pun intended) their interaction with these models. Custom quantization options allow for optimized performance based on available hardware, while user-defined chat templates and system prompts enable personalized conversational workflows. Additionally, the ability to adjust sampling parameters allows for granular control over model output. This combination of accessibility and customization empowers users to leverage state-of-the-art language models locally, and makes AI-driven application development and research easier than ever.
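For example, system prompts and sampling parameters can be baked into a reusable local model via Ollama's Modelfile format. Here is a minimal sketch; the temperature value and system prompt are illustrative, and the FROM line reuses the Hugging Face model referenced below:

```
FROM hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF
PARAMETER temperature 0.2
SYSTEM You are a concise technical assistant.
```

Saving this as a file named Modelfile and running `ollama create my-assistant -f Modelfile` produces a custom model you can then launch with `ollama run my-assistant`.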
Getting started is as easy as this:
# Run Ollama with specified model
# ollama run hf.co/{username}/{repository}
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF
# Run Ollama with specified model and desired quantization
# ollama run hf.co/{username}/{repository}:{quantization}
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:IQ3_M
That's it. From there, you can chat with the model at the command line, or build your own programs that leverage the locally running model.
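For programmatic access, a running Ollama instance exposes a local REST API (http://localhost:11434 by default). The sketch below, using only the Python standard library, sends a prompt to the model pulled above via the /api/generate endpoint; the helper names and prompt are illustrative:

```python
import json
import urllib.request

# The model name matches the Hugging Face checkpoint pulled with `ollama run` above.
MODEL = "hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF"

def build_generate_payload(model: str, prompt: str) -> dict:
    # /api/generate accepts a JSON body with model, prompt, and stream fields;
    # stream=False returns the full completion in a single JSON response.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, host: str = "http://localhost:11434") -> str:
    payload = json.dumps(build_generate_payload(MODEL, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The completed text is returned under the "response" key.
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Why is the sky blue?"))
```

Note that Ollama must already be running (it starts automatically with `ollama run`), and the first request may be slow while the model loads into memory.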
Find out more here, then get started with this fantastic new feature right away.
Matthew Mayo (@mattmayo13) holds a master's degree in computer science and a graduate diploma in data mining. As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.