Your Ultimate Guide to Chat GPT and Other Abbreviations

Everyone seems to have gone crazy about ChatGPT, which has become a cultural phenomenon. If you’re not on the ChatGPT train yet, this article might help you better understand the context and excitement around this innovation.

By Denis Shipilov, Solutions Architect at DataArt on June 15, 2023 in Artificial Intelligence

Your Ultimate Guide to Chat GPT and Other Abbreviations

What do all these abbreviations - ML, AI, AGI - mean?

ML (machine learning) is an approach to solving difficult computational problems – instead of coding using a programming language you build an algorithm that “learns” the solution from data samples.

AI (artificial intelligence) is a field of computer science dealing with problems (e.g., image classification, working with human language) that are difficult to solve using traditional programming. ML and AI go hand in hand, with ML being a tool to solve problems formulated in AI.

AGI (artificial general intelligence) - is the correct term for what popular culture usually implies by AI – the ability of computers to achieve human-like intellectual capabilities and broad reasoning. It is still the holy grail for researchers working in the AI field.

What is a Neural Network?

An artificial neural network (ANN) is a class of ML algorithms and data structures (or models for short) so called because it was inspired by the structure of biological neural tissue. But this doesn’t completely mimic all the biological mechanisms behind it. Rather, ANNs are complicated mathematical functions that are based on ideas from living species biology.

When I read “the model has 2 billion parameters” what does this mean?

Neural networks are layered structures consisting of uniform units interconnected with each other in a network. The way these units are interconnected is called architecture. Each connection has an associated number called weight and the weights store information the model learns from data. So, when you read “the model has 2 billion parameters,” it means that there are 2 billion connections (and weights) in the model, and it roughly designates the information capacity of the neural network.

What does Deep Learning mean?

Neural networks have been studied since the 1980s but made a real impact when the computer games industry introduced cheap personal supercomputers known as graphical processing units (GPUs). Researchers adapted this hardware for the neural network training process and achieved impressive results. One of the first deep learning architectures, the convolutional neural network (CNN), was able to carry out sophisticated image recognition that was difficult with classical computer vision algorithms. Since then, ML with neural networks has been rebranded as deep learning, with “deep” referring to the complicated NN architectures the networks are able to explore.

Where can I get some more details on how this tech works?

I’d recommend videos by Grant Sanderson available on his animated math channel.

What does the Large Language Model mean?

To work with human language using computers, language must be defined mathematically. This approach should be sufficiently generic to include the distinctive features of every language. In 2003 researchers discovered how to represent language with neural networks and called it the neural probabilistic language model or LM for short. This works like predictive text in a mobile phone – given some initial sequence of words (or tokens), the model can predict the next possible words with their respective probabilities. Continuing this process using previously generated words as input (this is autoregression) – the model can generate text in the language for which it was trained.

When I read about language models, I often encounter the term “transformer”. What is this?

Representing sequences of items was a challenging problem for neural networks. There were several attempts to solve the problem (mostly around variations of recurrent neural networks), which yielded some important ideas (e.g., word embedding, encoder-decoder architecture, and attention mechanism). In 2017 a group of Google researchers proposed a new NN architecture that they called a transformer. It combined all these ideas with effective practical implementation. It was designed to solve the language translation problem (hence the name) but proved to be efficient for capturing the statistical properties of any sequence data.

Why everyone talks about OpenAI?

OpenAI experimented with transformers to build a neural probabilistic language model. The results of their experiments are called GPT (generative pre-trained transformer) models. Pre-trained means they were training the transformer NN on a large body of texts mined on the Internet and then taking its decoder part for language representation and text generation. There were several generations of GPTs:

GPT-1: an initial experimental model to validate the approach
GPT-2: demonstrated ability to generate coherent human language texts and zero-shot learning – the ability to generalize to domains for which it was never specifically trained (e.g., language translation and text summarization, to name a few)
GPT-3 was a scale-up of the architecture (1.5 billion parameters of the GPT-2 vs 175 billion of the largest GPT-3) and was trained on a larger and more variate body of text. Its most important feature is the ability to produce texts in a wide range of domains by just seeing only a few examples in the prompt (hence the term few short learning) without any special fine-tuning or pre-training.
GPT-4: an even larger model (the exact characteristics are not disclosed), larger training datasets, and multimodality (text is augmented with image data).

Given the enormous number of parameters GPT models have (in fact, you need a huge computational cluster with hundreds to thousands of GPUs to train and serve these models), they were called Large Language Models (LLMs).

What’s the difference between GPT-3 and ChatGPT

The original GPT-3 is still a word prediction engine and thus is mostly of interest to AI researchers and computational linguists. Given some initial seed or prompt, it can generate text infinitely, which makes little practical sense. The OpenAI team continued to experiment with the model, trying to fine-tune it to treat prompts as instructions to execute. They fed in a large dataset of human-curated dialogues and invented a new approach (RLHF – reinforcement learning from human feedback) to significantly speed up this process with another neural network as a validator agent (typical in AI research). They released a model called InstructGPT as an MVP based on a smaller GPT-3 version and in November 2022 released a full-featured version called ChatGPT. With its simple chatbot and web UI, it changed the IT world.

What is the language model alignment problem?

Given that LLMs are just sophisticated statistical machines, the generation process could go in an unexpected and unpleasant direction. This type of result is sometimes called an AI hallucination, but from the algorithmic perspective, it is still valid, though unexpected, by human users.

Raw LLMs require treatment and additional fine-tuning with human validators and RLHF, as previously mentioned. This is to align LLMs with human expectations, and not surprisingly the process itself is called alignment. This is a long and tedious procedure with considerable human work involved; this could be considered LLM quality assurance. The alignment of the models is what distinguishes OpenAI/Microsoft ChatGPT and GPT-4 from their open-source counterparts.

Why there is a movement to stop the further development of language models?

Neural networks are black boxes (a huge array of numbers with some structure on top). There are some methods to explore and debug their internals but the exceptional generalization qualities of GPTs remain unexplained. This is the main reason behind the ban movement – some researchers think we are playing with fire (science fiction gives us fascinating scenarios of AGI birth and technological singularity) before we get a better understanding of the processes underlying LLMs.

What are the practical use cases of LLMs?

The most popular include:

Large text summarization
Vice versa - generating text from summary
Text styling (mimicking an author or character)
Using it as a personal tutor
Solving math/science exercises
Answering questions on the text
Generating programming code from short descriptions

Are the GPTs the only LLMs available now?

GPTs are the most mature models with API access provided by OpenAI and Microsoft Azure OpenAI services (if you need a private subscription). But this is the frontier of AI and many interesting things have happened since the release of ChatGPT. Google has built its PaLM-2 model; Meta open-sourced their LLaMA models for researchers, which spurred lots of tweaks and enhancements (e.g., Alpaca from Stanford) and optimization (now you can run LLMs on your laptop and even smartphone).

Huggingface provides BLOOM and StarCoder and HuggingChat – which are completely open source, without the LLaMA research-only limitation. Databricks trained their own completely open-source Dolly model. Lmsys.org is offering its own Vicuna LLM. Nvidia’s deep learning research team is developing its Megatron-LM model. The GPT4All initiative is also worth mentioning.

However, all these open-source alternatives are still behind OpenAI’s major tech (especially in the alignment perspective) but the gap is rapidly closing.

How can I use this technology?

The easiest way is to use OpenAI public service or their platform API playground, which offers lower-level access to the models and more control over network inner workings (specify system context, tune generation parameters, etc). But you should carefully review their service agreements since they use user interactions for additional model improvements and training. Alternatively, you can choose Microsoft Azure OpenAI services, which provide the same API and tools but with private model instances.

If you are more adventurous, you can try LLM models hosted by HuggingFace, but you’ll need to be more skilled with Python and data science tooling.

Denis Shipilov is experienced Solutions Architect with wide range of expertise from distributed systems design to the BigData and Data Science related projects.