LLaMA, or Large Language Model Meta AI, is not to be confused with its furry namesake! Rather than an animal, LLaMA is a new language model developed by a team of researchers at Meta. Its goal is to democratize access to language models and make them easier to use for a wide range of applications, such as chatbots.
But... What is a language model?
A language model is a statistical model of the distribution of word sequences (or, more generally, sequences of symbols such as letters or phonemes) in a natural language such as French, English, or Russian. It can, for example, predict the next word in a sequence. If the word "electric" is given to a language model, it could predict words such as "car" or "bike", based on associations frequently observed in the data on which it was trained.
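To make this concrete, here is a deliberately tiny sketch of the idea: a bigram model that predicts the next word purely from co-occurrence counts in a toy corpus. Real LLMs use neural networks trained on billions of words, but the underlying principle of modeling which words tend to follow which is the same. (The corpus and function names here are purely illustrative, not part of LLaMA.)

```python
# Toy illustration (not LLaMA itself): a bigram "language model" that
# predicts the next word from counts observed in a tiny training corpus.
from collections import Counter, defaultdict

corpus = (
    "the electric car is quiet . "
    "the electric bike is cheap . "
    "the electric car is fast ."
).split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word, k=2):
    """Return the k most frequently observed next words after `word`."""
    return [w for w, _ in following[word].most_common(k)]

print(predict_next("electric"))  # ['car', 'bike'] -- 'car' appears more often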
Large language models (LLMs) are typically trained on vast datasets sourced from a variety of sources, such as books, newspaper articles, web pages, and social media. This is done in order to capture the richness and diversity of the language and its usage.
What sets LLaMA apart from models like GPT-3?
Models like GPT-3 and Chinchilla are extremely expensive in terms of computing power. They require significant hardware and energy resources to be trained and deployed, limiting their accessibility for researchers, developers, and businesses with limited budgets. LLaMA's goal is to democratize access to advanced language models.
It can run on computers with a single GPU (Graphics Processing Unit). Some ports of Facebook's LLaMA model even run on a machine with a simple Apple M1 chip (see the GitHub repository in the references).
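For illustration, here is a minimal sketch of loading such a model on a single GPU with the Hugging Face transformers library, assuming you have already obtained the LLaMA weights and converted them to the library's format; the local path is a placeholder, not an official download.

```python
# Sketch only: assumes the transformers and accelerate libraries are installed
# and that LLaMA weights have been downloaded and converted locally.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./llama-7b-hf"  # placeholder path to converted LLaMA-7B weights

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # half precision: ~2 bytes/parameter, fits one GPU
    device_map="auto",          # let accelerate place layers on the available GPU
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in half precision is what makes the 7B model fit comfortably on a single consumer GPU; the popular M1 ports go further by quantizing the weights to 4 bits.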
LLaMA is based on the transformer architecture, but what distinguishes this new model is that it achieves strong performance with far fewer parameters than its competitors.
Parameters are the numerical values (weights) that a model learns during training. They determine the model's behavior, and their number is a rough measure of its size and complexity.
LLaMA uses "only" 65 billion parameters in its largest version (there are also versions with 33B, 13B, and 7B parameters). This is much less than GPT-3, which uses nearly 175 billion parameters and requires over 700 GB of GPU memory in FP32 (175 billion × 4 bytes, since each parameter is a 32-bit real number occupying 4 bytes).
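The arithmetic behind that 700 GB figure is easy to check yourself; the short sketch below also shows why lower-precision formats (FP16, INT8) make these models far easier to host:

```python
# Back-of-the-envelope memory footprint of model weights at various precisions.
def weight_memory_gb(n_params, bytes_per_param):
    return n_params * bytes_per_param / 1e9  # gigabytes (decimal)

for name, n in [("GPT-3 (175B)", 175e9), ("LLaMA-65B", 65e9), ("LLaMA-7B", 7e9)]:
    fp32 = weight_memory_gb(n, 4)  # 32-bit floats: 4 bytes each
    fp16 = weight_memory_gb(n, 2)  # 16-bit floats: 2 bytes each
    int8 = weight_memory_gb(n, 1)  # 8-bit quantization: 1 byte each
    print(f"{name}: {fp32:.0f} GB in FP32, {fp16:.0f} GB in FP16, {int8:.0f} GB in INT8")
```

Note that this counts only the weights; serving also needs memory for activations, and training needs several times more for gradients and optimizer state.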
According to the research paper accompanying LLaMA, the LLaMA-13B model even outperforms GPT-3 while being more than ten times smaller, and LLaMA-65B is competitive with models like Chinchilla-70B and PaLM-540B.
In addition, unlike other models that rely on proprietary datasets, LLaMA is trained exclusively on sources accessible to everyone (web crawls, public datasets, Wikipedia, GitHub, Stack Exchange, etc.). This approach promotes transparency and accessibility, and demonstrates that it is possible to achieve cutting-edge performance in language modeling by relying on open, available data.
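As a rough illustration, the pre-training mixture reported in the LLaMA paper can be sketched as a weighted sampler over these public sources. The proportions below are the approximate sampling percentages from the paper's data table; the sampler itself is just an illustrative toy, not Meta's actual pipeline.

```python
import random

# Approximate sampling proportions of LLaMA's pre-training data (from the paper).
data_mixture = {
    "CommonCrawl": 0.670,
    "C4": 0.150,
    "GitHub": 0.045,
    "Wikipedia": 0.045,
    "Books": 0.045,
    "ArXiv": 0.025,
    "StackExchange": 0.020,
}

def sample_source(rng=random):
    """Pick which source the next training document is drawn from."""
    sources, weights = zip(*data_mixture.items())
    return rng.choices(sources, weights=weights, k=1)[0]

print(sample_source())  # e.g. 'CommonCrawl' about 67% of the time
```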
Conclusion
The arrival of LLaMA may not have an immediate impact on the general public, as ChatGPT did. It is still highly technical and mainly reserved for research work.
However, Meta is positioning itself in the race for AI, and reminds everyone that they are also in the game. Their researchers demonstrated that it is possible to achieve competitive performance by relying solely on publicly available data, without the need for astronomical computing power. The publication of these models should promote the democratization of large language models and associated tools, while also contributing to improving their robustness and reducing associated problems such as toxicity and bias.
References
This article was originally written in French for a university project whose goal was to introduce some applications of artificial intelligence to high school students without a specialized background in the field. It is licensed under a CC BY 4.0 license © 2023.
LLaMA: Open and Efficient Foundation Language Models (https://arxiv.org/abs/2302.13971), 27 Feb 2023, Meta AI
Introducing LLaMA: A foundational, 65-billion-parameter large language model, Meta AI Blog
GitHub repository: https://github.com/facebookresearch/llama