Covering Scientific & Technical AI | Saturday, January 18, 2025

LLM Spotlight: Llama 3 

The Llama 3 large language model (LLM) is Meta’s latest and most advanced LLM to date. Succeeding Llama 1 and 2, this most recent model excels at understanding and generating human-like text across a wide range of tasks such as question answering, summarization, translation, and computer programming. Meta AI – the AI assistant built into Facebook, Messenger, Instagram, and WhatsApp – currently relies on Llama 3.

Meta has released four versions of Llama 3 so far:

  • Llama 3 8B
  • Llama 3 8B-Instruct
  • Llama 3 70B
  • Llama 3 70B-Instruct

The two 8B models have 8 billion parameters, while the 70B models have 70 billion parameters. The Instruct models were fine-tuned to better follow human directions, and are therefore better suited for use as chatbots than the base Llama models. Meta is also currently working on a 400-billion-parameter version of Llama 3 that the company hopes to make available later in 2024.

Llama 3 was trained on more than 15 trillion tokens drawn from publicly available sources, seven times the number of tokens used to train Llama 2. Llama 3 also uses a new tokenizer with a 128,256-token vocabulary, up from the 32,000-token vocabulary of previous Llama models. A larger vocabulary encodes the same text in fewer tokens, which helps Llama 3 make better use of its 8,192-token context window.
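As a back-of-the-envelope check on these figures, the short Python sketch below works out the Llama 2 token count implied by the "seven times" ratio and how much the tokenizer vocabulary grew. The constant and function names are illustrative, and "over 15 trillion" is treated as exactly 15 trillion, so the results are approximate.

```python
# Figures quoted in the article; "over 15 trillion" is rounded down to 15e12.
LLAMA3_TRAIN_TOKENS = 15e12
TOKEN_RATIO_VS_LLAMA2 = 7        # "seven times" the Llama 2 token count
LLAMA3_VOCAB = 128_256
PREVIOUS_VOCAB = 32_000


def implied_llama2_tokens(llama3_tokens: float, ratio: float) -> float:
    """Back out the Llama 2 training-token count implied by the quoted ratio."""
    return llama3_tokens / ratio


def vocab_growth(new_vocab: int, old_vocab: int) -> float:
    """Return how many times larger the new vocabulary is than the old one."""
    return new_vocab / old_vocab


if __name__ == "__main__":
    llama2_tokens = implied_llama2_tokens(LLAMA3_TRAIN_TOKENS, TOKEN_RATIO_VS_LLAMA2)
    print(f"Implied Llama 2 training tokens: about {llama2_tokens / 1e12:.1f} trillion")
    print(f"Vocabulary growth: about {vocab_growth(LLAMA3_VOCAB, PREVIOUS_VOCAB):.1f}x")
```

In other words, the quoted ratio implies a Llama 2 training set of roughly 2 trillion tokens, and the new vocabulary is about four times the size of the old one.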

Llama 3 also has a high level of language understanding, especially considering its parameter size. Massive Multitask Language Understanding (MMLU) is a benchmark that evaluates an LLM’s knowledge and reasoning with multiple-choice questions spanning dozens of subjects.

Llama 3 8B scored 66.6 on MMLU, while Llama 3 70B scored 79.5. These numbers pale in comparison to GPT-4 Turbo’s score of 88.4. However, GPT-4 Turbo reportedly works with 1 trillion parameters. The upcoming Llama 3 400B achieved a score of 86.1, making it competitive with an LLM reported to have more than double its parameter count.
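The comparison above can be made concrete with a little arithmetic. The Python sketch below uses only the figures quoted in this article; the GPT-4 Turbo parameter count is an unconfirmed report, and the `Model` class and `compare` helper are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    params: float   # parameter count
    mmlu: float     # MMLU score as quoted in the article


# Figures as quoted in the article; the GPT-4 Turbo size is only a reported value.
LLAMA3_8B = Model("Llama 3 8B", 8e9, 66.6)
LLAMA3_70B = Model("Llama 3 70B", 70e9, 79.5)
LLAMA3_400B = Model("Llama 3 400B", 400e9, 86.1)
GPT4_TURBO = Model("GPT-4 Turbo", 1e12, 88.4)


def compare(larger: Model, smaller: Model) -> tuple[float, float]:
    """Return (parameter ratio, MMLU gap in points) of `larger` over `smaller`."""
    param_ratio = larger.params / smaller.params
    mmlu_gap = round(larger.mmlu - smaller.mmlu, 1)
    return param_ratio, mmlu_gap


if __name__ == "__main__":
    for model in (LLAMA3_8B, LLAMA3_70B, LLAMA3_400B):
        ratio, gap = compare(GPT4_TURBO, model)
        print(f"GPT-4 Turbo vs {model.name}: {ratio:.1f}x the parameters, +{gap} MMLU points")
```

On these numbers, GPT-4 Turbo would have 2.5 times the parameters of Llama 3 400B for a lead of 2.3 MMLU points.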

While Llama 3 is clearly a top contender in the world of LLMs, it does fall short in certain areas. Llama 3 only works with text and is currently unable to understand images, video, and audio. Additionally, Llama 3 is primarily focused on English, and Meta is still developing multilingual capabilities.

There is also some controversy concerning the open-source nature of Llama 3. On one hand, Meta has made the model weights, code, and some training data for Llama 3 publicly available. On the other hand, Llama 3’s licensing terms require companies with over 700 million monthly active users to obtain a separate commercial license from Meta to use Llama 3, and Meta can choose to grant or deny this license at its discretion. Many have argued that this restriction violates the open source definition set by the Open Source Initiative.

Despite these drawbacks, Llama 3 represents a significant leap forward in language models, and it will be interesting to watch as Meta evolves this model further.

AIwire