
Galileo Introduces First-of-its-Kind Evaluation Foundation Models for Enterprise GenAI 

SAN FRANCISCO, June 6, 2024 -- Galileo, a leader in developing generative AI for the enterprise, today announced the release of Galileo Luna, a first-of-its-kind suite of Evaluation Foundation Models (EFMs) designed to transform how generative AI evaluations are conducted.

This novel approach is faster, more cost-effective, and more accurate than existing evaluation methods such as askGPT and human "vibe checks." With Galileo Luna, enterprises can finally bring trustworthy AI solutions to market faster and at production scale.

"For genAI to achieve mass adoption, it's crucial that enterprises can evaluate hundreds of thousands of AI responses for hallucinations, toxicity, security risk, and more, in real time," said Vikram Chatterji, Co-Founder and CEO of Galileo. "In speaking with customers, we found that existing approaches, such as human evaluation or LLM-based evaluation, were too expensive and slow, so we set out to solve that. With Galileo Luna, we're setting new benchmarks for speed, accuracy, and cost efficiency. Luna can evaluate millions of responses per month 97% cheaper, 11x faster, and 18% more accurately than evaluating using OpenAI GPT3.5."

Core to Luna's innovation is the creation of EFMs, the first models purpose-built for generative AI evaluation. Each of these models has been fine-tuned to solve a specific evaluation task, such as detecting hallucinations, assessing context quality, and flagging data leakage and malicious prompts. By creating smaller purpose-built EFMs, Luna is able to conduct evaluations with never-before-seen accuracy, speed, and cost-efficiency.
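For illustration only, the sketch below shows one way the purpose-built-evaluator pattern can look in practice: a small fine-tuned classifier scoring a generated response against its retrieved context. The model name, labels, and wrapper function are assumptions made for the sketch, not Galileo's Luna API.

    # Illustrative sketch of a purpose-built evaluation model (hypothetical
    # model id and labels; this is not Galileo's Luna API).
    from transformers import pipeline

    # A compact encoder assumed to be fine-tuned to flag unsupported claims.
    hallucination_judge = pipeline(
        "text-classification",
        model="acme/hallucination-judge-small",  # placeholder model id
    )

    def evaluate_response(context: str, response: str) -> dict:
        """Score a generated response for groundedness in its context."""
        # Pair context and response so the classifier can judge support.
        result = hallucination_judge(f"context: {context} response: {response}")[0]
        return {
            "label": result["label"],            # e.g. "supported" / "hallucinated"
            "score": round(result["score"], 3),  # classifier confidence
        }

    print(evaluate_response(
        "The Eiffel Tower is about 330 metres tall.",
        "The Eiffel Tower stands roughly 330 metres high.",
    ))

Because such a classifier is small and needs no reference answer, it can be batched cheaply and run in-line with production traffic, which is the property the release attributes to Luna's EFMs.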

Key Innovations and Features of Galileo Luna:

  • Evaluation Accuracy: Exceeding all existing evaluation models, including Galileo's own ChainPoll, Luna's EFMs lead the industry in detecting hallucinations, prompt injections, PII, and more, outperforming previous methods by up to 20%.
  • Ultra Low-Cost Operations: Proven to be 30x cheaper than conventional methods, such as OpenAI's GPT-3.5.
  • Millisecond Speed: Evaluations complete in milliseconds, essential for real-time applications such as chatbots and AI monitoring systems.
  • No Ground Truth Required: Unlike other evaluation methods that depend on extensive and costly test sets, Luna eliminates the need for ground truth data, facilitating faster deployment and scalability.
  • Unmatched Customizability: Each Luna model can be quickly fine-tuned to meet specific customer needs, providing tailored solutions that achieve over 95% accuracy in critical applications.

"Evaluations are absolutely essential to delivering safe, reliable, production-grade AI products," said Alex Klug, Head of Product, Data Science & AI at HP. "Until now, existing evaluation methods, such as human evaluations or using LLMs as a judge, have been very costly and slow. With Luna, Galileo is overcoming enterprise teams' biggest evaluation hurdles – cost, latency, and accuracy. This is a game changer for the industry."

For more information, read the latest Galileo blog, or watch the June 18th webinar on Luna.

About Galileo

San Francisco-based Galileo is the leading platform for enterprise GenAI evaluation. The Galileo platform, powered by Evaluation Foundation Models (EFMs), supports AI teams across the development lifecycle, from building and iterating to monitoring and protection. Galileo is used by AI teams from startups to Fortune 100 companies. Visit rungalileo.io to learn more about the Galileo suite of products.


Source: Galileo
