Covering Scientific & Technical AI | Wednesday, January 22, 2025

Cohere Introduces Command R Fine-Tuning 

May 10, 2024 -- Cohere this week announced the availability of fine-tuning for Command R. In a recent blog post, the Cohere team describes what fine-tuning Command R can do and how it benefits enterprise applications. The update highlights the model's performance gains, its cost efficiency, and its ability to be tailored to specific business needs.


Today, Cohere is announcing the availability of fine-tuning for Command R, the smaller edition of our R series of models. Command R is a scalable, frontier large language model (LLM) aimed at enterprise-grade workloads. With this move, we continue to focus on delivering technology that makes a real-world impact on enterprises looking to tap the enormous potential of AI.

We’ve seen high demand from enterprises for customized models that enhance performance in priority applications. Fine-tuning allows enterprises to incorporate company-specific language and documents into their models, tailoring them closely to an enterprise’s unique needs. Organizations can now conduct supervised fine-tuning on Command R to further enhance performance and deliver powerful results at a small fraction of the cost of larger models on the market. Customers can adjust up to five hyperparameters to optimize model performance efficiently, while Command R remains up to 15x more affordable than other industry-leading models.
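The announcement does not name the five hyperparameters. As a minimal sketch, assuming they resemble the settings described in Cohere's fine-tuning documentation (training epochs, learning rate, batch size, and early-stopping patience and threshold), a client-side config object with basic sanity checks might look like this. The class, field names, and default values are hypothetical placeholders, not the Cohere SDK's API or the service's defaults.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FineTuneHyperparameters:
    """Hypothetical container for five tunable fine-tuning settings.

    Field names are illustrative assumptions, not the Cohere SDK's API,
    and the defaults are placeholders rather than service defaults.
    """
    train_epochs: int = 1
    learning_rate: float = 1e-4
    train_batch_size: int = 16
    early_stopping_patience: int = 6
    early_stopping_threshold: float = 0.01

    def __post_init__(self) -> None:
        # Basic sanity checks only; a real service may enforce stricter ranges.
        if self.train_epochs < 1:
            raise ValueError("train_epochs must be >= 1")
        if not self.learning_rate > 0:
            raise ValueError("learning_rate must be positive")
        if self.train_batch_size < 1:
            raise ValueError("train_batch_size must be >= 1")
        if self.early_stopping_patience < 0:
            raise ValueError("early_stopping_patience must be >= 0")
        if self.early_stopping_threshold < 0:
            raise ValueError("early_stopping_threshold must be >= 0")


# Override only the settings you want to change; the rest keep their placeholders.
params = FineTuneHyperparameters(train_epochs=3, learning_rate=5e-5)
print(asdict(params))
```

Freezing the dataclass keeps a job's configuration immutable once validated, which makes it easy to log alongside the resulting model.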

Command R, like our larger, more powerful Command R+ model, is optimized for long context tasks, such as advanced retrieval-augmented generation (RAG) to mitigate hallucinations with citations, tool use to automate complex business workflows, and multilingual support for 10 languages to easily integrate into global business operations. The model balances high efficiency with strong accuracy to enable businesses to move beyond proof-of-concept and into production with AI.

Proven Impact in Enterprise Use Cases

We evaluated fine-tuning for Command R across multiple enterprise use cases, including summarization as well as research and analysis, in information-heavy industries like financial services and scientific research. We found that, across the performance metrics that matter most to businesses, Command R with fine-tuning consistently outperforms larger models while costing substantially less. These capabilities make a real difference in the daily work of employees across a range of sectors, including financial services, technology, retail, healthcare, legal, HR, and more.

We compared our smaller Command R model with models in a larger weight class to show the performance gains achieved with fine-tuning. In our evaluations, Command R with fine-tuning improved performance by more than 20% on average over the baseline model. Fine-tuning Command R allows companies to achieve even better results than with the industry’s largest models, while also improving throughput, latency, and cost, factors that matter to enterprise customers.
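The text does not say how the 20% average improvement was aggregated. One plausible reading, shown here as an assumption, is the mean of per-task relative improvements over the baseline. The sketch computes that alongside an alternative rule (the relative improvement of the averaged scores) to show that the two can differ; the scores are hypothetical, not Cohere's data.

```python
from statistics import mean


def relative_improvement(baseline: float, tuned: float) -> float:
    """Relative gain of `tuned` over `baseline`; 0.25 means +25%."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (tuned - baseline) / baseline


def mean_relative_improvement(pairs: list[tuple[float, float]]) -> float:
    """Assumed rule A: average the per-task relative gains."""
    return mean(relative_improvement(b, t) for b, t in pairs)


def improvement_of_means(pairs: list[tuple[float, float]]) -> float:
    """Alternative rule B: relative gain of the mean tuned score over the mean baseline."""
    return relative_improvement(mean(b for b, _ in pairs), mean(t for _, t in pairs))


# Hypothetical per-task (baseline, fine-tuned) scores, for illustration only.
scores = [(0.50, 0.65), (0.80, 0.88)]
print(mean_relative_improvement(scores))  # ≈ 0.20 (mean of +30% and +10%)
print(improvement_of_means(scores))       # ≈ 0.177 ((0.765 - 0.65) / 0.65)
```

Rule A weights every task equally regardless of its absolute score, while rule B lets high-scoring tasks dominate, so any "average improvement" figure should state which rule it uses.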

Summarization 

LLMs can help enterprises streamline workflows by summarizing large amounts of text, from lengthy documents to long meeting transcripts. To test these capabilities, we fine-tuned Command R on a dataset consisting of 1,200 meeting summary training samples from both human- and AI-generated content. We then measured the ability of Command R to summarize meeting transcripts while adhering to tonal, stylistic, length, and format specifications versus larger, more expensive models on the market.
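As a sketch of what such training data could look like, the snippet below converts (transcript, summary) pairs into JSON Lines records. It assumes a chat-style layout with System, User, and Chatbot turns, modeled on Cohere's documented chat fine-tuning format; check the exact schema against the current documentation. The system prompt and the sample transcript are hypothetical.

```python
import json
import os
import tempfile
from typing import Iterable

# Hypothetical tone/length/format specification the model should learn to follow.
SYSTEM_PROMPT = (
    "Summarize the meeting transcript in a neutral tone, "
    "as at most five bullet points."
)


def to_chat_record(transcript: str, summary: str) -> dict:
    """Build one training example in an assumed System/User/Chatbot chat layout."""
    return {
        "messages": [
            {"role": "System", "content": SYSTEM_PROMPT},
            {"role": "User", "content": transcript},
            {"role": "Chatbot", "content": summary},
        ]
    }


def write_jsonl(pairs: Iterable[tuple[str, str]], path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for transcript, summary in pairs:
            f.write(json.dumps(to_chat_record(transcript, summary)) + "\n")
            count += 1
    return count


# Usage with a single placeholder pair (not the 1,200-sample dataset described above).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.jsonl")
    n = write_jsonl([("Alice: Ship Friday? Bob: Yes.", "- Ship on Friday")], path)
    with open(path, encoding="utf-8") as f:
        first = json.loads(f.readline())
```

Keeping the style requirements in a fixed system turn means every example teaches the same specification, which is what the summarization evaluation then checks against.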

Performance comparison of Command R with fine-tuning, GPT-4 (0613), GPT-4 Turbo (gpt-4-turbo-2024-04-09), and Claude 3 Opus in summarizing meeting transcripts, showing pass rates.
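The pass-rate metric is not defined in detail here. A minimal sketch, assuming a generated summary counts as a pass only if it satisfies every specification check, might score outputs as follows. The individual checks (a word limit, a bullet format, a list of banned informal words) are hypothetical stand-ins for the tonal, stylistic, length, and format criteria.

```python
from typing import Callable

Check = Callable[[str], bool]


def max_words(limit: int) -> Check:
    """Length requirement: the summary has at most `limit` whitespace-separated words."""
    return lambda text: len(text.split()) <= limit


def all_lines_bulleted(text: str) -> bool:
    """Format requirement: every non-empty line starts with '- '."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return bool(lines) and all(ln.startswith("- ") for ln in lines)


def avoids_words(banned: set[str]) -> Check:
    """Crude tone proxy: none of the banned words appear (case-insensitive)."""
    return lambda text: not (set(text.lower().split()) & banned)


def pass_rate(outputs: list[str], checks: list[Check]) -> float:
    """Fraction of outputs that satisfy *all* checks."""
    if not outputs:
        return 0.0
    passed = sum(all(check(o) for check in checks) for o in outputs)
    return passed / len(outputs)


checks = [max_words(50), all_lines_bulleted, avoids_words({"lol", "gonna"})]
outputs = [
    "- Budget approved\n- Launch moved to May",
    "We're gonna launch in May",  # fails both the format and tone checks
]
print(pass_rate(outputs, checks))  # 0.5
```

Requiring all checks to pass is stricter than averaging per-check scores, which suits specification adherence: a summary of the right length in the wrong format is still unusable.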

Research & Analysis

When LLMs analyze large amounts of internal or external information, accuracy and reliability are critical for industries like financial services, pharmaceuticals, and biotechnology. After fine-tuning Command R on the ConvFinQA dataset, a popular benchmark for testing a model’s ability to process and answer complex financial queries accurately, we found that it outperformed other industry-leading models.
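Many ConvFinQA answers are numbers, so scoring typically needs some tolerance for rounding and formatting differences. The sketch below normalizes currency symbols, percent signs, thousands separators, and accounting-style parentheses, then compares values with a relative tolerance. This is an illustrative scoring rule under those assumptions, not the benchmark's official evaluation script; the 1% tolerance is an arbitrary choice.

```python
import math
from typing import Optional


def parse_number(text: str) -> Optional[float]:
    """Parse strings like '$1,234.5', '14.2%', or '(3.0)' into floats; None if not numeric."""
    cleaned = text.strip().replace(",", "").replace("$", "").replace("%", "")
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]  # accounting notation: (3.0) means -3.0
    try:
        value = float(cleaned)
    except ValueError:
        return None
    return -value if negative else value


def numeric_match(pred: str, gold: str, rel_tol: float = 1e-2) -> bool:
    """Numeric comparison within `rel_tol`; falls back to case-insensitive text match."""
    p, g = parse_number(pred), parse_number(gold)
    if p is None or g is None:
        return pred.strip().lower() == gold.strip().lower()
    return math.isclose(p, g, rel_tol=rel_tol, abs_tol=1e-9)


def accuracy(preds: list[str], golds: list[str]) -> float:
    """Share of predictions that match their gold answer."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    return sum(numeric_match(p, g) for p, g in zip(preds, golds)) / len(golds)


print(accuracy(["14.1%", "$1,200", "yes"], ["14.12", "1150", "Yes"]))  # two of three match
```

Note that stripping the percent sign treats "14%" and "14" as equal; a dataset that encodes percentages as fractions (0.14) would need an extra normalization step.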

Command R also achieved top scores when fine-tuned and tested on text-only questions from the ScienceQA dataset, a benchmark for evaluating a model’s ability to answer a wide range of multiple-choice science questions.

Performance comparison of Command R with fine-tuning, GPT-4 (0613), GPT-4 Turbo (gpt-4-turbo-2024-04-09), and Claude 3 Opus on ScienceQA, showing accuracy scores in answering text-only science questions.
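For a multiple-choice benchmark like ScienceQA, accuracy reduces to checking whether the model picked the correct option. A minimal sketch, assuming the model is prompted to answer with an option letter (A to E) and that any reply without a recognizable letter counts as wrong:

```python
import re
from typing import Optional

# Assumed answer format: the first standalone capital letter A-E in the reply.
# This is crude: an English article "A" at the start of a sentence would also match.
CHOICE_PATTERN = re.compile(r"\b([A-E])\b")


def extract_choice(reply: str) -> Optional[str]:
    """Return the first option letter found in the model's reply, or None."""
    match = CHOICE_PATTERN.search(reply)
    return match.group(1) if match else None


def mc_accuracy(replies: list[str], answer_letters: list[str]) -> float:
    """Fraction of replies whose extracted letter equals the gold letter."""
    if len(replies) != len(answer_letters):
        raise ValueError("length mismatch")
    if not replies:
        return 0.0
    correct = sum(extract_choice(r) == a for r, a in zip(replies, answer_letters))
    return correct / len(replies)


replies = ["The answer is B.", "C", "I think (A) is right", "not sure"]
print(mc_accuracy(replies, ["B", "C", "D", "A"]))  # 0.5
```

In practice, constraining the output format during fine-tuning (for example, training the model to reply with the letter alone) makes this extraction step far more reliable.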

Industry-Leading Efficiency and Affordability

Given its smaller size, a fine-tuned Command R model can be served more efficiently than significantly larger models, which have a higher inference footprint and are therefore more expensive to host and serve. This efficiency also improves the user experience by delivering a significantly faster time to first token and higher throughput of generated tokens.
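Time to first token (TTFT) and per-stream throughput can both be measured from a streaming response. The sketch below times any iterator of tokens; `fake_stream` is a hypothetical stand-in for a real streaming API call, and the throughput definition used here (tokens after the first, divided by the time spent generating them) is one common convention, not necessarily the one behind the reported figures.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class StreamStats:
    ttft_s: float        # seconds from request start to first token
    tokens: int          # total tokens received
    tokens_per_s: float  # decode throughput after the first token


def measure_stream(stream: Iterable[str]) -> StreamStats:
    """Consume a token stream and record TTFT and per-stream throughput."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in stream:
        if first is None:
            first = time.perf_counter()
        count += 1
    end = time.perf_counter()
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = end - first
    rate = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return StreamStats(ttft_s=first - start, tokens=count, tokens_per_s=rate)


def fake_stream(n_tokens: int, first_delay: float, per_token: float) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming output."""
    time.sleep(first_delay)  # simulated prefill latency
    yield "tok"
    for _ in range(n_tokens - 1):
        time.sleep(per_token)  # simulated per-token decode latency
        yield "tok"


stats = measure_stream(fake_stream(20, first_delay=0.05, per_token=0.005))
print(f"TTFT {stats.ttft_s:.3f}s, {stats.tokens_per_s:.0f} tokens/s")
```

To compare two deployments on the same request profile, divide the baseline's `ttft_s` by the candidate's for the TTFT speedup, and the candidate's `tokens_per_s` by the baseline's for the throughput gain.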

The combination of these performance metrics with the price point for Command R makes it a compelling option for many enterprise use cases compared to larger and more expensive models.
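How much cheaper one model is depends on per-token prices and on the mix of input and output tokens in a workload. A minimal sketch for comparing serving costs, assuming simple per-million-token pricing for input and output; the prices below are placeholders to be replaced with each provider's current rates, not actual figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (placeholder values supplied by the caller)."""
    input_per_m: float
    output_per_m: float


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request under per-million-token pricing."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def cost_ratio(expensive: Pricing, cheap: Pricing, input_tokens: int, output_tokens: int) -> float:
    """How many times more the `expensive` model costs for the same request profile."""
    return request_cost(expensive, input_tokens, output_tokens) / request_cost(
        cheap, input_tokens, output_tokens
    )


# Hypothetical prices, chosen only to illustrate the arithmetic.
small = Pricing(input_per_m=1.0, output_per_m=2.0)
large = Pricing(input_per_m=10.0, output_per_m=30.0)
print(cost_ratio(large, small, input_tokens=3_000, output_tokens=500))  # ≈ 11.25
```

Because input and output prices usually differ, the ratio shifts with the traffic mix, which is why cost advantages are typically quoted as "up to" a given multiple.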

Latency and throughput comparison between a fine-tuned Command R deployed on a p4d.24xlarge Amazon SageMaker instance as compared to reported numbers for GPT-4 (0613), GPT-4 Turbo (gpt-4-turbo-2024-04-09), and Claude 3 Opus for the same profile, demonstrating an almost 5x shorter time to first token and almost twice the token throughput per stream.

Availability

Command R fine-tuning is available to businesses and developers today.

You can access fine-tuning for Command R on the Cohere platform (through the Cohere Dashboard, our fine-tuning API, or our Python SDK) and on Amazon SageMaker, with additional platforms coming in the near future.

To understand how your company can deploy fine-tuned Command R models for your specific use cases at production scale, reach out to our sales team. You can also dive into how to fine-tune Command R using Amazon SageMaker with our notebook.


Source: Cohere

AIwire