Anthropic Unveils Strategies For Testing And Mitigating Elections-Related Risks
Generative artificial intelligence (GenAI) has emerged as a transformative force in various sectors, including finance, IT, and healthcare. While the benefits of GenAI are undeniable, its application in the realm of elections poses significant risks and challenges. This includes the threat of spreading misinformation through deep AI fakes and creating highly personalized political advertisements for microtargeting and manipulation.
The AI models are only as good as the data they are trained on, and if data contains bias, it can have an unintended impact on the democratic process.
Anthropic, one of the leading AI safety and research companies, has shared the work it has done since last summer to test its AI models for election-related risks. The company has developed in-depth expert testing (“Policy Vulnerability Testing”) and large-scale automated evaluations to identify and mitigate potential risks.
The PVT method is designed to evaluate Anthropic AI model responses to election-related queries. It does this by rigorously testing the models for two potential issues. The first issue is where the model gives outdated, inaccurate, or harmful information in response to well-intended questions. The other issue is when the models are used in ways that violate the Anthropic user policy.
As part of the PVT, Anthropic focuses on selected areas and potential misuse applications, and with the assistance of subject matter experts, Anthropic constructs and tests various types of prompts to monitor how the AI model responds.
For this testing, Anthropic has partnered with some of the leading researchers and experts in this field including Isabelle Frances-Wright, Director of Technology and Society at the Institute for Strategic Dialogue.
The outputs from the PVT are documented and compared with Anthropic usage policy and industry benchmarks using similar models. The results are reviewed with the partners to identify gaps in policies and safety systems and to determine the best solutions for mitigating the risks. As an iterative testing method, PVT is expected to only get better with each round of testing.
Anthropic shared a case study in which it used the PVT method to test its models for accuracy based on questions about the election administration in South Africa. The method was successful in identifying 10 remediations to mitigate the risk of providing incorrect, outdated, or inappropriate information in response to elections-related queries. The remediations included “increasing the length of model responses to provide appropriate context and nuance for sensitive questions” and “not providing personal opinions on controversial political topics”.
Anthropic admits that while PVT offers invaluable qualitative insights, it is time-consuming and resource-intensive, making it challenging to scale. This limits the breadth of issues and behavior that can be tested effectively. To overcome these challenges, Anthropic also included automated evaluations for testing AI behavior across a broader range of scenarios.
Complimenting PVT with automated evaluations enables assessment of model performance across a more comprehensive range of scenarios. It also allows for a more consistent process and set of questions across models.
Anthropic used automated testing to review random samples of questions related to EU election administration and found that 89% of the model-generated questions were relevant extensions to the PVT results.
Combining PVT and automated evaluations forms the core of Anthropic’s risk mitigation strategies. The insights generated by these methods enabled Anthropic to refine its policies, fine-tune its models, update Claude’s system prompt, and enhance automated enforcement tools.
Additionally, Anthropic models were enhanced to now automatically detect and redirect election-related queries to authoritative sources. This includes time-sensitive questions about elections that the AI models might not be capable of answering.
After the implementation of changes highlighted by PVT and automated testing, Anthropic used the same testing protocols to measure whether its interventions were successful.
The testing re-run revealed a 47.2% improvement in referencing the model’s knowledge cutoff date, which is one of Anthropic's top priority mitigations. According to Anthropic, the fine-tuning of its models led to a 10.4% improvement in how often users were redirected or referenced to an authoritative source for the appropriate question.
While it may be impossible to completely mitigate the threats posed by AI technology to the election cycle, Anthropic has made significant strides in responsible AI use. Anthropic’s multifaceted approach to testing and mitigating AI risks has ensured that the potential misuse of its AI models during elections is minimized.
Related Items
Anthropic Breaks Open the Black Box
Amazon Invests Another $2.75 Billion Into Anthropic
Anthropic Launches Tool Use, Making It Easier To Create Custom AI Assistants