Meta Halts Development of 'Critical Risk' AI Models

Published February 4, 2025

Meta has published a policy governing its future development of artificial intelligence (AI), announcing that it will cease work on AI models it considers to pose a “critical risk.” The move signals Meta's commitment to safety in the AI landscape.

Although Meta has been steadily gaining recognition as a frontrunner in the AI field, its approach differs markedly from that of several competitors. Unlike many companies that develop proprietary AI models, Meta has opted to open-source its Llama family of models, promoting transparency and collaboration.

A significant challenge for Meta, and the broader AI sector, is ensuring that AI models are developed safely to prevent their misuse. In line with its commitment to responsible AI development, Meta has signed the Frontier AI Safety Commitments and created a new Frontier AI Framework that aligns with these principles.

In the outlined policy, Meta delineates its objectives and the catastrophic outcomes it aims to prevent. The groundwork begins by identifying a set of potential catastrophic outcomes and then mapping the causal pathways that could lead to them. This assessment takes into account the ways different actors, including state-level entities, might exploit frontier AI technologies. Meta describes the threat scenarios that could lead to catastrophic results and establishes risk thresholds that measure the degree to which a frontier AI could uniquely enable those scenarios.
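As a rough illustration of this outcomes-led structure, here is a minimal sketch in Python. The names and types are hypothetical (Meta has not published an implementation); the sketch only models the relationship the framework describes, in which each catastrophic outcome is linked to the threat scenarios that could produce it:

from dataclasses import dataclass, field
from enum import Enum

class RiskThreshold(Enum):
    # Risk levels described in the Frontier AI Framework.
    MODERATE = "moderate"
    HIGH = "high"
    CRITICAL = "critical"

@dataclass
class ThreatScenario:
    # A causal pathway that could lead to a catastrophic outcome.
    description: str
    actors: list[str] = field(default_factory=list)  # e.g. "state-level entity"

@dataclass
class CatastrophicOutcome:
    # An outcome the framework aims to prevent, e.g. in the cybersecurity
    # or chemical/biological weapons domains.
    name: str
    scenarios: list[ThreatScenario] = field(default_factory=list)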

By focusing on outcomes, Meta aims to create a stable and precise set of thresholds. Although the capabilities of AI will advance over time, the outcomes the framework seeks to avert remain relatively constant. These outcomes are not set in stone, however: as understanding of frontier AI improves, some outcomes or threat scenarios may be reevaluated or removed, while new ones may need to be added as AI capabilities advance or the threat landscape changes. This allows the framework to account for emerging harms and to adjust for increased risks within known threat domains.

The outcomes-led approach enables prioritization of risks based on urgency. For instance, Meta will focus on avoiding immediate catastrophic outcomes such as cybersecurity threats and risks related to chemical and biological weapons, rather than spreading resources too thinly across numerous hypothetical risks that may not be pertinent to the technology being developed.

Meta also clearly defines what constitutes a critical risk AI model and details the steps it will take in response to such an assessment. A frontier AI is classified as meeting the critical risk threshold if it is determined to uniquely enable the execution of a threat scenario. If a model reaches this threshold and the risk cannot be adequately mitigated, Meta will halt its development and apply the measures specified in its policy framework. High and moderate risk thresholds, in contrast, are determined by how significantly a model could increase the likelihood of an identified threat scenario being realized; development of these models will proceed under the guidelines of the Frontier AI Framework.

Furthermore, Meta states it will continuously evaluate the risk levels of high and moderate risk AI models to prevent their escalation into the critical risk category.
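These decision rules lend themselves to a compact summary. The sketch below continues the hypothetical Python modeling above; the actual assessment process is defined by the framework itself, not by code like this. It encodes how a model's threshold and mitigability might map to an action:

from enum import Enum

class RiskThreshold(Enum):  # as in the earlier sketch
    MODERATE = "moderate"
    HIGH = "high"
    CRITICAL = "critical"

def assess_model(threshold: RiskThreshold, mitigable: bool) -> str:
    # Hypothetical encoding of the decision rules described in the article.
    if threshold is RiskThreshold.CRITICAL:
        if not mitigable:
            # Development stops and the framework's specified measures apply.
            return "halt development and apply framework measures"
        return "mitigate until risk falls below the critical threshold"
    # High and moderate risk models proceed under the framework's guidelines,
    # subject to continuous reevaluation so they do not escalate to critical.
    return "proceed under Frontier AI Framework guidelines"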

Importance of Open Safety Policies in the AI Industry

The decision by Meta to articulate and document its safety standards stands out in an industry that seems to be rapidly advancing toward creating artificial general intelligence (AGI) without adequate safety measures. Many industry experts have expressed concerns that not enough is being done to ensure the safe development of AI technologies.

By establishing clear safety goals and committing to halt the development of critical risk models, Meta is positioning itself as a company that prioritizes safety, alongside organizations such as Anthropic that share similar values. This proactive stance is a hopeful sign that other companies may follow Meta's example toward a safer AI future.