Policy Monitor

United States of America - ARIA program for assessing risks and impacts of AI by NIST

The US's NIST has started its ARIA evaluation program. This program will allow AI developers to have their systems evaluated at several levels for different technical and societal risks. The program starts with a pilot project focused on LLMs.

What: Evaluation Program

Impact score: 4

For whom: Developers of AI systems, policymakers, assessment bodies.

URL: https://www.nist.gov/news-events/news/2024/05/nist-launches-aria...

https://ai-challenges.nist.gov...

The USA's National Institute of Standards and Technology (NIST) has launched its ARIA program. ARIA stands for "Assessing Risks and Impacts of AI" and should help organizations and individuals decide whether a particular AI technology is valid, reliable, safe, secure, private and fair once deployed. NIST will evaluate models and systems made available to it by participants, using different metrics that will be developed together with ARIA participants. The evaluation program is open to technology developers whose application meets the program requirements. For the pilot evaluation, these requirements are based on the type of application (i.e. large language models) and are not limited geographically. ARIA is one of several initiatives through which NIST is addressing its tasks under the Executive Order on Safe, Secure and Trustworthy Artificial Intelligence.

The pilot evaluation launched by NIST focuses on the risks and impacts of large language models (LLMs). The evaluation program addresses both the functional testing and the societal impacts of a model. ARIA will include model testing, red-teaming and field testing of the AI systems. Model testing examines the functionality and capabilities of the AI model and its system components. Red-teaming aims to uncover potential adverse outcomes of the model or system through stress testing, whereas field testing shows how the public uses and interprets AI-generated information in its interactions with the technology. By studying these three levels, ARIA aims to provide knowledge on how AI capabilities connect to risks and impacts in the real world. Results of the evaluation are made public, with the possibility for participants to anonymize their submissions.

Future ARIA activities can expand beyond the focus on LLMs, for example to other generative AI technologies or to other AI systems such as recommender systems or decision support tools. In the long run, NIST expects the program to lead to guidelines, tools and metrics that organizations can use themselves to evaluate their AI systems as part of their own processes. NIST considers ARIA an expansion of its AI Risk Management Framework and expects the results to support and inform its efforts to build safe, secure and trustworthy AI systems. The program, as well as the guidelines and tools derived from it, may also be of interest to European providers of AI systems, allowing them to evaluate their own systems or to participate in future NIST evaluation programs.