800-plus ‘potential vulnerabilities and biases’ may afflict AI-infused medical services

By admin on Jan 6, 2025

More than 800 “potential vulnerabilities and biases” were uncovered by a Pentagon effort to spot problems with using large language models in military medical services, officials said Thursday.

The Chief Digital and Artificial Intelligence Office, or CDAO, said the initiative was run through its Crowdsourced AI Red-Teaming Assurance Program, with help from the Program Executive Office, Defense Healthcare Management Systems, and the Defense Health Agency. Humane Intelligence, a technology nonprofit, conducted the exercise.

CDAO’s LLM pilot focused on identifying potential weaknesses and flaws in using emerging tools for clinical note summarization and for a medical advisory chatbot. DOD said more than 200 people, including clinical providers and healthcare analysts within the department, participated in the red-teaming effort, which “compared three popular LLMs.”

A press release said the initiative had been “successfully concluded.”

“This exercise will result in repeatable and scalable output via the development of benchmark datasets, which can be used to evaluate future vendors and tools for alignment with performance expectations,” it said. “Furthermore, these findings will play a crucial role in shaping DOD policies and best practices for responsible use of Generative AI (GenAI), ultimately improving military medical care.”

Matthew Johnson, who heads CDAO’s Responsible AI Division and served as the office’s lead on the pilot, also said in a statement that “this program acts as an essential pathfinder for generating a mass of testing data, surfacing areas for consideration and validating mitigation options that will shape future research, development and assurance of GenAI systems that may be deployed in the future.”

CDAO, which became operational in June 2022, has worked since its creation to test, expand and streamline DOD’s adoption and use of AI capabilities. The office previously launched a GenAI task force, known as Task Force Lima, in August 2023 to better study and understand how the department could use emerging technologies “in a responsible and strategic manner.”

Although the department sunset the task force last month, it created an Artificial Intelligence Rapid Capabilities Cell to carry out the group’s recommendations. CDAO said the new cell, created in partnership with the Defense Innovation Unit, “will lead efforts to accelerate and scale the deployment of cutting-edge AI-enabled tools, to include Frontier models, across the Department of Defense.”

In its Thursday announcement, DOD said, in part, that pilot initiatives conducted as part of its Crowdsourced AI Red-Teaming Assurance Program “will be critical to accelerating the CDAO’s AI Rapid Capabilities Cell.”

Defense One