AWS Unveils Automated Reasoning Checks to Combat AI Hallucinations, Debuts Model Distillation and Multi-Agent Collaboration

Elliot Kim

December 03, 2024 · 4 min read

At the re:Invent 2024 conference in Las Vegas, Amazon Web Services (AWS) announced a new tool designed to combat AI hallucinations, a phenomenon in which models confidently generate false or fabricated information. Automated Reasoning checks, available through AWS' Bedrock model hosting service, cross-references customer-supplied information to validate a model's responses and detect potential hallucinations.

While AWS claims Automated Reasoning checks is the "first" and "only" safeguard against hallucinations, it bears a striking resemblance to Microsoft's Correction feature, which also flags AI-generated text that might be factually incorrect. Google's Vertex AI platform also offers a similar tool, allowing customers to "ground" models using data from third-party providers, their own datasets, or Google Search.

Automated Reasoning checks works by attempting to understand how a model arrived at an answer and determining whether the answer is correct. Customers upload information to establish a ground truth, and the tool creates rules that can be refined and applied to the model. As the model generates responses, Automated Reasoning checks verifies them, and in the event of a probable hallucination, draws on the ground truth to provide the correct answer.
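As a rough illustration of that verify-then-correct loop, consider the sketch below. The ground-truth facts, the claim extraction, and every function name here are invented for illustration; they are not the actual Bedrock API, which derives logical rules from uploaded documents rather than matching strings.

```python
# A minimal sketch of the verify-then-correct loop described above.
# GROUND_TRUTH, extract_claims, and check_response are illustrative
# stand-ins, not the Bedrock Automated Reasoning checks API.

GROUND_TRUTH = {
    "max_refund_days": 30,     # facts from customer-uploaded documents
    "requires_receipt": True,
}

def extract_claims(answer: str) -> dict:
    """Toy claim extractor; the real service reasons over logical rules
    rather than matching substrings like this."""
    claims = {}
    if "60 days" in answer:
        claims["max_refund_days"] = 60
    if "no receipt" in answer:
        claims["requires_receipt"] = False
    return claims

def check_response(answer: str) -> str:
    """Flag claims that contradict the ground truth and correct them."""
    for key, value in extract_claims(answer).items():
        if GROUND_TRUTH.get(key) != value:
            # Probable hallucination: fall back to the ground truth.
            return (f"Refunds are accepted within "
                    f"{GROUND_TRUTH['max_refund_days']} days with a receipt.")
    return answer

print(check_response("You can return items within 60 days."))
```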

AWS says PwC is already using Automated Reasoning checks to design AI assistants for its clients, and the company believes this type of tooling is attracting customers to Bedrock. According to Swami Sivasubramanian, VP of AI and data at AWS, "With the launch of these new capabilities, we are innovating on behalf of customers to solve some of the top challenges that the entire industry is facing when moving generative AI applications to production."

However, some experts argue that eliminating hallucinations from generative AI is a daunting task, as AI models don't actually "know" anything and are statistical systems that identify patterns in data. This means that a model's responses are predictions rather than answers, and are subject to a margin of error.
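To make that concrete: a language model's output at each step is a probability distribution over possible next tokens. The toy distribution below is invented, but it shows why sampled responses carry a built-in margin of error.

```python
import random

# Invented next-token distribution for the prompt "The capital of France is".
# A real model produces such a distribution at every generation step.
next_token_probs = {"Paris": 0.92, "Lyon": 0.05, "Berlin": 0.03}

# Sampling (rather than looking up a fact) is what makes low-probability,
# incorrect continuations possible, which is one way hallucinations surface.
tokens, weights = zip(*next_token_probs.items())
print(random.choices(tokens, weights=weights, k=5))
```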

AWS claims that Automated Reasoning checks uses "logically accurate" and "verifiable reasoning" to arrive at its conclusions, but the company has not provided data to demonstrate the tool's reliability.

In addition to Automated Reasoning checks, AWS also announced Model Distillation, a tool that allows customers to transfer the capabilities of a large model to a smaller, cheaper, and faster model. This feature is similar to Microsoft's Distillation in Azure AI Foundry and provides a way to experiment with various models without incurring significant costs.

Model Distillation only works with Bedrock-hosted models from Anthropic and Meta at present, and customers must select a large and small model from the same model "family". Additionally, distilled models will lose some accuracy, although AWS claims this loss will be less than 2%.
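The feature builds on a standard teacher-student technique: a large model labels prompts, and a smaller model is fine-tuned to imitate those outputs. The sketch below illustrates that general idea only; the ToyModel class and distill function are placeholders, not the Bedrock Model Distillation API.

```python
# A compact sketch of teacher-student distillation, the general technique
# the feature is based on. ToyModel and distill are illustrative
# placeholders, not the Bedrock Model Distillation API.

class ToyModel:
    """Stand-in for a hosted model; real distillation calls a model API."""
    def __init__(self, name):
        self.name = name
        self.memory = {}

    def generate(self, prompt):
        return self.memory.get(prompt, f"{self.name} answer to: {prompt}")

    def fine_tune(self, pairs):
        self.memory.update(dict(pairs))

def distill(prompts, teacher, student):
    # 1. The large teacher labels each prompt with a reference response.
    pairs = [(p, teacher.generate(p)) for p in prompts]
    # 2. The small student is fine-tuned to imitate those responses.
    student.fine_tune(pairs)
    return student

teacher = ToyModel("large-teacher")
student = ToyModel("small-student")
distill(["Summarize the Q3 filing"], teacher, student)
print(student.generate("Summarize the Q3 filing"))  # now mimics the teacher
```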

Another new feature announced by AWS is multi-agent collaboration, which enables customers to assign AI agents to subtasks in a larger project. This feature, part of Bedrock Agents, provides tools to create and tune agents for tasks such as reviewing financial records and assessing global trends. Customers can designate a "supervisor agent" to break up and route tasks to the agents automatically; the supervisor determines which actions can be processed in parallel and which need details from other tasks before an agent can move forward, a pattern sketched below.
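The following schematic shows one way such supervisor-style routing could be organized. The agent functions and the two-phase task graph are invented for illustration and do not reflect the Bedrock Agents API.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented worker "agents"; in Bedrock these would be hosted agents,
# not local functions.
def records_agent(project):
    return f"reviewed financial records for {project}"

def trends_agent(project):
    return f"assessed global trends for {project}"

def report_agent(partials):
    return "report combining: " + "; ".join(partials)

def supervisor(project):
    # Phase 1: these subtasks have no dependencies, so run them in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda agent: agent(project),
                                 [records_agent, trends_agent]))
    # Phase 2: the report needs both results before it can start.
    return report_agent(partials)

print(supervisor("annual review"))
```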

While these new features seem promising, it remains to be seen how well they will work when deployed in real-world scenarios. AWS has made significant strides in improving AI model reliability, but the industry still has a long way to go in addressing the challenges posed by hallucinations and other AI-related issues.
