AI Agents: 4 Lessons for Building Trustworthy Systems | ZDNET
Building Trustworthy AI Agents: A Four-Pronged Approach
The integration of AI agents into business workflows is no longer a distant prospect; it’s actively unfolding. These systems, powered by large language models and increasingly sophisticated data analysis, promise to automate tasks and augment human capabilities. Realizing those benefits, however, requires a deliberate focus on building systems that businesses can genuinely trust. According to Joel Hron, CTO of Thomson Reuters Labs, a structured approach spanning measurement, collaboration, capability development, and external awareness is crucial for successful and reliable agent implementation. This isn’t simply about deploying new software; it’s about fundamentally rethinking how expertise is delivered and validated.
The Importance of Rigorous Evaluation
Hron emphasizes that the foundation of trustworthy AI agents lies in robust evaluation. “You need to know what good looks like,” he said in a recent interview with ZDNET. The principle sounds obvious, but it often proves hard to put into practice: success has to be quantified, and the evaluation process has to be systematized. Thomson Reuters tackles this with two kinds of benchmarks: public benchmarks, which provide initial performance indicators, and internally developed benchmarks built for automated evaluation of its specific tasks. These internal benchmarks focus on defining what constitutes a “good” answer, rather than simply measuring how close an output comes to a predefined solution.
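To make that distinction concrete, here is a minimal Python sketch contrasting proximity-style scoring (exact match against a reference answer) with rubric-style scoring, which checks whether an answer has the properties of a “good” one. The `Criterion` class, the rubric items, and the sample contract question are hypothetical illustrations, not Thomson Reuters’ actual benchmarks; real rubrics would typically be written by domain experts and scored by a model or a person rather than by simple string checks.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    """One property that a 'good' answer should have (hypothetical)."""
    name: str
    check: Callable[[str], bool]

def exact_match_score(answer: str, reference: str) -> float:
    """Proximity-style scoring: credit only for reproducing the reference text."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0

def rubric_score(answer: str, rubric: list[Criterion]) -> tuple[float, list[str]]:
    """Outcome-style scoring: fraction of criteria met, plus the names of failures."""
    failed = [c.name for c in rubric if not c.check(answer)]
    return 1 - len(failed) / len(rubric), failed

# Hypothetical rubric for a question about a contract's notice period.
rubric = [
    Criterion("cites the clause", lambda a: "clause 12" in a.lower()),
    Criterion("states the period", lambda a: "30" in a),
    Criterion("is concise", lambda a: len(a.split()) <= 60),
]
reference = "The notice period is 30 days under clause 12."
answer = "Clause 12 sets a 30-day notice period."

print(exact_match_score(answer, reference))  # 0.0
print(rubric_score(answer, rubric))          # (1.0, [])
```

A correct but differently worded answer fails the exact-match check yet satisfies every rubric criterion, which is the gap that benchmarks focused on “what good looks like” aim to close.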
However, automated evaluations aren’t the final word. Human oversight remains critical: the company still relies on human experts to assess performance, particularly before product releases. This blended approach, combining the speed of automated testing with the nuanced judgment of human professionals, is central to its strategy. As Hron explains, “Automated evaluations help drive the flywheel faster for our development teams… But before we ship, we still want the confidence of our human experts.”
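The blended workflow Hron describes can be sketched as a two-stage release gate: an automated score threshold that runs on every iteration, plus a sign-off requirement from human experts before anything ships. The class and function names, the 0.95 threshold, and the reviewer roles below are hypothetical, chosen only to illustrate the structure.

```python
from dataclasses import dataclass, field

@dataclass
class ReleaseCandidate:
    version: str
    automated_scores: list[float]                            # per-case scores from the eval suite
    expert_approvals: set[str] = field(default_factory=set)  # reviewer roles that have signed off

def passes_automated_gate(rc: ReleaseCandidate, threshold: float = 0.95) -> bool:
    """Cheap check run on every iteration to keep the development loop fast."""
    if not rc.automated_scores:
        return False
    return sum(rc.automated_scores) / len(rc.automated_scores) >= threshold

def ready_to_ship(rc: ReleaseCandidate, required_reviewers: set[str],
                  threshold: float = 0.95) -> bool:
    """Shipping also requires sign-off from every required human expert."""
    return passes_automated_gate(rc, threshold) and required_reviewers <= rc.expert_approvals

reviewers = {"tax_expert", "legal_expert"}
rc = ReleaseCandidate("v2", [1.0, 0.9, 1.0, 1.0])
print(passes_automated_gate(rc))     # True: mean score of about 0.975
print(ready_to_ship(rc, reviewers))  # False: no expert sign-off yet
rc.expert_approvals.update(reviewers)
print(ready_to_ship(rc, reviewers))  # True
```

Keeping the automated gate separate from the human gate means developers can iterate against the fast check while the slower expert review is reserved for release decisions.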
Bridging the Gap: Collaboration Between Experts
Beyond measurement, close collaboration between technical experts and domain specialists is paramount. Hron advocates a tightly coupled understanding of how agents work internally and how users experience them. That requires a common language, and an interface, that give both groups insight into the agent’s reasoning. This isn’t merely a user interface (UI) challenge; it’s about building a shared understanding of why the agent does what it does.
The most effective approach, Hron suggests, is simple: physical proximity. “This process isn’t scientific — it’s about forcing my designers to sit with data scientists and talk about what’s happening. The closer we can make those two sets of people, and the more often they can sit together, the better you have the osmosis of thinking across those two areas.” This deliberate co-location encourages a cross-pollination of ideas and a more holistic understanding of the agent’s capabilities and limitations.
Leveraging Proven Capabilities, Not Reinventing the Wheel
A common misconception about AI agents is that they are all-knowing and can tackle any task. Hron cautions against this assumption. Although AI models keep improving at writing code, executing plans, and multi-step reasoning, he points out that their capabilities can be extended significantly by connecting them to existing software tools.
Thomson Reuters’ strategy centers on decomposing established applications – tools professionals have relied on for decades – into components that agents can utilize. This approach leverages proven capabilities rather than attempting to build everything from scratch. The focus shifts to adapting existing tools for agent interaction, considering the unique ergonomics required for agent-driven workflows. This allows for a more pragmatic and reliable implementation of AI agents, building on a foundation of established functionality.
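One common way to expose existing functionality to an agent is to wrap each capability as a named tool with a description and a parameter list, then route the model’s structured tool calls to the underlying function. The Python sketch below assumes a JSON tool-call format of the form `{"name": ..., "arguments": {...}}`; the `Tool` and `ToolRegistry` classes and the `lookup_document` function are hypothetical stand-ins, not Thomson Reuters’ code or any particular vendor’s API.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical stand-in for a capability inside an established application.
_DOCUMENTS = {"DOC-001": {"title": "Sample engagement letter", "pages": 3}}

def lookup_document(doc_id: str) -> dict[str, Any]:
    """Return stored metadata for a document ID (hypothetical data)."""
    return _DOCUMENTS.get(doc_id, {"error": f"no document with id {doc_id}"})

@dataclass
class Tool:
    name: str
    description: str            # text the model reads when choosing a tool
    parameters: dict[str, str]  # parameter name -> human-readable type
    fn: Callable[..., Any]      # the existing function being exposed

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> list[dict[str, Any]]:
        """Tool catalog to include in the agent's prompt or request."""
        return [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in self._tools.values()
        ]

    def dispatch(self, call_json: str) -> str:
        """Validate a JSON tool call from the model and run the matching function."""
        call = json.loads(call_json)
        tool = self._tools.get(call.get("name"))
        if tool is None:
            return json.dumps({"error": f"unknown tool: {call.get('name')}"})
        args = call.get("arguments", {})
        missing = sorted(set(tool.parameters) - set(args))
        unexpected = sorted(set(args) - set(tool.parameters))
        if missing or unexpected:
            # Structured errors let the agent correct its call and retry.
            return json.dumps({"error": "bad arguments", "missing": missing, "unexpected": unexpected})
        return json.dumps({"result": tool.fn(**args)})

registry = ToolRegistry()
registry.register(Tool(
    name="lookup_document",
    description="Fetch a stored document's metadata by its ID.",
    parameters={"doc_id": "string"},
    fn=lookup_document,
))
print(registry.dispatch('{"name": "lookup_document", "arguments": {"doc_id": "DOC-001"}}'))
```

Returning errors as structured data instead of raising exceptions is one example of agent-facing ergonomics: the model can read what went wrong and retry with corrected arguments, while the underlying function stays unchanged.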
Looking Beyond Internal Walls: The Value of External Collaboration
Trustworthy AI isn’t built in isolation. Thomson Reuters actively participates in industry-wide initiatives to advance the field, such as the Trust in AI Alliance, a forum bringing together researchers from leading organizations like Anthropic, AWS, Google Cloud, and OpenAI. This collaborative effort focuses on sharing best practices for engineering trust into agentic systems, with a particular emphasis on explainability and transparency.
Hron notes that advancements from these pioneering organizations have dramatically reduced the effort required to achieve initial accuracy levels. However, he stresses that the real challenge lies in achieving the final, critical percentage points of accuracy – the difference between acceptable performance and the level of reliability required for high-stakes applications. This pursuit of “the last two nines of accuracy” is where Thomson Reuters focuses its efforts, recognizing that this is where competitive advantage in fields like law, tax, and compliance is won and lost. The company also collaborates with academic institutions, exemplified by a five-year partnership with Imperial College London to establish a Frontier AI Research Lab.
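The interview does not say which figures “the last two nines” refers to. Assuming, purely for illustration, that it means moving from 99% to 99.99% accuracy, the arithmetic shows why those final points matter:

```latex
\begin{aligned}
\text{error rate at } 99\%    &= 1 - 0.99   = 0.01   \;\Rightarrow\; 1 \text{ error per } 100 \text{ answers} \\
\text{error rate at } 99.99\% &= 1 - 0.9999 = 0.0001 \;\Rightarrow\; 1 \text{ error per } 10{,}000 \text{ answers} \\
\frac{0.01}{0.0001} &= 100
\end{aligned}
```

Under that assumption, the error rate falls a hundredfold even though headline accuracy rises by less than one percentage point.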
The wider generative AI landscape is evolving just as quickly. Getty Images, for example, is expanding its AI tools to let users incorporate reference and product images, enabling more customized AI-generated artwork. The feature relies on a custom-tuned AI model trained on licensed data, an approach intended to keep outputs commercially viable and to respect intellectual property rights. Ethical practices like this, including compensating data set contributors, are becoming increasingly important as the technology matures.
Building trustworthy AI agents, then, requires a multifaceted approach. It’s not simply about technological prowess; it demands rigorous evaluation, collaborative development, reuse of existing expertise, and engagement with the broader AI community. The path forward involves continuous refinement, a willingness to learn from both successes and failures, and a steadfast focus on delivering value to professionals while upholding high standards of reliability and ethical conduct. For organizations, the next steps are continued investment in these areas and a proactive approach to monitoring and adapting to the rapidly changing landscape of AI agent technology.