By clicking “Accept”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Cookie Policy for more information.
Icon Rounded Closed - BRIX Templates

Observe, Optimize, and Protect Your Hosted Agents in Microsoft Foundry - Microsoft Build 2026

AI agents are moving from prototypes into real enterprise workflows, but traditional monitoring tools were not built to explain why an agent behaved unpredictably across multi-step tasks.

Event Snapshot

  • Session: Observe, Optimize, and Protect Your Hosted Agents in Microsoft Foundry
  • Location: In-person at Microsoft Build 2026
  • Format: Hands-on technical lab with sandbox access
  • Focus: Observability, evaluation suites, and secure agent deployment
  • Audience: AI developers, platform engineers, architects, and technical leaders building hosted agents
  • Key Outcomes:
    • Detect and diagnose agent failures traditional monitoring misses
    • Build automated evaluation suites and test datasets for agent quality
    • Implement continuous evaluation and adaptive red teaming for production reliability

The Challenge: Why AI Agents Fail in Production

Modern AI agents do not behave like traditional applications. They reason, call tools, use prompts, rely on context, and complete multi-step tasks, which means standard application monitoring often misses the failures that matter most.

Teams building hosted agents need a better way to answer questions like:

  • Why did the agent generate that response?
  • Which tool, prompt, or context contributed to the issue?
  • How do we measure whether the agent is improving over time?
  • How do we find vulnerabilities before users do?

Microsoft Foundry Observability introduces workflows designed for AI systems, including context-aware evaluation, trace-linked diagnostics, and continuous testing built into the development process.

What You'll Learn in the Lab

By the end of the session, you’ll understand how to:

  • Build context-specific evaluation suites using auto-generated evaluators and test datasets.
  • Integrate evaluation into developer pipelines using skills and MCP tooling.
  • Use trace-linked analysis to diagnose agent decisions and tool calls.
  • Run continuous evaluation to improve agent quality over time.
  • Apply adaptive red teaming to uncover vulnerabilities, edge cases, and security risks.
  • Move hosted agents from prototype to production using a practical evaluation and observability workflow.

Agenda at a Glance

1. The Observability Gap in AI Agents

Understand why traditional application monitoring and logging tools often fall short for hosted AI agents.

2. Building Evaluation Suites

Learn how to create evaluators and test datasets tailored to your agent’s context.

3. Embedding Observability into Developer Workflows

Explore how skills, MCP tooling, and traces can connect evaluation directly to agent decisions and tool usage.

4. Continuous Evaluation and Optimization

See how ongoing quality checks can help teams improve agent reliability and response quality over time.

5. Adaptive Red Teaming

Learn how adaptive red teaming can help surface vulnerabilities, edge cases, and security risks before they reach users.

6. Hands-On Sandbox Exploration

Explore the guided lab environment and continue experimenting after the session.

Who Should Attend?

This lab is built for technical professionals working with AI platforms and hosted agents.

Ideal attendees include:

  • AI developers building agents or Copilot extensions
  • Platform engineers responsible for AI infrastructure and pipelines
  • Solution architects designing with Microsoft AI services
  • Engineering teams taking AI systems from POC to production
  • Builders working with MCP-enabled tools, skills, or hosted agents

You'll get the most value if you:

  • Are deploying or planning to deploy hosted AI agents
  • Need better ways to evaluate agent behavior and reliability
  • Want to integrate observability into your AI development pipeline

Real-World Impact

Teams that adopt structured AI evaluation and observability are able to:

  • Detect agent failures earlier in development cycles
  • Improve response quality through automated evaluation pipelines
  • Reduce deployment risk when shipping agent-powered experiences into enterprise environments

This session is designed to translate emerging AI engineering practices into workflows you can apply immediately.

Get the Session Slide Deck

Get the slide deck from Kanwal's Microsoft Foundry Observability session and learn how technical teams can evaluate, monitor, and improve hosted AI agents using trace-linked diagnostics, continuous evaluation, and adaptive red teaming.

FAQs

What is Microsoft Foundry Observability?

A set of tools and workflows for monitoring, evaluating, and improving hosted AI agents covering behavior analysis, failure detection, and continuous quality improvement.

Is this a lecture or hands-on?

Fully hands-on. You'll work directly with evaluation tools, observability features, and a sandbox environment.

Do I need prior agent-building experience?

Basic familiarity with AI development or agent-based systems will help you get the most out of the lab.

Will there be a sandbox to keep?

Yes, you'll walk away with a sandbox to explore additional Foundry Observability features on your own.

Will the session be recorded?

Availability depends on the final event format and will be confirmed closer to the date.

Last updated on:
May 27, 2026
Published on:
May 27, 2026
share on
Learn more
Right arrow icon

Send Me The Slidedeck

Thank you, your submission request has been received.
Your resource is ready! 🥳
Access Resource
Oops! Something went wrong while submitting the form. Please make sure that all required fields have been filled in.

Similar resources

Check-out these other great resources.
Next steps
Have a question, or just say hi. 🖐 Let's talk about your next big project.
Contact us
Mailing list
Occasionally we like to send clients and friends curated articles that have helped us improve.
Close Modal