Article • 2 min read
Zendesk AI + GPT-5: Setting the pace for the next generation of support
Shashi Upadhyay
President, Products, Engineering and AI at Zendesk
Last updated: August 7, 2025
At Zendesk, our platform helps businesses deliver fast, accurate resolutions with less effort. At its core is a system of AI agents that work to understand what a customer needs in the specific moment when trust is on the line, act on it correctly, and know when to escalate or step aside.
This is exactly why foundation models are essential and why it’s critical for a company like Zendesk to lead in testing and using the latest models to continuously enhance and accelerate our most advanced AI capabilities.
With the release of GPT-5, we saw a meaningful opportunity to improve key parts of that system. Today, GPT-5 is live in production inside Zendesk’s Resolution Platform, powering real customer conversations across our agent assist and automation workflows.
“Iterative deployment helps ensure we approach and launch new model capabilities with the highest levels of rigor. Working with Zendesk on GPT-5 is the latest example of how early testing and feedback help us identify where the API can drive the most meaningful impact where it matters most for their users.”
– Olivier Godement, Head of Business Products at OpenAI
Why we test every model the same way
When we evaluate a new model, we are not looking for benchmark wins. We are asking whether it improves resolution outcomes in the field. Zendesk runs a rigorous benchmarking program to evaluate and tune new models like GPT-5 across key tasks, balancing latency, cost, and performance. This enables rapid testing and rollout of models in under 24 hours.
Our evaluation framework covers:
- Precision: Can the model return accurate, complete answers grounded in trusted knowledge sources such as help center articles?
- Automated resolution: Does it increase the percentage of issues auto-resolved without human touch?
- Execution: Can it follow structured workflows with high fidelity?
- Latency: Is the response fast enough for live support environments?
- Safety: Does it avoid hallucination and only take actions when confident?
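One way to picture how criteria like these could be combined is a simple promotion gate that compares a candidate model against the current baseline. The sketch below is illustrative only and is not Zendesk's benchmarking code: the `EvalResult` fields, the `passes_gate` function, and the latency and safety budgets are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Aggregate scores for one model on a fixed evaluation set (hypothetical schema)."""
    precision: float        # fraction of answers judged accurate and grounded
    auto_resolution: float  # fraction of issues resolved without human touch
    execution: float        # fraction of structured workflows completed correctly
    p95_latency_s: float    # 95th percentile response time, in seconds
    unsafe_rate: float      # fraction of responses flagged as hallucinated or out of policy


def passes_gate(candidate, baseline, max_latency_s=5.0, max_unsafe_rate=0.01):
    """Return (ok, reasons). The candidate must not regress on any quality
    metric and must stay within the assumed latency and safety budgets."""
    reasons = []
    for name in ("precision", "auto_resolution", "execution"):
        if getattr(candidate, name) < getattr(baseline, name):
            reasons.append(f"{name} regressed")
    if candidate.p95_latency_s > max_latency_s:
        reasons.append("latency over budget")
    if candidate.unsafe_rate > max_unsafe_rate:
        reasons.append("unsafe rate over budget")
    return (not reasons, reasons)
```

A gate of this shape makes the trade-off explicit: a model that wins on quality but misses the latency budget is still rejected, and the returned reasons say why.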
We continuously monitor performance with offline and live metrics, ensuring transparent and reliable AI improvements. GPT-5 delivered improvements across almost every one of these criteria. Here are the results.
What GPT-5 improved in our AI Agents, Copilot, and App Builder
- Fewer fallback escalations – reduced by over 20%
GPT-5 delivered more complete responses with fewer missed details, reducing agent handoffs and helping customers get answers more quickly.
- Sharper handling of ambiguity – improvement in intent clarification
GPT-5 clarified vague customer input more effectively, enabling better routing and increasing coverage of automated flows in over 65% of conversations.
- High execution reliability – 95%+ on standard procedures, 30% reduction in failure on large flows
GPT-5 maintained structure across long workflows and adapted to real-world service complexity without losing context.
- Higher quality assist – 5 point lift in agent suggestion accuracy across four languages
Agent productivity increased as GPT-5 suggestions became more concise, contextually relevant, and aligned with tone guidelines.
- Faster app generation – 3 to 4 times more prompt iterations per minute, better alignment in code generation for app builder
GPT-5 was 25–30% faster overall and enabled more prompt iterations per minute, speeding up app builder development workflows.
Technical integration: How we use GPT-5 at Zendesk
GPT-5 is not simply swapped in as a replacement for earlier models. It is one component of a larger, modular AI architecture built to deliver resolutions reliably.
Model selection and use
We use GPT-5 selectively across use cases where it demonstrates measurable value. These include:
- Intent clarification and disambiguation
- Long-context answer generation
- Procedure compilation and execution (PCA / PEA)
- Agent reply generation in auto-assist scenarios
GPT-5 operates in conjunction with our intent classification and reasoning pipeline. We map vague input to clear actions, then use the model to synthesize responses or execute multi-step workflows where appropriate.
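The pattern described here, classifying intent first and invoking the model only once the intent is clear, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the keyword table stands in for a real intent classifier, and `classify_intent`, `route`, and the `generate` callback are hypothetical names, not part of any Zendesk or OpenAI API.

```python
from typing import Callable, Optional

# Hypothetical keyword table standing in for a real intent classifier.
INTENT_KEYWORDS = {
    "refund_request": ("refund", "money back"),
    "password_reset": ("password", "locked out"),
}


def classify_intent(message: str) -> Optional[str]:
    """Return the single matching intent, or None if the message is
    ambiguous (several intents match) or unrecognized (none match)."""
    text = message.lower()
    matches = [
        intent
        for intent, keywords in INTENT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
    return matches[0] if len(matches) == 1 else None


def route(message: str, generate: Callable[[str, str], str]) -> dict:
    """Ask a clarifying question when the intent is unclear; otherwise
    hand the message to the model callback to produce a reply."""
    intent = classify_intent(message)
    if intent is None:
        return {
            "action": "clarify",
            "reply": "Could you tell me a bit more about what you need help with?",
        }
    return {"action": "respond", "intent": intent, "reply": generate(intent, message)}
```

Passing the model in as a callback keeps the routing logic testable without a live model call; in a test, `generate` can be a stub that simply echoes the intent.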
Reasoning modes and flow handling
GPT-5 allows for medium reasoning with significantly longer context windows. This is especially helpful for:
- Multi-turn conversations
- Step-by-step execution of internal procedures
- Dynamic generation of structured outputs from loosely worded inputs
In these scenarios, we prioritize maintaining conversational structure, accuracy, and context window efficiency. GPT-5 performs reliably even with higher token loads, which enables smoother automation of service interactions that span multiple turns or inputs.
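As a rough illustration of what context window efficiency can mean in a multi-turn conversation, the sketch below keeps only the most recent turns that fit within a token budget. It is an assumption-laden example, not a description of Zendesk's pipeline: `count_tokens` is a crude whitespace count standing in for a real tokenizer, and `trim_history` and the turn format are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(turns, budget):
    """Keep the most recent turns whose combined token count fits within
    `budget`, preserving their original order."""
    kept = []
    used = 0
    # Walk backwards so the newest turns are considered first.
    for turn in reversed(turns):
        cost = count_tokens(turn["content"])
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    return kept
```

Dropping the oldest turns first is only one possible policy; summarizing earlier turns instead of discarding them is a common alternative when early context still matters.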
Scaffolding and control
To safely deploy GPT-5 in production, we surround it with strong operational guardrails:
- Intent-layer pre-routing to reduce risk and improve clarity
- Real-time observability with structured logging of model behavior
- Trigger-level governance to prevent out-of-policy responses
- Fallback protocols that default to safe escalation or agent involvement
We treat the model as a nondeterministic tool within a controlled system—not a standalone decision-maker. That is what enables us to deploy it in enterprise-grade environments.
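To make the idea of a model wrapped in guardrails more concrete, here is a minimal sketch that combines a confidence threshold, a policy check, a safe fallback, and structured logging. Everything in it is an assumption made for illustration: the `guarded_reply` function, the `BLOCKED_PHRASES` list, the 0.8 threshold, and the log fields are hypothetical and do not reflect Zendesk's actual controls.

```python
import json
import logging

logger = logging.getLogger("model_guardrail")

# Hypothetical policy list; a real system would use a richer policy engine.
BLOCKED_PHRASES = ("guaranteed refund",)


def guarded_reply(draft: str, confidence: float, threshold: float = 0.8) -> dict:
    """Send the model's draft only if it clears the confidence threshold and
    the policy check; otherwise fall back to escalating to a human agent."""
    if confidence < threshold:
        decision = {"action": "escalate", "reason": "low_confidence"}
    elif any(phrase in draft.lower() for phrase in BLOCKED_PHRASES):
        decision = {"action": "escalate", "reason": "policy_violation"}
    else:
        decision = {"action": "send", "reply": draft}

    # Structured log line so every decision can be audited later.
    logger.info(json.dumps({
        "confidence": confidence,
        "action": decision["action"],
        "reason": decision.get("reason"),
    }))
    return decision
```

The key design choice is that the default path is escalation: the draft reaches the customer only when every check passes, so a failure in any check degrades to human involvement rather than to an unreviewed reply.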
What this means for our customers
We integrated GPT-5 because it allowed us to resolve more issues faster and with higher reliability.
It helped reduce fallback escalations. It improved performance across multilingual support. It executed workflows with precision and handled ambiguity more gracefully. And it helped our internal teams move faster in the build-test-deploy cycle for AI-powered agents.
Every step forward with AI should result in fewer dropped threads, shorter resolution times, and a better experience for the people on both sides of the conversation. GPT-5 helps us do that.
We will keep testing, tuning, and integrating new models as they evolve. But our focus will stay the same: using AI to deliver resolutions that customers can trust.