Six months of learning — what we've discovered running the AI Call Center in production

Six months in: 4 metrics survived, 2 dropped. The 22-minute Telnyx EU outage, the secondary trunk that saved us, 14 SME tenants.

After six months of running the Nortinia AI Call Center in production, this is a good moment to reflect. Which metrics do we now trust for decision-making? Which ones did we drop because they were misleading? And what was the one real outage that shook us?

The four metrics we now trust

After six months of calibrating our own judgment, we build performance evaluation around these four numbers. Each is visible on the tenant dashboard in real time, and each comes up in every bench review.

1. Post-call CSAT (3-point scale). After every call we ask for feedback on a 3-point scale: not 5-point, not 10-point, not NPS, just a simple "good / okay / poor." Two reasons for the simplicity: the response rate is much higher (62% vs. 31% on the 5-point scale), and the thresholds are sharper. The target is an average of at least 2.6 on the 3-point scale, which we treat as equivalent to 4.0/5.

2. Time-to-resolution (TTR). The total time until the customer's problem is solved — including any callbacks, human handoffs, and the full follow-up. NOT just call duration. TTR tells you whether the AI is really solving things or just shortening the call and passing the problem along.

3. Handoff rate. The share of calls the AI hands to humans. Typical value: 38-60%. The trend matters: if it's rising, either the AI's knowledge is falling behind what callers ask or false handoffs have crept in; if it's falling, either things improved or the AI is holding on to calls it should have escalated.

4. False-handoff rate. Calls where AI escalated to a human but where (in retrospect) AI could have solved it alone. Measured via manual sample review (200 random calls per month). Target: <10%. Currently 8%.
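
As a rough illustration of how the four numbers above could be computed from call records, here is a minimal Python sketch. The `Call` record and its field names are hypothetical and are not the platform's actual schema. The sketch also assumes that the false-handoff rate is taken over manually reviewed handoffs, and it uses the median for TTR; neither choice is specified above.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Call:
    # Hypothetical record shape; the field names are illustrative only.
    csat: Optional[int]                   # 3 = good, 2 = okay, 1 = poor, None = no answer
    ttr_minutes: Optional[float]          # first contact to final resolution, None if unresolved
    handed_off: bool                      # True if the AI escalated to a human
    false_handoff: Optional[bool] = None  # filled in only by manual review


def csat_average(calls):
    """Mean score on the 3-point scale over calls that answered the survey."""
    scores = [c.csat for c in calls if c.csat is not None]
    return sum(scores) / len(scores) if scores else None


def median_ttr(calls):
    """Median time-to-resolution in minutes over resolved calls."""
    times = [c.ttr_minutes for c in calls if c.ttr_minutes is not None]
    return median(times) if times else None


def handoff_rate(calls):
    """Share of all calls that the AI handed to a human."""
    return sum(c.handed_off for c in calls) / len(calls) if calls else None


def false_handoff_rate(calls):
    """Share of manually reviewed handoffs judged unnecessary (assumed denominator)."""
    reviewed = [c for c in calls if c.handed_off and c.false_handoff is not None]
    return sum(c.false_handoff for c in reviewed) / len(reviewed) if reviewed else None
```

Keeping TTR as its own field, separate from call duration, mirrors the point made in item 2: a short call with a long follow-up still shows up as a long TTR.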

The two metrics we dropped

Two metrics we leaned on hard in the first three months and learned to distrust:

1. Average handle time (AHT) alone. AHT looks simple at first glance: average call duration. The industry KPI for decades. One problem: with AI it tells you nothing. A 40-second AI call could be excellent (instant resolution) or catastrophic (customer gave up). The number alone doesn't tell you which. We moved to looking at AHT only alongside TTR and CSAT, never solo.

2. "% of automated calls." In the early months we proudly said AI handled 60-70% of calls. Then we realised that number is absurdly gameable: loosen the handoff threshold and it goes up, even if CSAT tanks. Today we look at handoff rate and CSAT together; automation share alone is a vanity metric.
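
The "look at them together" rule can be sketched as a simple check between two reporting periods. This is only an illustration: the dictionary keys, the function name `looks_gamed`, and the 0.05 CSAT tolerance are assumptions made for the example, not values from our dashboard.

```python
def automation_share(handoff_rate):
    """Automation share is the complement of the handoff rate."""
    return 1.0 - handoff_rate


def looks_gamed(prev, curr, csat_tolerance=0.05):
    """Flag a period where automation share rose while CSAT fell.

    prev and curr are dicts with hypothetical keys "handoff_rate" (0..1)
    and "csat" (average on the 3-point scale).
    """
    share_up = automation_share(curr["handoff_rate"]) > automation_share(prev["handoff_rate"])
    csat_down = curr["csat"] < prev["csat"] - csat_tolerance
    return share_up and csat_down


previous = {"handoff_rate": 0.40, "csat": 2.70}
current = {"handoff_rate": 0.30, "csat": 2.40}
suspicious = looks_gamed(previous, current)
```

A rise in automation share on its own never trips the check; it only fires when CSAT drops in the same period, which is the pattern that makes the standalone number misleading.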

The 22-minute Telnyx EU region outage

In six months we had one real major incident. On Tuesday, March 14, at 09:42 in the morning, the Telnyx EU-1 region went down. Our SIP trunks stopped accepting new calls, and existing ones started dropping.

First alert at 09:43 (one-minute detection time, which is good). Monitoring showed clearly the problem wasn't on our side — Telnyx's status page confirmed the outage at 09:48.

Fallback activation: Nortinia provisions a secondary SIP trunk with another provider for every tenant (most on Vonage, three tenants on Twilio). The failover is DNS-level and automatic. By 09:51, 78% of traffic was on the secondary trunk; by 09:54 it was 100%.
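
To make the idea of priority-based failover concrete, here is a minimal Python sketch. It is not our actual failover mechanism: it borrows the convention of DNS SRV records, where a lower priority value is tried first, and assumes a hypothetical health map fed by monitoring. The `Trunk` type, the `pick_trunk` function, and the hostnames are placeholders invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trunk:
    provider: str
    host: str      # placeholder hostnames, not real endpoints
    priority: int  # lower value is tried first, as with DNS SRV records


def pick_trunk(trunks, healthy):
    """Return the healthy trunk with the lowest priority value, or None."""
    candidates = [t for t in trunks if healthy.get(t.host, False)]
    return min(candidates, key=lambda t: t.priority, default=None)


trunks = [
    Trunk("Telnyx", "sip.primary.example", 10),
    Trunk("Vonage", "sip.secondary.example", 20),
]

# Normal operation: the primary wins because it has the lower priority value.
normal = pick_trunk(trunks, {"sip.primary.example": True, "sip.secondary.example": True})

# Primary marked unhealthy: selection falls through to the secondary.
degraded = pick_trunk(trunks, {"sip.primary.example": False, "sip.secondary.example": True})
```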

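Telnyx EU-1 came back at 10:04. The incident lasted 22 minutes from start to finish (09:42 to 10:04), although all traffic was already on the secondary trunk after 12 minutes. Exactly one call was lost outright, and even its details were reconstructed from the tenant's CRM.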
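
The intervals in this timeline can be checked with a few lines of Python. The timestamps are the ones quoted above; the dictionary keys and the helper names are made up for the example.

```python
from datetime import datetime


def at(hhmm):
    """Parse an HH:MM timestamp from the incident timeline."""
    return datetime.strptime(hhmm, "%H:%M")


timeline = {
    "outage_start": at("09:42"),
    "first_alert": at("09:43"),
    "provider_confirmed": at("09:48"),
    "failover_78_percent": at("09:51"),
    "failover_complete": at("09:54"),
    "provider_restored": at("10:04"),
}


def minutes_between(start, end):
    """Whole minutes elapsed between two named timeline events."""
    return int((timeline[end] - timeline[start]).total_seconds() // 60)


detection_minutes = minutes_between("outage_start", "first_alert")            # 1
full_failover_minutes = minutes_between("outage_start", "failover_complete")  # 12
provider_outage_minutes = minutes_between("outage_start", "provider_restored")  # 22
```

The gap between 12 and 22 minutes is the window in which traffic was already fully served by the secondary trunk while the primary region was still down.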

The lesson: don't keep everything on one provider. The secondary SIP trunk had been completed three weeks before the incident — we got lucky. Since then, dual-provider setup is default for every new tenant.

The tenant mix

After six months we have 14 live customers, all SMEs. Industry breakdown:

  • 5 — e-commerce (post-sale support, shipping status)
  • 3 — healthcare provider (appointment scheduling, confirmation)
  • 2 — educational institution (registration, information)
  • 2 — financial advisory (surveys, follow-up)
  • 2 — general SME customer service

The smallest tenant generates 400 calls per month, the largest 12,000. The average is 2,600 calls per month, so across all tenants the platform handles roughly 36,000 calls per month.
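
As a quick sanity check on these figures, the industry breakdown and the platform total can be reproduced in a few lines of Python. The dictionary keys are shorthand labels chosen for the example.

```python
# Tenant counts per industry, as listed in the breakdown above.
tenants_by_industry = {
    "e-commerce": 5,
    "healthcare": 3,
    "education": 2,
    "financial advisory": 2,
    "general SME": 2,
}

total_tenants = sum(tenants_by_industry.values())  # 14

average_calls_per_tenant = 2_600
estimated_platform_calls = total_tenants * average_calls_per_tenant  # 36,400
```

The product comes to 36,400, which is where the "roughly 36,000" figure comes from.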

The next six months

A few publicly committed improvements:

  • CSAT predictor — model-based estimate at the end of the call, before the customer fills it in. Goal: anomaly detection, so problem cases can be escalated immediately.
  • Hungarian dialect tuning — Transylvanian and Vojvodina-Hungarian STT accuracy is currently below average.
  • Two new jurisdictions — Poland and Czechia after compliance review.
  • Telnyx US-1 region rollout — two tenants expanding to the US market need this.

Thanks to the first 14 design partners for believing in us; the next six months will be hard work too, but the fundamentals are solid.