High CPU usage when a call rings queue agents

Vitor_Hugo_Costa · August 27, 2024, 5:17pm

I have 3 VitalPBX servers with around 80 tenants and 1k extensions each. Two servers run Vital3 and one runs Vital4.

Only on the Vital4 server, when a call goes to a queue with 3 or 4 static extensions, the CPU goes very high while the extensions ring. It's not about heavy traffic, because I tested it at 04:00 am when no one was making calls… and with only one extension in the queue it still occurs, with less severe CPU use. This effect is not observed on the Vital3 servers.

jluis · September 8, 2024, 2:27pm

Were you able to fix this issue?

This happened to me before.

In my case it happens when ringing an extension that isn’t available.
I changed the users’ queue extensions from static to dynamic, and that keeps the CPU normal.
But this definitely doesn’t happen with v3, only v4.

Vitor_Hugo_Costa · September 8, 2024, 7:03pm

It's the exact opposite situation: when a static extension is online the CPU goes high; offline extensions don't add to the effect. It looks like rebooting the PBX reduces the effect… all tests were done when no one was using the PBX.

dotro · October 8, 2024, 7:06am

Bumping this so it doesn’t get closed.

This seems to be a major issue with a basic function of a PBX: ringing queue members.

CPU load spikes dramatically at the exact moment that the PBX is trying to ring about 50 queue members (load is almost the same when ringing only 25 extensions), as can be seen in the screenshot below.

Also swap gets used up after about 3 weeks of uptime, indicating to me a potential problem with memory usage generated by the dialplan (system in the screenshot has 128GB of memory).

All of my v4 systems are experiencing this issue and the only workaround that I found was manually removing the queue-hints from the extensions__25_-hints.conf file and reloading the dialplan (used it only for testing, not for production because I don’t yet grasp all of the implications of removing them).

A system restart or Asterisk “core restart” lessens the effect for a couple of days, but then it comes back.
v3 systems do not have this issue regardless of how many extensions are rung at the same time.

Kudos @Vitor_Hugo_Costa for finding this!

Vitor_Hugo_Costa · October 8, 2024, 11:16am

I opened a ticket about it with VitalPBX… the Vital staff still don't believe it's an issue…
Maybe more customers complaining can help them raise the priority… some time ago I found very slow code in Sonata Stats and gave some advice about indexes and queries… some months passed and the Vital staff implemented it.

dotro · October 8, 2024, 1:06pm

I basically got the same response from the VPBX staff after about 3 months of going back and forth with a support ticket in February.

Things got so bad with the CPU load in my case that Announcements and Music on Hold playback would be very choppy or even interrupted for entire seconds during peak usage hours, and your discovery might explain the reason why.
Keep in mind that at that time I was running a v4 with around 100 tenants and 1.5k calls per hour, while other v3 systems at the same time ran 200 tenants and 10k calls without issues.

Also, hint generation was disabled on my system… shouldn’t there be no hints generated?

Vitor_Hugo_Costa · October 9, 2024, 3:33am

Yes… the PBX staff keep saying this is a memory, CPU, load, or Dell/HP problem… but for sure it's software at some level (Asterisk? OS? Dialplan?).
I didn't disable hint generation. Did you? Did it solve the problem? How do you disable hint generation?
I never tested whether this problem happens with ring groups… if it does, it's a ring problem; if it doesn't, it's a queue problem.

dotro · October 9, 2024, 7:27am

I don’t think this is a memory, CPU or system manufacturer hardware problem… I’ve tested with different systems and configurations: 20C/40T, 32C/64T CPUs, 128/256GB RAM, HP Gen8/Gen9, and all of them show the exact same issue.

I think hint generation is disabled by default, but I’m not sure if this setting is supposed to inhibit queue-hints generation. Check under SETTINGS → System General → Create Hints
I’ve always had this setting disabled and it did not seem to help with the problem.

Didn’t think to test with Ring Groups unfortunately, and my v4 system is no longer in production…

Vitor_Hugo_Costa · October 9, 2024, 11:46am

What do these hints do for a queue?
How did you remove them?
Did the CPU use definitely drop afterwards?

dotro · October 10, 2024, 1:13pm

Didn’t have the time to trace the dialplan so I’m not exactly sure, that’s why I never tested it further.
If I had to guess: removing the queue-hints breaks Diversions, and/or FollowMe, and/or Agent Login/Logout features… at least.
Manually remove contents of the [TenantID_queue-hints] context from /etc/asterisk/vitalpbx/extensions__25-TenantID-hints.conf file, and then issue a “dialplan reload” command to Asterisk.
The CPU load issue was completely gone for me once the queue-hints were removed, and all the extensions were successfully rung locally.

But as I said, I don’t fully grasp the implications of removing them so please don’t try/use this in production.
I only did it because I wanted to see if there was something wrong at the OS/Asterisk level and it seems the issue is not there.
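
For anyone repeating this test on a lab system, here is a minimal Python sketch of the "remove the contents of a context" step. It only works on text that has already been read from the hints file; the context name `T1_queue-hints` and the sample hint lines are made up for illustration, and the function name `strip_context` is hypothetical. It keeps the `[context]` header and drops everything under it until the next header.

```python
import re

# Matches an Asterisk context header such as "[T1_queue-hints]".
HEADER = re.compile(r"^\s*\[([^\]]+)\]")

def strip_context(conf_text: str, context: str) -> str:
    """Return conf_text with the body of [context] removed (header kept)."""
    kept = []
    inside_target = False
    for line in conf_text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new context starts: check whether it is the one to empty.
            inside_target = (match.group(1) == context)
            kept.append(line)
        elif not inside_target:
            kept.append(line)
    return "\n".join(kept) + "\n"

# Made-up sample; real hint lines on a VitalPBX system will look different.
sample = (
    "[T1_queue-hints]\n"
    "exten => 100,hint,Queue:Q100\n"
    "[T1_extensions-hints]\n"
    "exten => 200,hint,PJSIP/200\n"
)
print(strip_context(sample, "T1_queue-hints"))
```

On a real test box the text would come from the hints file mentioned above (back it up first), and Asterisk still needs the "dialplan reload" afterwards. The same warning applies: not for production.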

Vitor_Hugo_Costa · October 16, 2024, 9:35pm

I tested on my Vital4 server: after removing [TenantID_queue-hints] from a tenant, the high CPU use when a call comes to a queue on this tenant stays the same. Extensions keep ringing and CPU usage stays high.
Only after removing [TenantID_extensions-hints] from the same file does CPU use become low… but then no extension rings.

Maybe this problem comes from the Asterisk version?
Or is the Vital4 dialplan different from the Vital3 dialplan?

RM1740 · October 25, 2024, 7:32pm

We are having the same issue with very high CPU usage on queue rings. It's causing IVR and recordings to be very choppy. This needs to be fixed ASAP.

Vitor_Hugo_Costa · October 26, 2024, 9:45am

Try opening a support ticket. Maybe if enough Vital users do it, the Vital team will fix it.
This problem really broke my business.

admin · October 26, 2024, 11:31am

We already had a similar case with a client, and the problem was that he had online music on hold configured with an invalid URL. If you use online music on hold, it would be good to check it.

miguel · October 26, 2024, 1:20pm

Questions for those reporting the issue:

Where is the server hosted?
Is it bare metal or a VM?
What are the server specs?
What ring strategy are you using?
Are you hosting the VMs on DELL or HP servers?
How many tenants are call centers on your PBX?
How many extensions and tenants do you have?

Vitor_Hugo_Costa · October 26, 2024, 4:26pm

It's not about music on hold. Local music and ring-only have the same effect.

Vitor_Hugo_Costa · October 26, 2024, 4:33pm

Where is the server hosted? Dedicated servers in a Brazil datacenter
Is it bare metal or a VM? VMs on Proxmox
What are the server specs? 8 vCPUs, 8 GB memory. No matter how much CPU or memory, the problem is the same
What ring strategy are you using? Ringall is more severe because all extensions ring. The same strategy on Vital3 has no issue
Are you hosting the VMs on DELL or HP servers? Dell
How many tenants are call centers on your PBX? About 100 tenants. But it doesn't matter; with only one tenant the problem is the same

You can reproduce the problem with no concurrent usage. Compare Vital3 and Vital4 on the same server, CPU, memory, etc., and you will see the problem crystal clear. If you replace the queue with a ring group on Vital4, the problem doesn't happen.

For sure it's something in the queue dialplan or in how this Asterisk version works… or maybe it's about Debian (Vital3 uses CentOS).
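
To put numbers on a Vital3 vs. Vital4 comparison like the one described above, one option is to sample `/proc/stat` just before and during the ring and compute the busy percentage. This is a minimal Python sketch under that assumption; the function name `cpu_busy_percent` and the sample values are made up, and it only uses the aggregate `cpu` line.

```python
def cpu_busy_percent(stat_before: str, stat_after: str) -> float:
    """Busy CPU percentage between two snapshots of /proc/stat text."""

    def parse(stat_text):
        # First line looks like: "cpu user nice system idle iowait irq softirq steal ..."
        fields = stat_text.splitlines()[0].split()
        values = [int(v) for v in fields[1:]]
        # guest/guest_nice are already counted in user/nice, so sum only the first 8.
        total = sum(values[:8])
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        return total, idle

    total_1, idle_1 = parse(stat_before)
    total_2, idle_2 = parse(stat_after)
    total_delta = total_2 - total_1
    if total_delta <= 0:
        return 0.0  # no time elapsed between the snapshots
    return 100.0 * (total_delta - (idle_2 - idle_1)) / total_delta

# Made-up snapshots, only to show the arithmetic.
before = "cpu  100 0 50 850 0 0 0 0 0 0"
after = "cpu  200 0 100 900 0 0 0 0 0 0"
print(cpu_busy_percent(before, after))
```

On a live system the two strings would be the contents of `/proc/stat` read a few seconds apart, once while idle and once while the queue is ringing, on both the Vital3 and the Vital4 machine.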

Vitor_Hugo_Costa · November 22, 2024, 4:18pm

Hi Guys, good news

The team at VitalPBX gave me a chance and allowed me to demonstrate the problem.
They created a server in their environment with 1 GB of memory and 1 vCPU, and I was able to create some extensions and a queue and show that just ringing the extensions, with nothing else going on, generates the effect on the CPU.
I created similar machines in my environment with Vital3 and Vital4 and was able to show that there is an important difference when ringing a queue on Vital3 vs. Vital4 (with all parameters equal).
They then made a small change to the base dialplan that produced an improvement of about 10% (this improvement was integrated into Vital 4.2).
A very basic analysis I made of the Vital3 vs. Vital4 dialplans indicates that in Vital4 they make some calls to external scripts, as well as reading data from the database… while in Vital3 this does not occur.
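
A rough way to repeat that kind of comparison is to count how often a dialplan file reaches outside Asterisk. The Python sketch below is only an illustration: the list of "external" applications and functions (`AGI`, `System`, `TrySystem`, `SHELL`, `DB`, `CURL`, `ODBC_*`) is my assumption about what is worth counting, the sample dialplan lines are made up, and the name `count_external_calls` is hypothetical.

```python
import re
from collections import Counter

# Assumption: these Asterisk applications/functions count as "external" work.
PATTERN = re.compile(r"\b(AGI|TrySystem|System|SHELL|DB|CURL|ODBC)(?:_\w+)?\(")

def count_external_calls(dialplan_text: str) -> Counter:
    """Count occurrences of external-call applications/functions per name."""
    counts = Counter()
    for line in dialplan_text.splitlines():
        code = line.split(";", 1)[0]  # crude: drop ';' comments, ignores escaped '\;'
        for match in PATTERN.finditer(code):
            counts[match.group(1)] += 1
    return counts

# Made-up dialplan fragment, not taken from VitalPBX.
sample_dialplan = (
    "exten => s,1,AGI(example.php)\n"
    " same => n,Set(X=${DB(family/key)})\n"
    " same => n,Dial(PJSIP/100) ; System(not counted, it is a comment)\n"
)
print(count_external_calls(sample_dialplan))
```

Running it over the Vital3 and Vital4 base dialplan files and comparing the two counters gives a quick, if crude, picture of where the extra work per ring might come from.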

Anyway, the improvement they implemented was not enough to solve my problem and, I believe, yours either.

So, I tried something unorthodox: I injected the Vital3 base dialplan into Vital4.
Initially it didn’t work, because some dialplans generated by the tenants and such called things that didn’t exist in the dialplan… but with some testing and copy and paste, I created this Frankenstein dialplan that works. And it delivers the same or similar performance.
Of course, it also eliminates some new features of vital4 or generates other problems. But given the situation I was in, so far the risk makes sense to me.

I will post the Frank file here and everyone can do what they want at their own risk.

Vitor_Hugo_Costa · November 22, 2024, 4:21pm

[Screenshot: vital34-frank-cpu usage]

Here you see 2 days of CPU usage: on the left, the original Vital 4.2 base dialplan running; on the right, my Frank base dialplan (Vital3 with some essential copy-paste from the Vital4 dialplan).

Vitor_Hugo_Costa · November 22, 2024, 7:01pm

Frank is here: http://arquivos.iquest.com.br:1180/wl/?id=aRZnB48Pk8dvb8Ye9K9tNWaRhPGbzCdH