Compare commits

..

45 Commits

Author SHA1 Message Date
Hoan Luu Huu
c177373817 Merge branch 'main' into fix/fd_1828 2026-01-02 14:27:54 +07:00
Dave Horton
fdce05fa40 add handler for SIGUSR1 to start drying up calls, useful as a generic mechanism on non-AWS deployments (#1482) 2025-12-30 13:31:42 -05:00
xquanluu
037378c732 clean testcases 2025-12-30 06:05:16 +07:00
xquanluu
8cbb12bd9a fixed send whitespace to tts stream modules 2025-12-28 16:41:07 +07:00
Sam Machin
3bd1dd6323 put removeListner in a try/catch (#1479)
* put removeListner in a try/catch

* typo
2025-12-19 13:31:06 -05:00
Ed Robbins
54dc172ebd Allow defining an ENV for specific webhook error return SIP code (#1476) 2025-12-16 17:14:42 -05:00
Hoan Luu Huu
e007e0e2d3 fixed callsession cannot close tts streaming (#1472) 2025-12-16 07:58:54 -05:00
Hoan Luu Huu
c5cd488fdf fixed gather should ignore transcription if task is killed/resolved. (#1465)
* fixed gather should ignore transcription if task is killed/resolved.

* wip
2025-12-12 09:03:08 -05:00
Sam Machin
57982335e0 add label to STT/TTS alerts (#1468)
* add label to STT/TTS alerts

* update time-series
2025-12-11 11:07:24 -05:00
Hoan Luu Huu
5cea91e18a add support for sending DTMF to ultravox (#1471) 2025-12-11 07:53:59 -05:00
Dave Horton
e396b6aa98 fix #1466: (#1467)
* fix #1466:

* do not send tts streaming events when we are not doing tts streaming
2025-12-09 09:43:53 -05:00
Vinod Dharashive
9104ebb603 Add configurable say chunk size (#1461) 2025-12-08 10:54:27 -05:00
Vinod Dharashive
1ad0261336 Enhance TTS sentence boundary detection for Arabic and Japanese (#1464)
Update sentenceEndRegex to treat the following as sentence boundaries: ASCII .!? followed by whitespace or end-of-text; Arabic question mark (؟) and full stop (۔) with the same rule; Japanese 。, !, ? treated as boundaries regardless of following character; and double newlines (\n\n). This improves streaming chunking for mixed-language content.
2025-12-08 10:44:20 -05:00
Hoan Luu Huu
7802822773 fixed dial verb cannot bridge 2 leg endpoints due to transcoding (#1457)
* fixed dial verb cannot bridge 2 leg endpoints due to transcoding

* wip
2025-12-03 07:16:25 -05:00
Hoan Luu Huu
edb4d21ce1 fixed undefine issue when setting tts streaming channel vars (#1456) 2025-12-02 19:46:28 -05:00
Dave Horton
8048e9cf88 when dialing the B leg we check to see if we are using opus on the A leg, and if so we outdial B with opus first; however we were incorrectly checking the SDP on the A leg invite not the 200 OK we send back (#1455) 2025-12-02 19:22:20 -05:00
Sam Machin
451feafed4 use timeout on HTTP requests (#1453) 2025-12-02 07:41:47 -05:00
Ed Robbins
7f1543a0f3 Add ability to enable/disable Azure audio logging via azureOptions (#1432) 2025-11-30 11:56:56 -05:00
Hoan Luu Huu
83955ba972 SoundHound support audio endpoint from speech credential (#1446)
* SoundHound support audio endpoint from speech credential

* add requestInfo and sampleRate to houndify channel variable

* add requestInfo and sampleRate to houndify channel variable

* wip

* wip

* wip

* wip

* wip

* wip

* wip
2025-11-30 11:55:20 -05:00
Hoan Luu Huu
a5fa5fce5b Fixed transcribe 2 legs cannot fallback (#1451)
* fixed transcribe cannot fallback for specific endpoint

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip
2025-11-28 21:43:05 -05:00
Dave Horton
cc1751f500 fix race condition where gather resolves with speech transcript but t… (#1449)
* fix race condition where gather resolves with speech transcript but timeout timer gets set after the resolve and is left running after gather completes

* remove unneeded line of code
2025-11-27 11:44:49 -06:00
Ed Robbins
1a1f53aede Compare sdp to determine if transcoding is being used. (#1444)
* compare sdp for transcoding

* refactor sdp check for leading codec

* fix reference to epOther

* minor changes

* minor

* fix #1447

* fix security issue

* use convenience getter appIsUsingWebsockets in CallSession

---------

Co-authored-by: Dave Horton <daveh@beachdognet.com>
2025-11-24 10:50:41 -06:00
Hoan Luu Huu
1984b6d3ea allow say verb failed as NonFatalTaskError for File Not Found (#1443)
* allow say verb failed as NonFatalTaskError for File Not Found

* wip
2025-11-20 07:22:28 -05:00
Hoan Luu Huu
769b66f57e fixed playbackIds is not in correct order compare with say.text array (#1439)
* fixed playbackIds is not in correct order compare with say.text array

* wip

* wip
2025-11-19 19:00:44 -05:00
Hoan Luu Huu
98b845f489 fix say verb does not close streaming when finish say (#1412)
* fix say verb does not close streaming when finish say

* wip

* wip

* ttsStreamingBuffer reset eventHandlerCount after remove listeners

* only send tokens to module if connected

* wip

* sent stream_open when successfully connected to vendor
2025-11-17 08:56:09 -05:00
Ed Robbins
f92b1dbc97 Add ability to override certain tts streaming options via the config … (#1429)
* Add ability to override certain tts streaming options via the config verb.

* Update to null operator(??), support parameter override via config
2025-11-12 13:54:01 -05:00
Dave Horton
0442144793 fix bug escaping backspace character 2025-11-03 15:33:59 -05:00
Hoan Luu Huu
2de24af169 fixed gather does not start timeout on bargin (#1421)
* fixed gather does not start timeout on bargin

* with previous change, no need to emit playDone since no where in the code are we listening for it

---------

Co-authored-by: Dave Horton <daveh@beachdognet.com>
2025-11-03 13:11:59 -05:00
Dave Horton
a884880321 fix for #1422 (#1423)
* fix for #1422

* fix prev commit
2025-11-03 12:53:43 -05:00
Dave Horton
b307df79d0 update deps (#1417) 2025-10-31 07:31:32 -04:00
Hoan Luu Huu
77bd11dd47 update speech util 0.2.26 (#1416) 2025-10-31 07:14:38 -04:00
Hoan Luu Huu
46d56fe546 fd_1574: should not send only whitespace to streaming tts engine (#1415) 2025-10-30 20:59:25 -04:00
Hoan Luu Huu
30ab281ea2 support disableTtsCache from config verb (#1410) 2025-10-28 08:19:03 -04:00
Sam Machin
0869a73052 add distributeDtmf to conference (#1401)
* add distributeDtmf to conference

* lint

* bump verb specs
2025-10-21 11:20:12 -04:00
Sam Machin
a0a579ccee escape json special chars in metadata (#1399) 2025-10-20 10:30:03 -04:00
Sam Machin
4218653852 add customerData on transferred calls (#1391)
* add customerData on transferred calls

* change to if statement
2025-10-20 09:20:12 -04:00
Hoan Luu Huu
89cc39f726 support gladia stt (#1397)
* support gladia stt

* wip

* wip

* update verb specification
2025-10-20 04:56:39 -04:00
Sam Machin
b231593bff bump dbhelpers for cache change (#1396) 2025-10-15 11:38:43 -04:00
Sam Machin
4309d25376 don't encode querystring if its the filename (#1395)
* don't encode querystring if its the filename

* lint

* update link to issue

u
2025-10-14 10:48:50 -04:00
Hoan Luu Huu
a00703a067 support houndify stt (#1364)
* support houndify stt

* wip

* wip

* wip

* update houndify stt parameters

* wip

* wip
2025-10-14 00:55:21 -04:00
Hoan Luu Huu
89c985b564 fixed does not send final status call back if call canceled quickly (#1393)
* fixed callsession should cleanup resource if call was canceled while fetching app

* wip

* wip

* wip

* wip

* wip
2025-10-11 03:44:42 -04:00
Dave Horton
b4ed4c8c46 #1385: Gather - dont start the continuous asr timer when we first start listening if this is a background gather (#1386) 2025-10-09 08:47:51 -04:00
Hoan Luu Huu
581d309f36 support elevenlabs different endpoint (#1387)
* support elevenlabs different endpoint

* wip

* wip
2025-10-09 08:19:40 -04:00
Sam Machin
d1baf2fe37 if call is transferred from another FS then always answer (#1383)
Currently if the call being transferred was originally an outbound call then the direction thats retrieved from redis is outbound and the invite of the refer from the other FS is never answered,
However a transferredCall will always need to be answered regardless of CallDirection
2025-10-07 07:19:11 -04:00
Dave Horton
28bf0d3477 send eager_eot events (#1382) 2025-10-06 16:50:20 -04:00
30 changed files with 4515 additions and 1831 deletions

View File

@@ -119,7 +119,7 @@ const ENCRYPTION_SECRET = process.env.ENCRYPTION_SECRET;
const HTTP_POOL = process.env.HTTP_POOL && parseInt(process.env.HTTP_POOL);
const HTTP_POOLSIZE = parseInt(process.env.HTTP_POOLSIZE, 10) || 10;
const HTTP_PIPELINING = parseInt(process.env.HTTP_PIPELINING, 10) || 1;
const HTTP_TIMEOUT = 10000;
const HTTP_TIMEOUT = parseInt(process.env.JAMBONES_HTTP_TIMEOUT, 10) || 10000;
const HTTP_PROXY_IP = process.env.JAMBONES_HTTP_PROXY_IP;
const HTTP_PROXY_PORT = process.env.JAMBONES_HTTP_PROXY_PORT;
const HTTP_PROXY_PROTOCOL = process.env.JAMBONES_HTTP_PROXY_PROTOCOL || 'http';
@@ -139,6 +139,11 @@ const JAMBONES_USE_FREESWITCH_TIMER_FD = process.env.JAMBONES_USE_FREESWITCH_TIM
const JAMBONES_DIAL_SBC_FOR_REGISTERED_USER = process.env.JAMBONES_DIAL_SBC_FOR_REGISTERED_USER || false;
const JAMBONES_MEDIA_TIMEOUT_MS = process.env.JAMBONES_MEDIA_TIMEOUT_MS || 0;
const JAMBONES_MEDIA_HOLD_TIMEOUT_MS = process.env.JAMBONES_MEDIA_HOLD_TIMEOUT_MS || 0;
const JAMBONES_WEBHOOK_ERROR_RETURN = parseInt(process.env.JAMBONES_WEBHOOK_ERROR_RETURN, 10) || 480;
/* say / tts */
const JAMBONES_SAY_CHUNK_SIZE = parseInt(process.env.JAMBONES_SAY_CHUNK_SIZE, 10) || 900;
// jambonz
const JAMBONES_TRANSCRIBE_EP_DESTROY_DELAY_MS =
process.env.JAMBONES_TRANSCRIBE_EP_DESTROY_DELAY_MS;
@@ -231,5 +236,7 @@ module.exports = {
JAMBONES_DIAL_SBC_FOR_REGISTERED_USER,
JAMBONES_MEDIA_TIMEOUT_MS,
JAMBONES_MEDIA_HOLD_TIMEOUT_MS,
JAMBONES_SAY_CHUNK_SIZE,
JAMBONES_TRANSCRIBE_EP_DESTROY_DELAY_MS,
JAMBONES_WEBHOOK_ERROR_RETURN
};

View File

@@ -291,7 +291,7 @@ router.post('/',
}, {
...(account.enable_debug_log && {level: 'debug'})
});
app.requestor.logger = app.notifier.logger = sipLogger;
app.requestor.logger = app.notifier.logger = restDial.logger = sipLogger;
const callInfo = new CallInfo({
direction: CallDirection.Outbound,
req: inviteReq,

View File

@@ -12,7 +12,8 @@ const RootSpan = require('./utils/call-tracer');
const listTaskNames = require('./utils/summarize-tasks');
const {
JAMBONES_MYSQL_REFRESH_TTL,
JAMBONES_DISABLE_DIRECT_P2P_CALL
JAMBONES_DISABLE_DIRECT_P2P_CALL,
JAMBONES_WEBHOOK_ERROR_RETURN
} = require('./config');
const { createJambonzApp } = require('./dynamic-apps');
const { decrypt } = require('./utils/encrypt-decrypt');
@@ -112,6 +113,14 @@ module.exports = function(srf, logger) {
req.locals.callingNumber = sipURIs[1];
}
}
// Feature server INVITE request pipelines taking time to finish,
// while connecting and fetch application from db and invoking webhook.
// call can be canceled without any handling, so we add a listener here
req.once('cancel', (sipMsg) => {
logger.info(`${callId} got CANCEL request`);
req.locals.canceled = true;
});
next();
}
@@ -362,13 +371,14 @@ module.exports = function(srf, logger) {
});
// if transferred call contains callInfo, let update original data to newly created callInfo in this instance.
if (app.transferredCall && app.callInfo) {
const {direction, callerName, from, to, originatingSipIp, originatingSipTrunkName} = app.callInfo;
const {direction, callerName, from, to, originatingSipIp, originatingSipTrunkName, customerData} = app.callInfo;
req.locals.callInfo.direction = direction;
req.locals.callInfo.callerName = callerName;
req.locals.callInfo.from = from;
req.locals.callInfo.to = to;
req.locals.callInfo.originatingSipIp = originatingSipIp;
req.locals.callInfo.originatingSipTrunkName = originatingSipTrunkName;
if (customerData) req.locals.callInfo.customerData = customerData;
delete app.callInfo;
}
next();
@@ -471,7 +481,7 @@ module.exports = function(srf, logger) {
message: `${err?.message}`.trim()
}).catch((err) => this.logger.info({err}, 'Error generating alert for parsing application'));
logger.info({err}, `Error retrieving or parsing application: ${err?.message}`);
res.send(480, {headers: {'X-Reason': err?.message || 'unknown'}});
res.send(JAMBONES_WEBHOOK_ERROR_RETURN, {headers: {'X-Reason': err?.message || 'unknown'}});
app.requestor.close(WS_CLOSE_CODES.GoingAway);
}
}

View File

@@ -504,7 +504,12 @@ class CallSession extends Emitter {
}
get isTtsStreamEnabled() {
return this.backgroundTaskManager.isTaskRunning('ttsStream');
// 1st background tts stream
return this.backgroundTaskManager.isTaskRunning('ttsStream') ||
// 2nd current task streaming tts
TaskName.Say === this.currentTask?.name && this.currentTask?.isStreamingTts ||
// 3rd nested verb is streaming tts
TaskName.Gather === this.currentTask?.name && this.currentTask.sayTask?.isStreamingTts;
}
get isListenEnabled() {
@@ -658,6 +663,15 @@ class CallSession extends Emitter {
}
}
// disableTtsCache
get disableTtsCache() {
return this._disableTtsCache || false;
}
set disableTtsCache(d) {
this._disableTtsCache = d;
}
getTsStreamingVendor() {
let v;
if (this.currentTask?.isStreamingTts) {
@@ -918,7 +932,7 @@ class CallSession extends Emitter {
this.logger.debug('CallSession:enableBackgroundTtsStream - ttsStream enabled');
} else {
this.logger.debug(
'CallSession:enableBackgroundTtsStream - ignoring request as call does not have required conditions');
'CallSession:enableBackgroundTtsStream - ignoring request; conditions not met (probably not using ws api)');
}
} catch (err) {
this.logger.info({err, say}, 'CallSession:enableBackgroundTtsStream - Error creating background tts stream task');
@@ -932,15 +946,25 @@ class CallSession extends Emitter {
}
}
clearTtsStream() {
this.requestor?.request('tts:streaming-event', '/streaming-event', {event_type: 'user_interruption'})
.catch((err) => this.logger.info({err}, 'CallSession:clearTtsStream - Error sending user_interruption'));
this.ttsStreamingBuffer?.clear();
if (this.isTtsStreamEnabled) {
this.requestor?.request('tts:streaming-event', '/streaming-event', {event_type: 'user_interruption'})
.catch((err) => this.logger.info({err}, 'CallSession:clearTtsStream - Error sending user_interruption'));
this.ttsStreamingBuffer?.clear();
}
}
startTtsStream() {
this.ttsStreamingBuffer?.start();
}
stopTtsStream() {
if (this.isTtsStreamEnabled) {
this.requestor?.request('tts:streaming-event', '/streaming-event', {event_type: 'stream_closed'})
.catch((err) => this.logger.info({err}, 'CallSession:clearTtsStream - Error sending user_interruption'));
this.ttsStreamingBuffer?.stop();
}
}
async enableBotMode(gather, autoEnable) {
try {
let task;
@@ -964,7 +988,7 @@ class CallSession extends Emitter {
task.sticky = autoEnable;
// listen to the bargein-done from background manager
this.backgroundTaskManager.on('bargeIn-done', () => {
if (this.requestor instanceof WsRequestor) {
if (this.appIsUsingWebsockets) {
try {
this.kill(true);
} catch (err) {}
@@ -1086,6 +1110,13 @@ class CallSession extends Emitter {
deepgram_stt_use_tls: credential.deepgram_stt_use_tls
};
}
else if ('gladia' === vendor) {
return {
speech_credential_sid: credential.speech_credential_sid,
api_key: credential.api_key,
region: credential.region,
};
}
else if ('soniox' === vendor) {
return {
speech_credential_sid: credential.speech_credential_sid,
@@ -1117,6 +1148,7 @@ class CallSession extends Emitter {
return {
api_key: credential.api_key,
model_id: credential.model_id,
api_uri: credential.api_uri,
options: credential.options
};
}
@@ -1165,6 +1197,15 @@ class CallSession extends Emitter {
service_version: credential.service_version
};
}
else if ('houndify' === vendor) {
return {
speech_credential_sid: credential.speech_credential_sid,
client_id: credential.client_id,
client_key: credential.client_key,
user_id: credential.user_id,
houndify_server_uri: credential.houndify_server_uri
};
}
else if ('deepgramflux' === vendor) {
return {
speech_credential_sid: credential.speech_credential_sid,
@@ -1214,9 +1255,10 @@ class CallSession extends Emitter {
}
else {
writeAlerts({
alert_type: AlertType.STT_NOT_PROVISIONED,
alert_type: type === 'tts' ? AlertType.TTS_NOT_PROVISIONED : AlertType.STT_NOT_PROVISIONED,
account_sid: this.accountSid,
vendor,
label,
target_sid: this.callSid
}).catch((err) => this.logger.error({err}, 'Error writing tts alert'));
}
@@ -1247,6 +1289,7 @@ class CallSession extends Emitter {
this.ttsStreamingBuffer.on(TtsStreamingEvents.Pause, this._onTtsStreamingPause.bind(this));
this.ttsStreamingBuffer.on(TtsStreamingEvents.Resume, this._onTtsStreamingResume.bind(this));
this.ttsStreamingBuffer.on(TtsStreamingEvents.ConnectFailure, this._onTtsStreamingConnectFailure.bind(this));
this.ttsStreamingBuffer.on(TtsStreamingEvents.Connected, this._onTtsStreamingConnected.bind(this));
}
else {
this.logger.info(`CallSession:exec - not a normal call session: ${this.constructor.name}`);
@@ -1305,7 +1348,7 @@ class CallSession extends Emitter {
}
if (0 === this.tasks.length &&
this.requestor instanceof WsRequestor &&
this.appIsUsingWebsockets &&
!this.requestor.closedGracefully &&
!this.callGone &&
!this.isConfirmCallSession
@@ -2399,7 +2442,7 @@ Duration=${duration} `
this.logger.debug(`endpoint was destroyed!! ${this.ep.uuid}`);
});
if (this.direction === CallDirection.Inbound) {
if (this.direction === CallDirection.Inbound || this.application?.transferredCall) {
if (task.earlyMedia && !this.req.finalResponseSent) {
this.res.send(183, {body: ep.local.sdp});
return {ep};
@@ -2531,7 +2574,7 @@ Duration=${duration} `
this.backgroundTaskManager.stopAll();
this.clearOrRestoreActionHookDelayProcessor().catch((err) => {});
this.ttsStreamingBuffer?.stop();
this.stopTtsStream();
this.sttLatencyCalculator?.stop();
}
@@ -2991,14 +3034,14 @@ Duration=${duration} `
*/
_notifyTaskError(obj) {
if (this.requestor instanceof WsRequestor) {
if (this.appIsUsingWebsockets) {
this.requestor.request('jambonz:error', '/error', obj)
.catch((err) => this.logger.debug({err}, 'CallSession:_notifyTaskError - Error sending'));
}
}
_notifyTaskStatus(task, evt) {
if (this.notifyEvents && this.requestor instanceof WsRequestor) {
if (this.notifyEvents && this.appIsUsingWebsockets) {
const obj = {...evt, id: task.id, name: task.name};
this.requestor.request('verb:status', '/status', obj)
.catch((err) => this.logger.debug({err}, 'CallSession:_notifyTaskStatus - Error sending'));
@@ -3050,7 +3093,7 @@ Duration=${duration} `
}
_clearTasks(backgroundGather, evt) {
if (this.requestor instanceof WsRequestor && !backgroundGather.cleared) {
if (this.appIsUsingWebsockets && !backgroundGather.cleared) {
this.logger.debug({evt}, 'CallSession:_clearTasks on event from background gather');
try {
backgroundGather.cleared = true;
@@ -3078,6 +3121,11 @@ Duration=${duration} `
}
}
_onTtsStreamingConnected() {
this.requestor?.request('tts:streaming-event', '/streaming-event', {event_type: 'stream_open'})
.catch((err) => this.logger.info({err}, 'CallSession:_onTtsStreamingConnected - Error sending'));
}
_onTtsStreamingEmpty() {
const task = this.currentTask;
if (task && TaskName.Say === task.name) {

View File

@@ -22,6 +22,12 @@ class InboundCallSession extends CallSession {
this.req = req;
this.res = res;
// if the call was canceled before we got here, handle it
if (this.req.locals.canceled) {
req.locals.logger.info('InboundCallSession: constructor - call was already canceled');
this._onCancel();
}
req.once('cancel', this._onCancel.bind(this));
this.on('callStatusChange', this._notifyCallStatusChange.bind(this));

View File

@@ -49,7 +49,8 @@ class Conference extends Task {
this.confName = this.data.name;
[
'beep', 'startConferenceOnEnter', 'endConferenceOnExit', 'joinMuted',
'maxParticipants', 'waitHook', 'statusHook', 'endHook', 'enterHook', 'endConferenceDuration'
'maxParticipants', 'waitHook', 'statusHook', 'endHook', 'enterHook',
'endConferenceDuration', 'distributeDtmf'
].forEach((attr) => this[attr] = this.data[attr]);
this.record = this.data.record || {};
this.statusEvents = [];
@@ -356,6 +357,7 @@ class Conference extends Task {
//https://developer.signalwire.com/freeswitch/FreeSWITCH-Explained/Modules/mod_conference_3965534/
// mute | Enter conference muted
...((this.joinMuted || this.speakOnlyTo) && {mute: true}),
...(this.distributeDtmf && {'dist-dtmf': true})
}});
/**

View File

@@ -18,7 +18,8 @@ class TaskConfig extends Task {
'boostAudioSignal',
'vad',
'ttsStream',
'autoStreamTts'
'autoStreamTts',
'disableTtsCache'
].forEach((k) => this[k] = this.data[k] || {});
if ('notifyEvents' in this.data) {
@@ -88,6 +89,7 @@ class TaskConfig extends Task {
get hasReferHook() { return Object.keys(this.data).includes('referHook'); }
get hasNotifySttLatency() { return Object.keys(this.data).includes('notifySttLatency'); }
get hasTtsStream() { return Object.keys(this.ttsStream).length; }
get hasDisableTtsCache() { return Object.keys(this.data).includes('disableTtsCache'); }
get summary() {
const phrase = [];
@@ -125,6 +127,7 @@ class TaskConfig extends Task {
phrase.push(`${this.ttsStream.enable ? 'enable' : 'disable'} ttsStream`);
}
if ('autoStreamTts' in this.data) phrase.push(`enable Say.stream value ${this.data.autoStreamTts ? 'on' : 'off'}`);
if (this.hasDisableTtsCache) phrase.push(`disableTtsCache ${this.data.disableTtsCache ? 'on' : 'off'}`);
return `${this.name}{${phrase.join(',')}}`;
}
@@ -357,6 +360,11 @@ class TaskConfig extends Task {
this.logger.info('Config: disabling ttsStream');
cs.disableTtsStream();
}
if (this.hasDisableTtsCache) {
this.logger.info(`set disableTtsCache = ${this.disableTtsCache}`);
cs.disableTtsCache = this.data.disableTtsCache;
}
}
async kill(cs) {

View File

@@ -21,7 +21,7 @@ const {parseUri} = require('drachtio-srf');
const {ANCHOR_MEDIA_ALWAYS,
JAMBONZ_DIAL_PAI_HEADER,
JAMBONES_DIAL_SBC_FOR_REGISTERED_USER} = require('../config');
const { isOnhold, isOpusFirst } = require('../utils/sdp-utils');
const { isOnhold, isOpusFirst, getLeadingCodec } = require('../utils/sdp-utils');
const { normalizeJambones } = require('@jambonz/verb-specifications');
const { selectHostPort } = require('../utils/network');
const { sleepFor } = require('../utils/helpers');
@@ -158,6 +158,7 @@ class TaskDial extends Task {
get canReleaseMedia() {
const keepAnchor = this.data.anchorMedia ||
this.isTranscoding ||
this.cs.isBackGroundListen ||
this.cs.onHoldMusic ||
ANCHOR_MEDIA_ALWAYS ||
@@ -575,7 +576,7 @@ class TaskDial extends Task {
proxy: `sip:${sbcAddress}`,
callingNumber: this.callerId || fromUri.user,
...(this.callerName && {callingName: this.callerName}),
opusFirst: isOpusFirst(this.cs.ep.remote.sdp),
opusFirst: isOpusFirst(this.cs.ep.local.sdp),
isVideoCall: this.cs.ep.remote.sdp.includes('m=video')
};
@@ -772,6 +773,15 @@ class TaskDial extends Task {
}
async _connectSingleDial(cs, sd) {
// start connect with dialed leg, this is the soonest we can identify transcoding
if (this.epOther && sd.ep) {
const codecA = getLeadingCodec(this.epOther.local.sdp);
const codecB = getLeadingCodec(sd.ep.remote.sdp);
this.isTranscoding = (codecA !== codecB);
if (this.isTranscoding) {
this.logger.info(`Dial:_connectSingleDial - transcoding from ${codecA} (A leg) to ${codecB} (B leg)`);
}
}
if (!this.bridged && !this.canReleaseMedia) {
this.logger.debug('Dial:_connectSingleDial bridging endpoints');
if (this.epOther) {
@@ -929,7 +939,6 @@ class TaskDial extends Task {
this.logger.info({err}, 'Dial:_selectSingleDial - Error boosting audio signal');
}
}
/* if we can release the media back to the SBC, do so now */
if (this.canReleaseMedia || this.shouldExitMediaPathEntirely) {
setTimeout(this._releaseMedia.bind(this, cs, sd, this.shouldExitMediaPathEntirely), 200);

View File

@@ -5,12 +5,14 @@ const {
AwsTranscriptionEvents,
AzureTranscriptionEvents,
DeepgramTranscriptionEvents,
GladiaTranscriptionEvents,
SonioxTranscriptionEvents,
CobaltTranscriptionEvents,
IbmTranscriptionEvents,
NvidiaTranscriptionEvents,
JambonzTranscriptionEvents,
AssemblyAiTranscriptionEvents,
HoundifyTranscriptionEvents,
DeepgramfluxTranscriptionEvents,
VoxistTranscriptionEvents,
CartesiaTranscriptionEvents,
@@ -93,6 +95,8 @@ class TaskGather extends SttTask {
get needsStt() { return this.input.includes('speech'); }
get isBackgroundGather() { return this.bugname_prefix === 'background_bargeIn_'; }
get wantsSingleUtterance() {
return this.data.recognizer?.singleUtterance === true;
}
@@ -227,7 +231,9 @@ class TaskGather extends SttTask {
const startListening = async(cs, ep) => {
this._startTimer();
if (this.isContinuousAsr && 0 === this.timeout) this._startAsrTimer();
if (this.isContinuousAsr && 0 === this.timeout && !this.isBackgroundGather) {
this._startAsrTimer();
}
if (this.input.includes('speech') && !this.listenDuringPrompt) {
try {
await this._setSpeechHandlers(cs, ep);
@@ -252,7 +258,7 @@ class TaskGather extends SttTask {
startDtmfListener();
}
this._stopVad();
if (!this.killed) {
if (!this.killed && !this.resolved) {
startListening(cs, ep);
if (this.input.includes('speech') && this.vendor === 'nuance' && this.listenDuringPrompt) {
this.logger.debug('Gather:exec - starting transcription timers after say completes');
@@ -264,19 +270,21 @@ class TaskGather extends SttTask {
};
this.sayTask.span = span;
this.sayTask.ctx = ctx;
this.sayTask.exec(cs, {ep}) // kicked off, _not_ waiting for it to complete
this.sayTask
.exec(cs, {ep}) // kicked off, _not_ waiting for it to complete
.then(() => {
if (this.sayTask.isStreamingTts) return;
this.logger.debug('Gather:exec - nested say task completed');
span.end();
process();
return;
})
.catch((err) => {
process();
});
if (this.sayTask.isStreamingTts && !this.sayTask.closeOnStreamEmpty) {
// if streaming tts, we do not wait for it to complete if it is not closing the stream automatically
process();
} else {
this.sayTask.on('playDone', (err) => {
span.end();
if (err) this.logger.error({err}, 'Gather:exec Error playing tts');
process();
});
}
}
else if (this.playTask) {
@@ -288,7 +296,7 @@ class TaskGather extends SttTask {
startDtmfListener();
}
this._stopVad();
if (!this.killed) {
if (!this.killed && !this.resolved) {
startListening(cs, ep);
if (this.input.includes('speech') && this.vendor === 'nuance' && this.listenDuringPrompt) {
this.logger.debug('Gather:exec - starting transcription timers after play completes');
@@ -300,15 +308,17 @@ class TaskGather extends SttTask {
};
this.playTask.span = span;
this.playTask.ctx = ctx;
this.playTask.exec(cs, {ep}) // kicked off, _not_ waiting for it to complete
this.playTask
.exec(cs, {ep}) // kicked off, _not_ waiting for it to complete
.then(() => {
this.logger.debug('Gather:exec - nested play task completed');
span.end();
process();
return;
})
.catch((err) => {
process();
});
this.playTask.on('playDone', (err) => {
span.end();
if (err) this.logger.error({err}, 'Gather:exec Error playing url');
process();
});
}
else {
if (this.killed) {
@@ -482,6 +492,16 @@ class TaskGather extends SttTask {
this.addCustomEventListener(ep, DeepgramfluxTranscriptionEvents.Error, this._onVendorError.bind(this, cs, ep));
break;
case 'gladia':
this.bugname = `${this.bugname_prefix}gladia_transcribe`;
this.addCustomEventListener(
ep, GladiaTranscriptionEvents.Transcription, this._onTranscription.bind(this, cs, ep));
this.addCustomEventListener(ep, GladiaTranscriptionEvents.Connect, this._onVendorConnect.bind(this, cs, ep));
this.addCustomEventListener(ep, GladiaTranscriptionEvents.ConnectFailure,
this._onVendorConnectFailure.bind(this, cs, ep));
this.addCustomEventListener(ep, GladiaTranscriptionEvents.Error, this._onVendorError.bind(this, cs, ep));
break;
case 'soniox':
this.bugname = `${this.bugname_prefix}soniox_transcribe`;
this.addCustomEventListener(
@@ -559,6 +579,18 @@ class TaskGather extends SttTask {
this._onVendorConnectFailure.bind(this, cs, ep));
break;
case 'houndify':
this.bugname = `${this.bugname_prefix}houndify_transcribe`;
this.addCustomEventListener(ep, HoundifyTranscriptionEvents.Transcription,
this._onTranscription.bind(this, cs, ep));
this.addCustomEventListener(ep, HoundifyTranscriptionEvents.Error,
this._onVendorError.bind(this, cs, ep));
this.addCustomEventListener(ep, HoundifyTranscriptionEvents.ConnectFailure,
this._onVendorConnectFailure.bind(this, cs, ep));
this.addCustomEventListener(ep, HoundifyTranscriptionEvents.Connect,
this._onVendorConnect.bind(this, cs, ep));
break;
case 'voxist':
this.bugname = `${this.bugname_prefix}voxist_transcribe`;
this.addCustomEventListener(ep, VoxistTranscriptionEvents.Transcription,
@@ -849,17 +881,15 @@ class TaskGather extends SttTask {
this._fillerNoiseOn = false; // in a race, if we just started audio it may sneak through here
this.ep.api('uuid_break', this.ep.uuid)
.catch((err) => this.logger.info(err, 'Error killing audio'));
cs.clearTtsStream();
if (cs.isTtsStreamEnabled) cs.clearTtsStream();
}
return;
}
if (this.sayTask && !this.sayTask.killed) {
this.sayTask.removeAllListeners('playDone');
this.sayTask.kill(cs);
this.sayTask = null;
}
if (this.playTask && !this.playTask.killed) {
this.playTask.removeAllListeners('playDone');
this.playTask.kill(cs);
this.playTask = null;
}
@@ -914,7 +944,7 @@ class TaskGather extends SttTask {
evt = this.normalizeTranscription(evt, this.vendor, 1, this.language,
this.shortUtterance, this.data.recognizer.punctuation);
//this.logger.debug({evt, bugname, finished, vendor: this.vendor}, 'Gather:_onTranscription normalized transcript');
this.logger.debug({evt, bugname, finished, vendor: this.vendor}, 'Gather:_onTranscription normalized transcript');
if (evt.alternatives.length === 0) {
this.logger.info({evt}, 'TaskGather:_onTranscription - got empty transcript, continue listening');
@@ -1080,6 +1110,11 @@ class TaskGather extends SttTask {
this.cs.requestor.request('verb:hook', this.partialResultHook, Object.assign({speech: evt},
this.cs.callInfo, httpHeaders));
}
else if (this.vendor === 'deepgramflux' &&
['EagerEndOfTurn', 'TurnResumed'].includes(evt.vendor.evt?.event)) {
this.logger.debug(`Gather:_onTranscription - deepgramflux event detected: ${evt.event}`);
this.performAction({speech: evt, reason: 'speechDetected'}, false);
}
if (this.vendor === 'soniox') {
if (evt.vendor.finalWords.length) {
this.logger.debug({evt}, 'TaskGather:_onTranscription - buffering soniox transcript');
@@ -1126,7 +1161,7 @@ class TaskGather extends SttTask {
}
async _startFallback(cs, ep, evt) {
if (this.canFallback) {
if (this.canFallback()) {
this._stopTranscribing(ep);
try {
this.logger.debug('gather:_startFallback');
@@ -1283,6 +1318,8 @@ class TaskGather extends SttTask {
}
this.resolved = true;
// gather is resolved, prevent any further transcription events while resolve in progress
this.removeCustomEventListeners();
// If bargin is false and ws application return ack to verb:hook
// the gather should not play any audio
this._killAudio(this.cs);

View File

@@ -5,6 +5,17 @@ const moment = require('moment');
const MAX_PLAY_AUDIO_QUEUE_SIZE = 10;
const DTMF_SPAN_NAME = 'dtmf';
function escapeString(str) {
return str
.replace(/\\/g, '\\\\') // Escape backslashes
.replace(/"/g, '\\"') // Escape double quotes
.replace(/[\b]/g, '\\b') // Escape backspace (NOTE: [\b] not \b)
.replace(/\f/g, '\\f') // Escape formfeed
.replace(/\n/g, '\\n') // Escape newlines
.replace(/\r/g, '\\r') // Escape carriage returns
.replace(/\t/g, '\\t'); // Escape tabs
}
class TaskListen extends Task {
constructor(logger, opts, parentTask) {
super(logger, opts);
@@ -16,10 +27,21 @@ class TaskListen extends Task {
this.preconditions = TaskPreconditions.Endpoint;
[
'action', 'auth', 'method', 'url', 'finishOnKey', 'maxLength', 'metadata', 'mixType', 'passDtmf', 'playBeep',
'action', 'auth', 'method', 'url', 'finishOnKey', 'maxLength', 'mixType', 'passDtmf', 'playBeep',
'sampleRate', 'timeout', 'transcribe', 'wsAuth', 'disableBidirectionalAudio', 'channel'
].forEach((k) => this[k] = this.data[k]);
//Escape JSON special characters in metadata
if (this.data.metadata) {
this.metadata = {};
for (const key in this.data.metadata) {
if (this.data.metadata.hasOwnProperty(key)) {
const value = this.data.metadata[key];
this.metadata[key] = typeof value === 'string' ? escapeString(value) : value;
}
}
}
this.mixType = this.mixType || 'mono';
this.sampleRate = this.sampleRate || 8000;
this.earlyMedia = this.data.earlyMedia === true;

View File

@@ -146,8 +146,9 @@ class TaskLlmUltravox_S2S extends Task {
return data;
}
_unregisterHandlers() {
_unregisterHandlers(ep) {
this.removeCustomEventListeners();
ep.removeAllListeners('dtmf');
}
_registerHandlers(ep) {
@@ -155,6 +156,7 @@ class TaskLlmUltravox_S2S extends Task {
this.addCustomEventListener(ep, LlmEvents_Ultravox.ConnectFailure, this._onConnectFailure.bind(this, ep));
this.addCustomEventListener(ep, LlmEvents_Ultravox.Disconnect, this._onDisconnect.bind(this, ep));
this.addCustomEventListener(ep, LlmEvents_Ultravox.ServerEvent, this._onServerEvent.bind(this, ep));
ep.on('dtmf', this._onDtmf.bind(this, ep));
}
async _startListening(cs, ep) {
@@ -189,7 +191,7 @@ class TaskLlmUltravox_S2S extends Task {
/* note: the parent llm verb started the span, which is why this is necessary */
await this.parent.performAction(this.results);
this._unregisterHandlers();
this._unregisterHandlers(ep);
}
async kill(cs) {
@@ -346,6 +348,18 @@ class TaskLlmUltravox_S2S extends Task {
excludeEvents: this.excludeEvents
}, 'TaskLlmUltravox_S2S:_populateEvents');
}
_onDtmf(ep, evt) {
this.logger.info({evt}, 'TaskLlmUltravox_S2S:_onDtmf - DTMF received');
const {dtmf} = evt;
const data = {
type: 'user_text_message',
text: `DTMF received: ${dtmf}`,
urgency: 'immediate'
};
this._api(ep, [ep.uuid, ClientEvent, JSON.stringify(data)])
.catch((err) => this.logger.info({err, evt}, 'TaskLlmUltravox_S2S:_onDtmf - Error sending DTMF as text message'));
}
}
module.exports = TaskLlmUltravox_S2S;

View File

@@ -6,9 +6,21 @@ class TaskPlay extends Task {
super(logger, opts);
this.preconditions = TaskPreconditions.Endpoint;
this.url = this.data.url.includes('?')
? this.data.url.split('?')[0] + '?' + this.data.url.split('?')[1].replaceAll('.', '%2E')
: this.data.url;
//Cleanup URLs that contain a querystring with a . unless that querystring is the filename
// see https://github.com/jambonz/jambonz-feature-server/pull/1293
// and https://github.com/jambonz/jambonz-feature-server/issues/1394 for background
if (this.data.url.includes('?')) {
if (['.mp3', '.wav'].includes(this.data.url.slice(-4))) {
this.url = this.data.url;
}
else {
this.url = this.data.url.split('?')[0] + '?' + this.data.url.split('?')[1].replaceAll('.', '%2E');
}
}
else {
this.url = this.data.url;
}
this.seekOffset = this.data.seekOffset || -1;
this.timeoutSecs = this.data.timeoutSecs || -1;
this.loop = this.data.loop || 1;

View File

@@ -1,9 +1,11 @@
const assert = require('assert');
const TtsTask = require('./tts-task');
const {TaskName, TaskPreconditions} = require('../utils/constants');
const {JAMBONES_SAY_CHUNK_SIZE} = require('../config');
const pollySSMLSplit = require('polly-ssml-split');
const { SpeechCredentialError } = require('../utils/error');
const { SpeechCredentialError, NonFatalTaskError } = require('../utils/error');
const { sleepFor } = require('../utils/helpers');
const { NON_FANTAL_ERRORS } = require('../utils/constants.json');
/**
* Discard unmatching responses:
@@ -30,7 +32,7 @@ const isMatchingEvent = (logger, filename, playbackId, evt) => {
const breakLengthyTextIfNeeded = (logger, text) => {
// As The text can be used for tts streaming, we need to break lengthy text into smaller chunks
// HIGH_WATER_BUFFER_SIZE defined in tts-streaming-buffer.js
const chunkSize = 900;
const chunkSize = JAMBONES_SAY_CHUNK_SIZE;
const isSSML = text.startsWith('<speak>');
const options = {
softLimit: 100,
@@ -120,13 +122,11 @@ class TaskSay extends TtsTask {
}
if (this.isStreamingTts) await this.handlingStreaming(cs, obj);
else await this.handling(cs, obj);
this.emit('playDone');
} catch (error) {
if (error instanceof SpeechCredentialError) {
// if say failed due to speech credentials, alarm is writtern and error notification is sent
// finished this say to move to next task.
this.logger.info({error}, 'Say failed due to SpeechCredentialError, finished!');
this.emit('playDone');
return;
}
throw error;
@@ -147,9 +147,6 @@ class TaskSay extends TtsTask {
await cs.startTtsStream();
cs.requestor?.request('tts:streaming-event', '/streaming-event', {event_type: 'stream_open'})
.catch((err) => this.logger.info({err}, 'TaskSay:handlingStreaming - Error sending'));
if (this.text.length !== 0) {
this.logger.info('TaskSay:handlingStreaming - sending text to TTS stream');
for (const t of this.text) {
@@ -407,11 +404,19 @@ class TaskSay extends TtsTask {
this._playResolve = resolve;
this._playReject = reject;
});
const r = await ep.play(filename);
this.logger.debug({r}, 'Say:exec play result');
if (r.playbackSeconds == null && r.playbackMilliseconds == null && r.playbackLastOffsetPos == null) {
this._playReject(new Error('Playback failed to start'));
try {
const r = await ep.play(filename);
this.logger.debug({r}, 'Say:exec play result');
if (r.playbackSeconds == null && r.playbackMilliseconds == null && r.playbackLastOffsetPos == null) {
this._playReject(new Error('Playback failed to start'));
}
} catch (err) {
if (NON_FANTAL_ERRORS.includes(err.message)) {
throw new NonFatalTaskError(err.message);
}
throw err;
}
try {
// wait for playback-stop event received to confirm if the playback is successful
await this._playPromise;
@@ -449,8 +454,8 @@ class TaskSay extends TtsTask {
const {memberId, confName} = cs;
this.killPlayToConfMember(this.ep, memberId, confName);
} else if (this.isStreamingTts) {
this.logger.debug('TaskSay:kill - clearing TTS stream for streaming audio');
cs.clearTtsStream();
this.logger.debug('TaskSay:kill - stopping TTS stream for streaming audio');
cs.stopTtsStream();
} else {
if (!this.notifiedPlayBackStop) {
this.notifyStatus({event: 'stop-playback'});

View File

@@ -171,7 +171,7 @@ class SttTask extends Task {
try {
this.sttCredentials = await this._initSpeechCredentials(this.cs, this.vendor, this.label);
} catch (error) {
if (this.canFallback) {
if (this.canFallback()) {
this.notifyError(
{
msg: 'ASR error', details:`Invalid vendor ${this.vendor}, Error: ${error}`,
@@ -203,6 +203,56 @@ class SttTask extends Task {
if (cs.hasGlobalSttPunctuation && !this.data.recognizer.punctuation) {
this.data.recognizer.punctuation = cs.globalSttPunctuation;
}
if (this.vendor === 'gladia') {
const { api_key, region } = this.sttCredentials;
const {url} = await this.createGladiaLiveSession({
api_key, region,
model: this.data.recognizer.model || 'solaria-1',
options: this.data.recognizer.gladiaOptions || {}
});
const {host, pathname, search} = new URL(url);
this.sttCredentials.host = host;
this.sttCredentials.path = `${pathname}${search}`;
}
}
async createGladiaLiveSession({
api_key,
region = 'us-west',
model = 'solaria-1',
options = {},
}) {
const url = `https://api.gladia.io/v2/live?region=${region}`;
const response = await fetch(url, {
method: 'POST',
headers: {
'x-gladia-key': api_key,
'Content-Type': 'application/json'
},
body: JSON.stringify({
encoding: 'wav/pcm',
bit_depth: 16,
sample_rate: 8000,
channels: 1,
model,
...options,
messages_config: {
receive_final_transcripts: true,
receive_speech_events: true,
receive_errors: true,
}
})
});
if (!response.ok) {
const error = await response.text();
this.logger.error({url, status: response.status, error}, 'Error creating Gladia live session');
throw new Error(`Error creating Gladia live session: ${response.status} ${error}`);
}
const data = await response.json();
this.logger.debug({url: data.url}, 'Gladia Call registered');
return data;
}
addCustomEventListener(ep, event, handler) {
@@ -210,8 +260,19 @@ class SttTask extends Task {
ep.addCustomEventListener(event, handler);
}
removeCustomEventListeners() {
this.eventHandlers.forEach((h) => h.ep.removeCustomEventListener(h.event, h.handler));
removeCustomEventListeners(ep) {
if (ep) {
// for specific endpoint
this.eventHandlers.filter((h) => h.ep === ep).forEach((h) => {
h.ep.removeCustomEventListener(h.event, h.handler);
});
this.eventHandlers = this.eventHandlers.filter((h) => h.ep !== ep);
return;
} else {
// for all endpoints
this.eventHandlers.forEach((h) => h.ep.removeCustomEventListener(h.event, h.handler));
this.eventHandlers = [];
}
}
async _initSpeechCredentials(cs, vendor, label) {
@@ -225,6 +286,7 @@ class SttTask extends Task {
account_sid: cs.accountSid,
alert_type: AlertType.STT_NOT_PROVISIONED,
vendor,
label,
target_sid: cs.callSid
}).catch((err) => this.logger.info({err}, 'Error generating alert for no stt'));
// the ASR might have fallback configuration, should not done task here.
@@ -279,11 +341,13 @@ class SttTask extends Task {
return credentials;
}
get canFallback() {
canFallback() {
return this.fallbackVendor && this.isHandledByPrimaryProvider && !this.cs.hasFallbackAsr;
}
async _initFallback() {
// ep is optional for gather or any verb that have single ep,
// but transcribe does need as it might has 2 eps
async _initFallback(ep) {
assert(this.fallbackVendor, 'fallback failed without fallbackVendor configuration');
this.logger.info(`Failed to use primary STT provider, fallback to ${this.fallbackVendor}`);
this.isHandledByPrimaryProvider = false;
@@ -296,7 +360,7 @@ class SttTask extends Task {
this.data.recognizer.label = this.label;
this.sttCredentials = await this._initSpeechCredentials(this.cs, this.vendor, this.label);
// cleanup previous listener from previous vendor
this.removeCustomEventListeners();
this.removeCustomEventListeners(ep);
}
async compileHintsForCobalt(ep, hostport, model, token, hints) {
@@ -423,6 +487,7 @@ class SttTask extends Task {
message: 'STT failure reported by vendor',
detail: evt.error,
vendor: this.vendor,
label: this.label,
target_sid: cs.callSid
}).catch((err) => this.logger.info({err}, `Error generating alert for ${this.vendor} connection failure`));
}
@@ -436,6 +501,7 @@ class SttTask extends Task {
alert_type: AlertType.STT_FAILURE,
message: `Failed connecting to ${this.vendor} speech recognizer: ${reason}`,
vendor: this.vendor,
label: this.label,
target_sid: cs.callSid
}).catch((err) => this.logger.info({err}, `Error generating alert for ${this.vendor} connection failure`));
}

View File

@@ -6,6 +6,7 @@ const {
AwsTranscriptionEvents,
AzureTranscriptionEvents,
DeepgramTranscriptionEvents,
GladiaTranscriptionEvents,
DeepgramfluxTranscriptionEvents,
SonioxTranscriptionEvents,
CobaltTranscriptionEvents,
@@ -14,6 +15,7 @@ const {
JambonzTranscriptionEvents,
TranscribeStatus,
AssemblyAiTranscriptionEvents,
HoundifyTranscriptionEvents,
VoxistTranscriptionEvents,
CartesiaTranscriptionEvents,
OpenAITranscriptionEvents,
@@ -68,6 +70,9 @@ class TaskTranscribe extends SttTask {
this._bufferedTranscripts = [ [], [] ]; // for channel 1 and 2
this.bugname_prefix = 'transcribe_';
this.paused = false;
// fallback flags
this.isHandledByPrimaryProviderForEp1 = true;
this.isHandledByPrimaryProviderForEp2 = true;
}
get name() { return TaskName.Transcribe; }
@@ -254,6 +259,18 @@ class TaskTranscribe extends SttTask {
this._onVendorConnectFailure.bind(this, cs, ep, channel));
this.addCustomEventListener(ep, DeepgramfluxTranscriptionEvents.Error, this._onVendorError.bind(this, cs, ep));
break;
case 'gladia':
this.bugname = `${this.bugname_prefix}gladia_transcribe`;
this.addCustomEventListener(ep, GladiaTranscriptionEvents.Transcription,
this._onTranscription.bind(this, cs, ep, channel));
this.addCustomEventListener(ep, GladiaTranscriptionEvents.Connect,
this._onVendorConnect.bind(this, cs, ep));
this.addCustomEventListener(ep, GladiaTranscriptionEvents.ConnectFailure,
this._onVendorConnectFailure.bind(this, cs, ep, channel));
this.addCustomEventListener(ep, GladiaTranscriptionEvents.Error, this._onVendorError.bind(this, cs, ep));
break;
case 'soniox':
this.bugname = `${this.bugname_prefix}soniox_transcribe`;
@@ -324,6 +341,18 @@ class TaskTranscribe extends SttTask {
this._onVendorConnectFailure.bind(this, cs, ep, channel));
break;
case 'houndify':
this.bugname = `${this.bugname_prefix}houndify_transcribe`;
this.addCustomEventListener(ep, HoundifyTranscriptionEvents.Transcription,
this._onTranscription.bind(this, cs, ep, channel));
this.addCustomEventListener(ep, HoundifyTranscriptionEvents.Error,
this._onVendorError.bind(this, cs, ep));
this.addCustomEventListener(ep, HoundifyTranscriptionEvents.ConnectFailure,
this._onVendorConnectFailure.bind(this, cs, ep, channel));
this.addCustomEventListener(ep, HoundifyTranscriptionEvents.Connect,
this._onVendorConnect.bind(this, cs, ep));
break;
case 'voxist':
this.bugname = `${this.bugname_prefix}voxist_transcribe`;
this.addCustomEventListener(ep, VoxistTranscriptionEvents.Transcription,
@@ -750,7 +779,7 @@ class TaskTranscribe extends SttTask {
}
async _startFallback(cs, _ep, evt) {
if (this.canFallback) {
if (this.canFallback(_ep)) {
_ep.stopTranscription({
vendor: this.vendor,
bugname: this.bugname,
@@ -760,7 +789,7 @@ class TaskTranscribe extends SttTask {
try {
this.notifyError({ msg: 'ASR error',
details:`STT Vendor ${this.vendor} error: ${evt.error || evt.reason}`, failover: 'in progress'});
await this._initFallback();
await this._initFallback(_ep);
let channel = 1;
if (this.ep !== _ep) {
channel = 2;
@@ -869,6 +898,41 @@ class TaskTranscribe extends SttTask {
if (this._asrTimer) clearTimeout(this._asrTimer);
this._asrTimer = null;
}
// We need to keep track the fallback is happened for each endpoint
// override the canFallback and _initFallback methods to make sure that
// we only fallback once per endpoint
// we want to keep track this on task level instead of endpoint level
// because the endpoint instance is used across multiple tasks.
canFallback(ep) {
let isHandledByPrimaryProvider = this.isHandledByPrimaryProvider;
if (ep === this.ep) {
isHandledByPrimaryProvider = this.isHandledByPrimaryProviderForEp1;
} else if (ep === this.ep2) {
isHandledByPrimaryProvider = this.isHandledByPrimaryProviderForEp2;
}
const isOneOfEndpointAlreadyFallenBack = !!this.ep && !!this.ep2 &&
this.isHandledByPrimaryProviderForEp1 !== this.isHandledByPrimaryProviderForEp2;
// fallback is configured
return this.fallbackVendor &&
// has this endpoint already fallen back
isHandledByPrimaryProvider &&
// in global level, is there any fallback is already happened
// one fallen endpoint will mark cs.hasFallbackAsr to true,
// so if one endpoint was fallen, the other endpoint would be able to fallback.
(isOneOfEndpointAlreadyFallenBack || !this.cs.hasFallbackAsr);
}
_initFallback(ep) {
if (ep === this.ep) {
this.isHandledByPrimaryProviderForEp1 = false;
} else if (ep === this.ep2) {
this.isHandledByPrimaryProviderForEp2 = false;
}
return super._initFallback(ep);
}
}
module.exports = TaskTranscribe;

View File

@@ -41,6 +41,10 @@ class TtsTask extends Task {
async exec(cs) {
super.exec(cs);
// update disableTtsCache from call session if not set in task
if (this.data.disableTtsCache == null) {
this.disableTtsCache = cs.disableTtsCache;
}
if (cs.synthesizer) {
this.options = {...cs.synthesizer.options, ...this.options};
this.data.synthesizer = this.data.synthesizer || {};
@@ -81,54 +85,67 @@ class TtsTask extends Task {
}
async setTtsStreamingChannelVars(vendor, language, voice, credentials, ep) {
const {api_key, model_id, custom_tts_streaming_url, auth_token} = credentials;
let obj;
const {api_key, model_id, api_uri, custom_tts_streaming_url, auth_token, options} = credentials;
// api_key, model_id, api_uri, custom_tts_streaming_url, and auth_token are encoded in the credentials
// allow them to be overriden via config, using options
// give preference to options passed in via config
const parsed_options = options ? JSON.parse(options) : {};
const local_options = {...parsed_options, ...this.options};
const local_voice_settings = {...(parsed_options.voice_settings || {}), ...(this.options.voice_settings || {})};
const local_api_key = local_options.api_key ?? api_key;
const local_model_id = local_options.model_id ?? model_id;
const local_api_uri = local_options.api_uri ?? api_uri;
const local_custom_tts_streaming_url = local_options.custom_tts_streaming_url ?? custom_tts_streaming_url;
const local_auth_token = local_options.auth_token ?? auth_token;
this.logger.debug(`setTtsStreamingChannelVars: vendor: ${vendor}, language: ${language}, voice: ${voice}`);
let obj;
switch (vendor) {
case 'deepgram':
obj = {
DEEPGRAM_API_KEY: api_key,
DEEPGRAM_API_KEY: local_api_key,
DEEPGRAM_TTS_STREAMING_MODEL: voice
};
break;
case 'cartesia':
obj = {
CARTESIA_API_KEY: api_key,
CARTESIA_TTS_STREAMING_MODEL_ID: model_id,
CARTESIA_API_KEY: local_api_key,
CARTESIA_TTS_STREAMING_MODEL_ID: local_model_id,
CARTESIA_TTS_STREAMING_VOICE_ID: voice,
CARTESIA_TTS_STREAMING_LANGUAGE: language || 'en',
};
break;
case 'elevenlabs':
const {stability, similarity_boost, use_speaker_boost, style, speed} = this.options.voice_settings || {};
// eslint-disable-next-line max-len
const {stability, similarity_boost, use_speaker_boost, style, speed} = local_voice_settings || {};
obj = {
ELEVENLABS_API_KEY: api_key,
ELEVENLABS_TTS_STREAMING_MODEL_ID: model_id,
ELEVENLABS_API_KEY: local_api_key,
...(api_uri && {ELEVENLABS_API_URI: local_api_uri}),
ELEVENLABS_TTS_STREAMING_MODEL_ID: local_model_id,
ELEVENLABS_TTS_STREAMING_VOICE_ID: voice,
// 20/12/2024 - only eleven_turbo_v2_5 support multiple language
...(['eleven_turbo_v2_5'].includes(model_id) && {ELEVENLABS_TTS_STREAMING_LANGUAGE: language}),
...(['eleven_turbo_v2_5'].includes(local_model_id) && {ELEVENLABS_TTS_STREAMING_LANGUAGE: language}),
...(stability && {ELEVENLABS_TTS_STREAMING_VOICE_SETTINGS_STABILITY: stability}),
...(similarity_boost && {ELEVENLABS_TTS_STREAMING_VOICE_SETTINGS_SIMILARITY_BOOST: similarity_boost}),
...(use_speaker_boost && {ELEVENLABS_TTS_STREAMING_VOICE_SETTINGS_USE_SPEAKER_BOOST: use_speaker_boost}),
...(style && {ELEVENLABS_TTS_STREAMING_VOICE_SETTINGS_STYLE: style}),
// speed has value 0.7 to 1.2, 1.0 is default, make sure we send the value event it's 0
...(speed !== null && speed !== undefined && {ELEVENLABS_TTS_STREAMING_VOICE_SETTINGS_SPEED: `${speed}`}),
...(this.options.pronunciation_dictionary_locators &&
Array.isArray(this.options.pronunciation_dictionary_locators) && {
...(local_options.pronunciation_dictionary_locators &&
Array.isArray(local_options.pronunciation_dictionary_locators) && {
ELEVENLABS_TTS_STREAMING_PRONUNCIATION_DICTIONARY_LOCATORS:
JSON.stringify(this.options.pronunciation_dictionary_locators)
JSON.stringify(local_options.pronunciation_dictionary_locators)
}),
};
break;
case 'rimelabs':
const {
pauseBetweenBrackets, phonemizeBetweenBrackets, inlineSpeedAlpha, speedAlpha, reduceLatency
} = this.options;
} = local_options;
obj = {
RIMELABS_API_KEY: api_key,
RIMELABS_TTS_STREAMING_MODEL_ID: model_id,
RIMELABS_API_KEY: local_api_key,
RIMELABS_TTS_STREAMING_MODEL_ID: local_model_id,
RIMELABS_TTS_STREAMING_VOICE_ID: voice,
RIMELABS_TTS_STREAMING_LANGUAGE: language || 'en',
...(pauseBetweenBrackets && {RIMELABS_TTS_STREAMING_PAUSE_BETWEEN_BRACKETS: pauseBetweenBrackets}),
@@ -143,8 +160,8 @@ class TtsTask extends Task {
if (vendor.startsWith('custom:')) {
const use_tls = custom_tts_streaming_url.startsWith('wss://');
obj = {
CUSTOM_TTS_STREAMING_HOST: custom_tts_streaming_url.replace(/^(ws|wss):\/\//, ''),
CUSTOM_TTS_STREAMING_API_KEY: auth_token,
CUSTOM_TTS_STREAMING_HOST: local_custom_tts_streaming_url.replace(/^(ws|wss):\/\//, ''),
CUSTOM_TTS_STREAMING_API_KEY: local_auth_token,
CUSTOM_TTS_STREAMING_VOICE_ID: voice,
CUSTOM_TTS_STREAMING_LANGUAGE: language || 'en',
CUSTOM_TTS_STREAMING_USE_TLS: use_tls
@@ -257,15 +274,16 @@ class TtsTask extends Task {
account_sid,
alert_type: AlertType.TTS_NOT_PROVISIONED,
vendor,
label,
target_sid: cs.callSid
}).catch((err) => this.logger.info({err}, 'Error generating alert for no tts'));
throw new SpeechCredentialError('no provisioned speech credentials for TTS');
}
/* produce an audio segment from the provided text */
const generateAudio = async(text) => {
if (this.killed) return;
if (text.startsWith('silence_stream://')) return text;
const generateAudio = async(text, index) => {
if (this.killed) return {index, filePath: null};
if (text.startsWith('silence_stream://')) return {index, filePath: text};
/* otel: trace time for tts */
if (!preCache && !this._disableTracing) {
@@ -294,7 +312,6 @@ class TtsTask extends Task {
renderForCaching: preCache
});
if (!filePath.startsWith('say:')) {
this.playbackIds.push(null);
this.logger.debug(`Say: file ${filePath}, served from cache ${servedFromCache}`);
if (filePath) cs.trackTmpFile(filePath);
if (this.otelSpan) {
@@ -322,10 +339,11 @@ class TtsTask extends Task {
'id': this.id
});
}
return {index, filePath, playbackId: null};
}
else {
this.playbackIds.push(extractPlaybackId(filePath));
this.logger.debug({playbackIds: this.playbackIds}, 'Say: a streaming tts api will be used');
const playbackId = extractPlaybackId(filePath);
this.logger.debug('Say: a streaming tts api will be used');
const modifiedPath = filePath.replace('say:{', `say:{session-uuid=${ep.uuid},`);
this.notifyStatus({
event: 'synthesized-audio',
@@ -334,9 +352,8 @@ class TtsTask extends Task {
servedFromCache,
'id': this.id
});
return modifiedPath;
return {index, filePath: modifiedPath, playbackId};
}
return filePath;
} catch (err) {
this.logger.info({err}, 'Error synthesizing tts');
if (this.otelSpan) this.otelSpan.end();
@@ -344,6 +361,7 @@ class TtsTask extends Task {
account_sid: cs.accountSid,
alert_type: AlertType.TTS_FAILURE,
vendor,
label,
detail: err.message,
target_sid: cs.callSid
}).catch((err) => this.logger.info({err}, 'Error generating alert for tts failure'));
@@ -351,8 +369,20 @@ class TtsTask extends Task {
}
};
const arr = this.text.map((t) => (this._validateURL(t) ? t : generateAudio(t)));
return (await Promise.all(arr)).filter((fp) => fp && fp.length);
// process all text segments in parallel will cause ordering issue
// so we attach index to each promise result and sort them later
const arr = this.text.map((t, index) => (this._validateURL(t) ?
Promise.resolve({index, filePath: t, playbackId: null}) : generateAudio(t, index)));
const results = await Promise.all(arr);
const sorted = results.sort((a, b) => a.index - b.index);
return sorted
.filter((fp) => fp.filePath && fp.filePath.length)
.map((r) => {
this.playbackIds.push(r.playbackId);
return r.filePath;
});
} catch (err) {
this.logger.info(err, 'TaskSay:exec error');
throw err;

View File

@@ -405,19 +405,21 @@ module.exports = (logger) => {
if (ep.amd) {
vendor = ep.amd.vendor;
ep.amd.stopAllTimers();
ep.removeListener(GoogleTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(GoogleTranscriptionEvents.EndOfUtterance, ep.amd.EndOfUtteranceHandler);
ep.removeListener(AwsTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(AzureTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(AzureTranscriptionEvents.NoSpeechDetected, ep.amd.noSpeechHandler);
ep.removeListener(NuanceTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(DeepgramTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(SonioxTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(IbmTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(NvidiaTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(JambonzTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
try {
ep.removeListener(GoogleTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(GoogleTranscriptionEvents.EndOfUtterance, ep.amd.EndOfUtteranceHandler);
ep.removeListener(AwsTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(AzureTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(AzureTranscriptionEvents.NoSpeechDetected, ep.amd.noSpeechHandler);
ep.removeListener(NuanceTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(DeepgramTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(SonioxTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(IbmTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(NvidiaTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
ep.removeListener(JambonzTranscriptionEvents.Transcription, ep.amd.transcriptionHandler);
} catch (error) {
logger.error('Unable to Remove AMD Listener', error);
}
ep.amd = null;
}

View File

@@ -103,6 +103,12 @@
"Connect": "deepgramflux_transcribe::connect",
"Error": "deepgramflux_transcribe::error"
},
"GladiaTranscriptionEvents": {
"Transcription": "gladia_transcribe::transcription",
"ConnectFailure": "gladia_transcribe::connect_failed",
"Connect": "gladia_transcribe::connect",
"Error": "gladia_transcribe::error"
},
"SonioxTranscriptionEvents": {
"Transcription": "soniox_transcribe::transcription",
"Error": "soniox_transcribe::error"
@@ -169,6 +175,12 @@
"ConnectFailure": "assemblyai_transcribe::connect_failed",
"Connect": "assemblyai_transcribe::connect"
},
"HoundifyTranscriptionEvents": {
"Transcription": "houndify_transcribe::transcription",
"Error": "houndify_transcribe::error",
"ConnectFailure": "houndify_transcribe::connect_failed",
"Connect": "houndify_transcribe::connect"
},
"VoxistTranscriptionEvents": {
"Transcription": "voxist_transcribe::transcription",
"Error": "voxist_transcribe::error",
@@ -323,7 +335,8 @@
"Empty": "tts_streaming::empty",
"Pause": "tts_streaming::pause",
"Resume": "tts_streaming::resume",
"ConnectFailure": "tts_streaming::connect_failed"
"ConnectFailure": "tts_streaming::connect_failed",
"Connected": "tts_streaming::connected"
},
"TtsStreamingConnectionStatus": {
"NotConnected": "not_connected",
@@ -343,5 +356,8 @@
"WS_CLOSE_CODES": {
"NormalClosure": 1000,
"GoingAway": 1001
}
},
"NON_FANTAL_ERRORS": [
"File Not Found"
]
}

View File

@@ -81,6 +81,11 @@ const speechMapper = (cred) => {
obj.deepgram_tts_uri = o.deepgram_tts_uri;
obj.deepgram_stt_use_tls = o.deepgram_stt_use_tls;
}
else if ('gladia' === obj.vendor) {
const o = JSON.parse(decrypt(credential));
obj.api_key = o.api_key;
obj.region = o.region;
}
else if ('deepgramflux' === obj.vendor) {
const o = JSON.parse(decrypt(credential));
obj.api_key = o.api_key;
@@ -101,6 +106,7 @@ const speechMapper = (cred) => {
const o = JSON.parse(decrypt(credential));
obj.api_key = o.api_key;
obj.model_id = o.model_id;
obj.api_uri = o.api_uri;
obj.options = o.options;
}
else if ('playht' === obj.vendor) {
@@ -141,6 +147,13 @@ const speechMapper = (cred) => {
obj.api_key = o.api_key;
obj.service_version = o.service_version;
}
else if ('houndify' === obj.vendor) {
const o = JSON.parse(decrypt(credential));
obj.client_id = o.client_id;
obj.client_key = o.client_key;
obj.user_id = o.user_id;
obj.houndify_server_uri = o.houndify_server_uri;
}
else if ('voxist' === obj.vendor) {
const o = JSON.parse(decrypt(credential));
obj.api_key = o.api_key;

View File

@@ -191,7 +191,7 @@ class HttpRequestor extends BaseRequestor {
method,
headers: hdrs,
...('POST' === method && {body: JSON.stringify(payload)}),
timeout: HTTP_TIMEOUT,
headersTimeout: HTTP_TIMEOUT,
followRedirects: false
};

View File

@@ -100,6 +100,39 @@ module.exports = (logger) => {
else if (K8S) {
lifecycleEmitter.scaleIn = () => process.exit(0);
}
else {
process.on('SIGUSR1', () => {
logger.info('received SIGUSR1: begin drying up calls for scale-in');
dryUpCalls = true;
const {srf} = require('../..');
const {writeSystemAlerts} = srf.locals;
if (writeSystemAlerts) {
const {SystemState, FEATURE_SERVER} = require('./constants');
writeSystemAlerts({
system_component: FEATURE_SERVER,
state : SystemState.GracefulShutdownInProgress,
fields : {
detail: `feature-server with process_id ${process.pid} shutdown in progress`,
host: srf.locals?.ipv4
}
});
}
pingProxies(srf);
// if we have zero calls, we can complete the scale-in right
setTimeout(() => {
const calls = srf.locals.sessionTracker.count;
if (calls === 0) {
logger.info('scale-in can complete immediately as we have no calls in progress');
process.exit(0);
}
else {
logger.info(`${calls} calls in progress; scale-in will complete when they are done`);
}
}, 5000);
});
}
async function pingProxies(srf) {

View File

@@ -55,11 +55,28 @@ const extractSdpMedia = (sdp) => {
}
};
const getLeadingCodec = (sdp) => {
if (!sdp) {
return null;
}
const parsed = sdpTransform.parse(sdp);
const audio = parsed.media?.find((m) => m.type === 'audio');
if (!audio) {
return null;
}
return audio.rtp?.[0]?.codec || null;
};
module.exports = {
isOnhold,
mergeSdpMedia,
extractSdpMedia,
isOpusFirst,
makeOpusFirst,
removeVideoSdp
removeVideoSdp,
getLeadingCodec
};

View File

@@ -127,7 +127,6 @@ class SttLatencyCalculator extends Emitter {
calculateLatency() {
if (!this.isRunning) {
this.logger.debug('Latency calculator is not running, cannot calculate latency, returning default values');
return null;
}

View File

@@ -131,6 +131,43 @@ const stickyVars = {
'OPENAI_TURN_DETECTION_PREFIX_PADDING_MS',
'OPENAI_TURN_DETECTION_SILENCE_DURATION_MS',
],
houndify: [
'HOUNDIFY_CLIENT_ID',
'HOUNDIFY_CLIENT_KEY',
'HOUNDIFY_USER_ID',
'HOUNDIFY_MAX_SILENCE_SECONDS',
'HOUNDIFY_MAX_SILENCE_AFTER_FULL_QUERY_SECONDS',
'HOUNDIFY_MAX_SILENCE_AFTER_PARTIAL_QUERY_SECONDS',
'HOUNDIFY_VAD_SENSITIVITY',
'HOUNDIFY_VAD_TIMEOUT',
'HOUNDIFY_VAD_MODE',
'HOUNDIFY_VAD_VOICE_MS',
'HOUNDIFY_VAD_SILENCE_MS',
'HOUNDIFY_VAD_DEBUG',
'HOUNDIFY_AUDIO_FORMAT',
'HOUNDIFY_ENABLE_NOISE_REDUCTION',
'HOUNDIFY_AUDIO_ENDPOINT',
'HOUNDIFY_ENABLE_PROFANITY_FILTER',
'HOUNDIFY_ENABLE_PUNCTUATION',
'HOUNDIFY_ENABLE_CAPITALIZATION',
'HOUNDIFY_CONFIDENCE_THRESHOLD',
'HOUNDIFY_ENABLE_DISFLUENCY_FILTER',
'HOUNDIFY_MAX_RESULTS',
'HOUNDIFY_ENABLE_WORD_TIMESTAMPS',
'HOUNDIFY_MAX_ALTERNATIVES',
'HOUNDIFY_PARTIAL_TRANSCRIPT_INTERVAL',
'HOUNDIFY_SESSION_TIMEOUT',
'HOUNDIFY_CONNECTION_TIMEOUT',
'HOUNDIFY_LATITUDE',
'HOUNDIFY_LONGITUDE',
'HOUNDIFY_CITY',
'HOUNDIFY_STATE',
'HOUNDIFY_COUNTRY',
'HOUNDIFY_TIMEZONE',
'HOUNDIFY_DOMAIN',
'HOUNDIFY_CUSTOM_VOCABULARY',
'HOUNDIFY_LANGUAGE_MODEL'
],
};
/**
@@ -339,6 +376,30 @@ const normalizeDeepgram = (evt, channel, language, shortUtterance) => {
};
};
const normalizeGladia = (evt, channel, language, shortUtterance) => {
const copy = JSON.parse(JSON.stringify(evt));
// Handle Gladia transcript format
if (evt.type === 'transcript' && evt.data && evt.data.utterance) {
const utterance = evt.data.utterance;
const alternatives = [{
confidence: utterance.confidence || 0,
transcript: utterance.text || '',
}];
return {
language_code: utterance.language || language,
channel_tag: channel,
is_final: evt.data.is_final || false,
alternatives,
vendor: {
name: 'gladia',
evt: copy
}
};
}
};
const normalizeDeepgramFlux = (evt, channel, language) => {
const copy = JSON.parse(JSON.stringify(evt));
@@ -582,6 +643,30 @@ const normalizeAssemblyAi = (evt, channel, language) => {
};
};
const normalizeHoundify = (evt, channel, language) => {
const copy = JSON.parse(JSON.stringify(evt));
const alternatives = [];
const is_final = evt.ResultsAreFinal && evt.ResultsAreFinal[0] === true;
if (evt.Disambiguation && evt.Disambiguation.ChoiceData && evt.Disambiguation.ChoiceData.length > 0) {
// Handle Houndify Voice Search Result format
const choiceData = evt.Disambiguation.ChoiceData[0];
alternatives.push({
confidence: choiceData.ConfidenceScore || choiceData.ASRConfidence || 0.0,
transcript: choiceData.FormattedTranscription || choiceData.Transcription || '',
});
}
return {
language_code: language,
channel_tag: channel,
is_final,
alternatives,
vendor: {
name: 'houndify',
evt: copy
}
};
};
const normalizeVoxist = (evt, channel, language) => {
const copy = JSON.parse(JSON.stringify(evt));
return {
@@ -681,6 +766,8 @@ module.exports = (logger) => {
switch (vendor) {
case 'deepgram':
return normalizeDeepgram(evt, channel, language, shortUtterance);
case 'gladia':
return normalizeGladia(evt, channel, language, shortUtterance);
case 'deepgramflux':
return normalizeDeepgramFlux(evt, channel, language, shortUtterance);
case 'microsoft':
@@ -701,6 +788,8 @@ module.exports = (logger) => {
return normalizeCobalt(evt, channel, language);
case 'assemblyai':
return normalizeAssemblyAi(evt, channel, language, shortUtterance);
case 'houndify':
return normalizeHoundify(evt, channel, language, shortUtterance);
case 'voxist':
return normalizeVoxist(evt, channel, language);
case 'cartesia':
@@ -831,7 +920,7 @@ module.exports = (logger) => {
...(rOpts.initialSpeechTimeoutMs > 0 &&
{AZURE_INITIAL_SPEECH_TIMEOUT_MS: rOpts.initialSpeechTimeoutMs}),
...(rOpts.requestSnr && {AZURE_REQUEST_SNR: 1}),
...(rOpts.audioLogging && {AZURE_AUDIO_LOGGING: 1}),
...(azureOptions.audioLogging && {AZURE_AUDIO_LOGGING: 1}),
...{AZURE_USE_OUTPUT_FORMAT_DETAILED: 1},
...(azureOptions.speechSegmentationSilenceTimeoutMs &&
{AZURE_SPEECH_SEGMENTATION_SILENCE_TIMEOUT_MS: azureOptions.speechSegmentationSilenceTimeoutMs}),
@@ -996,6 +1085,13 @@ module.exports = (logger) => {
...(keyterms && keyterms.length > 0 && {DEEPGRAMFLUX_SPEECH_KEYTERMS: keyterms.join(',')}),
};
}
else if ('gladia' === vendor) {
const {host, path} = sttCredentials;
opts = {
GLADIA_SPEECH_HOST: host,
GLADIA_SPEECH_PATH: path,
};
}
else if ('soniox' === vendor) {
const {sonioxOptions = {}} = rOpts;
const {storage = {}} = sonioxOptions;
@@ -1122,6 +1218,61 @@ module.exports = (logger) => {
{ASSEMBLYAI_WORD_BOOST: JSON.stringify(rOpts.hints)})
};
}
else if ('houndify' === vendor) {
const {
latitude, longitude, city, state, country, timeZone, domain, audioEndpoint,
maxSilenceSeconds, maxSilenceAfterFullQuerySeconds, maxSilenceAfterPartialQuerySeconds,
vadSensitivity, vadTimeout, vadMode, vadVoiceMs, vadSilenceMs, vadDebug,
audioFormat, enableNoiseReduction, enableProfanityFilter, enablePunctuation,
enableCapitalization, confidenceThreshold, enableDisfluencyFilter,
maxResults, enableWordTimestamps, maxAlternatives, partialTranscriptInterval,
sessionTimeout, connectionTimeout, customVocabulary, languageModel,
requestInfo, sampleRate
} = rOpts.houndifyOptions || {};
const audioEndpointUri = audioEndpoint || sttCredentials.houndify_server_uri;
opts = {
...opts,
HOUNDIFY_CLIENT_ID: sttCredentials.client_id,
HOUNDIFY_CLIENT_KEY: sttCredentials.client_key,
HOUNDIFY_USER_ID: sttCredentials.user_id,
HOUNDIFY_MAX_SILENCE_SECONDS: maxSilenceSeconds || 5,
HOUNDIFY_MAX_SILENCE_AFTER_FULL_QUERY_SECONDS: maxSilenceAfterFullQuerySeconds || 1,
HOUNDIFY_MAX_SILENCE_AFTER_PARTIAL_QUERY_SECONDS: maxSilenceAfterPartialQuerySeconds || 1.5,
...(vadSensitivity && {HOUNDIFY_VAD_SENSITIVITY: vadSensitivity}),
...(vadTimeout && {HOUNDIFY_VAD_TIMEOUT: vadTimeout}),
...(vadMode && {HOUNDIFY_VAD_MODE: vadMode}),
...(vadVoiceMs && {HOUNDIFY_VAD_VOICE_MS: vadVoiceMs}),
...(vadSilenceMs && {HOUNDIFY_VAD_SILENCE_MS: vadSilenceMs}),
...(vadDebug && {HOUNDIFY_VAD_DEBUG: vadDebug}),
...(audioFormat && {HOUNDIFY_AUDIO_FORMAT: audioFormat}),
...(enableNoiseReduction && {HOUNDIFY_ENABLE_NOISE_REDUCTION: enableNoiseReduction}),
...(enableProfanityFilter && {HOUNDIFY_ENABLE_PROFANITY_FILTER: enableProfanityFilter}),
...(enablePunctuation && {HOUNDIFY_ENABLE_PUNCTUATION: enablePunctuation}),
...(enableCapitalization && {HOUNDIFY_ENABLE_CAPITALIZATION: enableCapitalization}),
...(confidenceThreshold && {HOUNDIFY_CONFIDENCE_THRESHOLD: confidenceThreshold}),
...(enableDisfluencyFilter && {HOUNDIFY_ENABLE_DISFLUENCY_FILTER: enableDisfluencyFilter}),
...(maxResults && {HOUNDIFY_MAX_RESULTS: maxResults}),
...(enableWordTimestamps && {HOUNDIFY_ENABLE_WORD_TIMESTAMPS: enableWordTimestamps}),
...(maxAlternatives && {HOUNDIFY_MAX_ALTERNATIVES: maxAlternatives}),
...(partialTranscriptInterval && {HOUNDIFY_PARTIAL_TRANSCRIPT_INTERVAL: partialTranscriptInterval}),
...(sessionTimeout && {HOUNDIFY_SESSION_TIMEOUT: sessionTimeout}),
...(connectionTimeout && {HOUNDIFY_CONNECTION_TIMEOUT: connectionTimeout}),
...(latitude && {HOUNDIFY_LATITUDE: latitude}),
...(longitude && {HOUNDIFY_LONGITUDE: longitude}),
...(city && {HOUNDIFY_CITY: city}),
...(state && {HOUNDIFY_STATE: state}),
...(country && {HOUNDIFY_COUNTRY: country}),
...(timeZone && {HOUNDIFY_TIMEZONE: timeZone}),
...(domain && {HOUNDIFY_DOMAIN: domain}),
...(audioEndpointUri && {HOUNDIFY_AUDIO_ENDPOINT: audioEndpointUri}),
...(customVocabulary && {HOUNDIFY_CUSTOM_VOCABULARY:
Array.isArray(customVocabulary) ? customVocabulary.join(',') : customVocabulary}),
...(languageModel && {HOUNDIFY_LANGUAGE_MODEL: languageModel}),
...(requestInfo && {HOUNDIFY_REQUEST_INFO: JSON.stringify(requestInfo)}),
...(sampleRate && {HOUNDIFY_SAMPLING_RATE: sampleRate}),
};
}
else if ('voxist' === vendor) {
opts = {
...opts,

View File

@@ -80,7 +80,7 @@ class TtsStreamingBuffer extends Emitter {
clearTimeout(this.timer);
this.removeCustomEventListeners();
if (this.ep) {
this._api(this.ep, [this.ep.uuid, 'close'])
this._api(this.ep, [this.ep.uuid, 'stop'])
.catch((err) =>
this.logger.info({ err }, 'TtsStreamingBuffer:stop Error closing TTS streaming')
);
@@ -163,7 +163,6 @@ class TtsStreamingBuffer extends Emitter {
}
clear() {
this.logger.debug('TtsStreamingBuffer:clear');
if (this._connectionStatus !== TtsStreamingConnectionStatus.Connected) return;
clearTimeout(this.timer);
this._api(this.ep, [this.ep.uuid, 'clear']).catch((err) =>
@@ -193,10 +192,7 @@ class TtsStreamingBuffer extends Emitter {
this.logger.debug('TtsStreamingBuffer:_feedQueue TTS stream is not open or no endpoint available');
return;
}
if (
this._connectionStatus === TtsStreamingConnectionStatus.NotConnected ||
this._connectionStatus === TtsStreamingConnectionStatus.Failed
) {
if (this._connectionStatus !== TtsStreamingConnectionStatus.Connected) {
this.logger.debug('TtsStreamingBuffer:_feedQueue TTS stream is not connected');
return;
}
@@ -224,7 +220,8 @@ class TtsStreamingBuffer extends Emitter {
this.queue.shift();
}
// Immediately send all accumulated text (ignoring sentence boundaries).
if (flushText.length > 0) {
// Skip sending if flushText is only whitespace.
if (flushText.length > 0 && !isWhitespace(flushText)) {
const modifiedFlushText = flushText.replace(/\n\n/g, '\n \n');
try {
await this._api(this.ep, [this.ep.uuid, 'send', modifiedFlushText]);
@@ -278,6 +275,14 @@ class TtsStreamingBuffer extends Emitter {
}
const chunk = combinedText.slice(0, chunkEnd);
// Check if the chunk is only whitespace before processing the queue
// If so, wait for more meaningful text
if (isWhitespace(chunk)) {
this.logger.debug('TtsStreamingBuffer:_feedQueue chunk is only whitespace, waiting for more text');
this._setTimerIfNeeded();
return;
}
// Now we iterate over the queue items
// and deduct their lengths until we've accounted for chunkEnd characters.
let remaining = chunkEnd;
@@ -301,6 +306,14 @@ class TtsStreamingBuffer extends Emitter {
this.bufferedLength -= chunkEnd;
const modifiedChunk = chunk.replace(/\n\n/g, '\n \n');
if (isWhitespace(modifiedChunk)) {
this.logger.debug('TtsStreamingBuffer:_feedQueue modified chunk is only whitespace, restoring queue');
this.queue.unshift({ type: 'text', value: chunk });
this.bufferedLength += chunkEnd;
this._setTimerIfNeeded();
return;
}
this.logger.debug(`TtsStreamingBuffer:_feedQueue sending chunk to tts: ${modifiedChunk}`);
try {
@@ -349,6 +362,7 @@ class TtsStreamingBuffer extends Emitter {
if (this.queue.length > 0) {
await this._feedQueue();
}
this.emit(TtsStreamingEvents.Connected, { vendor });
}
_onConnectFailure(vendor) {
@@ -399,6 +413,7 @@ class TtsStreamingBuffer extends Emitter {
removeCustomEventListeners() {
this.eventHandlers.forEach((h) => h.ep.removeCustomEventListener(h.event, h.handler));
this.eventHandlers.length = 0;
}
_initHandlers(ep) {
@@ -422,7 +437,15 @@ class TtsStreamingBuffer extends Emitter {
const findSentenceBoundary = (text, limit) => {
// Look for punctuation or double newline that signals sentence end.
const sentenceEndRegex = /[.!?](?=\s|$)|\n\n/g;
// Includes:
// - ASCII: . ! ?
// - Arabic: ؟ (question mark), ۔ (full stop)
// - Japanese: 。 (full stop), , (full-width exclamation/question)
//
// For languages that use spaces between sentences, we still require
// whitespace or end-of-string after the mark. For Japanese (no spaces),
// we treat the punctuation itself as a boundary regardless of following char.
const sentenceEndRegex = /[.!?؟۔](?=\s|$)|[。!?]|\n\n/g;
let lastSentenceBoundary = -1;
let match;
while ((match = sentenceEndRegex.exec(text)) && match.index < limit) {

5307
package-lock.json generated

File diff suppressed because it is too large Load Diff

View File

@@ -27,14 +27,14 @@
"dependencies": {
"@aws-sdk/client-auto-scaling": "^3.549.0",
"@aws-sdk/client-sns": "^3.549.0",
"@jambonz/db-helpers": "^0.9.17",
"@jambonz/db-helpers": "^0.9.18",
"@jambonz/http-health-check": "^0.0.1",
"@jambonz/mw-registrar": "^0.2.7",
"@jambonz/realtimedb-helpers": "^0.8.15",
"@jambonz/speech-utils": "^0.2.24",
"@jambonz/speech-utils": "^0.2.26",
"@jambonz/stats-collector": "^0.1.10",
"@jambonz/time-series": "^0.2.14",
"@jambonz/verb-specifications": "^0.0.116",
"@jambonz/time-series": "^0.2.15",
"@jambonz/verb-specifications": "^0.0.122",
"@modelcontextprotocol/sdk": "^1.9.0",
"@opentelemetry/api": "^1.8.0",
"@opentelemetry/exporter-jaeger": "^1.23.0",
@@ -49,12 +49,12 @@
"debug": "^4.3.4",
"deepcopy": "^2.1.0",
"drachtio-fsmrf": "^4.1.2",
"drachtio-srf": "^5.0.11",
"drachtio-srf": "^5.0.14",
"express": "^4.19.2",
"express-validator": "^7.0.1",
"moment": "^2.30.1",
"parse-url": "^9.2.0",
"pino": "^8.20.0",
"pino": "^10.1.0",
"polly-ssml-split": "^0.1.0",
"sdp-transform": "^2.15.0",
"short-uuid": "^5.1.0",

View File

@@ -4,6 +4,7 @@ require('./ws-requestor-unit-test');
require('./http-requestor-retry-test');
require('./http-requestor-unit-test');
require('./unit-tests');
require('./tts-streaming-buffer-test');
require('./docker_start');
require('./create-test-db');
require('./account-validation-tests');

View File

@@ -0,0 +1,177 @@
const test = require('tape');
const sinon = require('sinon');
const noop = () => {};
const logger = {
error: noop,
info: noop,
debug: noop
};
const {
TtsStreamingConnectionStatus
} = require('../lib/utils/constants.json');
const TtsStreamingBuffer = require('../lib/utils/tts-streaming-buffer');
// Helper to create a mock CallSession
function createMockCs(options = {}) {
const mockEp = {
uuid: 'test-uuid-1234',
api: sinon.stub().resolves({ body: '+OK' }),
addCustomEventListener: sinon.stub(),
removeCustomEventListener: sinon.stub()
};
return {
logger,
ep: mockEp,
isTtsStreamOpen: options.isTtsStreamOpen !== undefined ? options.isTtsStreamOpen : true,
getTsStreamingVendor: () => options.vendor || 'deepgram'
};
}
/**
* BUG REPRODUCTION TEST
*
* This test reproduces the exact issue from production logs:
* {
* "args": ["uuid", "send", " "],
* "msg": "Error calling uuid_deepgram_tts_streaming: -USAGE: <uuid> connect|send|clear|close [tokens]"
* }
*
* Root cause: When multiple flushes are queued while connecting, and a space token
* gets buffered between flushes, Phase 1 of _feedQueue sends that space to the TTS vendor.
*
* Sequence:
* 1. bufferTokens('Hello.') while connecting
* 2. flush() while connecting
* 3. bufferTokens(' ') while connecting (passes because bufferedLength > 0)
* 4. flush() while connecting
* 5. Connection completes, _feedQueue processes: [text:Hello., flush, text:" ", flush]
* 6. First flush sends "Hello." - OK
* 7. Second flush sends " " - BUG!
*/
test('TtsStreamingBuffer: multiple flushes while connecting - space token sent to TTS vendor', async(t) => {
const cs = createMockCs();
const buffer = new TtsStreamingBuffer(cs);
buffer._connectionStatus = TtsStreamingConnectionStatus.Connecting;
buffer.vendor = 'deepgram';
const apiCalls = [];
const originalApi = buffer._api.bind(buffer);
buffer._api = async function(ep, args) {
apiCalls.push({ args: [...args] });
return originalApi(ep, args);
};
// First batch while connecting
await buffer.bufferTokens('Hello.');
buffer.flush();
// Second batch - just a space (passes because bufferedLength > 0)
await buffer.bufferTokens(' ');
buffer.flush();
// Verify queue state before connect
t.equal(buffer.queue.length, 4, 'queue should have 4 items: [text, flush, text, flush]');
t.equal(buffer.queue[0].type, 'text', 'first item should be text');
t.equal(buffer.queue[0].value, 'Hello.', 'first text should be "Hello."');
t.equal(buffer.queue[1].type, 'flush', 'second item should be flush');
t.equal(buffer.queue[2].type, 'text', 'third item should be text');
t.equal(buffer.queue[2].value, ' ', 'third item should be space');
t.equal(buffer.queue[3].type, 'flush', 'fourth item should be flush');
// Connect - triggers _feedQueue
buffer._connectionStatus = TtsStreamingConnectionStatus.Connected;
await buffer._feedQueue();
// Check API calls
const sendCalls = apiCalls.filter(call => call.args[1] === 'send');
// This assertion will FAIL until the bug is fixed
const whitespaceOnlySends = sendCalls.filter(call => /^\s*$/.test(call.args[2]));
t.equal(whitespaceOnlySends.length, 0,
`should not send whitespace-only tokens, but sent: ${whitespaceOnlySends.map(c => JSON.stringify(c.args[2])).join(', ')}`);
t.end();
});
/**
* Additional test: Verify text with trailing space in same flush is OK
*/
test('TtsStreamingBuffer: text with trailing space in same flush should work', async(t) => {
const cs = createMockCs();
const buffer = new TtsStreamingBuffer(cs);
buffer._connectionStatus = TtsStreamingConnectionStatus.Connecting;
buffer.vendor = 'deepgram';
const apiCalls = [];
const originalApi = buffer._api.bind(buffer);
buffer._api = async function(ep, args) {
apiCalls.push({ args: [...args] });
return originalApi(ep, args);
};
// Buffer text with trailing space, then flush
await buffer.bufferTokens('Hello.');
await buffer.bufferTokens(' ');
buffer.flush();
// Connect
buffer._connectionStatus = TtsStreamingConnectionStatus.Connected;
await buffer._feedQueue();
const sendCalls = apiCalls.filter(call => call.args[1] === 'send');
t.equal(sendCalls.length, 1, 'should have one send call');
t.equal(sendCalls[0].args[2], 'Hello. ', 'should send "Hello. " (text with trailing space)');
t.end();
});
/**
* Test: Leading whitespace should be discarded when buffer is empty
*/
test('TtsStreamingBuffer: leading whitespace discarded when buffer empty', async(t) => {
const cs = createMockCs();
const buffer = new TtsStreamingBuffer(cs);
buffer._connectionStatus = TtsStreamingConnectionStatus.Connected;
buffer.vendor = 'deepgram';
// Try to buffer whitespace when buffer is empty
const result = await buffer.bufferTokens(' ');
t.equal(result.status, 'ok', 'should return ok status');
t.equal(buffer.bufferedLength, 0, 'buffer should remain empty');
t.equal(buffer.queue.length, 0, 'queue should remain empty');
t.end();
});
/**
* Test: Whitespace can be buffered when buffer has content
*/
test('TtsStreamingBuffer: whitespace accepted when buffer has content', async(t) => {
const cs = createMockCs();
const buffer = new TtsStreamingBuffer(cs);
buffer._connectionStatus = TtsStreamingConnectionStatus.Connecting;
buffer.vendor = 'deepgram';
// Buffer real text first
await buffer.bufferTokens('Hello');
// Now buffer whitespace (should pass because bufferedLength > 0)
const result = await buffer.bufferTokens(' ');
t.equal(result.status, 'ok', 'should return ok status');
t.equal(buffer.bufferedLength, 6, 'buffer should have 6 chars');
t.equal(buffer.queue.length, 2, 'queue should have 2 items');
t.end();
});

View File

@@ -83,7 +83,8 @@ test('invalid jambonz json create alert tests', async(t) => {
{account_sid: 'bb845d4b-83a9-4cde-a6e9-50f3743bab3f', page: 1, page_size: 25, days: 7});
let checked = false;
for (let i = 0; i < data.total; i++) {
checked = data.data[i].message === 'malformed jambonz payload: must be array'
checked = data.data[i].message === 'malformed jambonz payload: must be array';
if (checked) break;
}
t.ok(checked, 'alert is raised as expected');
disconnect();