Release notes/0.9.0 (#88)

* wip

* initial changes

* release notes

* wip

* wip

* wip

* wip

* wip
Author: Dave Horton
Date: 2024-04-21 11:16:13 -04:00
Committed by: GitHub
Parent: 0bcaa4b41a
Commit: ee9f7c23de
6 changed files with 98 additions and 8 deletions


@@ -165,27 +165,30 @@ navi:
     path: release-notes
     title: Release Notes
     pages:
+      -
+        path: 0.9.0
+        title: 0.9.0
       -
         path: v0.8.5
-        title: v0.8.5
+        title: 0.8.5
       -
         path: v0.8.4
-        title: v0.8.4
+        title: 0.8.4
       -
         path: v0.8.3
-        title: v0.8.3
+        title: 0.8.3
       -
         path: v0.8.2
-        title: v0.8.2
+        title: 0.8.2
       -
         path: v0.8.1
-        title: v0.8.1
+        title: 0.8.1
       -
         path: v0.8.0
-        title: v0.8.0
+        title: 0.8.0
       -
         path: v0.7.9
-        title: v0.7.9
+        title: 0.7.9
       -
         path: jambonz-ui
         title: Jambonz UI


@@ -0,0 +1,39 @@
# Release 0.9.0
#### Info
- Release Date: April 20, 2024
#### New Features
- Add support for the Google v2 STT API
- Add support for additional TTS vendors: [PlayHT](https://play.ht/), [RimeLabs](https://rime.ai/), and [Deepgram](https://deepgram.com/product/text-to-speech)
- Add support for streaming TTS (reduces latency) for Deepgram, ElevenLabs, Microsoft, PlayHT, RimeLabs, and Whisper
- Add support for [bidirectional audio](/docs/supporting-articles/bidirectional-audio) in [listen](/docs/webhooks/listen) verb
- Add new verb: [dub](/docs/webhooks/dub/) to insert additional audio tracks into the conversation; see [here](/docs/supporting-articles/using-dub-tracks/) for example usage.
- Add `boostAudioSignal` to the [config](/docs/webhooks/config) verb, allowing the volume of a conversation to be raised or lowered (a short sketch follows this list)
- Add support for "filler" audio to the [gather](/docs/webhooks/config) verb allowing brief audio to be played to a caller while the user application is processing a user utterance or dtmf collection, this can be useful in scenarios where an AI bot is expected to take a lengthy time to process a request
- Add support for sending outbound OPTIONS pings to configured SIP trunks
- If Deepgram endpointing is enabled, default utterance_end_ms to 1000 if none specified by the application (per Deepgram recommendation)
- Various improvements and enhancements to [node-client-ws](https://github.com/jambonz/node-client-ws)
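
A minimal sketch of the new `boostAudioSignal` option; the signed-decibel string value shown here is an assumption, so treat this as illustrative rather than definitive:
```js
// hedged sketch: raise the caller's audio level by 6 dB mid-call
// (the '+6 dB' value format is an assumption; consult the config verb docs)
{
  verb: 'config',
  boostAudioSignal: '+6 dB'
}
```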
#### Bug fixes
- Various fixes for Deepgram STT
- [714](https://github.com/jambonz/jambonz-feature-server/issues/714) bargein "sticky" only works twice
- [710](https://github.com/jambonz/jambonz-feature-server/issues/710) fix for actionHookDelay action
- [671](https://github.com/jambonz/jambonz-feature-server/issues/671) handling of siprec invite failure
- [666](https://github.com/jambonz/jambonz-feature-server/issues/666) transcribe on dial verb does not transcribe B leg by default
- Fix for precaching of TTS
- Check whether a SIP gateway is on the blacklist before placing an outbound call
#### SQL changes
```sql
ALTER TABLE sip_gateways ADD COLUMN send_options_ping BOOLEAN NOT NULL DEFAULT 0;
ALTER TABLE applications MODIFY COLUMN speech_synthesis_voice VARCHAR(256);
ALTER TABLE applications MODIFY COLUMN fallback_speech_synthesis_voice VARCHAR(256);
```
#### Availability
- Available now on jambonz.cloud
- Devops scripts (packer, cloudformation, helm) available now for subscription customers

**Questions?** Contact us at <a href="mailto:support@jambonz.org">support@jambonz.org</a>


@@ -0,0 +1,20 @@
# Bidirectional (streaming) audio
As of release 0.9.0, the jambonz [listen](/docs/webhooks/listen) verb supports streaming bidirectional audio.
> Prior to release 0.9.0, bidirectional audio was supported but streaming was one-way: from jambonz to your application. Any audio you provided back had to be supplied as a base64-encoded file, which was received and then played in its entirety.
To enable streaming bidirectional audio, you must explicitly request it in the listen verb with the `streaming` property, as shown below:
```js
{
verb: 'listen',
bidirectionalAudio: {
enabled: true,
streaming: true,
sampleRate: 8000
}
}
```
Your application should then send binary frames of linear-16 pcm raw data with the specified sample rate over the websocket connection.
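
A minimal sketch of what that can look like, assuming the popular `ws` npm package; `getNextPcmChunk()` is a hypothetical stand-in for your real audio source:
```js
// hedged sketch: stream linear-16 PCM back to jambonz as binary websocket frames
const { WebSocketServer } = require('ws');

// hypothetical audio source: here just 20ms of silence (8000 Hz, 16-bit mono)
const getNextPcmChunk = () => Buffer.alloc(8000 * 2 * 0.02);

const wss = new WebSocketServer({ port: 3000 });
wss.on('connection', (ws) => {
  const timer = setInterval(() => {
    ws.send(getNextPcmChunk(), { binary: true }); // binary frame, not JSON text
  }, 20); // one frame per 20ms of audio
  ws.on('close', () => clearInterval(timer));
});
```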


@@ -2,7 +2,7 @@
Sometimes in conversational AI scenarios there may be significant latency while the remote application processes a user response and determines the next action to take. In these scenarios it is common to play a typing sound or other audio to give the caller a cue that the system is processing the response, that the agent is thinking or retrieving information, etc.
Support for "filler noise" can enabled either at the session level using the `config.fillerNoise` property or at the individual `gather` level using the same property. In the example below, we set a session-wide setting for filler noise (in the form of a typing sound) to kick in after waiting 2 seconds for the remote app to respond to user input.
Support for "filler noise" can enabled either at the session level using the [config.fillerNoise](/docs/webhooks/config) property or at the individual [gather](/docs/webhooks/gather) level using the same property. In the example below, we set a session-wide setting for filler noise (in the form of a typing sound) to kick in after waiting 2 seconds for the remote app to respond to user input.
```js
/* websocket application */
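
/* NOTE: the rest of this example is truncated by the diff view. Below is a
   hedged completion sketching the session-wide fillerNoise setting described
   above; the property names (enable, url, startDelaySecs), the example url,
   and the node-client-ws session API (`svc` as a node-client-ws endpoint
   emitting 'session:new') are assumptions for illustration. */
svc.on('session:new', (session) => {
  session
    .config({
      fillerNoise: {
        enable: true,                           // turn filler noise on session-wide
        url: 'https://example.com/typing.mp3',  // hypothetical typing-sound file
        startDelaySecs: 2                       // kick in after waiting 2 seconds
      }
    })
    .send();
});
```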


@@ -29,6 +29,10 @@ You can use the following options in the `listen` action:
| option | description | required |
| ------------- |-------------| -----|
| actionHook | webhook to invoke when listen operation ends. The information will include the duration of the audio stream, and also a 'digits' property if the recording was terminated by a dtmf key. | yes |
| bidirectionalAudio.enabled | if true, enable bidirectional audio | no (default: true) |
| bidirectionalAudio.streaming | if true, enable streaming of audio from your application to jambonz (and the remote caller) | no (default: false) |
| bidirectionalAudio.sampleRate | sample rate of the audio your application will stream (e.g. 8000) | yes, if streaming |
| disableBidirectionalAudio | (deprecated) if true, disable bidirectional audio (same as setting bidirectionalAudio.enabled = false) | no |
| finishOnKey | The set of digits that can end the listen action | no |
| maxLength | the maximum length of the listened audio stream, in secs | no |
| metadata | arbitrary data to add to the JSON payload sent to the remote server when websocket connection is first connected | no |
@@ -58,6 +62,12 @@ Any DTMF digits entered by the far end party on the call can optionally be passe
Audio can also be sent back over the websocket to jambonz. This audio, if supplied, will be played out to the caller. (Note: Bidirectional audio is not supported when the `listen` is nested in the context of a `dial` verb).
There are two separate modes for bidirectional audio:
- non-streaming, where you provide a complete base64-encoded audio file as JSON text frames
- streaming, where you stream audio as raw L16 pcm in binary frames
<h5 id="bidirectional_audio_non_streaming">non-streaming</h5>
The far-end websocket server supplies bidirectional audio by sending a JSON text frame over the websocket connection:
```json
{
@@ -92,6 +102,22 @@ And finally, if the websocket connection wishes to end the `listen`, it can send
}
```
<h5 id="bidirectional_audio_streaming">streaming</h5>
To enable streaming bidirectional audio, you must explicitly enable it as shown below:
```js
{
verb: 'listen',
bidirectionalAudio: {
enabled: true,
streaming: true,
sampleRate: 8000
}
}
```
Your application should then send binary frames of linear-16 pcm raw data with the specified sample rate over the websocket connection.
<p class="flex">
<a href="/docs/webhooks/lex">Prev: lex</a>
<a href="/docs/webhooks/message">Next: message</a>


@@ -22,8 +22,10 @@ The `command` property must be one of the values shown below.
|conf:mute-status|mute or unmute all non-moderator conference legs|data must include a `conf_mute_status` property with a value of either 'mute' or 'unmute'|
|conf:hold-status|place a conference leg on hold or take off hold|data must include a `conf_hold_status` property with a value of either 'hold' or 'unhold'|
|listen:status|Change the status of a listen stream|data must include a `listen_status` property with a value of 'pause' or 'resume'|
|record|manage call recording that is done via SIPREC to a remote recording server|data must include an `action` with one of "startCallRecording", "stopCallRecording", "pauseCallRecording", or "resumeCallRecording". When starting a recording you must also supply "recordingID" and "siprecServerURL". You may optionally supply a `headers` object with custom headers to be sent to the remote SIPREC recording server.|
|whisper|Play a whisper prompt to the caller (i.e., only one party hears the prompt)|data must include a `whisper` property that can be an array of say or play verbs|
|sip:request|Send a SIP INFO, NOTIFY, or MESSAGE request to the far end party|data must include a 'method' property (allowed values: 'INFO', 'NOTIFY', 'MESSAGE') and can include 'content_type', 'content', and 'headers' properties.|
|dub|add, remove, or operate on a dub track|data must include the properties defined for the [dub](/docs/webhooks/dub) verb|
> Note: In the data payload when `redirect` is used, each jambonz verb in the `data` array may optionally include an `id` property. If present, jambonz will provide `verb:status` notifications when the verb starts and ends execution.
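
For illustration, a hedged sketch of a `whisper` command payload, with the envelope inferred from the table above (the exact shape is an assumption):
```json
{
  "command": "whisper",
  "data": {
    "whisper": [
      { "verb": "say", "text": "You have one minute remaining" }
    ]
  }
}
```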