Release/0.8.0 (#55)

* added nvidia

* add release notes

* add links
Dave Horton
2023-02-21 14:03:07 -05:00
committed by GitHub
parent f510cc6946
commit cb60307b72
5 changed files with 132 additions and 27 deletions


@@ -152,6 +152,9 @@ navi:
path: release-notes
title: Release Notes
pages:
-
path: v0.8.0
title: v0.8.0
-
path: v0.7.9
title: v0.7.9


@@ -0,0 +1,67 @@
# Release v0.8.0
> Release Date: Feb 21, 2023
#### New Features
- Completely re-written jambonz portal
- Multi-user support in jambonz portal
- [Deepgram](/docs/webhooks/recognizer/#deepgramOptions) STT support
- [Nuance](/docs/webhooks/recognizer/#nuanceOptions) Mix STT/TTS support
- [Nvidia Riva](/docs/webhooks/recognizer/#nvidiaOptions) STT/TTS support
- [IBM Watson](/docs/webhooks/recognizer/#ibmOptions) STT/TTS support
- option to force TTS (re)generation
- add `listen` option to config verb to enable streaming of audio during conversation
- ability to provision initial app content (eliminates the overhead of an initial webhook)
- allow per-phrase boosting for google STT
- update Simwood and Twilio gateway addresses
#### Bug fixes
- performance improvements in Feature server
- prevent 2 simultaneous background gathers
- fix for `refer` blocking when neither NOTIFY nor BYE is received
- when closing websocket at end of call send ws code 1000
- ACK to 487 response must have the same branch in the Via header as the INVITE
- reset variables like hints so that previous hints do not automatically carry over
- proper shutdown in K8S
- fix switching to an http webhook during a ws session
- fix uncaught exception in certain ws reconnect scenarios
- fixes for SIPREC pause and resume operations
#### SQL changes
```sql
ALTER TABLE `applications` ADD COLUMN `app_json` TEXT;
ALTER TABLE voip_carriers CHANGE register_public_domain_in_contact register_public_ip_in_contact BOOLEAN;
ALTER TABLE phone_numbers MODIFY number VARCHAR(132) NOT NULL UNIQUE;
CREATE TABLE permissions
(
permission_sid CHAR(36) NOT NULL UNIQUE ,
name VARCHAR(32) NOT NULL UNIQUE ,
description VARCHAR(255),
PRIMARY KEY (permission_sid)
);
CREATE TABLE user_permissions
(
user_permissions_sid CHAR(36) NOT NULL UNIQUE ,
user_sid CHAR(36) NOT NULL,
permission_sid CHAR(36) NOT NULL,
PRIMARY KEY (user_permissions_sid)
);
CREATE TABLE password_settings
(
min_password_length INTEGER NOT NULL DEFAULT 8,
require_digit BOOLEAN NOT NULL DEFAULT false,
require_special_character BOOLEAN NOT NULL DEFAULT false
);
CREATE INDEX user_permissions_sid_idx ON user_permissions (user_permissions_sid);
CREATE INDEX user_sid_idx ON user_permissions (user_sid);
ALTER TABLE user_permissions ADD FOREIGN KEY user_sid_idxfk (user_sid) REFERENCES users (user_sid) ON DELETE CASCADE;
ALTER TABLE user_permissions ADD FOREIGN KEY permission_sid_idxfk (permission_sid) REFERENCES permissions (permission_sid);
ALTER TABLE `users` ADD COLUMN `is_active` BOOLEAN NOT NULL default true;
```
#### Availability
- Available shortly on AWS Marketplace
- Deploy to Kubernetes using [this Helm chart](https://github.com/jambonz/helm-charts)
**Questions?** Contact us at <a href="mailto:support@jambonz.org">support@jambonz.org</a>


@@ -31,6 +31,7 @@ You can use the following attributes in the `config` command:
| recognizer | change the session-level default speech recognition settings. See [the transcribe verb](/docs/webhooks/transcribe) for details on the `recognizer` property.| no |
| bargeIn | this object contains properties that are used to instantiate a 'background' [gather verb](/docs/webhooks/gather) | no |
| bargeIn.enable | if true, begin listening for speech or dtmf input while the session is executing other verbs. This is known as a "background gather" and allows an application to capture user input outside of a [gather verb](/docs/webhooks/gather). If false, stop any background listening task that is in progress. | no |
| bargeIn.sticky | If true and bargeIn.enable is true, then when the background gather completes with speech or dtmf detected, it will automatically start another background gather|no|
| bargeIn.actionHook | A webhook to call if user input is collected from the background gather.| no |
| bargeIn.input |Array, specifying allowed types of input: ['digits'], ['speech'], or ['digits', 'speech']. | yes |
| bargeIn.finishOnKey | DTMF key that signals the end of dtmf input | no |
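As an illustrative sketch (the webhook path is hypothetical), a `config` verb that starts a sticky background gather listening for both speech and digits might look like this:
```json
{
  "verb": "config",
  "bargeIn": {
    "enable": true,
    "sticky": true,
    "input": ["speech", "digits"],
    "actionHook": "/user-input"
  }
}
```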


@@ -3,43 +3,63 @@ The `recognizer` property is used in multiple verbs (gather, transcribe, etc). I
| option | description | required |
| ------------- |-------------| -----|
| vendor | Speech vendor to use (google, aws, microsoft, deepgram, nuance, nvidia, and ibm are currently supported) | no |
| language | Language code to use for speech detection. Defaults to the application level setting | no |
| interim | If true, interim transcriptions are sent | no (default: false) |
| hints | (google, microsoft, deepgram, nvidia) Array of words or phrases to assist speech detection. See [examples](#hints) below. | no |
| hintsBoost | (google, nvidia) Number indicating the strength to assign to the configured hints. See examples below. | no |
| profanityFilter | (google, deepgram, nuance, nvidia) If true, filter profanity from speech transcription . Default: no| no |
| vad.enable|If true, delay connecting to cloud recognizer until speech is detected|no|
| vad.voiceMs|If vad is enabled, the number of milliseconds of speech required before connecting to cloud recognizer|no|
| vad.mode|If vad is enabled, this setting governs the sensitivity of the voice activity detector; value must be between 0 and 3 inclusive, lower numbers mean more sensitive|no|
| separateRecognitionPerChannel | If true, recognize both caller and called party speech using separate recognition sessions | no |
| altLanguages |(google, microsoft) An array of alternative languages that the speaker may be using | no |
| punctuation |(google) Enable automatic punctuation | no |
| enhancedModel |(google) Use enhanced model | no |
| words |(google) Enable word offsets | no |
| diarization |(google) Enable speaker diarization | no |
| diarizationMinSpeakers |(google) Set the minimum speaker count | no |
| diarizationMaxSpeakers |(google) Set the maximum speaker count | no |
| interactionType |(google) Set the interaction type: discussion, presentation, phone_call, voicemail, professionally_produced, voice_search, voice_command, dictation | no |
| naicsCode |(google) set an industry [NAICS](https://www.census.gov/naics/?58967?yearbck=2022) code that is relevant to the speech | no |
| vocabularyName | (aws) The name of a vocabulary to use when processing the speech.| no |
| vocabularyFilterName | (aws) The name of a vocabulary filter to use when processing the speech.| no |
| filterMethod | (aws) The method to use when filtering the speech: remove, mask, or tag.| no |
| identifyChannels | (aws) Enable channel identification. | no |
| profanityOption | (microsoft) masked, removed, or raw. Default: raw| no |
| outputFormat | (microsoft) simple or detailed. Default: simple| no |
| requestSnr | (microsoft) Request signal to noise information| no |
| initialSpeechTimeoutMs | (microsoft) Initial speech timeout in milliseconds| no |
| transcriptionHook | Webhook to receive an HTTP POST when an interim or final transcription is received. | yes |
| asrTimeout|timeout value for [continuous ASR feature](/docs/supporting-articles/continuous-asr)| no |
| asrDtmfTerminationDigit|DTMF key that terminates [continuous ASR feature](/docs/supporting-articles/continuous-asr)| no |
| nuanceOptions (added in 0.8.0)|Nuance-specific speech recognition options (see below)| no |
| deepgramOptions (added in 0.8.0)|Deepgram-specific speech recognition options (see below)| no |
| nvidiaOptions (added in 0.8.0)|Nvidia-specific speech recognition options (see below)| no |
| ibmOptions (added in 0.8.0)|IBM Watson-specific speech recognition options (see below)| no |
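For example, a minimal sketch of a `recognizer` supplied to a [gather verb](/docs/webhooks/gather) (the actionHook path is hypothetical):
```json
{
  "verb": "gather",
  "input": ["speech"],
  "actionHook": "/collect",
  "recognizer": {
    "vendor": "google",
    "language": "en-US",
    "interim": false,
    "punctuation": true
  }
}
```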
<h2 id="hints">Providing speech hints</h2>
google, microsoft, deepgram, and nvidia all support the ability to provide a dynamic list of words or phrases that should be "boosted" by the recognizer, i.e. the recognizer should be more likely to detect these terms and return them in the transcript. A boost factor can also be applied. In the most basic implementation it would look like this:
```json
"hints": ["benign", "malignant", "biopsy"],
"hintsBoost": 50
```
Additionally, google and nvidia allow a boost factor to be specified at the phrase level, e.g.
```json
"hints": [
{"phrase": "benign", "boost": 50},
{"phrase": "malignant", "boost": 10},
{"phrase": "biopsy", "boost": 20},
]
```
<h2 id="nuanceOptions">nuanceOptions</h2>
`nuanceOptions` is an object with the following properties. Please refer to the [Nuance Documentation](https://docs.nuance.com/mix/apis/asr-grpc/v1/#recognitionparameters) for detailed descriptions. This option is available in jambonz 0.8.0 or above.
| option | description | required |
| ------------- |-------------| -----|
@@ -84,7 +104,7 @@ The `recognizer` property is used in multiple verbs (gather, transcribe, etc). I
<h2 id="deepgramOptions">deepgramOptions</h2>
`deepgramOptions` is an object with the following properties. Please refer to the [Deepgram Documentation](https://developers.deepgram.com/api-reference/transcription/#transcribe-live-streaming-audio) for detailed descriptions. This option is available in jambonz 0.8.0 or above.
| option | description | required |
| ------------- |-------------| -----|
@@ -110,7 +130,7 @@ The `recognizer` property is used in multiple verbs (gather, transcribe, etc). I
<h2 id="ibmOptions">ibmOptions</h2>
`ibmOptions` is an object with the following properties. Please refer to the [IBM Watson Documentation](https://cloud.ibm.com/apidocs/speech-to-text?code=node#recognize) for detailed descriptions. This option is available in jambonz 0.8.0 or above.
| option | description | required |
| ------------- |-------------| -----|
@@ -123,3 +143,17 @@ The `recognizer` property is used in multiple verbs (gather, transcribe, etc). I
| baseModelVersion | Base model to be used | no |
| watsonMetadata | a [tag value](https://cloud.ibm.com/apidocs/speech-to-text?code=node#getting-started-data-labels) to apply to the request data provided | no |
| watsonLearningOptOut | set to true to prevent IBM from using your api request data to improve their service| no |
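An illustrative fragment using the options above (the metadata tag value is hypothetical):
```json
"ibmOptions": {
  "watsonMetadata": "my-test-calls",
  "watsonLearningOptOut": true
}
```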
<h2 id="nvidiaOptions">nvidiaOptions</h2>
`nvidiaOptions` is an object with the following properties. Please refer to the [Nvidia Riva Documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html) for detailed descriptions. This option is available in jambonz 0.8.0 or above.
| option | description | required |
| ------------- |-------------| -----|
| rivaUri | gRPC endpoint (ip:port) that Nvidia Riva is listening on | no |
| maxAlternatives | number of alternatives to return| no |
| profanityFilter | Indicates whether to remove profanity from the transcript | no |
| punctuation | Indicates whether to provide punctuation in the transcripts | no |
| wordTimeOffsets | Indicates whether to provide word-level detail | no |
| verbatimTranscripts | Indicates whether to provide verbatim transcripts| no |
| customConfiguration | An object of key-value pairs that can be sent to Nvidia for custom configuration | no |
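As an illustrative sketch (the Riva endpoint address is hypothetical):
```json
"nvidiaOptions": {
  "rivaUri": "10.0.0.5:50051",
  "punctuation": true,
  "profanityFilter": true,
  "maxAlternatives": 1
}
```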


@@ -1,13 +1,13 @@
# say
The say command is used to send synthesized speech to the remote party. The text provided may be either plain text or may use SSML tags. The following vendors are supported: google, microsoft, aws, nuance, nvidia, ibm, and wellsaid.
```json
{
"verb": "say",
"text": "hi there!",
"synthesizer" : {
"vendor": "Google",
"vendor": "google",
"language": "en-US"
}
}
@@ -18,7 +18,7 @@ You can use the following options in the `say` action:
| option | description | required |
| ------------- |-------------| -----|
| text | text to speak; may contain SSML tags | yes |
| synthesizer.vendor | speech vendor to use| no |
| synthesizer.language | language code to use. | no |
| synthesizer.gender | (Google only) MALE, FEMALE, or NEUTRAL. | no |
| synthesizer.voice | voice to use. Note that the voice list differs depending on the vendor you are using. Defaults to application setting, if provided. | no |
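For instance, a sketch of a `say` verb that selects a specific voice (the voice name is illustrative and would need to exist for the chosen vendor):
```json
{
  "verb": "say",
  "text": "<speak>Hello <break time=\"250ms\"/> and welcome!</speak>",
  "synthesizer": {
    "vendor": "microsoft",
    "language": "en-US",
    "voice": "en-US-JennyNeural"
  }
}
```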