Verb/config (#32)

* add config verb
* more docs
* minor
* more docs
* add some properties to gather
* additional gather options
* more gather opts
* add support for dtmf verb
* 0.7.4 release notes
* dial referHook details
2026-07-24 04:52:06 +00:00 · 2022-03-18 15:58:34 -04:00
parent fea4f829ae
commit c07a37d64a
11 changed files with 130 additions and 10 deletions
@@ -12,6 +12,9 @@ navi:
      -
        path: conference
        title: conference
+      -
+        path: config
+        title: config
      -
        path: dequeue
        title: dequeue
@@ -21,6 +24,9 @@ navi:
      -
        path: dialogflow
        title: dialogflow
+      -
+        path: dtmf
+        title: dtmf
      -
        path: enqueue
        title: enqueue
@@ -109,6 +115,9 @@ navi:
    path: release-notes
    title: Release Notes
    pages:
+      -
+        path: v0.7.4
+        title: v0.7.4
      -
        path: v0.7.3
        title: v0.7.3
@@ -0,0 +1,24 @@
+# Release v0.7.4
+> Release Date: Mar 17, 2022
+
+#### New Features
+- Adds support for using a websocket connection as an alternative to webhooks.
+- [config](/docs/webhooks/config/) verb was added to allow session-level speech defaults to be manipulated during a call.
+- [gather](/docs/webhooks/gather) and [transcribe](/docs/webhooks/transcribe) now support voice activity detection, which can be used to delay the connection to a speech service until speech is detected.  This can reduce the costs of using some speech providers.
+- Allow target-level headers on [dial](/docs/webhooks/dial) verb.
+- Add support for handling incoming SIP REFER while in a [dial](/docs/webhooks/dial) verb.
+- Additional parameters were added to the [gather](/docs/webhooks/gather) verb.
+- Add rate limiting of API requests.
+- Add support for redis user/password authentication.
+
+#### Bug fixes
+- REST outdial sometimes failed because req.srf was not properly set.
+- When running on Kubernetes, use sbc-sip service rather than pinging sbcs.
+- Use registered contact as uri when sending to user.
+- Disable DNS caching on Kubernetes when routing calls from SBC to feature servers to prevent intermittent failures when service endpoints change.
+
+#### Availability
+- Available shortly on <a href="https://aws.amazon.com/marketplace/pp/prodview-55wp45fowbovo" target="_blank" >AWS Marketplace</a>
+- Deploy to Kubernetes using [this Helm chart](https://github.com/jambonz/helm-charts)
+
+**Questions?** Contact us at <a href="mailto:support@jambonz.org">support@jambonz.org</a>
@@ -47,5 +47,5 @@ Conference status webhooks will contain the following additional parameters:

 <p class="flex">
 <a href="/docs/webhooks/overview">Prev: Overview</a>
-<a href="/docs/webhooks/dequeue">Next: dequeue</a>
+<a href="/docs/webhooks/config">Next: config</a>
 </p>
@@ -0,0 +1,44 @@
+# config
+
+The `config` verb allows the developer to change the default speech settings for the current session, or to listen in the background while other verbs are executing.  The latter technique is useful mainly for certain scenarios when integrating with some conversational AI systems.
+
+This verb is non-blocking; i.e. the specified settings are changed and execution immediately continues with the next verb in the application.
+
+```json
+  {
+    "verb": "config",
+    "synthesizer": {
+      "voice": "Jenny"
+    },
+    "recognizer": {
+      "vendor": "google",
+      "language": "de-DE"
+    },
+    "bargeIn": {
+      "enable": true,
+      "input" : ["speech"],
+      "actionHook": "/userInput"
+    }
+  }
+```
+You can use the following attributes in the `config` command:
+
+| option        | description | required  |
+| ------------- |-------------| -----|
+| synthesizer | change the session-level default text-to-speech settings. See [the say verb](/docs/webhooks/say) for details on the `synthesizer` property.| no |
+| recognizer | change the session-level default speech recognition settings. See [the transcribe verb](/docs/webhooks/transcribe) for details on the `recognizer` property.| no |
+| bargeIn.enable| if true, begin listening for speech or dtmf input while the session is executing other verbs.  This is known as a "background gather" and allows an application to capture user input outside of a [gather verb](/docs/webhooks/gather).  If false, stop any background listening task that is in progress.| no|
+| bargeIn.actionHook | A webhook to call if user input is collected from the background gather.| no |
+| bargeIn.input |Array, specifying allowed types of input: ['digits'], ['speech'], or ['digits', 'speech']. | yes |
+| bargeIn.finishOnKey | Dtmf key that signals the end of dtmf input | no |
+| bargeIn.numDigits | Exact number of dtmf digits expected to gather | no |
+| bargeIn.minDigits | Minimum number of dtmf digits expected to gather.  Defaults to 1. | no |
+| bargeIn.maxDigits | Maximum number of dtmf digits expected to gather | no |
+| bargeIn.interDigitTimeout | Amount of time to wait between digits after minDigits have been entered.| no |
+
+
+<p class="flex">
+<a href="/docs/webhooks/conference">Prev: Conference</a>
+<a href="/docs/webhooks/dequeue">Next: dequeue</a>
+</p>
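
The `bargeIn` options in the table above can be combined for a digits-only background gather. The following is a minimal sketch rather than a tested configuration: the `/userInput` path is reused from the example in this page, and the choice of `#` as the finish key and the digit limits are illustrative assumptions.

```json
{
  "verb": "config",
  "bargeIn": {
    "enable": true,
    "input": ["digits"],
    "actionHook": "/userInput",
    "finishOnKey": "#",
    "minDigits": 1,
    "maxDigits": 4
  }
}
```

Because `config` is non-blocking, the verbs that follow it in the application would presumably run while this background gather is listening.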
@@ -17,9 +17,12 @@ You can use the following options in the `dequeue` command:
 | name | name of the queue | yes |
 | actionHook | A webhook to invoke when the call ends. If no webhook is provided, execution will continue with the next verb in the current application. <br/>See below for specified request parameters.| no |
 | beep | if true, play a beep tone to this caller only just prior to connecting the queued call; this provides an auditory cue that the call is now connected | no |
-| confirmHook | A webhook for an application to run on the callee's end before the call is bridged.  This will allow the application to play an informative message to a caller as they leave the queue (e.g. "your call may be recorded") | no |
 | timeout | number of seconds to wait on an empty queue before returning (default: wait forever) | no |

+<!--
+| confirmHook | A webhook for an application to run on the callee's end before the call is bridged.  This will allow the application to play an informative message to a caller as they leave the queue (e.g. "your call may be recorded") | no |
+-->
+
 The *actionHook* webhook will contain a `dequeueResult` property indicating the completion reason:

 - 'hangup' - the bridged call was abandoned while listening to the confirmHook message
@@ -28,6 +31,6 @@ The *actionHook* webhook will contain a `dequeueResult` property indicating the
 - 'error' - a system error of some kind occurred

 <p class="flex">
-<a href="/docs/webhooks/conference">Prev: conference</a>
+<a href="/docs/webhooks/config">Prev: config</a>
 <a href="/docs/webhooks/dial">Next: dial</a>
 </p>
@@ -57,6 +57,7 @@ You can use the following attributes in the `dial` command:
 | dtmfHook | a webhook to call when a dtmfCapture entry is matched.  This is a notification only -- no response is expected, and any desired actions must be carried out via the REST updateCall API. | no|
 | headers | an object containing arbitrary sip headers to apply to the outbound call attempt(s) | no |
 | listen | a nested [listen](#listen) action, which will cause audio from the call to be streamed to a remote server over a websocket connection | no |
+| referHook | webhook to invoke when an incoming SIP REFER is received on a dialed call.  If the application wishes to accept and process the REFER, the webhook application should simply return an HTTP status code 200 with no body, and jambonz will send a SIP 202 Accepted.  Otherwise, any HTTP non-success status will cause jambonz to send a SIP response to the REFER with the same status code.  <br/><br/>Note that jambonz will send the 202 Accepted and do nothing further.  It is the responsibility of the third-party application to then outdial a new call and bridge the other leg, presumably by using the REST API.  See [this example app](https://github.com/jambonz/sip-blind-transfer) for more details.| no|
 | target | array of up to 10 [destinations](#target-types) to simultaneously dial. The first person (or entity) to answer the call will be connected to the caller and the rest of the called numbers will be hung up.| yes |
 | timeLimit | max length of call in seconds | no |
 | timeout | ring no answer timeout, in seconds.  <br/>Defaults to 60. | no |
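
A `dial` verb that opts in to REFER handling might look like the sketch below. The `/refer` path and the phone number are hypothetical, and the shape of the `target` entry (`type` and `number`) is an assumption here, since the target types are described in a part of the page outside this hunk.

```json
{
  "verb": "dial",
  "referHook": "/refer",
  "timeout": 60,
  "target": [
    {
      "type": "phone",
      "number": "+15085551212"
    }
  ]
}
```

Per the `referHook` description, the application behind `/refer` would answer with an HTTP 200 and no body to have jambonz send a SIP 202 Accepted, and would then be responsible for placing and bridging the new call itself.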
@@ -84,5 +84,5 @@ Please refer to [this tutorial](/tutorials/#dialogflow-part-2-adding-call-transf

 <p class="flex">
 <a href="/docs/webhooks/dial">Prev: dial</a>
-<a href="/docs/webhooks/enqueue">Next: enqueue</a>
+<a href="/docs/webhooks/dtmf">Next: dtmf</a>
 </p>
@@ -0,0 +1,23 @@
+# dtmf
+
+The `dtmf` verb generates a string of dtmf digit signals.  These are sent as RTP payloads using [RFC 2833](https://datatracker.ietf.org/doc/html/rfc2833).
+
+```json
+{
+  "verb": "dtmf",
+  "dtmf": "0276",
+  "duration": 250
+}
+```
+
+You can use the following options in the `dtmf` action:
+
+| option        | description | required  |
+| ------------- |-------------| -----|
+| dtmf | a string containing a sequence of dtmf digits (0-9,*,#) | yes |
+| duration | the length of each digit, in milliseconds.  Defaults to 500 | no |
+
+<p class="flex">
+<a href="/docs/webhooks/dialogflow">Prev: dialogflow</a>
+<a href="/docs/webhooks/enqueue">Next: enqueue</a>
+</p>
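
Since `duration` is optional, a shorter form of the verb should also be valid. This sketch sends a hypothetical digit string ending in `#` and relies on the documented default of 500 milliseconds per digit:

```json
{
  "verb": "dtmf",
  "dtmf": "1234#"
}
```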
@@ -38,6 +38,6 @@ The *waitHook* webhook will contain the following additional parameters:
 You can also optionally receive [queue webhook notifications](/docs/webhooks/queue-notifications) any time a member joins or leaves a queue.

 <p class="flex">
-<a href="/docs/webhooks/dialogflow">Prev: dialogflow</a>
+<a href="/docs/webhooks/dtmf">Prev: dtmf</a>
 <a href="/docs/webhooks/gather">Next: gather</a>
 </p>
@@ -7,15 +7,19 @@ The `gather` command is used to collect dtmf or speech input.
  "verb": "gather",
  "actionHook": "http://example.com/collect",
  "input": ["digits", "speech"],
+  "bargein": true,
+  "dtmfBargein": true,
  "finishOnKey": "#",
  "numDigits": 5,
  "timeout": 8,
  "recognizer": {
    "vendor": "Google",
-    "language": "en-US"
+    "language": "en-US",
+    "hints": ["sales", "support"],
+    "hintsBoost": 10
  },
  "say": {
-    "text": "To speak to Sales press 1.  To speak to customer support press 2.",
+    "text": "To speak to Sales press 1 or say Sales.  To speak to customer support press 2 or say Support",
    "synthesizer": {
      "vendor": "Google",
      "language": "en-US"
@@ -29,9 +33,16 @@ You can use the following options in the `gather` command:
 | option        | description | required  |
 | ------------- |-------------| -----|
 | actionHook | Webhook POST to invoke with the collected digits or speech. The payload will include a 'speech' or 'dtmf' property along with the standard attributes.  See below for more detail.| yes |
+| bargein | allow speech bargein, i.e. kill audio playback if caller begins speaking | no |
+| dtmfBargein | allow dtmf bargein, i.e. kill audio playback if caller enters dtmf | no |
 | finishOnKey | Dtmf key that signals the end of input | no |
 | input |Array, specifying allowed types of input: ['digits'], ['speech'], or ['digits', 'speech'].  Default: ['digits'] | no |
-| numDigits | Number of dtmf digits expected to gather | no |
+| interDigitTimeout | Amount of time to wait between digits after minDigits have been entered.| no |
+|listenDuringPrompt| if false, do not listen for user speech until say or play has completed.  Defaults to true|no|
+| minBargeinWordCount | if bargein is true, only kill speech when this many words are spoken.  Defaults to 1 | no|
+| minDigits | Minimum number of dtmf digits expected to gather.  Defaults to 1. | no |
+| maxDigits | Maximum number of dtmf digits expected to gather | no |
+| numDigits | Exact number of dtmf digits expected to gather | no |
 | partialResultHook | Webhook to send interim transcription results to. Partial transcriptions are only generated if this property is set. | no |
 | play | nested [play](#play) Command that can be used to prompt the user | no |
 | recognizer.vendor | Speech vendor to use (google, aws, or microsoft) | no |
@@ -40,7 +51,9 @@ You can use the following options in the `gather` command:
 | recognizer.vad.voiceMs|If vad is enabled, the number of milliseconds of speech required before connecting to cloud recognizer|no|
 | recognizer.vad.mode|If vad is enabled, this setting governs the sensitivity of the voice activity detector; value must be between 0 and 3 inclusive (lower numbers mean more sensitivity, i.e. more likely to return a false positive). Default: 2|no|
 | recognizer.hints | (google and microsoft only) Array of words or phrases to assist speech detection | no |
+| recognizer.hintsBoost | (google only) A value between 0 and 20 inclusive; a higher number assigns more weight to the hints | no |
 | recognizer.altLanguages |(google only) An array of alternative languages that the speaker may be using | no |
+| recognizer.punctuation |(google only) Enable automatic punctuation | no |
 | recognizer.profanityFilter | (google only) If true, filter profanity from speech transcription.  Default:  no| no |
 | recognizer.vocabularyName |  (aws only) The name of a vocabulary to use when processing the speech.| no |
 | recognizer.vocabularyFilterName |  (aws only) The name of a vocabulary filter to use when processing the speech.| no |
@@ -49,6 +62,7 @@ You can use the following options in the `gather` command:
 | recognizer.outputFormat | (microsoft only) simple or detailed.  Default:  simple| no |
 | recognizer.requestSnr | (microsoft only) Request signal to noise information| no |
 | recognizer.initialSpeechTimeoutMs | (microsoft only) Initial speech timeout in milliseconds| no |
+| recognizer.azureServiceEndpoint | (microsoft only) URI of a custom speech endpoint to connect to| no |
 | say | nested [say](#say) Command that can be used to prompt the user | no |
 | timeout | The number of seconds of silence or inaction that denote the end of caller input.  The timeout timer will begin after any nested play or say command completes.  Defaults to 5 | no |

@@ -12,7 +12,9 @@ To utilize the listen verb, the customer must implement a websocket server to re

 The format of the audio data sent over the websocket is 16-bit PCM encoding, with a user-specified sample rate.  The audio is sent in binary frames over the websocket connection.  

-Additionally, one text frame is sent immediately after the websocket connection is established.  This text frame contains a JSON string with all of the call attributes normally sent on an HTTP request (e.g. callSid, etc), plus **sampleRate** and **mixType** properties describing the audio sample rate and stream(s).  Additional metadata can also be added to this payload using the **metadata** property as described in the table below.  Once the intial text frame containing the metadata has been sent, the remote side should expect to receive only binary frames, containing audio.  The remote side is not expected to send any data back over the websocket.
+Additionally, one text frame is sent immediately after the websocket connection is established.  This text frame contains a JSON string with all of the call attributes normally sent on an HTTP request (e.g. callSid, etc), plus **sampleRate** and **mixType** properties describing the audio sample rate and stream(s).  Additional metadata can also be added to this payload using the **metadata** property as described in the table below.  Once the initial text frame containing the metadata has been sent, the remote side should expect to receive only binary frames, containing audio.  
+
+Note that the remote side can optionally send messages and audio back over the websocket connection, as described below in [Bidirectional Audio](#birectional_audio).

 ```json
 {
@@ -39,7 +41,7 @@ You can use the following options in the `listen` action:
 | wsAuth.username | HTTP basic auth username to use on websocket connection | no |
 | wsAuth.password | HTTP basic auth password to use on websocket connection | no |

-<h4 id="#birectional_audio">Bidirectional audio</h4>
+<h4 id="birectional_audio">Bidirectional audio</h4>

 Audio can also be sent back over the websocket to jambonz.  This audio, if supplied, will be played out to the caller.  (Note: Bidirectional audio is not supported when the `listen` is nested in the context of a `dial` verb).