wip (#106)

* wip * release notes * wip
2026-01-25 02:08:03 +00:00 · 2025-01-29 10:55:56 -05:00
parent 15ab47e964
commit 8406b7778b
13 changed files with 324 additions and 49 deletions
--- a/data/docs.yml
+++ b/data/docs.yml
@@ -103,31 +103,61 @@ navi:
        title: Overview
      -
        path: session-new
-        title: session:new
+        title: (client) session:new
      -
        path: session-redirect
-        title: session:redirect
+        title: (client) session:redirect
      -
        path: session-reconnect
-        title: session:reconnect
+        title: (client) session:reconnect
      -
        path: call-status
-        title: call:status
+        title: (client) call:status
      -
        path: verb-hook
-        title: verb:hook
+        title: (client) verb:hook
      -
        path: verb-status
-        title: verb:status
+        title: (client) verb:status
+      -
+        path: llm-event
+        title: (client) llm:event
+      -
+        path: llm-tool-call
+        title: (client) llm:tool-call
+      -
+        path: tts-tokens-result
+        title: (client) tts:tokens-result
+      -
+        path: tts-streaming-event
+        title: (client) tts:streaming-event
+      -
+        path: dial-confirm
+        title: (client) dial:confirm
      -
        path: jambonz-error
-        title: jambonz:error
+        title: (client) jambonz:error
      -
        path: ack
-        title: ack
+        title: (server) ack
      -
        path: command
-        title: command
+        title: (server) command
+      -
+        path: llm-tool-output
+        title: (server) llm:tool-output
+      -
+        path: llm-update
+        title: (server) llm:update
+      -
+        path: tts-tokens
+        title: (server) tts:tokens
+      -
+        path: tts-flush
+        title: (server) tts:flush
+      -
+        path: tts-clear
+        title: (server) tts:clear
  -
    path: speech-api
    title: Speech API
@@ -171,6 +201,9 @@ navi:
    path: release-notes
    title: Release Notes
    pages:
+      -
+        path: 0.9.3
+        title: 0.9.3
      -
        path: 0.9.2
        title: 0.9.2
@@ -183,21 +216,6 @@ navi:
      -
        path: v0.8.5
        title: 0.8.5
-      -
-        path: v0.8.4
-        title: 0.8.4
-      -
-        path: v0.8.3
-        title: 0.8.3
-      -
-        path: v0.8.2
-        title: 0.8.2
-      -
-        path: v0.8.1
-        title: 0.8.1
-      -
-        path: v0.8.0
-        title: 0.8.0
  -
    path: jambonz-ui
    title: Jambonz UI
--- a/markdown/docs/release-notes/0.9.3.md
+++ b/markdown/docs/release-notes/0.9.3.md
@@ -0,0 +1,62 @@
+# Release 0.9.3
+#### Info
+- Release Date: Jan 21, 2025
+
+#### New Features
+- support for [TTS streaming](https://blog.jambonz.org/how-to-stream-text-from-llms-using-jambonz), which enables streaming of text tokens from LLMs directly to TTS engines. Currently, Deepgram, Elevenlabs, Cartesia, and Rimelabs are supported as TTS engines.
+- support for Deepgram Voice Agent speech-to-speech service ([see example app](https://github.com/jambonz/deepgram-voice-agent-example)).
+- support for Ultravox.ai speech-to-speech service ([see example app](https://github.com/jambonz/ultravox-s2s-example)).
+- support for Cartesia as TTS engine
+- updated Speechmatics support with additional options
+- major improvements in feature server performance, particularly with bidirectional audio, TTS and dub audio inserts
+- support for Google speech cloning
+- support for Deepgram filler words in STT
+- added the ability to create a bidirectional stream on the B leg of a dialed call
+- added a new speech API to allow developers to implement a custom TTS streaming solution
+- support for handling incoming 3pcc invites (no body) from carriers
+- support SIP privacy header
+- added ability to send refer custom header to referhook
+- added ability to specify the Refer-To display name
+- added support for dub verb as a live call control request
+- added ability to export to more than one otel platform
+- sending socket close code when there is no response from the websocket app
+- rest:dial support timeLimit
+- sending reason in X-Reason header when the AHD processor gives up
+- support kill dial if sd ep is media timeout
+- capture system_alert when feature-server is online or offline
+- enable dtmf recognition of audible tones for carriers that do not support RFC 2833
+- support a yesterday option in the recent calls dropdown filter
+- add ability to filter easily by account in portal when viewing large numbers of accounts
+
+
+#### Bug fixes
+- update drachtio-srf and fsmrf to main branch releases
+- fix inband dtmf does not work in dial verb
+- fix for sticky bargein
+- Make voicemail hints case insensitive
+- fix ConfirmCallSession cannot be played
+- fix incorrectly sending final transcript with is_final=false
+- fix cannot replace endpoint for adulting session
+- A play verb in an actionhookdelay property that contains an invalid url will cause that play to block and subsequent verbs will not be executed
+- fixed iamrole from sessionToken to securityToken
+- fix to allow hints objects array
+- fix for stopping continuous asr when asrDtmfTerminationDigit is configured
+- custom stt vendor ws connection should not be closed in asrTimeout
+- feature server should send USER call to the sbc sip that is connected with the user (#949)
+- fixed dial verb should use calling id from From header (#958)
+- rest api: added support for name query parameter for retrieving application
+- per RFC 3261 the request-uri of REGISTER must not have userinfo
+- Change timer for next REGISTER expires / 2 to avoid delayed registrations
+
+#### SQL changes
+```
+ALTER TABLE google_custom_voices ADD COLUMN voice_cloning_key MEDIUMTEXT;
+ALTER TABLE google_custom_voices ADD COLUMN use_voice_cloning_key BOOLEAN DEFAULT false;
+ALTER TABLE voip_carriers ADD COLUMN dtmf_type ENUM('rfc2833','tones','info') NOT NULL DEFAULT 'rfc2833';
+```
+
+#### Availability
+- Available now on jambonz.cloud
+- Available now with devops scripts for subscription customers
+
+**Questions?** Contact us at <a href="mailto:support@jambonz.org">support@jambonz.org</a>
--- a/markdown/docs/ws/dial-confirm.md
+++ b/markdown/docs/ws/dial-confirm.md
@@ -0,0 +1,11 @@
+# dial:confirm
+
+>> jambonz => websocket server
+
+A `dial:confirm` message is sent by jambonz to the websocket server when a dial verb is using a confirm hook to interact with the called party before connecting them to the caller.  The application should respond with an array of verbs that will drive that interaction.
+
+<p class="flex">
+<span>&nbsp;</span>
+<a href="/docs/ws/overview">Prev: overview</a>
+<a href="/docs/ws/session-redirect">Next: session:redirect</a>
+</p>
--- a/markdown/docs/ws/llm-event.md
+++ b/markdown/docs/ws/llm-event.md
@@ -0,0 +1,11 @@
+# llm:event
+
+>> jambonz => websocket server
+
+An `llm:event` message is sent by jambonz to the application when an LLM being managed by jambonz (i.e., the [llm verb](docs/webhooks/llm/)) has generated any sort of event.  The payload will include the data provided by the LLM.
+
+<p class="flex">
+<span>&nbsp;</span>
+<a href="/docs/ws/overview">Prev: overview</a>
+<a href="/docs/ws/session-redirect">Next: session:redirect</a>
+</p>
--- a/markdown/docs/ws/llm-tool-call.md
+++ b/markdown/docs/ws/llm-tool-call.md
@@ -0,0 +1,16 @@
+# llm:tool-call
+
+>> jambonz => websocket server
+
+An `llm:tool-call` message is sent by jambonz to the application when an LLM being managed by jambonz (i.e., the [llm verb](docs/webhooks/llm/)) has called a function or tool that the application needs to implement.
+
+The payload will include the following properties: 
+- name: the name of the function,
+- call_id: an identifier that must be returned in the `llm:tool-output` message sent by the application to return function call results, and
+- args: an object containing the parameters provided as part of the function call
+
+<p class="flex">
+<span>&nbsp;</span>
+<a href="/docs/ws/overview">Prev: overview</a>
+<a href="/docs/ws/session-redirect">Next: session:redirect</a>
+</p>
--- a/markdown/docs/ws/llm-tool-output.md
+++ b/markdown/docs/ws/llm-tool-output.md
@@ -0,0 +1,15 @@
+# llm:tool-output
+
+>> websocket server => jambonz
+
+An `llm:tool-output` message is sent by the application to jambonz in response to an [llm:tool-call](docs/ws/llm-tool-call/) message.
+
+The payload must include:
+- tool_call_id: the value of call_id in the "llm:tool-call" message
+- data: an object containing the results of the function
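+
+As a rough illustration, the sketch below turns an incoming `llm:tool-call` payload into an `llm:tool-output` message. The `tools` registry and the `add` function are hypothetical, and placing `tool_call_id` and `data` at the top level next to `type` is an assumption about the envelope; only the property names themselves come from the descriptions above.
+
+```js
+// Hypothetical registry of tool implementations, keyed by function name.
+const tools = new Map([
+  ['add', ({a, b}) => ({sum: a + b})]
+]);
+
+// Build an llm:tool-output message from an llm:tool-call payload.
+// Assumption: tool_call_id and data sit at the top level of the message.
+function buildToolOutput(toolCall) {
+  const {name, call_id, args} = toolCall;
+  const impl = tools.get(name);
+  // Report unknown tools back to the LLM instead of throwing.
+  const data = impl ? impl(args) : {error: `unknown tool: ${name}`};
+  return {type: 'llm:tool-output', tool_call_id: call_id, data};
+}
+```
+
+The key point is that the `call_id` received in the tool call is echoed back as `tool_call_id`, so the result can be matched to the function call it answers.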
+
+<p class="flex">
+<span>&nbsp;</span>
+<a href="/docs/ws/overview">Prev: overview</a>
+<a href="/docs/ws/session-redirect">Next: session:redirect</a>
+</p>
--- a/markdown/docs/ws/llm-update.md
+++ b/markdown/docs/ws/llm-update.md
@@ -0,0 +1,14 @@
+# llm:update
+
+>> websocket server => jambonz
+
+An `llm:update` message is sent by the application to jambonz to update an LLM conversation that jambonz is managing.
+
+The message has a `data` payload that contains the information that should be used to update the llm.
+
+
+<p class="flex">
+<span>&nbsp;</span>
+<a href="/docs/ws/overview">Prev: overview</a>
+<a href="/docs/ws/session-redirect">Next: session:redirect</a>
+</p>
--- a/markdown/docs/ws/overview.md
+++ b/markdown/docs/ws/overview.md
@@ -6,54 +6,73 @@
 - Use `npx create-jambonz-ws-app` to scaffold a webhook application
 - See [@jambonz/node-client-ws](https://www.npmjs.com/package/@jambonz/node-client-ws) for Node.js API

-The websocket API is functionally equivalent to the Webhook API; it is simply an alternative way for an application to interact with and drive jambonz call and message processing.  
-
-The reason we created this alternative API is that there are some use cases - primarily those involving a lot of asynchronous interaction with jambonz - that can be done much easier over a single websocket connection than over a combination of HTTP webhooks and REST APIs.
+The websocket API is functionally equivalent to the Webhook API; it is simply an alternative way for an application to interact with and drive jambonz call and message processing.  We recommend using the websocket API for highly asynchronous applications.

 When you create a jambonz application in the jambonz portal and you want to use the websocket API, simply provide a ws(s) URL for the calling webhook instead of an http(s) URL.  The call status webhook can be the same ws(s) URL, in which case your application will get the call status notifications over the same websocket connections.
 > You can also have call status notifications sent to a completely separate http(s) webhook URL if you prefer.

 The impact of specifying a ws(s) URL as the application calling webhook is that this causes jambonz to establish a websocket connection to that URL when an incoming call (or outbound call) is routed to the jambonz application, and then communicate with your application over that websocket connection. 

+>> In the documentation below, we refer to the websocket server as the "application".
+
 ## Connection management

 The websocket connection will be established by jambonz to the specified websocket URL,  The websocket subprotocol used shall be “ws.jambonz.org”.  If jambonz fails to connect to the provided URL, there will be no retry and the call shall be rejected.

-Once connected, jambonz will send an initial JSON text message to the your server with the same parameters as are provided in the webhook call.  The full message set is described below, but for now we can simply say that:
+Once connected, jambonz will send an initial JSON text message to your application with the same parameters as are provided in the webhook call.  The full message set is described below, but for now we can simply say that:
 - Only text frames are ever sent over the websocket connections; i.e. no binary frames.
 - All text frames contain JSON-formatted data.
- The information content sent from jambonz to the your server is exactly the same content as that supplied via http webhooks.
+- The information content sent from jambonz to your application is exactly the same content as that supplied via http webhooks.

-The websocket should generally be closed only from the jambonz side, which happens when the call is ended.  If the your server closes the socket, jambonz will attempt to reconnect, up to a configurable number of reconnection attempts.  Upon reconnecting, jambonz will send an initial reconnect message containing only the callSid of the session.  It is up to the your server to maintain the state of the application between reconnections for the same call.
+The websocket should generally be closed only from the jambonz side, which happens when the call is ended.  If your application closes the socket, jambonz will attempt to reconnect, up to a configurable number of reconnection attempts.  Upon reconnecting, jambonz will send an initial reconnect message containing only the callSid of the session.  It is up to your application to maintain its state between reconnections for the same call.

 ## Message format

 As mentioned above, all messages will be JSON payloads sent as text frames.  The following top-level properties will be commonly included:
 - *type*: all messages **must** have a type property.
-  - Messages from jambonz to the your server will have the following types: [`session:new`, `session:reconnect`, `verb:hook`, `call:status`, `error`].
-  - Messages from the your server to jambonz will have the following types: [`ack`, `command`].
- *msgid*: every message sent from jambonz will include a unique message identifier. Messages from the your server application that are responses to jambonz messages (`ack`) **must** include the msgId that they are acknowledging.  
+  - Messages from jambonz to your application will have the following types: [`session:new`, `session:reconnect`, `verb:hook`, `call:status`, `error`].
+  - Messages from your application to jambonz will have the following types: [`ack`, `command`].
+- *msgid*: every message sent from jambonz will include a unique message identifier. Messages from your application that are responses to jambonz messages (`ack`) **must** include the msgid that they are acknowledging.  

 Note that not all messages sent by jambonz need to be acknowledged.  The message types which **must** be acknowledged are the `session:new`, and `verb:hook` messages.
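+
+The acknowledgement rule above can be sketched as a small helper. This is a minimal illustration, not the client library's implementation: the function name `buildAck` is hypothetical, and carrying the optional new instructions in a `data` property is an assumption about the message shape.
+
+```js
+// Message types that must be acknowledged, per the note above.
+const MUST_ACK = new Set(['session:new', 'verb:hook']);
+
+// Build an ack for an incoming jambonz message, or return null when
+// the message type does not require one.
+// Assumption: new instructions (an array of verbs) travel in `data`.
+function buildAck(incoming, verbs) {
+  if (!MUST_ACK.has(incoming.type)) return null;
+  const ack = {type: 'ack', msgid: incoming.msgid};
+  if (Array.isArray(verbs) && verbs.length > 0) ack.data = verbs;
+  return ack;
+}
+```
+
+The returned object would be serialized with `JSON.stringify` and sent as a text frame, since only JSON text frames are exchanged on the connection.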

 ## Message types
-In the sections that follow, we will describe each of the message types in detail.  The table below provides summary information.
+In the sections that follow, we will describe each of the message types in detail.  The tables below provide summary information for:
+
+- client messages (sent from jambonz to your application), and
+- server messages (sent from your application to jambonz).
+
+### Client messages
+The following messages are sent by jambonz to your application.
+
+|type|usage|
+|---|---|
+|session:new|sent when a new call arrives (or an outbound call generated via the  REST API has been answered).  This is analogous to the initial webhook sent by jambonz to gather an initial set of instructions for the call.|
+|session:redirect|sent when live call control has been used to retrieve a new application for either the parent or child call leg.|
+|session:reconnect|sent when the websocket connection was closed unexpectedly by the application and jambonz has successfully reconnected.|
+|call:status|sent any time the call status changes.|
+|verb:hook|sent when an action hook or event hook configured for a verb has been triggered (e.g. a “gather” verb has collected an utterance from the user).|
+|verb:status|sent when a verb has just started or completed executing.  See “command” below; this message is only sent if the application includes “id” properties on the verbs provided.|
+|llm:event|sent when an LLM generates any kind of event; e.g. transcript, etc|
+|llm:tool-call|sent when an LLM agent makes a tool or function call that the application needs to invoke|
+|tts:tokens-result|sent in response to a `tts:tokens` message to indicate whether the tokens have been processed. The payload may indicate that the tokens were not processed due to a throttling limit, in which case the application is expected to queue the tokens and retry later (after a `tts:streaming-event` message is received indicating the stream has been resumed)|
+|tts:streaming-event|sent to notify an application that a tts stream has been paused or resumed due to throttling limits|
+|dial:confirm|sent when a dialed call has a confirmHook; the application should respond with a payload of verbs to play in the confirm call session|
+|jambonz:error|if jambonz encounters some sort of fatal error (i.e. something that would necessitate ending the call unexpectedly) jambonz will send an error event to the far end application describing the problem.|
+
+### Server messages
+The following messages can be sent from your application back to jambonz.
+
+|type|usage|
+|---|---|
+|ack|the jambonz application must respond to any `session:new` or `verb:hook` message with an `ack` message indicating that the provided content in the message has been processed.  The ack message may optionally contain a payload of new instructions for jambonz.|
+|command|the application  will send this message when it wants to asynchronously  provide a new set of instructions to jambonz. The application **may** include an `id` property in each of the verbs included in the command; if so, jambonz will send `verb:status` notifications back to the application when the verb is executed.  The `id` property is a string value that is assigned by the application and is meaningful only to the application (i.e. to jambonz it is simply an opaque piece of tracking data).|
+|llm:tool-output|the application should send this when an LLM has invoked a tool and results are available|
+|llm:update|the application sends this when it wants to asynchronously provide new instructions or session state to the LLM|
+|tts:tokens|sent when the application wants to stream text, e.g. from an LLM being managed on the application side.  This may be called multiple times as the LLM itself streams tokens|
+|tts:flush|sent periodically during TTS streaming when the application wants to cause audio to be generated|
+|tts:clear|sent during TTS streaming when the application wants to discard any queued audio and tokens, e.g. to handle an interruption|

-|message type|sent by|usage|
-|---|---|---|
-|session:new|jambonz|sent when a new call arrives (or an outbound call generated via the  REST API has been answered).  This is analogous to the initial webhook sent by jambonz to gather an initial set of instructions for the call.|
-|session:redirect|jambonz|sent when live call control has been used to retrieve a new application for either the parent or child call leg.|
-|session:reconnect|jambonz|sent when the websocket connection was closed unexpectedly by the websocket server and jambonz has successfully reconnected.|
-|call:status|jambonz|sent any time the call status changes.|
-|verb:hook|jambonz| sent when an action hook or event hook configured for a verb has been triggered (e.g. a “gather” verb has collected an utterance from the user).|
-|verb:status|jambonz|sent when a verb has just started or completed executing.  See “command” below; this message is only sent if the app includes “id” properties on the verbs provided.|
-|llm:event|jambonz|sent when an LLM generates any kind of event; e.g. transcript, etc|
-|jambonz:error|jambonz| if jambonz encounters some sort of fatal error (i.e. something that would necessitate ending the call unexpectedly) jambonz will send an error event to the far end app describing the problem.|
-|ack|websocket server|the ws server will respond to any `session:new` or `verb:hook` message with an `ack` message indicating that the provided content in the message has been processed.  The ack message may optionally contain a payload of new instructions for jambonz.|
-|command|websocket server|the ws server  will send this message when it wants to asynchronously  provide a new set of instructions to jambonz. The app **may** include an `id` property in each of the verbs included in the command; if so, jambonz will send `verb:status` notifications back to the app when the verb is executed.  The `id` property is a string value that is assigned by the app and is meaningful only to the app (i.e. to jambonz it is simply an opaque piece of tracking data).|
-|llm:tool-call|jambonz|sent when an LLM agent makes a tool or function call that the app needs to invoke|
-|llm:tool-output|websocket server|the ws server sends when a tool has been invoked and results are available|
-|llm:update|websocket server|the ws server application sends when it wants to asynchronously provide new instructions or session state to the LLM|


 <p class="flex">
--- a/markdown/docs/ws/tts-clear.md
+++ b/markdown/docs/ws/tts-clear.md
@@ -0,0 +1,13 @@
+# tts:clear
+
+>> websocket server => jambonz
+
+A `tts:clear` message is sent by the application to jambonz to cause all audio generation to stop.  Any queued audio or tokens are immediately discarded.
+
+This message has no other properties.
+
+<p class="flex">
+<span>&nbsp;</span>
+<a href="/docs/ws/overview">Prev: overview</a>
+<a href="/docs/ws/session-redirect">Next: session:redirect</a>
+</p>
--- a/markdown/docs/ws/tts-flush.md
+++ b/markdown/docs/ws/tts-flush.md
@@ -0,0 +1,35 @@
+# tts:flush
+
+>> websocket server => jambonz
+
+A `tts:flush` message is sent by the application to jambonz to indicate that the speech synthesizer should be notified to generate audio for the tokens that have been sent.  The application should periodically send tts:flush.  
+
+The snippet of example code below, written using [@jambonz/node-client-ws](https://www.npmjs.com/package/@jambonz/node-client-ws), shows how an application streaming tokens from Anthropic could send tts:flush at the end of each message streamed by Anthropic.
+
+```js
+    const stream = await client.messages.create({
+      model: ANTHROPIC_MODEL,
+      max_tokens: 1024,
+      messages: session.locals.messages,
+      stream: true
+    });
+
+    for await (const messageStreamEvent of stream) {
+      if (messageStreamEvent.delta?.text) {
+        const tokens = messageStreamEvent.delta.text;
+        session.sendTtsTokens(tokens)
+          .catch((err) => logger.error({err}, 'error sending TTS tokens'));
+      }
+      else if (messageStreamEvent.type === 'message_stop') {
+        session.flushTtsTokens();
+      }
+    }
+
+```
+
+
+<p class="flex">
+<span>&nbsp;</span>
+<a href="/docs/ws/overview">Prev: overview</a>
+<a href="/docs/ws/session-redirect">Next: session:redirect</a>
+</p>
--- a/markdown/docs/ws/tts-streaming-event.md
+++ b/markdown/docs/ws/tts-streaming-event.md
@@ -0,0 +1,18 @@
+# tts:streaming-event
+
+>> jambonz => websocket server
+
+A `tts:streaming-event` message is sent by jambonz to indicate a streaming event.
+
+The message shall contain an event_type field with one of the following values:
+
+- stream_open: tokens are now being streamed to the synthesizer; any tokens received before this event would have been queued and are now being sent
+- stream_closed: tokens are no longer being streamed to the synthesizer, but the application may continue sending them; when the stream is next opened the queued tokens will be sent.
+- stream_paused: the token buffer is full and the application should discontinue sending tokens
+- stream_resumed: the application may resume sending tokens
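+
+One way to act on these events is to track a single "may send" flag, as in the sketch below. The function name `nextMaySend` is hypothetical, and reading `event_type` from the top level of the message is an assumption about the envelope.
+
+```js
+// Compute whether the application may send tts:tokens, given the
+// current flag and an incoming tts:streaming-event message.
+function nextMaySend(current, msg) {
+  switch (msg.event_type) {
+    case 'stream_paused':
+      return false; // token buffer is full: stop sending
+    case 'stream_resumed':
+      return true;  // sending may resume
+    default:
+      // stream_open / stream_closed: sending is still permitted,
+      // because jambonz queues tokens while the stream is closed
+      return current;
+  }
+}
+```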
+
+<p class="flex">
+<span>&nbsp;</span>
+<a href="/docs/ws/overview">Prev: overview</a>
+<a href="/docs/ws/session-redirect">Next: session:redirect</a>
+</p>
--- a/markdown/docs/ws/tts-tokens-result.md
+++ b/markdown/docs/ws/tts-tokens-result.md
@@ -0,0 +1,20 @@
+# tts:tokens-result
+
+>> jambonz => websocket server
+
+A `tts:tokens-result` message is sent by jambonz in response to each [tts:tokens](docs/ws/tts-tokens) message received.
+
+The message shall contain:
+
+- id: the unique identifier from the tts:tokens message
+- status: 'ok' if tokens were processed, 'failed' if not
+- reason: only supplied if status is 'failed'; provides more detail on why the tokens were not processed
+
+The most common reason for a 'failed' response is if the token buffer maintained by jambonz for this conversation is full.  In that case the reason will be 'full' and the application should stop sending any further tts:tokens messages until a [tts:streaming-event](docs/ws/tts-streaming-event) message is received indicating that streaming of TTS tokens can resume.
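+
+The backpressure behavior described above could be handled with a small queue, sketched here under several assumptions: the `TokenSender` class and its `send` callback are hypothetical, `id` and `tokens` are placed at the top level of the `tts:tokens` message, and rejected tokens are resent under a fresh id rather than reusing the old one.
+
+```js
+class TokenSender {
+  constructor(send) {
+    this.send = send;          // hypothetical transport: sends one message
+    this.paused = false;
+    this.queue = [];           // {seq, tokens} waiting to be sent
+    this.inflight = new Map(); // id -> {seq, tokens} awaiting a result
+    this.nextSeq = 1;
+    this.nextId = 1;
+  }
+
+  // Queue tokens from the LLM and send them if the stream is not paused.
+  push(tokens) {
+    this.queue.push({seq: this.nextSeq++, tokens});
+    this.drain();
+  }
+
+  drain() {
+    while (!this.paused && this.queue.length > 0) {
+      const item = this.queue.shift();
+      const id = this.nextId++;
+      this.inflight.set(id, item);
+      this.send({type: 'tts:tokens', id, tokens: item.tokens});
+    }
+  }
+
+  // Handle a tts:tokens-result message.
+  onResult({id, status, reason}) {
+    const item = this.inflight.get(id);
+    this.inflight.delete(id);
+    if (item && status === 'failed' && reason === 'full') {
+      this.paused = true;
+      // Requeue rejected tokens, preserving their original order.
+      this.queue.push(item);
+      this.queue.sort((a, b) => a.seq - b.seq);
+    }
+  }
+
+  // Handle a tts:streaming-event message.
+  onStreamingEvent({event_type}) {
+    if (event_type === 'stream_paused') this.paused = true;
+    if (event_type === 'stream_resumed') {
+      this.paused = false;
+      this.drain();
+    }
+  }
+}
+```
+
+Tracking a separate `seq` keeps the spoken text in order even when several in-flight messages are rejected before the application learns that the buffer is full.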
+
+
+<p class="flex">
+<span>&nbsp;</span>
+<a href="/docs/ws/overview">Prev: overview</a>
+<a href="/docs/ws/session-redirect">Next: session:redirect</a>
+</p>
--- a/markdown/docs/ws/tts-tokens.md
+++ b/markdown/docs/ws/tts-tokens.md
@@ -0,0 +1,23 @@
+# tts:tokens
+
+>> websocket server => jambonz
+
+A `tts:tokens` message is sent by the application to jambonz to stream text to a speech synthesizer.  This requires that a TTS speech vendor that supports streaming is being used for the call.  Currently, as of release 0.9.3, the following vendors are supported for TTS streaming:
+
+- Cartesia
+- Deepgram
+- Elevenlabs
+
+The payload must contain:
+- id: a unique identifier within the current tts stream, identifying this request
+- tokens: a string of text to be synthesized in streaming fashion
+
+The intent is that, as the application receives a stream of tokens from an LLM that it is managing, it will forward this stream via successive tts:tokens messages.
+
+The application will receive a [tts:tokens-result](docs/ws/tts-tokens-result) message in response to each tts:tokens message it sends.  
+
+<p class="flex">
+<span>&nbsp;</span>
+<a href="/docs/ws/overview">Prev: overview</a>
+<a href="/docs/ws/session-redirect">Next: session:redirect</a>
+</p>