Overview
The WebSocket Secure (WSS) /stream API enables real-time, bidirectional communication with the Corti system for interaction streaming. Clients can send and receive structured data, including transcripts and fact updates. Learn more about FactsR™ here.
This documentation provides a structured guide for integrating the Corti WSS API for real-time interaction streaming.
The /stream endpoint supports real-time ambient documentation interactions and clinical decision support workflows.
- If you are looking for a stateless endpoint geared towards front-end dictation workflows, use the /transcribe WSS endpoint.
- If you are looking for asynchronous ambient documentation interactions, refer to the /documents endpoint.
1. Establishing a Connection
Clients must initiate a WebSocket connection using the wss:// scheme and provide a valid interaction ID in the URL.
When creating an interaction, the 200 response provides a websocketUrl for that interaction, including the tenant-name as a URL parameter.
In addition to the tenant-name parameter, authenticating the WSS stream requires a token parameter to pass in the Bearer access token.

Path Parameters
- interaction ID (string): Unique interaction identifier

Query Parameters
- tenant-name (string): eu or us; specifies the tenant context
- token (string): Bearer $token
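As an illustration, here is a minimal sketch of opening the stream with the native WebSocket API; the placeholder URL is hypothetical, and in practice you would use the websocketUrl returned when creating the interaction:

```typescript
// Minimal sketch: open the /stream WebSocket for an interaction.
// Assumes `websocketUrl` comes from the create-interaction response
// and already contains the tenant-name query parameter.
const websocketUrl = "wss://api.example.corti.app/..."; // hypothetical placeholder
const accessToken = "<bearer-access-token>";

// Append the token query parameter carrying the Bearer access token.
const url = new URL(websocketUrl);
url.searchParams.set("token", `Bearer ${accessToken}`);

const socket = new WebSocket(url.toString());

socket.addEventListener("open", () => {
  // Connected (101 Switching Protocols); send the stream configuration next.
});
```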
Using SDK
You can use the Corti SDK (currently in “beta”) to connect to a stream endpoint.
2. Handshake Responses
101 Switching Protocols
Indicates a successful WebSocket connection. Once connected, the server streams data in the following formats:

Transcripts Data Streams
| Property | Type | Description |
|---|---|---|
| type | string | "transcript" |
| data | array of objects | Transcript segments |
| data[].id | string | Unique identifier for the transcript |
| data[].transcript | string | The transcribed text |
| data[].final | boolean | Indicates whether the transcript is finalized or interim |
| data[].speakerId | integer | Speaker identifier (-1 if diarization is off) |
| data[].participant.channel | integer | Audio channel number (e.g. 0 or 1) |
| data[].time.start | number | Start time of the transcript segment |
| data[].time.end | number | End time of the transcript segment |
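For example, a transcript message might look like this (all values illustrative):

```typescript
// Illustrative transcript message, shaped per the table above.
const transcriptMessage = {
  type: "transcript",
  data: [
    {
      id: "seg-1",                    // hypothetical identifier
      transcript: "The patient reports mild chest pain.",
      final: true,                    // false while the segment is interim
      speakerId: 0,                   // -1 when diarization is off
      participant: { channel: 0 },
      time: { start: 0.0, end: 2.4 }, // segment timing
    },
  ],
};
```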
Facts Data Streams
| Property | Type | Description |
|---|---|---|
| type | string | "facts" |
| fact | array of objects | Fact objects |
| fact[].id | string | Unique identifier for the fact |
| fact[].text | string | Text description of the fact |
| fact[].group | string | Categorization of the fact (e.g., "medical-history") |
| fact[].groupId | string | Unique identifier for the group |
| fact[].isDiscarded | boolean | Indicates if the fact was discarded |
| fact[].source | string | Source of the fact (e.g., "core") |
| fact[].createdAt | string (date-time) | Timestamp when the fact was created |
| fact[].updatedAt | string or null (date-time) | Timestamp when the fact was last updated |
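As a sketch, a handler that dispatches the two message types described above (field names follow the tables; error and control messages omitted):

```typescript
// Dispatch incoming stream messages by their "type" field.
function handleStreamMessages(socket: WebSocket): void {
  socket.addEventListener("message", (event: MessageEvent) => {
    const message = JSON.parse(event.data as string);

    switch (message.type) {
      case "transcript":
        for (const segment of message.data) {
          console.log(`[speaker ${segment.speakerId}] ${segment.transcript}`);
        }
        break;
      case "facts":
        for (const fact of message.fact) {
          console.log(`fact (${fact.group}): ${fact.text}`);
        }
        break;
    }
  });
}
```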
By default, incoming audio and returned data streams are persisted on the server, associated with the interactionId. You may query the interaction to retrieve the stored recordings, transcripts, and facts via the relevant REST endpoints. Audio recordings are saved in .webm format; transcripts and facts as JSON objects. Data persistence can be disabled by Corti upon request when needed to support compliance with your applicable regulations and data handling preferences.

Using SDK
You can use the Corti SDK (currently in “beta”) to subscribe to stream messages.
3. Sending Messages
Clients must send a stream configuration message and wait for a response of type CONFIG_ACCEPTED before transmitting other data.
Once the server responds with {"type": "CONFIG_ACCEPTED"}, clients can proceed with sending audio or controlling the stream status.
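For example, a small helper that resolves once the server accepts the configuration (a sketch using the native WebSocket API):

```typescript
// Resolve once the server acknowledges the stream configuration.
function waitForConfigAccepted(socket: WebSocket): Promise<void> {
  return new Promise((resolve) => {
    const onMessage = (event: MessageEvent) => {
      const message = JSON.parse(event.data as string);
      if (message.type === "CONFIG_ACCEPTED") {
        socket.removeEventListener("message", onMessage);
        resolve(); // safe to start sending audio now
      }
    };
    socket.addEventListener("message", onMessage);
  });
}
```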
Stream Configuration
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | "config" |
| configuration | object | Yes | Configuration settings |
| configuration.transcription.primaryLanguage | string (enum) | Yes | Primary spoken language for transcription |
| configuration.transcription.isDiarization | boolean | No (default: false) | Enable speaker diarization |
| configuration.transcription.isMultichannel | boolean | No (default: false) | Enable multi-channel audio processing |
| configuration.transcription.participants | array | Yes | List of participants with roles assigned to a channel |
| configuration.transcription.participants[].channel | integer | Yes | Audio channel number (e.g. 0 or 1) |
| configuration.transcription.participants[].role | string (enum) | Yes | "doctor", "patient", or "multiple" |
| configuration.mode.type | string (enum) | Yes | "facts" or "transcription" |
| configuration.mode.outputLocale | string (enum) | No | Output language locale (required for facts mode) |
Example
/stream configuration example
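The following sketch is consistent with the table above; the language codes are assumed values, so substitute enum values supported by your tenant:

```typescript
// Illustrative stream configuration message (field names per the table above).
function sendStreamConfig(socket: WebSocket): void {
  const configMessage = {
    type: "config",
    configuration: {
      transcription: {
        primaryLanguage: "en",        // assumed enum value
        isDiarization: false,
        isMultichannel: false,
        participants: [{ channel: 0, role: "multiple" }],
      },
      mode: {
        type: "facts",
        outputLocale: "en",           // required when mode.type is "facts"
      },
    },
  };
  socket.send(JSON.stringify(configMessage)); // sent as JSON text
}
```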
Using SDK
You can use the Corti SDK (currently in “beta”) to send stream configuration.
Sending Audio Data
Ensure that your configuration was accepted before starting to send audio, and that your initial audio chunk is not too small, as it needs to contain the headers required to properly decode the audio. We recommend sending audio in chunks of 500 ms; in terms of buffering, the limit is 64,000 bytes per chunk. Audio data should be sent as raw binary without JSON wrapping.
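As an illustration, a sketch that captures microphone audio in a browser with MediaRecorder and forwards it in 500 ms chunks (MediaRecorder and its defaults are an assumption about your capture setup):

```typescript
// Capture microphone audio and forward it as raw binary chunks.
async function streamMicrophone(socket: WebSocket): Promise<void> {
  const media = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(media); // the first chunk carries the codec headers

  recorder.ondataavailable = async (event: BlobEvent) => {
    // Send raw binary without JSON wrapping; keep each chunk under 64,000 bytes.
    socket.send(await event.data.arrayBuffer());
  };

  recorder.start(500); // emit a chunk roughly every 500 ms
}
```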
Channels, participants and speakers
In a typical on-site setting you should send mono-channel audio. If the microphone is a stereo microphone, set isMultichannel: false and the audio will be converted to mono-channel, ensuring no duplicate transcripts are returned.
In a virtual setting such as telehealth, you would typically have the virtual audio on one channel from WebRTC and mix in the local client's microphone on a separate channel. In this scenario, set isMultichannel: true and assign each channel the relevant participant role (e.g., if the doctor is on the local client and channel 0, set the role for channel 0 to doctor).
Diarization is independent of audio channels and participant roles. If you want transcript segments to be assigned to automatically identified speakers, set isDiarization: true. If false, transcript segments will be returned with speakerId: -1. If set to true, then diarization will try to identify speakers separately on each channel. The first identified speaker on each channel will have transcript segments with speakerId: 0, the second speakerId: 1 and so forth.
speakerId values are not related or matched to participant roles.
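For the telehealth scenario above, the transcription portion of the configuration might look like this sketch (language code assumed):

```typescript
// Illustrative multichannel telehealth setup: doctor on the local
// microphone (channel 0), patient on the remote WebRTC audio (channel 1).
const transcription = {
  primaryLanguage: "en",   // assumed enum value
  isMultichannel: true,
  isDiarization: false,    // diarization is independent of channels and roles
  participants: [
    { channel: 0, role: "doctor" },
    { channel: 1, role: "patient" },
  ],
};
```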
Using SDK
You can use the Corti SDK (currently in “beta”) to send audio data to the stream via the sendAudio method on the stream socket. Audio should be sent as binary chunks (e.g., ArrayBuffer).
Flush the Audio Buffer
To flush the audio buffer, forcing pending transcript segments to be returned over the web socket, send a flush message; the server will respond with a confirmation message once the buffer has been flushed. Fact generation (configuration.mode: facts) is not impacted by the flush event and will continue to process as normal.
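A sketch of triggering a flush, assuming the flush message follows the same {"type": ...} shape as the other control messages (the exact payload is an assumption):

```typescript
// Ask the server to flush buffered audio and return pending transcript segments.
function flushAudioBuffer(socket: WebSocket): void {
  socket.send(JSON.stringify({ type: "flush" })); // assumed message shape
}
```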
4. Ending the Session
To end the /stream session, send an end message. The server will finish processing any remaining audio (and pending facts, depending on the mode configuration). Then, the server will send two messages - one of type usage and one of type ENDED. After sending ENDED, the server will close the web socket.
You can reopen the WebSocket at any time and start a new session by sending the configuration again.
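A sketch of a graceful shutdown, assuming the end message follows the same {"type": ...} shape as the other control messages (the exact payload is an assumption):

```typescript
// Gracefully end the session and observe the final server messages.
function endSession(socket: WebSocket): void {
  socket.send(JSON.stringify({ type: "end" })); // assumed message shape

  socket.addEventListener("message", (event: MessageEvent) => {
    const message = JSON.parse(event.data as string);
    if (message.type === "usage") {
      // final usage information for the session
    } else if (message.type === "ENDED") {
      // the server will now close the web socket
    }
  });
}
```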
Using SDK
You can use the Corti SDK (currently in “beta”) to control the stream status.
When using the SDK with automatic configuration (via connect), the socket will close itself without reconnecting when it receives an ENDED message. When using manual configuration, the socket will attempt to reconnect after the server closes the connection. To prevent this, you must subscribe to the ENDED message and manually close the connection.
5. Error Handling
In case of an invalid or missing interaction ID, the server will return an error before opening the WebSocket. Once the WebSocket is open, you must commit the configuration within 15 seconds, or the WebSocket will close again. If the configuration cannot be applied (for example, the requested language is not supported), the WebSocket closes with a reason such as reason: language unavailable.
Once configuration has been accepted and the session is running, you may encounter runtime or application-level errors.
These are sent as JSON objects with the following structure:
In the case of a fatal error, the server ends the session, sending the usual final messages of type usage and type ENDED.
Using SDK
You can use the Corti SDK (currently in “beta”) to handle error messages. With automatic configuration, the SDK surfaces configuration errors as an error event and automatically closes the socket. You can also inspect the original message in the message handler. With manual configuration, configuration errors are only received as messages (not as error events), and you must close the socket manually to avoid reconnection.