Recordings and Transcripts

Speech to text is a machine learning technology that transforms individual or conversational spoken audio into text. This capability has been used in medical transcription workflows for decades. While the capabilities and opportunities for real-time audio streaming are vast, audio file-based transcription is still necessary in many clinical workflows.

This page explains the key functionality of Corti audio file processing via the /recordings and /transcripts endpoints. API specifications: Upload recording and Create transcript.

Transcript creation is a synchronous-to-asynchronous workflow; see below for how to upload audio files and how to create and receive transcripts. Read more about real-time audio streaming here: /transcribe and /streams.

How it Works

Each interaction may have more than one audio file and transcript associated with it. Audio files up to 60 min in total duration, or 150 MB in total size, may be used.
By default, the transcripts endpoint operates synchronous-to-asynchronous: a request is processed synchronously for up to 25 seconds; if processing has not finished by then, it continues asynchronously.
- In the latter scenario, a partial or empty transcript with status=processing is returned with a location header that can be used to retrieve the completed transcript.
- The client can poll the transcript status endpoint (GET /interactions/{id}/transcripts/{transcriptId}/status) to monitor progress:
  - 200 OK with status processing, completed, or failed
  - 404 Not Found if the interactionId or transcriptId is invalid
- The completed transcript can be retrieved via the Get Transcript request (GET /interactions/{id}/transcripts/{transcriptId}/).
Set the async parameter to true to receive the location header immediately: the client does not need to wait for the sync timeout window, and the request is processed asynchronously.
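The branching described above (use the synchronous result, or follow the location header) can be sketched in Python as follows. This is a minimal illustration, not the SDK's behaviour: the function name plan_next_step is hypothetical, and it assumes the response body carries a top-level status field and that the header is named Location.

```python
from typing import Mapping, Optional, Tuple


def plan_next_step(
    body: Mapping[str, object], headers: Mapping[str, str]
) -> Tuple[str, Optional[str]]:
    """Decide what to do with the response to a create-transcript request.

    Assumptions (not confirmed by this page): the JSON body has a top-level
    "status" field, and the asynchronous case sets a "Location" header.
    """
    status = body.get("status")
    if status == "completed":
        # Processing finished inside the synchronous window.
        return "use_body", None
    if status == "failed":
        return "failed", None
    # Still processing: look up the Location header case-insensitively.
    location = next(
        (value for name, value in headers.items() if name.lower() == "location"),
        None,
    )
    return "poll", location
```

With the requests library, the two arguments would typically be response.json() and response.headers.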

Using the API

Review supported audio file requirements here.

Corti speech to text supports file transcoding; however, it is recommended to follow the outlined best practices for a consistent and reliable experience.

Create an Interaction: POST:/interactions/

Note the interactionId included in the response that will be used for aggregating the audio file and transcript assets.

// Replace with an identifier of your choosing
const IDENTIFIER = "<id-of-your-choosing>";

const { interactionId } = await client.interactions.create({
  encounter: {
    identifier: IDENTIFIER,
    status: "planned",
    type: "first_consultation",
  },
});

// Replace with an identifier of your choosing
const string IDENTIFIER = "<id-of-your-choosing>";

var interaction = await client.Interactions.CreateAsync(new InteractionsCreateRequest
{
    Encounter = new InteractionsEncounterCreateRequest
    {
        Identifier = IDENTIFIER,
        Status = InteractionsEncounterStatusEnum.Planned,
        Type = InteractionsEncounterTypeEnum.FirstConsultation,
    },
});

import requests

# Replace these with your values
ENVIRONMENT = "<eu-or-us>"
IDENTIFIER = "<id-of-your-choosing>"
TENANT = "<your-tenant-name>"
TOKEN = "<your-access-token>"

response = requests.post(
    f"https://api.{ENVIRONMENT}.corti.app/v2/interactions",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Tenant-Name": TENANT,
        "Content-Type": "application/json",
    },
    json={
        "encounter": {
            "identifier": IDENTIFIER,
            "status": "planned",
            "type": "first_consultation",
        },
    },
)
response.raise_for_status()
interaction = response.json()
interaction_id = interaction["interactionId"]

# Replace these with your values
ENVIRONMENT="<eu-or-us>"
IDENTIFIER="<id-of-your-choosing>"
TENANT="<your-tenant-name>"
TOKEN="<your-access-token>"

curl -X POST "https://api.${ENVIRONMENT}.corti.app/v2/interactions" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Tenant-Name: ${TENANT}" \
  -H "Content-Type: application/json" \
  -d '{
    "encounter": {
      "identifier": "'"${IDENTIFIER}"'",
      "status": "planned",
      "type": "first_consultation"
    }
  }'

Only encounter (with identifier, status, and type) is required. assignedUserId, patient, period, and title are optional — see Create Interaction for the full schema.

Upload an audio file: POST:/interactions/{id}/recordings/

Note the recordingId in the response; it is used for transcript creation.

import { createReadStream } from "fs";

// Replace these with your values
const INTERACTION_ID = "<your-interaction-id>";

const file = createReadStream("sample.mp3", { autoClose: true });
await client.recordings.upload(file, INTERACTION_ID);

// Replace these with your values
const string INTERACTION_ID = "<your-interaction-id>";

using var file = File.OpenRead("sample.mp3");
var recording = await client.Recordings.UploadAsync(INTERACTION_ID, file);

import requests

# Replace these with your values
ENVIRONMENT = "<eu-or-us>"
INTERACTION_ID = "<your-interaction-id>"
TENANT = "<your-tenant-name>"
TOKEN = "<your-access-token>"

with open("sample.mp3", "rb") as f:
    response = requests.post(
        f"https://api.{ENVIRONMENT}.corti.app/v2/interactions/{INTERACTION_ID}/recordings/",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Tenant-Name": TENANT,
            "Content-Type": "application/octet-stream",
        },
        data=f,
    )
response.raise_for_status()
recording_id = response.json()["recordingId"]

# Replace these with your values
ENVIRONMENT="<eu-or-us>"
INTERACTION_ID="<your-interaction-id>"
TENANT="<your-tenant-name>"
TOKEN="<your-access-token>"

curl -X POST "https://api.${ENVIRONMENT}.corti.app/v2/interactions/${INTERACTION_ID}/recordings/" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Tenant-Name: ${TENANT}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@sample.mp3"
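Before uploading, a client may want to check its files against the 150 MB size limit mentioned on this page. The sketch below is one way to do that in Python. The helper name check_upload_size is hypothetical, and two assumptions are made: that "MB" means 1,000,000 bytes, and that the limit applies to the sum of all files for an interaction.

```python
import os
from typing import Iterable, Tuple

# Assumption: 150 MB is interpreted as decimal megabytes (150 * 10^6 bytes).
MAX_TOTAL_BYTES = 150 * 1000 * 1000


def check_upload_size(
    paths: Iterable[str], max_total_bytes: int = MAX_TOTAL_BYTES
) -> Tuple[int, bool]:
    """Return (total size in bytes, whether the total is within the limit)."""
    total = sum(os.path.getsize(path) for path in paths)
    return total, total <= max_total_bytes
```

Duration is not checked here, because reading it requires decoding the audio container.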

Create the transcript: POST:/interactions/{id}/transcripts/

Each interaction may have more than one audio file and transcript associated with it. Audio files up to 120 minutes in total duration, or 150 MB in total size, are supported.

const { recordingId } = await uploadRecording(client, interactionId, "sample.mp3");

const transcript = await client.transcripts.create(interactionId, {
    recordingId: recordingId,
    primaryLanguage: "en"
});

var recording = await UploadRecordingAsync(client, interactionId, "sample.mp3");

var transcript = await client.Transcripts.CreateAsync(
    interactionId,
    new TranscriptsCreateRequest
    {
        RecordingId = recording.RecordingId,
        PrimaryLanguage = "en",
    }
);

import requests

# Replace these with your values
ENVIRONMENT = "<eu-or-us>"
INTERACTION_ID = "<your-interaction-id>"
RECORDING_ID = "<your-recording-id>"
TENANT = "<your-tenant-name>"
TOKEN = "<your-access-token>"

response = requests.post(
    f"https://api.{ENVIRONMENT}.corti.app/v2/interactions/{INTERACTION_ID}/transcripts/",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Tenant-Name": TENANT,
        "Content-Type": "application/json",
    },
    json={
        "recordingId": RECORDING_ID,
        "primaryLanguage": "en",
    },
)
response.raise_for_status()
transcript = response.json()

# Replace these with your values
ENVIRONMENT="<eu-or-us>"
INTERACTION_ID="<your-interaction-id>"
RECORDING_ID="<your-recording-id>"
TENANT="<your-tenant-name>"
TOKEN="<your-access-token>"

curl -X POST "https://api.${ENVIRONMENT}.corti.app/v2/interactions/${INTERACTION_ID}/transcripts/" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Tenant-Name: ${TENANT}" \
  -H "Content-Type: application/json" \
  -d '{
    "recordingId": "'"${RECORDING_ID}"'",
    "primaryLanguage": "en"
  }'

Receive the transcript:

First, the transcript is processed synchronously for a maximum of 25 seconds.
If transcription takes longer than the 25-second synchronous timeout, processing continues asynchronously.
- In this scenario, an empty transcript is returned with a location header that can be used to retrieve the final transcript via the transcriptId.
- The client can poll the transcript status endpoint (GET /interactions/{id}/transcripts/{transcriptId}/status) for the transcript status (processing, completed, failed).

Use the List Transcripts endpoint to view all transcripts associated with an interaction; completed transcripts can be retrieved via the Get Transcript endpoint.
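The polling step described above can be sketched as a small loop that stops once the status is completed or failed. This is an illustrative sketch only: wait_for_transcript and its parameters are hypothetical names, the interval and attempt count are arbitrary defaults, and the status fetcher is injected so the loop stays independent of any HTTP client.

```python
import time
from typing import Callable

# Terminal statuses as listed on this page.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_transcript(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"transcript still processing after {max_attempts} attempts")
```

In practice, fetch_status could wrap a GET to /interactions/{id}/transcripts/{transcriptId}/status and return the status value from the JSON response; the exact response shape is an assumption and should be checked against the API specification.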

Features


Languages

Corti speech to text is specifically designed for use in the healthcare domain. A tier system categorizes the functionality and performance available per language and endpoint. Languages in the Enhanced and Premier tiers offer the most complete functionality and the highest recognition accuracy, and are the ones recommended for dictation use.

Audio Configuration

With support for mono or multi-channel audio, live transcoding, and a variety of file formats to choose from, don’t let the complexities of audio capture and processing inhibit opportunities for real-time intelligence. Read more about our recommendations and best practices.

Punctuation

Punctuation is essential for coherent documentation. Setting the spokenPunctuation parameter to true in /transcripts requests enables verbalized punctuation (“period” or “comma”) to be converted to symbols (“.” or “,”). The automaticPunctuation parameter is enabled by default.
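As an illustration, a create-transcript request body with spoken punctuation enabled might be assembled like this. The helper build_transcript_request is hypothetical, and placing spokenPunctuation as a top-level field next to recordingId and primaryLanguage is an assumption; consult the Create transcript API specification for the actual schema.

```python
import json
from typing import Any, Dict


def build_transcript_request(
    recording_id: str, primary_language: str, spoken_punctuation: bool = False
) -> Dict[str, Any]:
    """Build a JSON-serializable body for POST /interactions/{id}/transcripts/."""
    body: Dict[str, Any] = {
        "recordingId": recording_id,
        "primaryLanguage": primary_language,
    }
    if spoken_punctuation:
        # Assumption: the parameter is a top-level boolean in the request body.
        body["spokenPunctuation"] = True
    return body


# Example payload as it would be sent on the wire.
payload = json.dumps(build_transcript_request("<your-recording-id>", "en", True))
```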

Diarization

Diarization is the process of segmenting an audio recording by speaker, assigning portions of speech to distinct identities (e.g., “Doctor,” “Patient”). This enables accurate transcription, attribution, and analysis of multi-speaker clinical conversations, but is not required for effective AI scribing or workflow speech-enablement.

Formatting

Speech to text can be used to create a verbatim transcript of the audio; however, some content is not documented in the same manner as it is verbalized. The formatting features provide control over how key information should be represented in the textual output.
Note: Server defaults are applied; configuration of formatting preferences is not currently exposed through this endpoint as it is with /transcribe.

Replacements

Define words or phrases that should be returned in place of the standard output of the speech-to-text model.

Keyterms

Bias speech-to-text output so that new words can be introduced to the system vocabulary (e.g., surnames) or to improve recognition reliability for homophones and words with ambiguous pronunciation.

Please contact us for more information or help.
