Operations

GET /attachments/{id}/content

Summary Get the raw data for the attachment with the given id
URL /api/v1/attachments/{id}/content
Detailed Description

Download the contents of an attachment. Use this method to download either finished output file(s) (transcripts or captions, depending on the type of order you placed), or the source file(s) for an order.

This endpoint will retrieve the latest version of the attachment, including edits, after all editor sessions for the attachment have been closed for at least two minutes.

For output file attachments, you may request to get the contents in a specific representation (file format), either via the Accept HTTP header or by appending an extension to the end of the URL.
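The two ways of selecting a representation can be sketched as follows. This is a minimal illustration, not an official client: the host in `BASE_URL` is a placeholder, `content_url` and `content_request` are hypothetical helper names, and the value of the Authorization header is passed in as-is because its exact format is not described in this section. The request is only prepared, not sent.

```python
import urllib.request

# Placeholder host (an assumption); substitute the real API host.
BASE_URL = "https://api.example.com/api/v1"


def content_url(attachment_id, extension=None):
    """Build the content URL, optionally appending an extension such as '.txt'."""
    url = f"{BASE_URL}/attachments/{attachment_id}/content"
    return url + extension if extension else url


def content_request(attachment_id, auth_value, media_type=None, extension=None):
    """Prepare (but do not send) a GET request for an attachment's content."""
    headers = {"Authorization": auth_value}
    if media_type:
        # Select the representation through the Accept header.
        headers["Accept"] = media_type
    return urllib.request.Request(
        content_url(attachment_id, extension), headers=headers, method="GET"
    )


# Two ways to ask for the plain text representation of the same attachment:
by_header = content_request("abc123", "<api keys>", media_type="text/plain")
by_extension = content_request("abc123", "<api keys>", extension=".txt")
```

The Accept header is the only option for representations that have no extension, such as `text/plain; format=youtube-transcript`.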

If you requested timestamps with your order, you can retrieve either in-line interval timestamps (e.g. every 30 seconds) or per-word timestamps with your transcript. Per-word timestamps are only available with the JSON format; all other output formats use interval timestamps.

Burned-in caption video attachments (attachment kind = burnedin) are special. The Accept header is ignored, and they are returned as-is with no conversions, and always with a video/mp4 content type. They also do not include any customer edits (if you wish, you may generate a burned-in video with customer edits reflected from within the customer caption editor).

The following representations are supported for human transcripts and AI transcripts:

Media Type | Extension | Description
application/vnd.openxmlformats-officedocument.wordprocessingml.document | .docx | Office Open XML (format used by Microsoft Word 2007+)
application/pdf | .pdf | Adobe PDF
text/plain | .txt | Plain text, with control characters removed
text/plain; format=youtube-transcript | N/A (must use Accept header) | Plain text, using YouTube's transcript format
application/json+rev-transcript | .json | JSON output. See example JSON output below.
application/x-subrip (AI transcription only) | .srt | See caption formats table below.
text/vtt (AI transcription only) | .vtt | See caption formats table below.

Example JSON output:

{
    "speakers": [
        {"id": 1, "name": "Interviewee"},
        {"id": 2, "name": "Interviewer"}
    ],
    /* 
        Monologues (transcribed speech). 
        Each monologue corresponds to a run of text from one speaker.
    */
    "monologues": [
        {
        "id": 1,
        "speaker": 1, /* Speaker id, same as declared above */
        "speaker_name": "Interviewee", /* speaker name, same as declared above */
        /* 
            Elements of the monologue. 
            Each element is either text (transcribed speech) 
            or a tag (an annotation), such as inaudible.

            Text elements may or may not contain a timestamp.
            Tag elements require a timestamp.
        */
        "elements": [
            {
                /* Text element, no timestamp */
                "type": "text",
                /* for text elements, value contains the text */
                "value": "I've never been here before" 
            },
            {
                /* Tag element */
                "type": "tag",
                /* for tag elements, value contains the annotation type */
                "value": "inaudible",
                /* 
                    timestamp format is hh:mm:ss,fff, 
                    where fff represents milliseconds 
                */
                "timestamp": "00:00:07,000" 
            },
            /* 
                Text elements, with per word timestamps 
                Whitespace and punctuation are not timestamped
            */
            {
                "type": "text",
                "value": " "
            },
            {
                "type": "text",
                "value": "so",
                "timestamp": "00:00:20,000",
                "end_timestamp": "00:00:20,138"
            },
            {
                "type": "text",
                "value": " "
            },
            {
                "type": "text",
                "value": "this",
                "timestamp": "00:00:20,222",
                "end_timestamp": "00:00:20,566"
            }
        ]
        }
    ]
}
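
As a rough illustration of consuming this structure, the sketch below joins a monologue's elements into readable text and collects per-word timings. The helper names (`parse_timestamp`, `monologue_text`, `timed_words`) are hypothetical, rendering tags in square brackets is a display choice made here, and the sample data is a trimmed copy of the example above without the comments (which would not parse as strict JSON).

```python
def parse_timestamp(value):
    """Convert an 'hh:mm:ss,fff' timestamp to whole milliseconds."""
    clock, millis = value.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) * 1000 + int(millis)


def monologue_text(monologue):
    """Join element values in order; tags are shown as [tag]."""
    parts = []
    for element in monologue["elements"]:
        if element["type"] == "tag":
            parts.append(f"[{element['value']}]")
        else:
            parts.append(element["value"])
    return "".join(parts)


def timed_words(monologue):
    """Return (word, start_ms, end_ms) for text elements carrying both timestamps."""
    return [
        (e["value"], parse_timestamp(e["timestamp"]), parse_timestamp(e["end_timestamp"]))
        for e in monologue["elements"]
        if e["type"] == "text" and "timestamp" in e and "end_timestamp" in e
    ]


sample = {
    "id": 1,
    "speaker": 1,
    "elements": [
        {"type": "text", "value": "I've never been here before"},
        {"type": "tag", "value": "inaudible", "timestamp": "00:00:07,000"},
        {"type": "text", "value": " "},
        {"type": "text", "value": "so",
         "timestamp": "00:00:20,000", "end_timestamp": "00:00:20,138"},
        {"type": "text", "value": " "},
        {"type": "text", "value": "this",
         "timestamp": "00:00:20,222", "end_timestamp": "00:00:20,566"},
    ],
}

words = timed_words(sample)  # [("so", 20000, 20138), ("this", 20222, 20566)]
```

Working in integer milliseconds avoids floating-point rounding when comparing or subtracting timestamps.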
                    

The following representations are supported for captions:

Media Type | Extension | Description
application/x-subrip | .srt | SubRip, a simple subtitle text format first used by the SubRip program.
text/x-scc | .scc | Scenarist Closed Caption, an older but popular format, used by many video authoring tools such as Adobe Premiere.
text/x-mcc | .mcc | The MacCaption (.mcc) format is used for high-definition broadcast TV.
application/ttml+xml | .ttml | Timed Text Markup Language, a W3C standard for specifying timed text to be displayed along with video.
application/x-quicktime-timedtext | .qt.txt | QuickTime Timed Text, a proprietary text-based format for specifying timed text supported by Apple's QuickTime authoring software.
text/x-rev-transcript | .txt | Plain-text format
text/vtt | .vtt | The WebVTT (Web Video Text Tracks) format is intended for marking up external text track resources. The main use for WebVTT files is captioning or subtitling video content.
application/ttaf+xml | .dfxp | The Distribution Format Exchange Profile is intended for transcoding or exchanging timed text information among legacy distribution content formats presently in use for subtitling and captioning.
application/x-cheetah-cap | .cap | File format for Cheetah Captivator Offline
text/x-stl | .stl | Spruce Subtitle File, a simple subtitle text format, originally used by DVD Studio Pro.
text/vnd.avid-ds | .txt | Avid DS Subtitle File, a captions file format for Avid DS software
text/vnd.avid-dvd | .txt | Avid DVD Subtitle File, a captions file format for Avid DVD software
text/x-facebook-srt | .srt | Standard SubRip file with Facebook file naming conventions.
text/x-ebu-stl | .ebu.stl | EBU Subtitle File, a standardized data file format for subtitles, popular in Europe.

If the requested representation is not supported for the output file, then a 406 Not Acceptable status code will be returned. With the Accept header, you may specify multiple acceptable representations (as described in the Accept header HTTP spec), and the best possible match will be returned.
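
A list of acceptable representations can be written with quality values (`q`), as defined by the HTTP specification for the Accept header. The sketch below only formats such a header; `accept_header` is a hypothetical helper, and how the service weighs `q` values when choosing the best match is an assumption rather than something stated in this section.

```python
def accept_header(preferences):
    """Format (media_type, q) pairs as an Accept header value, most preferred first."""
    ordered = sorted(preferences, key=lambda pair: pair[1], reverse=True)
    return ", ".join(
        # q=1 is the default in HTTP, so it is omitted.
        media_type if q >= 1 else f"{media_type};q={q:g}"
        for media_type, q in ordered
    )


# Prefer SubRip, then WebVTT, then Scenarist Closed Caption.
header = accept_header([
    ("text/vtt", 0.8),
    ("application/x-subrip", 1.0),
    ("text/x-scc", 0.5),
])
# header == "application/x-subrip, text/vtt;q=0.8, text/x-scc;q=0.5"
```

If none of the listed media types is available for the output file, the request still ends in 406 Not Acceptable, so callers should be prepared to handle that status.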

If you do not specify a desired representation via the Accept header or a URL extension, then the output file contents will be returned in their default format. For transcripts, this is JSON. For captions, this is a SubRip (.srt) file.
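
When no representation is requested, the returned format is only known from the response. One way to pick a local file name is to map the Content-Type response header back to an extension, as in this sketch. The mapping covers a few media types from the tables above plus `video/mp4` for burned-in videos; `filename_for` and the `.bin` fallback are hypothetical, and it is assumed that the Content-Type value matches the media types listed in this document.

```python
# Partial mapping, taken from the representation tables in this document.
EXTENSIONS = {
    "application/json+rev-transcript": ".json",
    "application/x-subrip": ".srt",
    "text/vtt": ".vtt",
    "text/plain": ".txt",
    "application/pdf": ".pdf",
    "video/mp4": ".mp4",
}


def filename_for(attachment_id, content_type, fallback=".bin"):
    """Pick a file name from the attachment id and the Content-Type header."""
    # Drop parameters such as '; charset=utf-8' before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return attachment_id + EXTENSIONS.get(media_type, fallback)


transcript_name = filename_for("abc123", "application/json+rev-transcript")
caption_name = filename_for("def456", "application/x-subrip")
```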

For source media attachments, we do not currently support multiple representations. They will always be returned in the original format in which they were uploaded. If you specify an Accept header or extension that does not match the original format, a 406 Not Acceptable status code will be returned.

For transcripts, if you request an attachment in the plain text representation (using "Accept: text/plain" header or ".txt" URL extension), you can also request a specific character encoding using the Accept-Charset HTTP header. The following encodings are supported:

Character Set Encoding | Description
utf-8 | UTF-8 encoding. The byte order mark is not included.
utf-16 | UTF-16 encoding. Note that this results in a binary file; it is unlikely this is what you want.
iso-8859-1 | Commonly referred to as Western European (ISO), this consists of the ASCII characters, a set of control characters in the hex 80-9F range, plus a "Latin Extended" character block with common symbols and diacritized characters in the hex A0-FF range. Characters outside the 00-FF range will be mapped to their nearest equivalents or replaced by "?". Full details: https://en.wikipedia.org/wiki/ISO/IEC_8859-1
windows-1252 | Windows code page 1252, or Western European (Windows). Similar to iso-8859-1, but uses the hex 80-9F range for displayable characters such as smart quotes, rather than control characters. Full details: https://en.wikipedia.org/wiki/Windows-1252
us-ascii | US ASCII. This consists purely of ASCII characters in the hex 00-7F range, which includes numbers, the letters a-z and A-Z, punctuation, and some control characters. Characters outside the 00-7F range will generally be omitted, except that diacritized characters are replaced with their basic Latin equivalents (e.g. À is replaced with A). Full details: https://en.wikipedia.org/wiki/ASCII

If you do not specify a desired character encoding via the Accept-Charset header, then the UTF-8 encoding will be used. If something other than the plain text representation is requested, then the Accept-Charset header is ignored.
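
The following sketch shows one way to request a plain text transcript in a specific encoding and decode the returned bytes. `plain_text_headers` and `decode_body` are hypothetical helpers. Decoding with the charset that was requested (rather than reading it from the response) is a simplifying assumption. The five charset names happen to be valid Python codec names, which is what makes the direct `decode` call work.

```python
SUPPORTED_CHARSETS = ("utf-8", "utf-16", "iso-8859-1", "windows-1252", "us-ascii")


def plain_text_headers(auth_value, charset=None):
    """Build request headers for the plain text representation."""
    if charset is not None and charset not in SUPPORTED_CHARSETS:
        raise ValueError(f"unsupported charset: {charset}")
    headers = {"Authorization": auth_value, "Accept": "text/plain"}
    if charset:
        headers["Accept-Charset"] = charset
    return headers


def decode_body(body, charset=None):
    """Decode response bytes; UTF-8 is the documented default."""
    return body.decode(charset or "utf-8")


headers = plain_text_headers("<api keys>", charset="windows-1252")
# In windows-1252, bytes 0x93 and 0x94 are the smart quotes mentioned above.
text = decode_body(b"\x93hi\x94", charset="windows-1252")
```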

Request Parameters
  • id - id of the attachment to retrieve
Request Headers
  • Authorization - contains client/user API keys
  • Accept - optional, specifies a desired format for the attachment as a media type.
  • Accept-Charset - optional, specifies a desired character set encoding for the attachment, if a plain text representation is requested.
Response
  • On success, 200 OK
  • If the attachment with the given id is not found, 404 Not Found
  • If none of the requested representations are available for the attachment, 406 Not Acceptable
Response Headers
  • Content-Type - the content type of the attachment
Response Body On success, the raw data for the attachment’s contents in the requested format
Error Codes None
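
The status codes above could be handled along these lines. This is a sketch under stated assumptions, not a reference client: the exception classes and `fetch_content` are hypothetical names, and the `opener` parameter exists only so the function can be exercised without network access. By default it uses `urllib.request.urlopen`, which raises `HTTPError` for 4xx responses.

```python
import urllib.error
import urllib.request


class AttachmentNotFound(Exception):
    """Raised for 404 Not Found."""


class RepresentationNotAvailable(Exception):
    """Raised for 406 Not Acceptable."""


def fetch_content(request, opener=urllib.request.urlopen):
    """Send a prepared request; return (content_type, body) on 200 OK."""
    try:
        with opener(request) as response:
            return response.headers.get("Content-Type"), response.read()
    except urllib.error.HTTPError as error:
        if error.code == 404:
            raise AttachmentNotFound(request.full_url) from error
        if error.code == 406:
            raise RepresentationNotAvailable(request.get_header("Accept")) from error
        raise  # Any other HTTP error is left to the caller.
```

A caller that receives `RepresentationNotAvailable` might retry without an Accept header to get the default format.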