Audio Processing API

Scalable audio REST API to convert, trim, concatenate, optimize, and compress audio files.

1 Upload your input

To process audio, the input file must be uploaded to your account, or accessible via a URL:

Use the Bytescale Dashboard to upload a file manually.

Use the Upload Widget, Bytescale SDKs or Bytescale API to upload a file programmatically.

Create external HTTP folders to process files hosted externally.

Get the input file's /raw/ URL before continuing.

2 Build your audio URL

To build an audio processing URL:

2a

Get the /raw/ URL for your input file.

2b

Replace /raw/ with /audio/.

2c

Add the querystring parameters documented on this page, e.g.:

https://upcdn.io/W142hJk/audio/example.mp3?br=96

  • /W142hJk/ — Account
  • audio — API
  • /example.mp3 — File Path
  • ?br=96 — Parameters
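The URL-building steps above can be sketched with a small helper (hypothetical code, not part of any Bytescale SDK; `buildAudioUrl` is an assumed name):

```javascript
// Hypothetical helper: converts a file's /raw/ URL into an /audio/ URL
// and appends the given processing parameters as a querystring.
function buildAudioUrl(rawUrl, params) {
  const audioUrl = rawUrl.replace("/raw/", "/audio/");
  const query = Object.entries(params)
    .map(([key, value]) => `${key}=${value}`)
    .join("&");
  return query.length > 0 ? `${audioUrl}?${query}` : audioUrl;
}
```

For example, `buildAudioUrl("https://upcdn.io/W142hJk/raw/example.mp3", { br: 96 })` produces the URL shown above.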

3 Play your audio

Play your audio by navigating to the URL from step 2.

By default, your audio will be encoded to AAC.

The HTTP response will be an HTML webpage with an embedded audio player that's hardcoded to play your audio.

You can change this behaviour — e.g. to return an audio file instead of an audio player — using the parameters documented on this page.

Example #1: Embedding an audio file

To embed audio in a webpage using Video.js:

<!DOCTYPE html>
<html>
  <head>
    <link href="https://unpkg.com/video.js@7/dist/video-js.min.css" rel="stylesheet">
    <script src="https://unpkg.com/video.js@7/dist/video.min.js"></script>
    <style type="text/css">
      .audio-container {
        height: 316px;
        max-width: 600px;
      }
    </style>
  </head>
  <body>
    <div class="audio-container">
      <video-js
        class="vjs-fill vjs-big-play-centered"
        controls
        preload="auto">
        <p class="vjs-no-js">To play this audio please enable JavaScript.</p>
      </video-js>
    </div>
    <script>
      var vid = document.querySelector('video-js');
      var player = videojs(vid, {responsive: true});
      player.on('loadedmetadata', function() {
        // Begin playing from the start of the audio. (Required for 'f=hls-aac-rt'.)
        player.currentTime(player.seekable().start(0));
      });
      player.src({
        src: 'https://upcdn.io/W142hJk/audio/example.mp3!f=hls-aac-rt&br=80&br=256',
        type: 'application/x-mpegURL'
      });
    </script>
  </body>
</html>

Audio encoded using f=hls-aac-rt takes ~10 seconds to play initially and ~100ms on all subsequent requests.

Example #2: Creating MP3 audio

To create an MP3 file:

1

Upload an input file (e.g. an audio or video file) or create an external file source.

2

Replace /raw/ with /audio/ in the file's URL, and then append ?f=mp3 to the URL.

3

Navigate to the URL (i.e. request the URL using a simple GET request).

4

Wait for status: "Succeeded" in the JSON response.

5

The result will contain a URL to the MP3 file:

https://upcdn.io/W142hJk/audio/example.mp3?f=mp3
{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.mp3",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=mp3&a=/audio.mp3"
    }
  }
}

Example #3: Creating AAC audio

To create an AAC file:

1

Upload an input file (e.g. an audio or video file) or create an external file source.

2

Replace /raw/ with /audio/ in the file's URL, and then append ?f=aac to the URL.

3

Navigate to the URL (i.e. request the URL using a simple GET request).

4

Wait for status: "Succeeded" in the JSON response.

5

The result will contain a URL to the AAC file:

https://upcdn.io/W142hJk/audio/example.mp3?f=aac
{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.aac",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=aac&a=/audio.aac"
    }
  }
}

Example #4: Creating WAV audio

To create a WAV file:

1

Upload an input file (e.g. an audio or video file) or create an external file source.

2

Replace /raw/ with /audio/ in the file's URL, and then append ?f=wav-riff to the URL.

3

Navigate to the URL (i.e. request the URL using a simple GET request).

4

Wait for status: "Succeeded" in the JSON response.

5

The result will contain a URL to the WAV file:

https://upcdn.io/W142hJk/audio/example.mp3?f=wav-riff
{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.wav",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=wav-riff&a=/audio.wav"
    }
  }
}

Example #5: Creating HLS audio with multiple bitrates

To create an HTTP Live Streaming (HLS) file:

1

Upload an input file (e.g. an audio or video file) or create an external file source.

2

Replace /raw/ with /audio/ in the file's URL, and then append ?f=hls-aac to the URL.

2a

Add parameters from the Audio Transcoding API or Audio Compression API

2b

You can create adaptive bitrate (ABR) audio by specifying multiple groups of bitrate and/or sample rate parameters. The end-user's audio player will automatically switch to the most appropriate variant during playback. By default, a single 96 kbps variant is produced.

2c

You can specify up to 10 variants. Each variant's parameters must be adjacent on the querystring. For example: br=80&sr=24&br=256&sr=48 specifies 2 variants, whereas br=80&br=256&sr=24&sr=48 specifies 3 variants (which would most likely be a mistake). You can add next=true between groups of parameters to forcefully split them into separate variants.
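The grouping rule above can be sketched as a small function (a hypothetical model of the documented behaviour, not Bytescale's actual parser): a new variant starts whenever a parameter key repeats, or when next=true appears.

```javascript
// Hypothetical model of HLS variant grouping: parameters are grouped
// left-to-right; a repeated key (or next=true) starts a new variant.
function groupVariants(querystring) {
  const groups = [];
  let current = {};
  for (const pair of querystring.split("&")) {
    const [key, value] = pair.split("=");
    if (key === "next") {
      // next=true forcefully splits adjacent parameters into variants.
      if (Object.keys(current).length > 0) {
        groups.push(current);
        current = {};
      }
    } else {
      if (key in current) {
        // Repeated key: the previous variant is complete.
        groups.push(current);
        current = {};
      }
      current[key] = value;
    }
  }
  if (Object.keys(current).length > 0) groups.push(current);
  return groups;
}
```

For example, `groupVariants("br=80&sr=24&br=256&sr=48")` yields two variants, while `groupVariants("br=80&br=256&sr=24&sr=48")` yields three, matching the rule described above.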

3

Navigate to the URL (i.e. request the URL using a simple GET request).

4

Wait for status: "Succeeded" in the JSON response.

5

The result will contain a URL to the HTTP Live Streaming (HLS) file:

https://upcdn.io/W142hJk/audio/example.mp3?f=hls-aac&br=80&br=256
{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.m3u8",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=hls-aac&br=80&br=256&a=/audio.m3u8"
    }
  }
}

Example #6: Creating HLS audio with real-time encoding

Real-time encoding allows Bytescale to return HLS manifests (.m3u8 files) while the audio is still being transcoded.

The benefit of real-time encoding is the ability to play web-optimized audio files within seconds of uploading them, rather than having to wait for audio transcoding jobs to complete.

To create HTTP Live Streaming (HLS) audio with real-time encoding:

1

Complete the steps from creating HLS audio.

2

Replace f=hls-aac with f=hls-aac-rt.

3

The result will be an M3U8 file that's dynamically updated as new segments finish transcoding:

https://upcdn.io/W142hJk/audio/example.mp3?f=hls-aac-rt
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=2038521,AVERAGE-BANDWIDTH=2038521,CODECS="mp4a.40.2"
example.mp3!f=hls-aac-rt&a=/0f/manifest.m3u8

Example #7: Extracting audio metadata

The Audio Metadata API allows you to extract the audio file's duration, codec, and more.

To extract an audio file's duration using JavaScript:

<!DOCTYPE html>
<html>
  <body>
    <p>Please wait, loading audio metadata...</p>
    <script>
      async function getAudioDuration() {
        const response = await fetch("https://upcdn.io/W142hJk/audio/example.mp4?f=meta");
        const jsonData = await response.json();
        const audioTrack = (jsonData.tracks ?? []).find(x => x.type === "Audio");
        if (audioTrack === undefined) {
          alert("Cannot find audio metadata.");
        } else {
          alert(`Duration (seconds): ${audioTrack.duration}`);
        }
      }
      getAudioDuration().then(() => {}, e => alert(`Error: ${e}`));
    </script>
  </body>
</html>

Supported Inputs

The Audio Processing API can transcode audio from video and audio files:

Supported Input Audio

The Audio Processing API can transcode audio from the following audio inputs:

File Extension(s) | Audio Container | Audio Codecs
.wma, .asf | Advanced Systems Format (ASF) | WMA, WMA2, WMA Pro
.fla, .flac | FLAC | FLAC
.mp3 | MPEG-1 Layer 3 | MP3
.ts, .m2ts | MPEG-2 TS | MP2, PCM
.aac, .mp4, .m4a | MPEG-4 | AAC
.mka | Matroska Audio Container | Opus, FLAC
.oga | Ogg | Opus, Vorbis, FLAC
.wav | Waveform Audio File | PCM

Supported Input Videos

The Audio Processing API can transcode audio from the following video inputs:

File Extension(s) | Video Container | Video Codecs
.m2v, .mpeg, .mpg | No Container | AVC (H.264), DV/DVCPRO, HEVC (H.265), MPEG-1, MPEG-2
.3g2 | 3G2 | AVC (H.264), H.263, MPEG-4 part 2
.3gp | 3GP | AVC (H.264), H.263, MPEG-4 part 2
.wmv | Advanced Systems Format (ASF) | VC-1
.flv | Adobe Flash | AVC (H.264), Flash 9 File, H.263
.avi | Audio Video Interleave (AVI) | Uncompressed, Canopus HQ, DivX/Xvid, DV/DVCPRO, MJPEG
.m3u8 | HLS (MPEG-2 TS segments) | AVC (H.264), HEVC (H.265), MPEG-2
.mxf | Interoperable Master Format (IMF) | Apple ProRes, JPEG 2000 (J2K)
.mxf | Material Exchange Format (MXF) | Uncompressed, AVC (H.264), AVC Intra 50/100, Apple ProRes (4444, 4444 XQ, 422, 422 HQ, LT, Proxy), DV/DVCPRO, DV25, DV50, DVCPro HD, JPEG 2000 (J2K), MPEG-2, Panasonic P2, SonyXDCam, SonyXDCam MPEG-4 Proxy, VC-3
.mkv | Matroska | AVC (H.264), MPEG-2, MPEG-4 part 2, PCM, VC-1
.mpg, .mpeg, .m2p, .ps | MPEG Program Streams (MPEG-PS) | MPEG-2
.m2t, .ts, .tsv | MPEG Transport Streams (MPEG-TS) | AVC (H.264), HEVC (H.265), MPEG-2, VC-1
.dat, .m1v, .mpeg, .mpg, .mpv | MPEG-1 System Streams | MPEG-1, MPEG-2
.mp4, .mpeg4 | MPEG-4 | Uncompressed, DivX/Xvid, H.261, H.262, H.263, AVC (H.264), AVC Intra 50/100, HEVC (H.265), JPEG 2000, MPEG-2, MPEG-4 part 2, VC-1
.mov, .qt | QuickTime | Uncompressed, Apple ProRes (4444, 4444 XQ, 422, 422 HQ, LT, Proxy), DV/DVCPRO, DivX/Xvid, H.261, H.262, H.263, AVC (H.264), AVC Intra 50/100, HEVC (H.265), JPEG 2000 (J2K), MJPEG, MPEG-2, MPEG-4 part 2, QuickTime Animation (RLE)
.webm | WebM | VP8, VP9

Audio Metadata API

Use the Audio Metadata API to extract the duration, codec, and other information from an audio file.

Instructions:

  1. Replace raw with audio in your audio URL.

  2. Append ?f=meta to the URL.

  3. The result will be a JSON payload describing the audio's tracks (see below).

Example audio metadata JSON response:

{
  "tracks": [
    {
      "bitRate": 159980,
      "bitRateMode": "VBR",
      "channels": 2,
      "codec": "AAC",
      "codecId": "mp4a-40-2",
      "frameCount": 35875,
      "frameRate": 46.875,
      "samplingRate": 48000,
      "title": "Stereo",
      "type": "Audio"
    }
  ]
}
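The sample payload above has no explicit duration field; under the assumption that frameCount and frameRate (frames per second) describe the whole track, a duration can be derived from their ratio:

```javascript
// Derives a track's duration from the metadata fields shown above,
// assuming frameCount and frameRate (frames per second) span the track.
function trackDurationSeconds(track) {
  return track.frameCount / track.frameRate;
}

// Using the sample payload above: 35875 / 46.875 ≈ 765.33 seconds.
const duration = trackDurationSeconds({ frameCount: 35875, frameRate: 46.875 });
```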

Audio Transcoding API

Use the Audio Transcoding API to transcode your audio to a specific format.

Use the f parameter to change the output format of the audio:

Format | Transcoding | Compression | Browser Support
f=mp3 | async | good | all
f=aac | async | excellent | all
f=wav-riff | async | none | none
f=wav-rf64 | async | none | none
f=hls-aac | async | excellent | requires SDK
f=hls-aac-rt | real-time | excellent | requires SDK

Which output format should I use?

Use f=hls-aac-rt to create web-optimized audio that plays while it's being transcoded.

Omit the f parameter to get a shareable link to your audio (encoded in AAC).

Which audio SDK should I use?

For f=hls-* formats you need to use an audio player SDK that supports HLS (e.g. Video.js). For other formats you can use HTML5's <audio> element.

What is async transcoding?

Asynchronous transcoding means Bytescale will return a JSON response that initially contains "status": "Running". You must poll the URL until "status": "Succeeded" is returned, at which point the JSON response will contain the URL to the encoded audio.
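A minimal polling loop might look like this (a sketch, assuming the JSON response always carries the status field shown in the examples; a production version should also stop on failure statuses):

```javascript
// Returns true once an asynchronous transcoding job has finished.
function isComplete(job) {
  return job.status === "Succeeded";
}

// Minimal sketch: re-request the audio URL until the job succeeds, then
// return the artifact URL from the JSON response. Error handling and
// failure statuses are omitted for brevity.
async function waitForTranscode(audioUrl, intervalMs = 2000) {
  for (;;) {
    const job = await (await fetch(audioUrl)).json();
    if (isComplete(job)) return job.summary.result.artifactUrl;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```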

What is real-time transcoding?

Real-time transcoding means Bytescale will stream the audio to your device while it's being transcoded: instead of receiving a JSON response you will receive an M3U8 response. This allows you to start playing transcoded audio within seconds of uploading it.

f=mp3

Transcodes the audio to MP3 (.mp3).

Transcoding: asynchronous (poll for completion)

Response: JSON (contains the URL to the MP3 file on completion)

f=aac

Transcodes the audio to AAC (.aac).

Transcoding: asynchronous (poll for completion)

Response: JSON (contains the URL to the AAC file on completion)

f=wav-riff

Transcodes the audio to Waveform (.wav) using the RIFF wave format.

Transcoding: asynchronous (poll for completion)

Response: JSON (contains the URL to the WAV file on completion)

f=wav-rf64

Transcodes the audio to Waveform (.wav) using the RF64 wave format (to support output audio larger than 4GB).

Transcoding: asynchronous (poll for completion)

Response: JSON (contains the URL to the WAV file on completion)

f=hls-aac

Transcodes the audio to HLS AAC (.m3u8).

Transcoding: asynchronous (poll for completion)

Response: JSON (contains the URL to the M3U8 file on completion)

Browser support: all browsers (requires an audio player SDK with HLS support, like Video.js)

f=hls-aac-rt

Transcodes the audio to HLS AAC (.m3u8) in real-time.

Transcoding: real-time

During transcoding: there will be an initial ~10 second delay for the first HTTP response. Subsequent HTTP responses will return the M3U8 file with new audio segments appended to it as transcoding progresses (similar to a live audio stream). Generic audio players may hide playback controls and start playing from towards the end of the audio. To overcome this issue, we recommend using an audio player SDK that allows you to force playback from the beginning, and to show the seek bar for live HLS feeds. See the source code returned by f=html-aac for a working example of using Video.js to gracefully play audio files transcoded using f=hls-aac-rt.

Response: M3U8

Browser support: all browsers (requires an audio player SDK with HLS support, like Video.js)

f=html-aac

Returns a webpage with an embedded audio player that's configured to play the requested audio in AAC.

Useful for sharing links to audio files and for previewing/debugging audio transformation parameters.

Transcoding: real-time

Response: HTML

This is the default value.

f=meta

Returns metadata for the audio file (duration, codec, etc.)

See the Audio Metadata API docs for more information.

Response: JSON (audio metadata)

rt=auto

If this flag is present, the audio variant expressed by the adjacent parameters on the querystring (e.g. br=80&rt=true&br=256&rt=auto) will be returned to the user while it's being transcoded only if the transcode rate is faster than the playback rate.

Only supported by f=hls-aac-rt and f=html-aac.

This is the default value.

rt=false

If this flag is present, the audio variant expressed by the adjacent parameters on the querystring (e.g. br=80&rt=true&br=256&rt=false) will never be returned to the user while it's being transcoded.

Use this option as a performance optimization (instead of using rt=auto) when you know the variant will always transcode at a slower rate than its playback rate:

When rt=auto is used, the initial HTTP request for the M3U8 master manifest will block until the first few segments of each rt=auto and rt=true variant have been transcoded, before returning the initial M3U8 playlist.

In general, you want to exclude slow-transcoding HLS variants to reduce this latency.

If none of the HLS variants have rt=true or rt=auto then the fastest variant to transcode will be returned during transcoding.

Only supported by f=hls-aac-rt and f=html-aac.

rt=true

If this flag is present, the audio variant expressed by the adjacent parameters on the querystring (e.g. br=80&rt=true&br=256&rt=auto) will always be returned to the user while it's being transcoded.

Only supported by f=hls-aac-rt and f=html-aac.
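Taken together, the rt flags can be modelled as follows (a hypothetical sketch of the documented selection rules, not Bytescale's implementation; for brevity, the rt=auto transcode-rate check is assumed to have been folded into the flag already):

```javascript
// Models which HLS variants are served while transcoding is in progress:
// variants marked rt=true or rt=auto are served; if none are marked,
// only the fastest-transcoding variant is served.
// Each variant here: { id, rt, transcodeRate } (transcodeRate is a
// relative transcoding speed; all field names are assumptions).
function servedDuringTranscode(variants) {
  const realTime = variants.filter(v => v.rt === "true" || v.rt === "auto");
  if (realTime.length > 0) return realTime;
  // No real-time variants: fall back to the fastest transcoder.
  return [variants.reduce((a, b) => (a.transcodeRate >= b.transcodeRate ? a : b))];
}
```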

Audio Compression API

Use the Audio Compression API to control the file size of your audio.

br=<int>

Sets the output audio bitrate (kbps).

Supported values for f=aac, f=hls-aac, f=hls-aac-rt and f=html-aac:

16, 20, 24, 28, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 288, 320, 384, 448, 512, 576

Supported values for f=mp3:

16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296

Not applicable to f=wav-riff or f=wav-rf64 (Waveform audio files do not have a bitrate).

Default: 96

sr=<number>

Sets the output audio sample rate (kHz).

Supported values for f=aac, f=hls-aac, f=hls-aac-rt and f=html-aac:

8, 12, 16, 22.05, 24, 32, 44.1, 48, 88.2, 96

Supported values for f=mp3:

22.05, 32, 44.1, 48

Supported values for f=wav-riff and f=wav-rf64:

8, 16, 22.05, 24, 32, 44.1, 48, 88.2, 96, 192

Note: the sample rate will be automatically adjusted if the provided value is unsupported by the requested bitrate for the requested audio format (for example, AAC only supports sample rates between 32 kHz and 48 kHz when a bitrate of 96 kbps is used).

Default: 48

Audio Trimming API

Use the Audio Trimming API to remove parts of the audio from the start and/or end.

ts=<number>

Sets the start position of audio, and removes all audio before that point.

If ts exceeds the length of the audio, an error will be returned.

Supports numbers between 0 and 86399, with up to two decimal places. To provide frame accuracy for audio inputs, decimals will be interpreted as frame numbers, not milliseconds.

te=<number>

Sets the end position of audio, and removes all audio after that point.

If te exceeds the length of the audio, no error will be returned, and the parameter effectively does nothing.

Supports numbers between 0 and 86399, with up to two decimal places. To provide frame accuracy for audio inputs, decimals will be interpreted as frame numbers, not milliseconds.

tm=after-repeat

Applies the trim specified by ts and/or te after the rp parameter is applied.

tm=before-repeat

Applies the trim specified by ts and/or te before the rp parameter is applied.

This is the default value.
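The difference between the two modes can be illustrated with a duration calculation (a hypothetical sketch; values are treated as plain seconds, ignoring the frame-number interpretation of decimals):

```javascript
// Computes the output duration (seconds) for a trim (ts/te) combined
// with rp, illustrating tm=before-repeat (default) vs tm=after-repeat.
function trimmedDuration(sourceSeconds, { ts = 0, te, rp = 1, tm = "before-repeat" }) {
  if (tm === "before-repeat") {
    // Trim first, then repeat the trimmed clip.
    const end = Math.min(te ?? sourceSeconds, sourceSeconds);
    return (end - ts) * rp;
  }
  // after-repeat: repeat first, then trim the repeated timeline.
  const total = sourceSeconds * rp;
  const end = Math.min(te ?? total, total);
  return end - ts;
}
```

For a 10-second input with ts=2, te=8 and rp=2, before-repeat yields a 12-second output (6 seconds trimmed, played twice) while after-repeat yields 6 seconds (20-second repeated timeline, trimmed to 2–8 seconds).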

Audio Concatenation API

Use the Audio Concatenation API to append additional audio files to the primary audio file's timeline.

append=<string>

Appends the audio from another media file (video or audio file) to the output.

All media files specified via append are concatenated in the order they are specified, with the primary input audio (specified on the URL's file path) playing first.

To use: specify the "file path" attribute of another media file as the query parameter's value.

rp=<int>

Number of times to play the audio file.

If this parameter appears after an append parameter, then it will repeat the appended audio file only.

If this parameter appears before any append parameters, then it will repeat the primary audio file only.

Default: 1
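The ordering rules for append and rp can be sketched as a duration calculation (hypothetical code; outputDuration is an assumed name, not an SDK function):

```javascript
// Computes the total output duration given the primary file's duration,
// the ordered [key, value] querystring pairs, and a map from appended
// file path to its duration in seconds.
function outputDuration(primarySeconds, params, durations) {
  const segments = [{ seconds: primarySeconds, repeat: 1 }];
  for (const [key, value] of params) {
    if (key === "append") {
      segments.push({ seconds: durations[value], repeat: 1 });
    } else if (key === "rp") {
      // rp repeats the most recently specified file: the primary file if
      // it appears before any append, otherwise the last appended file.
      segments[segments.length - 1].repeat = parseInt(value, 10);
    }
  }
  return segments.reduce((total, s) => total + s.seconds * s.repeat, 0);
}
```

With a 60-second primary file and a 30-second appended file, rp=2&append=/b.mp3 yields 150 seconds (primary twice, then the appended file once), while append=/b.mp3&rp=2 yields 120 seconds (primary once, appended file twice).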

Audio pricing

The Audio Processing API is available on all Bytescale Plans.

Audio price list

Your processing quota (see pricing) is consumed based on the output audio file's duration multiplied by a "processing multiplier", which is determined by the output audio file's codec.

Audio files can be played an unlimited number of times.

Your processing quota will only be deducted once per URL: for the very first request to the URL.

There is a minimum billable duration of 10 seconds per audio file.

Audio billing example:

A 60-second audio file encoded to AAC would consume 45 seconds (60 × 0.75) from your monthly processing quota.

If the audio file is initially played in January 2023, and is then played 100k times for the following 2 years, then you would be billed 45 seconds in January 2023 and 0 seconds in all the following months. (This assumes you never clear your permanent cache).

Codec | Processing Multiplier
AAC | 0.75
MP3 | 0.75
WAV | 1.15

HLS audio pricing

When using f=hls-aac, f=hls-aac-rt or f=html-aac (which uses f=hls-aac-rt internally) your processing quota will be consumed per HLS variant.

When using f=hls-aac-rt each real-time variant (rt=true or rt=auto) will have an additional 10 seconds added to its billable duration.

The default behaviour for HLS outputs is to produce one HLS AAC variant.

You can change this behaviour using the querystring parameters documented on this page.

HLS pricing example:

Given an input audio file of 60 seconds and the querystring ?f=hls-aac-rt&br=64&br=128&br=256&rt=false, you would be billed:

  • 3×60 seconds for 3× HLS variants (br=64&br=128&br=256).

  • 2×10 seconds for 2× HLS variants using real-time encoding.

    • The first two variants on the querystring (br=64&br=128) do not specify rt parameters, so will default to rt=auto.

    • Per the pricing above, real-time variants incur an additional 10 seconds of billable duration.

  • 200 seconds total billed duration: 3×60 + 2×10
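The billing rules above can be sketched as follows (hypothetical calculators based on the documented rules; the 10-second minimum is assumed to apply after the multiplier, and the codec multiplier is omitted from the HLS calculation to match the worked example above):

```javascript
const MIN_BILLABLE_SECONDS = 10;

// Single-file outputs: duration × codec multiplier, minimum 10 seconds.
function billableSeconds(durationSeconds, multiplier) {
  return Math.max(durationSeconds * multiplier, MIN_BILLABLE_SECONDS);
}

// HLS outputs: each variant bills the full duration; real-time variants
// (rt=true or rt=auto) add 10 seconds each.
function hlsBillableSeconds(durationSeconds, variants) {
  return variants.reduce(
    (total, v) => total + durationSeconds + (v.realTime ? 10 : 0), 0);
}
```

For the examples above: a 60-second AAC file bills billableSeconds(60, 0.75) = 45 seconds, and the three-variant HLS example (two real-time variants) bills 200 seconds.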
