Audio Class Stream Data Flow

The Audio Processing module manages playback and record streams using two internal tasks:

Playback task
Record task

These two tasks are the glue between the µC/USB-Device Core and the Audio Peripheral Driver.

From the host's perspective, the lifetime of a stream always consists of:

Opening a stream,
Communicating on this stream,
Closing a stream.

The sections below describe the stream data flows in more detail.

Playback Stream

A playback stream carries audio data over an isochronous OUT endpoint. There is a one-to-one relation between an isochronous OUT endpoint, an AudioStreaming interface and a Terminal. The associated figure presents the audio data flow implemented inside the Audio Processing module. The playback path relies on a ring buffer queue to synchronize the playback task, the core task and the codec ISR.

The playback task supports multiple streams. If the audio function uses several USB OUT Terminal types, each USB OUT Terminal is associated with one AudioStreaming interface structure that the playback task manipulates and updates during stream communication.

OS Tick Rate

If the ring buffer queue is empty when the playback task tries to submit a buffer to the audio peripheral driver, a retry mechanism re-submits the buffer 1 ms later. This delay lets other tasks execute so that a new buffer can become available in the ring buffer queue. The function USBD_Audio_OS_DlyMs() is used for this delay. Whenever possible, the OS tick rate should have a 1 ms granularity. This also helps the scheduling of the audio class tasks, since the audio class works on a 1 ms frame basis.
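The retry behavior described above can be sketched as follows. This is only an illustration: the queue structure, the simulated delay function (standing in for USBD_Audio_OS_DlyMs()) and the retry limit are all hypothetical and are not part of the audio class API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NEVER  UINT32_MAX   /* hypothetical: no buffer will ever arrive */

/* Hypothetical stand-in for the playback ring buffer queue. */
typedef struct {
    uint32_t bufs_ready;       /* buffers currently available            */
    uint32_t now_ms;           /* simulated elapsed time                 */
    uint32_t next_arrival_ms;  /* simulated time at which a buffer lands */
} demo_queue_t;

/* Simulated 1 ms granularity delay; stands in for the OS delay call. */
static void demo_dly_ms(demo_queue_t *q, uint32_t ms)
{
    q->now_ms += ms;
    if (q->next_arrival_ms != DEMO_NEVER && q->now_ms >= q->next_arrival_ms) {
        q->bufs_ready++;                    /* another task produced a buffer */
        q->next_arrival_ms = DEMO_NEVER;
    }
}

/* Try to take a buffer; on an empty queue, wait 1 ms and retry.
 * Returns 1 on success, 0 if max_retries delays were not enough. */
static int demo_playback_buf_get(demo_queue_t *q, uint32_t max_retries)
{
    uint32_t attempt;

    assert(q != NULL);
    for (attempt = 0u; attempt <= max_retries; attempt++) {
        if (q->bufs_ready > 0u) {
            q->bufs_ready--;
            return 1;
        }
        if (attempt < max_retries) {
            demo_dly_ms(q, 1u);             /* let other tasks run */
        }
    }
    return 0;
}
```

With a tick rate coarser than 1 ms, each retry would wait at least one full tick, which is one reason a 1 ms tick granularity is recommended.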

Record Stream

The record task supports multiple streams. If the audio function uses several USB IN Terminal types, each USB IN Terminal is associated with one AudioStreaming interface structure posted in the record task's queue. Thus the record task can handle buffers from different streams.

The record data path takes care of the data rate adjustment. This is required for certain sampling frequencies that do not produce an integer number of audio samples per ms. Partial audio samples are not possible. For those sampling frequencies, the associated table gives the required adjustment. The data rate adjustment is implemented in the isochronous IN transfer completion callback USBD_Audio_RecordIsocCmpl().

For instance, considering a sampling frequency of 44.1 kHz and a mono microphone, the audio class sends the host isochronous transfers of 44 samples each frame. In order to deliver 44 100 samples every second, the audio class sends 45 samples once every 10 frames (that is, every 10 ms). After one second, the host will have received 100 additional samples on top of the 44 000 samples carried by the 44-sample isochronous transfers.
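The arithmetic of the 44.1 kHz example can be checked with a small sketch. The function name is hypothetical, and the running-total method used here is only one way to place the extra sample; it is not necessarily how USBD_Audio_RecordIsocCmpl() performs the adjustment.

```c
#include <assert.h>
#include <stdint.h>

/* Number of samples (per channel) to send in the 1 ms frame frame_ix (0-based).
 * The count is the difference between the running totals after and before the
 * frame, so the fractional part accumulates until it is worth a whole sample. */
static uint32_t demo_samples_in_frame(uint32_t sampling_hz, uint32_t frame_ix)
{
    uint64_t before;
    uint64_t after;

    assert(sampling_hz > 0u);
    before = ((uint64_t)frame_ix * sampling_hz) / 1000u;
    after  = (((uint64_t)frame_ix + 1u) * sampling_hz) / 1000u;
    return (uint32_t)(after - before);
}
```

For 44.1 kHz, this yields 44 samples in most frames and 45 samples in frames 9, 19, 29 and so on, for a total of 44 100 samples over 1000 frames. For 48 kHz, every frame carries 48 samples and no adjustment occurs.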

Stream Correction

Playback Built-In Stream Correction

The built-in playback stream correction is active only when the constant USBD_AUDIO_CFG_PLAYBACK_CORR_EN is set to DEF_ENABLED. As explained in section Playback Stream, the stream correction is evaluated before the playback task provides a ready buffer to the audio peripheral driver. The evaluation relies on monitoring the playback ring buffer queue. Two thresholds are defined: a lower limit and an upper limit, as shown in the associated figure. The figure shows the four indexes used in the ring buffer queue. A buffer difference is computed between the indexes ProducerEnd and ConsumerEnd. For the playback path, ProducerEnd is linked to the USB transfer completion while ConsumerEnd is linked to the audio transfer completion. The buffer difference represents a circular distance between the two indexes. If the distance is less than the lower limit, there is an underrun situation: the USB side does not produce audio samples as fast as the codec consumes them. Conversely, if the distance is greater than the upper limit, there is an overrun situation: the USB side produces audio data faster than the codec can consume it. To keep the codec and USB in sync, a simple algorithm adds an audio sample in case of underrun and removes an audio sample in case of overrun.
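The threshold evaluation can be sketched as below. The type and function names are hypothetical, and the index layout of the real ring buffer queue is not reproduced; only the circular distance and the comparison against the two limits are illustrated.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DEMO_STREAM_SAFE,
    DEMO_STREAM_UNDERRUN,   /* USB side too slow for the codec */
    DEMO_STREAM_OVERRUN     /* USB side too fast for the codec */
} demo_stream_state_t;

/* Circular distance, in buffers, from consumer_end up to producer_end
 * in a ring of nbr_bufs slots. */
static uint32_t demo_ring_distance(uint32_t producer_end,
                                   uint32_t consumer_end,
                                   uint32_t nbr_bufs)
{
    assert(nbr_bufs > 0u);
    assert(producer_end < nbr_bufs && consumer_end < nbr_bufs);
    return (producer_end + nbr_bufs - consumer_end) % nbr_bufs;
}

/* Playback classification: below the lower limit is an underrun,
 * above the upper limit is an overrun. */
static demo_stream_state_t demo_playback_classify(uint32_t dist,
                                                  uint32_t lower_limit,
                                                  uint32_t upper_limit)
{
    if (dist < lower_limit) {
        return DEMO_STREAM_UNDERRUN;
    }
    if (dist > upper_limit) {
        return DEMO_STREAM_OVERRUN;
    }
    return DEMO_STREAM_SAFE;
}
```

Adding nbr_bufs before subtracting keeps the unsigned arithmetic from wrapping when ProducerEnd has already rolled over to the start of the ring while ConsumerEnd has not.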

The frequency at which the playback stream correction is evaluated is configurable via the field CorrPeriodMs of the structure USBD_AUDIO_STREAM_CFG.

The associated figure illustrates the algorithm used to add an audio sample in an underrun situation.


The stream correction supports the signed PCM and unsigned PCM8 formats.

This stream correction is convenient for low-cost audio designs. It gives good results as long as the incoming USB audio sampling frequency is very close to the DAC input clock frequency. However, if the difference between the two frequencies is large, the correction will add audio distortion.

The associated figure illustrates the algorithm used to remove an audio sample in an overrun situation.

The playback stream correction also lets you apply your own correction algorithm. If an underrun or overrun situation is detected, an application callback is called. The associated listing shows an example of the prototype and definition of a playback correction callback provided by the application.

If p_as_alt_cfg->BitRes is equal to 8 bits, the audio data is encoded in PCM8 format (the legacy 8-bit wave format). In this format, audio data is represented as unsigned fixed point. Your correction algorithm must take into account both signed PCM and unsigned PCM8.
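The signed versus unsigned distinction matters as soon as a correction algorithm computes a new sample value. The sketch below assumes a correction that inserts the midpoint of the last two samples of a mono buffer; this interpolation is an assumption made for illustration, not necessarily the built-in algorithm, and all names are hypothetical. It is not the callback prototype expected by the audio class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Midpoint of two signed 16-bit PCM samples (silence is 0). */
static int16_t demo_mid_s16(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a + (int32_t)b) / 2);
}

/* Midpoint of two unsigned PCM8 samples (silence is 0x80, not 0). */
static uint8_t demo_mid_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((uint16_t)a + (uint16_t)b) / 2u);
}

/* Insert one interpolated sample between the last two samples of a mono,
 * signed 16-bit buffer. buf must have room for nbr + 1 samples.
 * Returns the new number of samples. */
static size_t demo_add_sample_s16(int16_t *buf, size_t nbr)
{
    assert(buf != NULL && nbr >= 2u);
    buf[nbr]      = buf[nbr - 1u];                            /* shift last sample */
    buf[nbr - 1u] = demo_mid_s16(buf[nbr - 2u], buf[nbr]);    /* insert midpoint   */
    return nbr + 1u;
}
```

Reading PCM8 data as signed bytes would turn 0x7F and 0x81 into +127 and -127, whose midpoint is 0, far from the correct unsigned midpoint 0x80. This is why the two formats need separate arithmetic.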

Record Built-In Stream Correction

There is also a built-in record stream correction, active only when the constant USBD_AUDIO_CFG_RECORD_CORR_EN is set to DEF_ENABLED. As explained in the section Record Stream, the stream correction is evaluated when an isochronous IN transfer completes and the callback function USBD_Audio_RecordIsocCmpl() is called. The evaluation relies on monitoring the record ring buffer queue. Two thresholds are defined, a lower limit and an upper limit, based on the same principle as for the playback stream correction. For the record path, ProducerEnd is linked to the audio transfer completion while ConsumerEnd is linked to the USB transfer completion. This is the opposite of the playback path. Moreover, the ring buffer queue scheme is common to the playback and record streams. Within the audio class, the definition of the overrun and underrun situations is "USB-centric".

Consequently, if the lower limit is reached, there is an overrun situation: the USB side consumes slightly faster than the codec can produce. Conversely, the upper limit corresponds to an underrun situation: the USB side does not consume audio samples as fast as the codec produces them. As opposed to the playback stream correction, no software algorithm is needed to add or remove an audio sample. The audio class adjusts the audio peripheral hardware by using the number of required record data bytes indicated by USBD_Audio_RecordBufGet(). The correction is done implicitly by the audio peripheral hardware, which directly gets the right number of audio samples (-1 sample frame or +1 sample frame) to accommodate the overrun or underrun situation.
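The byte count requested from the audio peripheral can be illustrated with the sketch below. The names are hypothetical and the function is not USBD_Audio_RecordBufGet(). The mapping of overrun to -1 sample frame and underrun to +1 sample frame follows the order in which the text lists them and should be read as an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DEMO_REC_SAFE,
    DEMO_REC_OVERRUN,    /* USB consumes faster than the codec produces */
    DEMO_REC_UNDERRUN    /* USB consumes slower than the codec produces */
} demo_rec_state_t;

/* Number of record data bytes to request from the codec for one buffer.
 * A sample frame is one sample for every channel. */
static uint32_t demo_record_req_len(uint32_t         samples_per_frame,
                                    uint32_t         nbr_ch,
                                    uint32_t         bytes_per_sample,
                                    demo_rec_state_t state)
{
    uint32_t sample_frame_len = nbr_ch * bytes_per_sample;
    uint32_t nominal_len      = samples_per_frame * sample_frame_len;

    assert(samples_per_frame > 0u && sample_frame_len > 0u);
    switch (state) {
        case DEMO_REC_OVERRUN:
            return nominal_len - sample_frame_len;   /* assumed: -1 sample frame */
        case DEMO_REC_UNDERRUN:
            return nominal_len + sample_frame_len;   /* assumed: +1 sample frame */
        default:
            return nominal_len;
    }
}
```

For a 48 kHz, stereo, 16-bit stream, the nominal request is 192 bytes, and a correction changes it by one 4-byte sample frame to 188 or 196 bytes.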

The frequency at which the record stream correction is evaluated is configurable via the field CorrPeriodMs of the structure USBD_AUDIO_STREAM_CFG.

Playback Feedback Correction

The feedback correction (refer to section Feedback Endpoint for an overview of feedback) takes place when the configuration constant USBD_AUDIO_CFG_PLAYBACK_FEEDBACK_EN is set to DEF_ENABLED and the AudioStreaming interface uses an isochronous OUT endpoint with asynchronous synchronization. As explained in section Playback Stream, the stream correction is evaluated in the function USBD_Audio_PlaybackCorrSynch() before the playback task provides a ready buffer to the audio peripheral driver.

The feedback value evaluation relies on monitoring the playback ring buffer queue. Based on the same principle as the playback built-in correction, the buffer difference between the indexes ProducerEnd and ConsumerEnd is computed and reflects the relative rates at which the USB and codec clocks operate. The feedback monitoring starts only when the playback stream priming is done, that is, when the audio class calls the audio peripheral driver function USBD_Audio_DrvStreamStart. Once the feedback monitoring has started, the underrun or overrun situation requiring a feedback value to be sent to the host is evaluated using the method shown in the associated figure. The underrun situation occurs when the USB side is slower than the codec. In that case, depending on how fast the codec is, the underrun situation can be light or heavy. The processing adjusts the feedback value by telling the host to add up to one sample per frame, depending on the degree of underrun. Similarly, the overrun situation occurs when the USB side is faster than the codec. In that case, depending on how slow the codec is, the overrun situation can be light or heavy. The processing adjusts the feedback value by telling the host to remove up to one sample per frame, depending on the degree of overrun.

When coming from the safe zone, a light underrun or overrun is corrected with a feedback value that takes into account the variation in the number of buffers over a certain number of elapsed frames. This corrects the stream deviation smoothly instead of overshooting the correction. The feedback value adjustment lies between a minimum adjustment and a maximum adjustment:

Underrun situation: +1/2048 sample < adjustment < +1 sample
Overrun situation: -1 sample < adjustment < -1/2048 sample
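These bounds can be expressed in fixed point with 14 fractional bits, where 1 sample is 16384 and 1/2048 sample is 8. The sketch below limits a computed adjustment to that range. The names are hypothetical and, as a simplification, the bounds are treated as inclusive.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FRAC_BITS  14                       /* fractional bits            */
#define DEMO_ADJ_MAX    (1 << DEMO_FRAC_BITS)    /* 1 sample      = 16384      */
#define DEMO_ADJ_MIN    (DEMO_ADJ_MAX / 2048)    /* 1/2048 sample = 8          */

/* Limit the magnitude of a signed adjustment to [1/2048, 1] sample.
 * Positive values ask the host for more samples (underrun), negative
 * values for fewer (overrun). Zero means no correction and is kept as is. */
static int32_t demo_feedback_adj_clamp(int32_t adj)
{
    int32_t mag;

    assert(adj > INT32_MIN);             /* negation below must not overflow */
    mag = (adj < 0) ? -adj : adj;
    if (mag == 0) {
        return 0;
    }
    if (mag < DEMO_ADJ_MIN) {
        mag = DEMO_ADJ_MIN;
    }
    if (mag > DEMO_ADJ_MAX) {
        mag = DEMO_ADJ_MAX;
    }
    return (adj < 0) ? -mag : mag;
}
```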

The first feedback value sent by the device is always the nominal number of samples per frame corresponding to the sampling frequency. For instance, if the sampling frequency is 48.0 kHz, the nominal feedback value sent to the host will be 48 samples per frame.

The feedback value update to the host is evaluated every refresh period. The refresh period is configurable via the field CorrPeriodMs of the structure USBD_AUDIO_STREAM_CFG. When the refresh period is reached, if there is a correction to apply, the feedback value update is sent to the host by calling the function USBD_IsocTxAsync(). If there is no correction necessary, the audio class does not prepare an isochronous IN transfer. Thus when the host sends an IN token, a zero-length packet is sent by the device. The host interprets this zero-length packet as "continue to apply the previous valid feedback value". The feedback value is sent in 10.14 format.
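As an illustration of the 10.14 format, the sketch below computes the nominal feedback value for a sampling frequency and packs it into 3 bytes, least significant byte first. The function names are hypothetical, and the 3-byte little-endian layout reflects a reading of the Audio 1.0 specification for full-speed feedback endpoints rather than anything stated in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal feedback value in 10.14 fixed point: samples per 1 ms frame,
 * with 10 integer bits and 14 fractional bits. */
static uint32_t demo_feedback_nominal(uint32_t sampling_hz)
{
    assert(sampling_hz < 1024u * 1000u);   /* integer part must fit in 10 bits */
    return (uint32_t)(((uint64_t)sampling_hz << 14) / 1000u);
}

/* Pack a 10.14 value into 3 bytes, least significant byte first (assumed layout). */
static void demo_feedback_pack(uint32_t fb, uint8_t out[3])
{
    out[0] = (uint8_t)( fb         & 0xFFu);
    out[1] = (uint8_t)((fb >> 8)   & 0xFFu);
    out[2] = (uint8_t)((fb >> 16)  & 0xFFu);
}
```

At 48 kHz the nominal value is 48 samples per frame, that is 48 x 16384 = 0x0C0000. At 44.1 kHz it is 44.1 x 16384, truncated to 0x0B0666.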

The Audio 1.0 specification indicates that the feedback refresh period, bRefresh, is expressed as a power-of-2 exponent that can range from 1 (2 ms) to 9 (512 ms). The possible refresh periods are therefore 2, 4, 8, 16, 32, 64, 128, 256 and 512 ms. A short bRefresh period results in tighter control of the stream data rate. A long bRefresh period may add some latency to the control of the stream data rate. Refresh periods such as 256 and 512 ms should be avoided as they can degrade the data rate control. For instance, if bRefresh is 512 ms and the USB and codec clocks diverge quickly, updating the feedback value every 512 ms may not be fast enough to re-synchronize the USB and codec clocks.
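The relation between the bRefresh exponent and the refresh period in milliseconds can be sketched as follows. The helper names are hypothetical and are not part of the audio class API.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_REFRESH_EXP_MIN  1u   /* 2^1 = 2 ms   */
#define DEMO_REFRESH_EXP_MAX  9u   /* 2^9 = 512 ms */

/* Refresh period in ms for a given bRefresh exponent. */
static uint32_t demo_refresh_period_ms(uint8_t b_refresh)
{
    assert(b_refresh >= DEMO_REFRESH_EXP_MIN && b_refresh <= DEMO_REFRESH_EXP_MAX);
    return (uint32_t)1u << b_refresh;
}

/* Exponent for a given period in ms, or 0 if the period is not a
 * power of 2 between 2 ms and 512 ms. */
static uint8_t demo_refresh_exp_from_ms(uint32_t period_ms)
{
    uint8_t e;

    for (e = DEMO_REFRESH_EXP_MIN; e <= DEMO_REFRESH_EXP_MAX; e++) {
        if (((uint32_t)1u << e) == period_ms) {
            return e;
        }
    }
    return 0u;
}
```

A check of this kind can be used to reject a configured period, such as 3 ms or 1024 ms, that cannot be represented as a bRefresh value.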