Audio Class Stream Data Flow

The Audio Processing module manages playback and record streams using two internal tasks:

  • Playback task

  • Record task

These two tasks are the glue between the µC/USB-Device Core and the Audio Peripheral Driver.

From the host's perspective, a stream's lifetime always consists of:

  1. Opening a stream,

  2. Communicating on this stream,

  3. Closing a stream.

The sections below describe the stream data flow in more detail.

Playback Stream

A playback stream carries audio data over an isochronous OUT endpoint. There is a one-to-one relation between an isochronous OUT endpoint, an AudioStreaming interface and a Terminal. Figure - Playback Stream Dataflow presents the audio data flow implemented inside the Audio Processing module. The playback path relies on a ring buffer queue to synchronize the playback task, the core task and the codec ISR.

Figure - Playback Stream Dataflow

(1) The host activates the AudioStreaming interface #X by selecting the operational interface (request SET_INTERFACE sent for alternate setting 1). This marks the opening of the playback stream. The core task will call the function USBD_Audio_AS_IF_Start(). The first isochronous OUT transfer is submitted to the USB device driver to prime the stream. An empty audio buffer is taken from the ring buffer queue.

(2) The host then sets the sampling frequency for a certain isochronous OUT endpoint by sending a SET_CUR request. The function USBD_Audio_DrvAS_SamplingFreqManage() (not indicated in the figure) is called from the core task's context. This function is implemented by the audio peripheral driver and will set a DAC (Digital-to-Analog Converter) clock in the codec.

(3) The USB Device Controller fills the buffer with the isochronous audio data that have been sent by the host. The buffer is retrieved by the core task. As soon as one isochronous transfer is completed, the core task will call the callback USBD_Audio_PlaybackIsocCmpl() passed as a parameter of USBD_IsocRxAsync() . This callback notifies the audio class that a buffer with audio samples is ready for the audio codec.

(4) The received buffer is then added to the ring buffer queue.

(5) The core task will submit all the buffers it can to the USB device driver to feed the stream communication by calling USBD_IsocRxAsync()  several times.

(5a) Once a certain number of buffers (the pre-buffering threshold) have been accumulated, the playback stream is started on the codec side by calling the function StreamStart(). The pre-buffering threshold is always equal to (MaxBufNbr / 2). The field MaxBufNbr is part of the structure USBD_AUDIO_STREAM_CFG. Within Drv_API_Ptr->PlaybackStart(), the audio peripheral driver should signal the playback task N times by calling the function USBD_Audio_PlaybackTxCmpl(). N corresponds to the number of buffers it can queue. The driver should support at least double-buffering and thus queue two buffers.

(6) Signalling the playback task consists in posting an AudioStreaming (AS) interface handle to a queue. The playback task wakes up and processes the handle. It submits a ready buffer taken from the ring buffer queue to the audio peripheral driver by calling the function StreamPlaybackTx(). Before being submitted to the audio peripheral driver, the received audio data may be corrected if the ring buffer queue is in an underrun or overrun situation. The playback stream correction is explained in section Playback Stream Correction. The audio peripheral driver should accumulate the ready buffers. Once at least two buffers have been accumulated, the driver should send the first buffer to the codec, usually by preparing a DMA transfer.

(7) In the same way as the core task in USBD_Audio_PlaybackIsocCmpl() , the playback task submits all buffers it can to the USB device driver by calling USBD_IsocRxAsync() several times.

(8) The buffer contains a chunk (1 ms of audio data) of audio stream. This audio chunk is encoded following a certain format. The audio peripheral driver might have to decode the audio chunk in order to correctly present the audio samples to the codec.

(9) Each time a playback buffer is consumed by the codec, the audio peripheral driver ISR signals the end of an audio transfer to the playback task by calling the function USBD_Audio_PlaybackTxCmpl(). This function posts an AS interface handle and frees the consumed buffer back to the ring buffer queue.

(10) Afterwards, steps 3 to 9 are repeated until the host stops the playback by selecting the default AudioStreaming interface (request SET_INTERFACE sent for alternate setting 0). At this time, the Audio Processing module will stop the streaming on the codec side by calling the audio peripheral driver function StreamStop(). Any playback DMA transfer in progress is aborted. All the playback buffers attached to pending isochronous transfers are freed automatically by the core, which calls USBD_Audio_PlaybackIsocCmpl() for each aborted isochronous transfer.
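The pre-buffering rule of step (5a) can be sketched as follows. This is a minimal illustration, not the µC/USB-Device implementation: apart from MaxBufNbr, every name (demo_playback_stream_t, ReadyBufCnt, CodecStarted and the demo_ functions) is hypothetical, and the real module tracks ready buffers through the ring buffer queue indexes rather than a simple counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one playback stream. Only MaxBufNbr
 * mirrors a field named in the text; the other fields are illustrative. */
typedef struct {
    uint16_t MaxBufNbr;     /* total buffers in the ring buffer queue      */
    uint16_t ReadyBufCnt;   /* buffers filled by completed USB transfers   */
    bool     CodecStarted;  /* true once the codec stream has been started */
} demo_playback_stream_t;

/* Pre-buffering threshold as described in step (5a): MaxBufNbr / 2. */
static uint16_t demo_prebuf_threshold(const demo_playback_stream_t *s)
{
    return (uint16_t)(s->MaxBufNbr / 2u);
}

/* Models the completion of one isochronous OUT transfer (steps 3 and 4).
 * Returns true exactly once: when the threshold is reached and the codec
 * stream should be started. */
static bool demo_on_isoc_out_complete(demo_playback_stream_t *s)
{
    assert(s->ReadyBufCnt < s->MaxBufNbr);   /* the queue must not overflow */
    s->ReadyBufCnt++;

    if (!s->CodecStarted && s->ReadyBufCnt >= demo_prebuf_threshold(s)) {
        /* Real code would call the driver's stream start function here. */
        s->CodecStarted = true;
        return true;
    }
    return false;
}
```

With MaxBufNbr set to 8, the fourth completed transfer is the one that triggers the codec start in this sketch.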

 

The playback task supports multiple streams. If the audio function uses several USB OUT Terminal types, each USB OUT Terminal is associated with one AudioStreaming interface structure that the playback task manipulates and updates during stream communication.

OS Tick Rate

If the ring buffer queue is empty when the playback task tries to submit a buffer to the audio peripheral driver, a retry mechanism re-submits the buffer 1 ms later. This delay lets other tasks execute so that a new buffer can become available in the ring buffer queue. The function USBD_Audio_OS_DlyMs() is used for this delay. Whenever possible, the OS tick rate should have a 1 ms granularity. This also helps the scheduling of the audio class tasks, because the audio class works on a 1 ms frame basis.
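The retry mechanism can be pictured with the sketch below. It is an illustration under stated assumptions, not the actual playback task code: the queue type, the function-pointer hooks and the retry limit are all hypothetical (the text does not state how many retries are attempted), and the delay hook stands in for USBD_Audio_OS_DlyMs(), whose exact signature is not given here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks so that the sketch does not depend on the real stack. */
typedef void *(*demo_buf_get_fn)(void *queue);   /* returns NULL when empty */
typedef void  (*demo_dly_ms_fn)(uint32_t ms);    /* stands in for the OS delay */

/* Try to get a ready buffer; on an empty queue, wait 1 ms and try again.
 * max_retries is an assumption made for this sketch. */
static void *demo_buf_get_with_retry(void            *queue,
                                     demo_buf_get_fn  buf_get,
                                     demo_dly_ms_fn   dly_ms,
                                     unsigned         max_retries)
{
    assert(buf_get != NULL && dly_ms != NULL);

    for (unsigned attempt = 0u; attempt <= max_retries; attempt++) {
        void *buf = buf_get(queue);
        if (buf != NULL) {
            return buf;            /* a buffer became available */
        }
        if (attempt < max_retries) {
            dly_ms(1u);            /* let other tasks run for 1 ms */
        }
    }
    return NULL;                   /* still empty after all retries */
}

/* Tiny stand-in queue used only to exercise the sketch: it reports
 * "empty" for the first empty_polls calls, then hands out one buffer. */
typedef struct {
    unsigned empty_polls;
    int      buf;
} demo_queue_t;

static unsigned demo_delay_total_ms;   /* accumulated simulated delay */

static void *demo_queue_get(void *q)
{
    demo_queue_t *queue = (demo_queue_t *)q;
    if (queue->empty_polls > 0u) {
        queue->empty_polls--;
        return NULL;
    }
    return &queue->buf;
}

static void demo_dly_ms(uint32_t ms)
{
    demo_delay_total_ms += ms;
}
```

If the OS tick is coarser than 1 ms, a real delay call would round up to the next tick, which is why the text recommends a 1 ms tick granularity.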

Record Stream

Figure - Record Stream Dataflow

(1) The host activates the AudioStreaming interface #X by selecting the operational interface (request SET_INTERFACE sent for alternate setting 1). The host then sets the sampling frequency for a certain isochronous IN endpoint by sending a SET_CUR request. The function USBD_Audio_DrvAS_SamplingFreqManage() (not indicated in the figure) is called from the core task's context. This function is implemented by the audio peripheral driver and will set an ADC (Analog-to-Digital Converter) clock in the codec.

(2) When processing the SET_CUR(sampling frequency) request, the audio class will also start the record stream on the codec side by calling StreamStart() . In this function, a DMA transfer will be prepared to get the first record buffer from the codec. The initial receive buffer will be obtained by calling USBD_Audio_RecordBufGet() . This step is not entirely represented in the figure. The audio class ensures that the record stream is started on the codec side after setting the sampling frequency as a codec needs the correct clock settings before getting record data.

(3) Once the first DMA transfer has completed, the audio peripheral driver will obtain the next receive buffer from the ring buffer queue by calling USBD_Audio_RecordBufGet(). This function also tells the audio peripheral driver how many bytes to get from the codec.

(4) The buffer will be filled with audio samples given by one or more ADCs (one ADC per logical channel). The buffer will contain 1 ms worth of audio samples. This 1 ms of audio samples should be encoded, either directly by the codec (hardware) or by the audio peripheral driver (software). Most of the time, the codec will provide the chunk of audio stream already encoded. The driver signals the end of an audio transfer to the record task by calling the function USBD_Audio_RecordRxCmpl() . The signal represents an AudioStreaming interface handle.

(5) The record task wakes up and retrieves the ready buffer from the audio peripheral driver by calling the audio peripheral driver function StreamRecordRx() . The buffer is stored in the ring buffer queue.

(6) To prime the audio stream, the record task waits for a certain number of buffers to be ready. The pre-buffering threshold is always equal to (MaxBufNbr / 2). The field MaxBufNbr is part of the structure USBD_AUDIO_STREAM_CFG. Once the pre-buffering is done, the record task submits the initial isochronous IN transfer to the USB device driver via USBD_IsocTxAsync(). During the stream communication, the record task does not submit other isochronous transfers. All subsequent USB transfers are submitted by the core task.

There is one special situation in which the record task can submit a new transfer. When the stream communication loop is broken, that is, when there are no more ongoing isochronous transfers in the USB device driver, the record task restarts the stream with a new USB transfer.

(7) The USB device driver will send isochronous audio data to the host during a specific frame.

(8) Upon completion of the isochronous IN transfer, the core task will call the callback USBD_Audio_RecordIsocCmpl() provided by the Audio Processing module as an argument of USBD_IsocTxAsync(). This callback frees the buffer by returning it to the ring buffer queue. Before the buffer is returned to the ring buffer queue, a stream correction may be applied to the next record buffer to be filled by the codec. The record stream correction is explained in section Record Stream Correction.

(9) The core task submits all the ready buffers it can to the USB device driver by calling USBD_IsocTxAsync() several times. The core task is thus responsible for keeping the stream communication alive by repeating steps 7 and 8.

(10) Once the audio stream is initiated, steps 3 to 8 repeat until the host stops recording by selecting the default AudioStreaming interface (request SET_INTERFACE sent for alternate setting 0). At this time, the Audio Processing module will stop the streaming on the codec side by calling the audio peripheral driver function StreamRecordStop(). Any record DMA transfer in progress is aborted. All empty buffers being processed and all ready buffers not yet retrieved by the record task are implicitly freed by the ring buffer queue reset. On the USB side, all the unconsumed record buffers are freed automatically by the core, which calls USBD_Audio_RecordIsocCmpl() for each aborted isochronous transfer.
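The division of work described in step (6), where the record task only submits the initial transfer or restarts a broken loop, can be summarized by the decision function below. It is a hedged sketch: the structure, its fields other than MaxBufNbr, and the function name are hypothetical and do not come from the audio class sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified state of one record stream. */
typedef struct {
    uint16_t MaxBufNbr;          /* total buffers in the ring buffer queue        */
    uint16_t ReadyBufCnt;        /* buffers filled by the codec, not yet sent     */
    uint16_t XferInProgressCnt;  /* isochronous IN transfers held by the USB side */
    bool     Primed;             /* true once the initial transfer was submitted  */
} demo_record_stream_t;

/* Returns true when the record task (rather than the core task) should
 * submit an isochronous IN transfer. */
static bool demo_record_task_should_submit(const demo_record_stream_t *s)
{
    assert(s->ReadyBufCnt <= s->MaxBufNbr);

    if (s->ReadyBufCnt == 0u) {
        return false;                                    /* nothing to send yet */
    }
    if (!s->Primed) {
        /* Initial transfer: wait for the pre-buffering threshold. */
        return s->ReadyBufCnt >= (uint16_t)(s->MaxBufNbr / 2u);
    }
    /* Stream already primed: only restart it if the loop is broken,
     * that is, if no transfer is left in the USB device driver. */
    return s->XferInProgressCnt == 0u;
}
```

In every other case the sketch returns false, which corresponds to leaving the submission to the core task in the transfer completion callback.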

 

The record task supports multiple streams. If the audio function uses several USB IN Terminal types, each USB IN Terminal is associated with one AudioStreaming interface structure posted in the record task's queue. The record task can therefore handle buffers from different streams.

The record data path takes care of the data rate adjustment. This is required for sampling frequencies that do not produce an integer number of audio samples per ms, because a packet cannot carry a partial audio sample. For those sampling frequencies, Table - Data Rate Adjustment gives the required adjustment. The data rate adjustment is implemented in the isochronous IN transfer completion callback USBD_Audio_RecordIsocCmpl().

Table - Data Rate Adjustment

Samples per frame (ms)    Typical Packet Size    Adjustment

11.025                    11 samples             12 samples every 40 packets (i.e. every 40 ms)

22.050                    22 samples             23 samples every 20 packets (i.e. every 20 ms)

44.1                      44 samples             45 samples every 10 packets (i.e. every 10 ms)

 

For instance, with a sampling frequency of 44.1 kHz and a mono microphone, the audio class sends the host isochronous transfers of 44 samples in most frames. To reach 44 100 samples every second, the audio class sends 45 samples once every 10 frames (that is, every 10 ms). After one second, the host will have received 100 additional samples on top of the 44 000 samples it would have received with 44-sample isochronous transfers only.
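The arithmetic behind the table can be checked with a short sketch. The function names are hypothetical, and placing the larger packet at the end of each period is an assumption made for illustration; the text only states how often the larger packet occurs, not which frame carries it.

```c
#include <assert.h>
#include <stdint.h>

/* Number of samples (per channel) carried by frame number frame_ix,
 * for a sampling frequency given in Hz. Frames are 1 ms long. */
static uint32_t demo_samples_in_frame(uint32_t freq_hz, uint32_t frame_ix)
{
    uint32_t base = freq_hz / 1000u;   /* typical packet size in samples  */
    uint32_t rem  = freq_hz % 1000u;   /* samples left over every second  */

    if (rem == 0u) {
        return base;                   /* e.g. 48 kHz: no adjustment      */
    }

    /* This sketch only handles frequencies whose leftover divides one
     * second evenly, which is the case for 11.025, 22.05 and 44.1 kHz. */
    assert(1000u % rem == 0u);
    uint32_t period = 1000u / rem;     /* one larger packet per period    */

    /* Assumption: the extra sample goes in the last frame of each period. */
    return ((frame_ix + 1u) % period == 0u) ? (base + 1u) : base;
}

/* Total number of samples sent over 1000 consecutive frames (one second). */
static uint32_t demo_samples_in_one_second(uint32_t freq_hz)
{
    uint32_t total = 0u;
    for (uint32_t frame_ix = 0u; frame_ix < 1000u; frame_ix++) {
        total += demo_samples_in_frame(freq_hz, frame_ix);
    }
    return total;
}
```

For 44.1 kHz this yields 900 frames of 44 samples and 100 frames of 45 samples, that is 39 600 + 4 500 = 44 100 samples per second.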

Stream Correction 

Playback Built-In Stream Correction 

The built-in playback stream correction is active only when the constant USBD_AUDIO_CFG_PLAYBACK_CORR_EN is set to DEF_ENABLED. As explained in section Playback Stream, the stream correction is evaluated before the playback task provides a ready buffer to the audio peripheral driver. The evaluation relies on monitoring the playback ring buffer queue. Two thresholds are defined, a lower limit and an upper limit, as shown in Figure - Playback Ring Buffers Queue Monitoring. The figure shows the four indexes used in the ring buffer queue. A buffer difference is computed between the indexes ProducerEnd and ConsumerEnd. For the playback path, ProducerEnd is linked to the USB transfer completion while ConsumerEnd is linked to the audio transfer completion. The buffer difference represents the circular distance between the two indexes. If the distance is less than the lower limit, there is an underrun situation: the USB side does not produce audio samples as fast as the codec consumes them. Conversely, if the distance is greater than the upper limit, there is an overrun situation: the USB side produces audio data faster than the codec can consume it. To keep the codec and USB in sync, a simple algorithm adds an audio sample in case of underrun and removes an audio sample in case of overrun.

The frequency at which the playback stream correction is evaluated is configurable via the field CorrPeriodMs of the structure  USBD_AUDIO_STREAM_CFG .
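The monitoring logic can be sketched as below. This is an illustration, not the audio class code: the function and enumeration names are hypothetical, the indexes are assumed to be buffer positions in the range 0 to (number of buffers - 1), and the actual limit values are not given in the text.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DEMO_STREAM_OK,        /* distance within [lower limit, upper limit] */
    DEMO_STREAM_UNDERRUN,  /* USB produces too slowly for the codec      */
    DEMO_STREAM_OVERRUN    /* USB produces faster than the codec consumes */
} demo_stream_state_t;

/* Circular distance from ConsumerEnd to ProducerEnd in a queue of
 * buf_nbr buffers. The modulo handles the wrap-around case where
 * ProducerEnd has already passed the end of the queue. */
static uint16_t demo_circular_distance(uint16_t producer_end,
                                       uint16_t consumer_end,
                                       uint16_t buf_nbr)
{
    assert(buf_nbr > 0u);
    assert(producer_end < buf_nbr && consumer_end < buf_nbr);

    return (uint16_t)((producer_end + buf_nbr - consumer_end) % buf_nbr);
}

/* Compare the distance against the two thresholds described in the text. */
static demo_stream_state_t demo_classify(uint16_t distance,
                                         uint16_t lower_limit,
                                         uint16_t upper_limit)
{
    if (distance < lower_limit) {
        return DEMO_STREAM_UNDERRUN;   /* correction: add a sample    */
    }
    if (distance > upper_limit) {
        return DEMO_STREAM_OVERRUN;    /* correction: remove a sample */
    }
    return DEMO_STREAM_OK;             /* no correction needed        */
}
```

In a real stream this check would run once per correction period (CorrPeriodMs) rather than on every buffer.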

Figure - Playback Ring Buffers Queue Monitoring

 

Figure - Adding a Sample in Case of Underrun illustrates the algorithm to add an audio sample in case of underrun situation. 

Figure - Adding a Sample in Case of Underrun

(1) Sample N is moved to position N+1.

(2) Sample N is rebuilt and equal to the average of N-1 and N+1.

(3) The packet size is increased by one sample.
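The three steps above can be sketched for a mono, signed 16-bit PCM packet. The function name is hypothetical, and the sketch assumes that sample N is the last sample of the packet and that the buffer has room for one more sample; the figure, which is not reproduced here, may place N differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds one interpolated sample to a mono signed 16-bit PCM packet.
 * buf holds len valid samples and can store capacity samples.
 * Returns the new packet length in samples. */
static size_t demo_underrun_add_sample(int16_t *buf, size_t len, size_t capacity)
{
    assert(buf != NULL);
    assert(len >= 2u);        /* sample N-1 must exist                  */
    assert(len < capacity);   /* room is needed for the added sample    */

    size_t n = len - 1u;      /* assumption: N is the last sample       */

    buf[n + 1u] = buf[n];     /* (1) sample N is moved to N+1           */

    /* (2) sample N is rebuilt as the average of N-1 and N+1.
     * The sum is computed on 32 bits to avoid a 16-bit overflow. */
    buf[n] = (int16_t)(((int32_t)buf[n - 1u] + (int32_t)buf[n + 1u]) / 2);

    return len + 1u;          /* (3) the packet grows by one sample     */
}
```

For example, the packet {10, 20, 30, 40} becomes {10, 20, 30, 35, 40}. A stereo stream would apply the same operation to each channel of the sample frame.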

 

The stream correction supports the signed PCM and unsigned PCM8 formats.

This stream correction is convenient for low-cost audio designs. It gives good results as long as the incoming USB audio sampling frequency is very close to the DAC input clock frequency. However, if the difference between the two frequencies is large, the correction adds audible distortion.

Figure - Removing a Sample in Case of Overrun illustrates the algorithm to remove an audio sample in case of overrun situation.