Detecting Task Stack Overflows

1) Using an MMU or MPU

Stack overflows are easily detected if the processor has a Memory Management Unit (MMU) or a Memory Protection Unit (MPU). Basically, MMUs and MPUs are special hardware devices integrated alongside the CPU that can be configured to detect when a task attempts to access invalid memory locations, whether code, data, or stack. However, setting up an MMU or MPU is well beyond the scope of this book.

2) Using a CPU with stack overflow detection

Some processors have simple stack pointer overflow detection registers. When the CPU’s stack pointer goes below (or above, depending on the direction of stack growth) the value set in this register, an exception is generated, and the exception handler ensures that the offending code does no further damage (possibly by issuing a warning about the faulty code or even terminating it). The .StkLimitPtr field in the OS_TCB (see Task Control Blocks) is provided for this purpose, as shown in the figure below. Note that the stack limit is typically set at a valid location in the task’s stack, with sufficient room left on the stack to handle the exception itself (assuming the CPU does not have a separate exception stack). In most cases, the position can be fairly close to &MyTaskStk[0].

Figure - Hardware detection of stack overflows


As a reminder, the location of .StkLimitPtr is determined by the “stk_limit” argument passed to OSTaskCreate() when the task is created, as shown below:

Listing - stk_limit, i.e., .StkLimitPtr, as passed to OSTaskCreate()
OS_TCB  MyTaskTCB;
CPU_STK MyTaskStk[1000];
 
 
OSTaskCreate(&MyTaskTCB,
             "MyTaskName",
              MyTask,
             &MyTaskArg,
              MyPrio,
             &MyTaskStk[0],   /* Stack base address                                         */
               100,           /* Used to set .StkLimitPtr to trigger exception ...          */
                              /* ... at stack usage > 90%                                   */
              1000,           /* Total stack size (in CPU_STK elements)                     */
              MyTaskQSize,
              MyTaskTimeQuanta,
              (void *)0,
              MY_TASK_OPT,
             &err);
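The relationship between the "100" passed above and the "90%" in its comment can be made explicit. The following is a minimal sketch, assuming the stack grows toward lower addresses and that stk_limit is counted in CPU_STK elements from the stack base. StkLimitFromPct() is a hypothetical helper written for this illustration; it is not part of µC/OS-III.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a kernel API): returns the stk_limit value, in   */
/* stack elements measured from the stack base, that places the limit at     */
/* 'pct_used' percent stack usage. Assumes a stack growing toward lower      */
/* addresses and a stk_size small enough that stk_size * 100 cannot wrap.    */
static size_t StkLimitFromPct(size_t stk_size, unsigned pct_used)
{
    assert(pct_used <= 100u);
    return stk_size - (stk_size * pct_used) / 100u;
}
```

With a 1000-element stack, StkLimitFromPct(1000u, 90u) yields 100, which matches the listing above, and StkLimitFromPct(1000u, 75u) yields 250.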


Of course, the value loaded into the CPU’s stack overflow detection register needs to be changed whenever µC/OS-III performs a context switch. This can be tricky because the update may need to happen in three steps: first, point the register to NULL; then, change the CPU’s stack pointer; and finally, set the register to the value saved in the new task’s .StkLimitPtr. Why? Because if this sequence is not followed, an exception could be generated as soon as either the stack pointer or the stack overflow detection register is changed, since for a moment one belongs to the old task and the other to the new one. Pointing the register at a location that the stack pointer can never cross (thus the NULL described above) avoids the problem. Note that I assumed here that the stack grows from high memory to low memory, but the concept works in a similar fashion if the stack grows in the opposite direction.
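To see why the order matters, the sequence can be modeled on a simulated CPU. The sketch below is only an illustration: SimCpu and the Sim...() functions are hypothetical names, the "hardware" check is an ordinary comparison performed after every register write, and the stack is assumed to grow toward lower addresses. A real port would do this in its context switch code, usually in assembly language.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated CPU state (hypothetical; stands in for real registers).         */
typedef struct {
    uintptr_t sp;         /* Stack pointer                                    */
    uintptr_t stk_limit;  /* Stack overflow detection register                */
    bool      fault;      /* Set when the simulated exception fires           */
} SimCpu;

/* The simulated hardware check: the stack grows down, so SP below the limit */
/* raises the exception. A limit of 0 (NULL) can never be crossed.           */
static void SimCheck(SimCpu *cpu)
{
    if (cpu->sp < cpu->stk_limit) {
        cpu->fault = true;
    }
}

static void SimSetSP(SimCpu *cpu, uintptr_t sp)
{
    cpu->sp = sp;
    SimCheck(cpu);
}

static void SimSetLimit(SimCpu *cpu, uintptr_t limit)
{
    cpu->stk_limit = limit;
    SimCheck(cpu);
}

/* Context switch using the three-step sequence described in the text.       */
static void SimCtxSw(SimCpu *cpu, uintptr_t new_sp, uintptr_t new_limit)
{
    assert(cpu != 0);
    SimSetLimit(cpu, 0u);         /* 1) Park the limit where SP cannot cross  */
    SimSetSP(cpu, new_sp);        /* 2) Load the new task's stack pointer     */
    SimSetLimit(cpu, new_limit);  /* 3) Load the new task's .StkLimitPtr      */
}
```

In this model, switching from a task whose stack sits at lower addresses to one at higher addresses faults if the new limit is written first, because the old stack pointer is then below the new limit. Going through SimCtxSw() never produces that spurious fault.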

3) Custom software-based stack overflow detection

Whenever µC/OS-III switches from one task to another, it calls a “hook” function (OSTaskSwHook()), which allows the µC/OS-III port programmer to extend the capabilities of the context switch function. So, if the processor doesn’t have hardware stack pointer overflow detection, it’s still possible to “simulate” this feature by adding code to the context switch hook function to perform the overflow detection in software. Specifically, before a task is switched in, the code should ensure that the stack pointer to load into the CPU does not exceed the “limit” placed in .StkLimitPtr. Because the software implementation cannot detect the stack overflow “as soon” as the stack pointer exceeds the value of .StkLimitPtr, it is important to position .StkLimitPtr fairly far from &MyTaskStk[0], as shown in the figure below. A software implementation such as this is not as reliable as a hardware-based detection mechanism, but it can still catch a task that is approaching an overflow before the stack is actually overrun. Of course, the .StkLimitPtr field would be set using OSTaskCreate() as shown above, but this time with a location further away from &MyTaskStk[0].

Figure - Software detection of stack overflows, monitoring .StkLimitPtr
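The check itself is a single pointer comparison. The sketch below shows one way it might look, assuming a stack that grows toward lower addresses. DemoTCB is a hypothetical stand-in that carries only the two fields needed here (the real OS_TCB has many more), the 32-bit CPU_STK width is an assumption since it is port-specific, and DemoStkLimitExceeded() is not a µC/OS-III function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t CPU_STK;   /* Assumption: the real width is port-specific   */

/* Hypothetical stand-in for the two OS_TCB fields used by the check.        */
typedef struct {
    CPU_STK *StkPtr;        /* Stack pointer saved at the last context switch */
    CPU_STK *StkLimitPtr;   /* Limit derived from stk_limit at task creation  */
} DemoTCB;

/* Returns true if the saved stack pointer has gone past the limit. The      */
/* stack grows down, so "past" means a lower address than the limit.         */
static bool DemoStkLimitExceeded(const DemoTCB *p_tcb)
{
    assert(p_tcb != 0);
    return p_tcb->StkPtr < p_tcb->StkLimitPtr;
}
```

A context switch hook could call a function like this for the task about to be switched in and take corrective action when it returns true. Note that the check only sees the stack pointer as it was saved, so a deep excursion that unwound before the switch goes unnoticed, which is one reason to keep the limit well away from the stack base.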


4) Redzone stack overflow detection

The Redzone stack overflow detection mechanism is built into µC/OS-III. It is a software-based approach of the kind described in the previous section, and it is enabled by setting OS_CFG_TASK_STK_REDZONE_EN to DEF_ENABLED in os_cfg.h. When enabled, µC/OS-III creates a monitored zone at the end of a task's stack. This zone, the Redzone, is filled upon task creation with a special canary-like value. The zone is identified in red in the next figure, which assumes a stack that grows towards lower addresses. Every time a task needs to be switched out, either at the task level or at the interrupt level, µC/OS-III checks whether the Redzone has been overwritten and whether the stack pointer is still within the limits of the stack. If the zone has been overwritten, or if the stack pointer is out of bounds, µC/OS-III informs the user by calling OSRedzoneHitHook(). At that point, it is known that the current task's stack is corrupted. Without an application hook, µC/OS-III simply triggers a software-based exception, which stops the execution of the application. See the µC/OS-III API Reference for information regarding OSRedzoneHitHook().

Note that the size of the Redzone is determined by OS_CFG_TASK_STK_REDZONE_DEPTH. By default, it is 8 CPU_STK elements deep. Hence, the effectively usable stack space is reduced by 8 elements.

Figure - Redzone Stack Overflow detection
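The idea behind the Redzone can be sketched in a few lines. This is not the µC/OS-III implementation: the Demo...() names, the fill value, and the 32-bit CPU_STK width are assumptions made for illustration, and the stack is assumed to grow toward lower addresses so that the zone occupies the first elements of the stack array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t CPU_STK;                 /* Assumption: port-specific width  */

#define DEMO_REDZONE_DEPTH  8u            /* Mirrors the default depth of 8   */
#define DEMO_REDZONE_VALUE  0xABCD2345u   /* Arbitrary canary, not the kernel's */

/* Fill the zone at the low end of the stack when the task is created.       */
static void DemoRedzoneInit(CPU_STK *p_base, size_t size)
{
    size_t i;

    assert(size >= DEMO_REDZONE_DEPTH);
    for (i = 0u; i < DEMO_REDZONE_DEPTH; i++) {
        p_base[i] = DEMO_REDZONE_VALUE;
    }
}

/* Returns true if the stack pointer lies within the stack and every element */
/* of the zone still holds the canary value.                                  */
static bool DemoRedzoneIntact(const CPU_STK *p_base, size_t size,
                              const CPU_STK *p_sp)
{
    uintptr_t lo = (uintptr_t)p_base;
    uintptr_t hi = (uintptr_t)(p_base + size);
    uintptr_t sp = (uintptr_t)p_sp;
    size_t    i;

    assert(size >= DEMO_REDZONE_DEPTH);
    if ((sp < lo) || (sp > hi)) {
        return false;                     /* Stack pointer out of bounds      */
    }
    for (i = 0u; i < DEMO_REDZONE_DEPTH; i++) {
        if (p_base[i] != DEMO_REDZONE_VALUE) {
            return false;                 /* Zone was overwritten             */
        }
    }
    return true;
}
```

A check like this catches an overflow even after the stack pointer has moved back into range, because the overwritten canary values remain as evidence.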


5) Counting the amount of free stack space

Another way to check for stack overflows is to allocate more stack space than is anticipated to be used, then monitor and possibly display the actual maximum stack usage at run-time. This is fairly easy to do. First, the task stack needs to be cleared (i.e., filled with zeros) when the task is created. Next, a low priority task walks the stack of each task created, from the bottom (&MyTaskStk[0]) towards the top, counting the number of zero entries. When the task finds a non-zero value, the process is stopped and the usage of the stack can be computed (as the number of bytes used or as a percentage). Then, you can adjust the size of the stacks (by recompiling the code) to allocate a more reasonable value (either increasing or decreasing the amount of stack space for each task). For this to be effective, however, you need to run the application long enough for the stack to grow to its highest value. This is illustrated in the figure below. µC/OS-III provides a function that performs this calculation at run-time, OSTaskStkChk(), and in fact this function is called by OS_StatTask() to compute stack usage for every task created in the application (to be described later).

Figure - Software detection of stack overflows, walking the stack
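The walk described above amounts to counting leading zero entries from the stack base. The following sketch illustrates the technique under the same assumptions as the text (a stack cleared at creation and growing toward lower addresses). DemoStkFree() and DemoStkUsedPct() are hypothetical helpers, not the OSTaskStkChk() implementation, and the 32-bit CPU_STK width is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t CPU_STK;   /* Assumption: the real width is port-specific   */

/* Count the entries that are still zero, starting at the stack base and     */
/* stopping at the first non-zero entry (the deepest point ever reached).    */
static size_t DemoStkFree(const CPU_STK *p_base, size_t size)
{
    size_t free_cnt = 0u;

    assert(p_base != 0);
    while ((free_cnt < size) && (p_base[free_cnt] == 0u)) {
        free_cnt++;
    }
    return free_cnt;
}

/* Convert a free count into a usage percentage (rounded down).              */
static unsigned DemoStkUsedPct(size_t free_cnt, size_t size)
{
    assert((size > 0u) && (free_cnt <= size));
    return (unsigned)(((size - free_cnt) * 100u) / size);
}
```

For a 1000-element stack whose deepest written entry is at index 700, DemoStkFree() returns 700 and DemoStkUsedPct(700u, 1000u) returns 30. One limitation of the technique is that a task which happens to store a zero at its deepest stack location makes the walk undercount usage by that entry.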