80x86 Port with Emulated FP Support

Real Mode, Large Model with Emulated Floating-Point Support

This chapter describes how µC/OS-II has been ported to the Intel 80x86 series of processors running in real mode, large model for the Borland C++ V4.51 tools. This port assumes that your application will not be doing any floating-point math or, if it does, that it will use the Borland Floating-Point Emulation library. In other words, I assumed that you would use this port with embedded 80186, 80286, 80386 or even ‘plain’ 8086 class processors which rely only on integer math. To run on a ‘plain’ 8086 processor, the port must be adapted (i.e., changed): you need to replace each use of the PUSHA instruction with the proper number of PUSH instructions.

The Intel 80x86 series includes the 80186, 80286, 80386, 80486, Pentiums™ (all models), Celeron as well as most 80x86 processors from AMD, NEC (V-series), and others. Literally millions of 80x86 CPUs are sold each year. Most of these end up in desktop computers, but a growing number of processors are making their way into embedded systems.

Most C compilers that support 80x86 processors running in real mode offer different memory models, each suited for a different program and data size. Each model uses memory differently. The large model allows your application (code and data) to reside in a 1Mb memory space. Pointers in this model require 32 bits, although they only address up to 1Mb. The next section shows why a 32-bit pointer in this model can only address 20 bits worth of memory.

Figure 14.1 shows the programming model of an 80x86 processor running in real mode. All registers are 16 bits wide, and they all need to be saved during a context switch. As can be seen, there are no floating-point registers since these are emulated by the Borland compiler library using the integer registers.

The 80x86 provides a clever mechanism to access up to 1Mb of memory with its 16-bit registers. Memory addressing relies on using a segment and an offset register. Physical address calculation is done by shifting a segment register left by four (multiplying it by 16) and adding one of five other registers (BP, SP, SI, DI, or IP). The result is a 20-bit address that can access up to 1Mb. Figure 14.2 shows how the registers are combined. Each segment points to a block of 16 memory locations called a paragraph. A 16-bit segment register can point to any of 65,536 different paragraphs of 16 bytes and thus address 1,048,576 bytes. Because the offset is also 16 bits, a single segment of code cannot exceed 64Kb. In practice, however, programs are made up of many smaller segments.

The code segment register (CS) points to the base of the program currently executing, the stack segment register (SS) points to the base of the stack, the data segment register (DS) points to the base of one data area, and the extra segment register (ES) points to the base of another area where data may be stored. Each time the CPU needs to generate a memory address, one of the segment registers is automatically chosen and its contents are added to an offset. It is common to find the segment-colon-offset notation in literature to reference a memory location. For example, 1000:00FF represents physical memory location 0x100FF.
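The address calculation described above can be checked with a few lines of portable C. This is a minimal sketch for illustration only and is not part of the port; the helper name seg_off_to_linear() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the port): compute the physical address
 * that a real-mode 80x86 forms from a segment:offset pair.
 * No wrap-around handling: sums above 0xFFFFF are returned as-is. */
static uint32_t seg_off_to_linear(uint16_t seg, uint16_t off)
{
    /* Shift the segment left by four (multiply by 16), then add the offset. */
    return ((uint32_t)seg << 4) + (uint32_t)off;
}
```

With this helper, seg_off_to_linear(0x1000, 0x00FF) gives 0x100FF, matching the example in the text. Note that different pairs can name the same location: 100F:000F also maps to 0x100FF, which is why a 32-bit segment:offset pointer still only reaches 20 bits worth of memory.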

Development Tools

I used the Borland C/C++ V4.51 compiler along with the Borland Turbo Assembler to port and test the 80x86 port. This compiler generates reentrant code and provides in-line assembly language instructions that can be inserted in C code. The compiler comes with a floating-point emulation library that simulates, in software, the floating-point hardware found on some 80x86 processors. Once compiled, the code is executed on a PC. I tested the code on a 300 MHz Pentium-II-based computer running the Microsoft Windows 2000 operating system. In fact, I configured the compiler to generate a DOS executable which was run in a DOS window.

I thought of changing compilers because some readers have complained that they can’t find the Borland tools anymore, which makes it harder to build the example code provided in this book. It turns out that a similar compiler and assembler that will compile the example code is in fact available from Borland for only $70 USD (circa 2002). Borland calls it the Turbo C++ Suite for DOS and you can order a copy by visiting the Borland web site at www.Borland.com and following the links to this product.

You can also get professional 80x86 tools from Paradigm (www.DevTools.com). The package contains not only a Borland-compatible compiler and assembler but also an IDE (Integrated Development Environment), a utility that allows you to locate your code for deployment in embedded systems, a source-level debugger, and more. Paradigm calls its package the Paradigm C++ Professional Real.

Finally, you can also adapt the port provided in this chapter to other 80x86 compilers as long as they generate real-mode code. You will most likely have to change some of the compiler options and assembler directives if you use a different development environment.

Table 14.1 shows the Borland C/C++ compiler V4.51 options (i.e., flags) supplied on the command line. These settings were used to compile the port as well as the example code provided in Chapter 1.

Table 14.2 shows the Borland Turbo Assembler V4.0 options (i.e., flags) supplied on the command line. These settings were used to assemble the port’s OS_CPU_A.ASM.

Directories and Files

The installation program provided on the companion CD installs the port for the Intel 80x86 (real mode, large model) on your hard disk. The port is found under the \SOFTWARE\uCOS-II\Ix86L\BC45 directory. The directory name stands for Intel 80x86 real mode, Large model and is placed in the Borland C++ V4.5x directory. The source code for the port is found in the following files: OS_CPU.H, OS_CPU_C.C, and OS_CPU_A.ASM.

INCLUDES.H

INCLUDES.H is a master include file and is found at the top of all .C files. INCLUDES.H allows every .C file in your project to be written without concern about which header file is actually needed. The only drawbacks to having a master include file are that INCLUDES.H may include header files that are not pertinent to the actual .C file being compiled and the compilation process may take longer. These inconveniences are offset by code portability. You can edit INCLUDES.H to add your own header files, but your header files should be added at the end of the list. Listing 14.1 shows the contents of INCLUDES.H for the 80x86 port.

INCLUDES.H is not really part of the port but is described here because it is needed to compile the port files.

OS_CPU.H

OS_CPU.H contains processor- and implementation-specific #define constants, macros, and typedefs. OS_CPU.H for the 80x86 port is shown in Listing 14.2.

OS_CPU_GLOBALS and OS_CPU_EXT allow us to declare global variables that are specific to this port (described later).

OS_CPU.H, OS_ENTER_CRITICAL() and OS_EXIT_CRITICAL()

OS_CPU.H, Stack Growth

OS_CPU.H, OS_TASK_SW()

OS_CPU.H, Tick Rate

The tick rate for an RTOS should generally be set between 10 and 100Hz. It is always preferable (but not necessary) to set the tick rate to a round number. Unfortunately, on the PC, the default tick rate is 18.20648Hz, which is not what I would call a nice round number. For this port, I decided to change the tick rate of the PC from the standard 18.20648Hz to 200Hz (i.e., 5ms between ticks). There are three reasons to do this:

200Hz happens to be almost exactly 11 times faster than 18.20648Hz. The port will need to “chain” into DOS once every 11 ticks. In DOS, the tick handler is responsible for some system maintenance that is expected to happen every 54.93ms.
It’s useful to have a 5.00ms time resolution for time delays and timeouts. If you are running the example code on an 80386 PC, you may find the overhead of a 200Hz tick rate to be unacceptable. However, on today’s fast Pentium class processors, a 200Hz tick rate is not likely to be a problem.
Even if it’s possible to change the tick rate on a PC to be exactly 20 Hz or even 100 Hz, it would be difficult to chain into the DOS tick handler at exactly 18.20648Hz. That’s why I chose an exact multiple and thus, had to choose 200 Hz. Of course, I could also have used 22 as a multiple and would have obtained 400 Hz (2.5 ms). On a fast PC, you should have no problems running at this tick rate or even faster.
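The chaining arithmetic behind the first reason can be sketched in portable C. This is an illustration only, not the port's tick ISR; the names TICKS_PER_CHAIN, chain_counter, and tick_should_chain() are hypothetical.

```c
#include <stdint.h>

/* 200 Hz / 18.20648 Hz is roughly 11, so DOS is serviced every 11th tick. */
#define TICKS_PER_CHAIN 11u

static uint8_t chain_counter = TICKS_PER_CHAIN;   /* hypothetical name */

/* Call once per 200 Hz tick. Returns 1 when the DOS tick handler should be
 * chained to, and 0 on the other ten ticks. */
static int tick_should_chain(void)
{
    if (--chain_counter == 0u) {
        chain_counter = TICKS_PER_CHAIN;          /* reload for next round */
        return 1;
    }
    return 0;
}
```

Over 200 ticks (one second), this chains 18 times, so the DOS handler runs every 55ms instead of every 54.93ms. The small difference is the price of using a whole-number multiple.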

OS_CPU.H, Floating-Point Emulation

As previously mentioned, the Borland compiler provides a floating-point emulation library. However, this library is non-reentrant.

OS_CPU_C.C

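A µC/OS-II port requires that you write ten fairly simple C functions: OSTaskStkInit(), OSTaskCreateHook(), OSTaskDelHook(), OSTaskSwHook(), OSTaskIdleHook(), OSTaskStatHook(), OSTimeTickHook(), OSInitHookBegin(), OSInitHookEnd(), and OSTCBInitHook(). µC/OS-II only requires OSTaskStkInit(). The other nine functions must be declared but don’t need to contain any code. In the case of this port, I did just that. The #define constant OS_CPU_HOOKS_EN (see OS_CFG.H) should be set to 1.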

OSTaskStkInit()

This function is called by OSTaskCreate() and OSTaskCreateExt() to initialize the stack frame of a task so that it looks as if an interrupt has just occurred and all processor registers were pushed onto it. Figure 14.3 shows what OSTaskStkInit() puts on the stack of the task being created. Note that the diagram doesn’t show the stack frame of the code calling OSTaskStkInit() but rather, the stack frame of the task being created.

When you create a task, you pass the start address of the task (task), a pointer (pdata), the task’s top-of-stack (ptos), and the task’s priority (prio) to OSTaskCreate() or OSTaskCreateExt(). OSTaskCreateExt() requires additional arguments, but these are irrelevant in discussing OSTaskStkInit(). To properly initialize the stack frame, OSTaskStkInit() (Listing 14.3) requires only the first three arguments just mentioned (i.e., task, pdata, and ptos).
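The idea of a stack frame that looks as if an interrupt has just occurred can be modeled in portable C. The sketch below is not the port's OSTaskStkInit(); it only mimics the layout implied by the text (task argument, a return address, the FLAGS/CS/IP words an interrupt pushes, the eight registers saved by PUSHA, then ES and DS). The FarPtr type, the function name stk_init_model(), the zero fill values, and the use of the task address as a placeholder return address are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for a real-mode far pointer (segment:offset). */
typedef struct {
    uint16_t seg;
    uint16_t off;
} FarPtr;

/* Model of the initial frame: fills 16-bit words downward from ptos and
 * returns the resulting top-of-stack (the slot holding DS). */
static uint16_t *stk_init_model(FarPtr task, FarPtr pdata,
                                uint16_t *ptos, uint16_t ds)
{
    uint16_t *stk = ptos;
    int       i;

    *stk-- = pdata.seg;     /* argument passed to the task             */
    *stk-- = pdata.off;
    *stk-- = task.seg;      /* placeholder return address (assumed)    */
    *stk-- = task.off;
    *stk-- = 0x0202u;       /* FLAGS with the interrupt flag (bit 9)   */
    *stk-- = task.seg;      /* CS:IP that IRET pops = task entry point */
    *stk-- = task.off;
    for (i = 0; i < 8; i++) {
        *stk-- = 0x0000u;   /* PUSHA: AX, CX, DX, BX, SP, BP, SI, DI   */
    }
    *stk-- = 0x0000u;       /* ES                                      */
    *stk   = ds;            /* DS, the last word pushed                */
    return stk;
}
```

The model writes 17 words. Once DS, ES, and the eight general registers are popped and IRET has consumed IP, CS, and FLAGS, the stack pointer is left at the return address, with the task argument four bytes above it, just as a normal large-model function call would leave it.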

OSTaskStkInit_FPE_x86()

When floating-point emulation is enabled (see the Borland documentation), the stack of the Borland compiled program is organized as shown in Figure 14.3. The compiler assumes that the application runs in a single threaded (i.e., non-multitasking) environment.

The Borland C Floating-Point Emulation (FPE) library assumes that about 300 bytes starting at SS:0x0000 are reserved to hold floating-point emulation variables. As far as I can tell, this applies to the ‘large memory model’ only. To accommodate this, a special function (OSTaskStkInit_FPE_x86()) must be called prior to calling either OSTaskCreate() or OSTaskCreateExt() to properly initialize the stack frame of each task that needs to perform floating-point operations. This function applies to Borland V3.x and V4.5x compilers and thus, OSTaskStkInit_FPE_x86() would most likely not be included in a port using a different compiler.

The floating-point emulation library stores its data within the reserved space in relation to the current SS register value, assuming that some space starting from SS up (from SS:0x0000 up) is reserved for floating-point operations.

µC/OS-II’s task stacks are generally allocated statically (e.g., as the arrays Task1Stk[] and Task2Stk[], each holding TASK_STK_SIZE entries). When a task is created by µC/OS-II, the highest table address of the stack is passed to OSTaskCreate() (or OSTaskCreateExt()). The stack of Task1() starts at DS:&Task1Stk[TASK_STK_SIZE-1] while the stack of Task2() starts at DS:&Task2Stk[TASK_STK_SIZE-1]. Once initialized by µC/OS-II, the task’s top-of-stack (TOS) is saved in the task’s OS_TCB (Task Control Block).

The stack of the two tasks created from the previous code is shown in Figure 14.5. As can be seen, both tasks are part of the same segment and, more importantly, they share the same segment base since both stacks are allocated from the same data segment. When µC/OS-II loads a task during a context switch, it sets the SS register to the value of the DS register of the stack. This causes a problem since both tasks would have to share the same floating-point emulation variables!

The beginning of the data segment is overwritten by the floating-point emulation library even when we use a semaphore. Protecting this resource with a semaphore would allow exclusive access to the floating-point variables but it does not protect the data segment from being overwritten. Even a single µC/OS-II task using floating point overwrites the data segment! Further system behavior depends on what data are overwritten; typically, overwriting the data segment crashes the system.

A similar situation occurs when the stacks are allocated from the heap since we don’t know what part of memory is being overwritten. Typically, the heap is corrupted because the floating-point emulation library overwrites the header of the heap allocated block.

To fix this problem, the function OSTaskStkInit_FPE_x86() shown in Listing 14.4 needs to be called prior to creating a task. This function basically ‘normalizes’ the stack so that every stack starts at SS:0x0000, and it reserves and properly initializes the floating-point emulation variables for the task being created.

As can be seen from the code, you need to pass three arguments to OSTaskStkInit_FPE_x86():

pptos

is a pointer to the task’s top-of-stack (TOS) pointer (a pointer to a pointer). The task’s TOS is passed to OSTaskCreate() or OSTaskCreateExt() when you create a task. The stack is allocated from the data space and consists of a value for the DS register and an offset from this segment register. Because OSTaskStkInit_FPE_x86() normalizes the TOS, a pointer to the initial TOS is passed to this function so that it can be altered.

ppbos

is a pointer to the task’s bottom-of-stack (BOS) pointer (a pointer to a pointer). The task’s BOS is not passed to OSTaskCreate(); however, it is passed to OSTaskCreateExt(). In other words, ppbos is necessary for OSTaskCreateExt(). The bottom of this stack is generally not located at DS:0000 but instead, at some offset from the DS register. Because OSTaskStkInit_FPE_x86() normalizes the BOS, a pointer to the initial BOS is passed to this function so that it can be altered.

psize

is a pointer to a variable which contains the size of the stack. The stack size is not needed by OSTaskCreate() but it is for OSTaskCreateExt(). Because OSTaskStkInit_FPE_x86() reserves storage for the floating-point emulation variables, the available stack size is actually altered by this function which is why a pointer to the size is passed. You must ensure that you pass OSTaskStkInit_FPE_x86() a stack large enough to hold the floating-point emulation variables plus the anticipated stack space needed by your application task.
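The effect of ‘normalizing’ a stack can be illustrated with plain address arithmetic. The sketch below is not the port's OSTaskStkInit_FPE_x86(); it is a portable model under stated assumptions: the FarPtr and NormStack types and the function normalize_stack_model() are hypothetical, the size is expressed in bytes, tos is treated as the upper limit of the stack area, and 384 bytes are reserved for the emulation variables (the figure the text gives for the reduction reported by OSTaskStkChk()).

```c
#include <assert.h>
#include <stdint.h>

#define FPE_RESERVED_BYTES 384u  /* bytes set aside for the FPE variables */

/* Hypothetical stand-in for a real-mode far pointer (segment:offset). */
typedef struct {
    uint16_t seg;
    uint16_t off;
} FarPtr;

typedef struct {
    FarPtr   tos;     /* normalized top-of-stack                        */
    FarPtr   bos;     /* normalized bottom-of-stack (above FPE area)    */
    uint32_t bytes;   /* bytes left for the task after the reservation  */
} NormStack;

/* Re-express a stack so that it begins at offset 0x0000 of its own segment. */
static NormStack normalize_stack_model(FarPtr tos, uint32_t bytes)
{
    NormStack n;
    uint32_t  lin_tos = ((uint32_t)tos.seg << 4) + tos.off;
    uint32_t  lin_bos;
    uint32_t  span;

    /* The stack must hold the FPE area plus alignment slack, and must fit
     * in one 64Kb segment. */
    assert(bytes > FPE_RESERVED_BYTES + 15u && bytes <= 0xFFFFu);
    assert(lin_tos >= bytes);

    /* Round the lowest address up to a paragraph (16-byte) boundary so it
     * can be named by a segment value with a zero offset. */
    lin_bos = (lin_tos - bytes + 15u) & ~(uint32_t)15u;
    span    = lin_tos - lin_bos;

    n.tos.seg = (uint16_t)(lin_bos >> 4);
    n.tos.off = (uint16_t)span;
    n.bos.seg = n.tos.seg;
    n.bos.off = (uint16_t)FPE_RESERVED_BYTES;
    n.bytes   = span - FPE_RESERVED_BYTES;
    return n;
}
```

For example, a 1,024-byte stack whose limit is 2000:1234 normalizes to a TOS of 20E4:03F4 and a BOS of 20E4:0180, leaving 628 bytes for the task. Both pointers now share a segment whose offset 0x0000 belongs to this stack alone, which is what the emulation library expects.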

Figure 14.6 shows what OSTaskStkInit_FPE_x86() does. Note that paragraph alignment is not shown in Figure 14.6.

You would use OSTaskStkInit_FPE_x86() as shown in Listing 14.5 which contains an example with both OSTaskCreate() and OSTaskCreateExt(). The code shows that if your task is to do floating-point math, OSTaskStkInit_FPE_x86() MUST be called BEFORE calling either OSTaskCreate() or OSTaskCreateExt() in order to initialize the task’s stack as just described. The returned pointers (ptos and pbos) MUST be used in the task creation call. Note that pbos would be passed to OSTaskCreateExt() as the new bottom of stack. You should note that if you were to call OSTaskStkChk() (only if the task is created with OSTaskCreateExt()) to determine the size of the task’s stack at run-time, then OSTaskStkChk() would report that the stack contains 384 bytes less than its original size (see the AFTER case of Figure 14.6)!

You should be careful that your code doesn’t generate any floating-point exception (e.g., divide by zero) because the floating-point library would not work properly under these circumstances. Run-time exceptions can, however, be avoided by adding range testing code.

OSTaskCreateHook()

As previously mentioned, OS_CPU_C.C does not define code for this function. In other words, no additional work is done by the port when a task is created. The assignment of ptcb to ptcb is done so that the compiler doesn’t complain about OSTaskCreateHook() not doing anything with the argument.

OSTaskDelHook()

As previously mentioned, OS_CPU_C.C does not define code for this function. In other words, no additional work is done by the port when a task is deleted. The assignment of ptcb to ptcb is again done so that the compiler doesn’t complain about OSTaskDelHook() not doing anything with the argument.

OSTaskSwHook()

Again, OS_CPU_C.C doesn’t do anything in this function. You should note that I added the ‘skeleton’ of the code you would need if you were to actually do something in OSTaskSwHook().

OSTaskIdleHook()

Again, OS_CPU_C.C doesn’t do anything in this function.

OSTaskStatHook()

OS_CPU_C.C doesn’t do anything in this function. See Example 3 in Chapter 1 for an example of what you can do with this function.

OSTimeTickHook()

OS_CPU_C.C doesn’t do anything in this function either.

OSInitHookBegin()

OS_CPU_C.C doesn’t do anything in this function.

OSInitHookEnd()

OS_CPU_C.C doesn’t do anything in this function.

OSTCBInitHook()

OS_CPU_C.C doesn’t do anything in this function.

OS_CPU_A.ASM

A µC/OS-II port requires that you write four assembly language functions: OSStartHighRdy(), OSCtxSw(), OSIntCtxSw(), and OSTickISR().

OSStartHighRdy()

This function is called by OSStart() to start the highest priority task ready to run. However, before you can call OSStart(), you must have called OSInit() and then created at least one task [see OSTaskCreate() and OSTaskCreateExt()]. OSStart() sets up OSTCBHighRdy so that it points to the task control block of the task with the highest priority. Figure 14.7 shows the stack frame for an 80x86 real-mode task created by either OSTaskCreate() or OSTaskCreateExt() just before OSStart() calls OSStartHighRdy().

The code for OSStartHighRdy() is shown in Listing 14.15.

As seen in Figure 14.7, upon executing the IRET instruction, the stack pointer (SS:SP) points to the return address of the task and ‘looks’ as if the task was called by a normal function. SS:SP+4 points to the argument pdata, which is passed to the task. In other words, your task will not know whether it was called by OSStartHighRdy() or any other function!

OSCtxSw()

A task-level context switch is accomplished on the 80x86 processor by executing a software interrupt instruction. The interrupt service routine must vector to OSCtxSw(). The sequence of events that leads µC/OS-II to vector to OSCtxSw() begins when the current task calls a service provided by µC/OS-II, which causes a higher priority task to be ready to run. At the end of the service call, µC/OS-II calls the function OS_Sched(), which concludes that the current task is no longer the most important task to run. OS_Sched() loads the address of the OS_TCB of the highest priority task into OSTCBHighRdy, then executes the software interrupt instruction by invoking the macro OS_TASK_SW(). Note that the variable OSTCBCur already contains a pointer to the current task’s task control block, OS_TCB.

The code for OSCtxSw() is shown in Listing 14.16. Figure 14.8 shows the stack frames of the task being suspended and the task being resumed.

Note that interrupts are disabled during OSCtxSw() and also during execution of the user-definable function OSTaskSwHook().

OSIntCtxSw()

OSIntCtxSw() is called by OSIntExit() to perform a context switch from an ISR (Interrupt Service Routine). Because OSIntCtxSw() is called from an ISR, it is assumed that all the processor registers are already properly saved onto the interrupted task’s stack.

The code shown in Listing 14.17 is identical to OSCtxSw(), except for the fact that there is no need to save the registers (i.e., no PUSHA, PUSH ES, or PUSH DS) onto the stack because it is assumed that the beginning of the ISR has already done that. It is also assumed that the stack pointer is saved into the task’s OS_TCB by the ISR. Figure 14.9 shows the context switch process from OSIntCtxSw()’s point of view.
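The difference between the two context switch routines can be modeled in portable C. This is a sketch of the bookkeeping only, not the assembly language in the listings: the TcbModel type and the names tcb_cur, tcb_high_rdy, cpu_sp, ctx_sw_model(), and int_ctx_sw_model() are hypothetical stand-ins, and the register pushes, pops, and the OSTaskSwHook() call are reduced to comments.

```c
#include <stdint.h>

/* Hypothetical stand-in for the part of OS_TCB that matters here. */
typedef struct {
    uint16_t *stk_ptr;              /* saved top-of-stack of the task     */
} TcbModel;

static TcbModel *tcb_cur;           /* models OSTCBCur                    */
static TcbModel *tcb_high_rdy;      /* models OSTCBHighRdy                */
static uint16_t *cpu_sp;            /* models the CPU stack pointer       */

/* Task-level switch: the routine itself saves the outgoing context. */
static void ctx_sw_model(void)
{
    /* (registers of the outgoing task are pushed here)                   */
    tcb_cur->stk_ptr = cpu_sp;      /* save outgoing task's stack pointer */
    /* (the user hook OSTaskSwHook() runs here)                           */
    tcb_cur = tcb_high_rdy;         /* the new task becomes current       */
    cpu_sp  = tcb_cur->stk_ptr;     /* load the new task's stack pointer  */
    /* (registers of the new task are popped, then IRET)                  */
}

/* Interrupt-level switch: the ISR has already saved the registers and
 * the stack pointer, so only the second half is needed. */
static void int_ctx_sw_model(void)
{
    /* (the user hook OSTaskSwHook() runs here)                           */
    tcb_cur = tcb_high_rdy;
    cpu_sp  = tcb_cur->stk_ptr;
    /* (registers of the new task are popped, then IRET)                  */
}
```

In this model the only difference is the missing save step, which mirrors the statement that the interrupt-level routine relies on the ISR having saved the context beforehand.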

To understand the difference, let’s assume that the processor receives an interrupt. Let’s also suppose that interrupts are enabled. The processor completes the current instruction and initiates an interrupt handling procedure.

Your ISR then needs to either call OSIntEnter() or increment the global variable OSIntNesting by one. At this point, we can assume that the task is suspended and we could, if needed, switch to a different task.

The ISR can now start servicing the interrupting device and possibly, make a higher priority task ready. This occurs if the ISR sends a message to a task by calling either OSFlagPost(), OSMboxPost(), OSMboxPostOpt(), OSQPostFront(), OSQPost() or OSQPostOpt(). A higher priority task can also be resumed if the ISR calls OSTaskResume(), OSTimeTick() or OSTimeDlyResume().

Assume that a higher priority task is made ready to run by the ISR. µC/OS-II requires that an ISR calls OSIntExit() when it has finished servicing the interrupting device. OSIntExit() basically tells µC/OS-II that it’s time to return to task-level code if all nested interrupts have completed. In other words, when OSIntNesting is decremented to 0 by OSIntExit(), OSIntExit() would return to task-level code.

When OSIntExit() executes, it notices that the interrupted task is no longer the task that needs to run because a higher priority task is now ready. In this case, the pointer OSTCBHighRdy is made to point to the new task’s OS_TCB, and OSIntExit() calls OSIntCtxSw() to perform the context switch.

Note that interrupts are disabled during OSIntCtxSw() and also during execution of the user-definable function OSTaskSwHook().
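The nesting rule described above, where a switch happens only when the outermost ISR exits and a higher priority task is ready, can be modeled as follows. This is a simplified sketch, not µC/OS-II's OSIntEnter() or OSIntExit(): all names are hypothetical, priorities are plain numbers, scheduler locking is left out, and the context switch is reduced to a counter.

```c
#include <stdint.h>

static uint8_t  int_nesting;        /* models OSIntNesting                */
static uint8_t  prio_cur;           /* priority of the interrupted task   */
static uint8_t  prio_high_rdy;      /* highest priority ready to run      */
static unsigned ctx_sw_count;       /* counts modeled context switches    */

/* Models what an ISR does on entry. */
static void int_enter_model(void)
{
    int_nesting++;
}

/* Models what an ISR does on exit. */
static void int_exit_model(void)
{
    if (int_nesting > 0u) {
        int_nesting--;
    }
    /* Switch only when all nested ISRs are done and a different task is
     * now the most important one. */
    if (int_nesting == 0u && prio_high_rdy != prio_cur) {
        prio_cur = prio_high_rdy;   /* stands in for calling OSIntCtxSw() */
        ctx_sw_count++;
    }
}
```

With two nested interrupts, the inner exit leaves the nesting count at 1 and does nothing further; only the outer exit, which brings the count to 0, performs the switch.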

OSTickISR()

As mentioned in section 14.03.05, Tick Rate, the tick rate of an RTOS should be set between 10 and 100Hz. On the PC, the ticker occurs every 54.93ms (18.20648Hz) and is obtained by a hardware timer that interrupts the CPU. Recall that I reprogrammed the tick rate to 200Hz. The ticker on the PC is assigned to vector 0x08 but µC/OS-II redefines it so that it vectors to OSTickISR() instead. Because of this, the PC’s tick handler is saved [see PC.C, PC_DOSSaveReturn()] in vector 129 (0x81). To satisfy DOS, however, the PC’s handler is called every 54.93ms (described shortly). Figure 14.10 shows the contents of the interrupt vector table (IVT) before and after installing µC/OS-II.

With µC/OS-II, it is very important that you enable ticker interrupts after multitasking has started; that is, after calling OSStart(). In the case of the PC, however, ticker interrupts are already occurring before you actually execute your µC/OS-II application.

To prevent the ISR from invoking OSTickISR() until µC/OS-II is ready, do the following:

main():

Call OSInit() to initialize µC/OS-II.
Call PC_DOSSaveReturn() (see PC.C)
Call PC_VectSet() to install context switch vector OSCtxSw() at vector 0x80
Create at least one application task
Call OSStart() when you are ready to multitask

The first task to execute needs to:

Install OSTickISR() at vector 0x08
Change the tick rate from 18.20648 to 200Hz

The tick handler on the PC is somewhat tricky, so I will explain it using the pseudocode shown in Listing 14.18. This code would normally be written in assembly language.

The actual code for OSTickISR() is shown in Listing 14.19 for your reference. The numbers in Listing 14.19 correspond to the same items in Listing 14.18. You should note that the actual code in the file contains comments.

You can simplify OSTickISR() by not increasing the tick rate from 18.20648 to 200Hz, as shown in the pseudocode in Listing 14.20. The actual code is shown in Listing 14.21 and matches the same items from Listing 14.20. This code is included so that you can model your ISRs after it.

Note that you must not change the tick rate by calling PC_SetTickRate() if you are to use this version of the code. In other words, you must leave the tick rate alone. You also have to change the configuration constant OS_TICKS_PER_SEC (see OS_CFG.H) from 200 to 18. You should note that the tick rate is not actually 18 but 18.20648. You need to be aware of this, especially if you want to delay a task for 10 seconds. You would specify 10 * OS_TICKS_PER_SEC ticks and it would actually end up being only 9.8866 seconds!
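The 9.8866 second figure follows directly from the mismatch between the integer constant and the true tick rate. The short sketch below reproduces the arithmetic; the macro and function names are hypothetical and chosen only for this illustration.

```c
/* Integer value that OS_TICKS_PER_SEC would be set to in OS_CFG.H. */
#define TICKS_PER_SEC_CONFIGURED 18u

/* Rate at which the PC's ticker actually fires, in Hz. */
#define PC_TICK_RATE_HZ          18.20648

/* Hypothetical helper: how long a delay of 'seconds' really lasts when it
 * is requested as seconds * TICKS_PER_SEC_CONFIGURED ticks. */
static double actual_delay_seconds(unsigned seconds)
{
    unsigned ticks = seconds * TICKS_PER_SEC_CONFIGURED;

    return (double)ticks / PC_TICK_RATE_HZ;
}
```

A request for 10 seconds becomes 180 ticks, and 180 / 18.20648 is about 9.8866 seconds, so every delay computed this way comes out roughly 1.1 percent short.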

Memory Usage

Table 14.3 shows the amount of memory (both code and data space) used by µC/OS-II based on the value of configuration constants. Data in this case means RAM and code means ROM if µC/OS-II is used in an embedded system.

The spreadsheet is actually provided on the companion CD (\SOFTWARE\uCOS-II\Ix86L\BC45\DOC\80x86L-ROM-RAM.XLS). You need Microsoft Excel for Office 2000 (or higher) to use this file. The spreadsheet allows you to do “what-if” scenarios based on the options you select. You can change the configuration values (in RED) and see how they affect µC/OS-II’s ROM and RAM usage on the 80x86. For the ???_EN values, you MUST use either 0 or 1.

I set up the Borland compiler to generate the fastest code. The numbers of bytes shown are not meant to be accurate but are simply provided to give you a relative idea of how much code space each group of µC/OS-II services requires. For example, if you don’t need message queue services (OS_Q_EN is set to 0), then you will save between 1,900 and 2,200 bytes of code space.

The spreadsheet also shows you the difference in code size based on the value of OS_ARG_CHK_EN in your OS_CFG.H. You don’t need to change the value of OS_ARG_CHK_EN to see the difference.

The Data column is not as straightforward. Notice that the stacks for both the idle task and the statistics task have been set to 1,024 bytes (1Kb) each. Based on your own requirements, these numbers may be higher or lower. As a minimum, µC/OS-II requires about 3,500 bytes of RAM for µC/OS-II internal data structures if you configure the maximum number of tasks (62 application tasks).

Table 14.4 shows how µC/OS-II can scale down the amount of memory required with most of the services disabled. In this case, I allowed only 16 tasks with 20 priority levels (0 to 19). Notice that the Code space is now between 2,400 and 2,700 bytes and Data space for µC/OS-II internals is only about 500 bytes. However, just about the only service you can use in your tasks is OSTimeDly()!

If you use an 80x86 processor, you will most likely not be too restricted with memory and thus, µC/OS-II will most likely not be the largest user of memory.