Challenges of Benchmarking a Real-World Embedded Processor Strengthen Computer Architecture Concepts

by Kevin A. Kwiat, Ph.D., Air Force Research Laboratory, and Michael Macalik, Rome Research Corporation

In his article "Death of the Hardware Engineer: A Dirge for the Digital Designer" [1], Kevin Morris chronicles how the majority of embedded system functionality is no longer the creation of digital designers; instead, the hardware foundation they established long ago sinks from our view, buried beneath software layers that are the realm of the computer scientist. Empowered with the principle that hardware and software are logically equivalent [2], computer scientists inter digital design concepts in software libraries. Hardware design then becomes more like software design, but with the need to actually understand hardware drastically diminished. Morris eulogizes the digital designer in stating that "… a framework of ever-higher structures has been designed, refined, repeated and commoditized so that future re-design is mostly unnecessary."

We, the authors of this article, Dr. Kevin Kwiat and Mr. Michael Macalik, do not look despairingly at how computer science is changing the way embedded system functionality is created; instead, we view ourselves as willing contributors to the change. As an adjunct professor, Dr. Kwiat has taught computer science at Utica College of Syracuse University, Mohawk Valley Community College, the State University of New York (SUNY) Institute of Technology at Utica/Rome (SUNYIT), and Hamilton College. As part of an independent study, Mr. Macalik joined Dr. Kwiat as a teaching assistant for a computer architecture course at SUNYIT. The course's central theme, that hardware and software are logically equivalent, conveyed to the computer science students the message that embedded system design is changing. That message echoes Kevin Morris, who wrote that "Today, digital design bears little resemblance to what I learned in school twenty something years ago." Tasked with preparing students with the "...new type of education" that mirrors the changes in architecture and design, we are compelled not only to pay fitting tribute to what digital design once was but also to show that by honoring the past we are better able to explore the future.

Nearly twenty years ago, during the Strategic Defense Initiative (SDI) of the 1980s, the Air Force Research Laboratory, Information Directorate (AFRL/IF), then known as Rome Laboratory, was tasked with managing the program to design and develop a radiation-hardened 32-bit microprocessor. The challenges of space-borne processing called for a processor design that, in addition to being rad-hard, was fault-tolerant, testable, and capable of a high level of performance. Four contractors competed for a down-select to two contractors that would eventually implement their designs in silicon; however, it was understood that at the time of down-select all of the contractors would have only register-level simulation models of their processors. Design at this level meant that it would be possible to demonstrate fault tolerance and testability, but measuring the performance of experimental designs that had not advanced to the stage of having an operating system loomed as a problem. One could measure the rate of instruction execution, but without the I/O provided by an operating system, benchmark programs could merely execute without storing results. The absence of stored results called into question the meaningfulness of the benchmarks. Millions of Instructions Per Second (MIPS) could be measured by executing the benchmarks on the simulation models of the processors; however, were the benchmarks being completely and correctly executed? The answer came in the form of software and hardware being logically equivalent: use a digital structure that performs data compression. A Linear Feedback Shift Register (LFSR) implemented in the benchmark software reduced the results of each benchmark program to a single-word result that could be inspected to determine whether the benchmark executed completely and correctly [3].

Students were presented with this situation: they are part of a team designing an experimental microprocessor that implements the MIPS ISA for their company's next market entry. The company wants to be among the first to market a microprocessor created with the latest semiconductor technology, but limited internal R&D funding has held the design to a rudimentary stage in which its I/O is not fully developed. While the processor can easily generate its own inputs, the team must assume that, for output, there is no way to create a file. They face the constraint that the entire benchmark output be confined to registers. Because the benchmark programs will need as many registers as possible, only one register is reserved to capture the benchmark's output. Since they can output only one 8-bit result, they must resort to compression to confirm that the processor executed the program correctly.

Logical equivalence of hardware and software is the theme behind Andrew Tanenbaum's computer architecture textbook [2], which has been published in five editions over 30 years. For many of the students in our undergraduate computer architecture course, this book is their first exposure to computer hardware. They created, in MIPS [5] assembly language (MIPS in this case stands for Microprocessor without Interlocked Pipeline Stages, not to be confused with the above use of MIPS), computationally intensive benchmark programs with an LFSR embedded into the benchmarks' code. Although the project used the freeware MIPS assembly language simulator SPIM, the students were to assume that these "benchmark" programs would be run on an experimental architecture that implements the MIPS instruction set architecture. The students, therefore, were faced with the same problem as the earlier rad-hard microprocessor program: I/O was not yet supported, so a file containing results could not be created. The programs could quite easily create their own input, but the large amount of un-storable program output would have to be compressed into a one-word signature.

In addition to reinforcing the notion of hardware and software being logically equivalent, the benchmarking exercise exposed the students to a fundamental problem facing software engineers: the difficulty of inspecting a large amount of test data. The data generated by the benchmarks was too voluminous for human inspection, so the students had to rely upon their ability to program correctly in assembly language. Some students recognized an idea similar to hardware-software logical equivalence: two software programs created according to the same specification but implemented at different levels of abstraction are also logically equivalent. Students could therefore also write the benchmarks in a higher-order language and include the simulated LFSR at that level as well. Obtaining identical signatures from the assembly language version and the higher-order language version strengthened a student's confidence in the correctness of their benchmark.

The MIPS architecture was chosen as the platform for the project because it is well described in the chosen text and is commonly used in research, academia, and industry. The MIPS architecture was developed at Stanford University 20 years ago by engineering professor (now University President) John Hennessy. Its efficient architecture and predisposition to easy scalability have not only withstood the test of time, but rallied huge followings in the consumer and military electronics industries. MIPS is now synonymous with MIPS Technologies, Inc. [5]. Their technology can be found at the heart of a staggering list of electronic products. MIPS processors are an invisible yet integral part of everyday life. Coincidentally, a cell phone rang during the lecture as the MIPS processor was being introduced. Such would normally be an annoying interruption, but it segued to our point about the ubiquitous MIPS architecture. Broadcom [6], a leading manufacturer of chips for the telecommunications industry, uses the MIPS Technologies Rx000 series processor as their cell phone core.

The SPIM MIPS simulator was selected as the design environment. SPIM is a self-contained simulator that runs MIPS32 assembly language programs. It reads and executes assembly language programs written for the MIPS R2000/R3000 family of processors. The simulator provides a simple debugger, a view of register contents, and console output capability. SPIM is freely available for download from the University of Wisconsin Computer Science website [4].

The 8-bit LFSR was selected to demonstrate the concept of hardware implemented in software. With an LFSR of only 8 bits, students could readily verify its functionality manually and, to demonstrate software's flexibility vis-à-vis hardware's rigidity, they implemented the 8-bit version in MIPS assembly language, which deals with 32-bit words. Using an LFSR was not esoteric, and we cited examples of common use in industry. Xilinx Corporation offers a firmware specification for use in their XCx000 line of field programmable gate array (FPGA) devices [7]. They state that systems that use embedded diagnostic features can identify and isolate potential problems even before field personnel are assigned. This reduces equipment downtime and gives field personnel additional information that saves maintenance effort and money.

The underlying theory of the LFSR (represented mathematically by Galois theory) is not required for a functional understanding of the circuit. If properly done, the programs should yield results identical to those of the hardware-intensive version even though the means to those ends are completely different. The hardware version is illustrated in Figure 1 by an 8-bit shift register. In its simplest form the hardware version of the LFSR could be constructed with discrete logic components. A more contemporary hardware solution would be to use a reconfigurable device such as an FPGA with a firmware configuration that implements the equivalent logic circuit. This approach blurs the distinction between hardware and software even further. The hardware version is basically formed by an 8-bit shift register with exclusive OR (XOR) gates between the register cells. Each of the LFSR inputs is "wired" to an XOR. An additional exclusive OR gate is provided at the inputs that require a feedback tap, forming a three-input XOR (or 3-way XOR). Timing and control logic (not shown here) causes the data in the register to be shifted one bit to the left. Next, the 8-bit input is exclusive OR'ed with the register contents. If the most significant bit (MSB) of the register was a 1 before the shift, then a 1 is also XOR'ed into the register cells at the feedback taps. This operation is repeated for each input value.

The software version of the LFSR in Figure 2 works similarly. The data to be compressed are generated by incrementing a variable in a control loop until the desired number of LFSR inputs has been produced. The low-order 8 bits of a native 32-bit register are used as the 8-bit shift register. The shift left logical (SLL) instruction shifts the register one bit to the left. The input value generated by the counter is exclusive OR'ed with the value in the register. If the MSB was a 1 before the shift, the integer value 99 is also exclusive OR'ed with the signature value. The value 99 is the decimal equivalent of the binary value present at the feedback taps (01100011). The register containing the signature value is AND'ed with 255 to mask off the upper 24 bits, leaving the 8 bits of interest. Finally, a test is performed to check for the last of the data to be compressed.
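The sequence of operations just described can be sketched in a higher-order language. The following Python is a minimal sketch, not the course's actual code: the names `lfsr_step` and `compress` are hypothetical, the starting signature of zero is an assumption (the article does not state the initial value), and the counter range is only a stand-in for the benchmark's real data.

```python
TAPS = 99    # 0b01100011, the feedback-tap value given in the article
MASK = 255   # keep only the low 8 bits of the 32-bit register


def lfsr_step(signature, data):
    """Fold one input value into the 8-bit signature."""
    msb_set = signature > 127          # same test as "addi ..., -127" then "bgtz"
    signature = signature << 1         # sll: shift left one bit
    signature = signature ^ data       # xor with the input value
    if msb_set:
        signature = signature ^ TAPS   # xor with the feedback taps
    return signature & MASK            # and with 255: drop the upper bits


def compress(count, start=0):
    """Feed `count` consecutive counter values through the LFSR."""
    signature = 0   # assumed initial value; not stated in the article
    for value in range(start, start + count):
        signature = lfsr_step(signature, value)
    return signature


print(lfsr_step(0x80, 0))  # 99: MSB was set, so only the taps remain
```

Because the mask is applied at the end of every step, the signature always stays in the range 0 to 255, which is what makes the `signature > 127` test equivalent to checking bit 7.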

A subtle, yet significant, difference between the hardware and software versions lies within their internal operations. In the hardware version, the 3-way XOR'ing of each signature, data, and feedback bit and the storing of the resultant signature bit into a flip-flop occur within one indivisible step (i.e., a single clock cycle). The 3-way XOR operation in the software version is broken up into multiple steps: first, each signature bit is shifted, unchanged, into its destination flip-flop; second, the same signature bit is XOR'ed with the appropriate incoming bit of the data to be compressed; and third, if the feedback bit is a '1', then the tap bit is XOR'ed in as well.
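That the one-step and multi-step views produce the same signature can be checked exhaustively for an 8-bit register. The sketch below is illustrative only: `one_clock` and `three_steps` are hypothetical names, and the tap pattern is taken from the value 99 quoted for the software version. The first function computes every new bit as a single 3-way XOR of old values, as the flip-flops would in one clock cycle; the second follows the shift, XOR, conditional-XOR order of the software.

```python
TAPS = 0b01100011  # decimal 99, the feedback-tap value quoted in the article


def one_clock(sig, data):
    """Hardware view: each new bit is one 3-way XOR of old values."""
    msb = (sig >> 7) & 1                      # bit about to be shifted out
    new_sig = 0
    for i in range(8):
        shifted_in = (sig >> (i - 1)) & 1 if i > 0 else 0  # neighbour's old bit
        data_bit = (data >> i) & 1
        feedback = msb & ((TAPS >> i) & 1)    # non-zero only at tapped cells
        new_sig |= (shifted_in ^ data_bit ^ feedback) << i
    return new_sig


def three_steps(sig, data):
    """Software view: shift, then XOR the data, then conditionally XOR the taps."""
    msb_set = sig & 0x80
    sig = (sig << 1) & 0xFF
    sig ^= data
    if msb_set:
        sig ^= TAPS
    return sig


# Compare the two views for every 8-bit signature and every 8-bit input.
mismatches = sum(
    one_clock(s, d) != three_steps(s, d)
    for s in range(256)
    for d in range(256)
)
print(mismatches)  # 0
```

The check covers all 65,536 combinations of register contents and input byte, so for this tap pattern the ordering of the three XOR contributions does not affect the result.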

[Figure 1 labels: Feedback Taps, Data inputs, MSB, LSB]

Figure 1: Linear Feedback Shift Register (LFSR)

  .
  .
  .
  addi $t4, $t2, -127    # Create a test condition to check for a 1 in the MSB
  bgtz $t4, MSB_SET      # Jump to MSB_SET if MSB = 1
  sll  $t2, $t2, 1       # Shift signature value left 1 bit
  xor  $t2, $t2, $t1     # XOR signature with input variable
  j    Around            # MSB was 0, so skip the feedback XOR

MSB_SET:
  sll  $t2, $t2, 1       # Shift signature value left 1 bit
  xor  $t2, $t2, $t1     # XOR signature with input variable
  xor  $t2, $t2, $t3     # XOR feedback taps with signature
                         # ($t3 holds binary representation of feedback taps)
Around:
  li   $v0, 1            # Select the print-integer service (input value goes to screen)
  and  $t4, $t2, 255     # Mask off the upper 24 bits
  move $t2, $t4          # Keep the masked 8-bit signature in $t2
  move $a0, $t1          # Input value is the argument to print
  .
  .

Figure 2: MIPS LFSR Code

Some students noted that the LFSR code shown in Figure 2 was somewhat redundant: the instruction "sll $t2, $t2, 1" followed by "xor $t2, $t2, $t1" appears twice, and that need not be the case. However, having only a single instance of these two instructions in the LFSR code requires not only more branch instructions but also the reservation of another register. Although such concerns are invisible to programs written in a higher-order language, the departure of the LFSR code in Figure 2 from "best" programming practices is justified by the reduction in instruction count and the conservation of registers.

The assigned benchmarking problem created over sixty thousand data points to be compressed into a single signature. Since it would be highly impractical to derive the signature manually, the benchmark programs were written in two different languages, MIPS assembly and C++. Verifying that the identical results from these two programs matched those of the students gave us confidence that the student benchmark programs were indeed yielding the proper results.
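To illustrate why a one-word signature is enough to flag a faulty run, the sketch below compresses a long stream and then repeats the run with a single wrong bit in one data point. Everything specific here is an assumption made for illustration: `signature_of` is a hypothetical name, the count of 60,000 and the counter-style data stand in for the real benchmark output, and the starting signature is taken to be zero.

```python
def signature_of(data, taps=99):
    """Compress a sequence of values into one 8-bit signature."""
    sig = 0  # assumed starting value; not stated in the article
    for value in data:
        msb_set = sig & 0x80
        sig = ((sig << 1) ^ value ^ (taps if msb_set else 0)) & 0xFF
    return sig


POINTS = 60_000  # hypothetical; the article says only "over sixty thousand"
good_run = list(range(POINTS))

bad_run = list(good_run)
bad_run[12_345] ^= 1   # one wrong bit in one result

reference = signature_of(good_run)
print(reference == signature_of(good_run))  # True: the signature is repeatable
print(reference == signature_of(bad_run))   # False: the error reaches the signature
```

An 8-bit signature cannot distinguish every possible stream, so two different multi-error runs can in principle share a signature; the sketch shows only that an isolated error in one data point does not go unnoticed with this tap pattern.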

Benchmarks expose the potential performance of a digital design; however, benchmarking experimental microprocessors exposed the intrinsic value of knowing how to design hardware. The lesson was a tribute to digital designers: although software becomes layered upon hardware, hardware concepts can surprisingly resurface. Although the logical equivalence of hardware and software changes the way embedded system functionality is designed, the change is not absolute. By having to reach back to hardware's unshakable presence, the students, through their benchmarking efforts, experienced the subtleties that confront engineers who transition hardware to software. The fundamental computer architecture concept that hardware and software are logically equivalent could be treated merely as a platitude, but in our course it was put to practical use.

This academic form of "technology transfer" was not unidirectional: AFRL/IF's college recruiting includes the same schools where we teach computer science. Exposing potential new hires to the actual challenges of real-world benchmarking shows them that theory put to practice is a hallmark of the research and development carried out at AFRL/IF. For us, the dirge for the digital designer exists in theory, but, in practice, burial of the hardware engineer would be premature.

References

[1] Morris, Kevin, "Death of the Hardware Engineer: A Dirge for the Digital Designer," FPGA and Structured ASIC Journal, http://www.embeddedtechjournal.com/articles_2006/20060418_dirge.htm, TechFocus Media, April 18, 2006.

[2] Tanenbaum, Andrew S., Structured Computer Organization, 5th Edition, Prentice Hall, 2005.

[3] Kwiat, Kevin, Dussault, Heather, Debany, Warren, and Gorniak, Mark, "Benchmarking 32-bit Processors Through Simulation of Their Instruction Set Architectures," Proceedings, Government Microcircuit Applications Conference (GOMAC), November, 1990.

[4] Larus, James, Senior Researcher, Microsoft Research. Formerly Professor, Computer Sciences Dept., University of Wisconsin-Madison. http://www.cs.wisc.edu/~laurus/spim.html

[5] MIPS Technologies Inc., 1225 Charleston Road, Mountain View, CA 94043-1353

[6] Broadcom Corporation, 16215 Alton Parkway (Buildings A, B and C), Irvine, California 92618

[7] Embedded Instrumentation Using XC9500 CPLDs, Application Note XAPP076, Xilinx, January 1997.

by Kevin A. Kwiat, Ph.D., Senior Computer Engineer, Air Force Research Laboratory, and Michael Macalik, Systems Engineer, Rome Research Corporation

June 27, 2006
