What every computer scientist should know about floating-point arithmetic

submitted by
Style Pass
2024-10-22 12:00:09

In this paper, we describe the first hardware design of a combined binary and decimal floating-point multiplier, based on specifications in the IEEE 754-2008 Floating-point Standard. The multiplier design operates on either (1) 64-bit binary encoded ...

Increasing chip densities and transistor counts provide more room for designers to add functionality for important application domains into future microprocessors. As a result of rapid growth in financial, commercial, and Internet-based applications, ...

The title of this paper is appropriate. In fact, the words “user of computers for computation” could replace the words “computer scientist.”

Section 1, “Rounding Errors,” describes the details of rounding. Among the topics discussed are floating-point formats, relative error and ulps (units in the last place), guard digits, and cancellation. Goldberg discusses the relations among ulps, relative error, and machine epsilon, with examples illustrating “wobble” in ulps. (Recent correspondence over NA-NET was appropriately concerned with new measures of precision in floating-point arithmetic, which fits in here.) He points out the significance of guard digits. Catastrophic cancellation (between computed numbers) and benign cancellation (between exact numbers) are described, with examples showing how some of it can be avoided by rewriting formulas.

Section 2, “IEEE Standard,” describes the two standards: 754 (for binary) and 854 (for binary or decimal). Subtopics include formats and operations (including base and precision); special quantities such as NaNs and infinity; and exceptions, flags, and trap handlers. Not only are the standards described, but the reasons behind their design choices are discussed, with examples illustrating the need for them. This section is worth keeping in mind when considering the purchase of a new computer.

Section 3, “Systems Aspects,” discusses topics such as instruction sets, languages and compilers (ambiguity and optimizers), and exception handling. Section 4, “Details,” presents proofs of some of the statements made earlier (such as “a single guard digit is enough to guarantee that addition and subtraction will always be accurate”). Finally, in Section 5, “Summary,” the author reiterates that rigorous reasoning can be applied to floating-point algorithms, and that doing so is consistent with the underlying hardware and with efficient algorithms.
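The distinction between catastrophic and benign cancellation, and the trick of rewriting a formula to avoid it, can be made concrete with a short sketch (my own Python illustration, not code from the paper). For x² + bx + c with b² much larger than 4c, the textbook quadratic formula subtracts two nearly equal numbers when computing the smaller root, while an algebraically equivalent rewrite avoids the subtraction entirely:

```python
import math

# Illustrative demo of catastrophic cancellation (not code from the paper).
# Roots of x^2 + b*x + c with b^2 >> 4c: the smaller root is near -c/b.
b, c = 1e8, 1.0

disc = math.sqrt(b * b - 4 * c)             # very close to b
naive_small_root = (-b + disc) / 2          # subtracts nearly equal numbers
rewritten_small_root = -2 * c / (b + disc)  # algebraically equal, no cancellation

# The true smaller root is -1e-8 to within about one part in 1e16.
print(naive_small_root)      # roughly -7.5e-9: the leading digits are wrong
print(rewritten_small_root)  # -1e-8 to full double precision
```

The rewrite works because multiplying numerator and denominator by the conjugate −b − √(b² − 4c) replaces the ill-conditioned subtraction with a well-conditioned addition; this is exactly the kind of formula rewriting the review refers to.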
The paper contains an appendix in which several theorems are proved, including a highly accurate summation formula due to W. Kahan. Everyone who uses a computer to compute should read, or at least peruse, this paper. Many of the examples describe things we do without thinking about the possible consequences; we should at least be aware of what can happen.
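Kahan's summation formula, proved in the paper's appendix, can be sketched as follows (an illustrative Python version of the standard compensated-summation recurrence, not the paper's own notation). Each step recovers the low-order bits lost when a small addend is absorbed into a large running total:

```python
import math

def kahan_sum(values):
    """Compensated summation (W. Kahan): carry a running correction
    term so per-step rounding errors do not accumulate."""
    total = 0.0
    compensation = 0.0          # estimate of the bits lost so far
    for x in values:
        y = x - compensation    # restore the bits lost on the previous step
        t = total + y           # total is big, y is small: low bits of y are lost
        compensation = (t - total) - y  # recover exactly what was just lost
        total = t
    return total

# One million copies of 0.1: naive left-to-right summation drifts,
# while Kahan summation stays within a couple of ulps of the exact sum.
data = [0.1] * 1_000_000
print(sum(data))            # visibly off in the low digits
print(kahan_sum(data))      # agrees with math.fsum(data) to near machine precision
```

The error of compensated summation is bounded by about 2ε·Σ|xᵢ| independent of the number of terms, whereas the naive loop's worst-case error grows with the length of the list; `math.fsum` in the standard library provides an exactly rounded sum and serves as a convenient ground truth.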
