Computer System Architecture (컴퓨터시스템구조론), Stallings, translated by 김종현, 9th edition. Lecture slides prepared for “Computer Organization and Architecture”, 9/e, by William Stallings, Chapter 10 “Computer Arithmetic”.
Chapter 10: Computer Arithmetic. We begin our examination of the processor with an overview of the arithmetic and logic unit (ALU). The chapter then focuses on the most complex aspect of the ALU, computer arithmetic. The logic functions that are part of the ALU are described in Chapter 12, and implementations of simple logic and arithmetic functions in digital logic are described in Chapter 11. Computer arithmetic is commonly performed on two very different types of numbers: integer and floating point. In both cases, the representation chosen is a crucial design issue and is treated first, followed by a discussion of arithmetic operations. This chapter includes a number of examples, each of which is highlighted in a shaded box.
Arithmetic and Logic Unit (ALU). The ALU is that part of the computer that actually performs arithmetic and logical operations on data. All of the other elements of the computer system (control unit, registers, memory, I/O) are there mainly to bring data into the ALU for it to process and then to take the results back out. We have, in a sense, reached the core or essence of a computer when we consider the ALU. An ALU and, indeed, all electronic components in the computer are based on the use of simple digital logic devices that can store binary digits and perform simple Boolean logic operations.
ALU Inputs and Outputs. Figure 10.1 indicates, in general terms, how the ALU is interconnected with the rest of the processor. Operands for arithmetic and logic operations are presented to the ALU in registers, and the results of an operation are stored in registers. These registers are temporary storage locations within the processor that are connected by signal paths to the ALU (e.g., see Figure 2.3). The ALU may also set flags as the result of an operation. For example, an overflow flag is set to 1 if the result of a computation exceeds the length of the register into which it is to be stored. The flag values are also stored in registers within the processor. The processor provides signals that control the operation of the ALU and the movement of the data into and out of the ALU.
Integer Representation. In the binary number system, arbitrary numbers can be represented with just the digits zero and one, the minus sign (for negative numbers), and the period, or radix point (for numbers with a fractional component). For purposes of computer storage and processing, however, we do not have the benefit of special symbols for the minus sign and radix point. Only binary digits (0 and 1) may be used to represent numbers. If we are limited to nonnegative integers, the representation is straightforward.
Sign-Magnitude Representation. There are several alternative conventions used to represent negative as well as positive integers, all of which involve treating the most significant (leftmost) bit in the word as a sign bit. If the sign bit is 0, the number is positive; if the sign bit is 1, the number is negative. The simplest form of representation that employs a sign bit is the sign-magnitude representation. In an n-bit word, the rightmost n - 1 bits hold the magnitude of the integer. There are several drawbacks to sign-magnitude representation. One is that addition and subtraction require a consideration of both the signs of the numbers and their relative magnitudes to carry out the required operation. This should become clear in the discussion in Section 10.3. Another drawback is that there are two representations of 0: +0 = 00000000 and -0 = 10000000. This is inconvenient because it is slightly more difficult to test for 0 (an operation performed frequently on computers) than if there were a single representation. Because of these drawbacks, sign-magnitude representation is rarely used in implementing the integer portion of the ALU. Instead, the most common scheme is twos complement representation.
Twos Complement Representation. Like sign magnitude, twos complement representation uses the most significant bit as a sign bit, making it easy to test whether an integer is positive or negative. It differs from sign-magnitude representation in the way that the other bits are interpreted. Table 10.1 highlights key characteristics of twos complement representation and arithmetic, which are elaborated in this section and the next. Most treatments of twos complement representation focus on the rules for producing negative numbers, with no formal proof that the scheme is valid. Instead, our presentation of twos complement integers in this section and in Section 10.3 is based on [DATT93], which suggests that twos complement representation is best understood by defining it in terms of a weighted sum of bits, as we did previously for unsigned and sign-magnitude representations. The advantage of this treatment is that it does not leave any lingering doubt that the rules for arithmetic operations in twos complement notation may not work for some special cases. (Table 10.1 Key Characteristics of Twos Complement Representation and Arithmetic)
Table 10.2 Alternative Representations for 4-Bit Integers. Table 10.2 compares the sign-magnitude and twos complement representations for 4-bit integers. Although twos complement is an awkward representation from the human point of view, we will see that it facilitates the most important arithmetic operations, addition and subtraction. For this reason, it is almost universally used as the processor representation for integers.
Range Extension. It is sometimes desirable to take an n-bit integer and store it in m bits, where m > n. This expansion of bit length is referred to as range extension, because the range of numbers that can be expressed is extended by increasing the bit length. In sign-magnitude notation, this is easily accomplished: simply move the sign bit to the new leftmost position and fill in with zeros. This procedure will not work for twos complement negative integers. Instead, the rule for twos complement integers is to move the sign bit to the new leftmost position and fill in with copies of the sign bit. For positive numbers, fill in with zeros, and for negative numbers, fill in with ones. This is called sign extension.
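The sign-extension rule can be sketched in a few lines of Python; the function name and calling convention are illustrative, not from the text:

```python
def sign_extend(value_bits: int, n: int, m: int) -> int:
    """Extend an n-bit twos complement pattern to m bits (m > n)
    by copying the sign bit into the new leftmost positions."""
    sign = (value_bits >> (n - 1)) & 1
    if sign == 0:
        return value_bits                       # positive: fill with zeros
    # negative: fill the top (m - n) bit positions with ones
    return value_bits | (((1 << (m - n)) - 1) << n)

# -18 as an 8-bit pattern is 11101110 (0xEE); extended to 16 bits
# it becomes 1111111111101110 (0xFFEE), still -18.
```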
Fixed-Point Representation. Finally, we mention that the representations discussed in this section are sometimes referred to as fixed point. This is because the radix point (binary point) is fixed and assumed to be to the right of the rightmost digit. The programmer can use the same representation for binary fractions by scaling the numbers so that the binary point is implicitly positioned at some other location.
Negation. In sign-magnitude representation, the rule for forming the negation of an integer is simple: invert the sign bit. In twos complement notation, the negation of an integer can be formed with the following rules:
1. Take the Boolean complement of each bit of the integer (including the sign bit). That is, set each 1 to 0 and each 0 to 1.
2. Treating the result as an unsigned binary integer, add 1.
This two-step process is referred to as the twos complement operation, or the taking of the twos complement of an integer. For example:
+18 = 00010010 (twos complement)
bitwise complement = 11101101
+ 1
11101110 = -18
As expected, the negative of the negative of a number is the number itself:
-18 = 11101110 (twos complement)
bitwise complement = 00010001
+ 1
00010010 = +18
Negation Special Case 1.
0 = 00000000 (twos complement)
bitwise complement = 11111111
add 1 to the LSB: + 1
result: 100000000
There is a carry out of the most significant bit position, which is ignored. The result is that the negation of 0 is 0, as it should be: -0 = 0.
Negation Special Case 2.
-128 = 10000000 (twos complement)
bitwise complement = 01111111
add 1 to the LSB: + 1
result: 10000000 = -128
The second special case is more of a problem. If we take the negation of the bit pattern of 1 followed by n - 1 zeros, we get back the same number.
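The two-step negation rule, together with both special cases, can be sketched as follows (an illustrative helper; the name is ours):

```python
def twos_complement_negate(x: int, n: int = 8) -> int:
    """Negate an n-bit twos complement pattern: invert all bits,
    add 1, and discard any carry out of the top bit position."""
    mask = (1 << n) - 1
    return ((x ^ mask) + 1) & mask      # masking drops the ignored carry

# +18 = 00010010 -> 11101110 (-18)
# Special case 1: negating 00000000 yields 00000000 (carry discarded).
# Special case 2: negating 10000000 (-128) yields 10000000 again.
```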
Addition. Addition in twos complement is illustrated in Figure 10.3. Addition proceeds as if the two numbers were unsigned integers. The first four examples illustrate successful operations. If the result of the operation is positive, we get a positive number in twos complement form, which is the same as in unsigned-integer form. If the result of the operation is negative, we get a negative number in twos complement form. Note that, in some instances, there is a carry bit beyond the end of the word (indicated by shading), which is ignored.
Overflow. Overflow rule: if two numbers are added, and they are both positive or both negative, then overflow occurs if and only if the result has the opposite sign. On any addition, the result may be larger than can be held in the word size being used. This condition is called overflow. When overflow occurs, the ALU must signal this fact so that no attempt is made to use the result. Figures 10.3e and f show examples of overflow. Note that overflow can occur whether or not there is a carry.
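The overflow rule can be expressed directly in terms of the operands' sign bits; this sketch (our own naming) discards the carry out of the word, as the text describes:

```python
def add_with_overflow(a: int, b: int, n: int = 8):
    """Add two n-bit twos complement patterns; report overflow when
    both operands have the same sign but the result's sign differs."""
    mask = (1 << n) - 1
    s = (a + b) & mask                  # carry beyond bit n-1 is ignored
    sign = 1 << (n - 1)
    overflow = (a & sign) == (b & sign) and (s & sign) != (a & sign)
    return s, overflow

# 01000000 + 01000000 (64 + 64) -> 10000000 with overflow = True:
# two positives produced a pattern with the sign bit set.
```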
Subtraction. Subtraction rule: to subtract one number (the subtrahend) from another (the minuend), take the twos complement (negation) of the subtrahend and add it to the minuend.
Subtraction. Thus, subtraction is achieved using addition, as illustrated in Figure 10.4. The last two examples demonstrate that the overflow rule still applies.
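Subtraction via the twos complement of the subtrahend can be sketched as below; the helper is illustrative and folds in the overflow rule on the adder's two inputs:

```python
def subtract(minuend: int, subtrahend: int, n: int = 8):
    """Subtract n-bit twos complement patterns by adding the twos
    complement of the subtrahend; the addition overflow rule applies."""
    mask = (1 << n) - 1
    neg = ((subtrahend ^ mask) + 1) & mask   # twos complement of subtrahend
    s = (minuend + neg) & mask               # carry out of the word ignored
    sign = 1 << (n - 1)
    overflow = (minuend & sign) == (neg & sign) and (s & sign) != (neg & sign)
    return s, overflow

# 7 - 5: 00000111 + 11111011 -> 00000010 (= 2), no overflow.
# -128 - 1 overflows: the true result -129 does not fit in 8 bits.
```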
Geometric Depiction of Twos Complement Integers. Some insight into twos complement addition and subtraction can be gained by looking at a geometric depiction [BENH92], as shown in Figure 10.5. The circle in the upper half of each part of the figure is formed by selecting the appropriate segment of the number line and joining the endpoints. Note that when the numbers are laid out on a circle, the twos complement of any number is horizontally opposite that number (indicated by dashed horizontal lines). Starting at any number on the circle, we can add positive k (or subtract negative k) to that number by moving k positions clockwise, and we can subtract positive k (or add negative k) from that number by moving k positions counterclockwise. If an arithmetic operation results in traversal of the point where the endpoints are joined, an incorrect answer is given (overflow).
Hardware for Addition and Subtraction. Figure 10.6 suggests the data paths and hardware elements needed to accomplish addition and subtraction. The central element is a binary adder, which is presented two numbers for addition and produces a sum and an overflow indication. The binary adder treats the two numbers as unsigned integers. (A logic implementation of an adder is given in Chapter 11.) For addition, the two numbers are presented to the adder from two registers, designated in this case as A and B registers. The result may be stored in one of these registers or in a third. The overflow indication is stored in a 1-bit overflow flag (0 = no overflow; 1 = overflow). For subtraction, the subtrahend (B register) is passed through a twos complementer so that its twos complement is presented to the adder. Note that Figure 10.6 only shows the data paths. Control signals are needed to control whether or not the complementer is used, depending on whether the operation is addition or subtraction.
Multiplication. Compared with addition and subtraction, multiplication is a complex operation, whether performed in hardware or software. A wide variety of algorithms have been used in various computers. The purpose of this subsection is to give the reader some feel for the type of approach typically taken. We begin with the simpler problem of multiplying two unsigned (nonnegative) integers, and then we look at one of the most common techniques for multiplication of numbers in twos complement representation. Figure 10.7 illustrates the multiplication of unsigned binary integers, as might be carried out using paper and pencil. Several important observations can be made: 1. Multiplication involves the generation of partial products, one for each digit in the multiplier. These partial products are then summed to produce the final product. 2. The partial products are easily defined. When the multiplier bit is 0, the partial product is 0. When the multiplier bit is 1, the partial product is the multiplicand. 3. The total product is produced by summing the partial products. For this operation, each successive partial product is shifted one position to the left relative to the preceding partial product. 4. The multiplication of two n-bit binary integers results in a product of up to 2n bits in length (e.g., 11 × 11 = 1001). Compared with the pencil-and-paper approach, there are several things we can do to make computerized multiplication more efficient. First, we can perform a running addition on the partial products rather than waiting until the end. This eliminates the need for storage of all the partial products; fewer registers are needed. Second, we can save some time on the generation of partial products. For each 1 in the multiplier, an add and a shift operation are required; but for each 0, only a shift is required.
Hardware Implementation of Unsigned Binary Multiplication. Figure 10.8a shows a possible implementation employing these measures. The multiplier and multiplicand are loaded into two registers (Q and M). A third register, the A register, is also needed and is initially set to 0. There is also a 1-bit C register, initialized to 0, which holds a potential carry bit resulting from addition.
Flowchart for Unsigned Binary Multiplication. The operation of the multiplier is as follows. Control logic reads the bits of the multiplier one at a time. If Q0 is 1, then the multiplicand is added to the A register and the result is stored in the A register, with the C bit used for overflow. Then all of the bits of the C, A, and Q registers are shifted to the right one bit, so that the C bit goes into An-1, A0 goes into Qn-1, and Q0 is lost. If Q0 is 0, then no addition is performed, just the shift. This process is repeated for each bit of the original multiplier. The resulting 2n-bit product is contained in the A and Q registers. A flowchart of the operation is shown in Figure 10.9, and an example is given in Figure 10.8b. Note that on the second cycle, when the multiplier bit is 0, there is no add operation.
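The C, A, Q register procedure just described can be modeled in Python; the register names follow the text, but the code itself is an illustrative sketch:

```python
def unsigned_multiply(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Shift-and-add multiplication mirroring the C, A, Q register scheme.
    A and C start at 0; Q holds the multiplier, M the multiplicand."""
    mask = (1 << n) - 1
    c, a, q, m = 0, 0, multiplier & mask, multiplicand & mask
    for _ in range(n):
        if q & 1:                        # Q0 = 1: A <- A + M, carry into C
            total = a + m
            a = total & mask
            c = total >> n
        # shift C, A, Q right one bit: C -> A(n-1), A0 -> Q(n-1); Q0 is lost
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                  # the 2n-bit product is in A and Q
```

For example, `unsigned_multiply(11, 13)` reproduces the 1011 × 1101 computation of Figure 10.8b, leaving 10001111 (143) in A and Q.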
Twos Complement Multiplication. Figure 10.10 recasts Figure 10.7 to make the generation of partial products by multiplication explicit. The only difference in Figure 10.10 is that it recognizes that the partial products should be viewed as 2n-bit numbers generated from the n-bit multiplicand.
Comparison. Now we can demonstrate that straightforward multiplication will not work if the multiplicand is negative. The problem is that each contribution of the negative multiplicand as a partial product must be a negative number on a 2n-bit field; the sign bits of the partial products must line up. This is demonstrated in Figure 10.11, which shows the multiplication of 1001 by 0011. If these are treated as unsigned integers, the multiplication of 9 * 3 = 27 proceeds simply. However, if 1001 is interpreted as the twos complement value -7, then each partial product must be a negative twos complement number of 2n (8) bits, as shown in Figure 10.11b. Note that this is accomplished by padding out each partial product to the left with binary 1s.
Booth's Algorithm. Booth's algorithm is depicted in Figure 10.12 and can be described as follows. As before, the multiplier and multiplicand are placed in the Q and M registers, respectively. There is also a 1-bit register placed logically to the right of the least significant bit (Q0) of the Q register and designated Q-1; its use is explained shortly. The results of the multiplication will appear in the A and Q registers. A and Q-1 are initialized to 0. As before, control logic scans the bits of the multiplier one at a time. Now, as each bit is examined, the bit to its right is also examined. If the two bits are the same (1–1 or 0–0), then all of the bits of the A, Q, and Q-1 registers are shifted to the right 1 bit. If the two bits differ, then the multiplicand is added to or subtracted from the A register, depending on whether the two bits are 0–1 or 1–0. Following the addition or subtraction, the right shift occurs. In either case, the right shift is such that the leftmost bit of A, namely An-1, not only is shifted into An-2, but also remains in An-1. This is required to preserve the sign of the number in A and Q. It is known as an arithmetic shift, because it preserves the sign bit.
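Booth's algorithm as described can be sketched as follows; the final decoding of the 2n-bit result into a signed Python integer is our addition for readability:

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Booth's algorithm on n-bit twos complement operands, mirroring
    the A, Q, Q-1 register scheme with arithmetic right shifts."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    a, q, q_1 = 0, multiplier & mask, 0
    for _ in range(n):
        pair = (q & 1, q_1)
        if pair == (1, 0):               # Q0, Q-1 = 1-0: A <- A - M
            a = (a - m) & mask
        elif pair == (0, 1):             # Q0, Q-1 = 0-1: A <- A + M
            a = (a + m) & mask
        # arithmetic right shift of A, Q, Q-1 (A's sign bit is preserved)
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))
    raw = (a << n) | q                   # 2n-bit twos complement product
    return raw - (1 << 2 * n) if raw >> (2 * n - 1) else raw
```

Running `booth_multiply(7, 3)` walks through the same cycles as Figure 10.13, and the algorithm works for any combination of operand signs.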
Example of Booth's Algorithm. Figure 10.13 shows the sequence of events in Booth's algorithm for the multiplication of 7 by 3.
Examples Using Booth's Algorithm. More compactly, the same operation is depicted in Figure 10.14a. The rest of Figure 10.14 gives other examples of the algorithm. As can be seen, it works with any combination of positive and negative numbers. Note also the efficiency of the algorithm. Blocks of 1s or 0s are skipped over, with an average of only one addition or subtraction per block.
Division. Division is somewhat more complex than multiplication but is based on the same general principles. As before, the basis for the algorithm is the paper-and-pencil approach, and the operation involves repetitive shifting and addition or subtraction. Figure 10.15 shows an example of the long division of unsigned binary integers. It is instructive to describe the process in detail. First, the bits of the dividend are examined from left to right, until the set of bits examined represents a number greater than or equal to the divisor; this is referred to as the divisor being able to divide the number. Until this event occurs, 0s are placed in the quotient from left to right. When the event occurs, a 1 is placed in the quotient and the divisor is subtracted from the partial dividend. The result is referred to as a partial remainder. From this point on, the division follows a cyclic pattern. At each cycle, additional bits from the dividend are appended to the partial remainder until the result is greater than or equal to the divisor. As before, the divisor is subtracted from this number to produce a new partial remainder. The process continues until all the bits of the dividend are exhausted.
Flowchart for Unsigned Binary Division. Figure 10.16 shows a machine algorithm that corresponds to the long division process.
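The machine algorithm (restoring division on unsigned operands) can be sketched as follows; the A, Q, M register names follow the flowchart, while the function itself is illustrative:

```python
def unsigned_divide(dividend: int, divisor: int, n: int = 4):
    """Restoring division of n-bit unsigned integers: shift A and Q left,
    trial-subtract the divisor from A, and restore if A went negative."""
    a, q, m = 0, dividend & ((1 << n) - 1), divisor
    for _ in range(n):
        a = (a << 1) | (q >> (n - 1))    # shift A, Q left as one unit
        q = (q << 1) & ((1 << n) - 1)
        a -= m                           # trial subtraction
        if a < 0:
            a += m                       # restore; quotient bit stays 0
        else:
            q |= 1                       # subtraction succeeded; bit is 1
    return q, a                          # (quotient, remainder)

# 7 / 3 -> quotient 2, remainder 1
```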
Example of Restoring Twos Complement Division.
Floating-Point Representation: Principles. With a fixed-point notation (e.g., twos complement) it is possible to represent a range of positive and negative integers centered on or near 0. By assuming a fixed binary or radix point, this format allows the representation of numbers with a fractional component as well. This approach has limitations. Very large numbers cannot be represented, nor can very small fractions. Furthermore, the fractional part of the quotient in a division of two large numbers could be lost.
Typical 32-Bit Floating-Point Format. The principles used in representing binary floating-point numbers are best explained with an example. Figure 10.18a shows a typical 32-bit floating-point format. The leftmost bit stores the sign of the number (0 = positive, 1 = negative). The exponent value is stored in the next 8 bits. The representation used is known as a biased representation. A fixed value, called the bias, is subtracted from the field to get the true exponent value. Typically, the bias equals (2^(k-1) - 1), where k is the number of bits in the binary exponent. In this case, the 8-bit field yields the numbers 0 through 255. With a bias of 127 (2^7 - 1), the true exponent values are in the range -127 to +128. In this example, the base is assumed to be 2.
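The biased exponent encoding can be illustrated with a pair of one-line helpers (names are ours):

```python
# Biased exponent encoding for an 8-bit field with bias 127 (2**7 - 1).
BIAS = 127

def encode_exponent(true_exp: int) -> int:
    """Store a true exponent as an unsigned 8-bit biased field."""
    return true_exp + BIAS

def decode_exponent(field: int) -> int:
    """Recover the true exponent from the biased field."""
    return field - BIAS

# A true exponent of 0 is stored as 01111111 (127);
# a true exponent of -126 is stored as 00000001 (1).
```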
Floating-Point Significand. Table 10.2 shows the biased representation for 4-bit integers. Note that when the bits of a biased representation are treated as unsigned integers, the relative magnitudes of the numbers do not change. For example, in both biased and unsigned representations, the largest number is 1111 and the smallest number is 0000. This is not true of sign-magnitude or twos complement representation. An advantage of biased representation is that nonnegative floating-point numbers can be treated as integers for comparison purposes. The final portion of the word (23 bits in this case) is the significand. Any floating-point number can be expressed in many ways; for example, the following representations are equivalent, with the significand expressed in binary form: 0.110 × 2^5 = 110 × 2^2 = 0.0110 × 2^6. To simplify operations on floating-point numbers, it is typically required that they be normalized. A normal number is one in which the most significant digit of the significand is nonzero. For base 2 representation, a normal number is therefore one in which the most significant bit of the significand is one.
Normalized number: one in which the most significant digit of the significand is nonzero, expressed as ±1.bbb...b × 2^(±E), where each b is one of the binary digits 0 or 1.
Expressible Numbers. Figure 10.19 indicates the range of numbers that can be represented in a 32-bit word. Using twos complement integer representation, all of the integers from -2^31 to 2^31 - 1 can be represented, for a total of 2^32 different numbers. With the example floating-point format of Figure 10.18, the following ranges of numbers are possible: • Negative numbers between -(2 - 2^-23) × 2^128 and -2^-127 • Positive numbers between 2^-127 and (2 - 2^-23) × 2^128 Five regions on the number line are not included in these ranges: • Negative numbers less than -(2 - 2^-23) × 2^128, called negative overflow • Negative numbers greater than -2^-127, called negative underflow • Zero • Positive numbers less than 2^-127, called positive underflow • Positive numbers greater than (2 - 2^-23) × 2^128, called positive overflow The representation as presented will not accommodate a value of 0. However, as we shall see, actual floating-point representations include a special bit pattern to designate zero. Overflow occurs when an arithmetic operation results in an absolute value greater than can be expressed with an exponent of 128 (e.g., 2^120 × 2^100 = 2^220). Underflow occurs when the fractional magnitude is too small (e.g., 2^-120 × 2^-100 = 2^-220). Underflow is a less serious problem because the result can generally be satisfactorily approximated by 0. It is important to note that we are not representing more individual values with floating-point notation. The maximum number of different values that can be represented with 32 bits is still 2^32. What we have done is to spread those numbers out in two ranges, one positive and one negative. In practice, most floating-point numbers that one would wish to represent are represented only approximately. However, for moderately sized integers, the representation is exact.
Density of Floating-Point Numbers. Also, note that the numbers represented in floating-point notation are not spaced evenly along the number line, as are fixed-point numbers. The possible values get closer together near the origin and farther apart as you move away, as shown in Figure 10.20. This is one of the trade-offs of floating-point math: many calculations produce results that are not exact and have to be rounded to the nearest value that the notation can represent.
IEEE Standard 754. The most important floating-point representation is defined in IEEE Standard 754, adopted in 1985 and revised in 2008. This standard was developed to facilitate the portability of programs from one processor to another and to encourage the development of sophisticated, numerically oriented programs. The standard has been widely adopted and is used on virtually all contemporary processors and arithmetic coprocessors. IEEE 754-2008 covers both binary and decimal floating-point representations. In this chapter, we deal only with binary representations.
IEEE 754-2008. IEEE 754-2008 defines the following different types of floating-point formats: • Arithmetic format: All the mandatory operations defined by the standard are supported by the format. The format may be used to represent floating-point operands or results for the operations described in the standard. • Basic format: This format covers five floating-point representations, three binary and two decimal, whose encodings are specified by the standard, and which can be used for arithmetic. At least one of the basic formats is implemented in any conforming implementation. • Interchange format: A fully specified, fixed-length binary encoding that allows data interchange between different platforms and that can be used for storage.
IEEE 754 Formats The three basic binary formats have bit lengths of 32, 64, and 128 bits, with exponents of 8, 11, and 15 bits, respectively (Figure 10.21).
Table 10.3 IEEE 754 Format Parameters. Table 10.3 summarizes the characteristics of the three formats. The two basic decimal formats have bit lengths of 64 and 128 bits. All of the basic formats are also arithmetic format types (can be used for arithmetic operations) and interchange format types (platform independent). * not including implied bit and not including sign bit
Additional Formats. In addition, the standard defines extended precision formats, which extend a supported basic format by providing additional bits in the exponent (extended range) and in the significand (extended precision). The exact format is implementation dependent, but the standard places certain constraints on the length of the exponent and significand. These formats are arithmetic format types but not interchange format types. The extended formats are to be used for intermediate calculations. With their greater precision, the extended formats lessen the chance of a final result that has been contaminated by excessive round-off error; with their greater range, they also lessen the chance of an intermediate overflow aborting a computation whose final result would have been representable in a basic format. An additional motivation for the extended format is that it affords some of the benefits of a larger basic format without incurring the time penalty usually associated with higher precision. Finally, IEEE 754-2008 defines an extendable precision format as a format with a precision and range that are defined under user control. Again, these formats may be used for intermediate calculations, but the standard places no constraint on format or length.
Table 10.4 IEEE Formats. Table 10.4 shows the relationship between defined formats and format types.
Interpretation of IEEE 754 Floating-Point Numbers: (a) binary32 format. Not all bit patterns in the IEEE formats are interpreted in the usual way; instead, some bit patterns are used to represent special values. Table 10.5 indicates the values assigned to various bit patterns. The exponent values of all zeros (0 bits) and all ones (1 bits) define special values. The following classes of numbers are represented: • For exponent values in the range of 1 through 254 for the 32-bit format, 1 through 2046 for the 64-bit format, and 1 through 32766 for the 128-bit format, normal nonzero floating-point numbers are represented. The exponent is biased, so that the range of exponents is -126 through +127 for the 32-bit format, and so on. A normal number requires a 1 bit to the left of the binary point; this bit is implied, giving an effective 24-bit, 53-bit, or 113-bit significand. Because one of the bits is implied, the corresponding field in the binary format is referred to as the trailing significand field. • An exponent of zero together with a fraction of zero represents positive or negative zero, depending on the sign bit. As was mentioned, it is useful to have an exact value of 0 represented. • An exponent of all ones together with a fraction of zero represents positive or negative infinity, depending on the sign bit. It is also useful to have a representation of infinity. This leaves it up to the user to decide whether to treat overflow as an error condition or to carry the value and proceed with whatever program is being executed. • An exponent of zero together with a nonzero fraction represents a subnormal number. In this case, the bit to the left of the binary point is zero and the true exponent is -126 or -1022. The number is positive or negative depending on the sign bit. • An exponent of all ones together with a nonzero fraction is given the value NaN, which means Not a Number, and is used to signal various exception conditions. The significance of subnormal numbers and NaNs is discussed in Section 10.5.
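The special bit patterns above can be classified mechanically from the exponent and fraction fields; this binary32 sketch uses our own function name and field constants:

```python
def classify_binary32(bits: int) -> str:
    """Classify a 32-bit IEEE 754 pattern by its exponent and
    trailing-significand (fraction) fields."""
    exp = (bits >> 23) & 0xFF            # 8-bit biased exponent field
    frac = bits & 0x7FFFFF               # 23-bit fraction field
    if exp == 0:
        return "zero" if frac == 0 else "subnormal"
    if exp == 0xFF:
        return "infinity" if frac == 0 else "NaN"
    return "normal"                      # exponent in 1..254

# 0x3F800000 (1.0) -> "normal"; 0x7F800000 -> "infinity";
# 0x00000001 -> "subnormal"; 0x7FC00000 -> "NaN"
```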
Table 10.5 Interpretation of IEEE 754 Floating-Point Numbers (page 1 of 3)
Interpretation of IEEE 754 Floating-Point Numbers: (b) binary64 format. Table 10.5 Interpretation of IEEE 754 Floating-Point Numbers (page 2 of 3), continued.
Interpretation of IEEE 754 Floating-Point Numbers: (c) binary128 format. Table 10.5 Interpretation of IEEE 754 Floating-Point Numbers (page 3 of 3), continued.
Table 10.6 Floating-Point Numbers and Arithmetic Operations. Table 10.6 summarizes the basic operations for floating-point arithmetic. For addition and subtraction, it is necessary to ensure that both operands have the same exponent value. This may require shifting the radix point on one of the operands to achieve alignment. Multiplication and division are more straightforward. A floating-point operation may produce one of these conditions: • Exponent overflow: A positive exponent exceeds the maximum possible exponent value. In some systems, this may be designated as +∞ or -∞. • Exponent underflow: A negative exponent is less than the minimum possible exponent value (e.g., -200 is less than -127). This means that the number is too small to be represented, and it may be reported as 0. • Significand underflow: In the process of aligning significands, digits may flow off the right end of the significand. As we shall discuss, some form of rounding is required. • Significand overflow: The addition of two significands of the same sign may result in a carry out of the most significant bit. This can be fixed by realignment, as we shall explain.
Floating-Point Addition and Subtraction. In floating-point arithmetic, addition and subtraction are more complex than multiplication and division. This is because of the need for alignment. There are four basic phases of the algorithm for addition and subtraction:
1. Check for zeros.
2. Align the significands.
3. Add or subtract the significands.
4. Normalize the result.
A typical flowchart is shown in Figure 10.22. A step-by-step narrative highlights the main functions required for floating-point addition and subtraction. We assume a format similar to those of Figure 10.21. For the addition or subtraction operation, the two operands must be transferred to registers that will be used by the ALU. If the floating-point format includes an implicit significand bit, that bit must be made explicit for the operation. Phase 1: Zero check. Because addition and subtraction are identical except for a sign change, the process begins by changing the sign of the subtrahend if it is a subtract operation. Next, if either operand is 0, the other is reported as the result. Phase 2: Significand alignment. The next phase is to manipulate the numbers so that the two exponents are equal. Alignment may be achieved by shifting either the smaller number to the right (increasing its exponent) or shifting the larger number to the left. Because either operation may result in the loss of digits, it is the smaller number that is shifted; any digits that are lost are therefore of relatively small significance. The alignment is achieved by repeatedly shifting the magnitude portion of the significand right 1 digit and incrementing the exponent until the two exponents are equal. (Note that if the implied base is 16, a shift of 1 digit is a shift of 4 bits.) If this process results in a 0 value for the significand, then the other number is reported as the result. Thus, if two numbers have exponents that differ significantly, the lesser number is lost. Phase 3: Addition.
Next, the two significands are added together, taking into account their signs. Because the signs may differ, the result may be 0. There is also the possibility of significand overflow by 1 digit. If so, the significand of the result is shifted right and the exponent is incremented. An exponent overflow could occur as a result; this would be reported and the operation halted. Phase 4: Normalization. The final phase normalizes the result. Normalization consists of shifting significand digits left until the most significant digit (bit, or 4 bits for base-16 exponent) is nonzero. Each shift causes a decrement of the exponent and thus could cause an exponent underflow. Finally, the result must be rounded off and then reported. We defer a discussion of rounding until after a discussion of multiplication and division.
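The loss described in Phase 2 — when exponents differ by more than the significand width, the lesser operand is shifted away entirely — can be observed directly in any IEEE 754 implementation. A minimal check using Python's binary64 floats (53-bit significand):

```python
# binary64 carries a 53-bit significand, so adding 1.0 (= 2**0) to 2**53
# requires shifting 1.0 right by 53 digits during alignment: every
# significand bit is shifted out and the lesser operand is lost.
big = 2.0 ** 53
assert big + 1.0 == big            # the addend vanished entirely
assert big + 2.0 > big             # 2.0 = 2**1 needs only 52 shifts, so it survives
assert (2.0 ** 52) + 1.0 > 2.0 ** 52   # likewise with one fewer shift
```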
Floating-Point Multiplication. Floating-point multiplication and division are much simpler processes than addition and subtraction, as the following discussion indicates. We first consider multiplication, illustrated in Figure 10.23. First, if either operand is 0, 0 is reported as the result. The next step is to add the exponents. If the exponents are stored in biased form, the exponent sum would have doubled the bias. Thus, the bias value must be subtracted from the sum. The result could be either an exponent overflow or underflow, which would be reported, ending the algorithm. If the exponent of the product is within the proper range, the next step is to multiply the significands, taking into account their signs. The multiplication is performed in the same way as for integers. In this case, we are dealing with a sign-magnitude representation, but the details are similar to those for twos complement representation. The product will be double the length of the multiplier and multiplicand. The extra bits will be lost during rounding. After the product is calculated, the result is then normalized and rounded, as was done for addition and subtraction. Note that normalization could result in exponent underflow.
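The double-bias adjustment can be sketched numerically. In the following, `product_biased_exponent` and `BIAS32` are illustrative names, assuming the binary32 bias of 127:

```python
BIAS32 = 127  # binary32 exponent bias

def product_biased_exponent(e1: int, e2: int) -> int:
    """Biased exponent of a product: the raw sum of two biased exponents
    carries the bias twice, so subtract it once to restore biased form."""
    return e1 + e2 - BIAS32

# 2**3 * 2**4 = 2**7: biased fields 130 and 131 must yield biased 134
assert product_biased_exponent(127 + 3, 127 + 4) == 127 + 7
```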
Floating-Point Division. Finally, let us consider the flowchart for division depicted in Figure 10.24. Again, the first step is testing for 0. If the divisor is 0, an error report is issued, or the result is set to infinity, depending on the implementation. A dividend of 0 results in 0. Next, the divisor exponent is subtracted from the dividend exponent. This removes the bias, which must be added back in. Tests are then made for exponent underflow or overflow. The next step is to divide the significands. This is followed with the usual normalization and rounding.
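The mirror-image bias adjustment for division can be sketched the same way; again `quotient_biased_exponent` and `BIAS32` are illustrative names assuming the binary32 bias:

```python
BIAS32 = 127  # binary32 exponent bias

def quotient_biased_exponent(e_dividend: int, e_divisor: int) -> int:
    """Subtracting two biased exponents cancels the bias entirely,
    so add it back once to restore biased form."""
    return e_dividend - e_divisor + BIAS32

# 2**7 / 2**3 = 2**4: biased fields 134 and 130 must yield biased 131
assert quotient_biased_exponent(127 + 7, 127 + 3) == 127 + 4
```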
Precision Considerations: Guard Bits. We mentioned that, prior to a floating-point operation, the exponent and significand of each operand are loaded into ALU registers. In the case of the significand, the length of the register is almost always greater than the length of the significand plus an implied bit. The register contains additional bits, called guard bits, which are used to pad out the right end of the significand with 0s.
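Why guard bits matter shows up most clearly when subtracting nearly equal numbers. Below is a simplified decimal simulation of the classic guard-digit example (the 4-digit format and function name are illustrative; it assumes the first operand has the larger exponent):

```python
def subtract_4digit(x_sig, x_exp, y_sig, y_exp, guard_digits=0):
    """Subtract two decimal floating-point numbers whose significands
    hold 4 digits (d.ddd stored as an integer), keeping `guard_digits`
    extra digits on the right during alignment. Returns a Python float.
    Assumes x_exp >= y_exp."""
    x = x_sig * 10 ** guard_digits      # pad the right end with 0s
    y = y_sig * 10 ** guard_digits
    while y_exp < x_exp:                # align the smaller operand
        y //= 10                        # digits shifted off the end are lost
        y_exp += 1
    # the significand is scaled as d.ddd..., so the value is
    # (x - y) * 10**(x_exp - 3 - guard_digits)
    return (x - y) * 10.0 ** (x_exp - 3 - guard_digits)

# 1.000 x 10^1 minus 9.999 x 10^0; the exact answer is 0.001
no_guard = subtract_4digit(1000, 1, 9999, 0, guard_digits=0)   # loses a digit
one_guard = subtract_4digit(1000, 1, 9999, 0, guard_digits=1)  # keeps it
```

Without a guard digit the trailing 9 of 9.999 is shifted away during alignment and the result comes out as 0.01, ten times the true answer; a single guard digit recovers 0.001.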
Precision Considerations: Rounding. The approaches listed in the IEEE standard: round to nearest (the result is rounded to the nearest representable number); round toward +∞ (the result is rounded up toward positive infinity); round toward -∞ (the result is rounded down toward negative infinity); round toward 0 (the result is rounded toward zero). Another detail that affects the precision of the result is the rounding policy. The result of any operation on the significands is generally stored in a longer register. When the result is put back into the floating-point format, the extra bits must be eliminated in such a way as to produce a result that is close to the exact result. This process is called rounding. A number of techniques have been explored for performing rounding. In fact, the IEEE standard lists four alternative approaches:
• Round to nearest: The result is rounded to the nearest representable number.
• Round toward +∞: The result is rounded up toward plus infinity.
• Round toward -∞: The result is rounded down toward negative infinity.
• Round toward 0: The result is rounded toward zero.
Let us consider each of these policies in turn. Round to nearest is the default rounding mode listed in the standard and is defined as follows: The representable value nearest to the infinitely precise result shall be delivered. The standard also addresses the special case of extra bits of the form 10000.… Here the result is exactly halfway between the two possible representable values. One possible technique here would be to always truncate, as this would be the simplest operation. However, the difficulty with this simple approach is that it introduces a small but cumulative bias into a sequence of computations. What is required is an unbiased method of rounding. One possible approach would be to round up or down on the basis of a random number so that, on average, the result would be unbiased. The argument against this approach is that it does not produce predictable, deterministic results.
The approach taken by the IEEE standard is to force the result to be even: If the result of a computation is exactly midway between two representable numbers, the value is rounded up if the last representable bit is currently 1 and not rounded up if it is currently 0.
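The round-half-to-even rule can be observed directly in Python, whose built-in round() uses it; the halves below are exactly representable in binary64, so the behavior is deterministic:

```python
# IEEE round-half-to-even: a tie goes to the neighbor whose last digit is even.
assert round(0.5) == 0   # midway between 0 and 1: 0 is even, round down
assert round(1.5) == 2   # midway between 1 and 2: 2 is even, round up
assert round(2.5) == 2   # midway between 2 and 3: 2 is even, round down
assert round(3.5) == 4   # midway between 3 and 4: 4 is even, round up
```

Note that over many such ties the rounding errors cancel on average, which is exactly the unbiased behavior the standard requires.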
Interval Arithmetic. A method that can be used effectively to monitor and control errors arising in floating-point computations by producing two values for each result. The two values correspond to the lower and upper endpoints of an interval that contains the true result. The width of the interval, the difference between the lower and upper endpoints, indicates the accuracy of the result. If the endpoints of an interval are not representable, the interval endpoints are rounded down and up, respectively. Although the width of the interval varies by implementation, many algorithms have been designed to produce narrow intervals; if the range between the upper and lower bounds is sufficiently narrow, a sufficiently accurate result has been obtained. Rounding to plus and minus infinity is useful in implementing the technique known as interval arithmetic. Truncation, which is round toward zero: the extra bits are ignored; this is the simplest technique, but the magnitude of the truncated value is always less than or equal to the exact value, and this is a serious bias because it affects every operation for which there are nonzero extra bits. The next two options, rounding to plus and minus infinity, are useful in implementing a technique known as interval arithmetic. Interval arithmetic provides an efficient method for monitoring and controlling errors in floating-point computations by producing two values for each result. The two values correspond to the lower and upper endpoints of an interval that contains the true result. The width of the interval, which is the difference between the upper and lower endpoints, indicates the accuracy of the result. If the endpoints of an interval are not representable, then the interval endpoints are rounded down and up, respectively. Although the width of the interval may vary according to implementation, many algorithms have been designed to produce narrow intervals. If the range between the upper and lower bounds is sufficiently narrow, then a sufficiently accurate result has been obtained. If not, at least we know this and can perform additional analysis. The final technique specified in the standard is round toward zero. This is, in fact, simple truncation: The extra bits are ignored. This is certainly the simplest technique.
However, the result is that the magnitude of the truncated value is always less than or equal to the more precise original value, introducing a consistent bias toward zero in the operation. This is a serious bias because it affects every operation for which there are nonzero extra bits.
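The directed rounding modes behind interval arithmetic can be sketched with Python's decimal module, whose contexts support round-toward-minus-infinity and round-toward-plus-infinity; the function name and 6-digit precision are illustrative:

```python
from decimal import Decimal, localcontext, ROUND_FLOOR, ROUND_CEILING

def interval_div(a: str, b: str, digits: int = 6):
    """Divide a/b twice, once rounding toward -inf and once toward +inf,
    yielding an interval guaranteed to contain the true quotient."""
    endpoints = []
    for mode in (ROUND_FLOOR, ROUND_CEILING):
        with localcontext() as ctx:
            ctx.prec = digits       # work with 6 significant digits
            ctx.rounding = mode
            endpoints.append(Decimal(a) / Decimal(b))
    return tuple(endpoints)

lo, hi = interval_div("1", "3")
# lo = 0.333333 and hi = 0.333334: the true value 1/3 lies inside,
# and the interval width (one unit in the last place) bounds the error.
```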
Infinity. IEEE Standard for Binary Floating-Point Arithmetic. Infinity arithmetic is treated as the limiting case of real arithmetic, with the infinity values given the following interpretation: -∞ < (every finite number) < +∞. Examples:
5 + (+∞) = +∞    (+∞) + (+∞) = +∞
5 + (-∞) = -∞    (-∞) + (-∞) = -∞
5 - (+∞) = -∞    (-∞) - (+∞) = -∞
5 - (-∞) = +∞    (+∞) - (-∞) = +∞
5 × (+∞) = +∞    5 ÷ (+∞) = +0
IEEE 754 goes beyond the simple definition of a format to lay down specific practices and procedures so that floating-point arithmetic produces uniform, predictable results independent of the hardware platform. One aspect of this has already been discussed, namely rounding. This subsection looks at three other topics: infinity, NaNs, and subnormal numbers. With the exception of the special cases discussed subsequently, any arithmetic operation involving infinity yields the obvious result.
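The examples above can be checked against any IEEE 754 implementation; Python floats follow them directly:

```python
import math

inf = math.inf

# Infinity as the limiting case of real arithmetic:
assert 5 + inf == inf
assert 5 - inf == -inf
assert 5 * inf == inf
assert 5 / inf == 0.0
assert inf + inf == inf
assert -inf + -inf == -inf
assert -inf - inf == -inf
assert inf - -inf == inf
# One of the excluded special cases yields NaN rather than a number:
assert math.isnan(inf - inf)
```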
Quiet and Signaling NaNs. IEEE Standard for Binary Floating-Point Arithmetic. See Table 10.7. A signaling NaN signals an invalid operation exception whenever it appears as an operand; a quiet NaN propagates through almost every arithmetic operation without signaling an exception. A NaN is a symbolic entity encoded in floating-point format, of which there are two types: signaling and quiet. A signaling NaN signals an invalid operation exception whenever it appears as an operand. Signaling NaNs afford values for uninitialized variables and arithmetic-like enhancements that are not the subject of the standard. A quiet NaN propagates through almost every arithmetic operation without signaling an exception. Table 10.7 indicates operations that will produce a quiet NaN. Note that both types of NaNs have the same general format (Table 10.4): an exponent of all ones and a nonzero fraction. The actual bit pattern of the nonzero fraction is implementation dependent; the fraction values can be used to distinguish quiet NaNs from signaling NaNs and to specify particular exception conditions. Table 10.7 Operations That Produce a Quiet NaN
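Quiet-NaN propagation is easy to demonstrate in Python, which exposes quiet NaNs only (signaling NaNs require direct bit manipulation and trap-enabled hardware):

```python
import math

qnan = float("nan")          # a quiet NaN

# A quiet NaN propagates through arithmetic without raising an exception:
assert math.isnan(qnan + 1.0)
assert math.isnan(qnan * 0.0)
# Operations of the kind listed in Table 10.7 produce a quiet NaN:
assert math.isnan(float("inf") - float("inf"))   # magnitude subtraction of infinities
assert math.isnan(float("inf") * 0.0)            # 0 x infinity
# A NaN compares unequal to everything, including itself:
assert qnan != qnan
```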
Subnormal Numbers. IEEE Standard for Binary Floating-Point Arithmetic. Subnormal numbers are included in IEEE 754 to handle cases of exponent underflow. When the exponent of the result becomes too small (a negative exponent with too large a magnitude), the result is subnormalized by right shifting the fraction and incrementing the exponent for each shift until the exponent is within a representable range. Figure 10.26 illustrates the effect of including subnormal numbers. The representable numbers can be grouped into intervals of the form [2^n, 2^(n+1)]. Within each such interval, the exponent portion of the number remains constant while the fraction varies, producing a uniform spacing of representable numbers within the interval. As we get closer to zero, each successive interval is half the width of the preceding interval but contains the same number of representable numbers. Hence the density of representable numbers increases as we approach zero. However, if only normal numbers are used, there is a gap between the smallest normal number and 0. In the case of the 32-bit IEEE 754 format, there are 2^23 representable numbers in each interval, and the smallest representable positive number is 2^-126. With the addition of subnormal numbers, an additional 2^23 - 1 numbers are uniformly added between 0 and 2^-126. The use of subnormal numbers is referred to as gradual underflow [COON81]. Without subnormal numbers, the gap between the smallest representable nonzero number and zero is much wider than the gap between the smallest representable nonzero number and the next larger number. Gradual underflow fills in that gap and reduces the impact of exponent underflow to a level comparable with roundoff among the normal numbers.
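Gradual underflow can be observed in Python's binary64 floats, where the smallest normal number is 2^-1022 and the smallest subnormal is 2^-1074:

```python
import sys

smallest_normal = sys.float_info.min      # 2**-1022 for binary64
smallest_subnormal = 2.0 ** -1074         # the smallest positive subnormal

assert smallest_normal == 2.0 ** -1022
assert smallest_subnormal > 0.0
# Gradual underflow: halving the smallest normal number does not snap
# to 0; it lands in the subnormal range instead.
assert 0.0 < smallest_normal / 2 < smallest_normal
# Only below the smallest subnormal do we finally underflow to zero.
assert smallest_subnormal / 2 == 0.0
```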
Summary — Chapter 10: Computer Arithmetic
• ALU
• Integer representation: sign-magnitude representation; twos complement representation; range extension; fixed-point representation
• Integer arithmetic: negation; addition and subtraction; multiplication; division
• Floating-point representation: principles; IEEE standard for binary floating-point representation
• Floating-point arithmetic: multiplication and division; precision considerations; IEEE standard for binary floating-point arithmetic
Chapter 10 summary.