- Long double format
- ==================
- Each long double is made up of two IEEE doubles. The value of the
- long double is the sum of the values of the two parts (except for
- -0.0). The most significant part is required to be the value of the
- long double rounded to the nearest double, as specified by IEEE. For
- Inf values, the least significant part is required to be one of +0.0
- or -0.0. No other requirements are made; so, for example, 1.0 may be
- represented as (1.0, +0.0) or (1.0, -0.0), and the low part of a NaN
- is don't-care.
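The requirements above can be sketched as a small check in C. This is only an illustration: the type name `struct ibm_ld` and the function `ibm_ld_pair_ok` are hypothetical and are not taken from libgcc. The sketch also assumes the default round-to-nearest mode and no excess precision in 'double' arithmetic, so that `hi + lo` evaluates to the exact sum rounded to the nearest double.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical illustration type, not a libgcc definition:
   'hi' is the most significant part, 'lo' the least significant.  */
struct ibm_ld { double hi; double lo; };

/* Return true if the pair meets the requirements stated in the text.
   Assumes round-to-nearest and no excess precision, so that hi + lo
   is the exact sum rounded to the nearest double.  */
bool ibm_ld_pair_ok(struct ibm_ld x)
{
  if (isnan(x.hi))
    return true;              /* Low part of a NaN is don't-care.  */
  if (isinf(x.hi))
    return x.lo == 0.0;       /* Low part must be +0.0 or -0.0.  */
  /* The high part must be the whole value rounded to double.  */
  return x.hi + x.lo == x.hi;
}
```

For example, under these assumptions (1.0, 2^-60) passes the check, while (1.0, 1.0) does not, because the sum 2.0 does not round to the high part 1.0.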
- Classification
- --------------
- A long double can represent any value of the form
- s * 2^e * sum(k=0...105: f_k * 2^(-k))
- where 's' is +1 or -1, 'e' is between 1022 and -968 inclusive, f_0 is
- 1, and f_k for k>0 is 0 or 1. These are the 'normal' long doubles.
- A long double can also represent any value of the form
- s * 2^-968 * sum(k=0...105: f_k * 2^(-k))
- where 's' is +1 or -1, f_0 is 0, and f_k for k>0 is 0 or 1. These are
- the 'subnormal' long doubles.
- There are four long doubles that represent zero, two that represent
- +0.0 and two that represent -0.0. The sign of the high part is the
- sign of the long double, and the sign of the low part is ignored.
- Likewise, there are four long doubles that represent infinities, two
- for +Inf and two for -Inf.
- Each NaN, quiet or signalling, that can be represented as a 'double'
- can be represented as a 'long double'. In fact, there are 2^64
- equivalent representations for each one.
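The zero, infinity, and NaN cases above are all decided by the high part alone, which a rough C sketch can make concrete. The names `struct ibm_ld`, `enum ibm_ld_kind`, `ibm_ld_kind_of`, and `ibm_ld_is_negative` are hypothetical, and the sketch assumes its input is already a valid long double.

```c
#include <math.h>

/* Hypothetical illustration type, not a libgcc definition.  */
struct ibm_ld { double hi; double lo; };

enum ibm_ld_kind {
  IBM_LD_NAN,
  IBM_LD_INF,
  IBM_LD_ZERO,
  IBM_LD_FINITE_NONZERO
};

/* Classify a (valid) pair by looking only at the high part.  */
enum ibm_ld_kind ibm_ld_kind_of(struct ibm_ld x)
{
  if (isnan(x.hi))
    return IBM_LD_NAN;        /* Low part is don't-care.  */
  if (isinf(x.hi))
    return IBM_LD_INF;        /* Low part is +0.0 or -0.0.  */
  if (x.hi == 0.0)
    return IBM_LD_ZERO;       /* Matches both +0.0 and -0.0.  */
  return IBM_LD_FINITE_NONZERO;
}

/* The sign of the long double is the sign of the high part;
   the sign of the low part is ignored.  */
int ibm_ld_is_negative(struct ibm_ld x)
{
  return signbit(x.hi) != 0;
}
```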
- There are certain other valid long doubles where both parts are
- nonzero but the low part represents a value which has a bit set below
- 2^(e-105). These, together with the subnormal long doubles, make up
- the denormal long doubles.
- Many possible long double bit patterns are not valid long doubles.
- These do not represent any value.
- Limits
- ------
- The maximum representable long double is 2^1024-2^918. The smallest
- *normal* positive long double is 2^-968. The smallest denormalised
- positive long double is 2^-1074 (this is the same as for 'double').
- Conversions
- -----------
- A double can be converted to a long double by adding a zero low part.
- A long double can be converted to a double by removing the low part.
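As a minimal sketch of the two conversions, using a hypothetical `struct ibm_ld` pair type and hypothetical function names (none of these come from libgcc):

```c
/* Hypothetical illustration type, not a libgcc definition.  */
struct ibm_ld { double hi; double lo; };

/* double -> long double: add a zero low part.  */
struct ibm_ld ibm_ld_from_double(double d)
{
  struct ibm_ld r = { d, 0.0 };
  return r;
}

/* long double -> double: remove the low part.  Because the high
   part is already the value rounded to the nearest double, no
   further rounding is needed.  */
double ibm_ld_to_double(struct ibm_ld x)
{
  return x.hi;
}
```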
- Comparisons
- -----------
- Two long doubles can be compared by comparing the high parts, and if
- those compare equal, comparing the low parts.
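The comparison rule can be sketched in C as follows. The type `struct ibm_ld`, the function `ibm_ld_compare`, and the return-value convention (including the `IBM_LD_UNORDERED` code for NaN operands) are assumptions made for this illustration only.

```c
/* Hypothetical illustration type, not a libgcc definition.  */
struct ibm_ld { double hi; double lo; };

#define IBM_LD_UNORDERED 2    /* Hypothetical code for NaN operands.  */

/* Return -1, 0, or 1 if a is less than, equal to, or greater than b,
   or IBM_LD_UNORDERED if the operands cannot be ordered.  */
int ibm_ld_compare(struct ibm_ld a, struct ibm_ld b)
{
  if (a.hi < b.hi)
    return -1;
  if (a.hi > b.hi)
    return 1;
  if (a.hi == b.hi)
    {
      /* High parts compare equal: the low parts decide.  Low parts
         of +0.0 and -0.0 compare equal, as they should.  */
      if (a.lo < b.lo)
        return -1;
      if (a.lo > b.lo)
        return 1;
      if (a.lo == b.lo)
        return 0;
    }
  return IBM_LD_UNORDERED;    /* At least one part was a NaN.  */
}
```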
- Arithmetic
- ----------
- The unary negate operation negates both the low and high parts.
- An absolute or absolute-negate operation must be done by comparing
- against zero and negating if necessary.
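A rough C sketch of negate and absolute value follows; `struct ibm_ld`, `ibm_ld_neg`, and `ibm_ld_abs` are hypothetical names. Taking the absolute value of each part separately would be wrong, because the low part may legitimately have the opposite sign to the high part. Note that a plain comparison against zero, as used here, leaves a -0.0 input unchanged.

```c
/* Hypothetical illustration type, not a libgcc definition.  */
struct ibm_ld { double hi; double lo; };

/* Negate: flip the sign of both parts.  */
struct ibm_ld ibm_ld_neg(struct ibm_ld x)
{
  struct ibm_ld r = { -x.hi, -x.lo };
  return r;
}

/* Absolute value: compare against zero and negate if necessary.
   In this sketch -0.0 is returned unchanged, since -0.0 < 0.0
   is false.  */
struct ibm_ld ibm_ld_abs(struct ibm_ld x)
{
  return x.hi < 0.0 ? ibm_ld_neg(x) : x;
}
```

For instance, the absolute value of (-1.0, 2^-60) is (1.0, -2^-60): both parts change sign, so the low part ends up negative.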
- Addition and subtraction are performed using library routines. They
- are not at present performed perfectly accurately; the result produced
- will be within 1ulp of the range generated by adding or subtracting
- 1ulp from the input values, where a 'ulp' is 2^(e-106) given the
- exponent 'e'. In the presence of cancellation, the result may be
- arbitrarily inaccurate. Subtraction is done by negation and addition.
- Multiplication is also performed using a library routine. Its result
- will be within 2ulp of the correct result.
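A two-part multiplication can be illustrated in a similar spirit. The sketch below uses the classic Veltkamp/Dekker splitting technique to recover the rounding error of the high-part product. It is not the libgcc routine and does not claim the 2ulp bound stated above. The names are hypothetical. It assumes round-to-nearest, no excess precision, no floating-point contraction or fast-math, and finite inputs small enough that multiplying by 2^27 + 1 does not overflow.

```c
/* Hypothetical illustration type, not a libgcc definition.  */
struct ibm_ld { double hi; double lo; };

/* Compute p = a * b rounded, and e such that p + e equals a * b
   exactly (barring overflow and underflow).  */
void two_prod_sketch(double a, double b, double *p, double *e)
{
  const double splitter = 134217729.0;   /* 2^27 + 1 */
  double t, ah, al, bh, bl;

  t = splitter * a;
  ah = t - (t - a);                      /* Upper half of a.  */
  al = a - ah;                           /* Lower half of a.  */
  t = splitter * b;
  bh = t - (t - b);
  bl = b - bh;

  *p = a * b;
  *e = ((ah * bh - *p) + ah * bl + al * bh) + al * bl;
}

/* Simplified multiplication sketch for finite inputs only.  */
struct ibm_ld ibm_ld_mul_sketch(struct ibm_ld a, struct ibm_ld b)
{
  struct ibm_ld r;
  double p, e;

  two_prod_sketch(a.hi, b.hi, &p, &e);
  /* Cross terms; the lo * lo term is dropped as negligible.  */
  e += a.hi * b.lo + a.lo * b.hi;
  /* Renormalise so that the high part is the rounded product.  */
  r.hi = p + e;
  r.lo = e - (r.hi - p);
  return r;
}
```

Squaring (1 + 2^-30, 0.0) with this sketch yields (1 + 2^-29, 2^-60); the 2^-60 term is the part of the exact product that does not fit in a single double.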
- Division is also performed using a library routine. Its result will
- be within 3ulp of the correct result.
- Copyright (C) 2004-2022 Free Software Foundation, Inc.
- Copying and distribution of this file, with or without modification,
- are permitted in any medium without royalty provided the copyright
- notice and this notice are preserved.