【Posted】: 2014-12-28 14:20:21
【Question】:
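I learned from the book Computer Systems: A Programmer's Perspective that the IEEE standard requires double-precision floating-point numbers to be represented in the following 64-bit binary format:
- s: 1 bit for the sign
- exp: 11 bits for the exponent
- frac: 52 bits for the fraction
+infinity is represented as a special value with the following pattern:
- s = 0
- all exp bits are 1
- all frac bits are 0
I think the full 64 bits of a double should be laid out in the following order:
(s)(exp)(frac)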
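The layout described above can be checked with a small sketch like the one below. It assumes that double is a 64-bit IEEE 754 binary64 value stored with the same byte order as uint64_t; the names double_fields, decode_double and has_pos_inf_pattern are made up for illustration and do not come from the book.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a 64-bit double. */
struct double_fields {
    unsigned sign;      /* s:    1 bit   */
    unsigned exponent;  /* exp:  11 bits */
    uint64_t fraction;  /* frac: 52 bits */
};

/* Copy the bytes of d into a 64-bit integer and slice out (s)(exp)(frac).
   Assumption: double is IEEE 754 binary64 with the byte order of uint64_t. */
struct double_fields decode_double(double d)
{
    uint64_t bits;
    struct double_fields f;

    memcpy(&bits, &d, sizeof bits);
    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);  /* low 52 bits */
    return f;
}

/* Returns 1 if d matches the +infinity pattern: s = 0, exp all ones, frac all zeros. */
int has_pos_inf_pattern(double d)
{
    struct double_fields f = decode_double(d);
    return f.sign == 0 && f.exponent == 0x7FF && f.fraction == 0;
}
```

For example, decode_double(1.0) should give s = 0, exp = 1023 (the biased exponent) and frac = 0.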
So I wrote the following C code to verify it:
//Check the infinity
double x1 = (double)0x7ff0000000000000; // This should be the +infinity
double x2 = (double)0x7ff0000000000001; // Note the extra ending 1, x2 should be NaN
printf("\nx1 = %f, x2 = %f sizeof(double) = %d", x1, x2, (int)sizeof(x2)); // sizeof yields size_t, so cast it for %d
if (x1 == x2)
    printf("\nx1 == x2");
else
    printf("\nx1 != x2");
But the result is:
x1 = 9218868437227405300.000000, x2 = 9218868437227405300.000000 sizeof(double) = 8
x1 == x2
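Why is the number a valid number rather than some infinity error?
Why is x1 == x2?
(I am using the MinGW GCC compiler.)
ADD 1
I modified the code as below and successfully verified Infinity and NaN.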
//Check the infinity and NaN
unsigned long long x1 = 0x7ff0000000000000ULL; // +infinity as double
unsigned long long x2 = 0xfff0000000000000ULL; // -infinity as double
unsigned long long x3 = 0x7ff0000000000001ULL; // NaN as double
double y1 = *((double *)(&x1));
double y2 = *((double *)(&x2));
double y3 = *((double *)(&x3));
printf("\nsizeof(long long) = %d", (int)sizeof(x1)); // sizeof yields size_t, so cast it for %d
printf("\nx1 = %f, x2 = %f, x3 = %f", x1, x2, x3); // %f is good enough for output
printf("\ny1 = %f, y2 = %f, y3 = %f", y1, y2, y3);
The result is:
sizeof(long long) = 8
x1 = 1.#INF00, x2 = -1.#INF00, x3 = 1.#SNAN0
y1 = 1.#INF00, y2 = -1.#INF00, y3 = 1.#QNAN0
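The detailed output looks a bit strange, but I think the point is clear.
PS.: It seems the pointer conversion is not necessary. Just use %f to tell the printf function to interpret the unsigned long long variable in double format.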
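One caveat on the PS: in standard C, passing an unsigned long long where printf expects a double for %f is undefined behavior, so the x1/x2/x3 line above probably only works because of how arguments happen to be passed on this platform. Reading a double through a cast unsigned long long pointer is also not guaranteed to work under the strict aliasing rules. A more portable sketch copies the bytes with memcpy; the helper names bits_to_double and print_special_values are hypothetical, and the code assumes a 64-bit IEEE 754 double.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double by copying its bytes.
   Assumption: sizeof(double) == sizeof(uint64_t) and both use the same byte order. */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical demo: build the three special values and print them as doubles. */
void print_special_values(void)
{
    double y1 = bits_to_double(UINT64_C(0x7ff0000000000000)); /* +infinity */
    double y2 = bits_to_double(UINT64_C(0xfff0000000000000)); /* -infinity */
    double y3 = bits_to_double(UINT64_C(0x7ff0000000000001)); /* a NaN     */

    /* The textual form of infinity and NaN depends on the C runtime. */
    printf("y1 = %f, y2 = %f, y3 = %f\n", y1, y2, y3);
    printf("isinf(y1) = %d, isinf(y2) = %d, isnan(y3) = %d\n",
           isinf(y1) != 0, isinf(y2) != 0, isnan(y3) != 0);
}
```

Calling print_special_values() from main prints the values with real double arguments, and the isinf/isnan checks do not depend on how the runtime spells infinity or NaN.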
ADD 2
Out of curiosity, I checked the bit representation of the variables with the following code.
typedef unsigned char *byte_pointer;
void show_bytes(byte_pointer start, int len)
{
    int i;
    for (i = len-1; i>=0; i--)
    {
        printf("%.2x", start[i]);
    }
    printf("\n");
}
And I tried the code below:
//check the infinity and NaN
unsigned long long x1 = 0x7ff0000000000000ULL; // +infinity as double
unsigned long long x2 = 0xfff0000000000000ULL; // -infinity as double
unsigned long long x3 = 0x7ff0000000000001ULL; // NaN as double
double y1 = *((double *)(&x1));
double y2 = *((double *)(&x2));
double y3 = *((double *)(&x3));
unsigned long long x4 = x1 + x2; // I want to check (+infinity)+(-infinity)
double y4 = y1 + y2; // I want to check (+infinity)+(-infinity)
printf("\nx1: ");
show_bytes((byte_pointer)&x1, sizeof(x1));
printf("\nx2: ");
show_bytes((byte_pointer)&x2, sizeof(x2));
printf("\nx3: ");
show_bytes((byte_pointer)&x3, sizeof(x3));
printf("\nx4: ");
show_bytes((byte_pointer)&x4, sizeof(x4));
printf("\ny1: ");
show_bytes((byte_pointer)&y1, sizeof(y1));
printf("\ny2: ");
show_bytes((byte_pointer)&y2, sizeof(y2));
printf("\ny3: ");
show_bytes((byte_pointer)&y3, sizeof(y3));
printf("\ny4: ");
show_bytes((byte_pointer)&y4, sizeof(y4));
The output is:
x1: 7ff0000000000000
x2: fff0000000000000
x3: 7ff0000000000001
x4: 7fe0000000000000
y1: 7ff0000000000000
y2: fff0000000000000
y3: 7ff8000000000001
y4: fff8000000000000 // <== Different with x4
The strange part is that although x1 and x2 have the same bit patterns as y1 and y2, the sums x4 and y4 are different.
And
printf("\ny4=%f", y4);
gives this:
y4=-1.#IND00 // What does it mean???
Why are they different? And how is y4 obtained?
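A sketch of the two additions side by side may help here. x4 is computed with integer arithmetic on the bit patterns, which wraps modulo 2^64, whereas y4 is an IEEE 754 addition of +infinity and -infinity, which is an invalid operation and produces a NaN. The exact NaN bits (including the sign bit seen in fff8000000000000) vary by platform, and -1.#IND appears to be how the Microsoft C runtime used by MinGW prints that "indefinite" NaN. The function names below are hypothetical, and a 64-bit IEEE 754 double is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Add the two bit patterns as plain unsigned integers (wraps modulo 2^64). */
uint64_t add_as_integers(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Reinterpret both patterns as doubles, add them with floating-point
   arithmetic, and return the bit pattern of the result.
   Assumption: double is a 64-bit IEEE 754 binary64 value. */
uint64_t add_as_doubles(uint64_t a, uint64_t b)
{
    double da, db, sum;
    uint64_t out;

    memcpy(&da, &a, sizeof da);
    memcpy(&db, &b, sizeof db);
    sum = da + db;                  /* (+inf) + (-inf) yields a NaN */
    memcpy(&out, &sum, sizeof out);
    return out;
}
```

With a = 0x7ff0000000000000 and b = 0xfff0000000000000, add_as_integers returns 0x7fe0000000000000 (the carry out of bit 63 is discarded), while add_as_doubles returns a pattern with all exponent bits set and a nonzero fraction, which is a NaN.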
【Comments】:
- Because you are setting the value, not the representation.
- What is an "infinity error"?
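To illustrate the first comment, here is a minimal sketch contrasting the two operations. A cast such as (double)0x7ff0000000000000 converts the integer's numeric value to the nearest double, while copying the bytes keeps the bit pattern. The names from_value and from_bits are invented for this example, and a 64-bit IEEE 754 double is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Numeric conversion: the result is the double closest to the integer n.
   A double carries only 53 significant bits, so large neighbors collapse. */
double from_value(uint64_t n)
{
    return (double)n;
}

/* Representation copy: the result has exactly the bit pattern of n.
   Assumption: double is a 64-bit IEEE 754 binary64 value. */
double from_bits(uint64_t n)
{
    double d;
    memcpy(&d, &n, sizeof d);
    return d;
}
```

Under these assumptions, from_value(0x7ff0000000000000) and from_value(0x7ff0000000000001) both give the ordinary finite number 9218868437227405312, because the extra low bit is rounded away, which matches x1 == x2 in the question. from_bits on the same two inputs gives +infinity and a NaN.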
Tags: c floating-point