Answers: double/float conversion in C

【Question title】: double/float conversion in C
【Posted】: 2024-01-08 17:54:01
【Question description】:

I have this code:

#include <stdio.h>

#define Third (1.0/3.0)
#define ThirdFloat (1.0f/3.0f)
int main()
{
    double a=1/3;
    double b=1.0/3.0;
    double c=1.0f/3.0f;
    printf("a = %20.15lf, b = %20.15lf, c = %20.15lf\n", a,b,c);
    float d=1/3;
    float e=1.0/3.0;
    float f=1.0f/3.0f;
    printf("d = %20.15f, e = %20.15f, f = %20.15f\n", d,e,f);

    double g=Third*3.0;
    double h=ThirdFloat*3.0;
    float i=ThirdFloat*3.0f;
    printf("(1/3)*3: g = %20.15lf; h = %20.15lf, i = %20.15f\n", g, h, i);
}

which gives this output:

a =    0.000000000000000, b =    0.333333333333333, c =    0.333333343267441
d =    0.000000000000000, e =    0.333333343267441, f =    0.333333343267441
(1/3)*3: g =    1.000000000000000; h =    1.000000029802322, i =    1.000000000000000

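I assume a and d come out this way because the compiler converts the integer value to floating point only after the division. b looks fine; e is off because of the low precision of float, and the same goes for c and f.

But I have no idea why g has the correct value (I thought 1.0/3.0 = 1.0lf/3.0lf, but then i should be wrong), or why h differs from i.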

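【Question comments】:

As for g... did you not notice that you multiply it by three?
So basically you are asking why (1.0 / 3.0) * 3.0 is 1, and why (1.0f / 3.0f) * 3.0 does not output 1?
Re h vs. i: because 3.0 is a double and 3.0f is a float.
Compilers tend to be very good at something called constant folding, where compile-time constant expressions are evaluated at compile time. Good compilers are also good at detecting cases like the ones shown for g, h and i, where a division by 3.0 followed by a multiplication by 3.0 cancel each other out.
Regarding constant folding and the calculations: even without optimization enabled, GCC 10.2 turns all of your calculations into constant values, as shown in the assembly here (look at the constants at the end).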

Tags: c double

【Solution 1】:

Let us first take a closer look, using "%.17e" (approximate decimal) and "%a" (exact).

#include <stdio.h>

#define Third (1.0/3.0)
#define ThirdFloat (1.0f/3.0f)
#define FMT "%.17e, %a"
int main(void) {
    double a=1/3;
    double b=1.0/3.0;
    double c=1.0f/3.0f;
    printf("a = " FMT "\n", a,a);
    printf("b = " FMT "\n", b,b);
    printf("c = " FMT "\n", c,c);
    puts("");
    float d=1/3;
    float e=1.0/3.0;
    float f=1.0f/3.0f;
    printf("d = " FMT "\n", d,d);
    printf("e = " FMT "\n", e,e);
    printf("f = " FMT "\n", f,f);
    puts("");
    double g=Third*3.0;
    double h=ThirdFloat*3.0;
    float i=ThirdFloat*3.0f;
    printf("g = " FMT "\n", g,g);
    printf("h = " FMT "\n", h,h);
    printf("i = " FMT "\n", i,i);
}

Output

a = 0.00000000000000000e+00, 0x0p+0
b = 3.33333333333333315e-01, 0x1.5555555555555p-2
c = 3.33333343267440796e-01, 0x1.555556p-2

d = 0.00000000000000000e+00, 0x0p+0
e = 3.33333343267440796e-01, 0x1.555556p-2
f = 3.33333343267440796e-01, 0x1.555556p-2

g = 1.00000000000000000e+00, 0x1p+0
h = 1.00000002980232239e+00, 0x1.0000008p+0
i = 1.00000000000000000e+00, 0x1p+0

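But I have no idea why g has the correct value

(1.0/3.0)*3.0 may be evaluated, at compile time or at run time, as double, and the rounded result is exactly 1.0.
(1.0/3.0)*3.0 may also be evaluated, at compile time or at run time, using math wider than double, and the rounded result is again exactly 1.0. Research FLT_EVAL_METHOD.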

and why h differs from i.

(1.0f/3.0f) may use float math to form a float quotient, which differs noticeably from one-third: 0.333333343267.... That the final *3.0 differs from 1.0 is not surprising.

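The outputs are all correct. We need to look at why the expectation was incorrect.

OP further asks: "Why is h (float * double) less accurate than i (float * float)?"

Both start with 0.333333343267... * 3.0, not one-third * 3.0.
float * double is the more accurate one. Both form a product, but float * float is a float product, rounded to the nearest 1 part in 2²⁴, whereas the more accurate float * double product is a double, rounded to the nearest 1 part in 2⁵³. float * float rounds to 1.0000000, whereas float * double rounds to 1.0000000298...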

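【Comments】:

Why is h (float * double) less accurate than i (float * float)? Is it because the compiler evaluates (1.0f/3.0f) * 3.0f as 1, the same as for g?
I tried double h=ThirdFloat*3.0; float i=Third*3.0f; because I thought it would produce the same error (h is float * double, i is double * float), but i seems to be correct.
In other words, where does the value in h come from?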

【Solution 2】:

But I have no idea why g has the correct value (I thought 1.0/3.0 = 1.0lf/3.0lf

g has exactly the value it should have, based on:

#define Third (1.0/3.0)    
...
double g=Third*3.0;

That is g=(1.0/3.0)*3.0;
which is 1.000000000000000 (when printed with "%20.15lf")

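【Comments】:

I thought computers cannot use exact values like 1.0/3.0, so they approximate it as 0.3333. How can the computer possibly be using the exact value?
Some values can be represented in exact terms, but a better answer for that question is here.
I understand how computers see numbers; my question is about precision. I mean: either 1.0/3.0 is exactly 1/3, which is why (1.0/3.0)*3.0 = 1.000000 (the stored value is not an approximation but a numerator and a denominator), or it has very high precision but is not exactly the same (like 1.000000000000000000000000000000135235).
@Pulpit - Have you been able to follow the comments, especially Chux's, and the link I provided in my comment above (the 2nd one)? They address your question in great detail. The precision of the variable type is the simplest limiting factor. That is why a value may appear exact when viewed as a float (32 bits), while viewing it as a double shows variation in the fractional part, as you have shown. 6 digits are not enough to exceed the ability of float to show a near-exact depiction of (1.0/3.0)*3.0. As per Chux's suggestion, try it with "%.17e".
I see that the same number can be approximated with lower (float) or higher (double) precision. Converting from float to double reveals the limitations of float. That is understandable. But why does float * double (as in h) give lower precision than float * float (as in i)? It cannot come from the limitations of float. Is it the conversion process?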

【Solution 3】:

I think I got the answer.

#include <stdio.h>

#define Third (1.0/3.0)
#define ThirdFloat (1.0f/3.0f)

int main(void)
{
    printf("%20.15f, %20.15lf\n", ThirdFloat*3.0, ThirdFloat*3.0);//float*double
    printf("%20.15f, %20.15lf\n", ThirdFloat*3.0f, ThirdFloat*3.0f);//float*float
    printf("%20.15f, %20.15lf\n", Third*3.0, Third*3.0);//double*double
    printf("%20.15f, %20.15lf\n\n", Third*3.0f, Third*3.0f);//double*float

    printf("%20.15f, %20.15lf\n", Third, Third);
    printf("%20.15f, %20.15lf\n", ThirdFloat, ThirdFloat);
    printf("%20.15f, %20.15lf\n", 3.0, 3.0);
    printf("%20.15f, %20.15lf\n", 3.0f, 3.0f);
}

which outputs:

   1.000000029802322,    1.000000029802322
   1.000000000000000,    1.000000000000000
   1.000000000000000,    1.000000000000000
   1.000000000000000,    1.000000000000000

   0.333333333333333,    0.333333333333333
   0.333333343267441,    0.333333343267441
   3.000000000000000,    3.000000000000000
   3.000000000000000,    3.000000000000000

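The first line is inaccurate because of the limitations of float. The constant ThirdFloat has very low precision, so when it is multiplied by a double, the compiler takes this poor approximation (0.333333343267441), converts it to double, and multiplies it by 3.0, which is a double, so the result is off as well (1.000000029802322).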

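But if ThirdFloat, which is a float, is multiplied by 3.0f, which is also a float, the compiler can avoid the approximation by taking the exact value of 1/3 and multiplying it by 3, which is why I got the accurate result.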

【Comments】: