Using C Preprocessing to get integer value of a string - Answers

[Question Title]: Using C Preprocessing to get integer value of a string
[Posted]: 2026-02-16 18:40:01
[Question]:

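How can I create a C macro that gets the integer value of a string? The specific use case comes from the question here. I want to change code like this: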

enum insn {
    sysenter = (uint64_t)'r' << 56 | (uint64_t)'e' << 48 |
               (uint64_t)'t' << 40 | (uint64_t)'n' << 32 |
               (uint64_t)'e' << 24 | (uint64_t)'s' << 16 |
               (uint64_t)'y' << 8  | (uint64_t)'s',
    mov = (uint64_t)'v' << 16 | (uint64_t)'o' << 8 |
          (uint64_t)'m'
};

to this:

enum insn {
    sysenter = INSN_TO_ENUM("sysenter"),
    mov      = INSN_TO_ENUM("mov")
};

where INSN_TO_ENUM expands to the same code. Performance would be identical, but readability would be greatly improved.

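I suspect this form may be impossible, because the C preprocessor cannot work with strings, so the following would also be an undesirable but acceptable solution (a variadic macro):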

enum insn {
    sysenter = INSN_TO_ENUM('s','y','s','e','n','t','e','r'),
    mov      = INSN_TO_ENUM('m','o','v')
};

[Question Discussion]:

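At this point I would write my own preprocessor. Trying to do it your way is just going to be very painful. It's a simple script in your favourite scripting language. Assuming your enums are fairly static over time (i.e. you're not editing them day after day), I think that would be convenient and readable.
@DavidHeffernan: I considered that, but only as a last resort. If it can be done as a macro, we get the readability advantage plus the ability to abstract/hide this particular implementation detail. IMO those advantages outweigh generating the code in a separate preprocessing step.
I'm not sure whether a macro can process a variable argument list or only accepts it in order to pass it on. See my answer regarding the first argument.
@Mike Kwan: That's an odd way to use an enum. Maybe you should just write enum insn { sysenter, mov }; and be happy with sequential numbering. What problem are you trying to solve?
@DavidGrayson: The problem I'm trying to solve is this one: *.com/questions/9524342/…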

Tags: c macros c-preprocessor

[Solution 1]:

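Here is a compile-time, pure C solution of the form you said was acceptable. You may need to extend it for longer mnemonics. I'll keep thinking about the one you actually wanted (i.e. INSN_TO_ENUM("sysenter")). Interesting question :)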

#include <stdio.h>

/* Named variadic parameters (t..., c...) are a GCC extension. */
#define head(h, t...) h
#define tail(h, t...) t

/* Each stage shifts the first character into place and passes the rest on. */
#define A(n, c...) (((long long) (head(c))) << (n)) | B(n + 8, tail(c))
#define B(n, c...) (((long long) (head(c))) << (n)) | C(n + 8, tail(c))
#define C(n, c...) (((long long) (head(c))) << (n)) | D(n + 8, tail(c))
#define D(n, c...) (((long long) (head(c))) << (n)) | E(n + 8, tail(c))
#define E(n, c...) (((long long) (head(c))) << (n)) | F(n + 8, tail(c))
#define F(n, c...) (((long long) (head(c))) << (n)) | G(n + 8, tail(c))
#define G(n, c...) (((long long) (head(c))) << (n)) | H(n + 8, tail(c))
#define H(n, c...) (((long long) (head(c))) << (n)) /* extend here */

/* The trailing zeros pad short mnemonics so every stage gets an argument. */
#define INSN_TO_ENUM(c...) A(0, c, 0, 0, 0, 0, 0, 0, 0)

/* sysenter's value does not fit in int; GCC accepts that in an enum as an extension. */
enum insn {
    sysenter = INSN_TO_ENUM('s','y','s','e','n','t','e','r'),
    mov      = INSN_TO_ENUM('m','o','v')
};

int main(void)
{
    /* printf() is only a quick test; cast so the argument types match %llx. */
    printf("sysenter = %llx\nmov = %llx\n",
           (unsigned long long) sysenter, (unsigned long long) mov);
    return 0;
}
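The t... and c... parameter syntax in this answer is a GCC extension (named variadic arguments). Below is a minimal sketch of the same head/tail technique using only standard C99 __VA_ARGS__, assuming a C11 compiler for static_assert; the INSN_* macro names are hypothetical. Because the result is an integer constant expression, it can be checked at compile time:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Standard variadic macros instead of GCC's named "t..." form.
 * Before C23, each needs at least one argument after h. */
#define INSN_HEAD(h, ...) h
#define INSN_TAIL(h, ...) __VA_ARGS__

/* One stage per byte: shift the first character into place,
 * hand the rest of the list to the next stage. */
#define INSN_A(n, ...) (((uint64_t)(INSN_HEAD(__VA_ARGS__))) << (n)) | INSN_B(n + 8, INSN_TAIL(__VA_ARGS__))
#define INSN_B(n, ...) (((uint64_t)(INSN_HEAD(__VA_ARGS__))) << (n)) | INSN_C(n + 8, INSN_TAIL(__VA_ARGS__))
#define INSN_C(n, ...) (((uint64_t)(INSN_HEAD(__VA_ARGS__))) << (n)) | INSN_D(n + 8, INSN_TAIL(__VA_ARGS__))
#define INSN_D(n, ...) (((uint64_t)(INSN_HEAD(__VA_ARGS__))) << (n)) | INSN_E(n + 8, INSN_TAIL(__VA_ARGS__))
#define INSN_E(n, ...) (((uint64_t)(INSN_HEAD(__VA_ARGS__))) << (n)) | INSN_F(n + 8, INSN_TAIL(__VA_ARGS__))
#define INSN_F(n, ...) (((uint64_t)(INSN_HEAD(__VA_ARGS__))) << (n)) | INSN_G(n + 8, INSN_TAIL(__VA_ARGS__))
#define INSN_G(n, ...) (((uint64_t)(INSN_HEAD(__VA_ARGS__))) << (n)) | INSN_H(n + 8, INSN_TAIL(__VA_ARGS__))
#define INSN_H(n, ...) (((uint64_t)(INSN_HEAD(__VA_ARGS__))) << (n))

/* Eight padding zeros: even INSN_H then sees at least two arguments. */
#define INSN_TO_ENUM(...) (INSN_A(0, __VA_ARGS__, 0, 0, 0, 0, 0, 0, 0, 0))

/* Compile-time checks against the question's layout
 * (first character in the least significant byte). */
static_assert(INSN_TO_ENUM('m','o','v') == 0x766f6dULL, "mov");
static_assert(INSN_TO_ENUM('s','y','s','e','n','t','e','r') == 0x7265746e65737973ULL, "sysenter");

/* mov fits in int. sysenter does not, and ISO C before C23 requires
 * enumeration constants to be representable as int (GCC and Clang
 * accept larger values as an extension), so it is left out here. */
enum insn { mov = INSN_TO_ENUM('m','o','v') };
```

The padding uses eight zeros rather than the answer's seven because, unlike GCC's named form, standard __VA_ARGS__ (before C23) needs at least one argument after h, including in the last stage INSN_H.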

[Discussion]:

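This compiles and looks like it might be doing the right thing, but GCC doesn't recognize the L in %Lx and wants to treat num as a 32-bit int (probably because I'm compiling for x86).
Oops! My mistake. %L is for long double; it should be %llx. Fixed. The printf() is only there for unit testing, though :)
I love Stack Overflow because I learn something new every day! head/tail(h, t...) is really neat! I never knew you could attach the ... to one of the arguments like that.
In the definition of INSN_TO_ENUM, you don't need all those zeros after c, do you? Wouldn't A(0, c) be enough?
@Shahbaz It depends. Without the trailing 0, 0, 0..., the macro still works (i.e. it expands without preprocessor errors), but the result contains ((long long) ()) << (0 + 8 + 8 ...), with an empty () inside! That won't compile. The trailing zeros turn it into (0) instead, a valid numeric literal that can be cast to long long.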
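To make the padding argument concrete, here is a sketch of what INSN_TO_ENUM('m','o','v') from this answer expands to, written out by hand (gcc -E shows the real output; MOV_EXPANDED is a hypothetical name, and the line breaks and outer parentheses are added here). One term comes from each stage A..H, and the last five terms come from the padding zeros:

```c
#include <assert.h>   /* static_assert (C11) */

/* Hand-written copy of the expansion of A(0, 'm','o','v', 0, 0, 0, 0, 0, 0, 0).
 * Without the padding, stages D..H would receive an empty list and
 * produce ((long long) ()), which does not compile. */
#define MOV_EXPANDED                                         \
    ((((long long) ('m')) << (0)) |                          \
     (((long long) ('o')) << (0 + 8)) |                      \
     (((long long) ('v')) << (0 + 8 + 8)) |                  \
     (((long long) (0))   << (0 + 8 + 8 + 8)) |              \
     (((long long) (0))   << (0 + 8 + 8 + 8 + 8)) |          \
     (((long long) (0))   << (0 + 8 + 8 + 8 + 8 + 8)) |      \
     (((long long) (0))   << (0 + 8 + 8 + 8 + 8 + 8 + 8)) |  \
     (((long long) (0))   << (0 + 8 + 8 + 8 + 8 + 8 + 8 + 8)))

/* The padding terms contribute nothing: only m, o, v remain. */
static_assert(MOV_EXPANDED == 0x766f6d, "padding terms are zero");
```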

[Solution 2]:

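Edit: This answer may still be helpful, so I haven't deleted it, but it doesn't answer the question as asked. It does convert a string to a number, but the result can't be used in an enum, because it isn't computed at compile time (indexing a string literal is not an integer constant expression in C).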

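Well, since your integers are 64-bit, you only need to care about the first 8 characters of any string. So you can write the byte extraction 8 times, making sure you never read past the end of the string: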

#include <stdint.h>

/* Byte n of x, shifted into place, or 0 if n lies outside the array. */
#define GET_NTH_BYTE(x, n)   (sizeof(x) <= (n) ? 0 : ((uint64_t)(x)[n] << ((n) * 8)))
#define INSN_TO_ENUM(x)      ( GET_NTH_BYTE(x, 0)\
                             |GET_NTH_BYTE(x, 1)\
                             |GET_NTH_BYTE(x, 2)\
                             |GET_NTH_BYTE(x, 3)\
                             |GET_NTH_BYTE(x, 4)\
                             |GET_NTH_BYTE(x, 5)\
                             |GET_NTH_BYTE(x, 6)\
                             |GET_NTH_BYTE(x, 7))

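What it does is check, for each byte position, whether it lies within the string and, if so, give you the corresponding byte shifted into place.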

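Note: this only works with string literals (or char arrays). For a char * pointer, sizeof(x) gives the size of the pointer, not of the string.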
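A self-contained sketch of the sizeof-based macro in use, with the macro arguments parenthesized. It illustrates why a char array works: sizeof is the full array length including the '\0'. The names mov_name and mov_value are hypothetical, and since the value is computed at run time it is returned from a function rather than used in an enum:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Byte n of x shifted into place, or 0 if n lies outside the array. */
#define GET_NTH_BYTE(x, n) (sizeof(x) <= (n) ? 0 : ((uint64_t)(x)[n] << ((n) * 8)))
#define INSN_TO_ENUM(x) (GET_NTH_BYTE(x, 0) | GET_NTH_BYTE(x, 1) | \
                         GET_NTH_BYTE(x, 2) | GET_NTH_BYTE(x, 3) | \
                         GET_NTH_BYTE(x, 4) | GET_NTH_BYTE(x, 5) | \
                         GET_NTH_BYTE(x, 6) | GET_NTH_BYTE(x, 7))

/* A char array: sizeof counts "mov" plus its terminating '\0'. */
static const char mov_name[] = "mov";
static_assert(sizeof mov_name == 4, "sizeof includes the terminating NUL");

uint64_t mov_value(void)
{
    return INSN_TO_ENUM(mov_name);   /* bytes 4..7 are skipped, byte 3 is '\0' */
}

/* Would NOT work: for  const char *p = "mov";  sizeof(p) is the pointer
 * size (typically 8), so INSN_TO_ENUM(p) would read past the string. */
```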

If you want to be able to convert any string, you can also pass in the string's length:

/* Byte n of x, shifted into place, or 0 if n lies past the length l. */
#define GET_NTH_BYTE(x, n, l)   ((l) < (n) ? 0 : ((uint64_t)(x)[n] << ((n) * 8)))
#define INSN_TO_ENUM(x, l)      ( GET_NTH_BYTE(x, 0, l)\
                                |GET_NTH_BYTE(x, 1, l)\
                                |GET_NTH_BYTE(x, 2, l)\
                                |GET_NTH_BYTE(x, 3, l)\
                                |GET_NTH_BYTE(x, 4, l)\
                                |GET_NTH_BYTE(x, 5, l)\
                                |GET_NTH_BYTE(x, 6, l)\
                                |GET_NTH_BYTE(x, 7, l))

For example:

size_t length = strlen(your_string);               /* strlen() needs <string.h> */
uint64_t num = INSN_TO_ENUM(your_string, length);  /* an int would truncate the 64-bit value */

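Finally, there is a way to avoid passing the length, but it requires the compiler to actually evaluate the terms of INSN_TO_ENUM from left to right. I'm not sure whether that's standard. (It isn't: C does not specify the evaluation order of the operands of |, and the assignments to _nul_seen in different operands are unsequenced, which is undefined behavior.)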

/* Relies on left-to-right evaluation of the | operands, which C does not guarantee. */
static int _nul_seen;
#define GET_NTH_BYTE(x, n)  ((_nul_seen || (x)[n] == '\0')?(_nul_seen=1)&0:((uint64_t)(x)[n] << ((n)*8)))
#define INSN_TO_ENUM(x)     ((_nul_seen=0)|\
                              (GET_NTH_BYTE(x, 0)\
                              |GET_NTH_BYTE(x, 1)\
                              |GET_NTH_BYTE(x, 2)\
                              |GET_NTH_BYTE(x, 3)\
                              |GET_NTH_BYTE(x, 4)\
                              |GET_NTH_BYTE(x, 5)\
                              |GET_NTH_BYTE(x, 6)\
                              |GET_NTH_BYTE(x, 7)))
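Since C does not guarantee the left-to-right evaluation the macro above depends on, here is a minimal sketch of a well-defined runtime alternative that also needs no length argument: a plain loop that stops at the '\0'. The names insn_to_value and insn_to_value_check are hypothetical, and like the rest of this answer it runs at run time, so it cannot initialize an enum:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs up to the first 8 characters of s, first character in the least
 * significant byte (the layout of the question's enum). The loop stops at
 * the terminating '\0', so no length and no shared flag are needed, and
 * the order of evaluation is fully defined. */
static inline uint64_t insn_to_value(const char *s)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8 && s[i] != '\0'; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* Usage sketch: agrees with the question's hand-written mov constant. */
void insn_to_value_check(void)
{
    assert(insn_to_value("mov") ==
           ((uint64_t)'v' << 16 | (uint64_t)'o' << 8 | (uint64_t)'m'));
}
```

Casting through unsigned char keeps bytes above 0x7f from being sign-extended when char is signed.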

[Discussion]:

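That isn't a compile-time constant, though, is it?
You could do this with a C++11 constexpr, which also works with arrays, at least with gcc-4.6.
Unfortunately this doesn't work in an enum, because the conditional assignment is only possible at run time.
Conditional assignment is fine in a constexpr; even recursion works.
The question is about C. Or is there some language extension that allows this in C code?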

[Solution 3]:

If you can use C++11 with a recent compiler:

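#include <cstdint>

// Recursive C++11 constexpr: the first character ends up in the lowest byte.
constexpr std::uint64_t insn_to_enum(const char* x) {
    return *x ? *x + (insn_to_enum(x+1) << 8) : 0;
}

enum insn { sysenter = insn_to_enum("sysenter") };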

This works and computes the constant during compilation.

[Discussion]:

[Solution 4]:

Some recursive template magic does the trick. If the constant is known at compile time, no code is generated.

If you use this in anger, you may want to keep an eye on your build times.

#include <cstdio>

// The main recursive template magic.
// Note: this puts the first character in the most significant byte,
// the reverse of the byte order used in the question's enum.
template <int N>
struct CharSHift
{
    static long long charShift(const char* string)
    {
        return string[N-1] | (CharSHift<N-1>::charShift(string) << 8);
    }
};

// Need to provide a specialisation for 0, as this is where the recursion has to stop.
template <>
struct CharSHift<0>
{
    static long long charShift(const char*)
    {
        return 0;
    }
};

// Template stuff is all a bit hairy to look at, so try to improve that with some macro wrapping!
// sizeof counts the terminating '\0', hence the -1.
#define CT_IFROMS(_string_) CharSHift<sizeof _string_ - 1>::charShift(_string_)

int main()
{
    long long hash0 = CT_IFROMS("abcdefgh");

    std::printf("%08llX \n", static_cast<unsigned long long>(hash0));
    return 0;
}

[Discussion]:

Thanks for the answer, but this is a C project, so I'm more interested in a C solution than a C++ one.