Are negative array indexes allowed in C? Answers

【Question title】: Are negative array indexes allowed in C?
【Posted】: 2011-03-29 06:29:10
【Question description】:

I was just reading some code and found that the author was using arr[-2] to access the 2nd element before arr, like so:

|a|b|c|d|e|f|g|
       ^------------ arr[0]
         ^---------- arr[1]
   ^---------------- arr[-2]

Is that allowed?

I know that arr[x] is the same as *(arr + x). So arr[-2] is *(arr - 2), which seems OK. What do you think?

【Question comments】:

Tags: c arrays

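【Answer 1】:

That is correct. From C99 §6.5.2.1/2:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

There is no magic. It is a 1-1 equivalence. As always when dereferencing a pointer (*), you need to be sure it points to a valid address.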
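To make the equivalence concrete, here is a minimal sketch. The function names element_at and question_diagram_lookup are made up for illustration, and the second one assumes that arr points at 'd' inside a seven-element buffer, as in the question's diagram.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the element `offset` positions away from `p`, written three
 * equivalent ways. The caller must guarantee that p + offset stays inside
 * the array p points into; otherwise the behavior is undefined. */
int element_at(const int *p, ptrdiff_t offset)
{
    int via_subscript = p[offset];       /* E1[E2]             */
    int via_pointer   = *(p + offset);   /* (*((E1)+(E2)))     */
    int via_swapped   = offset[p];       /* E2[E1], same thing */

    assert(via_subscript == via_pointer);
    assert(via_pointer == via_swapped);
    return via_subscript;
}

/* Hypothetical reconstruction of the question's diagram:
 * arr[0] is 'd', so arr[-2] is 'b'. */
char question_diagram_lookup(void)
{
    static const char buf[] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };
    const char *arr = &buf[3];   /* arr[0] is buf[3], i.e. 'd' */
    return arr[-2];              /* buf[1], i.e. 'b'           */
}
```

Calling element_at(&data[2], -2) on an int array data returns data[0], because the pointer starts two elements into the array.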

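【Comments】:

Note also that you don't have to dereference the pointer to get UB. Merely computing somearray-2 is undefined unless the result is in the range from the start of somearray to 1 past its end.
In older books, [] was referred to as syntactic sugar for pointer arithmetic. A favorite way to confuse beginners is to write 1[arr] instead of arr[1] and watch them guess what that is supposed to mean.
What happens on 64-bit systems (LP64) when you have a 32-bit int index that is negative? Should the index be promoted to a 64-bit signed int before the address calculation?
@Paul, from §6.5.6/8 (Additive operators): "When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression." So I think it will be promoted, and ((E1)+(E2)) will be a (64-bit) pointer with the expected value.
An odd side note: since the subscript operator [] is defined such that E1[E2] is identical to (*((E1)+(E2))) (see the answer by Matthew Flaschen), it is actually valid C code to write 2[arr] instead of arr[2]. I admit that this is deliberately obfuscating the code, though.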
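The first comment above distinguishes between merely computing a pointer and dereferencing it. A rough sketch of that rule follows; the helper names and the idea of tracking the current position as an index are assumptions made for illustration, not something from the standard or the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper. `pos` is the index (0 <= pos <= len) that a pointer
 * currently refers to inside an array of `len` elements. Returns true if
 * forming ptr + offset is allowed, i.e. the result lies between the first
 * element and one past the last element. */
bool offset_is_computable(size_t len, size_t pos, ptrdiff_t offset)
{
    assert(pos <= len);
    if (offset < 0)
        /* Same as -offset <= pos, written so that negation cannot overflow. */
        return (size_t)(-(offset + 1)) < pos;
    return (size_t)offset <= len - pos;
}

/* Stricter check: may ptr[offset] actually be read or written?
 * The one-past-the-end position may be computed but not dereferenced. */
bool offset_is_dereferenceable(size_t len, size_t pos, ptrdiff_t offset)
{
    if (!offset_is_computable(len, pos, offset))
        return false;
    if (offset < 0)
        return true;                       /* target index is below pos */
    return (size_t)offset < len - pos;     /* target index is below len */
}
```

With len 10 and pos 0, an offset of -2 is rejected by both checks, which matches the somearray-2 case in the comment; an offset of 10 may be computed but not dereferenced.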

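【Answer 2】:

This is only valid if arr is a pointer to the third element of an array (index 2) or a later element. Otherwise, it is not valid, because you would be accessing memory outside the bounds of the array. So, for example, this would be wrong: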

int arr[10];

int x = arr[-2]; // invalid; out of range

But this would be okay:

int arr[10];
int* p = &arr[2];

int x = p[-2]; // valid:  accesses arr[0]

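It is, however, unusual to use a negative subscript.

【Comments】:

@Matt: The code in the first example yields undefined behavior.
It is not valid. By the C standard, it explicitly has undefined behavior. On the other hand, if int arr[10]; were part of a structure with other elements before it, arr[-2] could potentially be well-defined, and you could determine whether it is based on offsetof, etc.
Found it in K&R Section 5.3, near the end: If one is sure that the elements exist, it is also possible to index backwards in an array; p[-1], p[-2], and so on are syntactically legal, and refer to the elements that immediately precede p[0]. Of course, it is illegal to refer to objects that are not within the array bounds. Still, your example helped me understand it better. Thanks!
Sorry for the thread necromancy, but I just love how ambiguous K&R are about what "illegal" means. The last sentence makes it sound as if out-of-bounds accesses produce a compilation error. That book is poison for beginners.
@Martin To be fair, the book was written at a much earlier time in our industry's history, when it was still quite reasonable to expect "illegal" to be read as "don't do this, you are not allowed to" rather than "you will be prevented from doing this".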

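【Answer 3】:

Sounds fine to me. It would be a rare case that you legitimately need it, however.

【Comments】:

It's not that rare - it's very useful, e.g., in image processing with neighbourhood operators.
I just needed to use this because I'm creating a memory pool with a stack and a heap [structure/design]. The stack grows towards higher memory addresses, the heap grows towards lower memory addresses. They meet in the middle.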
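As an illustration of the neighbourhood-operator use mentioned in the first comment, here is a small sketch of a 3-tap box filter over one image row. The function name, the 8-bit pixel type and the choice to copy border pixels unchanged are all assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3-tap box filter over a single row of 8-bit pixels.
 * `src` and `dst` each hold `width` pixels and must not overlap;
 * the two border pixels are copied unchanged. */
void box_filter_row(const unsigned char *src, unsigned char *dst, size_t width)
{
    assert(src != NULL && dst != NULL);
    if (width == 0)
        return;

    dst[0] = src[0];
    dst[width - 1] = src[width - 1];

    for (size_t x = 1; x + 1 < width; ++x) {
        const unsigned char *p = &src[x];   /* p[0] is the centre pixel */
        /* p[-1] and p[1] are the left and right neighbours. Both lie
         * inside the row because 1 <= x <= width - 2. */
        dst[x] = (unsigned char)((p[-1] + p[0] + p[1]) / 3);
    }
}
```

The negative subscript is safe here only because the loop never places p on the first pixel of the row.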

【Answer 4】:

arr probably points to the middle of an array, which makes arr[-2] refer to something in the original array without going out of bounds.
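One situation where a pointer into the middle of an array is handy is a table indexed by signed values. The sketch below is an assumed example, not taken from the code the question refers to; RANGE and count_signed_values are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum { RANGE = 5 };   /* hypothetical: counted values run from -RANGE to +RANGE */

/* Counts how often each value in [-RANGE, RANGE] occurs in `values`.
 * `storage` must have 2 * RANGE + 1 elements; values outside the range
 * are skipped. The count for value v ends up in storage[v + RANGE]. */
void count_signed_values(const int *values, int n,
                         unsigned storage[2 * RANGE + 1])
{
    assert(n >= 0 && (values != NULL || n == 0));

    unsigned *hist = &storage[RANGE];   /* hist[0] is the middle slot */

    for (int i = -RANGE; i <= RANGE; ++i)
        hist[i] = 0;                    /* hist[-RANGE] is storage[0] */

    for (int i = 0; i < n; ++i) {
        int v = values[i];
        if (v >= -RANGE && v <= RANGE)
            ++hist[v];                  /* negative v indexes backwards */
    }
}
```

Every subscript applied to hist stays between -RANGE and +RANGE, so every access lands inside storage.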

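【Comments】:

【Answer 5】:

I'm not sure how reliable this is, but I just read the following caveat about negative array indices on 64-bit systems (presumably LP64): http://www.devx.com/tips/Tip/41349

The author seems to be saying that 32-bit int array indices with 64-bit addressing can result in bad address calculations unless the array index is explicitly promoted to 64 bits (e.g. via a ptrdiff_t cast). I have actually seen a bug of this nature with the PowerPC version of gcc 4.1.0, but I don't know whether it is a compiler bug (i.e. it should work according to the C99 standard) or correct behaviour (i.e. the index needs a cast to 64 bits for correct behaviour).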
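A small sketch of what is at stake, under assumptions: per the §6.5.6/8 wording quoted in the comments on Answer 1, a negative int index and the same value cast to ptrdiff_t should reach the same element. The second pair of functions shows how sign extension and zero extension of a 32-bit -2 differ, which is one plausible way such an address could go wrong; that is a guess at the kind of bug, not a statement about the linked tip or about gcc 4.1.0. All function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if indexing with a (possibly negative) int reaches the
 * same element as indexing with that value widened to ptrdiff_t.
 * Precondition: p + i lies inside the array p points into. */
int int_index_matches_ptrdiff(const int *p, int i)
{
    assert(p != NULL);
    const int *via_int     = &p[i];
    const int *via_ptrdiff = &p[(ptrdiff_t)i];
    return via_int == via_ptrdiff;
}

/* Widening a signed 32-bit index keeps its value (sign extension). */
int64_t widen_signed32(int32_t i)
{
    return (int64_t)i;
}

/* Reinterpreting it as unsigned first does not (zero extension):
 * -2 becomes 4294967294. */
int64_t widen_unsigned32(int32_t i)
{
    return (int64_t)(uint32_t)i;
}
```

If the first function ever returned zero for an in-bounds index, that would point at the compiler rather than at the code.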

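【Comments】:

This sounds like a compiler bug.

【Answer 6】:

I know the question has been answered, but I couldn't resist sharing this explanation.

I remember the principles of compiler design: let's assume a is an int array, the size of int is 2, and the base address of a is 1000.

How a[5] will work ->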

Base Address of your Array a + (index of array *size of(data type for array a))
Base Address of your Array a + (5*size of(data type for array a))
i.e. 1000 + (5*2) = 1010

This explanation is also the reason why negative indexes in arrays work in C; i.e., if I access a[-5], it will give me:

Base Address of your Array a + (index of array *size of(data type for array a))
Base Address of your Array a + (-5 * size of(data type for array a))
i.e. 1000 + (-5*2) = 990

It will return the object at location 990. So, by this logic, we can access negative indexes in an array in C.
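The address arithmetic above can be checked with a short sketch. It uses the real sizeof(int) of the platform instead of the assumed size 2 and base address 1000, and it starts from a pointer into the middle of an array so that the element five positions back actually exists, as Answer 2 requires. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance from p to &p[index], computed the way the answer
 * describes: index * sizeof(element type). */
ptrdiff_t byte_offset_by_formula(ptrdiff_t index)
{
    return index * (ptrdiff_t)sizeof(int);
}

/* The same distance measured from the actual addresses.
 * Precondition: p + index lies inside the array p points into. */
ptrdiff_t byte_offset_measured(const int *p, ptrdiff_t index)
{
    assert(p != NULL);
    const char *base   = (const char *)p;
    const char *target = (const char *)&p[index];
    return target - base;   /* negative when index is negative */
}
```

For an int array a of 11 elements and p = &a[5], both functions give the same result for index -5, which corresponds to the 1000 + (-5*2) = 990 step in the answer.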

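【Comments】:

【Answer 7】:

As to why someone would want to use negative indexes, I have used them in two contexts:

Having a table of combinatorial numbers that tells you comb[1][-1] = 0; you can always check indexes before accessing the table, but this way the code looks cleaner and executes faster.
Putting a sentinel at the beginning of a table. For instance, you want to use something like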
```
 while (x < a[i]) i--;
```

but then you should also check that i is positive.
Solution: make a[-1] equal to -DBL_MAX, so that x < a[-1] is always false.
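Both uses can be sketched as follows. This is an illustration under assumptions, not the answerer's code: the names scan_down, build_comb and N are invented, the sentinel version requires x to be finite, and the table stores one padding slot per row so that comb[n][-1] is a real element.

```c
#include <assert.h>
#include <float.h>

/* Scans downward from `start` and returns the largest i <= start with
 * a[i] <= x, or -1 if there is none. The caller must have stored
 * -DBL_MAX in a[-1] and must pass a finite x, so the loop stops at the
 * sentinel without testing i >= 0. */
int scan_down(const double *a, int start, double x)
{
    assert(a[-1] == -DBL_MAX);   /* sentinel must be in place */
    int i = start;
    while (x < a[i])
        i--;
    return i;
}

enum { N = 6 };   /* hypothetical table size */

/* Fills the table so that comb[n][k] is "n choose k" for 0 <= k <= n < N
 * and comb[n][-1] is 0. Each row of `storage` has one padding slot in
 * front, and comb[n] points just past it. */
void build_comb(long storage[N][N + 1], long *comb[N])
{
    for (int n = 0; n < N; ++n) {
        for (int j = 0; j <= N; ++j)
            storage[n][j] = 0;
        comb[n] = &storage[n][1];   /* comb[n][-1] is storage[n][0] */
    }
    comb[0][0] = 1;
    for (int n = 1; n < N; ++n)
        for (int k = 0; k <= n; ++k)   /* no special case for k == 0 */
            comb[n][k] = comb[n - 1][k - 1] + comb[n - 1][k];
}
```

In both functions the element at index -1 really exists in the underlying storage, which is what keeps the negative subscript within bounds.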

【Comments】:

【Answer 8】:

#include <stdio.h>

int main(void) // negative index
{
    int i = 1, a[5] = {10, 20, 30, 40, 50};
    int* mid = &a[5]; // legal: one-past-the-end address; the element itself must not be accessed
    for (; i < 6; ++i)
        printf(" mid[ %d ] = %d;", -i, mid[-i]); // mid[-1] is a[4], ..., mid[-5] is a[0]
    return 0;
}

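【Comments】:

While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value.
Python, Groovy... have them. A simple use case is accessing the last element of an array without knowing the array size, which is a very practical requirement in many project situations. Many DSLs benefit from this as well.

【Answer 9】:

I would like to share an example:

GNU C++ library basic_string.h

[Note: As someone pointed out that this is a "C++" example, it may not fit this "C" topic. I wrote a "C" code sample that uses the same concept as the example. At least, the GNU gcc compiler doesn't complain about anything.]

It uses [-1] to move the pointer back from the user string to the management information block, because the memory is allocated once with enough room for both.

It says: "This approach has the enormous advantage that a string object requires only one allocation. All the ugliness is confined within a single %pair of inline functions, which each compile to a single @a add instruction: _Rep::_M_data(), and string::_M_rep(); and the allocation function which gets a block of raw bytes and with room enough and constructs a _Rep object at the front."

Source code: https://gcc.gnu.org/onlinedocs/gcc-10.3.0/libstdc++/api/a00332_source.html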

   struct _Rep_base
   {
     size_type               _M_length;
     size_type               _M_capacity;
     _Atomic_word            _M_refcount;
   };

   struct _Rep : _Rep_base
   {
      ...
   }

  _Rep*
   _M_rep() const _GLIBCXX_NOEXCEPT
   { return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }

It explains:

*  A string looks like this:
*
*  @code
*                                        [_Rep]
*                                        _M_length
*   [basic_string<char_type>]            _M_capacity
*   _M_dataplus                          _M_refcount
*   _M_p ---------------->               unnamed array of char_type
*  @endcode
*
*  Where the _M_p points to the first character in the string, and
*  you cast it to a pointer-to-_Rep and subtract 1 to get a
*  pointer to the header.
*
*  This approach has the enormous advantage that a string object
*  requires only one allocation.  All the ugliness is confined
*  within a single %pair of inline functions, which each compile to
*  a single @a add instruction: _Rep::_M_data(), and
*  string::_M_rep(); and the allocation function which gets a
*  block of raw bytes and with room enough and constructs a _Rep
*  object at the front.
*
*  The reason you want _M_data pointing to the character %array and
*  not the _Rep is so that the debugger can see the string
*  contents. (Probably we should add a non-inline member to get
*  the _Rep for the debugger to use, so users can check the actual
*  string length.)
*
*  Note that the _Rep object is a POD so that you can have a
*  static <em>empty string</em> _Rep object already @a constructed before
*  static constructors have run.  The reference-count encoding is
*  chosen so that a 0 indicates one reference, so you never try to
*  destroy the empty-string _Rep object.
*
*  All but the last paragraph is considered pretty conventional
*  for a C++ string implementation.

// Sample C code written using the same concept as above

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct HEAD {
    int f1;
    int f2;
}S_HEAD;

int main(int argc, char* argv[]) {
    size_t sz = sizeof(S_HEAD) + 20;

    S_HEAD* ha = (S_HEAD*)malloc(sz);
    if (ha == NULL)
      return -1;

    printf("&ha=%p\n", (void*)ha);

    memset(ha, 0, sz);

    ha[0].f1 = 100;
    ha[0].f2 = 200;

    // move to user data, can be converted to any type
    ha++;
    printf("&ha=%p\n", (void*)ha);

    *(int*)ha = 399;

    printf("head.f1=%i head.f2=%i user data=%i\n", ha[-1].f1, ha[-1].f2, *(int*)ha);

    --ha;
    printf("&ha=%p\n", (void*)ha);

    free(ha);

    return 0;
}



$ gcc c1.c -o c1.o -w
(no warning)
$ ./c1.o 
&ha=0x13ec010
&ha=0x13ec018
head.f1=100 head.f2=200 user data=399
&ha=0x13ec010

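Library authors use it. Hope this helps.

【Comments】:

The question is tagged C, not C++; this example is completely off-topic. Please read the tags first.
Well, I assumed that [-1] array indexing was common to both C and C++. Nevertheless, I wrote a code sample in C. At least GCC doesn't complain about using [-1] to index an array.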