Doubly linked list - Update list->tail after a merge sort: answers

【Question title】: Doubly linked list - Update list->tail after a merge sort
【Posted】: 2020-09-18 04:21:06
【Question】:

In a doubly linked list implementation, I am using the typical structure:

struct node
{
    void *data;
    struct node *prev;
    struct node *next;
};

I also want to insert at the end of the list in O(1) time, so I have another struct that stores the head and the tail:

struct linklist
{
    struct node *head;
    struct node *tail;
    size_t size;
};

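The program works as expected for all insertions and deletions, but I have a problem with the sort function. I am using merge sort which, as far as I understand, is the most efficient (or one of the most efficient) algorithms for sorting lists. The algorithm itself works well: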

static struct node *split(struct node *head)
{
    struct node *fast = head;
    struct node *slow = head;

    while ((fast->next != NULL) && (fast->next->next != NULL))
    {
        fast = fast->next->next;
        slow = slow->next;
    }

    struct node *temp = slow->next;

    slow->next = NULL;
    return temp;
}

static struct node *merge(struct node *first, struct node *second, int (*comp)(const void *, const void *))
{
    if (first == NULL)
    {
        return second;
    }
    if (second == NULL)
    {
        return first;
    }
    if (comp(first->data, second->data) < 0)
    {
        first->next = merge(first->next, second, comp);
        first->next->prev = first;
        first->prev = NULL;
        return first;
    }
    else
    {
        second->next = merge(first, second->next, comp);
        second->next->prev = second;
        second->prev = NULL;
        return second;
    }
}

static struct node *merge_sort(struct node *head, int (*comp)(const void *, const void *))
{
    if ((head == NULL) || (head->next == NULL))
    {
        return head;
    }

    struct node *second = split(head);

    head = merge_sort(head, comp);
    second = merge_sort(second, comp);
    return merge(head, second, comp);
}

but I don't know how to keep the address stored in list->tail up to date:

void linklist_sort(struct linklist *list, int (*comp)(const void *, const void *))
{
    list->head = merge_sort(list->head, comp);
    // list->tail is no longer valid at this point
}

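Of course I can walk the whole list after sorting and update list->tail by brute force, but I would like to know if there is a better way to do it.

I managed to solve the problem using a circular list, but I want to avoid changing the structure of the program.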

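【Comments】:

Merge sort works by splitting a list, assigning terms to the same side for as long as order is maintained, then switching sides and assigning terms to the other side. You then merge both sides and repeat until the list splits into just one list. But here you are assigning each node to a different side... you are scrambling the list... not sorting at all... all the sorting you do in the merge phase is destroyed in the split phase.

Tags: c linked-list mergesort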

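【Solution 1】:

Your algorithm uses O(N) stack space because the merge function recurses for every step. With this approach, keeping track of the tail node would be very cumbersome. You can simply scan the list to find it and update the list structure in linklist_sort. This extra step does not change the complexity of the sorting operation. You can save some time by starting from the current value of list->tail: the old tail node is still in the list, so the loop stops immediately if the list was already sorted.

Here is a modified version: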

void linklist_sort(struct linklist *list, int (*comp)(const void *, const void *)) {
    list->head = merge_sort(list->head, comp);
    if (list->tail) {
        struct node *tail = list->tail;
        while (tail->next)
            tail = tail->next;
        list->tail = tail;
    }
}

Sorting a linked list with merge sort should only need O(log(N)) space and O(N log(N)) time.

Here are some ideas to improve this algorithm:

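Since you know the length of the list, you do not need to scan the full list to split it. You can pass the length along with the list pointer, use it to determine where to split, and scan only half of the list.
If you convert merge to a non-recursive version, you can keep track of the last node during the merge phase and update a pointer passed as an argument, struct node **tailp, to point to the last node. This would save the final scan, and removing the recursion would lower the space complexity. Whether this improves efficiency is not obvious; benchmarking will tell.
Experience shows that sorting a linked list, singly and a fortiori doubly linked, is more efficiently implemented with an auxiliary array of N pointers to the list nodes. You sort this array and relink the nodes according to the order of the sorted array. The extra requirement is O(N) space.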
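
To make the third idea concrete, here is a minimal sketch of sorting through an auxiliary array of node pointers, assuming the standard qsort() is acceptable. The names linklist_sort_array, comp_nodes, active_comp and int_ascending are hypothetical and not part of the original code. Because qsort() takes no context argument, the sketch stores the user comparator in a file-scope variable, which makes it non-reentrant; qsort() is also not guaranteed to be stable.

```c
#include <assert.h>
#include <stdlib.h>

struct node { void *data; struct node *prev; struct node *next; };
struct linklist { struct node *head; struct node *tail; size_t size; };

/* Assumption: the user comparator is parked here because qsort() has no
   context parameter. Not thread-safe. */
static int (*active_comp)(const void *, const void *);

/* Adapter: qsort() passes pointers to array elements (struct node **). */
static int comp_nodes(const void *a, const void *b)
{
    const struct node *na = *(struct node *const *)a;
    const struct node *nb = *(struct node *const *)b;
    return active_comp(na->data, nb->data);
}

/* Example comparator for int payloads (hypothetical helper). */
int int_ascending(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns 0 on success, -1 if the auxiliary array cannot be allocated.
   Assumes list->size matches the number of linked nodes. */
int linklist_sort_array(struct linklist *list,
                        int (*comp)(const void *, const void *))
{
    assert(list != NULL && comp != NULL);
    if (list->size < 2)
        return 0;

    struct node **nodes = malloc(list->size * sizeof *nodes);
    if (nodes == NULL)
        return -1;

    size_t n = 0;
    for (struct node *p = list->head; p != NULL && n < list->size; p = p->next)
        nodes[n++] = p;
    assert(n == list->size);

    active_comp = comp;
    qsort(nodes, n, sizeof *nodes, comp_nodes);

    /* Relink in sorted order; head and tail are the two ends of the array. */
    for (size_t i = 0; i < n; i++) {
        nodes[i]->prev = (i > 0) ? nodes[i - 1] : NULL;
        nodes[i]->next = (i + 1 < n) ? nodes[i + 1] : NULL;
    }
    list->head = nodes[0];
    list->tail = nodes[n - 1];
    free(nodes);
    return 0;
}
```

With this approach list->tail needs no separate scan, at the cost of the O(N) temporary array.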

Here is a modified version that uses the list length and a non-recursive merge:

struct node {
    void *data;
    struct node *prev;
    struct node *next;
};

struct linklist {
    struct node *head;
    struct node *tail;
    size_t size;
};

static struct node *split(struct node *head, size_t pos) {
    struct node *slow = head;

    while (pos-- > 1) {
        slow = slow->next;
    }
    struct node *temp = slow->next;
    slow->next = NULL;
    return temp;
}

static struct node *merge(struct node *first, struct node *second,
                          int (*comp)(const void *, const void *))
{
    struct node *head = NULL;
    struct node *prev = NULL;
    struct node **linkp = &head;

    for (;;) {
        if (first == NULL) {
            second->prev = prev;
            *linkp = second;
            break;
        }
        if (second == NULL) {
            first->prev = prev;
            *linkp = first;
            break;
        }
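        if (comp(first->data, second->data) <= 0) {
            first->prev = prev;
            prev = *linkp = first;
            linkp = &first->next;
            first = first->next;    /* advance past the node just linked */
        } else {
            second->prev = prev;
            prev = *linkp = second;
            linkp = &second->next;
            second = second->next;  /* advance past the node just linked */
        }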
    }
    return head;
}

static struct node *merge_sort(struct node *head, size_t size,
                               int (*comp)(const void *, const void *))
{
    if (size < 2) {
        return head;
    }

    struct node *second = split(head, size / 2);

    head = merge_sort(head, size / 2, comp);
    second = merge_sort(second, size - size / 2, comp);
    return merge(head, second, comp);
}

void linklist_sort(struct linklist *list, int (*comp)(const void *, const void *)) {
    list->head = merge_sort(list->head, list->size, comp);
    if (list->tail) {
        struct node *tail = list->tail;
        while (tail->next)
            tail = tail->next;
        list->tail = tail;
    }
}
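
The final scan can also be avoided, as suggested in the second idea above. The following is only a sketch under one assumption: each sorted run reports its own last node, so that merge knows the tail of whichever run is left over without walking it. The tailp, first_tail and second_tail parameters and the int_ascending comparator are hypothetical additions, not part of the code above.

```c
#include <assert.h>
#include <stddef.h>

struct node { void *data; struct node *prev; struct node *next; };
struct linklist { struct node *head; struct node *tail; size_t size; };

/* Example comparator for int payloads (hypothetical helper). */
int int_ascending(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Cut the list after its pos-th node and return the second part. */
static struct node *split(struct node *head, size_t pos)
{
    struct node *slow = head;
    while (pos-- > 1)
        slow = slow->next;
    struct node *temp = slow->next;
    slow->next = NULL;
    return temp;
}

/* Merge two non-empty sorted runs; store the last node of the result
   in *tailp. The leftover run keeps its own tail, so no scan is needed. */
static struct node *merge(struct node *first, struct node *first_tail,
                          struct node *second, struct node *second_tail,
                          int (*comp)(const void *, const void *),
                          struct node **tailp)
{
    struct node *head = NULL;
    struct node *prev = NULL;
    struct node **linkp = &head;

    for (;;) {
        if (first == NULL) {
            second->prev = prev;
            *linkp = second;
            *tailp = second_tail;
            break;
        }
        if (second == NULL) {
            first->prev = prev;
            *linkp = first;
            *tailp = first_tail;
            break;
        }
        if (comp(first->data, second->data) <= 0) {
            first->prev = prev;
            prev = *linkp = first;
            linkp = &first->next;
            first = first->next;
        } else {
            second->prev = prev;
            prev = *linkp = second;
            linkp = &second->next;
            second = second->next;
        }
    }
    return head;
}

static struct node *merge_sort(struct node *head, size_t size,
                               int (*comp)(const void *, const void *),
                               struct node **tailp)
{
    if (size < 2) {
        *tailp = head;          /* an empty or one-node run is its own tail */
        return head;
    }
    struct node *first_tail, *second_tail;
    struct node *second = split(head, size / 2);

    head = merge_sort(head, size / 2, comp, &first_tail);
    second = merge_sort(second, size - size / 2, comp, &second_tail);
    return merge(head, first_tail, second, second_tail, comp, tailp);
}

void linklist_sort(struct linklist *list, int (*comp)(const void *, const void *))
{
    assert(list != NULL && comp != NULL);
    list->head = merge_sort(list->head, list->size, comp, &list->tail);
}
```

Whether this beats the simple scan is not obvious, since every merge now carries two extra pointers; a benchmark would have to decide.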

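Note that you could also simplify the merge function and not update the back pointers during the sort at all, since you can relink the whole list during the final scan. That final scan would be longer and less cache friendly, but the result should still be more efficient overall and less error prone.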

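【Comments】:

A bottom-up merge sort for linked lists is a bit faster, since it does not involve any scanning to split lists.
@rcgldr: Good point, of course! Bottom-up merge sort should be more efficient because there is no scanning to split the lists, and it is also more cache friendly. It also lends itself to not splitting the list at all, by comparing with the tail node instead of NULL, which would require tracking the end nodes but would save the final scan.
Using a circular list with a dummy node means there will always be a node before and after any run. If the list itself is never split, merge checks for the end of a run by comparing its pointer parameters: (begin 1st run, end 1st run == begin 2nd run, end 2nd run).
For bottom-up merge sort using an array of pointers, begin 1st run = current array element, end 1st run == begin 2nd run = first prior non-null array element, end 2nd run == second prior non-null array element or a local pointer to the end of the list. This is mostly an issue when merging the array of pointers to runs into a single run. It requires realizing that the rightmost run is at array[0] and the leftmost run is at array[max used] (the higher the array index, the further left in the list).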
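
For readers unfamiliar with the bottom-up variant mentioned in these comments, here is a rough sketch of the usual array-of-runs formulation. It is not the commenters' code: it uses NULL-terminated runs rather than the unsplit circular list they describe, sorts on next pointers only, and fixes prev and list->tail in one final pass. NUM_BINS, merge_runs, linklist_sort_bottom_up and int_ascending are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct node { void *data; struct node *prev; struct node *next; };
struct linklist { struct node *head; struct node *tail; size_t size; };

#define NUM_BINS 32   /* bins[i] holds a sorted run of 2^i nodes, or NULL
                         (the last bin may grow beyond that) */

/* Example comparator for int payloads (hypothetical helper). */
int int_ascending(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Merge two sorted runs using next pointers only; either may be NULL.
   On ties the node from 'a' goes first. */
static struct node *merge_runs(struct node *a, struct node *b,
                               int (*comp)(const void *, const void *))
{
    struct node *head = NULL;
    struct node **linkp = &head;

    while (a != NULL && b != NULL) {
        if (comp(a->data, b->data) <= 0) {
            *linkp = a; linkp = &a->next; a = a->next;
        } else {
            *linkp = b; linkp = &b->next; b = b->next;
        }
    }
    *linkp = (a != NULL) ? a : b;
    return head;
}

void linklist_sort_bottom_up(struct linklist *list,
                             int (*comp)(const void *, const void *))
{
    assert(list != NULL && comp != NULL);
    struct node *bins[NUM_BINS] = { NULL };
    struct node *node = list->head;

    /* Feed nodes one at a time; merging works like carrying in a
       binary counter, so the list is never scanned to be split. */
    while (node != NULL) {
        struct node *next = node->next;
        struct node *run = node;
        size_t i;

        run->next = NULL;
        for (i = 0; i < NUM_BINS && bins[i] != NULL; i++) {
            run = merge_runs(bins[i], run, comp);  /* older run first */
            bins[i] = NULL;
        }
        if (i == NUM_BINS)
            i--;
        bins[i] = run;
        node = next;
    }

    /* Collapse the remaining runs; higher bins hold earlier nodes. */
    struct node *sorted = NULL;
    for (size_t i = 0; i < NUM_BINS; i++)
        sorted = merge_runs(bins[i], sorted, comp);

    /* Single pass to restore prev pointers and find the tail. */
    struct node *prev = NULL;
    for (struct node *p = sorted; p != NULL; p = p->next) {
        p->prev = prev;
        prev = p;
    }
    list->head = sorted;
    list->tail = prev;
}
```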

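【Solution 2】:

One option is to merge sort the nodes as if they were singly linked list nodes, then make a single pass when done to set the prev pointers and update the tail pointer.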
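
A minimal sketch of this first option, assuming the struct node and struct linklist from the question. The helper names merge_next_only and sort_next_only, and the int_ascending comparator, are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct node { void *data; struct node *prev; struct node *next; };
struct linklist { struct node *head; struct node *tail; size_t size; };

/* Example comparator for int payloads (hypothetical helper). */
int int_ascending(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Merge two sorted runs, touching next pointers only. */
static struct node *merge_next_only(struct node *a, struct node *b,
                                    int (*comp)(const void *, const void *))
{
    struct node *head = NULL;
    struct node **linkp = &head;

    while (a != NULL && b != NULL) {
        if (comp(a->data, b->data) <= 0) {
            *linkp = a; linkp = &a->next; a = a->next;
        } else {
            *linkp = b; linkp = &b->next; b = b->next;
        }
    }
    *linkp = (a != NULL) ? a : b;
    return head;
}

/* Top-down merge sort that ignores prev entirely. */
static struct node *sort_next_only(struct node *head, size_t size,
                                   int (*comp)(const void *, const void *))
{
    if (size < 2)
        return head;

    struct node *mid = head;
    for (size_t i = 1; i < size / 2; i++)
        mid = mid->next;
    struct node *second = mid->next;
    mid->next = NULL;

    head = sort_next_only(head, size / 2, comp);
    second = sort_next_only(second, size - size / 2, comp);
    return merge_next_only(head, second, comp);
}

void linklist_sort(struct linklist *list, int (*comp)(const void *, const void *))
{
    assert(list != NULL && comp != NULL);
    list->head = sort_next_only(list->head, list->size, comp);

    /* One pass: rebuild every prev pointer and pick up the tail. */
    struct node *prev = NULL;
    for (struct node *p = list->head; p != NULL; p = p->next) {
        p->prev = prev;
        prev = p;
    }
    list->tail = prev;
}
```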

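Another option is to use something similar to C++ std::list and std::list::sort. A circular doubly linked list is used, with one dummy node that uses "next" as "head" and "prev" as "tail". The parameters to merge sort and merge are iterators or pointers, used only to keep track of run boundaries, because nodes are merged by moving them within the original list. The merge function merges nodes from the second run into the first run using std::list::splice. The logic is: if the first run element is less than or equal to the second run element, just advance the iterator or pointer into the first run; otherwise remove the node from the second run and insert it before the current node in the first run. This automatically updates the head and tail pointers in the dummy node whenever it is involved in a remove + insert step.

Changing struct node to: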

struct node
{
    struct node *next;           // used as head for dummy node
    struct node *prev;           // used as tail for dummy node
    void *data;
};

would be a bit more generic.

Since the dummy node is allocated when the list is created, begin == dummy->next, last == dummy->prev, and end == dummy.
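
The dummy-node layout can be sketched as follows. This is only an illustration of the invariants above, not a full sort: list_create, list_begin, list_last, list_end, list_insert_before and list_move_before are hypothetical helper names, and list_move_before stands in for the splice step that a merge would use.

```c
#include <assert.h>
#include <stdlib.h>

struct node
{
    struct node *next;           // used as head for dummy node
    struct node *prev;           // used as tail for dummy node
    void *data;
};

/* Allocate the dummy node of an empty list: it links to itself.
   Returns NULL if the allocation fails. */
struct node *list_create(void)
{
    struct node *dummy = malloc(sizeof *dummy);
    if (dummy != NULL) {
        dummy->next = dummy;
        dummy->prev = dummy;
        dummy->data = NULL;
    }
    return dummy;
}

struct node *list_begin(struct node *dummy) { return dummy->next; }
struct node *list_last(struct node *dummy)  { return dummy->prev; }
struct node *list_end(struct node *dummy)   { return dummy; }

/* Link 'node' in front of 'pos' in O(1).
   With pos == dummy this appends at the end of the list. */
void list_insert_before(struct node *pos, struct node *node)
{
    assert(pos != NULL && node != NULL);
    node->prev = pos->prev;
    node->next = pos;
    pos->prev->next = node;
    pos->prev = node;
}

/* Unlink 'node' and reinsert it before 'pos'. No special cases are
   needed for the first or last node: the dummy's next/prev fields are
   updated by the same four pointer writes as any other neighbour. */
void list_move_before(struct node *pos, struct node *node)
{
    assert(node != pos);
    node->prev->next = node->next;
    node->next->prev = node->prev;
    list_insert_before(pos, node);
}
```

Because the dummy node is an ordinary neighbour, moving the last node to the front changes both dummy->next and dummy->prev without any code that mentions head or tail explicitly.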

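【Comments】:

Thanks, I really like the dummy node idea and it is the best solution, but I see one problem: you would always need to pass the comparison function to the push, pop ... functions. Of course you could pass a dummy comparison function, but I prefer to keep the API unchanged. Thanks again!
On the other hand, I could pass the comparison function to the constructor and leave the rest of the API unchanged. Yes, that's definitely a good idea.
@DavidRanieri - Why would push and pop need a compare function?
"...is less than or equal to second run element..."
@DavidRanieri - I don't understand. Would you be using pop and push as a way to rearrange nodes during the sort? If so, note that the comparison takes place before any nodes are rearranged.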

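【Solution 3】:

I am not the best person to provide a deep Big-O analysis of algorithms. Still, answering a question that already has an accepted "canonical" answer is great, because it leaves room to explore alternative solutions without too much pressure.
This is interesting even though, as you will see, the solution analyzed here is not better than the one presented in the question.

The strategy starts by asking whether the candidate tail element can be tracked without turning the code upside down. The main candidate is the function that decides the order of the nodes in the sorted list: the merge() function.

Since each comparison decides which node comes first in the sorted list, it also produces a "loser" that ends up closer to the tail. So, by comparing that loser with the current tail element at every step, we eventually update the tail element with the "loser of the losers".

The merge function gets an additional struct node **tail parameter (the double pointer is required because we change the list's tail field in place):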

static struct node *merge(struct node *first, struct node *second, struct node **tail, int (*comp)(const void *, const void *))
{
    if (first == NULL)
    {
        return second;
    }
    if (second == NULL)
    {
        return first;
    }
    if (comp(first->data, second->data) < 0)
    {
        first->next = merge(first->next, second, tail, comp);

        /* The 'second' node is the "loser". Let's compare current 'tail' 
           with it, and in case it loses again, let's update  'tail'.      */
        if( comp(second->data, (*tail)->data) > 0)
            *tail = second;
        /******************************************************************/

        first->next->prev = first;
        first->prev = NULL;
        return first;
    }
    else
    {
        second->next = merge(first, second->next, tail, comp);

        /* The 'first' node is the "loser". Let's compare current 'tail' 
           with it, and in case it loses again, let's update  'tail'.      */
        if( comp(first->data, (*tail)->data) > 0)
            *tail = first;
        /******************************************************************/

        second->next->prev = second;
        second->prev = NULL;
        return second;
    }
}

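Nothing else in the code needs to change, except that the tail double-pointer parameter has to be "propagated" through the merge_sort() and linklist_sort() functions: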

static struct node *merge_sort(struct node *head, struct node **tail, int (*comp)(const void *, const void *));

void linklist_sort(List_t *list, int (*comp)(const void *, const void *))
{
    list->head = merge_sort(list->head, &(list->tail), comp);
}
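
Only the prototype of merge_sort() is shown above. A body consistent with it might look like the sketch below, which is an assumption rather than the answerer's actual code: it reuses split() from the question and the merge() from this answer, and simply hands the same tail pointer to every recursive call. The int_ascending comparator is a hypothetical helper. Since the tail is chosen by comparing values, it is worth testing with duplicate keys: two nodes that compare equal cannot be told apart, so the node found may not be the physically last one.

```c
#include <assert.h>
#include <stddef.h>

struct node { void *data; struct node *prev; struct node *next; };
struct linklist { struct node *head; struct node *tail; size_t size; };

/* Example comparator for int payloads (hypothetical helper). */
int int_ascending(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* split() as in the question: cut the list in half with slow/fast pointers. */
static struct node *split(struct node *head)
{
    struct node *fast = head;
    struct node *slow = head;

    while ((fast->next != NULL) && (fast->next->next != NULL)) {
        fast = fast->next->next;
        slow = slow->next;
    }
    struct node *temp = slow->next;
    slow->next = NULL;
    return temp;
}

/* merge() as in this answer: the loser of each comparison is checked
   against the current tail candidate. */
static struct node *merge(struct node *first, struct node *second,
                          struct node **tail,
                          int (*comp)(const void *, const void *))
{
    if (first == NULL)
        return second;
    if (second == NULL)
        return first;
    if (comp(first->data, second->data) < 0) {
        first->next = merge(first->next, second, tail, comp);
        if (comp(second->data, (*tail)->data) > 0)
            *tail = second;
        first->next->prev = first;
        first->prev = NULL;
        return first;
    } else {
        second->next = merge(first, second->next, tail, comp);
        if (comp(first->data, (*tail)->data) > 0)
            *tail = first;
        second->next->prev = second;
        second->prev = NULL;
        return second;
    }
}

/* Assumed body: identical to the question's merge_sort(), with 'tail'
   passed through unchanged. */
static struct node *merge_sort(struct node *head, struct node **tail,
                               int (*comp)(const void *, const void *))
{
    if ((head == NULL) || (head->next == NULL))
        return head;

    struct node *second = split(head);

    head = merge_sort(head, tail, comp);
    second = merge_sort(second, tail, comp);
    return merge(head, second, tail, comp);
}

void linklist_sort(struct linklist *list, int (*comp)(const void *, const void *))
{
    assert(list != NULL && comp != NULL);
    list->head = merge_sort(list->head, &(list->tail), comp);
}
```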

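Test

To test this modification I had to write a basic insert() function, a compare() function designed to produce a list sorted in descending order, and a printList() utility. Then I wrote a main program to test everything.

I ran several tests; here I show just one example, omitting the functions already presented in the question and earlier in this answer: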

#include <stdio.h>
#include <stdlib.h>   /* malloc */

typedef struct node
{
    void *data;
    struct node *prev;
    struct node *next;
} Node_t;

typedef struct linklist
{
    struct node *head;
    struct node *tail;
    size_t size;
} List_t;

void insert(List_t *list, int data)
{
    Node_t * newnode = (Node_t *) malloc(sizeof(Node_t) );
    int * newdata = (int *) malloc(sizeof(int));
    *newdata = data;

    newnode->data = newdata;
    newnode->prev = list->tail;
    newnode->next = NULL;
    if(list->tail)
        list->tail->next = newnode;

    list->tail = newnode;

    if( list->size++ == 0 )
        list->head = newnode;   
}

int compare(const void *left, const void *right)
{
    if(!left && !right)
        return 0;

    if(!left && right)
        return 1;
    if(left && !right)
        return -1;

    int lInt = (int)*((int *)left), rInt = (int)*((int *)right);

    return (rInt-lInt); 
}

void printList( List_t *l)
{
    for(Node_t *n = l->head; n != NULL; n = n->next )
    {
        printf( " %d ->", *((int*)n->data));
    }
    printf( " NULL (tail=%d)\n", *((int*)l->tail->data));
}


int main(void)
{
  List_t l = { 0 };

  insert( &l, 5 );
  insert( &l, 3 );
  insert( &l, 15 );
  insert( &l, 11 );
  insert( &l, 2 );
  insert( &l, 66 );
  insert( &l, 77 );
  insert( &l, 4 );
  insert( &l, 13 );
  insert( &l, 9 );
  insert( &l, 23 );

  printList( &l );

  linklist_sort( &l, compare );

  printList( &l );

  /* Free-list utilities omitted */

  return 0;
}

In this particular test I got the following output:

 5 -> 3 -> 15 -> 11 -> 2 -> 66 -> 77 -> 4 -> 13 -> 9 -> 23 -> NULL (tail=23)
 77 -> 66 -> 23 -> 15 -> 13 -> 11 -> 9 -> 5 -> 4 -> 3 -> 2 -> NULL (tail=2)

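Conclusion

The good news is that, in theory, we still have an algorithm with O(N log(N)) worst-case time complexity.
The bad news is that, to avoid a linear search through the linked list (N "simple steps"), we have to make N*logN extra comparisons, each involving a function call. This makes the linear search still the better option.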

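【Comments】:

Writing this analysis was instructive for me. Don't be hard on me for putting so much effort into a sub-optimal solution. ;)
Roberto, that's great, thank you!! It works like a charm.
"This makes the linear search still a better option": you are right, for 100000 records the linear search takes a few milliseconds less.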