The Movie Scheduling Problem: Question and Answers

【Question title】: The Movie Scheduling Problem
【Posted】: 2013-09-12 23:47:11
【Question】:

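I'm currently reading Skiena's "The Algorithm Design Manual" (well, starting to read it).

He poses a problem he calls the "Movie Scheduling Problem":

Problem: Movie Scheduling Problem

Input: A set I of n intervals on the line.

Output: What is the largest subset of mutually non-overlapping intervals that can be selected from I?

Example: (each dashed line is a movie, and you want to find a set containing the largest number of movies)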

                      ----a---
-----b----    -----c---    ---d---
        -----e---  -------f---
            --g--  --h--

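The algorithm I came up with is this: throw out the "worst offender" (the movie that intersects the most other movies), and repeat until there are no offenders left (zero intersections). The only problem I see is ties: if two different movies each intersect 3 other movies, does it matter which one I throw out?

Basically, I'd like to know how to turn this idea into "math", and how to prove it correct or incorrect.

【Comments】:

Are we trying to fit in as many movies as possible, or to fill as much time as possible? Is it better to have 6 movies that play for 5 hours, or 5 movies that play for 6 hours in the same time span?
We're trying to fit in the most movies.
Why do some rows in the input (your example) contain more than one movie? Would it matter if each movie were on its own row? That would simplify thinking about it; otherwise it gives the impression that some movies have been deliberately placed on the same row.
Please edit the title of your question to be more specific. What possible use is "Is this algorithm correct?" to future readers when they search here? Your question is also vague in content: "how do I turn this idea into 'math' and how do I prove it correct/incorrect" is not a specific question that can be answered here. In fact, I'm inclined to close it as off-topic; it is a better fit for Programmers as a theoretical question, since no code is involved.
Also a duplicate. Well, maybe not the exact same question. But it's the same problem and the same book.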

Tags: algorithm schedule

【Solution 1】:

The algorithm is incorrect. Consider the following example:

Counterexample

           |----F----|       |-----G------| 

        |-------D-------|  |--------E--------|

|-----A------|    |------B------|    |------C-------|

You can see that there is a solution of size at least 3, because you can pick A, B and C.

First, let's count the number of intersections for each interval:

A = 2    [F, D]
B = 4    [D, F, E, G]
C = 2    [E, G]
D = 3    [A, B, F]
E = 3    [B, C, G]
F = 3    [A, B, D]
G = 3    [B, C, E]

Now consider running your algorithm. In the first step we remove B, because it intersects the most intervals, and we get:

           |----F----|       |-----G------| 

        |-------D-------|  |--------E--------|

|-----A------|                      |------C-------|

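It is easy to see that you can now pick only one interval from {A, D, F}, because every pair of them intersects. The same holds for {G, E, C}. So after removing B you can pick at most one interval from {A, D, F} and at most one from {G, E, C}, for a total of 2, which is smaller than the 3 you get from {A, B, C}.

The conclusion is that once you remove B, the interval that intersects the most other intervals, you can no longer reach the maximum number of non-intersecting movies.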
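The counts and the failure above can be checked mechanically. The following is a minimal sketch, not the asker's exact procedure: the coordinates are assumptions read off the ASCII diagram (treated as closed intervals), the names `conflicts` and `worst_offender` are hypothetical, and ties are broken alphabetically, which is an arbitrary choice.

```python
# Coordinates are assumptions read off the ASCII diagram (closed intervals).
INTERVALS = {
    "A": (0, 13), "B": (18, 32), "C": (37, 52),
    "D": (8, 24), "E": (27, 45), "F": (11, 21), "G": (29, 42),
}


def overlaps(x, y):
    """True if the closed intervals x and y share at least one point."""
    return x[0] <= y[1] and y[0] <= x[1]


def conflicts(intervals):
    """Map each name to the set of other names it intersects."""
    return {
        a: {b for b in intervals if b != a and overlaps(intervals[a], intervals[b])}
        for a in intervals
    }


def worst_offender(intervals):
    """Repeatedly drop the interval with the most intersections.

    Ties go to the alphabetically first name (an arbitrary choice).
    """
    remaining = dict(intervals)
    while True:
        c = conflicts(remaining)
        worst = max(sorted(remaining), key=lambda name: len(c[name]))
        if not c[worst]:
            # Nothing intersects anything else any more.
            return sorted(remaining)
        del remaining[worst]


counts = {name: len(s) for name, s in conflicts(INTERVALS).items()}
result = worst_offender(INTERVALS)
print(counts)
print(result)
```

With these assumed coordinates B is removed first, and whatever the tie-breaking rule, at most one interval from each side can survive, so the heuristic ends with 2 movies instead of 3.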

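Correct solution

This problem is well known. One solution is to pick the interval that ends first, remove all intervals that intersect it, and repeat until there are no intervals left to examine. This is an example of a greedy approach, and you can look up or work out a proof that it is correct.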
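The greedy rule just described can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: intervals are closed `(start, end)` pairs, the coordinates are read off the counterexample diagram, and `max_disjoint` is a hypothetical name. Sorting by end time and skipping anything that starts no later than the last chosen end is equivalent to "pick the earliest-ending interval and delete everything it intersects".

```python
# Coordinates are assumptions read off the counterexample diagram.
INTERVALS = {
    "A": (0, 13), "B": (18, 32), "C": (37, 52),
    "D": (8, 24), "E": (27, 45), "F": (11, 21), "G": (29, 42),
}


def max_disjoint(intervals):
    """Earliest-finish-first greedy over closed intervals.

    intervals: dict mapping a name to a (start, end) pair.
    Returns the chosen names in order of increasing end time.
    """
    chosen = []
    last_end = None
    # Visit intervals in order of increasing end time.
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][1]):
        # Keep the interval only if it starts after the last chosen one ends.
        if last_end is None or start > last_end:
            chosen.append(name)
            last_end = end
    return chosen


picked = max_disjoint(INTERVALS)
print(picked)
```

On the counterexample this selects A, B and C, the size-3 solution that the worst-offender heuristic misses. The sort dominates the cost, so the sketch runs in O(n log n).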

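【Comments】:

What if I remove duplicates first? If M1 and M2 have the same intersection set and intersect each other, then they are interchangeable: any solution that contains M1 could contain M2 instead, and vice versa. No solution can include both. So removing either one from the set of intervals does not affect the ability to find an optimal solution. IF: intersects(M1) union intersects(M2) = intersects(M1)-M2 = intersects(M2)-M1 THEN: I = (I - M1)
@DavidCrowe Do you see two intervals with the same intersection set in my counterexample?
D and F both have [A, B]. E and G both have [B, C].
@DavidCrowe OK, so you can add a new interval H that intersects only D and B: place it after F ends and before D ends. Then B still has the most intersections, and D and F have different intersection sets. You can do the same for B and C.
@DavidCrowe I'll think about it, but tomorrow :)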

【Solution 2】:

This looks like a dynamic programming problem to me:

Define the following functions:

sched(t) = best schedule starting at time t
next(t) = set of movies that start next after time t
len(m) = length of movie m

next returns a set because more than one movie may start at the same time.

Then sched should be defined as follows:

sched(t) = max { 1 + sched(start(m) + len(m)), sched(start(m) + 1) } where m in next(t) and start(m) is the start time of m
sched(t) = 0 if next(t) is empty

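This recursive function picks a movie m from next(t) and compares the largest sets that do and do not include m.

Call sched with the start time of your first movie and you get the size of the optimal set. Recovering the optimal set itself only takes some extra bookkeeping to remember which movies were chosen at each call.

If you use memoization, I think this recursive (as opposed to iterative) algorithm runs in O(n^2) time, where n is the number of movies.

I believe this is correct, but I would have to consult my algorithms textbook to give you a rigorous proof. Hopefully the algorithm makes it intuitively clear why it works.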
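A memoized version of this idea might look like the sketch below. It is not the answer's exact formulation: as a substitution of my own, it indexes states by position in the start-sorted list of movies rather than by clock time t, which avoids assuming integer time steps. Intervals are assumed to be closed `(start, end)` pairs, and `best_schedule_size` is a hypothetical name.

```python
from bisect import bisect_right
from functools import lru_cache


def best_schedule_size(movies):
    """Return the maximum number of pairwise disjoint movies.

    movies: iterable of (start, end) closed intervals.
    """
    movies = sorted(movies)              # sort by start time
    starts = [s for s, _ in movies]

    @lru_cache(maxsize=None)
    def sched(i):
        # Best count achievable using only movies[i:].
        if i == len(movies):
            return 0
        _, end = movies[i]
        # Index of the first movie that starts strictly after movies[i] ends.
        j = bisect_right(starts, end)
        take = 1 + sched(j)              # watch movies[i]
        skip = sched(i + 1)              # do not watch movies[i]
        return max(take, skip)

    return sched(0)


# Coordinates are assumptions read off the counterexample diagram in Solution 1.
example = [(0, 13), (8, 24), (11, 21), (18, 32), (27, 45), (29, 42), (37, 52)]
size = best_schedule_size(example)
print(size)
```

There are n + 1 states and each does one binary search, so this variant runs in O(n log n) after sorting. The recursion depth grows with n, so very large inputs would need an iterative rewrite.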

【Comments】:

【Solution 3】:

# go through the database and create a 2-D matrix indexed a..h by a..h.  Set each
# element of the matrix to 1 if the row index movie overlaps the column index movie.

# 8 x 8 matrix of zeros, one row and one column per movie a..h
mtx = [[0] * 8 for _ in range(8)]

# b <> e
mtx[1][4] = 1
mtx[4][1] = 1

# e <> g
mtx[4][6] = 1
mtx[6][4] = 1

# e <> c
mtx[4][2] = 1
mtx[2][4] = 1

# c <> a
mtx[2][0] = 1
mtx[0][2] = 1

# c <> f
mtx[2][5] = 1
mtx[5][2] = 1

# c <> g
mtx[2][6] = 1
mtx[6][2] = 1

# c <> h
mtx[2][7] = 1
mtx[7][2] = 1

# d <> f
mtx[3][5] = 1
mtx[5][3] = 1

# a <> f
mtx[0][5] = 1
mtx[5][0] = 1

# a <> d
mtx[0][3] = 1
mtx[3][0] = 1

# a <> h
mtx[0][7] = 1
mtx[7][0] = 1

# f <> h (they overlap in the question's diagram; e <> g is already set above)
mtx[5][7] = 1
mtx[7][5] = 1

# print out constraints
for line in mtx:
    print(line)

# keep track of which movies are still allowed
allowed = set(range(8))

# loop through in greedy fashion, picking movie that throws out the least
# number of other movies at each step
while allowed:
    best_col = None
    best_lost = set()
    best = 8  # upper bound: a movie can overlap at most 7 others
    # each step, only try movies still allowed
    for col in sorted(allowed):
        # other still-allowed movies eliminated by this selection
        lost = set(row for row in allowed if mtx[row][col] == 1)
        # this was the best of all the allowed choices so far
        if len(lost) < best:
            best_col = col
            best_lost = lost
            best = len(lost)
    # allowed is non-empty here, so best_col is always set (it may be 0, i.e. 'a')
    print('watch movie: ' + chr(best_col + ord('a')))
    for row in sorted(best_lost):
        # now eliminate the other movies you can't now watch
        print('throwing out: ' + chr(row + ord('a')))
        allowed.remove(row)
    # also throw out this movie from the allowed list (can't watch twice)
    allowed.remove(best_col)

# this is just a greedy algorithm, not guaranteed optimal!
# you could also iterate through all possible combinations of movies
# and simply eliminate all illegal possibilities (brute force search)
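The brute-force search mentioned in the closing comments can be sketched like this. It is only an illustration: the set of overlapping pairs is an assumption read off the diagram in the question (c and a are treated as overlapping because they touch), and `brute_force` is a hypothetical name. It tries subsets from largest to smallest and returns the first one with no overlapping pair, so it is exponential in the number of movies and only practical for small inputs.

```python
from itertools import combinations

# Overlapping pairs, an assumption read off the question's diagram.
OVERLAPPING = {
    frozenset(pair)
    for pair in ["be", "eg", "ec", "ca", "cf", "cg", "ch",
                 "df", "af", "ad", "ah", "fh"]
}


def brute_force(names, overlapping):
    """Return a largest subset of names that contains no overlapping pair."""
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            # A subset is legal if none of its pairs overlap.
            if all(frozenset(p) not in overlapping
                   for p in combinations(subset, 2)):
                return list(subset)
    return []


best_set = brute_force("abcdefgh", OVERLAPPING)
print(best_set)
```

Under these assumptions the largest legal set has 4 movies, which gives a ground truth to compare any greedy heuristic against.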

【Comments】: