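Fast matching in arrays for a double auction: answers

[Question title]: fast matching in arrays for double auction
[Posted]: 2022-01-08 14:19:39
[Question description]:

I am implementing a matching system for a double auction. I have arrays of buy and sell orders, where each order is [price, quantity]. For example, buy: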

[[ 2 44]
 [ 6 47]
 [10 64]
 [ 4 67]
 [ 1 67]
 [ 6  9]
 [ 1 83]
 [ 2 21]
 [ 3 36]
 [ 5 87]
 [ 3 70]]

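So the first buy order is for 44 units at a price of 2. Prices are restricted to a price grid, e.g. 1, 2, ..., 10. For each possible price I build cumulative arrays of aggregate demand and supply. For aggregate demand, at each price P on the grid I sum the quantities of all buy orders priced at or above P; aggregate supply is analogous, summing the sell orders priced at or below P. I then take the clearing price to be the lowest price at which aggregate demand is less than aggregate supply. The clearing quantity is the smaller of the two aggregate levels at the clearing price.

For example, here the clearing price is 6 (red dashed line) and the clearing quantity is around 2250 (blue dashed line).

My question: is there a faster or cleaner way to compute the clearing price? Suppose the price grid becomes very fine (e.g. 10000 possible prices): how can I make this more efficient without checking every possible price level? Speed and efficiency are the main concerns.

Here is an implementation in Python (production might use some other, lower-level language):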
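To make the definition of aggregate demand concrete, here is a minimal sketch that applies it to the sample buy array above. It assumes integer prices on the grid 1 to 10; the names `qty_at_price` and `agg_demand` are illustrative and not part of the original code.

```python
import numpy as np

# The sample buy orders from above, as [price, quantity] rows.
buy = np.array([[2, 44], [6, 47], [10, 64], [4, 67], [1, 67], [6, 9],
                [1, 83], [2, 21], [3, 36], [5, 87], [3, 70]])
price_grid = np.arange(1, 11)

# Total quantity at each price; index 0 stays empty because prices start at 1.
qty_at_price = np.bincount(buy[:, 0], weights=buy[:, 1],
                           minlength=price_grid[-1] + 1)

# Aggregate demand at P = quantity of all buy orders priced at or above P,
# i.e. a cumulative sum taken from the highest price downwards.
agg_demand = qty_at_price[::-1].cumsum()[::-1][price_grid].astype(np.int64)

for p, d in zip(price_grid, agg_demand):
    print(p, d)
# demand at prices 1..10: 595, 445, 380, 274, 207, 120, 64, 64, 64, 64
```

Aggregate supply would be built the same way from the sell orders, with a plain (ascending) cumulative sum.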

import numpy as np

MAX_QTY = 100
MIN_QTY = 0
MIN_PX = 1
MAX_PX = 11
TICK_SIZE = 1

price_grid = np.arange(MIN_PX, MAX_PX, TICK_SIZE)

def gen_orders(num, price_grid):
    qty = np.random.randint(MIN_QTY, MAX_QTY, num)
    px = np.random.choice(price_grid, num)
    return np.array((px, qty)).T

buy = gen_orders(100, price_grid)
sell = gen_orders(100, price_grid)

agg = np.array([[x, np.sum(buy[buy[:, 0]>=x][:, 1]), np.sum(sell[sell[:, 0]<=x][:, 1])] for x in price_grid])

matched = agg[agg[:, 1]<agg[:, 2]][0, :] # price_grid is sorted
cleared_px = matched[0]
cleared_qty = np.min(matched[1:])

[Discussion]:

Tags: python python-3.x numpy optimization

[Solution 1]:

You are implicitly creating several nested loops in this statement:

agg = np.array([[x, np.sum(buy[buy[:, 0]>=x][:, 1]), np.sum(sell[sell[:, 0]<=x][:, 1])] for x in price_grid])

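Although they mostly execute in vectorized form, this becomes expensive when the price grid or the number of orders is large: every price on the grid triggers a full scan of both order arrays.

You can do this in linear time by binning the orders by price; I use dictionaries below. A second linear pass over the buys dictionary then builds the aggregate demand you need, which is still O(n) rather than O(n^2).

For larger n this is a huge change. I timed orig and mod (below), and for 5000 orders and a 10K price grid (your value) the mod, which uses no numpy operations, was on the order of 100x faster (about 57x in the run shown below).

Note: there is a subtle difference in logic. The mod stops early if supply == demand exactly at some price step, whereas the original requires demand to be strictly below supply. (It is not clear whether that is a bug or a feature... :), but the logic is easy to adjust.) The output below captures the (somewhat rare) case where they match exactly, together with the timing difference.

Edited code with timing: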

import numpy as np
import time

MAX_QTY = 10
MIN_QTY = 0
MIN_PX = 1
MAX_PX = 10_000
TICK_SIZE = 1

price_grid = np.arange(MIN_PX, MAX_PX, TICK_SIZE)

def gen_orders(num, price_grid):
    qty = np.random.randint(MIN_QTY, MAX_QTY, num)
    px = np.random.choice(price_grid, num)
    return np.array((px, qty)).T

buy = gen_orders(5000, price_grid)
sell = gen_orders(5000, price_grid)

tic = time.time() 
agg = np.array([[x, np.sum(buy[buy[:, 0]>=x][:, 1]), np.sum(sell[sell[:, 0]<=x][:, 1])] for x in price_grid])
matched = agg[agg[:, 1]<agg[:, 2]][0, :] # price_grid is sorted
cleared_px = matched[0]
cleared_qty = np.min(matched[1:])
toc = time.time()
print(f'ORIG: computed clear px: {cleared_px} and qty: {cleared_qty} in {toc-tic:0.6f} sec')

###  ALTERNATE ###

# Start the clock again for the mod method...
tic = time.time()
buys = {}
sells = {}
# "bin" the buys by price
for b in buy:
    buys[b[0]] = buys.get(b[0], 0) + b[1]
# need to aggregate the demand...
agg_demand = {MAX_PX: buys.get(MAX_PX,0)} # starting point
for px in range(MAX_PX-1, MIN_PX-1, -1):  # backfill down to min px
    agg_demand[px] = agg_demand[px+1] + buys.get(px, 0)

# "bin" the sells similarly
for s in sell:
    sells[s[0]] = sells.get(s[0], 0) + s[1]

# set up the loop
selling_px = MIN_PX
supply = sells.get(selling_px, 0)
demand = agg_demand.get(selling_px, 0)
while demand > supply:
    # updates
    selling_px += 1
    demand = agg_demand.get(selling_px, 0)  # pre-computed aggregate demand; 0 past the top of the grid
    supply += sells.get(selling_px, 0)  # keep running aggregation of supply
new_cleared_px = selling_px
new_cleared_qty = min(demand, supply)


toc = time.time()
print(f'MOD: computed clear px: {new_cleared_px} and qty: {new_cleared_qty} in {toc-tic:0.6f} sec')

if cleared_px != new_cleared_px or cleared_qty != new_cleared_qty:  # something wrong...??
    print(agg[cleared_px-5:cleared_px+5,:])

Output:

ORIG: computed clear px: 4902 and qty: 11390 in 1.183204 sec
MOD: computed clear px: 4899 and qty: 11398 in 0.020830 sec
[[ 4898 11411 11398]
 [ 4899 11398 11398]
 [ 4900 11398 11398]
 [ 4901 11398 11398]
 [ 4902 11390 11398]
 [ 4903 11385 11398]
 [ 4904 11385 11398]
 [ 4905 11385 11398]
 [ 4906 11385 11398]
 [ 4907 11384 11398]]
[Finished in 1.3s]
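The same binning idea can also be written with NumPy primitives instead of dictionaries. The following is only a sketch under stated assumptions: prices are integers on a grid from `min_px` to `max_px` with a tick size of 1, quantities are non-negative, and `clear_market` is a hypothetical helper name, not something from the answer above. It uses the question's strict rule (first price where demand < supply).

```python
import numpy as np

def clear_market(buy, sell, min_px, max_px):
    """Return (clearing price, clearing quantity), or None if
    aggregate demand never falls below aggregate supply."""
    n = max_px - min_px + 1
    # Bin quantities by price: one pass over each order array.
    buy_qty = np.bincount(buy[:, 0] - min_px, weights=buy[:, 1], minlength=n)
    sell_qty = np.bincount(sell[:, 0] - min_px, weights=sell[:, 1], minlength=n)
    # Demand at P sums bins at or above P; supply sums bins at or below P.
    demand = buy_qty[::-1].cumsum()[::-1]
    supply = sell_qty.cumsum()
    crossed = demand < supply
    if not crossed.any():
        return None
    i = int(np.argmax(crossed))  # index of the first True
    return min_px + i, int(min(demand[i], supply[i]))

# Tiny example: demand is [15, 15, 10] and supply is [4, 12, 12] at prices 1..3.
buy = np.array([[3, 10], [2, 5]])
sell = np.array([[1, 4], [2, 8]])
print(clear_market(buy, sell, 1, 3))  # (3, 10)
```

The cost is one pass over the orders plus one pass over the grid. Changing `demand < supply` to `demand <= supply` would reproduce the early stop on exact ties described in the note above.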

[Discussion]:

[Solution 2]:

There are a few tricks you can try:

Stop as soon as the quantities match, as shown in the notebook:

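for x in price_grid:  # price_grid is sorted in ascending order
    buy_sum = np.sum(buy[buy[:, 0]>=x][:, 1])
    sell_sum = np.sum(sell[sell[:, 0]<=x][:, 1])

    if buy_sum < sell_sum:
        cleared_px = x
        cleared_qty = buy_sum  # demand is the smaller side at this price
        break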

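Sort buy and sell first, so that a sum_of_quantity function gets faster as you loop over buy and sell. Unfortunately, for loops carry a lot of overhead in Python, so in Python a numpy vectorized operation such as np.sum(buy[buy[:, 0]>=x][:, 1]) will be faster. In a lower-level language, however, this would be useful.
If you have sorted the buy and sell orders, cache the running sums in bins. For example, keep the sum for x <= 4 in memory, so that the sum for x <= 5 is the cached x <= 4 sum plus the sum for x == 5. This requires tracking the indices in the order list at which x changes. Note that this does not help with numpy vectorized operations, because buy[:, 0]==5 costs as much as buy[:, 0]<=5 in numpy.
Try something like search algorithms. This is useful for larger search spaces, i.e. when price_grid has more values. For example, with price_grid <= 10:
- First try x == 5.
- If buy_sum > sell_sum, try a larger x, e.g. x == 7.
- If buy_sum < sell_sum, confirm that x is the lowest such price by checking x - 1; otherwise, choose a smaller x that is still larger than 5 and repeat.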
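The sorting, cached-sum and search suggestions above can be combined into one sketch. It relies on the observation that, with non-negative quantities, aggregate demand never rises and aggregate supply never falls as the price increases, so the condition demand < supply flips from false to true exactly once and can be bisected. This is an illustration under assumptions, not the answer's own code: prices are integers with a tick size of 1, and `clearing_price_bisect` is a hypothetical name.

```python
import numpy as np

def clearing_price_bisect(buy, sell, min_px, max_px):
    """Return (clearing price, clearing quantity), or None if
    demand never falls below supply on [min_px, max_px]."""
    # Sort once by price and cache prefix sums of the quantities.
    b = buy[np.argsort(buy[:, 0], kind="stable")]
    s = sell[np.argsort(sell[:, 0], kind="stable")]
    b_cum = np.concatenate(([0], np.cumsum(b[:, 1])))
    s_cum = np.concatenate(([0], np.cumsum(s[:, 1])))

    def demand(p):  # total quantity of buys priced at or above p
        return b_cum[-1] - b_cum[np.searchsorted(b[:, 0], p, side="left")]

    def supply(p):  # total quantity of sells priced at or below p
        return s_cum[np.searchsorted(s[:, 0], p, side="right")]

    if not demand(max_px) < supply(max_px):
        return None
    lo, hi = min_px, max_px  # invariant: the condition holds at hi
    while lo < hi:
        mid = (lo + hi) // 2
        if demand(mid) < supply(mid):
            hi = mid        # mid works; look for a lower price
        else:
            lo = mid + 1    # mid fails; the answer is above mid
    return lo, int(min(demand(lo), supply(lo)))

buy = np.array([[3, 10], [2, 5]])
sell = np.array([[1, 4], [2, 8]])
print(clearing_price_bisect(buy, sell, 1, 3))  # (3, 10)
```

After the one-off sort, each probe costs two binary searches, so the number of grid prices only enters through the logarithmic number of probes. Whether this beats a single linear pass depends on how fine the grid is relative to the number of orders.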

[Discussion]: