Ruby - how to speed up looping through an ".each" array? Answers

【Question title】: Ruby - how to speed up looping through an ".each" array?
【Posted】: 2015-01-18 19:35:18
【Question description】:

I have these models, and the following lines sit in a method whose performance I am trying to improve.

class Location < ActiveRecord::Base
  belongs_to :company
end
class Company < ActiveRecord::Base
  has_many :locations
end

In the method:

locations_company = []

###
found_locations = Location.within(distance, origin: from_result.split(',')).order("distance ASC")
### 0.002659s

###
found_locations.each do |location|
  locations_company << location.company
end
### 45.972285s

###
companies = locations_company.uniq{|x| x.id}
### 0.033029s

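Here is what the code does: first, it grabs all locations within the specified radius. Then it takes the company from each row found and saves it into the prepared array. This is the problematic part: the each loop takes 45 seconds to process.

Duplicates are then removed from this newly created array.

I keep wondering whether there is a better way to handle this, but I'm afraid I can't see it right now, so I'd like to ask how to speed up the .each loop that saves the data into the array. Is there a better way in Ruby to pull a piece of information out of a collection of objects?

Thank you very much for your time. I have been stuck on this all day and still have no more efficient solution.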

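【Question discussion】:

If you look at found_locations, you'll notice it is probably a query proxy rather than a materialized result set. #each is almost certainly not your bottleneck; you should profile your code properly to find the bottleneck.
This question appears to be off-topic because it is about refactoring and improving the performance of existing code; it belongs on Code Review.

Tags: ruby-on-rails ruby arrays performance each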

【Solution 1】:

The best approach is not to loop at all. Your end goal appears to be finding all companies within the specified area.

found_locations = Location.within(distance, origin: from_result.split(',')).order("distance ASC")
companies = Company.where(id: found_locations.pluck(:company_id).uniq)
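
To make the shape of that approach concrete without a database, here is a minimal plain-Ruby sketch. CompanyRow, LocationRow, COMPANIES_BY_ID and the sample data are hypothetical stand-ins invented for illustration; they are not part of ActiveRecord or of the asker's application. The map call plays the role of pluck(:company_id), and the hash lookup plays the role of the single where(id: ...) query.

```ruby
# Hypothetical stand-ins for rows of the two tables; no Rails involved.
CompanyRow  = Struct.new(:id, :name)
LocationRow = Struct.new(:id, :company_id)

# Hypothetical in-memory "companies table", indexed by id.
COMPANIES_BY_ID = {
  1 => CompanyRow.new(1, "Acme"),
  2 => CompanyRow.new(2, "Globex")
}.freeze

# Locations as they might come back sorted by distance; two share a company.
found_location_rows = [
  LocationRow.new(10, 2),
  LocationRow.new(11, 1),
  LocationRow.new(12, 2)
]

# Counterpart of pluck(:company_id).uniq: keep only the foreign keys, deduplicated.
company_ids = found_location_rows.map(&:company_id).uniq

# Counterpart of Company.where(id: company_ids): one batched lookup.
companies = COMPANIES_BY_ID.values_at(*company_ids)

puts companies.map(&:name).inspect  # ["Globex", "Acme"]
```

One difference worth keeping in mind: the sketch returns companies in the order of the ids, whereas a real WHERE id IN (...) query returns rows in whatever order the database chooses, so the distance ordering of the locations is not automatically carried over to the companies.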

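【Discussion】:

If the database supports it, Company.distinct instead of bleh.uniq might help.
Company.distinct is not necessary. If you don't use uniq, it will just pass a larger array to the WHERE id IN [] query. Even if an id appears in the array several times, the database will return only one record per company. I personally don't like sending unnecessary information to a query; adding or removing uniq will not materially affect performance.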
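Depending on whether found_locations will actually be used for anything besides finding the companies, you might take a different variation on this. If you intend to use found_locations on its own, you can/should force it into an array with to_a and then change the logic on the next line to Company.where(id: found_locations.map(&:company_id).uniq). If you don't intend to use found_locations on its own, what I put there is best, because you won't even create Location objects and will just pull out the ids you need.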
Another peachy-keen answer!

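【Solution 2】:

The problem is not each; it is that the query only starts executing when you begin iterating over it. found_locations is not the result of a query. It is a query builder that runs the query as soon as the results are needed (for example, when you start iterating over them).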
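
The deferred execution described here can be imitated in plain Ruby. The sketch below is only an illustration: LazyQuery is a hypothetical class written for this example, not ActiveRecord's implementation, and the 50 ms sleep stands in for a slow database. It shows why a timer placed around the line that builds the query reports almost nothing while a timer around the loop reports the full query cost.

```ruby
# Hypothetical stand-in for a relation: it remembers how to fetch rows
# but does not fetch them until they are first needed.
class LazyQuery
  include Enumerable

  attr_reader :executions

  def initialize(&fetch)
    @fetch = fetch
    @rows = nil
    @executions = 0
  end

  def loaded?
    !@rows.nil?
  end

  # Iterating is what triggers the simulated query, and only once.
  def each(&block)
    load_rows.each(&block)
  end

  private

  def load_rows
    return @rows if loaded?

    @executions += 1
    @rows = @fetch.call
  end
end

def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

query = nil
build_time = elapsed do
  query = LazyQuery.new do
    sleep 0.05          # pretend the database needs 50 ms
    [1, 2, 3]
  end
end

ids = nil
loop_time = elapsed { ids = query.map { |id| id * 10 } }

puts format("build: %.4fs, loop: %.4fs", build_time, loop_time)
puts query.executions   # the fetch ran once, during the loop
```

Measured this way, the loop appears slow even though its body is trivial, which matches the timings in the question.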

【Discussion】:

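【Solution 3】:

I believe what is taking all the time is not each but the query to the database.

The first line builds the query but does not actually run it.

I believe that if you write the code as follows: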

locations_company = []

found_locations = Location.within(distance, origin: from_result.split(',')).order("distance ASC")

### this line will take most of the time
found_locations = found_locations.to_a
###    

###
found_locations.each do |location|
  locations_company << location.company_id
end
### 

###
companies = locations_company.uniq # the elements are already company ids, so no block is needed
###

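You will see that each takes far less time. You should look into optimizing the query.

As @AlexPeachey commented below, location.company will also involve a query for each location in the list, because it is a relation. You may want to eager-load the companies by adding the following: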

found_locations = Location.includes(:company).within(distance, origin: from_result.split(',')).order("distance ASC")
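
The difference between one lookup per location and a single eager load can be counted in a small plain-Ruby sketch. FakeCompanyTable and its find / find_all methods are hypothetical names made up for this illustration; they only mimic the number of round trips and are not how ActiveRecord is implemented.

```ruby
# Hypothetical in-memory "database" that counts how many lookups it serves.
class FakeCompanyTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows            # { company_id => company_name }
    @query_count = 0
  end

  # One query per call, like location.company without eager loading.
  def find(id)
    @query_count += 1
    @rows.fetch(id)
  end

  # One query for many ids, like the extra query issued by includes(:company).
  def find_all(ids)
    @query_count += 1
    @rows.slice(*ids)
  end
end

rows = { 1 => "Acme", 2 => "Globex", 3 => "Initech" }
location_company_ids = [1, 2, 1, 3, 2, 1]   # company_id of each found location

# Per-location style: every location triggers its own lookup.
per_row_table = FakeCompanyTable.new(rows)
per_row_names = location_company_ids.map { |id| per_row_table.find(id) }

# Eager style: one batched lookup, then purely in-memory access.
eager_table = FakeCompanyTable.new(rows)
preloaded   = eager_table.find_all(location_company_ids.uniq)
eager_names = location_company_ids.map { |id| preloaded.fetch(id) }

puts per_row_table.query_count  # 6
puts eager_table.query_count    # 1
```

Both variants produce the same list of names; only the number of simulated queries differs, and with thousands of locations that difference is what dominates the loop time.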

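【Discussion】:

The query may be slow, but with this approach the each is not going to be instant either, because every pass through the each loop runs a query against the company table. To avoid that, changing to Location.includes(:company) will load all the companies needed with just one additional query.
Thanks @AlexPeachey, I missed that part. Updated the answer.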