【Posted】: 2010-06-16 14:04:37
【Question】:
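I have some Python code that I'd like to port to C++, because it is currently slower than I'd like. The problem is that it uses a dictionary whose keys are tuples consisting of an object and a string (e.g. (obj, "word")). How on earth do I write something similar in C++? Or maybe my algorithm is simply horrendous, and there is some way to make it faster without resorting to C++?
The whole algorithm is below for clarity. The dictionary "post_score" is the problem.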
def get_best_match_best(search_text, posts):
    """
    Find the best matches between a search query "search_text" and any of the
    strings in "posts".

    @param search_text: Query to find an appropriate match with in posts.
    @type search_text: string
    @param posts: List of candidates to match with target text.
    @type posts: [cl_post.Post]
    @return: Best matches of the candidates found in posts. The posts are ordered
    according to their rank. First post in list has best match and so on.
    @returntype: [cl_post.Post]
    """
    from math import log

    search_words = separate_words(search_text)
    total_number_of_hits = {}
    post_score = {}
    post_size = {}
    for search_word in search_words:
        total_number_of_hits[search_word] = 0.0
        for post in posts:
            post_score[(post, search_word)] = 0.0
            post_words = separate_words(post.text)
            post_size[post] = len(post_words)
            for post_word in post_words:
                possible_match = abs(len(post_word) - len(search_word)) <= 2
                if possible_match:
                    score = calculate_score(search_word, post_word)
                    post_score[(post, search_word)] += score
                    if score >= 1.0:
                        total_number_of_hits[search_word] += 1.0
    log_of_number_of_posts = log(len(posts))
    matches = []
    for post in posts:
        rank = 0.0
        for search_word in search_words:
            rank += post_score[(post, search_word)] * \
                (log_of_number_of_posts - log(1.0 + total_number_of_hits[search_word]))
        matches.append((rank / post_size[post], post))
    matches.sort(reverse=True)
    return [post[1] for post in matches]
【Comments】:
-
Seriously, if the code is already bug-free, why not take advantage of existing tools? You see, Joe Polski does not recommend rewrites.
-
@Hamish Grubijan: Who is Joe Polski, and why should I care what he recommends?
-
@sbk You should care because his word is law.
-
@Ignacio Vazquez-Abrams: No, but I'll look into it. Thanks. :) @Hamish Grubijan: What do you mean by "take advantage of existing tools"?
-
@MdaG, I mean Cython, and other tools I've heard of but whose names I've forgotten.
Tags: c++ python dictionary