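【Posted】: 2013-11-21 03:31:11
【Question】:
I'm building a plagiarism detection program. The finished product will compare two full-text documents sentence by sentence; at this point, I'm only testing my algorithm for comparing sentences, which gives a number between 0 and 1 indicating how similar their words are.
I'll step through the code and show you where the problem is.
Directives and function declaration: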
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <math.h>
#include <set>
double sntnc_cmpr_qtnt(const std::vector<std::string>&, const std::vector<std::string>&);
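main takes two string arrays and puts them into vectors. I know this seems useless, but it's just for my testing purposes. I calculate the sentence comparison quotient between the two vectors of strings (which are supposed to be 2 sentences).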
int main (int argc, char* const argv[]) {
    std::string arr1[] = {"Yo", "dawg", "I", "heard", "you", "like", "functions", "so", "we", "put", "a", "function", "inside"};
    std::vector<std::string> str1, str2;
    for (int i = 0; i < sizeof(arr1)/sizeof(std::string); ++i)
        str1.push_back(arr1[i]);
    std::string arr2[] = {"Yo", "dawg", "I", "heard", "you", "like", "cars", "so", "we", "put", "a", "car", "inside"};
    for (int i = 0; i < sizeof(arr2)/sizeof(std::string); ++i)
        str2.push_back(arr2[i]);
    std::cout << sntnc_cmpr_qtnt(str1, str2);
    return 0;
}
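As a side note on the copy loops in main: `sizeof(arr1)/sizeof(std::string)` divides the byte size of the whole array by the byte size of one element, so it yields the element count (13 here) regardless of what `sizeof(std::string)` is on a given platform. The following is only a sketch of one way to avoid the manual arithmetic; `array_length` and `to_vector` are hypothetical helper names, not part of the original code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the array length N is deduced from the array type,
// so no sizeof arithmetic is needed at the call site.
template <typename T, std::size_t N>
std::size_t array_length(const T (&)[N]) {
    return N;
}

// Hypothetical helper: copies a built-in array into a vector using the
// iterator-range constructor instead of a push_back loop.
template <typename T, std::size_t N>
std::vector<T> to_vector(const T (&arr)[N]) {
    return std::vector<T>(arr, arr + N);
}
```

With these, `std::vector<std::string> str1 = to_vector(arr1);` would replace the first loop in main.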
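Here is the sentence comparison quotient function. It counts the number of words the two sentences have in common.
Something is going wrong, though. My count ("cnt") comes out to 158, which is obviously too high. I can't figure out why that is.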
double sntnc_cmpr_qtnt(const std::vector<std::string>& s1, const std::vector<std::string>& s2) {
    // Place the words of sentences s1 and s2 each into separate sets s1set and s2set:
    std::set<std::string> s1set, s2set;
    for (std::vector<std::string>::const_iterator it = s1.begin(); it != s1.end(); ++it)
        s1set.insert(*it);
    for (std::vector<std::string>::const_iterator it = s2.begin(); it != s2.end(); ++it)
        s2set.insert(*it);
    /* Compute the proportion of words in common between s1set and s2set,
       multiplied by 1 minus 1 over the square root of the size of the smaller set.
       This is the sentence comparison quotient that is returned. */
    double cnt(0.0);
    for (std::set<std::string>::iterator it1 = s1set.begin(); it1 != s1set.end(); ++it1) {
        for (std::set<std::string>::iterator it2 = s2set.begin(); it2 != s2set.end(); ++it2) {
            if ((*it1).compare(*it2))
                cnt += 1.0;
        }
    }
    if (cnt == 0.0) {
        return 0.0;
    } else {
        double minsz = (double)std::min(s1set.size(), s2set.size());
        return ((1-1/sqrt(minsz))*cnt/minsz);
    }
}
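The count of 158 is consistent with how `std::string::compare` works: it returns 0 when the two strings are equal and a nonzero value otherwise, so `if ((*it1).compare(*it2))` is true for the unequal pairs. Each set holds 13 distinct words, giving 13 × 13 = 169 pairs, of which 11 are equal, and 169 − 11 = 158. The smallest change would be to test `compare(...) == 0` (or use `==`). The sketch below goes one step further and replaces the inner loop with a set lookup; `count_common` is a hypothetical name introduced only for illustration.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not from the original post): counts the words that
// occur in both sets.
std::size_t count_common(const std::set<std::string>& a,
                         const std::set<std::string>& b) {
    std::size_t cnt = 0;
    for (std::set<std::string>::const_iterator it = a.begin(); it != a.end(); ++it) {
        // set::count returns 1 if the word is present in b and 0 otherwise,
        // so this tests equality rather than inequality.
        if (b.count(*it) != 0)
            ++cnt;
    }
    return cnt;
}
```

For the two test sentences in main, this would return 11.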
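【Comments】:
- You might want to look at what string::compare actually returns.
- sizeof(std::string)=8 .... I'm pretty sure that's the size of a pointer to a string
- Ah, I see, sbabbi. Whoops.
- If you like, you could also use set_intersection.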
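The set_intersection suggestion could look roughly like the sketch below. It keeps the formula from the question, (1 − 1/√minsz) · cnt / minsz, and only changes how the common words are counted. `sentence_quotient` is a hypothetical name used here for illustration; this is one possible arrangement under those assumptions, not the asker's final code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical rewrite of the quotient function using std::set_intersection.
double sentence_quotient(const std::vector<std::string>& s1,
                         const std::vector<std::string>& s2) {
    // std::set keeps its elements sorted and unique, which is the
    // precondition std::set_intersection needs for both input ranges.
    std::set<std::string> a(s1.begin(), s1.end());
    std::set<std::string> b(s2.begin(), s2.end());

    // Collect the words present in both sets.
    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));

    // No shared words (this also covers an empty sentence).
    if (common.empty())
        return 0.0;

    const double minsz = static_cast<double>(std::min(a.size(), b.size()));
    return (1.0 - 1.0 / std::sqrt(minsz)) * static_cast<double>(common.size()) / minsz;
}
```

For the two test sentences (11 shared words, 13 distinct words each), this evaluates to roughly 0.61.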
Tags: c++ string algorithm data-structures std