I developed a way of doing this with Luminoso. I work for them, so this is partly an ad, but it does exactly what you are asking for.
https://www.luminoso.com/search
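Luminoso is very good at understanding conversational text such as product reviews, product descriptions, survey results, and trouble tickets. It doesn't require any training or ontology building; it builds a language model around your own language. You feed the text of your pages into Luminoso, and it generates a set of synonyms for the concepts used in that text.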
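As a sample project, I built a search over Amazon.com beauty products. Below I've copied some of the automatically generated synonyms for four concepts. The full dataset produced 17851 synonyms.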
scent, rose-like, not sickeningly, not nauseating, not overwhelming, herb-y, no sweetness, cucumber-y, not too citrus-y, no gardenia, not lemony, pachouli, vanilla-like, fragarance, not spicy, flowerly, musk, perfume-like, floraly, not cloyingly => scent
recommend, recommende, advice, suggestion, highly recommend, suggest, recommeded, recommendation, recommend this product, reccommended, advise, suggest, indicated, suggestion, advice, agree, recommend, say, considering, mentioned => recommend
bottle, no sprayer, 8-oz, beaker, decanter, push-down, dispenser, pipet, pint, not the bottle, no dropper, keg, gallon, jug, pump-top, liter, half-full, decant, tumbler, vial => bottle
eczema, non-steroidal, ulcerative, dematitis, ecsema, Elidel, dermititis, inflammation, pityriasis, hydrocortizone, dyshidrotic, chickenpox, Stelatopia, perioral, rosacea, dry skin, nummular, ecxema, mild-moderate, ezcema => eczema
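Rules in this "a, b, c => target" form can be used as explicit mappings in an Elasticsearch synonym token filter. Here is a minimal sketch, assuming the rules were saved one per line. The helper names and the filter and analyzer names (`concept_synonyms`, `concept_analyzer`) are made up for illustration; this is not Luminoso's own export code.

```python
def parse_synonym_rules(lines):
    """Keep only well-formed 'a, b, c => target' rules and tidy them up."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or "=>" not in line:
            continue  # skip blank lines and anything that is not a mapping
        left, right = line.split("=>", 1)
        # Drop empty and repeated terms while preserving their order.
        terms = list(dict.fromkeys(t.strip() for t in left.split(",") if t.strip()))
        target = right.strip()
        if terms and target:
            rules.append(f"{', '.join(terms)} => {target}")
    return rules


def build_index_settings(rules):
    """Index settings with a synonym filter fed by the parsed rules.

    The names 'concept_synonyms' and 'concept_analyzer' are hypothetical.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "concept_synonyms": {"type": "synonym", "synonyms": rules}
                },
                "analyzer": {
                    "concept_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "concept_synonyms"],
                    }
                },
            }
        }
    }
```

The resulting dictionary would be sent as the body of an index-creation request. Since many of the rules contain multi-word phrases, a `synonym_graph` filter applied at search time may be a better fit than the plain `synonym` type used here.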
This search index contains 800k products, so the output is large too, but the approach works just as well on small datasets.
Besides the synonym format, you can also put the output directly into Elasticsearch and attach each page's synonyms to that page's document.
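A rough sketch of what attaching concepts to each page could look like when preparing a bulk request. It assumes you already have, for every page, the concepts you want to store; the input shape (`id`, `title`, `text`, `primary_concepts`, `related_concepts`) and the function name are illustrative assumptions, not a Luminoso API.

```python
import json


def to_bulk_lines(pages, index_name):
    """Build an Elasticsearch bulk body: one action line plus one document line per page."""
    lines = []
    for page in pages:
        action = {"index": {"_index": index_name, "_id": page["id"]}}
        source = {
            "title": page["title"],
            "text": page["text"],
            # Store each concept once, in a stable order.
            "primary_concepts": sorted(set(page.get("primary_concepts", []))),
            "related_concepts": sorted(set(page.get("related_concepts", []))),
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"
```

Each document must stay on a single line, because the bulk endpoint treats every newline as a record boundary.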
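Here is an example of an Elasticsearch index enriched with the same technique. It is dialed up very high, so it adds far too many concepts, but that is only to show you how it finds relationships between concepts.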
{"index": {"_index": "amzbeauty", "_type": "_doc", "_id": "130414089X"}}
{"title": "New Benefit Waterproof Automatic Eyeliner Pen - Black - BAD Gal Liner", "text": "Length : 13.5 cm\nColor: Black\n100% Brand new and unused.\nSmudge free.\nFine-tip. Easy to blend and smooth to apply\nCan make fine and bold eyeline with new texture and furnishing.\nProvide rich and consistant colour\nLongwearing and waterproof\nFregrance Free", "primary_concepts": ["not overpoweringly", "concoction", "equipped", "fine-tip", "water-resistant", "luxuriant", "make", "fixture", "☆", "not lengthen", "washable", "not too heady", "blendable", "doesn't collect", "shade", "niche", "supple", "smudge-proof", "sumptuous", "movable", "black", "over-apply", "quick", "silky", "colored", "sweatproof", "opacity", "accomodate", "fuchsia", "furnishes", "meld", "sturdily", "smear", "inch", "mid-back", "chin-length", "smudge", "alredy", "not cheaply", "long-wearing", "eyeline", "texture", "steady", "no-name", "audacious", "easy", "edgy", "is:A", "marketers", "greys", "decadent", "applicable", "Crease-free", "magenta", "free", "itIn", "stay-true", "racy", "application", "glides", "smooth", "sleek", "taupe", "grainy", "dark", "wealthy", "JP7506CF", "gray", "grayish", "width", "newness", "purfumes", "Lancme", "blackish", "easily", "doesn't smudge", "maroon", "blend", "convenient", "smoother", "Moschino", "long-wear", "mauve", "medium-length", "no raccoon", "revamp", "demure", "richly", "white", "brand", "offers", "lenght", "soft", "doesn't smear", "provide", "provides", "unusable", "eye-liner", "unopened", "straightforward", "silky-smooth", "uniting", "compactness", "bold", "fearless", "mix", "indulgent", "brash", "serviceable", "unmarked", "not musky", "constructed", "racoon", "smoothly", "sealant", "merged", "boldness", "reuse", "unused", "long", "Kors", "effortless", "luscious", "stain", "rich", "discard", "richness", "opulent", "short", "consistency", "fine", "sents", "newfound", "fade-resistant", "mixture", "hue", "sassy", "apply", "fragnance", "heathy", "adventurous", "not 
enthusiastic", "longwearing", "fregrance", "non-waterproof", "empty", "lashline", "simple", "newly", "you'r", "combined", "no musk", "mingle", "waterproof", "painless", "pinkish", "thickness", "clump-free", "gos", "consistant", "color", "smoothness", "name-brand", "new", "smudgeproof", "yaaay", "water-proof", "eyemakeup", "not instant", "spidery", "furnish", "tint", "product", "reapply", "not black", "no globs", "imitators", "blot", "cinch", "uncomplicated", "untouched", "length"], "related_concepts": ["eyeliner", "no goofs", "doesn't smear", "pen", "hundreds"]}
{"index": {"_index": "amzbeauty", "_type": "_doc", "_id": "130414643X"}}
{"title": "Goodskin Labs Eyliplex-2 Eye Life and Circle Reducer - 10ml", "text": "Eyliplex-2 is a dual solution that focuses on the problematic eye area. This breakthrough, 24-hour system from the scientists at good skin pharmacy visibly tightens eye areas while reducing dark circles. 0.34 oz. each. 64% of subjects reported younger looking eyes immediately and a 20% reduction in the appearance of dark circles in clinical studies.", "primary_concepts": ["coloration", "Laboratories", "oncology", "cornea", "undereye", "eye", "immediately", "☆", "teen", "dry-skin", "good", "eyelids", "puffiness", "behold", "research", "temperamental", "dermatological", "breakthrough", "study", "store", "nice", "lasik", "instantaneously", "teenaged", "multi", "rheostat", "dermatology", "chemist", "invisibly", "PhD", "pharmacy", "alredy", "not cheaply", "optional", "pharmacist", "Obagi-C", "topic", "supermarket", "reversible", "studies", "Younger", "medically", "report", "thermo", "tightness", "dual", "eliminate", "researcher", "Minimization", "cutaneous", "hydration", "O2", "taupe", "increase", "moisturization", "dark", "preliminary", "excellent", "Quad", "well", "appearance", "dusky", "quickly", "instantly", "CVS", "Dermal", "great", "revolutionary", "biologist", "epidermis", "blackish", "disclosed", "problem", "youngsters", "murky", "scientific", "teenager", "oz", "dark circles", "clinically", "emphasis", "absorption", "skin", "loosen", "intractable", "technological", "reduction", "clinician", "nutritional", "forthwith", "grocer", "scientifically", "swiftly", "examination", "state-of-the-art", "not acne prone", "zone", "decrease", "younger-looking", "excellently", "troublesome", "system", "radius", "tighten", "FDA", "decent", "noticeably", "WD-40", "clearer", "scientist", "saggy", "significantly", "improvement", "Teamine", "interchangeable", "visible", "visable", "no fine line", "shortly", "minimize", "survey", "problematic", "young", "glance", "racoon", "vicinity", "youthful", 
"exacerbated", "focal", "region", "groundbreaking", "reddish", "focus", "reduce", "increments", "nad", "fasten", "area", "soon", "complexion", "squinting", "look", "grocery", "eyliplex-2", "Eyliplex-2", "subsequently", "even-toned", "bothersome", "eyes", "mitigate", "markedly", "philosophy:you", "difficult", "darkish", "bluish", "satisfactory", "darken", "epidermal", "lessen", "appearence", "ocular", "ergonomically", "diminished", "progression", "purplish", "sun-damaged", "Cellex-C", "visibly", "diagnosis", "drugstore", "under-eye", "apothecary", ":-D", "terrific", "clinical", "oz.", "Endocrinology", "time-released", "Nouriva", "tight", "adolescent", "subject", "eyeballs", "sking", "Pro-Retinol", "aggravate", "younger", "shortcomings", "solution", "assess", "promptly", "teenage", "Kinetin", "24-hour", "Mart", "youth", "visibility", "scientists", "taut", "better", "eyesight", "no dark circles", "not reduce", "photoaging", "Pending"], "related_concepts": ["A22", "A82", "Amazon", "daytime", "HK", "nighttime", "smell", "dark circles", "purchased"]}
{"index": {"_index": "amzbeauty", "_type": "_doc", "_id": "1304146537"}}
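Once documents like these are indexed, a search can match against the concept fields as well as the original text. Below is a rough sketch of a query body; the function name and the boost values are arbitrary placeholders, not settings from the actual project.

```python
def build_concept_query(user_query, size=10):
    """Query body that searches the original text and the attached concepts together."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": user_query,
                # Boosts are illustrative only: direct text matches are weighted
                # above matches on the generated concepts.
                "fields": ["title^3", "text^2", "primary_concepts", "related_concepts"],
                "type": "most_fields",
            }
        },
    }
```

With this shape, a query such as "smudge-proof eyeliner" can reach a product whose description only says "Smudge free", because "smudge-proof" is stored in its `primary_concepts`.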
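Luminoso uses word embeddings from ConceptNet, which it also develops, and its technology goes beyond what ConceptNet alone gives you. I'm biased, but I'm amazed every time I run data through it. It isn't free, but it really does work with absolutely zero pre-training on your data, and nothing is truly free anyway.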