【Posted】: 2019-01-12 08:32:24
【Question】:
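I have two lists of strings and want to search a column of text, replacing each item from the first list with the corresponding item from the second. The second list is identical to the first, except that its items are wrapped in HTML formatting tags.
I wrote a small loop that tries to grep for each item in the first list and substitute the matching item from the second, but it doesn't work. I also tried str_replace, to no avail.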
top_attribute_names<- c("Item Number \\(DPCI\\)", "UPC", "TCIN", "Product Form", "Health Facts",
"Beauty Purpose", "Package Quantity", "Features", "Suggested Age",
"Scent")
top_attributes_html<-ifelse(nchar(top_attribute_names)<30,paste("<b>",top_attribute_names,"</b>",sep=""),top_attribute_names) # List adding bold HTML tags for all strings with under 30 char
clean_free_description<-
c("Give your feathered friends a cozy new home with the Ceramic and Wood Birdhouse from Threshold. This simple birdhouse features a natural color scheme that helps it blend in with the tree you hang it from. The ceramic top is easy to remove when you want to clean out the birdhouse, while the small round hole lets birds in and keeps predators out. Sprinkle some seeds inside and watch your bird buddies become more permanent residents of your backyard.\nMaterial: Ceramic, Wood\nDimensions (Overall): 7.7 inches (H) x 8.5 inches (W) x 8.5 inches (L)\nWeight: 2.42 pounds\nAssembly Details: No assembly requiredpets subtype: Bird houses\nProtective Qualities: Weather-resistant\nMount Type: Hanging\nTCIN: 52754553\nUPC: 490840935721\nItem Number (DPCI): 084-09-3572\nOrigin: Imported\n",
"House your parakeets in style with this Victorian-style bird cage. Featuring multiple colors and faux brickwork, the cage serves as a charming addition to your dcor. It's also equipped with two perches and feeding dishes, making it instantly functional.\nMaterial: Steel, Plastic\nDimensions (Overall): 21.5 inches (H) x 16.0 inches (W) x 16.0 inches (L)\nWeight: 15.0 pounds\nMaterial: Metal (Frame)\nIntended Pet Type: Bird\nIncludes: Feeding Dish, perch\nAssembly Details: Assembly required, no tools needed\nPets subtype: Bird cages\nBreed size: Small (0-25 pounds)\nSustainability Claims: Recyclable\nWarranty: 90 day limited warranty. To obtain a copy of the manufacturer's warranty for this item, please call Target Guest Services at 1-800-591-3869.\nWarranty Information:To obtain a copy of the manufacturer's warranty for this item, please call Target Guest Services at 1-800-591-3869.\nVictorian-style parakeet cage with 2 perches\nFeatures a molded base, a single front door and faux plastic brickwork\nMade of wire and plastic; 5/8\" spacing\nWash with soap and water18\nLx25.5\nHx18\nW\"TCIN: 10159211\nUPC: 048081002940\nItem Number (DPCI): 083-01-0167\n",
"The Cockatiel Scalloped Top Bird Cage Kit is an ideal starter kit for cockatiels and other medium sized birds. Designer white scalloped style cage features large front door, easy to clean pull out tray, food and water dishes, wooden perches and swing. To help welcome and pamper your new bird, this starter kit also includes perch covers, kabob bird toy, cuttlebone, flavored mineral treat and a cement perch. Easy to assemble.\nMaterial: Metal\nDimensions (Overall): 27.25 inches (H) x 14.0 inches (W) x 18.25 inches (L)\nWeight: 11.0 pounds\nMaterial: Metal (Frame)\nIntended Pet Type: Bird\nPets subtype: Bird cages\nBreed size: All sizes\nTCIN: 16707833\nUPC: 030172016240\nItem Number (DPCI): 083-01-0248\n")
for(i in top_attribute_names){
clean_free_description[grepl(i, clean_free_description)] <- top_attributes_html[i]
}
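For reference, the loop above has two problems: `top_attributes_html[i]` indexes an unnamed vector by a character string, which returns `NA`, and the assignment overwrites the whole description instead of only the matched text. A minimal base R sketch of a corrected loop follows. The data is a shortened, hypothetical stand-in for the strings above, and the attribute names are written as literal text (no regex escapes) because `fixed = TRUE` is used.

```r
# Hypothetical, shortened stand-ins for the question's data.
attribute_names <- c("Item Number (DPCI)", "UPC", "TCIN")  # literal text, no regex escapes
attribute_html  <- paste0("<b>", attribute_names, "</b>")
descriptions <- c(
  "Mount Type: Hanging\nTCIN: 52754553\nUPC: 490840935721\nItem Number (DPCI): 084-09-3572\n",
  "Breed size: All sizes\nTCIN: 16707833\nUPC: 030172016240\n"
)

# Loop over positions so each name stays paired with its HTML version.
for (i in seq_along(attribute_names)) {
  # fixed = TRUE matches the name literally, so "(" and ")" need no escaping.
  # gsub() replaces only the matched text and leaves the rest of each string intact.
  descriptions <- gsub(attribute_names[i], attribute_html[i], descriptions, fixed = TRUE)
}

cat(descriptions[1])
```

Looping over `seq_along()` rather than over the names themselves is what keeps the two vectors aligned.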
In theory, I think I could also do this with str_replace:
clean_free_description<-str_replace(clean_free_description,top_attribute_names,top_attributes_html)
However, this produces an error:
In stri_replace_first_regex(string, pattern, fix_replacement(replacement), : longer object length is not a multiple of shorter object length
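The message is consistent with how `str_replace` is vectorised: it pairs strings and patterns element by element, so 3 descriptions cannot be recycled against 10 patterns, and it would replace only the first match in any case. One possible alternative, sketched here under the assumption that the stringr package is installed, is to pass `str_replace_all` a named vector whose names are the patterns and whose values are the replacements. The data below is a shortened, hypothetical stand-in.

```r
library(stringr)  # assumes the stringr package is installed

# Hypothetical, shortened stand-ins for the question's data.
attribute_patterns <- c("Item Number \\(DPCI\\)", "UPC", "TCIN")  # regex patterns
attribute_html <- c("<b>Item Number (DPCI)</b>", "<b>UPC</b>", "<b>TCIN</b>")
descriptions <- c(
  "TCIN: 52754553\nUPC: 490840935721\nItem Number (DPCI): 084-09-3572\n",
  "TCIN: 16707833\n"
)

# Names are the patterns, values are the replacements.
replacements <- setNames(attribute_html, attribute_patterns)

# With a named vector, every pattern is applied to every string.
tagged <- str_replace_all(descriptions, replacements)
cat(tagged[1])
```

The replacements are applied one after another, so this works best when no replacement text can be matched by a later pattern.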
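Of course, I'm sure there is a better solution that adds the HTML tags directly, by matching the strings with a regular expression and wrapping each match in the tag text, which would eliminate a step. Unfortunately, I'm not yet good enough with regex to figure that out.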
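A minimal sketch of that single-regex idea in base R: join the attribute patterns into one alternation inside a capture group, then refer to the captured text with `\\1` in the replacement. The variable names and the shortened data are hypothetical, and the sketch skips the 30-character check from the question.

```r
# Hypothetical, shortened stand-ins for the question's data.
attribute_patterns <- c("Item Number \\(DPCI\\)", "UPC", "TCIN")  # already regex-escaped
descriptions <- c(
  "TCIN: 52754553\nUPC: 490840935721\nItem Number (DPCI): 084-09-3572\n",
  "Weight: 11.0 pounds\nTCIN: 16707833\n"
)

# Build one pattern: (Item Number \(DPCI\)|UPC|TCIN)
pattern <- paste0("(", paste(attribute_patterns, collapse = "|"), ")")

# "\\1" inserts whatever the capture group matched, wrapped in <b> tags.
tagged <- gsub(pattern, "<b>\\1</b>", descriptions)
cat(tagged[1])
```

This needs no second list of HTML strings, because the tags are added around whatever was matched.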
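【Comments】:
- I think a different structure might work better. Unless you need all the information for each item in a single string, this seems to make sense as a nested list, where each item in the list has its own attributes, such as item number, package quantity, and so on. You could split the strings to build such a structure. Feel free to ignore this, though, since my suggestion departs from your specific question.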
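The nested-list suggestion could be sketched roughly as follows. `parse_description` is a hypothetical helper, and it assumes every attribute sits on its own line in `Key: value` form; lines in the real data that break that pattern (for example a missing space after the colon, or a repeated key such as `Material`) would need extra handling.

```r
# Hypothetical, shortened stand-in for one description string.
description <- "A cozy birdhouse.\nMaterial: Ceramic, Wood\nWeight: 2.42 pounds\nTCIN: 52754553\nUPC: 490840935721\n"

# Hypothetical helper: split one description into free text and named attributes.
parse_description <- function(text) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  has_key <- grepl("^[^:]+: ", lines)          # lines shaped like "Key: value"
  keys <- sub(": .*$", "", lines[has_key])     # text before the first ": "
  values <- sub("^[^:]+: ", "", lines[has_key])  # text after the first ": "
  list(
    summary = paste(lines[!has_key], collapse = " "),
    attributes = as.list(setNames(values, keys))
  )
}

item <- parse_description(description)
item$attributes$TCIN
```

For several descriptions, `lapply(descriptions, parse_description)` would give one such list per item, and the HTML tags could then be added when the attributes are printed rather than by editing the raw strings.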
Tags: r