【Question title】: Text summary in R for multiple rows
【Posted】: 2020-10-03 15:39:47
【Question】:

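I have a set of short text files that I was able to combine into one dataset, so that each file sits in its own row.

I am trying to summarize the content with the LSAfun package, using the function genericSummary(text,k,split=c(".","!","?"),min=5,breakdown=FALSE,...)

This works very well for a single text input, but not in my case. The package documentation says the text input should be "a character vector of length(text) = 1 specifying the text to be summarized".

Please see this example: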

# Generate a dataset example (text examples were copied from wikipedia): 
 
dd = structure(list(text = structure(1:2, .Label = c("Forest gardening, a forest-based food production system, is the world's oldest form of gardening.[1] Forest gardens originated in prehistoric times along jungle-clad river banks and in the wet foothills of monsoon regions. In the gradual process of families improving their immediate environment, useful tree and vine species were identified, protected and improved while undesirable species were eliminated. Eventually foreign species were also selected and incorporated into the gardens.[2]\n\nAfter the emergence of the first civilizations, wealthy individuals began to create gardens for aesthetic purposes. Ancient Egyptian tomb paintings from the New Kingdom (around 1500 BC) provide some of the earliest physical evidence of ornamental horticulture and landscape design; they depict lotus ponds surrounded by symmetrical rows of acacias and palms. A notable example of ancient ornamental gardens were the Hanging Gardens of Babylon—one of the Seven Wonders of the Ancient World —while ancient Rome had dozens of gardens.\n\nWealthy ancient Egyptians used gardens for providing shade. Egyptians associated trees and gardens with gods, believing that their deities were pleased by gardens. Gardens in ancient Egypt were often surrounded by walls with trees planted in rows. Among the most popular species planted were date palms, sycamores, fir trees, nut trees, and willows. These gardens were a sign of higher socioeconomic status. In addition, wealthy ancient Egyptians grew vineyards, as wine was a sign of the higher social classes. Roses, poppies, daisies and irises could all also be found in the gardens of the Egyptians.\n\nAssyria was also renowned for its beautiful gardens. These tended to be wide and large, some of them used for hunting game—rather like a game reserve today—and others as leisure gardens. Cypresses and palms were some of the most frequently planted types of trees.\n\nGardens were also available in Kush. 
In Musawwarat es-Sufra, the Great Enclosure dated to the 3rd century BC included splendid gardens. [3]\n\nAncient Roman gardens were laid out with hedges and vines and contained a wide variety of flowers—acanthus, cornflowers, crocus, cyclamen, hyacinth, iris, ivy, lavender, lilies, myrtle, narcissus, poppy, rosemary and violets[4]—as well as statues and sculptures. Flower beds were popular in the courtyards of rich Romans.", 
"The Middle Ages represent a period of decline in gardens for aesthetic purposes. After the fall of Rome, gardening was done for the purpose of growing medicinal herbs and/or decorating church altars. Monasteries carried on a tradition of garden design and intense horticultural techniques during the medieval period in Europe. Generally, monastic garden types consisted of kitchen gardens, infirmary gardens, cemetery orchards, cloister garths and vineyards. Individual monasteries might also have had a \"green court\", a plot of grass and trees where horses could graze, as well as a cellarer's garden or private gardens for obedientiaries, monks who held specific posts within the monastery.\n\nIslamic gardens were built after the model of Persian gardens and they were usually enclosed by walls and divided in four by watercourses. Commonly, the centre of the garden would have a reflecting pool or pavilion. Specific to the Islamic gardens are the mosaics and glazed tiles used to decorate the rills and fountains that were built in these gardens.\n\nBy the late 13th century, rich Europeans began to grow gardens for leisure and for medicinal herbs and vegetables.[4] They surrounded the gardens by walls to protect them from animals and to provide seclusion. During the next two centuries, Europeans started planting lawns and raising flowerbeds and trellises of roses. Fruit trees were common in these gardens and also in some, there were turf seats. At the same time, the gardens in the monasteries were a place to grow flowers and medicinal herbs but they were also a space where the monks could enjoy nature and relax.\n\nThe gardens in the 16th and 17th century were symmetric, proportioned and balanced with a more classical appearance. Most of these gardens were built around a central axis and they were divided into different parts by hedges. 
Commonly, gardens had flowerbeds laid out in squares and separated by gravel paths.\n\nGardens in Renaissance were adorned with sculptures, topiary and fountains. In the 17th century, knot gardens became popular along with the hedge mazes. By this time, Europeans started planting new flowers such as tulips, marigolds and sunflowers."
), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))


# This code is trying to generate the summary into another column:

dd$sum = genericSummary(dd$text,k=1) 


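This produces the error: Error in strsplit(text, split = split, fixed = T) : non-character argument

I think this is because I am passing a whole column (a variable) rather than a single text.

My expected output is a summary generated for each row, placed in a corresponding second column called dd$sum.

I tried as.vector(dd$text), but that does not work. (It still seems to combine the output into a single row.)

I tried to read about the map functions from purrr but could not apply them to this case. I am hoping someone with R programming experience can help.

Also, if you know a way to do this with another text summarization package such as lexRankr, that would be fine too. I tried the code from the question "Text summarization in R language", but still could not get it to work.

Thanks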

【Comments】:

    Tags: r vector purrr sapply lsa


    【Solution 1】:

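    Check class(dd$text). It is a factor, not a character vector, which is why strsplit() inside genericSummary() reports a non-character argument.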
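As a rough illustration of that diagnosis, the sketch below reproduces the same error message in base R, without LSAfun. The vector `txt` is a made-up stand-in for `dd$text`, not the data from the question.

```r
# Toy stand-in for dd$text: a factor, like the column built by structure() above
txt <- factor(c("First sentence. Second sentence.", "Another text. More text."))

class(txt)         # "factor"
is.character(txt)  # FALSE

# strsplit() rejects factors; capture the error message instead of stopping
msg <- tryCatch(
  strsplit(txt, split = ".", fixed = TRUE),
  error = function(e) conditionMessage(e)
)
msg                # "non-character argument"

# After conversion, strsplit() gets the character input it expects
parts <- strsplit(as.character(txt), split = ".", fixed = TRUE)
lengths(parts)     # number of pieces per text
```

Converting the column removes the error, but the function still has to be called once per row, since it expects a single text at a time.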

    The following works:

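    library(dplyr)
    library(purrr)
    library(LSAfun)  # provides genericSummary()

    dd %>% 
      mutate(text = as.character(text)) %>%           # factor -> character
      mutate(sum = map(text, genericSummary, k = 1))  # one call per row; sum is a list column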
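If a plain character column is preferred over a list column, the same row-by-row idea can be sketched in base R with `vapply()`. To keep the example runnable without LSAfun, `first_sentence()` and `dd_demo` below are hypothetical stand-ins (a toy summariser and toy data), not part of the original answer.

```r
# Hypothetical stand-in for a summariser that, like genericSummary(),
# only accepts a single string (a length-1 character vector)
first_sentence <- function(text) {
  stopifnot(is.character(text), length(text) == 1)
  trimws(strsplit(text, split = ".", fixed = TRUE)[[1]][1])
}

dd_demo <- data.frame(
  text = c("Gardens are old. They began in forests.",
           "Monks kept gardens. Herbs grew there."),
  stringsAsFactors = TRUE  # mimic the factor column from the question
)

# Convert once, then call the summariser one row at a time;
# vapply() checks that every result is exactly one string
dd_demo$text <- as.character(dd_demo$text)
dd_demo$sum  <- vapply(dd_demo$text, first_sentence, character(1),
                       USE.NAMES = FALSE)

dd_demo$sum
```

With LSAfun loaded, the stand-in could presumably be swapped for `function(x) genericSummary(x, k = 1)`, assuming each call returns a single string; `purrr::map_chr()` plays the same role as `vapply()` here.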
    

    【Discussion】:

    • Thanks. If you have time, could you help check this code using lexRankr and figure out why it does not work? dd %>% mutate(text = as.character(text)) %>% mutate (top_3 = map (lexRankr::lexRank(text, docId = rep(1, length(text)), n = 3, continuous = TRUE )) %>% mutate(order_of_appearance = map(order(as.integer(gsub("_","",top_3$sentenceId))))) %>% mutate(ordered_top_3 = map(top_3[order_of_appearance, "sentence" ]))
    • @Bahi8482 The correct usage of purrr::map is map(data, function, args to function), or map(data, anonymous function). The latter is the better fit here, because you need to refer to the data more than once inside the arguments. So for your first one, you want: mutate(top_3 = map(text, function (x) lexRankr::lexRank(x, docId = rep(1, length(x)), n = 3, continuous = TRUE))).
    • Thanks for the quick reply. I believe my first one is now correct, following your comment. The second and third still do not work, even after I tried changing them. Is the order of the arguments incorrect? dd %>% mutate(text = as.character(text)) %>% mutate(top_3 = map(text, function (x) lexRankr::lexRank(x, docId = rep(1, length(x)), n= 3, continuous = TRUE))) %>% mutate(order_of_appearance = map(top_3$sentenceId, function(x) order(as.integer(gsub("_","", x))))) %>% mutate(ordered_top_3 = map(order_of_appearance, function (x) top_3[x, "sentence"])) @Ben
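One possible reading of why the second and third steps fail: after the first `mutate()`, `top_3` is a list column holding one data frame per row, so `top_3$sentenceId` does not refer to any single result. A hedged sketch of an alternative is to do all three steps inside one function that handles a single text, and map that function over the rows. The helper names `sentence_position()` and `summarise_top_n()` are hypothetical, and the sketch assumes lexRankr sentence ids have the form "<docId>_<sentence number>".

```r
# Assumption: lexRankr sentence ids look like "<docId>_<sentence number>"
sentence_position <- function(sentence_id) {
  as.integer(sub(".*_", "", sentence_id))
}

# Hypothetical helper: summarise ONE text by its top n sentences,
# returned in their order of appearance in that text
summarise_top_n <- function(text, n = 3) {
  top <- lexRankr::lexRank(text, docId = rep(1, length(text)),
                           n = n, continuous = TRUE)
  top[order(sentence_position(top$sentenceId)), "sentence"]
}

# Intended use (requires the lexRankr package), one call per row:
# dd$ordered_top_3 <- lapply(as.character(dd$text), summarise_top_n)
```

Keeping the ranking, ordering, and subsetting inside one function means each row's intermediate data frame never has to be referenced from another column.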