【Question Title】: Count the number of characters of the first, second and third word in a string
【Posted】: 2019-04-26 17:29:45
【Question】:

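I need to write code that counts the number of characters in the second and third words of a string.

I have the code below, but it only gives the character count of the first word.

For now I can only use Spark SQL or the dplyr package.

This is what I did to get the number of characters in the first word:

INSTR(NAME_NORM_LONG,' ')-1

The expected result is to count the characters and show the results in new columns.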

word="hey I am Scott"

characters_word1 | characters_word2 | characters_word3
3                | 1                | 2

Right now I am running this code as a test (trying to get the second word):

```r
test_query <- test_query %>%
  mutate(Total_char = nchar(NAME_NORM_LONG)) %>%
  mutate(Name_has_numbers = str_detect(NAME_NORM_LONG, "[[:digit:]]")) %>%
  mutate(number_words = LENGTH(NAME_NORM_LONG) - LENGTH(REPLACE(NAME_NORM_LONG, ' ', '')) + 1) %>%
  mutate(number_chars_w1 = INSTR(NAME_NORM_LONG, ' ') - 1) %>%
  mutate(number_chars_w2 = substr(NAME_NORM_LONG, number_chars_w1 + 1, LENGTH(NAME_NORM_LONG)))
```

The result I get is this:

```
test_query
# Source: spark<?> [?? x 7]
   PBIN0 NAME_NORM_LONG Total_char Name_has_numbers number_words number_chars_w1
   <int> <chr>               <int> <lgl>                   <dbl>           <dbl>
1 4.01e8 GM BUILDERS            11 FALSE                       2               2
# … with 1 more variable: number_chars_w2 <chr>
Warning messages:
1: In substr(NAME_NORM_LONG, number_chars_w1, 1) :
  NAs introduced by coercion
2: In substr(NAME_NORM_LONG, number_chars_w1, 1) :
  NAs introduced by coercion
3: In substr(NAME_NORM_LONG, number_chars_w1, 1) :
  NAs introduced by coercion
4: In substr(NAME_NORM_LONG, number_chars_w1, 1) :
  NAs introduced by coercion
5: In substr(NAME_NORM_LONG, number_chars_w1, 1) :
  NAs introduced by coercion```

【Comments】:

  • What code are you running? What result are you getting?
  • @SolutionMill I have edited the post; let me know if this helps.

Tags: dplyr apache-spark-sql sparklyr


【Solution 1】:

How about str_split()?

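word <- "hey I am Scott"

# str_split() returns a list with one character vector per input string
word_list <- stringr::str_split(word, " ")
words <- word_list[[1]]

# One label per word, paired with that word's character count
# (no loop needed: paste0() and nchar() are already vectorised)
first_row <- paste0("characters_word", seq_along(words))
second_row <- nchar(words)

df <- data.frame(first_row, second_row)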
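
The str_split() idea handles a single string and gives one row per word. Below is a minimal sketch of the same idea applied to a whole column, producing the wide layout the question asks for. It uses only base R, it assumes a local data frame (not a Spark table), and `nth_word_nchar` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: character count of the n-th space-separated word
# in each element of x; returns 0 when the string has fewer than n words.
nth_word_nchar <- function(x, n) {
  words <- strsplit(x, " ", fixed = TRUE)
  vapply(words, function(w) {
    w <- w[nzchar(w)]  # drop empty pieces produced by repeated spaces
    if (length(w) >= n) nchar(w[n]) else 0L
  }, integer(1))
}

# Sample data taken from the question
df <- data.frame(
  NAME_NORM_LONG = c("hey I am Scott", "GM BUILDERS"),
  stringsAsFactors = FALSE
)

df$characters_word1 <- nth_word_nchar(df$NAME_NORM_LONG, 1)
df$characters_word2 <- nth_word_nchar(df$NAME_NORM_LONG, 2)
df$characters_word3 <- nth_word_nchar(df$NAME_NORM_LONG, 3)

print(df)
```

For "hey I am Scott" this gives 3, 1 and 2. "GM BUILDERS" has no third word, so the sketch reports 0 there.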

【Comments】:

  • I was able to do it like this: ifelse((LENGTH(substring_index(NAME_NORM_LONG," ", 2))-1)- LENGTH(substring_index(NAME_NORM_LONG," ", 1))<0,0, (LENGTH(substring_index(NAME_NORM_LONG," ", 2))-1)- LENGTH(substring_index(NAME_NORM_LONG," ", 1)))
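
The substring_index() approach in the comment can probably be generalised to any word position by nesting two calls: the inner call keeps the first n words and the outer call, with a count of -1, keeps only the last of those. The sketch below builds that Spark SQL expression as a string. `nth_word_length_sql` is a hypothetical helper, and the guard with `size(split(...))` is an assumption added here so that names with fewer than n words give 0 instead of the length of their last word. It has not been run against a Spark cluster.

```r
# Hypothetical helper: returns a Spark SQL expression (as text) for the
# character count of the n-th space-separated word in column `col`.
nth_word_length_sql <- function(col, n) {
  n <- as.integer(n)
  sprintf(
    paste0(
      # Guard: only look for word n when the string has at least n pieces
      "CASE WHEN size(split(%1$s, ' ')) >= %2$d ",
      # Inner call keeps the first n words, outer call keeps the last of them
      "THEN length(substring_index(substring_index(%1$s, ' ', %2$d), ' ', -1)) ",
      "ELSE 0 END"
    ),
    col, n
  )
}

expr_w2 <- nth_word_length_sql("NAME_NORM_LONG", 2)
cat(expr_w2, "\n")

# Assumed usage with sparklyr/dbplyr (untested sketch):
# test_query %>% mutate(number_chars_w2 = !!dplyr::sql(expr_w2))
```

Note that runs of consecutive spaces would produce empty pieces in `split()`, so the guard assumes names are separated by single spaces.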