【Question title】: 3rd character index in a string
【Posted】: 2020-01-08 00:51:28
【Question】:

I have strings in Python, stored in a pandas Series. I want to write a function that returns each string up to, but not including, the third comma.

import pandas as pd
import numpy as np

mystr = pd.Series(['culture clash, future, space war, space colony, society',
                   'ocean, drug abuse, exotic island, east india, love, traitor'])

def transform(s):
    index = 0
    count = 0
    while count < 3:
        index = s.str.find(',', index)        
        count = count+1
        index += 1
    return s.str[0:index-1]

out = transform(mystr)
out

This returns NaN. I want:

  • 'culture clash, future, space war'
  • 'ocean, drug abuse, exotic island'

Can anyone help me solve this?

【Comments】:

    Tags: python string pandas indexing


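    【Solution 1】:

    If performance matters, a list comprehension is faster, because the str methods in pandas are comparatively slow (see the timings below):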

    pd.Series([','.join(i.split(',')[:3]) for i in mystr])
    #pd.Series(','.join(i.split(',')[:3]) for i in mystr)
    

    0    culture clash, future, space war
    1    ocean, drug abuse, exotic island
    

    %%timeit
    pd.Series(','.join(i.split(',')[:3]) for i in mystr)
    #111 µs ± 3.58 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    %%timeit
    mystr.apply(lambda x : ",".join(x.split(',')[:3]))
    #180 µs ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    %%timeit
    mystr.str.split(",").str[:3].apply(",".join)
    #505 µs ± 5.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
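In the same plain-Python spirit, the loop from the question can be made to work by calling str.find on one ordinary string at a time instead of on the whole Series. The following is only a minimal sketch: the function name cut_before_nth and its parameters are hypothetical, and returning the whole string when there are fewer than n commas is an assumption about the desired behavior.

```python
def cut_before_nth(text, sep=',', n=3):
    """Return text up to, but not including, the n-th occurrence of sep."""
    pos = -1
    for _ in range(n):
        # Search for the next separator just past the previous match.
        pos = text.find(sep, pos + 1)
        if pos == -1:
            # Assumption: fewer than n separators means keep the whole string.
            return text
    return text[:pos]


rows = ['culture clash, future, space war, space colony, society',
        'ocean, drug abuse, exotic island, east india, love, traitor']
trimmed = [cut_before_nth(row) for row in rows]
print(trimmed)
```

The resulting list can be wrapped in pd.Series(...) exactly as in the list comprehension above.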
    

    【Comments】:

      【Solution 2】:

      Try this:

      >>> mystr = pd.Series(['culture clash, future, space war, space colony, society','ocean, drug abuse, exotic island, east india, love, traitor'])
      

      Output:

      >>> mystr.apply(lambda x : ",".join(x.split(',')[:3]))
      
      0    culture clash, future, space war
      1    ocean, drug abuse, exotic island
      dtype: object
      

      Explanation:

      • Split on , then take the first three items by slicing with [:3], and join them again with , to rebuild the string.
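The split, slice, and join steps can be sketched on a single string as follows. The helper name first_fields is hypothetical, and passing a split limit is an optional refinement not used in the answer: it stops splitting after the n-th comma so the tail of the string is never broken up.

```python
def first_fields(text, n=3, sep=','):
    """Keep the first n separator-delimited fields of text."""
    # Split at most n times; everything after the n-th separator stays in one piece.
    parts = text.split(sep, n)
    # Slice off the tail and glue the kept fields back together.
    return sep.join(parts[:n])


samples = ['culture clash, future, space war, space colony, society',
           'ocean, drug abuse, exotic island, east india, love, traitor']
kept = [first_fields(s) for s in samples]
print(kept)
```

Because the original separators are reinserted unchanged, the spaces after each comma are preserved, and a string with fewer than three commas comes back untouched.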

      【Comments】:

        【Solution 3】:

        Use str.split.

        For example:

        import pandas as pd
        
        mystr = pd.Series(['culture clash, future, space war, space colony, society', 'ocean, drug abuse, exotic island, east india, love, traitor'])
        print(mystr.str.split(",").str[:3].apply(",".join))
        

        Output:

        0    culture clash, future, space war
        1    ocean, drug abuse, exotic island
        dtype: object
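As a variation that avoids building intermediate lists, the same prefix can be matched with a regular expression. This is a sketch using only the standard re module, not something from the answers above; the names PATTERN and first_three_fields are hypothetical.

```python
import re

# One field without commas, followed by at most two more ",field" groups.
PATTERN = re.compile(r'[^,]*(?:,[^,]*){0,2}')


def first_three_fields(text):
    """Return everything before the third comma (the whole text if there are fewer)."""
    # The pattern can match the empty string, so match() never returns None here.
    return PATTERN.match(text).group(0)


data = ['culture clash, future, space war, space colony, society',
        'ocean, drug abuse, exotic island, east india, love, traitor']
matched = [first_three_fields(s) for s in data]
print(matched)
```

Wrapped in a capture group, the same pattern should also be usable with the pandas Series.str.extract method, though that variant is not timed or tested here.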
        

        【Comments】:
