【Question title】: Replace the string in pandas dataframe
【Posted】: 2021-09-07 17:42:16
【Question description】:

I have the following dataframe (df):

shape data
POINT POINT (4495 33442)
POLYGON POLYGON ((6324 32691, 6326 32691, 6330 32691, 6333 32693, 6332 32696, 6329 32700, 6328 32704, 6327 32707, 6325 32710, 6322 32713, 6319 32716, 6316 32719, 6313 32722, 6310 32725, 6307 32728, 6303 32728, 6299 32727, 6295 32727, 6291 32730, 6288 32733, 6285 32735, 6281 32735, 6277 32735, 6275 32732, 6274 32729, 6274 32725, 6272 32722, 6269 32720, 6265 32719, 6261 32719, 6258 32716, 6257 32712, 6259 32708, 6262 32705, 6265 32702, 6268 32701, 6272 32701, 6276 32701, 6279 32702, 6283 32702, 6287 32702, 6291 32699, 6294 32696, 6297 32693, 6300 32692, 6304 32692, 6308 32692, 6312 32692, 6316 32692, 6320 32693, 6324 32691))
POINT POINT (4673 33465)
POLYGON POLYGON ((5810 33296, 5813 33297, 5816 33299, 5819 33301, 5822 33303, 5826 33306, 5829 33307, 5833 33307, 5836 33308, 5837 33312, 5837 33316, 5836 33319, 5834 33323, 5832 33327, 5830 33330, 5828 33333, 5826 33336, 5824 33339, 5821 33342, 5817 33342, 5813 33341, 5808 33340, 5803 33339, 5800 33338))

I want to convert it to the following format: if POINT, then (4495, 33442); if POLYGON, then [(5810, 33296), (5813, 33297), (5816, 33299), (5819, 33301), (5822, 33303), (5826, 33306), (5829, 33307), (5833, 33307), (5836, 33308), (5837, 33312), (5837, 33316), (5836, 33319), (5834, 33323), (5832, 33327), (5830, 33330), (5828, 33333), (5826, 33336), (5824, 33339), (5821, 33342), (5817, 33342), (5813, 33341), (5808, 33340), (5803, 33339), (5800, 33338)]. How can I do this?

What have I tried so far?

op2=[]
for st, shape in zip(df['data'],df['shape']):
    if 'POINT' in shape:
        val=re.findall('\([0-9., ]+\)', st)[-1]
        op2.append("({})".format(", ".join(re.findall(r"\d+", val))))
        #op2_list = [ast.literal_eval(l) for l in op2]
        #poi = [Point(i).wkt for i in op2_list]
    else:  # Polygon
        val=re.findall('\([0-9., ]+\)', st)[-1]
        paran=val.replace(', ', '),(')
        fin=paran.replace(' ', ',')
        op2.append(fin)
        
data['converted']=pd.DataFrame(op2)   

Expected output:

shape data converted
POINT POINT (4495 33442) (4495, 33442)
POLYGON POLYGON ((6324 32691, 6326 32691, 6330 32691, 6333 32693, 6332 32696, 6329 32700, 6328 32704, 6327 32707, 6325 32710, 6322 32713, 6319 32716, 6316 32719, 6313 32722, 6310 32725, 6307 32728, 6303 32728, 6299 32727, 6295 32727, 6291 32730, 6288 32733, 6285 32735, 6281 32735, 6277 32735, 6275 32732, 6274 32729, 6274 32725, 6272 32722, 6269 32720, 6265 32719, 6261 32719, 6258 32716, 6257 32712, 6259 32708, 6262 32705, 6265 32702, 6268 32701, 6272 32701, 6276 32701, 6279 32702, 6283 32702, 6287 32702, 6291 32699, 6294 32696, 6297 32693, 6300 32692, 6304 32692, 6308 32692, 6312 32692, 6316 32692, 6320 32693, 6324 32691)) [(6324, 32691), (6326, 32691), (6330, 32691), (6333, 32693), (6332, 32696), (6329, 32700), (6328, 32704), (6327, 32707), (6325, 32710), (6322, 32713), (6319, 32716), (6316, 32719), (6313, 32722), (6310, 32725), (6307, 32728), (6303, 32728), (6299, 32727), (6295, 32727), (6291, 32730), (6288, 32733), (6285, 32735), (6281, 32735), (6277, 32735), (6275, 32732), (6274, 32729), (6274, 32725), (6272, 32722), (6269, 32720), (6265, 32719), (6261, 32719), (6258, 32716), (6257, 32712), (6259, 32708), (6262, 32705), (6265, 32702), (6268, 32701), (6272, 32701), (6276, 32701), (6279, 32702), (6283, 32702), (6287, 32702), (6291, 32699), (6294, 32696), (6297, 32693), (6300, 32692), (6304, 32692), (6308, 32692), (6312, 32692), (6316, 32692), (6320, 32693), (6324, 32691)]
POINT POINT (4673 33465) (4673, 33465)
POLYGON POLYGON ((5810 33296, 5813 33297, 5816 33299, 5819 33301, 5822 33303, 5826 33306, 5829 33307, 5833 33307, 5836 33308, 5837 33312, 5837 33316, 5836 33319, 5834 33323, 5832 33327, 5830 33330, 5828 33333, 5826 33336, 5824 33339, 5821 33342, 5817 33342, 5813 33341, 5808 33340, 5803 33339, 5800 33338)) [(5810, 33296), (5813, 33297), (5816, 33299), (5819, 33301), (5822, 33303), (5826, 33306), (5829, 33307), (5833, 33307), (5836, 33308), (5837, 33312), (5837, 33316), (5836, 33319), (5834, 33323), (5832, 33327), (5830, 33330), (5828, 33333), (5826, 33336), (5824, 33339), (5821, 33342), (5817, 33342), (5813, 33341), (5808, 33340), (5803, 33339), (5800, 33338)]

This does not convert the polygons. How can I do this?

【Question discussion】:

    Tags: python pandas replace re parentheses


    【Solution 1】:

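    This function formats a polygon string:

    def format_polygon(s):
        # Drop the 10-character "POLYGON ((" prefix and the trailing "))", then split into "x y" pairs.
        return [tuple(float(i) for i in x.split(" ")) for x in s[10:-2].split(", ")]
    

    And this function formats a point string:

    def format_point(s):
        # Drop the 7-character "POINT (" prefix and the trailing ")".
        return tuple(float(i) for i in s[7:-1].split(" "))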
    

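    They can then be applied to your dataframe as follows. The assignment goes through .loc with a boolean mask so that it writes to df itself rather than to a temporary copy, which is what chained indexing such as df[mask]["data"] = ... would do:

    is_point = df["shape"] == "POINT"
    is_polygon = df["shape"] == "POLYGON"
    df.loc[is_point, "data"] = df.loc[is_point, "data"].apply(format_point)
    df.loc[is_polygon, "data"] = df.loc[is_polygon, "data"].apply(format_polygon)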
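
The question asks for a separate `converted` column holding integer coordinates, so the following is a minimal sketch of that variant rather than the answer's exact code. It assumes every `data` string has exactly the layout shown in the question and that all coordinates are whole numbers (hence `int` instead of `float`). The dispatcher name `convert` and the two-row sample frame are illustrative assumptions.

```python
import pandas as pd

def format_point(s):
    # "POINT (4495 33442)" -> (4495, 33442)
    return tuple(int(i) for i in s[7:-1].split(" "))

def format_polygon(s):
    # "POLYGON ((x y, x y, ...))" -> [(x, y), (x, y), ...]
    return [tuple(int(i) for i in pair.split(" ")) for pair in s[10:-2].split(", ")]

def convert(shape, text):
    # Hypothetical dispatcher: pick the parser from the value in the "shape" column.
    return format_point(text) if shape == "POINT" else format_polygon(text)

# Small sample frame (assumption), shortened from the data in the question.
df = pd.DataFrame({
    "shape": ["POINT", "POLYGON"],
    "data": ["POINT (4495 33442)",
             "POLYGON ((5810 33296, 5813 33297, 5816 33299))"],
})

# Build the new column row by row; the original "data" column is left untouched.
df["converted"] = [convert(sh, t) for sh, t in zip(df["shape"], df["data"])]
print(df["converted"].tolist())
```

Writing to a new column keeps the original strings available, which makes it easier to re-run the conversion if a row turns out to have a different layout.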
    

    【Discussion】:

    • Thank you very much. But I get the following error: ValueError: could not convert string to float:
    • @disukumo Sorry, there was a mistake in my code. I have edited it and it should work now.
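
The error message in the first comment is cut off, so its actual cause is not known. Fixed-offset slicing can fail this way whenever a string deviates from the exact `POINT (` / `POLYGON ((` layout, for example through extra spaces. As a less brittle sketch, each `x y` pair can be extracted with a regular expression instead. This assumes non-negative coordinates, optionally with a decimal part; `parse_wkt_coords`, `PAIR`, and `_num` are hypothetical names, not part of pandas or the original answer.

```python
import re

# One coordinate pair: two non-negative numbers separated by whitespace.
PAIR = re.compile(r"(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)")

def _num(text):
    # Keep whole numbers as int, anything with a decimal point as float.
    return float(text) if "." in text else int(text)

def parse_wkt_coords(text):
    # Collect every "x y" pair in the string, regardless of spacing or parentheses.
    pairs = [(_num(x), _num(y)) for x, y in PAIR.findall(text)]
    if not pairs:
        raise ValueError(f"no coordinate pair found in {text!r}")
    # A POINT has a single pair, returned as a tuple; anything else returns a list.
    if text.lstrip().upper().startswith("POINT"):
        return pairs[0]
    return pairs

print(parse_wkt_coords("POINT (4495 33442)"))
print(parse_wkt_coords("POLYGON ((5810 33296, 5813 33297))"))
```

It could be applied with `df["converted"] = df["data"].apply(parse_wkt_coords)`, since the shape is read from the string itself.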