【Question title】: Introduction to Data Science in Python problem
【Posted】: 2019-05-07 09:46:55
【Question】:

Can anyone tell me what exactly this part (town = thisLine[:thisLine.index('(')-1]) does?

import pandas as pd

def get_list_of_university_towns():
    '''Returns a DataFrame of towns and the states they are in from the 
    university_towns.txt list. The format of the DataFrame should be:
    DataFrame( [ ["Michigan", "Ann Arbor"], ["Michigan", "Yipsilanti"] ], 
    columns=["State", "RegionName"]  )

    The following cleaning needs to be done:
    1. For "State", removing characters from "[" to the end.
    2. For "RegionName", when applicable, removing every character from " (" to the end.
    3. Depending on how you read the data, you may need to remove newline character '\n'. '''

    state = None
    state_towns = []
    with open('university_towns.txt') as file:
        for line in file:
            # Drop the trailing newline, if there is one
            thisLine = line.rstrip('\n')
            if thisLine[-6:] == '[edit]':
                # A state header such as "Michigan[edit]": remember the state
                state = thisLine[:-6]
                continue
            if '(' in thisLine:
                # Keep the text before " (": index('(') - 1 also drops the space
                town = thisLine[:thisLine.index('(')-1]
            else:
                town = thisLine
            state_towns.append([state, town])
    df = pd.DataFrame(state_towns, columns=['State', 'RegionName'])
    return df

get_list_of_university_towns()

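【Question comments】:

  • To format code, just copy/paste it, highlight the whole block, and click {} in the editor.
  • Any of these topics can guide you in writing questions that are useful to the whole SO community.
  • Thanks for the suggestion, and sorry for the mistake.
  • @RahulAgarwal Yes, thank you for your answer.

Tags: python pandas numpy data-science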


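【Solution 1】:

This line handles requirement 2 of the cleaning list.

For example, if the line is:

line = "Michigan, (Ann Arbor"

then the slice returns Michigan, (everything before the space that precedes the opening parenthesis).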
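
The slice can also be checked in isolation. What follows is a minimal sketch, not the original author's code: `strip_parenthetical` is a hypothetical helper name introduced here for illustration, and the second sample string is made up.

```python
def strip_parenthetical(this_line):
    """Return the text before " (", mirroring the slice from the question."""
    if '(' not in this_line:
        return this_line
    # index('(') is the position of the first '('.
    # Subtracting 1 moves the slice end one character earlier,
    # so the character just before '(' (normally a space) is dropped too.
    return this_line[:this_line.index('(') - 1]

print(strip_parenthetical("Michigan, (Ann Arbor"))
print(strip_parenthetical("Ann Arbor (University of Michigan)"))
```

The first call prints `Michigan,` and the second prints `Ann Arbor`. A line without any parenthesis is returned unchanged, which matches the `else` branch in the question.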

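【Discussion】:

  • This is just an example; the goal is to return a DataFrame of towns and the states they are in.
  • You asked what that line does, and that is what it does. The last statement, pd.DataFrame, builds your final DF with all of the cleaning applied.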
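【Solution 2】:

It performs this step:

2. For "RegionName", when applicable, removing every character from " (" to the end.

Used on its own, an index of -1 refers to the last element of a list or string. In this line, however, the -1 is subtracted from the position returned by index('('), so the slice stops one character earlier and also drops the space in front of the parenthesis.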
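
To see the two roles of -1 side by side, here is a small sketch. The sample string is made up for illustration and does not come from the original thread.

```python
this_line = "Ann Arbor (University of Michigan)"

# -1 used directly as an index refers to the last character
last_char = this_line[-1]

# In the question's line, 1 is subtracted from the position of '('
paren_pos = this_line.index('(')      # position of the first '('
town = this_line[:paren_pos - 1]      # stops before the space preceding '('
with_space = this_line[:paren_pos]    # without the -1, the trailing space remains

print(repr(last_char), paren_pos, repr(town), repr(with_space))
```

Here `town` is `'Ann Arbor'`, while `with_space` is `'Ann Arbor '` with a trailing space, which is why the original line subtracts 1.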

【Discussion】:

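【Solution 3】:

import re
import pandas as pd

# Read all lines up front; the with-block closes the file automatically
with open('university_towns.txt', 'r') as raw_data:
    data = raw_data.readlines()

subs = '[edit]'
state = ''
rows = []

for line in data:
    # rstrip() returns a new string, so the result must be reassigned
    line = line.rstrip()
    if subs in line:
        # State header, e.g. "Michigan[edit]"
        state = line.replace(subs, '')
    else:
        # Drop everything from " (" to the end of the line
        region = re.sub(r" \(.*", '', line)
        rows.append({'State': state, 'RegionName': region})

# Build the DataFrame once; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame(rows, columns=['State', 'RegionName'])
df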
    

【Discussion】:

  • Simple and easy.