【Question title】: Python searching a dataset
【Posted】: 2013-04-15 17:26:37
【Question description】:

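I have this as a homework problem and don't know how to approach it.

First, I was given a dataset containing a list of staff names, addresses, emails and so on, about 50 staff members in total.

You have been asked to write an application that provides information about staff. Your program should prompt the user for a search criterion. Any staff member matching the search criterion should be printed to the screen in the following format: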

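Position	Designation	Room and Extension	Name and Email Address
(columns are tab-separated)

matching information............
You will have to modify the dataset for processing, and you may choose to keep it in a separate file, although this is not required. Your program should meet certain constraints: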

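  • You should compare every column in the dataset against the search criterion.
  • The comparison should not be case-sensitive.
  • All output, except email addresses, should have its first letters capitalised.
  • If a match is found, the result rows should be printed and the columns should all be aligned.
  • If there is no match, a message should be printed without the header row.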

You should save (1) your program and (2) a paragraph explaining how you processed the dataset.

You should also run these test cases on your application:

  • Search for "Brenda"
  • Search for all clerical staff.
  • Search for "BredNa"
  • Find Dr Carr's position
  • Which office is Neil in?

So, first of all, how should I read in this dataset? Should I read it in as a text file, or create a tuple, a dictionary, etc.?


staff = [['prof.liam maguire','head of school','academic','MS127','75605','lguire@ulster.ac.uk'],
 ['prof. martin McGinnity','director of intelligent systems research centre','academic','MS112','75616','tinnity@ulster.ac.uk'],
 ['dr laxmidhar Behera','reader','academic','MS107','75276','lra@ulster.ac.uk'],
 ['dr girijesh Prasad','professor','academic','MS137','75645','gad@ulster.ac.uk'],
 ['dr kevin Curran','senior lecturer','academic','MS130','75565','krran@ulster.ac.uk'],
 ['mr aiden McCaughey','Senior Lecturer','academic','MG126','75131','aughey@ulster.ac.uk'],
 ['dr tom Lunney','postgraduate courses co-ordinator (Senior Lecturer)','academic','MG121D','75388','tfney@ulster.ac.uk'],
 ['dr heather Sayers','undergraduate courses co-ordinator (Senior Lecturer)','academic','MG121C','75148','hmyers@ulster.ac.uk'],
 ['dr liam Mc Daid','senior lecturer','academic','MS016','75452','ljid@ulster.ac.uk'],
 ['mr derek Woods','senior lecturer','academic','MS134','75380','dnoods@ulster.ac.uk'],
 ['dr ammar Belatreche','lecturer','academic','MS104','75185','aatreche@ulster.ac.uk'],
 ['mr michael Callaghan','lecturer','academic','MS132','75771','mjllaghan@ulster.ac.uk'],
 ['dr sonya Coleman','lecturer','academic','MS133','75030','saeman@ulster.ac.uk'],
 ['dr joan Condell','lecturer','academic','MS131','75024','jdell@ulster.ac.uk'],
 ['dr damien Coyle','lecturer','academic','MS103','75170','dhle@ulster.ac.uk'],
 ['mr martin Doherty','lecturer','academic','MG121A','75552','merty@ulster.ac.uk'],
 ['dr jim Harkin','lecturer','academic','MS108','75128','jgrkin@ulster.ac.uk'],
 ['dr yuhua Li','lecturer','academic','MS106','75528','yi@ulster.ac.uk'],
 ['dr sandra Moffett','lecturer','academic','MS015','75381','soffett@ulster.ac.uk'],
 ['mrs mairin Nicell','lecturer','academic','MG127','75007','micell@ulster.ac.uk'],
 ['mrs maeve Paris','lecturer','academic','MG040','75212','m@ulster.ac.uk'],
 ['dr jose Santos','lecturer','academic','MG035','75034','jantos@ulster.ac.uk'],
 ['dr nH. Siddique','lecturer','academic','MG037','75340','nhique@ulster.ac.uk'],
 ['dr zumao Weng','lecturer','academic','MG050','75358','zmng@ulster.ac.uk'],
 ['dr shane Wilson','lecturer','academic','MG038','75527','s.on@ulster.ac.uk'],
 ['dr caitriona carr','technical Services Engineer','computing and Technical Support','MG121B','75003','crr@ulster.ac.uk'],
 ['mr neil McDonnell','technical Services Supervisor','computing and Technical Support','MS030 / MF143','75360','ndonnell@ulster.ac.uk'],
 ['mr paddy McDonough','technical Services Engineer','computing and Technical Support','MS034','75322','p.ugh@ulster.ac.uk'],
 ['mr bernard McGarry','network Assistant','computing and Technical Support','MG132','75644','bgrry@ulster.ac.uk'],
 ['mr stephen Friel','secretary','clerical staff','MG048','75148','siel@ulster.ac.uk'],
 ['ms emma McLaughlin','secretary','clerical staff','MG048','75153','eughlin1@ulster.ac.uk'],
 ['mrs. brenda Plummer','secretary','clerical staff','MS126','75605','blmmer@ulster.ac.uk'],
 ['miss paula Sheerin','secretary','clerical staff','MS111','75616','perin@ulster.ac.uk'],
 ['mrs michelle Stewart','secretary','clerical staff','MG048','75382','mwart@ulster.ac.uk']]


matches = []

criterion = input("please enter search criterion: ")
criterion = criterion.lower()

for person in staff:
    for characteristic in person:
        # Lower-case the field as well, so the comparison ignores case.
        if criterion in characteristic.lower():
            matches.append(person)
            break  # one matching field is enough; move on to the next person

if len(matches) == 0:
    print("No Match")
else:
    print("POSITION ||| DESIGNATION ||| EXT & ROOM NO ||| NAME & EMAIL")
    for i in matches:
        # The email address (i[5]) is printed unchanged, not title-cased.
        print(i[1].title(), ':', i[2].title(), ':', i[3].upper() + ' ' + i[4], ':', i[0].title(), i[5])

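This is what I have come up with so far, and it seems to work. Do you have any improvements?

【Comments on the question】:

  • What format is the dataset in? Can you provide a sample entry? Also, what have you tried so far?

Tags: dataset python-3.x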


【Solution 1】:

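Thank you for being honest that this is a homework question. StackOverflow discourages giving direct answers to homework questions, but we can guide you toward the right answer.

Regarding "modify the dataset for processing": this means the data is not currently in a consistent format. The first thing you need to do is look at the data you were given and decide on the best representation for it.

I suggest a tab-separated columnar data file, which is easy to create in Microsoft Excel by putting the data into a spreadsheet and saving it as text. (Excel will complain that it will lose all the things that make it a spreadsheet rather than a text file, but that's fine: a text file is what you want.) Save the updated file.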

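What Excel produces is called a tab-delimited text file: a two-dimensional grid of data (shaped like a spreadsheet) in which each row is one line of text and a tab character separates the cells within each row. In other words, a newline character, which text editors interpret as a command to start writing on a new line, separates the rows, and the tab is written in Python as the escape sequence \t but is really a single character of its own. This format is also called tab-separated values, or TSV. Closely related is comma-separated values, or CSV, which is another option in Excel. CSV can also stand for character-separated values, a general term for any text file that represents a grid of data by using some character (',' for comma-separated, '\t' for tab-separated) to separate the cells.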
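To make the row and cell separators concrete, here is a minimal sketch of what such a file contains. The two records are trimmed to three columns for illustration, and the variable names are my own, not part of the assignment.

```python
# Two illustrative records, written the way a tab-separated file stores them:
# '\n' ends each row and '\t' separates the cells within a row.
tsv_text = (
    "Dr Joan Condell\tLecturer\tAcademic\n"
    "Mr Stephen Friel\tSecretary\tClerical Staff\n"
)

# Splitting on newlines gives the rows; splitting each row on tabs gives cells.
rows = [line.split("\t") for line in tsv_text.splitlines()]

# '\t' takes two characters to write in source code but is a single character.
tab_length = len("\t")

print(rows[1][2])  # the third cell of the second row
```

Splitting by hand like this is fine for seeing the structure, but it does not handle quoted cells, which is one reason to prefer a dedicated reader.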

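CSV is a very common file format, so Python is ready to help you. Python has a library, csv, designed to read these files for you. If you use the Excel text format, you need to tell it that your dialect is excel-tab, since that denotes the tab-delimited files that Excel writes.

You need to construct a csv.reader to read your formatted data file. Use the order in which you laid out the columns to interpret the list you get each time you read a row: the order of the columns and the order of the items in each row are the same, so use that information to index into the list correctly and find each field.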
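A minimal sketch of that reading step follows. It assumes the file has a header row and six columns in the order shown; both are my assumptions for illustration, and an in-memory `io.StringIO` stands in for the real file so the snippet runs on its own.

```python
import csv
import io

# Stand-in for the file Excel would save. In a real program this would be
# something like open("staff.txt", newline=""), with a file name of your choosing.
fake_file = io.StringIO(
    "Name\tPosition\tDesignation\tRoom\tExtension\tEmail\n"
    "dr joan Condell\tlecturer\tacademic\tMS131\t75024\tjdell@ulster.ac.uk\n"
)

reader = csv.reader(fake_file, dialect="excel-tab")
header = next(reader)              # first row: the (assumed) column names
records = [row for row in reader]  # every remaining row is a list of strings

# The column order in the file is the index order within each row.
NAME, POSITION, DESIGNATION, ROOM, EXTENSION, EMAIL = range(6)
print(records[0][ROOM])
```

Naming the column indexes once, as in the last lines, keeps later code readable compared with bare numbers such as `row[3]`.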

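Once you have read a row, what do you want to do with it?

You have a choice of storage formats within your program:

  • Save each record into a list (giving a list of lists, since each record is itself a list). Once it is loaded, when you want to search it, you iterate over the whole list of lists and use an equality test to find matches. This can be done with a list comprehension, which is almost certainly what your teacher is looking for.
  • Additionally, create a dictionary for each column of the file, and store every record in each dictionary: each dictionary maps that column's value to your record. There's a catch here! A dictionary can store only one value per key, but there will certainly be different people with the same "title" (multiple professors, multiple clerical staff, etc.), and there is no guarantee that no two people share the same name. Your index dictionaries must store lists of records, not just single records.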
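The first option might look like the sketch below. The three records are copied from the question's dataset; the function name `find_equal` is hypothetical, and lower-casing both sides is my addition to meet the assignment's case-insensitivity rule.

```python
# A few records from the question's dataset, one list per person.
staff = [
    ["dr joan Condell", "lecturer", "academic", "MS131", "75024", "jdell@ulster.ac.uk"],
    ["mr stephen Friel", "secretary", "clerical staff", "MG048", "75148", "siel@ulster.ac.uk"],
    ["ms emma McLaughlin", "secretary", "clerical staff", "MG048", "75153", "eughlin1@ulster.ac.uk"],
]

def find_equal(records, criterion):
    """Return every record that has a field equal to criterion, ignoring case."""
    wanted = criterion.lower()
    return [rec for rec in records
            if any(field.lower() == wanted for field in rec)]

matches = find_equal(staff, "SECRETARY")
print([rec[0] for rec in matches])
```

An equality test only finds whole-field matches; a search for "Brenda" against a full name would need a substring test (`wanted in field.lower()`) instead.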

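The second approach is much faster for repeated queries, because you organise all the records up front for fast lookup. The first, however, is easier to implement and is more likely what your teacher expects. I suggest implementing the first one, understanding it, and then, if you have time, implementing the second.

The user interface for all of this is of course up to you, but this should get you well on your way to the core of the program. Good luck.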
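For comparison, here is a rough sketch of the second approach, with one index dictionary per column and a list of records behind every key. The column names in `COLUMNS` and the function name `build_indexes` are hypothetical labels chosen for this example.

```python
from collections import defaultdict

# A few records from the question's dataset, one list per person.
staff = [
    ["dr joan Condell", "lecturer", "academic", "MS131", "75024", "jdell@ulster.ac.uk"],
    ["mr stephen Friel", "secretary", "clerical staff", "MG048", "75148", "siel@ulster.ac.uk"],
    ["ms emma McLaughlin", "secretary", "clerical staff", "MG048", "75153", "eughlin1@ulster.ac.uk"],
]

# Hypothetical column labels, in the same order as the fields in each record.
COLUMNS = ["name", "position", "designation", "room", "extension", "email"]

def build_indexes(records):
    """Build one dictionary per column: lower-cased value -> list of records."""
    indexes = {col: defaultdict(list) for col in COLUMNS}
    for rec in records:
        for col, value in zip(COLUMNS, rec):
            indexes[col][value.lower()].append(rec)
    return indexes

indexes = build_indexes(staff)

# Two people share room MG048, so the key maps to a list of two records.
shared_room = indexes["room"].get("mg048", [])
print(len(shared_room))
```

Looking up with `.get(key, [])` avoids adding empty entries to the `defaultdict` when a search finds nothing. Note that this index answers exact-value queries only; substring searches still need a scan.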

【Comments】:

    【Solution 2】:

    I would do it like this:

    staff_details = [["Prof. Liam Maguire","Head of School","Academic","MS127","75605","lp.maguire@ulster.ac.uk"],
                     ["Prof. Martin McGinnity","Director of Intelligent Systems Research Centre","Academic","MS112","75616","tm.mcginnity@ulster.ac.uk"],
                     ["Dr Laxmidhar Behera","Reader","Academic","MS107","75276", "l.behera@ulster.ac.uk"],
                     ["Dr  Girijesh Prasad","Professor","Academic","MS137","75645","g.prasad@ulster.ac.uk"],
                     ["Dr  Kevin Curran","Senior Lecturer","Academic","MS130","75565","kj.curran@ulster.ac.uk"],
                     ["Mr Aiden McCaughey","Senior Lecturer","Academic","MG126","75131","a.mccaughey@ulster.ac.uk"],
                     ["Dr Tom Lunney","Postgraduate Courses’ Co-ordinator (Senior Lecturer) ","Academic","MG121D","75388","tf.lunney@ulster.ac.uk"],
                     ["Dr Heather Sayers","Undergraduate Courses’ Co-ordinator (Senior Lecturer) ","Academic","MG121C","75148","hm.sayers@ulster.ac.uk"],
                     ["Dr  Liam Mc Daid","Senior Lecturer","Academic","MS016","75452","lj.mcdaid@ulster.ac.uk"],
                     ["Mr Derek Woods","Senior Lecturer","Academic","MS134","75380","dn.woods@ulster.ac.uk"],
                     ["Dr Ammar Belatreche","Lecturer","Academic","MS104","75185","a.belatreche@ulster.ac.uk"],
                     ["Mr Michael Callaghan","Lecturer","Academic","MS132","75771","mj.callaghan@ulster.ac.uk"],
                     ["Dr  Sonya Coleman","Lecturer","Academic","MS133","75030","sa.coleman@ulster.ac.uk"],
                     ["Dr  Joan Condell","Lecturer","Academic","MS131","75024","j.condell@ulster.ac.uk"],
                     ["Dr Damien Coyle","Lecturer","Academic","MS103","75170","dh.coyle@ulster.ac.uk"],
                     ["Mr Martin Doherty","Lecturer","Academic","MG121A","75552","m.doherty@ulster.ac.uk"],
                     ["Dr  Jim Harkin","Lecturer","Academic","MS108","75128","jg.harkin@ulster.ac.uk"],
                     ["Dr Yuhua Li","Lecturer","Academic","MS106","75528","y.li@ulster.ac.uk"],
                     ["Dr  Sandra Moffett","Lecturer","Academic","MS015","75381","sm.moffett@ulster.ac.uk"],
                     ["Mrs Mairin Nicell","Lecturer","Academic","MG127","75007","ma.nicell@ulster.ac.uk"],
                     ["Mrs Maeve Paris","Lecturer","Academic","MG040","75212","m.paris@ulster.ac.uk"],
                     ["Dr Jose Santos","Lecturer","Academic","MG035","75034","ja.santos@ulster.ac.uk"],
                     ["Dr  NH. Siddique","Lecturer","Academic","MG037","75340","nh.siddique@ulster.ac.uk"],
                     ["Dr  Zumao Weng","Lecturer","Academic","MG050 ","75358","zm.weng@ulster.ac.uk"],
                     ["Dr  Shane Wilson","Lecturer","Academic","MG038","75527","s.wilson@ulster.ac.uk"],
                     ["Dr Caitriona Carr","Technical Services Engineer","Computing and Technical Support","MG121B","75003","c.carr@ulster.ac.uk"],
                     ["Mr Neil McDonnell","Technical Services Supervisor","Computing and Technical Support","MS030 / MF143","75360", "n.mcdonnell@ulster.ac.uk"],
                     ["Mr Paddy McDonough","Technical Services Engineer","Computing and Technical Support","MS034","75322","p.mcdonough@ulster.ac.uk"],
                     ["Mr Bernard McGarry","Network Assistant","Computing and Technical Support","MG132","75644","bg.mcgarry@ulster.ac.uk"],
                     ["Mr Stephen Friel","Secretary","Clerical Staff","MG048","75148","s.friel@ulster.ac.uk"],
                     ["Ms Emma McLaughlin","Secretary","Clerical Staff","MG048","75153","e.mclaughlin1@ulster.ac.uk"],
                     ["Mrs. Brenda Plummer","Secretary","Clerical Staff","MS126","75605","bl.plummer@ulster.ac.uk"],
                     ["Miss Paula Sheerin","Secretary","Clerical Staff","MS111","75616","p.sheerin@ulster.ac.uk"],
                     ["Mrs Michelle Stewart","Secretary","Clerical Staff","MG048","75382","m.stewart@ulster.ac.uk"]]
    
    search_result = []
    
    search_input = input("Please enter a search criterion: ")
    
    for person in staff_details:
        for characteristic in person:
            # Compare in lower case so the search is case-insensitive.
            if search_input.lower() in characteristic.lower():
                search_result.append(person)
                break
    
    if len(search_result) == 0:
        print ("No staff members match your search criterion of ->", search_input)
    
    
    else:
        print("We have a match!")
        print ("{0:<30} {1:<40} {2:<40} {3:<50}".format("Position:", "Designation:", "Room and Extension:", "Name and Email:"))
        print ("-" * 160)
    
    for align in search_result:
        print("{0:<30} {1:<40} {2:<40} {3:<50}".format((align[1]), (align[2]), (align[3] + ", Ext:" + align[4]), align[0] + "(" + align[5] + ")"))
    

    Hope this helps!

    【Comments】:

      【Solution 3】:

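      I assume you have the dataset as a plain text file (or as copyable text in an email, etc.). In that case you have several options:

      1. Create a text file in which each line stores the information about one employee in a fixed format: "name", "position", and so on. In that case, to search you scan the file and print the matching lines, repeating the part that matched.

      2. Use Python data types to hold the information in memory, for example a list of dictionaries with keys such as "name", "position", and so on. Searching then becomes a little more complicated (only a little, really), but you can format the output any way you like. First, though, you need to populate the list by reading the text file (or by hard-coding it by hand, if you are desperate).

      3. You can combine these approaches by building dictionaries only from the matching lines of the file.

      4. You could use a real database engine such as MySQL, but that is serious overkill for this assignment.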
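As a rough illustration of option 2, the sketch below keeps records as a list of dictionaries. The two records come from the question's dataset but are trimmed to three fields, and the key names and the `search` function are choices made for this example only.

```python
# Illustrative records as dictionaries; the keys are chosen for this sketch.
staff = [
    {"name": "dr joan Condell", "position": "lecturer", "room": "MS131"},
    {"name": "mr neil McDonnell", "position": "technical Services Supervisor",
     "room": "MS030 / MF143"},
]

def search(records, criterion):
    """Return records in which any value contains criterion, ignoring case."""
    wanted = criterion.lower()
    return [rec for rec in records
            if any(wanted in value.lower() for value in rec.values())]

for rec in search(staff, "NEIL"):
    # Named keys make it easy to print the fields in any order you like.
    print("{position}\t{room}\t{name}".format(**rec))
```

The main benefit over plain lists is that `rec["room"]` documents itself, at the cost of a slightly longer loading step.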

      【Comments】:
