A Collection of Python Interview Questions for Outsourcing Positions

1. List as many methods of Python's list type as you can, and give the results of the following list operations:
(1) a = [1, 2, 3, 4, 5]

a[::2]  ->  [1, 3, 5]   (every second element, starting from index 0)
a[-2:]  ->  [4, 5]      (the last two elements)
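Python's built-in list type has the methods append, extend, insert, remove, pop, clear, index, count, sort, reverse and copy. The short sketch below lists them with dir(), checks the two slices above, and exercises each method once on a copy of a (the sample values are only illustrative).

```python
a = [1, 2, 3, 4, 5]

# Slicing is a[start:stop:step]; omitted start/stop default to the ends, omitted step to 1
print(a[::2])   # [1, 3, 5]
print(a[-2:])   # [4, 5]

# Public (non-underscore) methods of the built-in list type
methods = [name for name in dir(list) if not name.startswith("_")]
print(methods)
# ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

# A quick tour on a copy, so a itself stays unchanged
b = a.copy()         # [1, 2, 3, 4, 5]
b.append(6)          # [1, 2, 3, 4, 5, 6]
b.extend([7, 8])     # [1, 2, 3, 4, 5, 6, 7, 8]
b.insert(0, 0)       # [0, 1, 2, 3, 4, 5, 6, 7, 8]
b.remove(8)          # removes the first 8 -> [0, 1, 2, 3, 4, 5, 6, 7]
last = b.pop()       # removes and returns the last item, 7
pos = b.index(3)     # 3, the position of the first 3
n_ones = b.count(1)  # 1
b.reverse()          # [6, 5, 4, 3, 2, 1, 0]
b.sort()             # [0, 1, 2, 3, 4, 5, 6]
b.clear()            # []
```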

(2) Write one line of code that adds 3 to each element of list a at an even position and sums the results.

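from functools import reduce

a = [1, 2, 3, 4, 5]

# Even indices 0, 2, 4 -> elements 1, 3, 5 -> plus 3 each -> 4 + 6 + 8
l2 = reduce(lambda x, y: x + y, map(lambda i: a[i] + 3, filter(lambda i: i % 2 == 0, range(len(a)))))
# Same selection, summed with the built-in sum() instead of reduce()
l3 = sum(map(lambda i: a[i] + 3, filter(lambda i: i % 2 == 0, range(len(a)))))
print(l2)  # 18
print(l3)  # 18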
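A more idiomatic one-liner, assuming (as the code above does) that "even positions" means the even 0-based indices 0, 2, 4: the slice a[::2] selects exactly those elements, so no index arithmetic is needed.

```python
a = [1, 2, 3, 4, 5]

# a[::2] picks indices 0, 2, 4; the generator expression adds 3 to each before summing
total = sum(x + 3 for x in a[::2])
print(total)  # 18

# Equivalent: sum the slice, then add 3 once per selected element
total2 = sum(a[::2]) + 3 * len(a[::2])
print(total2)  # 18
```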

(3) Shuffle the elements of list a, sort a to get list b, then build a dictionary d by pairing the elements of a and b in order.

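from random import shuffle

a = [1, 2, 3, 4, 5]

# Shuffle list a in place
shuffle(a)

# Sort a into a new list b (ascending); a keeps its shuffled order
b = sorted(a)

# zip iterates over a and b in parallel, pairing up elements by position;
# dict() then turns those (a[i], b[i]) pairs into a dictionary
d = dict(zip(a, b))

print(d)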
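Two practical details around random.shuffle, shown as a small sketch: it shuffles in place and returns None, and seeding the generator makes the "random" order repeatable between runs, which helps when testing code like the answer above. The seed value 42 is arbitrary.

```python
import random

a = [1, 2, 3, 4, 5]

# Pitfall: shuffle works in place and returns None, so never write a = random.shuffle(a)
print(random.shuffle(a))  # None

# Seeding first makes the shuffled order repeatable between runs
random.seed(42)
a = [1, 2, 3, 4, 5]
random.shuffle(a)
b = sorted(a)
d = dict(zip(a, b))  # shuffled element -> element at the same position in sorted order
print(a)
print(d)
```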

(4) Given List = [-2, 1, 3, -6], how do you sort its contents by absolute value, from smallest to largest?

sorted(List, key=abs)    # [1, -2, 3, -6]

(5) What is the difference between the list method sort and the built-in function sorted?

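sort is a method of list: it reorders the list object in place and returns None.
sorted is a Python built-in function: it accepts any iterable and returns a new list, leaving the order of the original iterable unchanged.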
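A quick demonstration of the difference, reusing the List from (4):

```python
List = [-2, 1, 3, -6]

# sorted(): builds and returns a new list; the original is untouched
by_abs = sorted(List, key=abs)
print(by_abs)  # [1, -2, 3, -6]
print(List)    # [-2, 1, 3, -6]

# list.sort(): reorders the list in place and returns None
result = List.sort(key=abs)
print(result)  # None
print(List)    # [1, -2, 3, -6]

# sorted() accepts any iterable, e.g. a string or a tuple; .sort() exists only on lists
print(sorted("cab"))      # ['a', 'b', 'c']
print(sorted((3, 1, 2)))  # [1, 2, 3]
```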

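2. In Python, count how often each word appears in an English article, return the 10 most frequent words with their counts, and answer the questions below. (Punctuation can be ignored.)
Method 1: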

# coding: utf-8
import re

with open("this.txt", "r", encoding="utf-8") as fd:
    word_list = []     # every word, lowercased, with , . ! stripped
    word_dict = {}     # {word: count}
    for line in fd.readlines():
        for word in line.strip().split():
            word_list.append(re.sub(r"[.!,]", "", word.lower()))
    word_sets = list(set(word_list))   # unique words only
    # "if word" skips empty strings left over from tokens that were pure punctuation
    word_dict = {word: word_list.count(word) for word in word_sets if word}
result = sorted(word_dict.items(), key=lambda d: d[1], reverse=True)[:10]
print(result)
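Note: the loop keeps every word in word_list and the unique words in word_sets, which serve as the keys of word_dict; the dictionary is then sorted by count and the first 10 entries are taken. It is neat, but word_list.count(word) rescans the whole list once per unique word, so the cost grows with (total words x unique words); a single pass that increments a count per word, or collections.Counter as in Method 2, runs in linear time.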
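For comparison, a single-pass variant of Method 1 that counts with a plain dict instead of calling list.count() for each unique word. It is a sketch: it takes the text as a string (the sample sentence is made up) so it runs without this.txt, and it picks out words with re.findall(r"\w+"), an assumption that differs slightly from the punctuation stripping above (it also splits on apostrophes and hyphens).

```python
import re


def top_words(text, n=10):
    """Count words in one pass with a plain dict, then return the n most frequent.

    Words are lowercased; any character that is not a letter, digit or
    underscore acts as a separator.
    """
    counts = {}
    for word in re.findall(r"\w+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    # Highest count first; sorted() is stable, so ties keep first-seen order
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


sample = "The cat sat. The dog sat! A cat, a dog, the end."
print(top_words(sample, 3))  # [('the', 3), ('cat', 2), ('sat', 2)]
```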

Method 2: using the collections module

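# coding: utf-8
import re
from collections import Counter

with open("this.txt", "r", encoding="utf-8") as fd:
    texts = fd.read()                         # read the whole file into one string
    # split on runs of non-word characters; lowercase so "The" and "the" match,
    # and drop the empty strings split() leaves at the start or end of the text
    count = Counter(w for w in re.split(r"\W+", texts.lower()) if w)

result = count.most_common(10)                # the 10 most common words
print(result)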

(1) Given a file object f, explain the difference between f.readlines() and f.xreadlines().

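readlines() reads the whole file at once and returns a list of lines; xreadlines() (Python 2 only) returns an iterator that yields the lines one at a time, so the file is never loaded into memory all at once. Both give each line with its trailing \n. xreadlines() was deprecated in Python 2.3 in favour of iterating over the file directly (for line in f), and Python 3 file objects no longer have it.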
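A small Python 3 sketch of the same contrast; io.StringIO stands in for a real file object f so that no file on disk is needed.

```python
import io

# io.StringIO behaves like a text file opened for reading
f = io.StringIO("first line\nsecond line\nthird line\n")

# readlines(): reads everything at once into a list; each line keeps its '\n'
lines = f.readlines()
print(lines)  # ['first line\n', 'second line\n', 'third line\n']

# Python 3 has no xreadlines(); iterating over the file object is the lazy equivalent
print(hasattr(f, "xreadlines"))  # False
f.seek(0)  # rewind, since readlines() consumed the stream
for line in f:  # yields one line at a time instead of building a list
    print(repr(line))
```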

(2) Additional requirement: a phrase inside quotation marks must be counted as a single word. How would you implement this?

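Idea: split the text on the double-quote character " to get a list. Counting from 0, the pieces at even indices lie outside quotes and are split into words as usual; the pieces at odd indices lie inside quotes and are left whole, each counting as one word.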
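A sketch of that idea. It assumes ASCII double quotes that are properly paired, and it lowercases everything, quoted phrases included; the function name count_with_quotes and the sample sentence are illustrative only.

```python
import re
from collections import Counter


def count_with_quotes(text):
    """Count words, treating each double-quoted phrase as one word.

    After text.split('"'), even indices are outside quotes and odd indices
    are inside quotes (assuming the quotes are balanced).
    """
    counts = Counter()
    for i, part in enumerate(text.split('"')):
        if i % 2 == 0:
            # Outside quotes: split into ordinary words, ignoring punctuation
            counts.update(re.findall(r"\w+", part.lower()))
        elif part.strip():
            # Inside quotes: the whole phrase counts as a single word
            counts[part.strip().lower()] += 1
    return counts


text = 'He said "hello world" and she said "hello world" too.'
print(count_with_quotes(text).most_common(2))  # [('said', 2), ('hello world', 2)]
```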

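3. Briefly describe the concept of the Python GIL and its effect on multithreading in Python. Write a multithreaded program that fetches web pages, state whether it can outperform a single-threaded version, and explain why.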

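The GIL has nothing whatsoever to do with the Python language itself; it belongs to the CPython interpreter (virtual machine), where, for historical reasons, it has proved hard to remove.
GIL stands for Global Interpreter Lock. A thread must acquire the GIL before it can run, which ensures that only one thread executes Python bytecode at any given moment.
A thread releases the GIL in these cases:
- Before a system call that may block, such as an I/O operation, it can temporarily release the GIL; once the call returns, it must reacquire the GIL before continuing.
- Python 3.x (since 3.2) uses a timer: when the running thread has held the GIL for the switch interval (5 ms by default), it is asked to release it. Python 2.x instead released it after a count of 100 ticks (roughly, bytecode instructions).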
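A small experiment, offered as a sketch whose timings depend on the machine: it prints the Python 3 switch interval, then runs the same CPU-bound loop twice sequentially and twice in two threads. On a standard CPython build with the GIL, the threaded run is typically no faster, because only one thread executes bytecode at a time; on the experimental free-threaded builds (Python 3.13+) the result may differ. The loop size N is arbitrary.

```python
import sys
import threading
import time


def count_down(n):
    """CPU-bound pure-Python loop: no I/O, so the thread gives up the GIL only when asked to."""
    while n > 0:
        n -= 1


def run_sequential(n):
    count_down(n)
    count_down(n)


def run_threaded(n):
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    # Python 3.2+: how long a thread may hold the GIL before it is asked to release it
    print("switch interval (s):", sys.getswitchinterval())  # 0.005 unless changed

    N = 2_000_000
    for label, runner in (("sequential", run_sequential), ("2 threads", run_threaded)):
        start = time.perf_counter()
        runner(N)
        print(f"{label}: {time.perf_counter() - start:.2f}s")
```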

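Multiple processes, on the other hand, can use several CPU cores, because each process runs its own interpreter with its own GIL.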

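So a multithreaded crawler can indeed outperform a single-threaded one. Fetching a remote page means sending a request and then waiting on network I/O until the content arrives; a thread blocked on that I/O releases the GIL automatically, so the other threads can use the waiting time to send their own requests or process results. The gain comes from overlapping the waits, not from running Python code in parallel.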
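A minimal sketch of a multithreaded fetcher using the standard library's concurrent.futures.ThreadPoolExecutor and urllib.request. The URLs are hypothetical placeholders, and so that the timing comparison runs offline, fake_fetch simulates network latency with time.sleep, which, like a blocking socket read, releases the GIL.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10):
    """Download one page and return (url, size in bytes)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return url, len(resp.read())


def fake_fetch(url):
    """Offline stand-in for fetch(): time.sleep blocks and releases the GIL, like a socket read."""
    time.sleep(0.2)
    return url, 0


def crawl(urls, fetcher=fetch, workers=8):
    """Run fetcher over all URLs in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetcher, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(8)]  # hypothetical URLs

    start = time.perf_counter()
    single = [fake_fetch(u) for u in urls]  # one request after another
    print(f"single thread: {time.perf_counter() - start:.2f}s")  # about 8 x 0.2 s

    start = time.perf_counter()
    multi = crawl(urls, fetcher=fake_fetch)  # the 8 waits overlap
    print(f"8 threads:     {time.perf_counter() - start:.2f}s")  # about 0.2 s
    assert single == multi
```

To fetch real pages instead of simulating them, call crawl(urls) with the default fetcher.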

4. Write a thread-safe singleton implementation in Python.

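import threading


class Foo(object):
    _instance = None
    _lock = threading.RLock()  # guards creation of the single instance

    def __new__(cls, *args, **kwargs):
        # Fast path: once the instance exists, return it without taking the lock
        if cls._instance is not None:
            return cls._instance
        with cls._lock:
            # Check again inside the lock: another thread may have created
            # the instance while this one was waiting (double-checked locking)
            if cls._instance is None:
                cls._instance = object.__new__(cls)
            return cls._instance


def task():
    obj = Foo()
    print(obj)  # every thread prints the same object address


threads = [threading.Thread(target=task) for i in range(10)]
for t in threads:
    t.start()

# Wait for the worker threads to finish instead of sleeping for a fixed 100 s
for t in threads:
    t.join()

obj = Foo()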
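
An alternative, sketched here as one possible version rather than a standard recipe, is a class decorator that applies the same double-checked locking to any class. Config is a hypothetical example class. One trade-off: after decoration the name Config refers to the factory function rather than the class, so isinstance(obj, Config) no longer works.

```python
import threading


def singleton(cls):
    """Class decorator: the first call creates the instance, later calls reuse it."""
    instance = None
    lock = threading.Lock()

    def get_instance(*args, **kwargs):
        nonlocal instance
        if instance is None:          # fast path without the lock
            with lock:
                if instance is None:  # re-check after acquiring the lock
                    instance = cls(*args, **kwargs)
        return instance

    return get_instance


@singleton
class Config:  # hypothetical example class
    def __init__(self):
        self.settings = {}


results = []
threads = [threading.Thread(target=lambda: results.append(Config())) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len({id(obj) for obj in results}))  # 1
```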
