[Title]: Saving dictionary of tweets into JSON file results in an empty dictionary
[Posted]: 2015-06-15 23:43:55
[Question]:

I am trying to collect some localized tweets and store them on my hard drive as a dictionary of tweets. On some iterations of the fetchsamples function the saved dictionary is forced into an empty state, even though data is added to the dictionary during the for loop (see the output below).

I have tried different encodings and passing the "w" and "wb" flags to my save call, but it did not help.

I tried to reproduce this with random strings (to make it easier for people to check my code), but I could not. I am not sure what in the tweet structure, or in my code, causes this behaviour.

Note: I added a code snippet to catch when the dictionary is forced into the empty state, for debugging.

import oauth2 as oauth
import urllib2 as urllib
import json
import pickle
import os

api_key = "Insert api_key here"
api_secret = "Insert api_secret here"
access_token_key = "Insert access_token_key"
access_token_secret = "Insert access_token_secret"

_debug = 0

oauth_token    = oauth.Token(key=access_token_key, secret=access_token_secret)
oauth_consumer = oauth.Consumer(key=api_key, secret=api_secret)

signature_method_hmac_sha1 = oauth.SignatureMethod_HMAC_SHA1()

http_method = "GET"

http_handler  = urllib.HTTPHandler(debuglevel=_debug)
https_handler = urllib.HTTPSHandler(debuglevel=_debug)

def twitterreq(url, method, parameters):
    req = oauth.Request.from_consumer_and_token(oauth_consumer,
                                                token=oauth_token,
                                                http_method=http_method,
                                                http_url=url, 
                                                parameters=parameters)

    req.sign_request(signature_method_hmac_sha1, oauth_consumer, oauth_token)
    headers = req.to_header()

    if http_method == "POST":
        encoded_post_data = req.to_postdata()
    else:
        encoded_post_data = None
        url = req.to_url()

    opener = urllib.OpenerDirector()
    opener.add_handler(http_handler)
    opener.add_handler(https_handler)

    response = opener.open(url, encoded_post_data)

    return response

def fetchsamples():

    url = "https://stream.twitter.com/1/statuses/sample.json"
    url = "https://stream.twitter.com/1/statuses/filter.json?locations=-0.489,51.28,0.236,51.686"
    parameters = []
    response = twitterreq(url, "GET", parameters)

    data = {}
    count = 1
    for line in response:        
        try:
            strip = json.loads(line.strip())
            if strip['coordinates'] != None:
                data[count] = strip

                count += 1

                if count % 10 == 0: 
                    print count, len(data.keys())

        except Exception as e:
            # Print error and store in a log file
            print e            
            with open("/Temp/Data/error.log","w") as log:
                log.write(str(e))

        # If 100 tweets have passed save the file
        if count % 100 == 0:
            print "Before saving: ", len(data.keys())
            fp =  open("/Temp/Data/"+str(count/100)+".json","w")
            json.dump(data,fp,encoding="latin-1")
            fp.close()

            # This code is for debug purposes, to catch when the
            # dictionary is forced into an empty state
            if os.path.getsize("/Temp/Data/"+str(count/100)+".json") < 10:
                print "After saving: ", len(data.keys())
                return data
            else:
                data = {}

data = fetchsamples()

This produces the following output with no errors. The data dictionary ends up empty.

100 99
Before saving:  99
110 10
120 20
130 30
140 40
150 50
160 60
170 70
180 80
190 90
200 100
Before saving:  100
Before saving:  0
After saving:  0

[Comments]:

    Tags: python json python-2.7 twitter dictionary


    [Answer 1]:

    The dictionary is empty because after every 100 iterations you either set data = {}, or the dictionary was already empty. If I understand correctly, you will need another dictionary that you never clear, and push the items into that one as well.

    import oauth2 as oauth
    import urllib2 as urllib
    import json
    import pickle
    import os
    
    api_key = "Insert api_key here"
    api_secret = "Insert api_secret here"
    access_token_key = "Insert access_token_key"
    access_token_secret = "Insert access_token_secret"
    
    _debug = 0
    
    oauth_token    = oauth.Token(key=access_token_key, secret=access_token_secret)
    oauth_consumer = oauth.Consumer(key=api_key, secret=api_secret)
    
    signature_method_hmac_sha1 = oauth.SignatureMethod_HMAC_SHA1()
    
    http_method = "GET"
    
    http_handler  = urllib.HTTPHandler(debuglevel=_debug)
    https_handler = urllib.HTTPSHandler(debuglevel=_debug)
    
    def twitterreq(url, method, parameters):
        req = oauth.Request.from_consumer_and_token(oauth_consumer,
                                                    token=oauth_token,
                                                    http_method=http_method,
                                                    http_url=url, 
                                                    parameters=parameters)
    
        req.sign_request(signature_method_hmac_sha1, oauth_consumer, oauth_token)
        headers = req.to_header()
    
        if http_method == "POST":
            encoded_post_data = req.to_postdata()
        else:
            encoded_post_data = None
            url = req.to_url()
    
        opener = urllib.OpenerDirector()
        opener.add_handler(http_handler)
        opener.add_handler(https_handler)
    
        response = opener.open(url, encoded_post_data)
    
        return response
    
    def fetchsamples():
    
        url = "https://stream.twitter.com/1/statuses/sample.json"
        url = "https://stream.twitter.com/1/statuses/filter.json?locations=-0.489,51.28,0.236,51.686"
        parameters = []
        response = twitterreq(url, "GET", parameters)
    
        data = {}
        allData = {}
        count = 1
        for line in response:        
            try:
                strip = json.loads(line.strip())
                if strip['coordinates'] != None:
                    data[count] = strip
                    allData[count] = strip
    
                    count += 1
    
                    if count % 10 == 0: 
                        print count, len(data.keys())
    
            except Exception as e:
                # Print error and store in a log file
                print e            
                with open("/Temp/Data/error.log","w") as log:
                    log.write(str(e))
    
            # If 100 tweets have passed save the file
            if count % 100 == 0:
                print "Before saving: ", len(data.keys())
                fp =  open("/Temp/Data/"+str(count/100)+".json","w")
                json.dump(data,fp,encoding="latin-1")
                fp.close()
    
                # Return data if the file is empty and stop
                if os.path.getsize("/Temp/Data/"+str(count/100)+".json") < 10:
                    print "After saving: ", len(data.keys())
                    return allData
                else:
                    data = {}
    
    data = fetchsamples()
    

    [Comments]:

    • The reason I want to empty the dictionary is memory management. So I need to find a way to empty the dictionary (after I have saved its contents to disk). Your solution won't work, because I would quickly run out of memory.
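A bounded-memory alternative (a sketch with hypothetical names, not taken from either answer) is to drive saving off the batch size itself rather than a running tweet counter, so the dictionary can be cleared after every save and the save condition can never re-fire on a stale count:

```python
def stream_to_batches(tweets, batch_size=100):
    """Yield (file_index, batch_dict) pairs; memory stays bounded."""
    data = {}
    file_index = 0
    for i, tweet in enumerate(tweets, start=1):
        data[i] = tweet
        if len(data) == batch_size:   # fires exactly once per full batch
            file_index += 1
            yield file_index, data
            data = {}                  # safe: the condition uses len(data)
    if data:                           # flush the final partial batch
        file_index += 1
        yield file_index, data
```

The caller would `json.dump` each yielded batch to `str(file_index) + ".json"`; `stream_to_batches` and `file_index` are illustrative names, not part of the original code.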
    [Answer 2]:

    The problem is the way I increment the count value. Because count is only incremented when strip["coordinates"] != None, if I receive a tweet where strip["coordinates"] == None the count does not increase, but data = {} has already run and count % 100 == 0 still gives True, which means the original non-empty file gets replaced with an empty one.
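    The re-trigger can be reproduced without the Twitter stream at all; a minimal sketch (with a fake stream standing in for the HTTP response) of the failure mode described above:

```python
# Fake stream: 99 tweets with coordinates, then 3 without. Because
# count only advances on tweets with coordinates, count stays at 100
# after the first save, so the save block fires again on every
# coordinate-less tweet and "saves" the now-empty dict.
saves = []
data = {}
count = 1
stream = [{"coordinates": [0, 0]}] * 99 + [{"coordinates": None}] * 3
for tweet in stream:
    if tweet["coordinates"] is not None:
        data[count] = tweet
        count += 1
    if count % 100 == 0:
        saves.append(len(data))  # stands in for json.dump(...)
        data = {}
print(saves)  # -> [99, 0, 0, 0]
```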

    The solution is to increment count after saving, like this:

        if count % 100 == 0:
            print "Before saving: ", len(data.keys())
            fp =  open("/Temp/Data/"+str(count/100)+".json","w")
            json.dump(data,fp,encoding="latin-1")
            fp.close()
    
            count += 1
    
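    An equivalent guard (my suggestion, not part of the original answers) is to skip the save entirely while the batch is empty, so a run of coordinate-less tweets can never overwrite a file with "{}":

```python
import json

def save_batch(data, count, path_prefix="/Temp/Data/"):
    """Write the current batch to disk; return True if a file was written.

    save_batch and path_prefix are hypothetical names for illustration;
    the caller clears data only when this returns True.
    """
    if data and count % 100 == 0:   # never save an empty batch
        with open(path_prefix + str(count // 100) + ".json", "w") as fp:
            json.dump(data, fp)
        return True
    return False
```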

    [Comments]:
