【发布时间】:2021-11-10 17:21:40
【问题描述】:
我正在使用 pymysql 来查询 mysql 数据库中的 mysql 表。 mysql 数据库位于远程 ubuntu 服务器上。代码运行良好,但最近出现错误:
(1114, "The table '/tmp/#sql1422a_19_46' is full")
我检查了 ubuntu 服务器上的磁盘空间(如下所示)。我的三个目录上有足够的可用存储空间。似乎其他一些目录可能已填满。我有每天运行的 cron 作业并更新 mysql 数据库中的其他表。他们还在更新就好了。有谁看到问题可能是什么并建议如何解决它?桌子不应该那么大。
代码:df -h
输出:
Filesystem Size Used Avail Use% Mounted on
udev 16G 0 16G 0% /dev
tmpfs 3.2G 1.8M 3.2G 1% /run
/dev/nvme0n1p2 228G 215G 1.2G 100% /
tmpfs 16G 12K 16G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/nvme0n1p1 511M 6.7M 505M 2% /boot/efi
/dev/sda1 458G 73M 435G 1% /mnt/data/storage3_500gb
/dev/sdb1 916G 77M 870G 1% /mnt/data/storage1_1tb
/dev/nvme1n1 916G 77M 870G 1% /mnt/data/nvme0n1
/dev/sdc1 916G 77M 870G 1% /mnt/data/storage2_1tb
/dev/sdd1 916G 109G 761G 13% /mnt/data/sda
tmpfs 3.2G 0 3.2G 0% /run/user/1000
/dev/loop1 100M 100M 0 100% /snap/core/11420
/dev/loop0 100M 100M 0 100% /snap/core/11606
代码:
import pandas as pd
import numpy as np
import os
import re, dateutil.parser
#BeautifulSoup provide a model for the source HTML
# from bs4 import BeautifulSoup
import time
import decimal as dc
from bs4 import BeautifulSoup as bs
import time
import json
import urllib.parse
import requests
import datetime
import logging
import pymysql
import glob
from sqlalchemy import create_engine
# function to query mysql db and return dataframe of results
def mysql_query(user,password,database,host,query):
connection = pymysql.connect(user=user, password=password, database=database, host=host)
try:
with connection.cursor() as cursor:
query = query
df = pd.read_sql(query, connection)
logging.info('query succeeded: '+query)
# finally:
connection.close()
logging.info('close connection mysql')
return df
except Exception as err:
logger.error('query failed: '+query+' got error: '+str(err))
pass
# creating zillow_latest
zillow_latest_query="""with zillow_latest as
(
select
distinct
zpid,
last_updated,
first_value(providerListingId) over (partition by zpid order by last_updated desc) as providerListingId,
first_value(imgSrc) over (partition by zpid order by last_updated desc) as imgSrc,
first_value(hasImage) over (partition by zpid order by last_updated desc) as hasImage,
first_value(detailUrl) over (partition by zpid order by last_updated desc) as detailUrl,
first_value(statusType) over (partition by zpid order by last_updated desc) as statusType,
first_value(statusText) over (partition by zpid order by last_updated desc) as statusText,
first_value(countryCurrency) over (partition by zpid order by last_updated desc) as countryCurrency,
first_value(price) over (partition by zpid order by last_updated desc) as price,
first_value(unformattedPrice) over (partition by zpid order by last_updated desc) as unformattedPrice,
first_value(address) over (partition by zpid order by last_updated desc) as address,
first_value(addressStreet) over (partition by zpid order by last_updated desc) as addressStreet,
first_value(addressCity) over (partition by zpid order by last_updated desc) as addressCity,
first_value(addressState) over (partition by zpid order by last_updated desc) as addressState,
first_value(addressZipcode) over (partition by zpid order by last_updated desc) as addressZipcode,
first_value(isUndisclosedAddress) over (partition by zpid order by last_updated desc) as isUndisclosedAddress,
first_value(beds) over (partition by zpid order by last_updated desc) as beds,
first_value(baths) over (partition by zpid order by last_updated desc) as baths,
first_value(area) over (partition by zpid order by last_updated desc) as area,
first_value(latLong) over (partition by zpid order by last_updated desc) as latLong,
first_value(isZillowOwned) over (partition by zpid order by last_updated desc) as isZillowOwned,
first_value(variableData) over (partition by zpid order by last_updated desc) as variableData,
first_value(badgeInfo) over (partition by zpid order by last_updated desc) as badgeInfo,
first_value(hdpData) over (partition by zpid order by last_updated desc) as hdpData,
first_value(isSaved) over (partition by zpid order by last_updated desc) as isSaved,
first_value(isUserClaimingOwner) over (partition by zpid order by last_updated desc) as isUserClaimingOwner,
first_value(isUserConfirmedClaim) over (partition by zpid order by last_updated desc) as isUserConfirmedClaim,
first_value(pgapt) over (partition by zpid order by last_updated desc) as pgapt,
first_value(sgapt) over (partition by zpid order by last_updated desc) as sgapt,
first_value(zestimate) over (partition by zpid order by last_updated desc) as zestimate,
first_value(shouldShowZestimateAsPrice) over (partition by zpid order by last_updated desc) as shouldShowZestimateAsPrice,
first_value(has3DModel) over (partition by zpid order by last_updated desc) as has3DModel,
first_value(hasVideo) over (partition by zpid order by last_updated desc) as hasVideo,
first_value(isHomeRec) over (partition by zpid order by last_updated desc) as isHomeRec,
first_value(hasAdditionalAttributions) over (partition by zpid order by last_updated desc) as hasAdditionalAttributions,
first_value(isFeaturedListing) over (partition by zpid order by last_updated desc) as isFeaturedListing,
first_value(list) over (partition by zpid order by last_updated desc) as list,
first_value(relaxed) over (partition by zpid order by last_updated desc) as relaxed,
first_value(hasOpenHouse) over (partition by zpid order by last_updated desc) as hasOpenHouse,
first_value(openHouseStartDate) over (partition by zpid order by last_updated desc) as openHouseStartDate,
first_value(openHouseEndDate) over (partition by zpid order by last_updated desc) as openHouseEndDate,
first_value(openHouseDescription) over (partition by zpid order by last_updated desc) as openHouseDescription,
first_value(builderName) over (partition by zpid order by last_updated desc) as builderName,
first_value(info3String) over (partition by zpid order by last_updated desc) as info3String,
first_value(brokerName) over (partition by zpid order by last_updated desc) as brokerName,
first_value(lotAreaString) over (partition by zpid order by last_updated desc) as lotAreaString,
first_value(streetViewMetadataURL) over (partition by zpid order by last_updated desc) as streetViewMetadataURL,
first_value(streetViewURL) over (partition by zpid order by last_updated desc) as streetViewURL,
first_value(info2String) over (partition by zpid order by last_updated desc) as info2String,
first_value(info6String) over (partition by zpid order by last_updated desc) as info6String
from realestate.zillow
),
distinct_values as(
select
distinct
zpid,
providerListingId,
imgSrc,
hasImage,
detailUrl,
statusType,
statusText,
countryCurrency,
price,
unformattedPrice,
address,
addressStreet,
addressCity,
addressState,
addressZipcode,
isUndisclosedAddress,
beds,
baths,
area,
latLong,
isZillowOwned,
variableData,
badgeInfo,
hdpData,
isSaved,
isUserClaimingOwner,
isUserConfirmedClaim,
pgapt,
sgapt,
zestimate,
shouldShowZestimateAsPrice,
has3DModel,
hasVideo,
isHomeRec,
hasAdditionalAttributions,
isFeaturedListing,
list,
relaxed,
hasOpenHouse,
openHouseStartDate,
openHouseEndDate,
openHouseDescription,
builderName,
info3String,
brokerName,
lotAreaString,
streetViewMetadataURL,
streetViewURL,
info2String,
info6String
from zillow_latest
)
select * from distinct_values"""
zillow_latest_df=mysql_query(user='username',
password='xxx',
database='realestate',
host='xxxxxxx',
query=zillow_latest_query)
错误:
(1114, "The table '/tmp/#sql1422a_19_46' is full")
【问题讨论】:
-
我看到一些东西表明临时表有 16MB 的限制。这不是官方的 MySQL 页面,所以我没有链接它。您可以尝试在此答案中运行查询以明确确定限制:stackoverflow.com/a/36832492/42346
-
@mechanical_meat 临时表 in RAM 受限于
tmp_table_size或max_heap_table_size,以较小者为准。 MySQL 8.0 中两者的默认值都是 16MB。但是临时表可以比这大得多,它们只需要持久化到磁盘。 -
@BillKarwin:非常感谢您的澄清和回答。
标签: mysql python-3.x pymysql ubuntu-server