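【Posted】: 2015-02-20 05:58:46
【Question】:
I am having trouble reading a 3 MB .xlsx file, and the same happens with a 7 MB .xls file. Is there a size limit when reading a file?
My Excel file has 30,000 rows and 36 columns. Is there any solution that would let me read 100,000 records or more?
In my project I have to import 1 million records, but my code cannot handle more than 29,000 records. Up to 29,000 records, my code works locally.
Reading 29,000 records also takes far too long, roughly 25 minutes.
Can anyone explain why this happens and what I should do to fix it?
Here is my code: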
<?php
error_reporting(E_ALL);
set_time_limit(0);
ini_set("memory_limit","-1");
date_default_timezone_set('Europe/London');
define('EOL',(PHP_SAPI == 'cli') ? PHP_EOL : '<br />');
/** Set Include path to point at the PHPExcel Classes folder **/
set_include_path(get_include_path() . PATH_SEPARATOR . 'Classes/');
/** Include PHPExcel_IOFactory **/
include 'Classes/PHPExcel/IOFactory.php';
$inputFileName = 'files/30000rows.xls';
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
/** Define a Read Filter class implementing PHPExcel_Reader_IReadFilter */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow = 0;
    /** Set the list of rows that we want to read */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow = $startRow;
        $this->_endRow = $startRow + $chunkSize;
    }
    public function readCell($column, $row, $worksheetName = '')
    {
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow))
        {
            return true;
        }
        return false;
    }
}
echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory with a defined reader type of ',$inputFileType,'<br />';
/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
echo '<hr />';
/** Define how many rows we want to read for each "chunk" **/
$chunkSize = 1000;
//total rows in excel
$spreadsheetInfo = $objReader->listWorksheetInfo($inputFileName);
$totalRows = $spreadsheetInfo[0]['totalRows'];
/** Create a new Instance of our Read Filter **/
$chunkFilter = new chunkReadFilter();
/** Tell the Reader that we want to use the Read Filter that we've Instantiated **/
$objReader->setReadFilter($chunkFilter);
$objReader->setReadDataOnly(true);
/** Loop to read our worksheet in "chunk size" blocks **/
for ($startRow = 2; $startRow <= $totalRows; $startRow += $chunkSize) {
    echo "in for loop<br>";
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startRow,' to ',($startRow+$chunkSize-1),'<br />';
    /** Tell the Read Filter, the limits on which rows we want to read this iteration **/
    $chunkFilter->setRows($startRow,$chunkSize);
    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
    $cacheSettings = array('memoryCacheSize' => '1000MB');
    PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_in_memory_serialized;
    PHPExcel_Settings::setCacheStorageMethod($cacheMethod);
    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_in_memory_gzip;
    if (!PHPExcel_Settings::setCacheStorageMethod($cacheMethod)) {
        die($cacheMethod . " caching method is not available" . EOL);
    }
    echo date('H:i:s') , " Enable Cell Caching using " , $cacheMethod , " method" , EOL;
    /** Load only the rows that match our filter from $inputFileName to a PHPExcel Object **/
    $objPHPExcel = $objReader->load($inputFileName);
    $objWorksheet = $objPHPExcel->getActiveSheet();
    $highestColumn = $objWorksheet->getHighestColumn();
    $sheetData = $objWorksheet->rangeToArray('A'.$startRow.':'.$highestColumn.($startRow + $chunkSize-1),null, false, false, true);
    echo '<pre>';
    print_r($sheetData);
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
    echo '<br /><br />';
}
?>
【Discussion】:
-
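Are there any errors? Try not to print so much output while processing the data, and optimize the code where you can. Have you tried doing the same thing from the CLI?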
-
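According to this discussion, the only PHPExcel limit is roughly "65,536 rows and 256 (IV) columns", which suggests that you are hitting a memory or timeout limit instead. You should probably check whether you are reaching your machine's memory limit, or run the script from the CLI as @Justinas suggested.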
-
Also, printing that much data inside the for loop will definitely slow you down. Try commenting out the debug output (or log it to a file, buffered).
-
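Of course, using $objWorksheet->rangeToArray() will also consume a lot of memory, because you are loading a large amount of data into one big PHP array; if you can, it is better to process the data row by row.
-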
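Which cache method are you actually using? You set three of them, the last one being gzip in memory. Have you tried checking which method works best for your PHP version and configuration?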
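
Putting the suggestions from the comments together, a minimal sketch might look like the one below. It assumes PHPExcel 1.8 under Classes/ as in the question. The handleRow() function and the import.log path are hypothetical placeholders, and cache_to_phpTemp with a 32MB memory cache is only one of the methods the question tried, so it is worth measuring which one suits your PHP setup.

```php
<?php
// Sketch only: assumes PHPExcel 1.8 is installed under Classes/ as in the question.
require_once 'Classes/PHPExcel/IOFactory.php';

/** Read filter that keeps the heading row plus one chunk of rows. */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow = 0;

    public function setRows($startRow, $chunkSize)
    {
        $this->_startRow = $startRow;
        $this->_endRow = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        return $row == 1 || ($row >= $this->_startRow && $row < $this->_endRow);
    }
}

/** Hypothetical placeholder: replace with the real import logic. */
function handleRow($rowNumber, array $values)
{
    // e.g. bind $values to a prepared INSERT statement
}

$inputFileName = 'files/30000rows.xls';
$logFile = 'import.log'; // hypothetical log path
$chunkSize = 1000;

// Choose ONE cache method, once, before any workbook is loaded.
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '32MB');
if (!PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings)) {
    die($cacheMethod . ' caching method is not available' . PHP_EOL);
}

$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetInfo = $objReader->listWorksheetInfo($inputFileName);
$totalRows = $worksheetInfo[0]['totalRows'];

$chunkFilter = new chunkReadFilter();
$objReader->setReadFilter($chunkFilter);
$objReader->setReadDataOnly(true);

for ($startRow = 2; $startRow <= $totalRows; $startRow += $chunkSize) {
    $chunkFilter->setRows($startRow, $chunkSize);
    $objPHPExcel = $objReader->load($inputFileName);
    $objWorksheet = $objPHPExcel->getActiveSheet();
    $highestColumn = $objWorksheet->getHighestColumn();
    $lastRow = min($startRow + $chunkSize - 1, $totalRows);

    for ($row = $startRow; $row <= $lastRow; $row++) {
        // Convert one row at a time, so only a single row is held as a PHP array.
        $range = 'A' . $row . ':' . $highestColumn . $row;
        $rowData = $objWorksheet->rangeToArray($range, null, false, false, false);
        handleRow($row, $rowData[0]);
    }

    // Log progress to a file instead of echoing it to the browser.
    $message = date('H:i:s') . " rows $startRow-$lastRow done" . PHP_EOL;
    file_put_contents($logFile, $message, FILE_APPEND);

    // Free the workbook before the next chunk is loaded.
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
}
```

Running this from the command line rather than through the web server, as suggested above, also sidesteps browser and web server timeouts. Note that each chunk still calls load() on the same file, so this sketch addresses the memory and output overhead raised in the comments, not the cost of reopening the file.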