【Question title】: Compare the data of two CSV files in PHP
【Posted】: 2016-10-09 16:55:24
【Question description】:

This is my code so far. I am stuck on how to look up and compare the data in payment.csv against the data in transactions.csv.

<?php
        //this shows the name, email, deposit date and amount from payment.csv

          $data = file("data/payment.csv");
          foreach ($data as $deposit){
            $depositarray = explode(",", $deposit);
            $depositlist = $depositarray;
            $name = $depositlist[0];
            $email = $depositlist[1];
            $depositdate = $depositlist[9]; 
            $depositamount = $depositlist[10];

            //echo $depositamount;
        }

        //this shows the payment date and amount from transaction.csv

        $databank = file("datas/transactions.csv");
          foreach ($databank as $payment){
            $paymentarray = explode(",", $payment);
            $paymentlist = $paymentarray;

            $paymentdate = $paymentlist[0];
            $paymentamount =  $paymentlist[5];

            //echo $paymentamount;

        }
?>

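Example:

Starting from payment.csv, each ($depositdate && $depositamount) pair is compared with transactions.csv ($paymentdate && $paymentamount).

Every row that matches is saved to an array and then displayed in a table. Rows that do not match are saved to a separate array and displayed later.

Can anyone help me or give me an idea of how to accomplish this? I just want to display all rows with matching data.

Or like this:

This should be the output when a transaction has multiple matches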

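【Comments】:

  • Don't you have a key or ID that links these data? What if two transactions have the same date and amount? But to your question: I guess one way would be to put your second foreach inside your first foreach. That could be a time/resource-intensive approach, but you could compare every row with every row. Or you map both arrays.
  • @Jurik, yes, I don't have any key/ID. Would you mind giving me a reference? Yes, that is what I am trying to accomplish. If two transactions have the same date and amount, display all of them. Please check the picture I attached. Thanks.
  • The main file is payment.csv. Just look through each row of transactions.csv to see if it matches. Sorry if that was not clear.
  • Once they are stored in separate arrays I would add a key that I can check easily. I would use an MD5 hash of the deposit date and amount. So, pass over the master file generating the keys and build another array keyed by the hash ($masterHash). Now pass over the transaction file generating the MD5 hash and see whether it exists in $masterHash. If it does, you have a match. This will be quick, as it is just two sequential passes plus the lookup overhead.
  • Working code: eval.in/586614 - will post an answer shortly - off to get a coffee :)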
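
For reference, the nested-loop idea from the first comment could look roughly like the sketch below. The function name `matchByNestedLoops`, the array keys `date` and `amount`, and the sample rows are all assumptions made for illustration; they are not taken from the asker's files. Every payment is compared with every transaction, so the cost grows with the product of the two row counts.

```php
<?php
/**
 * Hypothetical helper: compare every payment with every transaction.
 * Returns ['matched' => [...], 'unmatched' => [...]].
 */
function matchByNestedLoops(array $payments, array $transactions): array
{
    $result = ['matched' => [], 'unmatched' => []];

    foreach ($payments as $payment) {
        $hits = []; // indexes of all transactions with the same date and amount

        foreach ($transactions as $index => $transaction) {
            if ($payment['date'] === $transaction['date']
                && $payment['amount'] === $transaction['amount']) {
                $hits[] = $index;
            }
        }

        if ($hits) {
            $result['matched'][] = ['payment' => $payment, 'transactions' => $hits];
        } else {
            $result['unmatched'][] = $payment;
        }
    }

    return $result;
}

// Made-up sample rows
$payments = [
    ['name' => 'A', 'date' => '2016-04-01', 'amount' => '12345'],
    ['name' => 'B', 'date' => '2016-05-02', 'amount' => '500'],
];
$transactions = [
    ['date' => '2016-04-01', 'amount' => '12345'],
    ['date' => '2016-04-01', 'amount' => '12345'],
    ['date' => '2016-03-04', 'amount' => '12000'],
];

$result = matchByNestedLoops($payments, $transactions);
print_r($result);
```

This keeps every matching transaction index per payment, which is what the asker wants to display, but it is the slow baseline that the hash-based suggestion avoids.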

Tags: php arrays loops csv


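【Solution 1】:

Requirement: find and record the transaction records that match the corresponding master records.

A list of all the transaction records that match each master record must be kept.

The comparison is based on 'date' and amount. (Updated to easily allow any values from the current row, as you pass an array of 'column names' to use.)

The issue is that this can become very expensive if the arrays are not sorted by the keys you want to compare.

One approach is to generate, for each set of 'data key fields', a 'key' that is unique, easy to generate, and of fixed size, so that it is cheap to compare.

Then use these 'keys' to build a 'generated key' lookup array for the original records.

This avoids having to supply data sorted on the fields to be compared. However, the generated lookup array must fit in memory.

I decided to use an MD5 hash of the concatenated data keys. For this application the chance of a collision is not a concern. MD5 is very good at generating unique hashes. It is also fast.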
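
As a minimal sketch of the key-and-lookup idea described above: the helper name `makeKey`, the `|` separator, and the sample records are assumptions for illustration. The class that follows concatenates the fields directly; the separator here is an extra precaution so that field pairs such as ('1', '23') and ('12', '3') do not produce the same text before hashing.

```php
<?php
/**
 * Hypothetical helper: build a fixed-size (32 hex characters) key
 * from the chosen fields of one record.
 */
function makeKey(array $row, array $keyNames): string
{
    $parts = [];
    foreach ($keyNames as $name) {
        $parts[] = trim((string) $row[$name]);
    }
    // Join with a separator so adjacent fields cannot run into each other.
    return md5(implode('|', $parts));
}

// Made-up master records
$master = [
    ['whenDone' => '2016-04-01', 'amount' => 12345],
    ['whenDone' => '2016-05-02', 'amount' => 12345],
];
$keyNames = ['whenDone', 'amount'];

// One pass over the master: key => list of master record ids
$lookup = [];
foreach ($master as $id => $row) {
    $lookup[makeKey($row, $keyNames)][] = $id;
}

// Looking up a transaction is then a single array access
$probe = ['whenDone' => '2016-04-01', 'amount' => 12345];
$hits  = $lookup[makeKey($probe, $keyNames)] ?? [];

print_r($hits);
```

Because the lookup array is keyed by the hash, neither input needs to be sorted, at the cost of holding one entry per distinct key in memory.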

Working code at eval.in

Source Code

The class that does the work:

// ---------------------------------------------------------------------------------
class HashMatch {

   /*
    * Generate an MD5 hash for each master and transaction using some
    * of the data fields as the string to be hashed.
    */ 

    /**
    * Master source records
    * 
    * @var array 
    */
    private $master = null;

    /**
    * Transaction Source records must have the same field names as the master
    * for the indexes that are used to generate the MD5 hash
    * 
    * @var array 
    */
    private $transaction  = null;

    /**
    * The generated MD5 hash is the key in the Master source records.
    * 
    * Each record has a list of other Master Record Ids that also have the same hash 
    * 
    * @var array
    */
    private $hashMaster = array();

    /**
    * The generated MD5 hash is the key in the Transaction source records.
    * 
    * Each record has a list of other Transaction Record Ids that also have the same hash 
    * 
    * @var array
    */
    private $hashTransaction = array();

    /**
    * Specify which index names to use from the supplied data record arrays
    * to generate the MD5 hash with.
    * 
    * @var array 
    */
    private $keyNames = array();

    /**
    * Generate a MD5 hash for each master and transaction using some
    * of the data fields as the string to be hashed.
    * 
    * You can pass an array of field names to use to generate the key.
    * 
    * This allows any records to be used in this class as you just provide
    * the list of names to generate the MD5 hash
    *  
    * 
    * @param array $master
    * @param array $transaction
    * @param array $keyNames
    * 
    * @return void
    */    
    public function __construct(array $master, 
                                array $transaction, 
                                array $keyNames = array('when', 'amount')) 
    {
        $this->master = $master;
        $this->transaction  = $transaction;
        $this->keyNames = $keyNames; 
    } 

    /**
    * Generate all the Hashes and store all the matching details
    * 
    * @return bool
    */    
    public function generateMatches()
    {
        $this->processMaster();
        $this->processTransaction();
        return !empty($this->hashMaster) && !empty($this->hashTransaction);
    }

    /**
    * Generate a list of MD5 hashes as a key  
    * 
    * Keep a list of other master records with the same hash 
    *  
    * @return void
    */    
    public function processMaster()
    {
        foreach ($this->master as $recordId => $data) {

            $hash = $this->generateHash($data);
            if (empty($this->hashMaster[$hash])) { // add it...
                $this->hashMaster[$hash]['masterId'] = $recordId;
                $this->hashMaster[$hash]['matchIds'] = array($recordId);
            }            
            else { // is a duplicate so add to the match list
                $this->hashMaster[$hash]['matchIds'][] = $recordId;
            }
        }
    }

    /**
    * Generate a list of MD5 hashes as a key for the Transaction source  
    *   
    * Match the hashes against the master list and record if there is a match
    * 
    * @return void
    */
    public function processTransaction()
    {        
        foreach ($this->transaction as $recordId => $data) {
            $hash = $this->generateHash($data);
            if (empty($this->hashMaster[$hash])) { // skip this record
               continue;
            }

            // record a match with the master
            if (empty($this->hashTransaction[$hash])) { // new record
                $this->hashTransaction[$hash]['masterId'] = $this->hashMaster[$hash]['masterId'];
                $this->hashTransaction[$hash]['matchIds']  = array();
            }

            // add to the list of matches
            $this->hashTransaction[$hash]['matchIds'][] = $recordId;
        }
    }

    /**
    * Return Master MD5 list 
    * 
    * The keys are unique, however there are extra values:
    *   
    *   'masterId'  ==> The first record in the array with this key
    * 
    *   'matchIds'  ==> A *complete* list of all the master records that have this key.
    *                   Yes, it includes itself, this allows you to just use this list
    *                   when reporting.
    * 
    * @return array
    */
    public function getHashMasterList()
    {
        return $this->hashMaster;
    }

    /**
    * Return Master MD5 list with more than one matching master
    * 
    * i.e. duplicate master records with the same hash
    * 
    * @return array
    */
    public function getHashMatchedMasterList()
    {
        $out = array();
        foreach ($this->hashMaster as $key => $item) {
            if (count($item['matchIds']) >= 2) {
                $out[$key] = $item; 
            }
        }
        return $out;
    }

    /**
    * All the transactions  that matched a master record
    * 
    * @return array
    */
    public function getHashTransactionList()
    {
        return $this->hashTransaction;
    }

    /**
    * given a master hash then return the details as:
    * 
    * i.e. this converts a hash key back into source records for processing.
    * 
    * 1) A list of matching master records 
    * 
    *    e.g. $out['master'][] ...  
    *    
    * 
    * 2) A list of matching transaction records 
    * 
    *    e.g. $out['transaction'][] ...   
    * 
    * @param string $hash
    * 
    * @return array
    */
    public function getMatchedRecords($hash)
    {
        $out = array('key'         => $hash,
                      'master'      => array(),
                      'transaction' => array(),
                     );

        if (!empty($this->hashMaster[$hash])) { // just in case is invalid hash
            foreach ($this->hashMaster[$hash]['matchIds'] as $recordId) {
                $out['master'][] = $this->master[$recordId];
            }
        }

        if (!empty($this->hashTransaction[$hash])) {
            foreach ($this->hashTransaction[$hash]['matchIds'] as $recordId) {
                $out['transaction'][] = $this->transaction[$recordId];
            }
        }

        return $out;
    }

    /**
    * Generate an MD5 hash from the required fields in the data record 
    * The columns to use will have been passed in the constructor
    * and found in '$keyNames'
    * 
    * It is so you don't have to edit anything to use this class
    * 
    * @param  array  $row
    * 
    * @return string
    */
    public function generateHash($row) 
    {
        $text = '';
        foreach ($this->keyNames as $name) {
            $text .= $row[$name];
        } 
        return md5($text);
    }   
}

Explanation...

later....

Code to run it:

// !!!! You can pass the names of the fields to be used to generate the key 
$match = new HashMatch($master, 
                       $transaction, 
                       array('whenDone', 'amount'));
$match->generateMatches();


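// print output...
echo '<pre>Hash Master Records with multiple Matching Masters ... ', PHP_EOL;
    print_r($match->getHashMatchedMasterList());
echo '</pre>';

// for each hash that has matching transactions, print the master and transaction records
echo '<pre>Matching Master to Transaction... ', PHP_EOL;
foreach (array_keys($match->getHashTransactionList()) as $hash) {
    print_r($match->getMatchedRecords($hash));
}
echo '</pre>';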

Output:

Matching Master to Transaction... 
Array
(
    [key] => 296099e19b77aad413600a1e2f2cb3cd
    [master] => Array
        (
            [0] => Array
                (
                    [name] => John Matched
                    [whenDone] => 2016-04-01
                    [amount] => 12345
                    [email] => johnMatched@y.com
                )

            [1] => Array
                (
                    [name] => Jane Matched
                    [whenDone] => 2016-04-01
                    [amount] => 12345
                    [email] => janeMatched@y.com
                )

        )

    [transaction] => Array
        (
            [0] => Array
                (
                    [name] => John Doe
                    [whenDone] => 2016-04-01
                    [amount] => 12345
                    [email] => johndoe@y.com
                )

            [1] => Array
                (
                    [name] => micky mean
                    [whenDone] => 2016-04-01
                    [amount] => 12345
                    [email] => mickym@y.com
                )
        )
)

Test data

$master[]      = array('name' => 'First last',     'whenDone' => '2016-03-03', 'amount' => 12000,  'email' => 'sample@y.com', );
$master[]      = array('name' => 'John Matched',   'whenDone' => '2016-04-01', 'amount' => 12345,  'email' => 'johnMatched@y.com');
$master[]      = array('name' => 'Jane Unmatched', 'whenDone' => '2016-05-02', 'amount' => 12345,  'email' => 'janeUnmatched@y.com');
$master[]      = array('name' => 'Jane Matched',   'whenDone' => '2016-04-01', 'amount' => 12345,  'email' => 'janeMatched@y.com');

$transaction[] = array('name' => 'Mary Lamb',      'whenDone' => '2016-03-04', 'amount' => 12000,  'email'  => 'maryl@y.com');
$transaction[] = array('name' => 'John Doe',       'whenDone' => '2016-04-01', 'amount' => 12345,  'email' => 'johndoe@y.com');
$transaction[] = array('name' => 'micky mean',     'whenDone' => '2016-04-01', 'amount' => 12345,  'email'  => 'mickym@y.com');

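【Comments】:

  • I have only just run into a case where some rows with the same key are not shown. The problem is the key (date and amount can repeat). If 2 or more users have the same date and amount, only the first user is displayed; the others are not, because the key already exists. Could you help me work out how to display the other data/users? Thanks
  • It does maintain a similar list for matching master records. :) I just had not added the code to display them. I will add it and update the pastebin with the new code. Soon.
  • Thanks! Could you explain the code (line by line)? Some of it, but not all of it.
【Solution 2】:

Based on @Ryan Vincent's comment: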

<?php
  $masterData = array();

  //this shows the name, email, deposit date and amount from payment.csv
  $data = file("data/payment.csv");
  foreach ($data as $deposit){
    $depositarray = explode(",", $deposit);
    $depositlist = $depositarray; // must be assigned before it is used for the key
    $key = md5($depositlist[9] . $depositlist[10]); //date + amount

    $masterData[$key]['payment'] = array(
      'name' => $depositlist[0],
      'email' => $depositlist[1],
      'depositdate' => $depositlist[9],
      'depositamount' => $depositlist[10]
    );
  }

  //this shows the payment date and amount from transaction.csv
  $databank = file("datas/transactions.csv");
  foreach ($databank as $payment){
    $paymentarray = explode(",", $payment);
    $paymentlist = $paymentarray; // was never assigned in the original

    $key = md5($paymentlist[0] . $paymentlist[5]); //date + amount

    $masterData[$key]['transaction'] = array(
      'paymentdate' => $paymentlist[0],
      'paymentamount' =>  $paymentlist[5]
    );
  }
?>

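Now you have an array $masterData that holds the data sharing the same date and amount under the same key.

But I still think this list is not good for much, because you cannot tell which payment belongs to which transaction, since the date and amount may be identical.

However, if you now run a check like this: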

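<?php
  foreach($masterData as $data) {
    // skip keys that only hold a transaction and no payment
    if(!isset($data['payment'])) {
      continue;
    }

    echo implode(',', $data['payment']);

    // both 'payment' and 'transaction' exist under this key
    if(isset($data['transaction'])) {
      echo ',' . implode(',', $data['transaction']) . ', matched';
    }

    echo '<br/>';
  }
?>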

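You should get your data on each line, with matched appended at the end of the line when there is a transaction.

But like I said: since people can make transactions of the same amount on the same day, you cannot tell which transaction belongs to which person.

【Comments】:

  • Wow! Thank you. Hope it will work. It is working now. I have attached a picture. Kindly check. Is this possible?
  • Is it possible to display/store in an array all the transactions that have multiple matches? Example: Payment A matches Transaction 1, 3, 5.. and so on. The purpose is to see all possible matches for a payment among the transactions.
  • Yes, that is possible: you can achieve it by changing $masterData[$key]['transaction'] = array( to $masterData[$key][] = array(.
  • Hey man, I am trying to run your sample code. It works fine, except that the other entries are not shown in the array. The problem is the key built with md5 from date & amount: users may have the same date and amount, so the key is not unique. Do you have a way to solve this? If users A, B and C have the same payment date and amount (key), they should all show the same transactions. Looking forward to your reply. Thanks.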
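
One way to handle the non-unique key raised in the comments above is to store a list of payments and a list of transactions under each key instead of a single entry. The sketch below is only an illustration of that idea: the function name `groupByDateAmount`, the `payments` and `transactions` sub-keys, and the sample rows are assumptions, not part of the answer.

```php
<?php
/**
 * Hypothetical helper: group rows under md5(date . amount),
 * appending to a list so that no row overwrites another.
 */
function groupByDateAmount(array $payments, array $transactions): array
{
    $masterData = [];

    foreach ($payments as $payment) {
        $key = md5($payment['depositdate'] . $payment['depositamount']);
        $masterData[$key]['payments'][] = $payment;
    }

    foreach ($transactions as $transaction) {
        $key = md5($transaction['paymentdate'] . $transaction['paymentamount']);
        $masterData[$key]['transactions'][] = $transaction;
    }

    return $masterData;
}

// Made-up rows: users A and B share the same date and amount
$payments = [
    ['name' => 'A', 'depositdate' => '2016-04-01', 'depositamount' => '100'],
    ['name' => 'B', 'depositdate' => '2016-04-01', 'depositamount' => '100'],
    ['name' => 'C', 'depositdate' => '2016-04-02', 'depositamount' => '50'],
];
$transactions = [
    ['paymentdate' => '2016-04-01', 'paymentamount' => '100'],
    ['paymentdate' => '2016-04-01', 'paymentamount' => '100'],
    ['paymentdate' => '2016-04-09', 'paymentamount' => '7'],
];

$groups = groupByDateAmount($payments, $transactions);

// Every payment in a group sees every transaction in the same group
foreach ($groups as $group) {
    $candidates = count($group['transactions'] ?? []);
    foreach ($group['payments'] ?? [] as $payment) {
        echo $payment['name'], ': ', $candidates, " possible transaction(s)\n";
    }
}
```

With this layout users A and B both list the same two candidate transactions. It still cannot say which transaction belongs to which user, which is the limitation the answer points out.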
【Solution 3】:

First, if it is a valid CSV file, you should parse the data with fgetcsv() instead of relying on the strings containing no commas.
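
A small illustration of why this matters, using a made-up line whose first field contains a comma inside quotes. The in-memory stream stands in for a real file handle; the sample values are assumptions.

```php
<?php
// One CSV line (invented data) with a quoted comma in the name field
$line = '"Doe, John",john@example.com,2016-04-01,12345' . "\n";

// Naive split: the comma inside the quotes is treated as a separator,
// and the last field keeps its trailing newline.
$naive = explode(',', $line);

// fgetcsv() reads from a stream, so write the line to an in-memory one.
$fh = fopen('php://memory', 'w+');
fwrite($fh, $line);
rewind($fh);
// All parameters are passed explicitly: no length limit, comma separator,
// double-quote enclosure, backslash escape.
$parsed = fgetcsv($fh, 0, ',', '"', '\\');
fclose($fh);

echo count($naive), ' fields via explode(), ', count($parsed), " fields via fgetcsv()\n";
echo $parsed[0], "\n";
```

With explode() every column after the name shifts by one, so fixed indexes such as [9] and [10] would silently point at the wrong data.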

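You can build an index array on the deposit date while reading csv1 and simply look up that index while reading csv2. If there is a matching date, compare the amounts and carry on from there.

Something like this: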

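// read csv1, store in array and create index
// $csv1 is the path to the first file, $interested_in the column index to match on
$data = array();
$idx = array();             // value => list of row numbers in $data
$fh = fopen($csv1, 'r');
while($row = fgetcsv($fh)) {
  $data[] = $row;
  $val = $row[$interested_in];
  $key = count($data) - 1;
  $idx[$val][] = $key;      // array containing all indices
}
fclose($fh);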

// read csv2, lookup in index and process further
$fh = fopen($csv2, 'r');
while($row2 = fgetcsv($fh)) {
  $val = $row2[$interest2];
  if(!empty($idx[$val])) {
    foreach($idx[$val] as $key) {
      $row1 = $data[$key];
      /*
    do your further comparisons of these 2 data-lines
    and output, if matches found
       */
    }
  }
}
fclose($fh);
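
To make the outline above concrete, here is one possible complete version that indexes by date and then compares amounts. It is a sketch under stated assumptions: the helper `readCsvString`, the in-memory streams, the three-column layout (name, date, amount), and the data are all invented for the example. The asker's real files use columns 9/10 and 0/5.

```php
<?php
/**
 * Hypothetical helper: parse CSV text through an in-memory stream,
 * standing in for fopen() on a real file.
 */
function readCsvString(string $csv): array
{
    $fh = fopen('php://memory', 'w+');
    fwrite($fh, $csv);
    rewind($fh);

    $rows = [];
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if ($row !== [null]) { // fgetcsv() returns [null] for a blank line
            $rows[] = $row;
        }
    }
    fclose($fh);

    return $rows;
}

// Made-up data. Payments: name, date, amount. Transactions: date, amount.
$payments     = readCsvString("Ann,2016-04-01,100\nBob,2016-04-01,250\nCy,2016-04-02,100\n");
$transactions = readCsvString("2016-04-01,100\n2016-04-01,250\n2016-04-03,100\n");

// Index the payments by date: date => list of payment row numbers
$idx = [];
foreach ($payments as $key => $row) {
    $idx[$row[1]][] = $key;
}

// For each transaction, look up payments on the same date, then compare amounts
$matches = [];
foreach ($transactions as $transaction) {
    foreach ($idx[$transaction[0]] ?? [] as $key) {
        // Cast to float so that "100" and "100.00" compare as equal
        if ((float) $payments[$key][2] === (float) $transaction[1]) {
            $matches[] = [$payments[$key][0], $transaction[0], $transaction[1]];
        }
    }
}

print_r($matches);
```

Only rows sharing a date are compared on amount, so the inner loop stays short unless many payments fall on the same day.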

【Comments】:
