【Question title】: Reading from different files without keeping old content in Perl
【Posted】: 2016-07-19 13:02:50
【Question】:

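I have a function that opens a file for reading, analyzes the lines, and writes some of them to a new file. I call this function several times with different files. I have now noticed that with each new function call, the lines of the previous files are read in as well. How can I prevent this?

Contents of es.txt: El anarquismo es una filosofía política y social que llama

Contents of dt.txt: der Regel mit Veränderungen der chemischen Bindungen in

After the program runs, the created file ProfileDE looks like this (it should contain only tokens from "dt.txt", but some tokens come from "es.txt"):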

una
ilos
ism
lític
qui
lí
ti
polí
socia

-- Actual code:

#! /usr/bin/perl
use utf8;
use warnings;
use strict;
use List::Util qw(min);
use open ':encoding(utf8)';
binmode(STDOUT, ":utf8");
binmode(STDIN, ":utf8");

generateProfile("es.txt", "ES"); #function call to read from file es.txt 
generateProfile("dt.txt", "DE"); #second call to read only from file dt.txt

sub generateProfile { 
    my $file= $_[0]; #taking arguments
    my $Lang = $_[1];

    open(IN, "<:utf8",$file) || die "error"; #to read file 
    open(OUT, ">:utf8", "profile$Lang.txt"); # to create and write in file e.g profileDe

    my (%ngramL); #any hash for later
    my $line; 
    my (@words);
    my (%ngramL);
    my (@uni, @bi, @tri, @quad, @five); #array which keeps letterkombinations of different length

    while($line =<IN>){ 
        chomp $line;
       # print $line;  # just for testing: during the second function call, it would print here old content from "es.txt" instead of only reading from "dt.txt"
        push(@words, $line);
        }
     close IN;   #doesn't it closed?

     foreach my $word (@words){
        bigramm($word); #split word in different letter combinations
        }

    freqL(); #fill that hash with frequences, how many times occures one letter combination e.g. "ab" = 2, "tion"=5
    print_hashL(); #print hash 

    sub bigramm{
      my $wort= $_[0];
      my $i; my $k;
      my @letters= split(//, $wort);
      for ($i=0; $i<length($wort)-0; $i++){ ####!!!!! -1?
        my $bi= substr($wort, $i, 1);
        push(@uni, $bi); }   
      for ($i=0; $i<length($wort)-1; $i++){
        my $bi= substr($wort, $i, 2);
        push(@bi, $bi); }
      for ($i=0; $i<length($wort)-2; $i++){
        my $bi= substr($wort, $i, 3);
        push(@tri, $bi); }
      for ($i=0; $i<length($wort)-3; $i++){
        my $bi= substr($wort, $i, 4);
        push(@quad, $bi); }
      for ($i=0; $i<length($wort)-4; $i++){
        my $bi= substr($wort, $i, 5);
        push(@five, $bi); }      

 }

    sub freqL{
      for my $duo (@uni, @bi, @tri, @quad, @five){
        if(defined $ngramL{$duo}) {$ngramL{$duo}++;}
        else {$ngramL{$duo}=1;}
    }
  }

    sub print_hashL{
      foreach my $elem(sort{$ngramL{$b}<=>$ngramL{$a}} keys %ngramL) {
        print OUT "$elem\n";}
     }

}

There are also some warnings, which may or may not be causing this problem:

"my" variable %ngramL masks earlier declaration in same scope at stack.pl line 23.
Variable "@uni" will not stay shared at stack.pl line 46.
Variable "@bi" will not stay shared at stack.pl line 49.
Variable "@tri" will not stay shared at stack.pl line 52.
Variable "@quad" will not stay shared at stack.pl line 55.
Variable "@five" will not stay shared at stack.pl line 58.
Variable "@uni" will not stay shared at stack.pl line 63.
Variable "@bi" will not stay shared at stack.pl line 63.
Variable "@tri" will not stay shared at stack.pl line 63.
Variable "@quad" will not stay shared at stack.pl line 63.
Variable "@five" will not stay shared at stack.pl line 63.
Variable "%ngramL" will not stay shared at stack.pl line 64.
Variable "%ngramL" will not stay shared at stack.pl line 70.

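【Comments】:

  • This part of the code looks fine. Does the bigramm sub use global or state variables? I would assume that bigramm caches its input (globally or in a state variable) and does not clear the cache afterwards.
  • I'm not an expert on subs inside subs, but if you declare my (@words, %ngramL, @uni, @bi, @tri, @quad, @five) outside of any sub and reset them with (@words, %ngramL, @uni, @bi, @tri, @quad, @five) = (); whenever generateProfile is called, it will solve your problem. That is only a quick fix; I'll let someone with more experience/knowledge post a more accurate answer. My guess is that subs inside other subs are compiled only once, and that the arrays/hashes declared outside them but used inside them are turned into global variables... but that is just a guess.
  • I refactored the whole code so that there are no subs inside subs, and even tried packages, and it still behaves that way. As long as I keep the code this way, the quick fix does work, but if I add another function call, e.g. generateProfile("en.txt", "EN");, it messes everything up again.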
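
The guess in the second comment is close to what the "will not stay shared" warnings describe: a named sub nested inside another sub is compiled once and stays bound to the first instance of the outer sub's `my` variables, so data from the first call is still there on later calls. Below is a minimal sketch of that effect, not the question's code; `count_bad`, `remember_bad`, and `count_good` are hypothetical names used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nested NAMED sub: compiles with the warning
#   Variable "@seen" will not stay shared
# remember_bad stays bound to the @seen of the first count_bad call.
sub count_bad {
    my @seen;
    sub remember_bad { push @seen, @_; return scalar @seen }
    return remember_bad(@_);
}

# Anonymous sub: a fresh closure over a fresh @seen on every call.
sub count_good {
    my @seen;
    my $remember = sub { push @seen, @_; return scalar @seen };
    return $remember->(@_);
}

my @bad  = (count_bad('a', 'b'),  count_bad('c'));   # second call still sees 'a' and 'b'
my @good = (count_good('a', 'b'), count_good('c'));  # second call starts empty

print "nested named sub: @bad\n";   # nested named sub: 2 3
print "anonymous sub:    @good\n";  # anonymous sub:    2 1
```

In the question's code the same thing would happen to @uni, @bi, @tri, @quad, @five, and %ngramL, which is consistent with tokens from es.txt showing up in the second profile.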

Tags: perl stdin filereader fileinputstream filehandle


【Solution 1】:
while($line =<IN>){ 
    chomp $line;
    print $line;  # during the second function call, it would print here old content from "es.txt" instead of only reading from "dt.txt"
    push(@words, $line);
    close IN;   #doesn't it closed?

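You are closing the filehandle right after reading the first line from the input file. When the loop then moves on to the second line, it can no longer read from the file, because you have already closed it.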
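
As a small sketch of what this answer describes (using an in-memory filehandle and made-up sample lines rather than the question's files), closing the handle inside the loop stops reading after the first line, while closing it after the loop reads everything:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "line 1\nline 2\nline 3\n";   # made-up sample input

# close() inside the loop: the next readline on the closed handle
# returns undef (with a warning), so the loop ends after one line.
open(my $in1, '<', \$data) or die "Cannot open in-memory file: $!";
my @closed_inside;
while (my $line = <$in1>) {
    chomp $line;
    push @closed_inside, $line;
    close $in1;
}

# close() after the loop: every line is read.
open(my $in2, '<', \$data) or die "Cannot open in-memory file: $!";
my @closed_after;
while (my $line = <$in2>) {
    chomp $line;
    push @closed_after, $line;
}
close $in2;

print scalar(@closed_inside), " vs ", scalar(@closed_after), "\n";   # 1 vs 3
```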

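【Comments】:

  • No, sorry, that was a copy-paste error. I forgot to put in a "}".
  • How do you "call this function several times with different files"? Code example?
  • What I mean is calling the function with different files, i.e. these lines: (1) generateProfile("es.txt", "ES"); (2) generateProfile("dt.txt", "DE"); Only the arguments change. But after call (1), the stream in the second call seems to remember the first file and prints the lines from (1), then those from (2).
  • Could you also add the contents of these 2 files? I tested it and cannot reproduce your problem.
  • link As you can see, there is no problem with the code you showed.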
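
For reference, one possible way to structure the profile generation so that no state can carry over between calls: top-level subs, lexical filehandles, and a hash that is created inside each call. This is only a sketch under assumptions. `ngram_counts` and `generate_profile` are hypothetical names, the alphabetical tie-break in the sort is an addition to make the output order stable, and the sample files contain short made-up stand-ins rather than the real es.txt and dt.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempdir);

# Count every 1- to 5-character substring of each line.
# %count is created anew on each call, so nothing is left over.
sub ngram_counts {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        for my $n (1 .. 5) {
            for my $i (0 .. length($line) - $n) {   # empty range if the line is shorter than $n
                $count{ substr($line, $i, $n) }++;
            }
        }
    }
    return \%count;
}

# Read $in_path, write its n-grams (most frequent first) to $out_path.
sub generate_profile {
    my ($in_path, $out_path) = @_;

    open(my $in, '<:encoding(UTF-8)', $in_path)
        or die "Cannot open $in_path: $!";
    chomp(my @lines = <$in>);
    close $in;

    my $count = ngram_counts(@lines);

    open(my $out, '>:encoding(UTF-8)', $out_path)
        or die "Cannot write $out_path: $!";
    for my $gram (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
        print {$out} "$gram\n";
    }
    close $out or die "Cannot close $out_path: $!";
    return $count;
}

# Demo with two throwaway input files (made-up contents).
my $dir = tempdir(CLEANUP => 1);
my %sample = (es => "una", de => "der");
for my $lang (sort keys %sample) {
    open(my $fh, '>:encoding(UTF-8)', "$dir/$lang.txt")
        or die "Cannot write sample file: $!";
    print {$fh} "$sample{$lang}\n";
    close $fh;
}

my $es = generate_profile("$dir/es.txt", "$dir/profileES.txt");
my $de = generate_profile("$dir/de.txt", "$dir/profileDE.txt");

print "DE profile contains 'una': ", (exists $de->{una} ? "yes" : "no"), "\n";   # no
```

Because each call gets its own lexical filehandles and its own hash, adding a third call such as one for en.txt would not see data from the earlier calls.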