【Question title】:Running the svd on mahout
【Posted】:2014-03-21 14:58:29
【Question description】:

I am running the svd application on mahout with the following command:

/usr/local/mahout/bin/mahout svd -i /user/hduser/reuters-vectors/tfidf-vectors -o svd_output -nr 41702 -nc 20863 -r 10000 -sym "false" -wd temp_svd --cleansvd "true" -mem "false"

But I get the following error:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /usr/local/mahout/examples/target/mahout-examples-0.6-job.jar
14/03/20 14:51:27 INFO common.AbstractJob: Command line arguments: {--cleansvd=true, --endPhase=2147483647, --inMemory=false, --input=/user/hduser/reuters-vectors/tfidf-vectors, --maxError=0.05, --minEigenvalue=0.0, --numCols=20863, --numRows=41702, --output=svd_output, --rank=10000, --startPhase=0, --symmetric=false, --tempDir=temp, --workingDir=temp_svd}
14/03/20 14:51:28 WARN decomposer.HdfsBackedLanczosState: temp_svd/projections exists, will overwrite
14/03/20 14:51:29 WARN decomposer.HdfsBackedLanczosState: temp_svd/norms exists, will overwrite
14/03/20 14:51:29 WARN decomposer.HdfsBackedLanczosState: temp_svd/scaleFactor exists, will overwrite
14/03/20 14:51:29 WARN decomposer.HdfsBackedLanczosState: temp_svd/projections exists, will overwrite
14/03/20 14:51:29 WARN decomposer.HdfsBackedLanczosState: temp_svd/norms exists, will overwrite
14/03/20 14:51:29 WARN decomposer.HdfsBackedLanczosState: temp_svd/scaleFactor exists, will overwrite
14/03/20 14:51:29 INFO lanczos.LanczosSolver: Finding 10000 singular vectors of matrix with 41702 rows, via Lanczos
14/03/20 14:51:30 INFO mapred.FileInputFormat: Total input paths to process : 1
14/03/20 14:51:30 INFO mapred.JobClient: Running job: job_201403201104_0045
14/03/20 14:51:31 INFO mapred.JobClient:  map 0% reduce 0%
14/03/20 14:51:43 INFO mapred.JobClient:  map 100% reduce 0%
14/03/20 14:51:55 INFO mapred.JobClient:  map 100% reduce 50%
14/03/20 14:51:58 INFO mapred.JobClient:  map 100% reduce 100%
14/03/20 14:52:00 INFO mapred.JobClient: Job complete: job_201403201104_0045
14/03/20 14:52:00 INFO mapred.JobClient: Counters: 18
14/03/20 14:52:00 INFO mapred.JobClient:   Job Counters 
14/03/20 14:52:00 INFO mapred.JobClient:     Launched reduce tasks=2
14/03/20 14:52:00 INFO mapred.JobClient:     Launched map tasks=1
14/03/20 14:52:00 INFO mapred.JobClient:     Data-local map tasks=1
14/03/20 14:52:00 INFO mapred.JobClient:   FileSystemCounters
14/03/20 14:52:00 INFO mapred.JobClient:     FILE_BYTES_READ=12
14/03/20 14:52:00 INFO mapred.JobClient:     HDFS_BYTES_READ=167104
14/03/20 14:52:00 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=80
14/03/20 14:52:00 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=196
14/03/20 14:52:00 INFO mapred.JobClient:   Map-Reduce Framework
14/03/20 14:52:00 INFO mapred.JobClient:     Reduce input groups=0
14/03/20 14:52:00 INFO mapred.JobClient:     Combine output records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Map input records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/03/20 14:52:00 INFO mapred.JobClient:     Reduce output records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Spilled Records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Map output bytes=0
14/03/20 14:52:00 INFO mapred.JobClient:     Map input bytes=0
14/03/20 14:52:00 INFO mapred.JobClient:     Combine input records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Map output records=0
14/03/20 14:52:00 INFO mapred.JobClient:     Reduce input records=0
Exception in thread "main" java.util.NoSuchElementException
    at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
    at org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(TimesSquaredJob.java:190)
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:238)
    at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:200)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:152)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:111)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver$DistributedLanczosSolverJob.run(DistributedLanczosSolver.java:283)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.main(DistributedLanczosSolver.java:289)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Please tell me how to resolve this.

【Comments on the question】:

    Tags: mahout


    【Solution 1】:

    Which version of mahout are you working with? Also note that you have to use ssvd for what you are trying to achieve. See http://mahout.apache.org/users/dim-reduction/ssvd.html

    【Comments】:

    • I am using mahout version 0.6
    • Please use the latest mahout release, 0.9, and use ssvd instead of svd.
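As a rough sketch of what the suggested switch to ssvd might look like, the snippet below builds an ssvd command line and prints it for inspection instead of submitting it. The input path comes from the question; the output directory name and the values for rank, oversampling, power iterations, and reduce tasks are illustrative assumptions, and the option names should be checked against `mahout ssvd --help` for the installed release.

```shell
#!/bin/sh
# Paths taken from the question; OUTPUT and all numeric values below are assumptions.
MAHOUT=/usr/local/mahout/bin/mahout
INPUT=/user/hduser/reuters-vectors/tfidf-vectors
OUTPUT=ssvd_output

# Build the command as a string first so it can be reviewed before it is run.
# --rank is the number of singular values requested; ssvd is usually run with a
# much smaller rank than the 10000 used in the original svd command.
SSVD_CMD="$MAHOUT ssvd --input $INPUT --output $OUTPUT --rank 100 --oversampling 15 --powerIter 1 --reduceTasks 2"

# Print only. To submit the job, run the printed line in a shell with Hadoop configured.
echo "$SSVD_CMD"
```

Printing the command first makes it easy to confirm the paths and options before a long-running job is launched on the cluster.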
    【Solution 2】:

    Have you stored the vector file in hdfs and given the correct path? If you are running locally, set export HADOOP_LOCAL="TRUE" and rerun.
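A minimal sketch of the local-mode switch follows. The startup log in the question prints "MAHOUT_LOCAL is not set", so this assumes MAHOUT_LOCAL, rather than the HADOOP_LOCAL name written in the answer, is the variable the mahout launcher checks, and that any non-empty value enables local mode. The MODE variable is hypothetical and only mirrors that check for illustration.

```shell
#!/bin/sh
# Assumption: bin/mahout treats any non-empty MAHOUT_LOCAL as "run locally".
export MAHOUT_LOCAL=true

# Mirror the launcher's check so the effective mode is visible before rerunning.
# MODE is a hypothetical helper variable, not something mahout itself reads.
if [ -n "$MAHOUT_LOCAL" ]; then
  MODE="local"
else
  MODE="hadoop"
fi
echo "mahout would run in $MODE mode"
```

When running locally, the input path would have to point at a copy of the vectors on the local filesystem rather than in hdfs.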

    【Comments】:

    • Yes, the vector file is in hdfs
    • tfidf-vectors is a folder in hdfs. Browse the folder and you will find a sequence file named part-r-00000; add it to the path and rerun the program.
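The last suggestion can be sketched as follows, assuming the folder really contains a file named part-r-00000 as the comment states. The snippet reuses the options from the original command and only changes the input path. It prints the command rather than running it, and the listing step is left as a comment because it needs a configured Hadoop client.

```shell
#!/bin/sh
# Folder from the question; the part file name is the one mentioned in the comment.
VECTOR_DIR=/user/hduser/reuters-vectors/tfidf-vectors
PART_FILE="$VECTOR_DIR/part-r-00000"

# Confirm the file name first (requires a working hadoop client):
#   hadoop fs -ls "$VECTOR_DIR"

# Same options as the original command, with -i pointing at the sequence file itself.
SVD_CMD="/usr/local/mahout/bin/mahout svd -i $PART_FILE -o svd_output -nr 41702 -nc 20863 -r 10000 -sym false -wd temp_svd --cleansvd true -mem false"

echo "$SVD_CMD"
```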