안단테 안단테

Mahout Complete Guide - Running the LDA Commands

안단테, 2023. 2. 2. 13:29

Mahout LDA execution order

 

1. seqdirectory : Generate sequence files (of Text) from a directory

mahout seqdirectory -i content -o jack-seqdir -c UTF-8 -chunk 64 -xm sequential

 

Problem: if the input files are large, this step fails with a heap size error.
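
One possible workaround for the heap size error is to give the JVM more memory before rerunning step 1. The sketch below assumes the stock bin/mahout launcher, which reads MAHOUT_HEAPSIZE (in MB) when it runs locally; when the launcher hands the job to Hadoop, the client JVM is more likely governed by HADOOP_CLIENT_OPTS, so both are set here. The 4 GB figure is an arbitrary assumption, not a value from the original run.

```shell
# Assumption: 4 GB is enough for the corpus; adjust to the machine.
# MAHOUT_HEAPSIZE is in megabytes and is read by the bin/mahout script.
export MAHOUT_HEAPSIZE=4096

# When mahout delegates to `hadoop jar`, the client-side JVM options
# come from HADOOP_CLIENT_OPTS instead, so raise the limit there too.
export HADOOP_CLIENT_OPTS="-Xmx4g ${HADOOP_CLIENT_OPTS:-}"

echo "MAHOUT_HEAPSIZE=$MAHOUT_HEAPSIZE"
echo "HADOOP_CLIENT_OPTS=$HADOOP_CLIENT_OPTS"
```

After exporting these in the same shell, rerun the seqdirectory command from step 1. If the error persists, the corpus may simply be too large for -xm sequential on one machine.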

 
2. seq2sparse : Sparse Vector generation from Text sequence files
mahout seq2sparse -i jack-seqdir -o jack-cvb -wt tf -seq -nv
 
3. rowid : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
mahout rowid -i jack-cvb/tf-vectors -o jack-cvb
 
4. cvb : LDA via Collapsed Variational Bayes (0th deriv. approx)
mahout cvb -dict jack-cvb/dictionary.file-0 -ow -i jack-cvb/matrix/ -o jack-topics -k 2 -x 10 -dt jack-to -mt jack-mo
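
To make the cvb flags easier to read, the same command can be assembled from named variables. This is only a sketch: the variable names are hypothetical, the values are the ones used in step 4, and the flag descriptions in the comments reflect my understanding of the cvb options rather than anything stated in this post.

```shell
# Hypothetical variable names; values copied from the step 4 command.
NUM_TOPICS=2                      # -k    number of topics to learn
MAX_ITER=10                       # -x    maximum number of iterations
DICT=jack-cvb/dictionary.file-0   # -dict term dictionary written by seq2sparse
MATRIX=jack-cvb/matrix/           # -i    integer-keyed matrix written by rowid
TOPICS_OUT=jack-topics            # -o    topic-term distributions
DOC_TOPICS_OUT=jack-to            # -dt   per-document topic distributions
MODEL_TMP=jack-mo                 # -mt   directory for intermediate model state

# -ow overwrites an existing output directory.
CVB_CMD="mahout cvb -dict $DICT -ow -i $MATRIX -o $TOPICS_OUT -k $NUM_TOPICS -x $MAX_ITER -dt $DOC_TOPICS_OUT -mt $MODEL_TMP"

# Print the command for review instead of running it.
echo "$CVB_CMD"
```

Once the printed command looks right, it can be run with eval "$CVB_CMD". Changing NUM_TOPICS or MAX_ITER is then a one-line edit.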
 
5. vectordump : Dump vectors from a sequence file to text
mahout vectordump -i jack-topics/part-m-00000 -d jack-cvb/dictionary.file-0 -dt sequencefile -o topics_word.txt -sort true
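
The five steps above can be chained in one script so that a failure in any step stops the run. This is a minimal sketch, not the original workflow: the run and lda_pipeline helpers and the DRY_RUN switch are hypothetical, and the mahout commands are copied unchanged from steps 1 to 5. By default it only prints the commands.

```shell
set -eu   # stop at the first failing step or unset variable

# Hypothetical switch: 1 = print commands only, 0 = actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Echo a command, and execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

lda_pipeline() {
  # 1. text files -> SequenceFile<Text,Text>
  run mahout seqdirectory -i content -o jack-seqdir -c UTF-8 -chunk 64 -xm sequential
  # 2. sequence files -> sparse term-frequency vectors
  run mahout seq2sparse -i jack-seqdir -o jack-cvb -wt tf -seq -nv
  # 3. Text keys -> IntWritable row ids
  run mahout rowid -i jack-cvb/tf-vectors -o jack-cvb
  # 4. LDA training
  run mahout cvb -dict jack-cvb/dictionary.file-0 -ow -i jack-cvb/matrix/ -o jack-topics -k 2 -x 10 -dt jack-to -mt jack-mo
  # 5. dump topic-term vectors to readable text
  run mahout vectordump -i jack-topics/part-m-00000 -d jack-cvb/dictionary.file-0 -dt sequencefile -o topics_word.txt -sort true
}

lda_pipeline
```

Saved under a name of your choosing, the script runs the real pipeline when started with DRY_RUN=0 in the environment. Step 5 reads only part-m-00000, so with more than one output part file the remaining parts would need to be dumped separately.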
 
 
Mr.LDA execution order
 
/home/jack/mahout/mahout
1.
hadoop jar mrlda-0.9.0-SNAPSHOT-fatjar.jar cc.mrlda.ParseCorpus -input content -output MR-output