Mahout LDA execution order

1. seqdirectory: Generate sequence files (of Text) from a directory
mahout seqdirectory -i content -o jack-seqdir -c UTF-8 -chunk 64 -xm sequential
Problem: if the input files are large, this dies with a heap-size error.
2. seq2sparse: Sparse vector generation from Text sequence files
mahout seq2sparse -i jack-seqdir -o jack-cvb -wt tf -seq -nv
3. rowid: Map SequenceFile&lt;Text,VectorWritable&gt; to {SequenceFile&lt;IntWritable,VectorWritable&gt;, SequenceFile&lt;IntWritable,Text&gt;}
mahout rowid ..
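One workaround for that heap-size error, as a sketch: the stock bin/mahout launcher reads MAHOUT_HEAPSIZE (in MB) when it builds the JVM command line, so raising it before rerunning may help (4096 here is an arbitrary example value, not a recommendation):

```shell
# Raise the driver JVM heap (in MB) before rerunning seqdirectory;
# bin/mahout picks this up when constructing the java invocation.
export MAHOUT_HEAPSIZE=4096
echo "MAHOUT_HEAPSIZE=${MAHOUT_HEAPSIZE}"
# Then rerun the same command:
#   mahout seqdirectory -i content -o jack-seqdir -c UTF-8 -chunk 64 -xm sequential
```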
Currently crawling news articles related to facebook and testing whether they get classified well. 1. To prepare the corpus into the internal format used by Mr.LDA, run the following command [root@masters mahout]# hadoop jar mrlda-0.9.0-SNAPSHOT-fatjar.jar cc.mrlda.ParseCorpus -input facebook.txt -output jack-face -stoplist stoplist.txt 2. And to examine the first 10 terms of the dictionary: [root@masters mahout]# hadoop jar mrlda-0.9.0-SNAPSHOT-fat..
To run Mr.LDA, download it from GitHub, import it as a Maven project, and do a Maven build. When you run it, you will probably get an error saying findCounter cannot be found; that is because the method changed starting with Hadoop 2.0.0, so switch the Hadoop dependency back to 1.1.1 and it works.
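A minimal sketch of that dependency switch, assuming the Hadoop version is declared in Mr.LDA's pom.xml (in the Hadoop 1.x line the artifact is hadoop-core):

```xml
<!-- pom.xml: pin Hadoop back to 1.1.1 so the findCounter call resolves;
     Hadoop 2.x changed the Counters API that Mr.LDA relies on. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.1.1</version>
</dependency>
```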
Usage: [--input --output --useKey --printKey --dictionary --dictionaryType --csv --namesAsComments --nameOnly --sortVectors --quiet --sizeOnly --numItems --vectorSize --filter [ ...] --help --tempDir --startPhase --endPhase ] Job-Specific Options: --input (-i) input Path to job input directory. --output (-o) output The directory pathname for output. --useKey (-u) useKey If the Key is a vector th..
Usage: [--input --output --maxIter --convergenceDelta --overwrite --num_topics --num_terms --doc_topic_smoothing --term_topic_smoothing --dictionary --doc_topic_output --topic_model_temp_dir --iteration_block_size --random_seed --test_set_fraction --num_train_threads --num_update_threads --max_doc_topic_iters --num_reduce_tasks --backfill_perplexity --help --tempDir --startPhase --endPhase ] Job..
========================================================================================================= Arguments Usage: [--input --output --help --tempDir --startPhase --endPhase ] Job-Specific Options: --input (-i) input Path to job input directory. --output (-o) output The directory pathname for output. --help (-h) Print out help --tempDir tempDir Intermediate output directory --startPhase startP..
Arguments Usage: [--minSupport --analyzerName --chunkSize --output --input --minDF --maxDFSigma --maxDFPercent --weight --norm --minLLR --numReducers --maxNGramSize --overwrite --help --sequentialAccessVector --namedVector --logNormalize] Options --minSupport (-s) minSupport (Optional) Minimum Support. Default Value: 2 --analyzerName (-a) analyzerName The class name of the analyzer --chunkSize (-chunk..
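For intuition about what `-wt tf` produces, here is a plain-Python sketch (the function and variable names are illustrative, not Mahout internals): each document becomes a sparse vector of raw term counts, keyed by integer ids from a shared dictionary.

```python
from collections import Counter

def tf_vectors(docs):
    """Build a shared term dictionary and raw term-frequency vectors,
    roughly what seq2sparse -wt tf emits (illustrative only)."""
    dictionary = {}                      # term -> integer id
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vec = {}
        for term, n in counts.items():
            idx = dictionary.setdefault(term, len(dictionary))
            vec[idx] = n                 # sparse: store only nonzero counts
        vectors.append(vec)
    return dictionary, vectors

dictionary, vectors = tf_vectors(["facebook news news", "facebook ipo"])
print(dictionary)   # {'facebook': 0, 'news': 1, 'ipo': 2}
print(vectors)      # [{0: 1, 1: 2}, {0: 1, 2: 1}]
```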
mahout seqdirectory arguments [root@masters mahout-distribution-0.9]# mahout seqdirectory -i path -o path-seqdir -c UTF-8 -chunk 64 -xm sequential -s MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. Running on hadoop, using /usr/local/hadoop/hadoop-1.1.1/bin/hadoop and HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-1.1.1/conf MAHOUT-JOB: /usr/local/mahout/mahout-distribution-0.9/mahout-examples..
Chapter 9: order of steps to run LDA

$MAHOUT_HOME/bin/mahout seqdirectory \
  -i reuters \
  -o reuters-seqdir \
  -c UTF-8 \
  -chunk 64 \
  -xm sequential

mahout seq2sparse \
  -i reuters-seqdir \
  -o reuters-cvb -wt tf -seq -nv

mahout rowid -i reuters-cvb/tf-vectors -o reuters-cvb

mahout cvb -dict reuters-cvb/dictionary.file-0 -ow -i reuters-cvb/matrix/ -o reuters-topics -k 10 -x 20 -dt topics-output -mt topics-model

mahout v..
I fed the newly created vectors in as k-means input... and used classdump to check the results... but got the following error... MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. Running on hadoop, using /usr/local/hadoop/hadoop-1.1.1/bin/hadoop and HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-1.1.1/conf MAHOUT-JOB: /usr/local/mahout/mahout-distribution-0.9/mahout-examples-0.9-job.jar 15/03/09 20:15:03 INFO common.AbstractJob: Command line..
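For reference, the cluster-inspection tool that ships with Mahout 0.9 is clusterdump; a sketch of an invocation (all paths below are illustrative placeholders, not the directories used above):

```shell
# Hypothetical clusterdump invocation; -p points at the clustered points,
# -d/-dt resolve term ids back to words via the seq2sparse dictionary.
mahout clusterdump \
  -i kmeans-output/clusters-*-final \
  -o clusters.txt \
  -d vectors/dictionary.file-0 \
  -dt sequencefile \
  -p kmeans-output/clusteredPoints
```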