안단테 안단테
Mahout Complete Guide - mahout cvb arguments
Usage:
[--input <input> --output <output> --maxIter <maxIter> --convergenceDelta
<convergenceDelta> --overwrite --num_topics <num_topics> --num_terms
<num_terms> --doc_topic_smoothing <doc_topic_smoothing> --term_topic_smoothing
<term_topic_smoothing> --dictionary <dictionary> --doc_topic_output
<doc_topic_output> --topic_model_temp_dir <topic_model_temp_dir>
--iteration_block_size <iteration_block_size> --random_seed <random_seed>
--test_set_fraction <test_set_fraction> --num_train_threads <num_train_threads>
--num_update_threads <num_update_threads> --max_doc_topic_iters
<max_doc_topic_iters> --num_reduce_tasks <num_reduce_tasks>
--backfill_perplexity --help --tempDir <tempDir> --startPhase <startPhase>
--endPhase <endPhase>]
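The usage line above lists every flag that `mahout cvb` accepts. When the job is launched from a script, it can help to assemble the argument list programmatically and reject misspelled flags before anything is submitted. The following is a minimal Python sketch under that assumption. `build_cvb_command` is a hypothetical helper, only the long-form flag names are taken from the usage text, and the paths in the example call are placeholders.

```python
from typing import List, Mapping, Sequence

# Long-form flags copied from the `mahout cvb` usage text above.
VALUE_FLAGS = {
    "input", "output", "maxIter", "convergenceDelta", "num_topics", "num_terms",
    "doc_topic_smoothing", "term_topic_smoothing", "dictionary",
    "doc_topic_output", "topic_model_temp_dir", "iteration_block_size",
    "random_seed", "test_set_fraction", "num_train_threads",
    "num_update_threads", "max_doc_topic_iters", "num_reduce_tasks",
    "tempDir", "startPhase", "endPhase",
}
# Flags that take no value.
SWITCH_FLAGS = {"overwrite", "backfill_perplexity", "help"}


def build_cvb_command(options: Mapping[str, object],
                      switches: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: turn option names/values into an argv list for `mahout cvb`."""
    argv = ["mahout", "cvb"]
    for name, value in options.items():
        if name not in VALUE_FLAGS:
            raise ValueError(f"unknown cvb option: --{name}")
        argv += [f"--{name}", str(value)]
    for name in switches:
        if name not in SWITCH_FLAGS:
            raise ValueError(f"unknown cvb switch: --{name}")
        argv.append(f"--{name}")
    return argv


# Example call; every path here is a placeholder, not a required layout.
cmd = build_cvb_command(
    {
        "input": "lda/matrix",
        "output": "lda/topics",
        "dictionary": "lda/dictionary.file-*",
        "doc_topic_output": "lda/doc-topics",
        "topic_model_temp_dir": "lda/model-tmp",
        "num_topics": 20,
        "maxIter": 20,
    },
    switches=["overwrite"],
)
print(" ".join(cmd))
```

On a machine where the `mahout` launcher is on the PATH, the resulting list could be passed to `subprocess.run(cmd, check=True)`. Keeping the flag names in sets means a typo such as `--numTopics` fails immediately instead of surfacing later in the job.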
Job-Specific Options:
  --input (-i) input                                    Path to job input directory.
  --output (-o) output                                  The directory pathname for output.
  --maxIter (-x) maxIter                                The maximum number of iterations.
  --convergenceDelta (-cd) convergenceDelta             The convergence delta value
  --overwrite (-ow)                                     If present, overwrite the output directory before running job
  --num_topics (-k) num_topics                          Number of topics to learn
  --num_terms (-nt) num_terms                           Vocabulary size
  --doc_topic_smoothing (-a) doc_topic_smoothing        Smoothing for document/topic distribution
  --term_topic_smoothing (-e) term_topic_smoothing      Smoothing for topic/term distribution
  --dictionary (-dict) dictionary                       Path to term-dictionary file(s) (glob expression supported)
  --doc_topic_output (-dt) doc_topic_output             Output path for the training doc/topic distribution
  --topic_model_temp_dir (-mt) topic_model_temp_dir     Path to intermediate model path (useful for restarting)
  --iteration_block_size (-block) iteration_block_size  Number of iterations per perplexity check
  --random_seed (-seed) random_seed                     Random seed
  --test_set_fraction (-tf) test_set_fraction           Fraction of data to hold out for testing
  --num_train_threads (-ntt) num_train_threads          number of threads per mapper to train with
  --num_update_threads (-nut) num_update_threads        number of threads per mapper to update the model with
  --max_doc_topic_iters (-mipd) max_doc_topic_iters     max number of iterations per doc for p(topic|doc) learning
  --num_reduce_tasks num_reduce_tasks                   number of reducers to use during model estimation
  --backfill_perplexity                                 enable backfilling of missing perplexity values
  --help (-h)                                           Print out help
  --tempDir tempDir                                     Intermediate output directory
  --startPhase startPhase                               First phase to run
  --endPhase endPhase                                   Last phase to run
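Several of these options (`-k`, `-nt`, `-a`, `-e`, `-mipd`) are parameters of the per-document update in collapsed variational Bayes (CVB0) LDA, the algorithm the `cvb` job is named after. The Python sketch below shows where each one plausibly enters a standard CVB0-style estimate of p(topic|doc) for a single document. It is written from the general algorithm, not from Mahout's source, so the function name `infer_doc_topics`, the uniform initialization, the per-iteration renormalization and the toy topic/term counts are all assumptions. It also leaves out the model-update side (`--num_update_threads`, the reducers).

```python
from typing import Dict, List


def infer_doc_topics(doc: Dict[int, float],
                     topic_term: List[List[float]],
                     alpha: float,
                     eta: float,
                     num_terms: int,
                     max_iters: int) -> List[float]:
    """Sketch of a CVB0-style estimate of p(topic | doc) for one document.

    doc        : term id -> count (sparse term vector of the document)
    topic_term : topic_term[k][w] = (expected) count of term w in topic k
    alpha      : document/topic smoothing  (role of --doc_topic_smoothing, -a)
    eta        : topic/term smoothing      (role of --term_topic_smoothing, -e)
    num_terms  : vocabulary size           (role of --num_terms, -nt)
    max_iters  : per-document iterations   (role of --max_doc_topic_iters, -mipd)
    """
    num_topics = len(topic_term)  # role of --num_topics (-k)
    topic_sums = [sum(row) for row in topic_term]
    doc_topics = [1.0 / num_topics] * num_topics  # assumed uniform starting point
    if not doc or sum(doc.values()) <= 0:
        return doc_topics  # nothing to learn from an empty document

    for _ in range(max_iters):
        new_counts = [0.0] * num_topics
        for w, count in doc.items():
            # p(topic | term, doc) is proportional to
            # smoothed p(term | topic) * smoothed p(topic | doc).
            weights = [
                (topic_term[k][w] + eta) / (topic_sums[k] + eta * num_terms)
                * (doc_topics[k] + alpha)
                for k in range(num_topics)
            ]
            total = sum(weights)
            for k in range(num_topics):
                new_counts[k] += count * weights[k] / total
        # Renormalize the expected counts so p(topic | doc) sums to 1.
        norm = sum(new_counts)
        doc_topics = [c / norm for c in new_counts]
    return doc_topics


# Toy model (made-up counts): 2 topics over a 4-term vocabulary.
model = [
    [10.0, 8.0, 0.0, 1.0],  # topic 0 favours terms 0 and 1
    [0.0, 1.0, 9.0, 10.0],  # topic 1 favours terms 2 and 3
]
print(infer_doc_topics({0: 3.0, 1: 2.0}, model, alpha=0.1, eta=0.1, num_terms=4, max_iters=10))
```

In this form a larger `alpha` pulls p(topic|doc) toward uniform, while a larger `eta` flattens p(term|topic); the `eta * num_terms` term in the denominator keeps the smoothed p(term|topic) summing to 1 over the vocabulary.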