Mahout Complete Guide) - mahout cvb arguments

안단테 안단테

안단테에 2023. 2. 2. 13:28


Usage:
 [--input <input> --output <output> --maxIter <maxIter> --convergenceDelta
 <convergenceDelta> --overwrite --num_topics <num_topics> --num_terms
 <num_terms> --doc_topic_smoothing <doc_topic_smoothing> --term_topic_smoothing
 <term_topic_smoothing> --dictionary <dictionary> --doc_topic_output
 <doc_topic_output> --topic_model_temp_dir <topic_model_temp_dir>
 --iteration_block_size <iteration_block_size> --random_seed <random_seed>
 --test_set_fraction <test_set_fraction> --num_train_threads <num_train_threads>
 --num_update_threads <num_update_threads> --max_doc_topic_iters
 <max_doc_topic_iters> --num_reduce_tasks <num_reduce_tasks>
 --backfill_perplexity --help --tempDir <tempDir> --startPhase <startPhase>
 --endPhase <endPhase>]

Job-Specific Options:
  --input (-i) input                                    Path to job input directory.
  --output (-o) output                                  The directory pathname for output.
  --maxIter (-x) maxIter                                The maximum number of iterations.
  --convergenceDelta (-cd) convergenceDelta             The convergence delta value
  --overwrite (-ow)                                     If present, overwrite the output directory before running job
  --num_topics (-k) num_topics                          Number of topics to learn
  --num_terms (-nt) num_terms                           Vocabulary size
  --doc_topic_smoothing (-a) doc_topic_smoothing        Smoothing for document/topic distribution
  --term_topic_smoothing (-e) term_topic_smoothing      Smoothing for topic/term distribution
  --dictionary (-dict) dictionary                       Path to term-dictionary file(s) (glob expression supported)
  --doc_topic_output (-dt) doc_topic_output             Output path for the training doc/topic distribution
  --topic_model_temp_dir (-mt) topic_model_temp_dir     Path to intermediate model path (useful for restarting)
  --iteration_block_size (-block) iteration_block_size  Number of iterations per perplexity check
  --random_seed (-seed) random_seed                     Random seed
  --test_set_fraction (-tf) test_set_fraction           Fraction of data to hold out for testing
  --num_train_threads (-ntt) num_train_threads          number of threads per mapper to train with
  --num_update_threads (-nut) num_update_threads        number of threads per mapper to update the model with
  --max_doc_topic_iters (-mipd) max_doc_topic_iters     max number of iterations per doc for p(topic|doc) learning
  --num_reduce_tasks num_reduce_tasks                   number of reducers to use during model estimation
  --backfill_perplexity                                 enable backfilling of missing perplexity values
  --help (-h)                                           Print out help
  --tempDir tempDir                                     Intermediate output
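
As a rough illustration of how these options fit together, the sketch below assembles a `mahout cvb` command line in Python. It uses only option names that appear in the help text above. The helper name `build_cvb_command`, every path, and every numeric value are hypothetical placeholders chosen for this example, not recommended settings. The script only prints the command and does not run Mahout.

```python
import shlex


def build_cvb_command(options, flags=()):
    """Assemble an argv list for `mahout cvb`.

    `options` maps long option names (without the leading dashes) to values.
    `flags` lists switches that take no value, such as "overwrite".
    """
    argv = ["mahout", "cvb"]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    for flag in flags:
        argv.append(f"--{flag}")
    return argv


# Assumption: all paths and numbers below are made-up placeholders.
options = {
    "input": "/user/me/matrix",
    "output": "/user/me/lda-model",
    "dictionary": "/user/me/dictionary.file-*",  # help text says globs are supported
    "doc_topic_output": "/user/me/lda-doc-topics",
    "topic_model_temp_dir": "/user/me/lda-temp",
    "num_topics": 20,
    "maxIter": 30,
}

argv = build_cvb_command(options, flags=["overwrite"])

# Print a shell-quoted form of the command for inspection (dry run only).
print(shlex.join(argv))
```

On a machine where the `mahout` launcher is on the PATH, the same list could presumably be handed to `subprocess.run(argv, check=True)` instead of being printed. Note that `shlex.join` requires Python 3.8 or later.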
