IndoBERT Base Model (phase2 – uncased)
IndoBERT is a state-of-the-art language model for Indonesian based on the BERT architecture. The model was pretrained with masked language modeling (MLM) and next sentence prediction (NSP) objectives.
All Pre-trained Models
Model | #params | Arch. | Training data
---|---|---|---
indobenchmark/indobert-base-p1 | 124.5M | Base | Indo4B (23.43 GB of text)
indobenchmark/indobert-base-p2 | 124.5M | Base | Indo4B (23.43 GB of text)
indobenchmark/indobert-large-p1 | 335.2M | Large | Indo4B (23.43 GB of text)
indobenchmark/indobert-large-p2 | 335.2M | Large | Indo4B (23.43 GB of text)
indobenchmark/indobert-lite-base-p1 | 11.7M | Base | Indo4B (23.43 GB of text)
indobenchmark/indobert-lite-base-p2 | 11.7M | Base | Indo4B (23.43 GB of text)
indobenchmark/indobert-lite-large-p1 | 17.7M | Large | Indo4B (23.43 GB of text)
indobenchmark/indobert-lite-large-p2 | 17.7M | Large | Indo4B (23.43 GB of text)