- 📙Paper: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- 📚Publisher:
arXiv
- 🏠Author Affiliation:
BigScience
- 🔑Public: ✅
- 🌐Architecture
- Decoder-Only (see the configuration sketch below)
- 📏Model Size
176B
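For reference, the released checkpoint's decoder-only configuration can be inspected directly. A minimal sketch, assuming the Hugging Face `transformers` library and the published `bigscience/bloom` checkpoint on the Hub:

```python
# Minimal sketch: inspect the released BLOOM config (decoder-only causal LM).
# Assumes the `transformers` library and network access to the Hugging Face Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloom")
print(config.model_type)                                   # "bloom" -- a decoder-only causal LM
print(config.n_layer, config.hidden_size, config.n_head)   # depth, width, and attention heads of the 176B model
```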
- 🗂️Data pre-processing
- Data Resource
- The pre-training dataset is the ROOTS corpus, a composite of 498 Hugging Face datasets amounting to 1.61 TB of text spanning 46 natural languages and 13 programming languages.
- De-duplication: ✅ (see the illustrative sketch below)
- Filter Strategies
- /
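The paper reports deduplication as part of preparing the ROOTS corpus; the actual pipeline is not reproduced here. A minimal, purely illustrative sketch of exact-duplicate removal by hashing normalized text (all names are hypothetical):

```python
# Hypothetical sketch of exact-duplicate removal via normalized-text hashing.
# Illustrative only; this is not the ROOTS preprocessing pipeline.
import hashlib

def dedup(documents):
    seen, unique = set(), []
    for doc in documents:
        # Light normalization so trivial whitespace/case differences collapse to one key.
        key = hashlib.sha256(" ".join(doc.lower().split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

print(dedup(["Hello  world", "hello world", "Bonjour le monde"]))  # 2 documents survive
```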
- 🍉Tokenizer
- Technology
- Byte-level Byte-Pair-Encoding (BBPE)
- Details
- The tokenizer is a learned subword tokenizer trained using byte-level BPE (see the usage sketch below).
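A minimal usage sketch, assuming the `transformers` library and the released `bigscience/bloom` tokenizer on the Hugging Face Hub:

```python
# Minimal sketch: load the released byte-level BPE tokenizer and round-trip a sentence.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigscience/bloom")
ids = tok("BLOOM is a multilingual language model.")["input_ids"]
print(ids)              # ids from the byte-level BPE vocabulary
print(tok.decode(ids))  # byte-level BPE is lossless, so the input text is recovered
```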
- 🧪Hyperparameters (BLOOM 176B)
- optimizer: Adam
- betas: 0.9, 0.95
- eps: 1e-8
- batch size:
2,048
- context window:
2,048
- gradient accumulation steps: /
- warmup steps: /
- learning rate:
6e-5
- weight decay:
0.1
- decay schedule
- Cosine (see the optimizer and schedule sketch below)
- precision floating point:
bfloat16 (bf16)
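A minimal PyTorch sketch of an equivalent optimizer and learning-rate schedule; the model, warmup, and total step counts below are placeholders, not the actual BLOOM training setup (which used Megatron-DeepSpeed):

```python
# Minimal sketch of the reported Adam settings with linear warmup + cosine decay.
# `model` and the step counts are placeholders; this is not the BLOOM training code.
import math
import torch

model = torch.nn.Linear(1024, 1024)          # placeholder for the real 176B model
optimizer = torch.optim.AdamW(               # decoupled weight decay, assumed here for weight decay 0.1
    model.parameters(),
    lr=6e-5,
    betas=(0.9, 0.95),
    eps=1e-8,
    weight_decay=0.1,
)

warmup_steps, total_steps = 1_000, 100_000   # placeholder step counts

def lr_lambda(step):
    # Linear warmup to the peak rate, then cosine decay.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```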
- 🏃‍♀️Training
- model initialization: from scratch
- training strategies
- left-to-right (causal next-token prediction; see the loss sketch below)
- trained tokens/steps: 366B tokens
- hardware: 384 NVIDIA A100 80GB GPUs (Jean Zay supercomputer)
- training time: about 3.5 months
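The left-to-right strategy is standard next-token (causal) prediction. A minimal illustrative sketch of the loss on toy tensors, not on BLOOM itself:

```python
# Minimal sketch of the left-to-right (causal LM) objective on toy tensors:
# position t is trained to predict the token at position t+1 via cross-entropy.
import torch

vocab_size, seq_len = 100, 8
logits = torch.randn(1, seq_len, vocab_size)          # stand-in for model outputs
tokens = torch.randint(0, vocab_size, (1, seq_len))   # stand-in for input token ids

loss = torch.nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),           # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),                        # targets are the next tokens
)
print(loss.item())
```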