- 📙Paper: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- 📚Publisher:
arXiv
- 🏠Author Affiliation:
BigScience
- 🔑Public: ✅
- 🌐Architecture
- Decoder-Only (see the configuration sketch below)
- 📏Model Size
176B
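For reference, the released checkpoint's decoder-only configuration can be inspected directly. A minimal sketch, assuming the Hugging Face `transformers` library and the published `bigscience/bloom` checkpoint on the Hub:

```python
# Minimal sketch: inspect the released BLOOM config (decoder-only causal LM).
# Assumes the `transformers` library and network access to the Hugging Face Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloom")
print(config.model_type)                                   # "bloom" -- a decoder-only causal LM
print(config.n_layer, config.hidden_size, config.n_head)   # depth, width, and attention heads of the 176B model
```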
- 🗂️Data pre-processing
- Data Resource
- The pre-training dataset is the ROOTS corpus, a composite of 498 Hugging Face datasets amounting to 1.61 TB of text spanning 46 natural languages and 13 programming languages.
- De-duplication: ✅ (see the illustrative sketch below)
- Filter Strategies
- /
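The paper reports deduplication as part of preparing the ROOTS corpus; the actual pipeline is not reproduced here. A minimal, purely illustrative sketch of exact-duplicate removal by hashing normalized text (all names are hypothetical):

```python
# Hypothetical sketch of exact-duplicate removal via normalized-text hashing.
# Illustrative only; this is not the ROOTS preprocessing pipeline.
import hashlib

def dedup(documents):
    seen, unique = set(), []
    for doc in documents:
        # Light normalization so trivial whitespace/case differences collapse to one key.
        key = hashlib.sha256(" ".join(doc.lower().split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

print(dedup(["Hello  world", "hello world", "Bonjour le monde"]))  # 2 documents survive
```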
- 🍉Tokenizer
- Technology
- Byte-level Byte-Pair-Encoding (BBPE)
- Details
- The tokenizer is a learned subword tokenizer trained using byte-level BPE (see the usage sketch below).
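A minimal usage sketch, assuming the `transformers` library and the released `bigscience/bloom` tokenizer on the Hugging Face Hub:

```python
# Minimal sketch: load the released byte-level BPE tokenizer and round-trip a sentence.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigscience/bloom")
ids = tok("BLOOM is a multilingual language model.")["input_ids"]
print(ids)              # ids from the byte-level BPE vocabulary
print(tok.decode(ids))  # byte-level BPE is lossless, so the input text is recovered
```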
- 🧪Hyperparameters (BLOOM 176B)
- optimizer: Adam
- betas: 0.9, 0.95
- eps: 1e-8
- batch size:
2,048
- context window:
2,048
- gradient accumulation steps: /
- warmup steps: /
- learning rate:
6e-5
- weight decay:
0.1
- decay schedule
- Cosine (see the optimizer and schedule sketch below)
- precision floating point:
bfloat16 (bf16)
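A minimal PyTorch sketch of an equivalent optimizer and learning-rate schedule; the model, warmup, and total step counts below are placeholders, not the actual BLOOM training setup (which used Megatron-DeepSpeed):

```python
# Minimal sketch of the reported Adam settings with linear warmup + cosine decay.
# `model` and the step counts are placeholders; this is not the BLOOM training code.
import math
import torch

model = torch.nn.Linear(1024, 1024)          # placeholder for the real 176B model
optimizer = torch.optim.AdamW(               # decoupled weight decay, assumed here for weight decay 0.1
    model.parameters(),
    lr=6e-5,
    betas=(0.9, 0.95),
    eps=1e-8,
    weight_decay=0.1,
)

warmup_steps, total_steps = 1_000, 100_000   # placeholder step counts

def lr_lambda(step):
    # Linear warmup to the peak rate, then cosine decay.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```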
- 🏃‍♀️Training
- model initialization: from scratch
- training strategies
- left-to-right (causal next-token prediction; see the loss sketch below)
- trained tokens/steps: 366B tokens
- hardware: 384 NVIDIA A100 80GB GPUs (Jean Zay supercomputer)
- training time: about 3.5 months
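The left-to-right strategy is standard next-token (causal) prediction. A minimal illustrative sketch of the loss on toy tensors, not on BLOOM itself:

```python
# Minimal sketch of the left-to-right (causal LM) objective on toy tensors:
# position t is trained to predict the token at position t+1 via cross-entropy.
import torch

vocab_size, seq_len = 100, 8
logits = torch.randn(1, seq_len, vocab_size)          # stand-in for model outputs
tokens = torch.randint(0, vocab_size, (1, seq_len))   # stand-in for input token ids

loss = torch.nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),           # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),                        # targets are the next tokens
)
print(loss.item())
```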