- 📙Paper: Competition-level code generation with AlphaCode
- 📚Publisher: Science
- 🏠Author Affiliation: DeepMind
- 🔑Public: ❌
- 🌐Architecture
- Encoder-Decoder (sketched below)
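For a concrete picture of the encoder-decoder layout (problem description into the encoder, solution tokens out of the decoder), here is a minimal PyTorch sketch. All sizes, layer counts, and sequence lengths are illustrative placeholders, not AlphaCode's published configuration.

```python
# Minimal encoder-decoder Transformer sketch in PyTorch.
# All sizes, layer counts, and sequence lengths are illustrative placeholders,
# not AlphaCode's published configuration.
import torch
import torch.nn as nn

VOCAB = 8_000   # hypothetical vocabulary size
D_MODEL = 512   # hypothetical hidden size

embed = nn.Embedding(VOCAB, D_MODEL)
model = nn.Transformer(
    d_model=D_MODEL,
    nhead=8,
    num_encoder_layers=4,   # encoder reads the problem description
    num_decoder_layers=12,  # decoder generates the solution program
    batch_first=True,
)

problem = embed(torch.randint(0, VOCAB, (1, 256)))   # encoder input tokens, embedded
solution = embed(torch.randint(0, VOCAB, (1, 128)))  # decoder input tokens, embedded
tgt_mask = nn.Transformer.generate_square_subsequent_mask(128)  # causal mask for left-to-right decoding
out = model(problem, solution, tgt_mask=tgt_mask)    # (1, 128, D_MODEL) decoder states
```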
- 📏Model Size: 284M, 1.1B, 2.8B, 8.7B, 41.1B
- 🗂️Data pre-processing
- Data Resource
- Our pre-training dataset is based on a snapshot of selected public GitHub repositories taken on 2021/07/14.
- De-duplication: ✅
- Filter Strategies
- We filtered out files that were likely auto-generated (see the sketch below), namely:
- all files larger than 1 MB
- files with lines longer than 1,000 characters
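A minimal sketch of these filtering heuristics, assuming a local directory of source files; the directory name and the 10^6-byte reading of "1 MB" are assumptions for illustration, not DeepMind's actual pipeline.

```python
# A sketch of the size-based filters above: drop files larger than 1 MB or
# containing any line longer than 1,000 characters (likely auto-generated).
from pathlib import Path

MAX_FILE_BYTES = 1_000_000  # "1 MB" threshold (interpreted as 10^6 bytes)
MAX_LINE_CHARS = 1_000      # maximum allowed line length

def keep_file(path: Path) -> bool:
    """Return True if a source file passes both filters."""
    if path.stat().st_size > MAX_FILE_BYTES:
        return False
    text = path.read_text(encoding="utf-8", errors="ignore")
    return all(len(line) <= MAX_LINE_CHARS for line in text.splitlines())

# Example: keep only the passing files from a (hypothetical) repository snapshot.
kept = [p for p in Path("github_snapshot").rglob("*") if p.is_file() and keep_file(p)]
```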
- 🍉Tokenizer
- Technology
- SentencePiece (see the sketch below)
- Details: /
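A hedged sketch of training such a tokenizer with the `sentencepiece` Python package; the corpus file, vocabulary size, and model type here are assumptions for illustration, not the paper's exact settings.

```python
# Train and use a SentencePiece tokenizer on a source-code corpus.
# "code_corpus.txt", the vocabulary size, and the model type are hypothetical.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="code_corpus.txt",        # hypothetical text dump of pre-training code
    model_prefix="code_tokenizer",  # writes code_tokenizer.model / code_tokenizer.vocab
    vocab_size=8_000,               # assumed size, not taken from the paper
    model_type="bpe",               # SentencePiece also supports "unigram"
)

sp = spm.SentencePieceProcessor(model_file="code_tokenizer.model")
print(sp.encode("def solve(n):\n    return n * 2", out_type=str))
```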
- 🧪Hyperparameters (AlphaCode 41.1B)
- optimizer: AdamW
- betas: 0.9, 0.95
- eps: /
- batch size: 2,048
- context window: 6,144
- gradient accumulation steps: /
- warmup steps: 1,000
- learning rate: 1e-4
- weight decay: 0.1
- decay schedule
- Cosine
- precision floating point: bf16
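The optimizer settings above translate fairly directly into PyTorch. The sketch below wires up AdamW with the listed betas, learning rate, and weight decay, plus a 1,000-step warmup followed by cosine decay; the linear warmup shape, the stand-in model, and the total step count are assumptions, since the card lists a token budget rather than a step count.

```python
# AdamW + warmup + cosine decay, using the hyperparameters listed above.
# The tiny stand-in model and TOTAL_STEPS are placeholders for illustration.
import math
import torch

model = torch.nn.Linear(512, 512)   # stand-in for the real network
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,             # learning rate
    betas=(0.9, 0.95),   # betas
    weight_decay=0.1,    # weight decay
)

WARMUP_STEPS = 1_000
TOTAL_STEPS = 200_000    # hypothetical; the card lists a token budget (967B), not steps

def lr_scale(step: int) -> float:
    """Multiplier on the base LR: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_scale)

# bf16 precision can be applied around the forward/backward pass, e.g.:
# with torch.autocast(device_type="cuda", dtype=torch.bfloat16): ...
```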
- 🏃♀️Training
- model initialization: /
- training strategies
- left-to-right (next-token prediction) on the decoder
- masked language modeling on the encoder
- trained tokens/steps: 967B tokens
- hardware: TPUv4
- training time: /