- 📙Paper: ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
- 📚Publisher:
arXiv
- 🏠Author Affiliation:
Baidu
- 🔑Public: ✅ (promise)
- 🌐Architecture
- Encoder-Decoder
- 📏Model Size
560M
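
As a quick check on the architecture and size entries above, the sketch below loads the released checkpoint with the generic seq2seq (encoder-decoder) class and counts its parameters. The HuggingFace model id `baidu/ernie-code-560m` is an assumption about the published checkpoint name, not something stated in this post.

```python
# Hedged sketch: verify the encoder-decoder architecture and ~560M parameter count.
# The model id "baidu/ernie-code-560m" is an assumed HuggingFace checkpoint name.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("baidu/ernie-code-560m")

n_params = sum(p.numel() for p in model.parameters())
print(type(model).__name__)                  # a T5/mT5-style encoder-decoder class
print(f"parameters: {n_params / 1e6:.0f}M")  # expected to be roughly 560M
```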
- 🗂️Data pre-processing
- Data Resource
- CodeSearchNet
- CC-100
- OPUS
- MultiUN
- IIT Bombay
- WikiMatrix
- De-duplication: ❌
- Filter Strategies
- /
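
For illustration, the snippet below pulls one of the PL-NL sources listed above (CodeSearchNet) through the HuggingFace `datasets` hub. The dataset id `code_search_net`, the `python` config, and the column names are assumptions about the hub packaging; this is not the authors' own data pipeline.

```python
# Hedged sketch: load CodeSearchNet (one of the listed data resources) from the
# HuggingFace hub. Dataset id, config, and column names are assumptions about
# the hub packaging, not the paper's ingestion pipeline.
from datasets import load_dataset

csn = load_dataset("code_search_net", "python", split="train")

sample = csn[0]
print(sample["func_code_string"][:120])           # the code side of a PL-NL pair
print(sample["func_documentation_string"][:120])  # the natural-language docstring
```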
- 🍉Tokenizer
- Technology
- SentencePiece
- Details
- We add a set of tokens representing whitespace indentation of different lengths in PL
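
A minimal sketch of the indentation-token idea described above, assuming an mT5 SentencePiece tokenizer as the starting point; the token strings (`<indent_2>`, `<tab>`, ...) are hypothetical placeholders, since the exact names are not given here.

```python
# Hedged sketch: extend an mT5 SentencePiece tokenizer with dedicated tokens for
# whitespace indentation. The token strings below are hypothetical placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")

indent_tokens = ["<indent_2>", "<indent_4>", "<indent_8>", "<tab>", "<newline>"]
tokenizer.add_special_tokens({"additional_special_tokens": indent_tokens})

def encode_code(source: str) -> list[int]:
    """Rewrite literal whitespace as the added tokens, then tokenize."""
    source = source.replace("\t", "<tab>").replace("\n", "<newline>")
    source = source.replace(" " * 8, "<indent_8>")
    source = source.replace(" " * 4, "<indent_4>").replace(" " * 2, "<indent_2>")
    return tokenizer.encode(source)

print(encode_code("def f(x):\n    return x + 1"))
# Note: the model's embedding matrix must be resized to match the enlarged vocabulary.
```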
- 🧪Hyperparameters (ERNIE-Code 560M)
- optimizer: Adafactor
- betas: /
- eps: /
- batch size: micro-batch size of 8/4
- context window:
1,024
- gradient accumulation steps:
15
- warmup steps:
1,000
- learning rate:
1e-4
- weight decay: /
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- precision floating point:
bf16
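
The values listed above can be wired into a standard setup as follows; this is only an illustration of the reported numbers (Adafactor, lr 1e-4, 1,000 warmup steps, gradient accumulation of 15, bf16), not the authors' training code, and only warmup is scheduled because the decay-schedule entry above does not single one option out. The starting checkpoint name is an assumption.

```python
# Hedged sketch: plug the reported hyperparameters into a HuggingFace setup.
# Only warmup is scheduled here because the decay schedule above is not pinned down.
import torch
from transformers import Adafactor, AutoModelForSeq2SeqLM, get_constant_schedule_with_warmup

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")  # assumed starting checkpoint

optimizer = Adafactor(
    model.parameters(),
    lr=1e-4,                # reported learning rate
    scale_parameter=False,  # use the fixed lr instead of Adafactor's relative steps
    relative_step=False,
    warmup_init=False,
)
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=1_000)

micro_batch_size = 8              # "micro-batch size of 8/4" above
gradient_accumulation_steps = 15  # as listed
max_length = 1024                 # context window
amp_dtype = torch.bfloat16        # bf16 precision
```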
- 🏃‍♀️Training
- model initialization:
mT5
- training strategies
- span-corruption language modeling (SCLM)
- pivot-based translation language modeling (PTLM)
- trained tokens/steps:
100k steps
- hardware: 32 NVIDIA A100 GPUs with 40GB memory
- training time: 4 weeks
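
To make the training entries concrete, the sketch below shows initialization from a public mT5 checkpoint and a single encoder-decoder update step; the checkpoint name `google/mt5-base` and the toy PL-NL pair are assumptions for illustration, and the real run repeats such steps for roughly 100k steps on 32 A100-40GB GPUs over about four weeks.

```python
# Hedged sketch: initialize from mT5 and run one seq2seq update step.
# The checkpoint name and the toy PL-NL pair are assumptions for illustration.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

src = tokenizer("def add(a, b): return a + b", return_tensors="pt",
                truncation=True, max_length=1024)
tgt = tokenizer("add two numbers", return_tensors="pt")

out = model(input_ids=src.input_ids,
            attention_mask=src.attention_mask,
            labels=tgt.input_ids)
out.loss.backward()  # one micro-batch; the full run repeats this for ~100k steps
```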