- 📙Paper: ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
- 📚Publisher:
arXiv
- 🏠Author Affiliation:
Baidu
- 🔑Public: ✅ (promise)
- 🌐Architecture
- Encoder-Decoder
- 📏Model Size
560M
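
As a quick check on the architecture and size entries above, the sketch below loads the released checkpoint with the generic seq2seq (encoder-decoder) class and counts its parameters. The HuggingFace model id `baidu/ernie-code-560m` is an assumption about the published checkpoint name, not something stated in this post.

```python
# Hedged sketch: verify the encoder-decoder architecture and ~560M parameter count.
# The model id "baidu/ernie-code-560m" is an assumed HuggingFace checkpoint name.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("baidu/ernie-code-560m")

n_params = sum(p.numel() for p in model.parameters())
print(type(model).__name__)                  # a T5/mT5-style encoder-decoder class
print(f"parameters: {n_params / 1e6:.0f}M")  # expected to be roughly 560M
```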
- 🗂️Data pre-processing
- Data Resource
- CodeSearchNet
- CC-100
- OPUS
- MultiUN
- IIT Bombay
- WikiMatrix
- De-duplication: ❌
- Filter Strategies
- /
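
For illustration, the snippet below pulls one of the PL-NL sources listed above (CodeSearchNet) through the HuggingFace `datasets` hub. The dataset id `code_search_net`, the `python` config, and the column names are assumptions about the hub packaging; this is not the authors' own data pipeline.

```python
# Hedged sketch: load CodeSearchNet (one of the listed data resources) from the
# HuggingFace hub. Dataset id, config, and column names are assumptions about
# the hub packaging, not the paper's ingestion pipeline.
from datasets import load_dataset

csn = load_dataset("code_search_net", "python", split="train")

sample = csn[0]
print(sample["func_code_string"][:120])           # the code side of a PL-NL pair
print(sample["func_documentation_string"][:120])  # the natural-language docstring
```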
- 🍉Tokenizer
- Technology
- SentencePiece
- Details
- We add a set of tokens representing whitespace indentation of different lengths in PL
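
A minimal sketch of the indentation-token idea described above, assuming an mT5 SentencePiece tokenizer as the starting point; the token strings (`<indent_2>`, `<tab>`, ...) are hypothetical placeholders, since the exact names are not given here.

```python
# Hedged sketch: extend an mT5 SentencePiece tokenizer with dedicated tokens for
# whitespace indentation. The token strings below are hypothetical placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")

indent_tokens = ["<indent_2>", "<indent_4>", "<indent_8>", "<tab>", "<newline>"]
tokenizer.add_special_tokens({"additional_special_tokens": indent_tokens})

def encode_code(source: str) -> list[int]:
    """Rewrite literal whitespace as the added tokens, then tokenize."""
    source = source.replace("\t", "<tab>").replace("\n", "<newline>")
    source = source.replace(" " * 8, "<indent_8>")
    source = source.replace(" " * 4, "<indent_4>").replace(" " * 2, "<indent_2>")
    return tokenizer.encode(source)

print(encode_code("def f(x):\n    return x + 1"))
# Note: the model's embedding matrix must be resized to match the enlarged vocabulary.
```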
- 🧪Hyperparameters (ERNIE-Code 560M)
- optimizer: Adafactor
- betas: /
- eps: /
- batch size: micro-batch size of 8/4
- context window:
1,024
- gradient accumulation steps:
15
- warmup steps:
1,000
- learning rate:
1e-4
- weight decay: /
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- precision floating point:
bf16
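
The values listed above can be wired into a standard setup as follows; this is only an illustration of the reported numbers (Adafactor, lr 1e-4, 1,000 warmup steps, gradient accumulation of 15, bf16), not the authors' training code, and only warmup is scheduled because the decay-schedule entry above does not single one option out. The starting checkpoint name is an assumption.

```python
# Hedged sketch: plug the reported hyperparameters into a HuggingFace setup.
# Only warmup is scheduled here because the decay schedule above is not pinned down.
import torch
from transformers import Adafactor, AutoModelForSeq2SeqLM, get_constant_schedule_with_warmup

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")  # assumed starting checkpoint

optimizer = Adafactor(
    model.parameters(),
    lr=1e-4,                # reported learning rate
    scale_parameter=False,  # use the fixed lr instead of Adafactor's relative steps
    relative_step=False,
    warmup_init=False,
)
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=1_000)

micro_batch_size = 8              # "micro-batch size of 8/4" above
gradient_accumulation_steps = 15  # as listed
max_length = 1024                 # context window
amp_dtype = torch.bfloat16        # bf16 precision
```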
- 🏃‍♀️Training
- model initialization:
mT5
- training strategies
- span-corruption language modeling (SCLM)
- pivot-based translation language modeling (PTLM)
- trained tokens/steps:
100k steps
- hardware: 32 NVIDIA A100 GPUs with 40GB memory
- training time: 4 weeks
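
To make the training entries concrete, the sketch below shows initialization from a public mT5 checkpoint and a single encoder-decoder update step; the checkpoint name `google/mt5-base` and the toy PL-NL pair are assumptions for illustration, and the real run repeats such steps for roughly 100k steps on 32 A100-40GB GPUs over about four weeks.

```python
# Hedged sketch: initialize from mT5 and run one seq2seq update step.
# The checkpoint name and the toy PL-NL pair are assumptions for illustration.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

src = tokenizer("def add(a, b): return a + b", return_tensors="pt",
                truncation=True, max_length=1024)
tgt = tokenizer("add two numbers", return_tensors="pt")

out = model(input_ids=src.input_ids,
            attention_mask=src.attention_mask,
            labels=tgt.input_ids)
out.loss.backward()  # one micro-batch; the full run repeats this for ~100k steps
```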