- 📙Paper: CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
- 📚Publisher: EMNLP
- 🏠Author Affiliation: Salesforce Research Asia
- 🔑Public: ✅
- 🌐Architecture
- Encoder-Decoder
- 📏Model Size: 60M; 220M; 770M
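Since the checkpoints are public, the released encoder-decoder models can be loaded directly with the Hugging Face `transformers` library. A minimal sketch, assuming the `Salesforce/codet5-base` (220M) checkpoint on the Hub; the masked-span prompt only illustrates the T5-style seq2seq interface.

```python
# Minimal sketch: load a released CodeT5 checkpoint (encoder-decoder, T5-style).
# Checkpoint name assumes the public Hugging Face Hub release of the 220M model.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# Ask the model to fill a masked span (<extra_id_0>) inside a Python snippet.
code = "def greet(user): print(f'hello <extra_id_0>!')"
inputs = tokenizer(code, return_tensors="pt")
outputs = model.generate(**inputs, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```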
- 🗂️Data pre-processing
- Data Resource
- CodeSearchNet
- BigQuery
- De-duplication: ❌
- Filter Strategies: /
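For reference, the CodeSearchNet half of the corpus is publicly available; a minimal sketch below, assuming the `code_search_net` dataset on the Hugging Face Hub (the BigQuery C/C# portion and the paper's exact pre-processing are not reproduced here).

```python
# Minimal sketch: pull the public CodeSearchNet corpus (one of the two listed
# data sources). Dataset and field names follow the Hugging Face Hub copy,
# not necessarily the authors' original snapshot.
from datasets import load_dataset

csn = load_dataset("code_search_net", "python", split="train")
sample = csn[0]
print(sample["func_code_string"][:200])     # function body (code)
print(sample["func_documentation_string"])  # paired docstring (natural language)
```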
- 🍉Tokenizer
- Technology
- Byte-level Byte-Pair-Encoding (BBPE)
- Details
- We train a byte-level BPE tokenizer. We allow tokens to extend across whitespace (excluding newline characters) so that common code idioms (e.g., `import numpy as np`) are represented as single tokens in the vocabulary.
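The detail above can be approximated with the Hugging Face `tokenizers` library; a minimal sketch, assuming a local corpus file `code_corpus.txt` and an illustrative vocabulary size and special-token set (neither is specified by the card). Splitting only on newlines lets BPE merges cross spaces, so idioms like `import numpy as np` can become single tokens.

```python
# Minimal sketch (not the authors' code): a byte-level BPE tokenizer whose
# tokens may span whitespace but never cross newline boundaries.
from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.Sequence([
    # Split only at newlines (keeping each newline with the preceding chunk),
    # so merges may span spaces/tabs but never continue onto the next line.
    pre_tokenizers.Split("\n", behavior="merged_with_previous"),
    # Map raw bytes to printable symbols, as in byte-level BPE.
    pre_tokenizers.ByteLevel(add_prefix_space=False, use_regex=False),
])
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                                  # illustrative size
    special_tokens=["<pad>", "<s>", "</s>", "<mask>"],  # illustrative set
)
tokenizer.train(files=["code_corpus.txt"], trainer=trainer)
tokenizer.save("code_bpe_tokenizer.json")
```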
- 🧪Hyperparameters (CodeT5 770M)
- optimizer: AdamW
- betas: /
- eps: /
- batch size: /
- context window: 2,048
- gradient accumulation steps: /
- warmup steps: 1,000
- learning rate: 2e-4
- weight decay: 0.05
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- precision floating point: fp16
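The listed settings translate into a standard PyTorch setup roughly as follows; a minimal sketch, assuming a linear decay schedule (one of the options above), a placeholder model, and an illustrative total step count, since betas, batch size, and total steps are not given by the card.

```python
# Minimal sketch (assumptions noted above): AdamW with the listed learning
# rate, weight decay, and warmup, plus fp16 mixed precision on a GPU.
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder for the real model
optimizer = AdamW(model.parameters(), lr=2e-4, weight_decay=0.05)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_000,      # warmup steps from the card
    num_training_steps=150_000,  # illustrative; total steps not specified
)
scaler = torch.cuda.amp.GradScaler()  # fp16 loss scaling

def train_step(batch: torch.Tensor) -> float:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                  # fp16 forward pass
        loss = (model(batch) - batch).pow(2).mean()  # stand-in for the real loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()
    return loss.item()
```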
- 🏃‍♀️Training
- model initialization: from scratch
- training strategies (see the sketch after this list)
- left-to-right
- fill-in-the-middle
- trained tokens/steps: /
- hardware: 16 A100 GPUs with 40G memory
- training time: 21 days
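The two training strategies listed above differ only in how each training example is laid out; the snippet below is an illustration with made-up sentinel tokens, not the paper's actual pre-training format.

```python
# Illustration only: how one code sample could be arranged under the two
# listed strategies. Sentinel token names (<PRE>, <SUF>, <MID>) are made up.
sample = "def add(a, b):\n    return a + b\n"

# Left-to-right: the model simply predicts the sample token by token, in order.
left_to_right = sample

# Fill-in-the-middle: a middle span is removed and appended at the end, so the
# model conditions on both the prefix and the suffix before generating it.
prefix, middle, suffix = sample.partition("return a + b")
fill_in_the_middle = f"<PRE>{prefix}<SUF>{suffix}<MID>{middle}"
print(fill_in_the_middle)
```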