- 📙Paper: CodeT5Mix: A Pretrained Mixture of Encoder-decoder Transformers for Code Understanding and Generation
- 📚Publisher:
arXiv
- 🏠Author Affiliation:
Anonymous
- 🔑Public: ✅ (promise)
- 🌐Architecture
- Encoder-Decoder
- Decoder-Only
- 📏Model Size
220M; 770M
- 🗂️Data pre-processing
- Data Resource
- CodeSearchNet
- CodeParrot
- De-duplication: ✅
- Filter Strategies
- keep only permissively licensed code
- keep only files with 50 to 2000 tokens
- remove files that overlap with CodeSearchNet and other downstream-task datasets, identified by checking their GitHub repositories (see the filtering sketch below)
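A minimal sketch of how these filters could be applied. The license whitelist, the whitespace-based token count, and the precomputed set of excluded GitHub repositories are all assumptions for illustration, not the authors' released pipeline.

```python
# Illustrative sketch of the three filters above; whitelist, token counting,
# and repo-overlap check are assumptions, not the authors' code.
PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause"}

def keep_file(content: str, license_id: str, repo: str, excluded_repos: set,
              count_tokens=lambda s: len(s.split())) -> bool:
    """Keep a file only if it passes the license, length, and overlap filters."""
    if license_id.lower() not in PERMISSIVE_LICENSES:
        return False                      # not permissively licensed
    if not 50 <= count_tokens(content) <= 2000:
        return False                      # outside the 50-2000 token range
    if repo in excluded_repos:
        return False                      # overlaps CodeSearchNet / downstream data
    return True

# Example: a file from a repository that also appears in CodeSearchNet is dropped.
snippet = "def add(a, b):\n    return a + b\n" * 20
print(keep_file(snippet, "mit", "user/repo", excluded_repos={"user/repo"}))  # False
```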
- 🍉Tokenizer
- Technology
- Byte-level Byte-Pair-Encoding (BBPE)
- SentencePiece
- Details
- CodeT5 tokenizer (see the loading example below)
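Since the model reuses the CodeT5 tokenizer (a byte-level BPE model), one way to inspect it is to load the released CodeT5-base tokenizer from the Hugging Face Hub; that checkpoint comes from the original CodeT5, not from this paper.

```python
# Inspect the CodeT5 byte-level BPE tokenizer via the public Hugging Face
# checkpoint (assumes the `transformers` library is installed).
from transformers import RobertaTokenizer

tok = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
pieces = tok.tokenize("def add(a, b):\n    return a + b")
print(pieces)
print(tok.convert_tokens_to_ids(pieces))
```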
- 🧪Hyperparameters (CodeT5Mix 770M)
- optimizer: AdamW
- betas: /
- eps: /
- batch size: /
- context window: /
- gradient accumulation steps: /
- warmup steps: /
- learning rate: /
- weight decay: 0.1
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- floating-point precision: fp16 (see the optimizer sketch below)
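A minimal PyTorch sketch of the settings reported above (AdamW, weight decay 0.1, fp16 mixed precision); the model, batch, and learning rate are placeholders because they are not specified here.

```python
# Sketch of the listed optimizer settings: AdamW with weight decay 0.1 and
# fp16 mixed precision via GradScaler. Model, batch, and lr are placeholders.
import torch

model = torch.nn.Linear(512, 512).cuda()                # placeholder model
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=2e-4,                   # assumed, not reported
                              weight_decay=0.1)          # as listed above
scaler = torch.cuda.amp.GradScaler()                     # fp16 loss scaling

batch = torch.randn(8, 512, device="cuda")               # placeholder batch
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(batch).pow(2).mean()                    # placeholder loss
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```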
- 🏃‍♀️Training
- model initialization: from scratch
- training strategies
- left-to-right
- fill-in-the-middle (see the FIM sketch after this section)
- trained tokens/steps: /
- hardware: 16 A100 GPUs with 40G memory
- training time: /
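As a companion to the training strategies above, here is a generic sketch of how a fill-in-the-middle example can be constructed from a document; the sentinel token names are illustrative and not taken from the paper.

```python
# Generic fill-in-the-middle (FIM) construction: split a document into
# prefix / middle / suffix, then rearrange with sentinel tokens so the model
# predicts the middle from both sides. Sentinel names are illustrative.
import random

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_fim_example(text: str, rng: random.Random) -> str:
    i, j = sorted(rng.sample(range(len(text) + 1), 2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    # Prefix-Suffix-Middle (PSM) ordering: context first, target middle last.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

print(make_fim_example("def add(a, b):\n    return a + b\n", random.Random(0)))
```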