- 📙Paper: CodeT5Mix: A Pretrained Mixture of Encoder-decoder Transformers for Code Understanding and Generation
- 📚Publisher:
arXiv
- 🏠Author Affiliation:
Anonymous
- 🔑Public: ✅ (promise)
- 🌐Architecture
- Encoder-Decoder
- Decoder-Only
- 📏Model Size
220M; 770M
- 🗂️Data pre-processing
- Data Resource
- CodeSearchNet
- CodeParrot
- De-duplication: ✅
- Filter Strategies
- keep only permissively licensed code
- keep only files with 50 to 2000 tokens
- remove files that overlap with CodeSearchNet and other downstream-task datasets, identified by checking their GitHub repositories (see the filtering sketch below)
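A minimal sketch of how these filters could be applied. The license whitelist, the whitespace-based token count, and the precomputed set of excluded GitHub repositories are all assumptions for illustration, not the authors' released pipeline.

```python
# Illustrative sketch of the three filters above; whitelist, token counting,
# and repo-overlap check are assumptions, not the authors' code.
PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause"}

def keep_file(content: str, license_id: str, repo: str, excluded_repos: set,
              count_tokens=lambda s: len(s.split())) -> bool:
    """Keep a file only if it passes the license, length, and overlap filters."""
    if license_id.lower() not in PERMISSIVE_LICENSES:
        return False                      # not permissively licensed
    if not 50 <= count_tokens(content) <= 2000:
        return False                      # outside the 50-2000 token range
    if repo in excluded_repos:
        return False                      # overlaps CodeSearchNet / downstream data
    return True

# Example: a file from a repository that also appears in CodeSearchNet is dropped.
snippet = "def add(a, b):\n    return a + b\n" * 20
print(keep_file(snippet, "mit", "user/repo", excluded_repos={"user/repo"}))  # False
```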
- 🍉Tokenizer
- Technology
- Byte-level Byte-Pair-Encoding (BBPE)
- SentencePiece
- Details
- CodeT5 tokenizer (see the loading example below)
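Since the model reuses the CodeT5 tokenizer (a byte-level BPE model), one way to inspect it is to load the released CodeT5-base tokenizer from the Hugging Face Hub; that checkpoint comes from the original CodeT5, not from this paper.

```python
# Inspect the CodeT5 byte-level BPE tokenizer via the public Hugging Face
# checkpoint (assumes the `transformers` library is installed).
from transformers import RobertaTokenizer

tok = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
pieces = tok.tokenize("def add(a, b):\n    return a + b")
print(pieces)
print(tok.convert_tokens_to_ids(pieces))
```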
- 🧪Hyperparameters (CodeT5Mix 770M)
- optimizer: AdamW
- betas: /
- eps: /
- batch size: /
- context window: /
- gradient accumulation steps: /
- warmup steps: /
- learning rate: /
- weight decay: 0.1
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- floating-point precision: fp16 (see the optimizer sketch below)
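A minimal PyTorch sketch of the settings reported above (AdamW, weight decay 0.1, fp16 mixed precision); the model, batch, and learning rate are placeholders because they are not specified here.

```python
# Sketch of the listed optimizer settings: AdamW with weight decay 0.1 and
# fp16 mixed precision via GradScaler. Model, batch, and lr are placeholders.
import torch

model = torch.nn.Linear(512, 512).cuda()                # placeholder model
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=2e-4,                   # assumed, not reported
                              weight_decay=0.1)          # as listed above
scaler = torch.cuda.amp.GradScaler()                     # fp16 loss scaling

batch = torch.randn(8, 512, device="cuda")               # placeholder batch
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(batch).pow(2).mean()                    # placeholder loss
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```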
- 🏃‍♀️Training
- model initialization: from scratch
- training strategies
- left-to-right
- fill-in-the-middle (see the FIM sketch after this section)
- trained tokens/steps: /
- hardware: 16 A100 GPUs with 40G memory
- training time: /
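As a companion to the training strategies above, here is a generic sketch of how a fill-in-the-middle example can be constructed from a document; the sentinel token names are illustrative and not taken from the paper.

```python
# Generic fill-in-the-middle (FIM) construction: split a document into
# prefix / middle / suffix, then rearrange with sentinel tokens so the model
# predicts the middle from both sides. Sentinel names are illustrative.
import random

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_fim_example(text: str, rng: random.Random) -> str:
    i, j = sorted(rng.sample(range(len(text) + 1), 2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    # Prefix-Suffix-Middle (PSM) ordering: context first, target middle last.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

print(make_fim_example("def add(a, b):\n    return a + b\n", random.Random(0)))
```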