
CodeT5Mix

  • 📙Paper: CodeT5Mix: A Pretrained Mixture of Encoder-decoder Transformers for Code Understanding and Generation
  • 📚Publisher: arXiv
  • 🏠Author Affiliation: Anonymous
  • 🔑Public: ✅ (promised)
  • 🌐Architecture
    • Encoder-Decoder
    • Decoder-Only
  • 📏Model Size
    • 220M; 770M
  • 🗂️Data pre-processing
    • Data Resource
      • CodeSearchNet
      • CodeParrot
    • De-duplication: ✅
    • Filter Strategies
      • keep only permissively licensed code
      • keep files with 50 to 2,000 tokens
      • remove files whose GitHub repositories overlap with CodeSearchNet or other downstream task datasets (see the sketch below)
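A rough sketch of how these filters could be chained over a raw corpus such as CodeParrot. The record fields (`license`, `content`, `repo_name`), the permissive-license set, and the overlap list are illustrative assumptions rather than details from the paper.

```python
PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-3-clause"}  # assumed license set

def keep_file(example: dict, tokenizer, overlap_repos: set) -> bool:
    """Return True if a raw code file passes the three filters above."""
    # 1. Preserve only permissively licensed code.
    if example["license"] not in PERMISSIVE_LICENSES:
        return False
    # 2. Keep files with 50 to 2,000 tokens.
    n_tokens = len(tokenizer.encode(example["content"]))
    if not 50 <= n_tokens <= 2000:
        return False
    # 3. Drop files whose GitHub repository overlaps with CodeSearchNet
    #    or other downstream task datasets.
    return example["repo_name"] not in overlap_repos
```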
  • 🍉Tokenizer
    • Technology
      • Byte-level Byte-Pair-Encoding (BBPE)
      • SentencePiece
    • Details
      • CodeT5 tokenizer
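The card reuses the CodeT5 tokenizer (byte-level BPE). Assuming the publicly released checkpoint on the Hugging Face Hub, it can be loaded and inspected as below; the checkpoint name is an assumption, not something stated in the card.

```python
from transformers import AutoTokenizer

# Assumed to be the released CodeT5 tokenizer ("Salesforce/codet5-base").
tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")

code = "def add(a, b):\n    return a + b"
ids = tok(code).input_ids
print(tok.convert_ids_to_tokens(ids))  # byte-level BPE pieces
```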
  • 🧪Hyperparameters (CodeT5Mix 770M)
    • optimizer: AdamW
      • betas: /
      • eps: /
    • batch size: /
    • context window: /
    • gradient accumulation steps: /
    • warmup steps: /
    • learning rate: /
    • weight decay: 0.1
    • decay schedule
      • Cosine
      • Linear
      • Polynomial
      • Inverse Square
    • floating-point precision: fp16 (setup sketch below)
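Most optimization details are left unreported ("/" above), but the pieces that are given map onto a standard PyTorch setup. A minimal sketch, assuming placeholder values for the unreported learning rate, betas, and warmup/total steps:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for the actual 770M encoder-decoder

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-4,             # assumption: learning rate is not reported
    betas=(0.9, 0.999),  # assumption: AdamW defaults
    weight_decay=0.1,    # reported above
)
# Cosine is one of the schedules listed above; step counts are assumed.
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=100_000
)
scaler = torch.cuda.amp.GradScaler()  # fp16 mixed-precision training
```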
  • 🏃‍♀️Training
    • model initialization: from scratch
    • training strategies
      • left-to-right
      • fill-in-the-middle (see the sketch after this list)
    • trained tokens/steps: /
    • hardware: 16 A100 GPUs with 40 GB memory
    • training time: /
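The two training strategies above can be illustrated with a small data transform that mixes plain left-to-right samples with fill-in-the-middle samples. The sentinel token names, split points, and 50/50 mix are assumptions for illustration; the card does not specify the exact format.

```python
import random

# Assumed sentinel token names; the actual special tokens may differ.
PREFIX, SUFFIX, MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_example(code: str, fim_prob: float = 0.5) -> str:
    """Emit a left-to-right sample or a fill-in-the-middle sample."""
    if random.random() >= fim_prob:
        return code  # plain left-to-right language modeling
    # Fill-in-the-middle: move a random middle span to the end so the model
    # predicts it conditioned on both the prefix and the suffix.
    i, j = sorted(random.sample(range(len(code) + 1), 2))
    return f"{PREFIX}{code[:i]}{SUFFIX}{code[j:]}{MIDDLE}{code[i:j]}"

print(make_example("def add(a, b):\n    return a + b", fim_prob=1.0))
```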
