
CodeT5

  • 📙Paper: CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
  • 📚Publisher: EMNLP
  • 🏠Author Affiliation: Salesforce Research Asia
  • 🔑Public: ✅
  • 🌐Architecture
    • Encoder-Decoder
  • 📏Model Size
    • 60M; 220M; 770M
  • 🗂️Data pre-processing
    • Data Resource
      • CodeSearchNet
      • BigQuery
    • De-duplication: ❌
    • Filter Strategies
      • /
  • 🍉Tokenizer
    • Technology
      • Byte-level Byte-Pair-Encoding (BBPE)
    • Details
      • We train a byte-level BPE tokenizer and allow tokens to extend across whitespace (excluding newline characters) so that common code idioms (e.g., import numpy as np) are represented as single tokens in the vocabulary (see the tokenizer sketch after this list).
  • 🧪Hyperparameters (CodeT5 770M)
    • optimizer: AdamW
      • betas: /
      • eps: /
    • batch size: /
    • context window: 2,048
    • gradient accumulation steps: /
    • warmup steps: 1,000
    • learning rate: 2e-4
    • weight decay: 0.05
    • decay schedule
      • Cosine
      • Linear
      • Polynomial
      • Inverse Square
    • floating-point precision: fp16 (an optimizer/scheduler configuration sketch follows this list)
  • 🏃‍♀️Training
    • model initialization: from scratch
    • training strategies
      • left-to-right
      • fill-in-the-middle (a generic FIM data-transformation sketch follows this list)
    • trained tokens/steps: /
    • hardware: 16 NVIDIA A100 GPUs with 40 GB memory
    • training time: 21 days
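
Tokenizer sketch. The whitespace-spanning behaviour described under Tokenizer → Details can be approximated with the Hugging Face tokenizers library: split only on newlines before the byte-level mapping, so that learned merges may cross spaces within a line but never cross lines. This is a minimal sketch under stated assumptions, not CodeT5's released tokenizer-training script; the corpus file name, vocabulary size, and special tokens are illustrative placeholders.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers

# Byte-level BPE model with no merges yet.
tokenizer = Tokenizer(models.BPE())

# Pre-tokenize by isolating newlines only, then map bytes without the usual
# GPT-2 regex split, so merges can span spaces inside a line
# (e.g. "import numpy as np") but never cross line boundaries.
tokenizer.pre_tokenizer = pre_tokenizers.Sequence([
    pre_tokenizers.Split(pattern="\n", behavior="isolated"),
    pre_tokenizers.ByteLevel(add_prefix_space=False, use_regex=False),
])
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,  # assumed size, not taken from the paper
    special_tokens=["<pad>", "<s>", "</s>", "<unk>", "<mask>"],
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)

# "code_corpus.txt" is a hypothetical plain-text dump of the training code.
tokenizer.train(["code_corpus.txt"], trainer)
tokenizer.save("code_bpe_tokenizer.json")

print(tokenizer.encode("import numpy as np").tokens)
```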
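
Hyperparameter sketch. The values reported above (AdamW, peak learning rate 2e-4, 1,000 warmup steps, weight decay 0.05, fp16) can be wired together as in the following PyTorch sketch. The model, batch, and total step count are placeholders (they are not reported on this card), betas/eps fall back to PyTorch defaults since they are not reported, a linear decay schedule is assumed purely for illustration because the card lists several options, and a CUDA GPU is assumed for the fp16 path.

```python
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Tiny stand-in model; a real run would load CodeT5-large (770M).
model = torch.nn.Linear(1024, 1024).cuda()
total_steps = 100_000  # assumed; total steps are not reported on the card

optimizer = AdamW(
    model.parameters(),
    lr=2e-4,            # peak learning rate (from the card)
    weight_decay=0.05,  # from the card
    # betas / eps are not reported; PyTorch defaults apply
)

# 1,000 warmup steps (from the card); linear decay assumed for illustration.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=total_steps
)

scaler = torch.cuda.amp.GradScaler()  # fp16 mixed precision (from the card)

# One illustrative training step on a dummy batch and dummy loss.
x = torch.randn(8, 1024, device="cuda")
with torch.cuda.amp.autocast():       # forward pass in fp16
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
scheduler.step()
optimizer.zero_grad()
```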
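
Fill-in-the-middle sketch. The card lists fill-in-the-middle (FIM) among the training strategies; purely as a generic illustration of that strategy (not taken from CodeT5's released training code), a training document can be split into a prefix, middle, and suffix and rearranged with sentinel tokens so that a left-to-right model learns to infill. The sentinel token strings below are hypothetical.

```python
import random

# Hypothetical sentinel tokens; the actual strings are model-specific.
PREFIX_TOK, MIDDLE_TOK, SUFFIX_TOK = "<fim_prefix>", "<fim_middle>", "<fim_suffix>"

def apply_fim(document: str, fim_rate: float = 0.5, rng=random) -> str:
    """Rearrange a document for fill-in-the-middle training.

    With probability `fim_rate`, split the text at two random positions into
    (prefix, middle, suffix) and emit
    prefix-sentinel + prefix + suffix-sentinel + suffix + middle-sentinel + middle,
    so a left-to-right model is trained to generate the middle given both sides.
    Otherwise the document is returned unchanged (plain left-to-right example).
    """
    if rng.random() >= fim_rate or len(document) < 3:
        return document
    i, j = sorted(rng.sample(range(len(document)), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    return f"{PREFIX_TOK}{prefix}{SUFFIX_TOK}{suffix}{MIDDLE_TOK}{middle}"

example = "def add(a, b):\n    return a + b\n"
print(apply_fim(example, fim_rate=1.0))
```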
