
ERNIE-Code

  • 📙Paper: ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
  • 📚Publisher: arXiv
  • 🏠Author Affiliation: Baidu
  • 🔑Public: ✅ (promised)
  • 🌐Architecture
    • Encoder-Decoder (see the loading sketch below)
  • 📏Model Size
    • 560M
  • 🗂️Data pre-processing
    • Data Resource
      • CodeSearchNet
      • CC-100
      • OPUS
      • MultiUN
      • IIT Bombay
      • WikiMatrix
    • De-duplication: ❌
    • Filter Strategies
      • /
  • 🍉Tokenizer
    • Technology
      • SentencePiece (mT5 vocabulary)
    • Details
      • "We add a set of tokens representing whitespace indentation of different lengths in PL" (see the tokenizer sketch below)
  • 🧪Hyperparameters (ERNIE-Code 560M)
    • optimizer: Adafactor (see the optimizer sketch below)
      • betas: /
      • eps: /
    • batch size: micro-batch size of 8/4
    • context window: 1,024
    • gradient accumulation steps: 15 (see the accumulation sketch below)
    • warmup steps: 1,000
    • learning rate: 1e-4
    • weight decay: /
    • decay schedule
      • Cosine
      • Linear
      • Polynomial
      • Inverse Square
    • floating-point precision: bf16
  • 🏃‍♀️Training
    • model initialization: mT5
    • training strategies
      • span-corruption language modeling (SCLM)
      • pivot-based translation language modeling (PTLM)
    • trained tokens/steps: 100k steps
    • hardware: 32 NVIDIA A100 GPUs with 40 GB memory
    • training time: 4 weeks
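
A minimal loading sketch for the encoder-decoder setup above, using Hugging Face Transformers. The checkpoint id and the task prefix are assumptions, not details from the paper; substitute whatever the official release ships.

```python
# Minimal sketch: ERNIE-Code as a T5-style (mT5-initialized) encoder-decoder.
# The checkpoint id below is an assumption; use the official release id.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "baidu/ernie-code-560m"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Seq2seq generation; the task prefix is illustrative, not the paper's format.
inputs = tokenizer("translate Python to English: print('hello world')",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```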
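
The tokenizer sketch: one way to picture the added whitespace tokens is extending the mT5 SentencePiece vocabulary with dedicated indentation strings, so code indented by 4 or 8 spaces is not shredded into single-space pieces. The widths and the `add_tokens` route are illustrative assumptions; the paper only states that such tokens were added.

```python
# Sketch: add indentation tokens of several widths to an mT5 tokenizer.
# The chosen widths (2/4/8/16 spaces) are assumptions, not the paper's list.
from transformers import AutoTokenizer
from tokenizers import AddedToken

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
indent_tokens = [
    AddedToken(" " * n, lstrip=False, rstrip=False, normalized=False)
    for n in (2, 4, 8, 16)
]
num_added = tokenizer.add_tokens(indent_tokens)
print(f"added {num_added} indentation tokens")

# A model tied to this tokenizer must then grow its embedding matrix:
# model.resize_token_embeddings(len(tokenizer))
```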
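
The optimizer sketch: reading the truncated "AdaFa" as Adafactor (the usual choice for T5/mT5-style training), the card's learning rate and warmup could be wired up as below. The inverse-square-root schedule is one of the card's candidate schedules and the T5 convention, but the paper's exact pick is not confirmed here.

```python
# Sketch of the listed optimization setup; "AdaFa" is read as Adafactor.
from transformers import (AutoModelForSeq2SeqLM, Adafactor,
                          get_inverse_sqrt_schedule)

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")  # stand-in
optimizer = Adafactor(
    model.parameters(),
    lr=1e-4,                # learning rate from the card
    scale_parameter=False,  # required when an explicit lr is given
    relative_step=False,
    warmup_init=False,
)
scheduler = get_inverse_sqrt_schedule(optimizer, num_warmup_steps=1_000)
```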
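
The accumulation sketch: with a micro-batch of 8 and 15 accumulation steps, each optimizer step aggregates 8 × 15 = 120 sequences per GPU, or 3,840 across the 32 GPUs. A bare-bones loop, reusing `model`, `optimizer`, and `scheduler` from the sketch above; `dataloader` is a hypothetical iterator over tokenized batches.

```python
# Gradient accumulation matching the card's numbers: 15 micro-batches of 8
# per optimizer step. `dataloader` is hypothetical and yields tensor dicts.
accum_steps = 15
optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss / accum_steps  # average over the accumulation
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```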
This post is licensed under CC BY 4.0 by the author.
