CodeParrot

  • 📙Paper: CodeParrot
  • 📚Publisher: other
  • 🏠Author Affiliation: Hugging Face
  • 🔑Public: ✅
  • 🌐Architecture
    • Decoder-Only
  • 📏Model Size
    • 110M; 1.5B
  • 🗂️Data pre-processing
    • Data Resource
      • CodeParrot dataset
    • De-duplication: ✅
    • Filter Strategies (a sketch of these heuristics follows this list)
      • file size > 1 MB
      • max line length > 1000
      • mean line length > 100
      • fraction of alphanumeric characters < 0.25
      • contains the word “auto-generated” or similar in the first 5 lines
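A minimal sketch of these heuristics in Python, assuming each example is a dict with a `content` field holding the raw file text; the function name and threshold argument are illustrative, not CodeParrot's actual implementation:

```python
def passes_filters(example, max_size_bytes=1_000_000):
    """Return True if a source file survives all of the filters above."""
    content = example["content"]
    lines = content.splitlines()
    # file size > 1 MB
    if len(content.encode("utf-8")) > max_size_bytes:
        return False
    # max line length > 1000 / mean line length > 100
    line_lengths = [len(line) for line in lines]
    if max(line_lengths, default=0) > 1000:
        return False
    if lines and sum(line_lengths) / len(lines) > 100:
        return False
    # fraction of alphanumeric characters < 0.25
    if content and sum(ch.isalnum() for ch in content) / len(content) < 0.25:
        return False
    # the word "auto-generated" (or similar) in the first 5 lines
    header = "\n".join(lines[:5]).lower()
    if "auto-generated" in header or "autogenerated" in header:
        return False
    return True
```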
  • 🍉Tokenizer
    • Technology
      • Byte-level Byte-Pair-Encoding (BBPE)
    • Details
      • Trained a GPT-2 tokenizer from scratch on the training split (sketched below)
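A minimal sketch of training a fresh GPT-2-style tokenizer on the training split with 🤗 Transformers' `train_new_from_iterator`; the dataset name and vocabulary size here are assumptions for illustration, not confirmed CodeParrot settings:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream the training split so the corpus never has to fit in memory
# (dataset name is an assumption).
dataset = load_dataset("codeparrot/codeparrot-clean-train",
                       split="train", streaming=True)

def batch_iterator(batch_size=1000):
    batch = []
    for example in dataset:
        batch.append(example["content"])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Reuse GPT-2's byte-level BPE pipeline, but learn new merges on code
base_tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer = base_tokenizer.train_new_from_iterator(
    batch_iterator(), vocab_size=32_768)  # vocab size is an assumption
tokenizer.save_pretrained("codeparrot-tokenizer")
```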
  • 🧪Hyperparameters (CodeParrot 1.5B; wired together in the sketch after this list)
    • optimizer: AdamW
      • betas: 0.9, 0.999
      • eps: 1e-8
    • batch size: 512 sequences (≈524K tokens)
    • context window: 1,024
    • gradient accumulation steps: 16
    • warmup steps: 750
    • learning rate: 5e-5
    • weight decay: 0.1
    • decay schedule
      • Cosine
    • floating-point precision: /
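A minimal sketch wiring these settings into PyTorch's AdamW and 🤗 Transformers' cosine schedule with warmup; `gpt2-xl` is only a ~1.5B-parameter stand-in for the actual model:

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained("gpt2-xl")  # ~1.5B-parameter stand-in

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.1,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=750,
    num_training_steps=30_000,
)
```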
  • 🏃‍♀️Training
    • model initialization: from scratch
    • training strategies (a minimal loop is sketched after this section)
      • left-to-right
    • trained tokens/steps: 30K steps or 41B tokens
    • hardware: 16 x A100 (40GB)
    • training time: /
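Continuing the sketch above, a minimal left-to-right (causal LM) training loop with 16-step gradient accumulation; a `dataloader` yielding batches of `input_ids` is assumed:

```python
model.train()
accumulation_steps = 16

for step, batch in enumerate(dataloader):
    # Left-to-right objective: the model shifts labels internally,
    # so passing input_ids as labels yields the causal LM loss.
    outputs = model(input_ids=batch["input_ids"],
                    labels=batch["input_ids"])
    (outputs.loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```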