
Codex

  • 📙Paper: Evaluating Large Language Models Trained on Code
  • 📚Publisher: arXiv
  • 🏠Author Affiliation: OpenAI
  • 🔑Public: ❌
  • 🌐Architecture
    • Decoder-Only
  • 📏Model Size
    • 12M; 25M; 42M; 85M; 300M; 679M; 2.5B; 12B
  • 🗂️Data pre-processing
    • Data Resource
      • Collected in May 2020 from 54 million public software repositories hosted on GitHub, containing 179 GB of unique Python files under 1 MB (before filtering).
    • De-duplication: ✅
    • Filter Strategies
      • average line length greater than 100
      • maximum line length greater than 1000
      • contain a small percentage of alphanumeric characters (a filtering sketch follows this list)
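
A minimal sketch of how these per-file filters could be applied is below. The two length thresholds come from the list above; the paper does not quantify what counts as a "small percentage of alphanumeric characters", so the 25% cutoff here is a hypothetical placeholder.

```python
# Sketch of the Codex-style file filters listed above. The length thresholds
# (100 / 1000) are from the paper; the 0.25 alphanumeric cutoff is a
# hypothetical placeholder, not a documented value.
def keep_python_file(source: str, min_alnum_fraction: float = 0.25) -> bool:
    """Return True if a file passes the quality filters."""
    lines = source.splitlines()
    if not lines:
        return False

    avg_line_len = sum(len(line) for line in lines) / len(lines)
    max_line_len = max(len(line) for line in lines)
    alnum_fraction = sum(ch.isalnum() for ch in source) / max(len(source), 1)

    return (
        avg_line_len <= 100                       # drop likely auto-generated files
        and max_line_len <= 1000                  # drop data blobs / minified code
        and alnum_fraction >= min_alnum_fraction  # drop mostly non-alphanumeric files
    )


print(keep_python_file("x = 1\nprint(x)\n"))  # True
print(keep_python_file("#" * 2000))           # False: maximum line length exceeded
```
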
  • 🍉Tokenizer
    • Technology
      • Byte-level Byte-Pair-Encoding (BBPE)
    • Details
      • GPT-3 tokenizer plus an additional set of tokens for representing whitespace runs of different lengths (see the tokenization example below)
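
Because Python source is indentation-heavy, the dedicated whitespace-run tokens shorten encoded sequences noticeably; the paper reports roughly 30% fewer tokens on its corpus. The comparison below uses the tiktoken package, where "r50k_base" corresponds to the GPT-3 BPE and "p50k_base" to the Codex-era encoding with the extra whitespace tokens; the package and encoding names are an assumption of this sketch, not something the paper specifies.

```python
# Compare token counts for indented Python code under the GPT-3 encoding and
# the Codex-era encoding that adds whitespace-run tokens.
# Assumes the `tiktoken` package is installed.
import tiktoken

snippet = (
    "def fib(n):\n"
    "    if n < 2:\n"
    "        return n\n"
    "    return fib(n - 1) + fib(n - 2)\n"
)

gpt3_enc = tiktoken.get_encoding("r50k_base")   # GPT-3 byte-level BPE
codex_enc = tiktoken.get_encoding("p50k_base")  # adds whitespace-run tokens

print("GPT-3 tokens:", len(gpt3_enc.encode(snippet)))
print("Codex tokens:", len(codex_enc.encode(snippet)))
```
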
  • 🧪Hyperparameters (Codex 12B)
    • optimizer: Adam
      • betas: 0.9, 0.95
      • eps: 1e-8
    • batch size: 2M
    • context window: 4,096
    • gradient accumulation steps: /
    • warmup steps: 175
    • learning rate: 1e-4
    • weight decay: 0.1
    • decay schedule
      • Cosine (a configuration sketch follows this list)
    • precision floating point: /
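
A minimal PyTorch sketch of these settings is below: decoupled (AdamW-style) weight decay of 0.1, a learning rate of 1e-4, a 175-step linear warmup, and cosine decay, with the Adam betas (0.9, 0.95) and eps of 1e-8 reported in the paper. The tiny stand-in model and the total-step horizon are placeholders for illustration, not values from the paper.

```python
# Sketch of the listed optimizer/schedule settings in PyTorch. The stand-in
# model and `total_steps` are placeholders; lr, betas, eps, weight decay,
# warmup, and the cosine decay follow the hyperparameters listed above.
import math
import torch

model = torch.nn.Linear(128, 128)   # placeholder for the 12B-parameter model
total_steps = 10_000                # placeholder training horizon
warmup_steps = 175

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.95),
    eps=1e-8,
    weight_decay=0.1,
)

def lr_lambda(step: int) -> float:
    """Linear warmup for `warmup_steps`, then cosine decay toward zero."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(3):               # a few dummy optimization steps
    optimizer.zero_grad()
    loss = model(torch.randn(4, 128)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr()[0])
```
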
  • 🏃‍♀️Training
    • model initialization: GPT-3
    • training strategies
      • left-to-right (see the sketch after this section)
    • trained tokens/steps: 100B tokens
    • hardware: V100
    • training time: /
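
Left-to-right training here is the standard causal language-modeling objective: each position in a tokenized code file predicts the next token. The sketch below shows one such step; the toy embedding-plus-linear model is only a placeholder for the GPT-3-initialized Transformer, and the vocabulary and batch sizes are arbitrary.

```python
# Sketch of one left-to-right (next-token prediction) training step on code
# tokens. The embedding + linear "model" is a placeholder for the actual
# GPT-initialized Transformer; sizes here are arbitrary.
import torch
import torch.nn.functional as F

vocab_size, d_model = 50_000, 64
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (2, 16))   # [batch, sequence] of code tokens

inputs, targets = tokens[:, :-1], tokens[:, 1:]  # next-token targets: inputs shifted by one
logits = lm_head(embed(inputs))                  # [batch, seq-1, vocab]

loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
print("next-token loss:", loss.item())
```
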
