- 📙Paper: CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
- 📚Publisher:
NeurIPS
- 🏠Author Affiliation:
Microsoft
- 🔑Public: ✅
- 🌐Architecture
- Decoder-Only
- 📏Model Size
124M
- 🗂️Data pre-processing
- Data Resource
- Monolingual models are pretrained on the Python and Java corpora from the CodeSearchNet dataset (see the loading sketch after this list).
- De-duplication: ❌
- Filter Strategies
- /
- 🍉Tokenizer
- Technology
- Byte-level Byte-Pair-Encoding (BBPE)
- SentencePiece
- Details
- Option 1: a vocabulary newly learned on the code corpus (the from-scratch CodeGPT; see the sketch below)
- Option 2: the original GPT-2 tokenizer (CodeGPT-adapted)
- 🧪Hyperparameters (CodeGPT 124M; a PyTorch sketch follows this list)
- optimizer: Adam
- betas: /
- eps: /
- batch size:
32
- context window:
768
- gradient accumulation steps: /
- warmup steps: /
- learning rate:
5e-5
- weight decay: /
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- precision floating point: /
- 🏃‍♀️Training
- model initialization: GPT-2
- training strategies
- left-to-right (see the loss sketch after this section)
- trained tokens/steps: /
- hardware: 2 P100 GPUs
- training time: /