- 📙Paper: CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
- 📚Publisher:
NeurIPS
- 🏠Author Affiliation:
Microsoft
- 🔑Public: ✅
- 🌐Architecture
- Decoder-Only
- 📏Model Size
124M
- 🗂️Data pre-processing
- Data Resource
- Monolingual models are pretrained on the Python and Java corpora from the CodeSearchNet dataset (see the loading sketch after this list).
- De-duplication: ❌
- Filter Strategies
- /
- 🍉Tokenizer
- Technology
- Byte-level Byte-Pair-Encoding (BBPE)
- SentencePiece
- Details
- Option 1: a vocabulary newly learned on the code corpus (the from-scratch CodeGPT; see the sketch below)
- Option 2: the original GPT-2 tokenizer (CodeGPT-adapted)
- 🧪Hyperparameters (CodeGPT 124M; a PyTorch sketch follows this list)
- optimizer: Adam
- betas: /
- eps: /
- batch size:
32
- context window:
768
- gradient accumulation steps: /
- warmup steps: /
- learning rate:
5e-5
- weight decay: /
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- precision floating point: /
- 🏃‍♀️Training
- model initialization: GPT-2
- training strategies
- left-to-right (see the loss sketch after this section)
- trained tokens/steps: /
- hardware: 2 P100 GPUs
- training time: /