- 📙Paper: CodeParrot
- 📚Publisher: other
- 🏠Author Affiliation: huggingface
- 🔑Public: ✅
- 🌐Architecture: Decoder-Only
- 📏Model Size: 110M, 1.5B
- 🗂️Data pre-processing
- Data Resource
- CodeParrot dataset
- De-duplication: ✅
- Filter Strategies (files matching any of the following are removed; see the sketch after this list)
- files larger than 1 MB
- files with a maximum line length > 1000 characters
- files with a mean line length > 100 characters
- files with a fraction of alphanumeric characters < 0.25
- files containing the keyword “auto-generated” or similar in the first 5 lines
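A minimal sketch of this pre-processing step, assuming the corpus is an iterable of raw source-file strings; the helper names and the exact-match (hash-based) de-duplication are illustrative, not the authors' exact implementation:

```python
import hashlib

def passes_filters(content: str) -> bool:
    """Return True if a source file survives the CodeParrot-style heuristics."""
    lines = content.splitlines()
    if not lines:
        return False
    # Drop files larger than 1 MB.
    if len(content.encode("utf-8")) > 1_000_000:
        return False
    # Drop files with a max line length > 1000 or a mean line length > 100 characters.
    lengths = [len(line) for line in lines]
    if max(lengths) > 1000 or sum(lengths) / len(lengths) > 100:
        return False
    # Drop files whose fraction of alphanumeric characters is below 0.25.
    if sum(ch.isalnum() for ch in content) / len(content) < 0.25:
        return False
    # Drop files that look auto-generated (keyword in the first 5 lines).
    if any("auto-generated" in line.lower() for line in lines[:5]):
        return False
    return True

def deduplicate(files):
    """Exact de-duplication by hashing whole file contents (illustrative)."""
    seen, unique = set(), []
    for content in files:
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(content)
    return unique
```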
- 🍉Tokenizer
- Technology: Byte-level Byte-Pair-Encoding (BBPE)
- Details
- Trained GPT-2 tokenizer on the training split
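A hedged sketch of this step with the 🤗 Transformers `train_new_from_iterator` API: it starts from GPT-2's byte-level BPE tokenizer and learns a new vocabulary on the CodeParrot training split. The streaming batch size and the 32,768-token vocabulary below are assumptions of this sketch.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream the training split so the full corpus never has to fit in memory.
dataset = load_dataset("codeparrot/codeparrot-clean-train", split="train", streaming=True)

def batch_iterator(batch_size=1000):
    batch = []
    for example in dataset:
        batch.append(example["content"])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Retrain GPT-2's byte-level BPE on code (the vocab size is an assumption here).
base_tokenizer = AutoTokenizer.from_pretrained("gpt2")
code_tokenizer = base_tokenizer.train_new_from_iterator(batch_iterator(), vocab_size=32_768)
code_tokenizer.save_pretrained("codeparrot-tokenizer")
```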
- 🧪Hyperparameters (CodeParrot 1.5B)
- optimizer: AdamW
- betas: 0.9, 0.999
- eps: 1e-8
- batch size: 512 sequences (512 × 1,024 ≈ 524K tokens per batch)
- context window: 1,024
- gradient accumulation steps: 16
- warmup steps: 750
- learning rate: 5e-5
- weight decay: 0.1
- decay schedule: Cosine
- precision (floating point): /
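These settings map directly onto PyTorch's `AdamW` and the 🤗 Transformers `get_scheduler` helper. The sketch below wires them together, using a small GPT-2 config as a stand-in for the 1.5B model and the 30K-step budget from the Training section as the schedule horizon:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, get_scheduler

# Stand-in model so the sketch runs; the real run uses a GPT-2 XL-sized (1.5B) config.
model = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained("gpt2"))

# AdamW with the betas, eps and weight decay listed above.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.1,
)

# Cosine decay with 750 warmup steps over the 30K-step run.
lr_scheduler = get_scheduler(
    name="cosine",
    optimizer=optimizer,
    num_warmup_steps=750,
    num_training_steps=30_000,
)

# Effective batch: 512 sequences × 1,024-token context ≈ 524K tokens per optimizer
# step, reached by accumulating gradients over 16 micro-batches.
gradient_accumulation_steps = 16
```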
- 🏃♀️Training
- model initialization: from scratch
- training strategies: left-to-right
- trained tokens/steps: 30K steps or 41B tokens
- hardware: 16 x A100 (40GB)
- training time: /
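A minimal sketch of the from-scratch, left-to-right (causal language modeling) training loop under the settings above; the small GPT-2 config and the random-token dataloader are self-contained stand-ins for the real 1.5B configuration and the tokenized CodeParrot data:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# From-scratch initialization: a GPT-2-style config with randomly initialized weights.
# The real 1.5B model corresponds to a GPT-2 XL-sized config; "gpt2" keeps the sketch light.
config = AutoConfig.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_config(config)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.1)
gradient_accumulation_steps = 16

# Stand-in for the real dataloader of fixed-length, 1,024-token sequences.
train_loader = [torch.randint(0, config.vocab_size, (2, 128)) for _ in range(32)]

model.train()
for step, batch in enumerate(train_loader):
    # Left-to-right objective: labels equal the inputs; the model shifts them internally.
    outputs = model(input_ids=batch, labels=batch)
    loss = outputs.loss / gradient_accumulation_steps
    loss.backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```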