- 📙Paper: Unified Pre-training for Program Understanding and Generation
- 📚Publisher: NAACL
- 🏠Author Affiliation: University of California, Los Angeles; Columbia University
- 🔑Public: ✅
- 🌐Architecture
- Encoder-Decoder (see the loading sketch below)
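PLBART follows a BART-style encoder-decoder layout. A minimal loading sketch, assuming the Hugging Face `transformers` PLBart classes and the `uclanlp/plbart-base` checkpoint (neither is named in this post):

```python
# Assumption: transformers' PLBart classes and the uclanlp/plbart-base checkpoint.
from transformers import PLBartTokenizer, PLBartForConditionalGeneration

tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-base")
model = PLBartForConditionalGeneration.from_pretrained("uclanlp/plbart-base")

# Encoder-decoder usage: the encoder reads the input sequence,
# the decoder generates the output sequence token by token.
inputs = tokenizer("def add(a, b): return a + b", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```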
- 📏Model Size: 140M; 406M
- 🗂️Data pre-processing
- Data Resource
- We download all the GitHub repositories associated with the Java and Python languages available on Google BigQuery (see the query sketch after this list)
- We collect StackOverflow posts by downloading the data dump from Stack Exchange
- De-duplication: ❌
- Filter Strategies
- /
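A minimal sketch of pulling Java and Python source files from the public GitHub dataset on Google BigQuery, assuming the `google-cloud-bigquery` client and the `bigquery-public-data.github_repos` tables; the paper's exact query is not given in this post:

```python
# Assumption: google-cloud-bigquery is installed and credentials are configured.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT f.repo_name, f.path, c.content
FROM `bigquery-public-data.github_repos.files` AS f
JOIN `bigquery-public-data.github_repos.contents` AS c
  ON f.id = c.id
WHERE f.path LIKE '%.py' OR f.path LIKE '%.java'
LIMIT 1000
"""
for row in client.query(sql).result():
    # content can be NULL for binary or oversized files
    print(row.repo_name, row.path, len(row.content or ""))
```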
- 🍉Tokenizer
- Technology
- SentencePiece (see the tokenizer sketch below)
- Details
- /
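A minimal sketch of learning and applying a SentencePiece model on source code, assuming the `sentencepiece` Python package; the corpus file name and the 50,000 vocabulary size are illustrative, not quoted from this post:

```python
# Assumption: sentencepiece is installed; code_corpus.txt is a hypothetical file
# with one code/text sample per line; vocab_size is illustrative.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="code_corpus.txt",
    model_prefix="plbart_sp",   # writes plbart_sp.model and plbart_sp.vocab
    vocab_size=50000,
)

sp = spm.SentencePieceProcessor(model_file="plbart_sp.model")
print(sp.encode("def add(a, b): return a + b", out_type=str))
```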
- 🧪Hyperparameters (PLBART 406M)
- optimizer: Adam (see the configuration sketch after this list)
- betas: /, 0.98
- eps: 1e-6
- batch size: /
- context window: 768
- gradient accumulation steps: /
- warmup steps: /
- learning rate: 5e-5
- weight decay: /
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- floating-point precision: fp16
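A minimal PyTorch sketch of the optimizer settings listed above; β1 = 0.9 and the use of a gradient scaler for fp16 are assumptions, since this post only reports β2 = 0.98, eps = 1e-6, lr = 5e-5, and fp16 precision:

```python
# Assumptions: beta1 = 0.9 (not reported above) and torch.cuda.amp for fp16.
import torch

model = torch.nn.Linear(768, 768)  # placeholder module standing in for PLBART
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-5,            # learning rate reported above
    betas=(0.9, 0.98),  # beta1 is an assumed default; beta2 = 0.98 reported above
    eps=1e-6,           # reported above
)
scaler = torch.cuda.amp.GradScaler()  # loss scaling for fp16 mixed-precision training
```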
- 🏃♀️Training
- model initialization: /
- training strategies
- denoising autoencoding (token masking, token deletion, token infilling; see the noising sketch below)
- trained tokens/steps: /
- hardware: 8 Nvidia GeForce RTX 2080 Ti GPUs
- training time: /
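A minimal sketch of BART-style token infilling, one of the denoising strategies listed above: random spans are replaced by a single mask token and the model learns to reconstruct the original sequence. The 35% masking ratio and Poisson(3.5) span lengths follow BART and are assumptions here, not values quoted from this post:

```python
# Assumptions: mask_ratio and poisson_lambda follow BART's defaults; this is a
# simplified noiser (later spans may overlap previously masked positions).
import numpy as np

def token_infilling(tokens, mask_ratio=0.35, poisson_lambda=3.5, mask_token="<mask>"):
    """Replace random spans of tokens with a single mask token (BART-style)."""
    tokens = list(tokens)
    num_to_mask = int(round(len(tokens) * mask_ratio))
    masked = 0
    while masked < num_to_mask and len(tokens) > 1:
        span = max(1, int(np.random.poisson(poisson_lambda)))
        span = min(span, num_to_mask - masked, len(tokens) - 1)
        start = np.random.randint(0, len(tokens) - span + 1)
        tokens[start:start + span] = [mask_token]  # whole span -> one mask token
        masked += span
    return tokens

source = "def add ( a , b ) : return a + b".split()
print(token_infilling(source))  # noised encoder input
print(" ".join(source))         # reconstruction target for the decoder
```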