- 📙Paper: Unified Pre-training for Program Understanding and Generation
- 📚Publisher: NAACL
- 🏠Author Affiliation: University of California, Los Angeles; Columbia University
- 🔑Public: ✅
- 🌐Architecture
- Encoder-Decoder (see the loading sketch below)
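PLBART follows a BART-style encoder-decoder layout. A minimal loading sketch, assuming the Hugging Face `transformers` PLBart classes and the `uclanlp/plbart-base` checkpoint (neither is named in this post):

```python
# Assumption: transformers' PLBart classes and the uclanlp/plbart-base checkpoint.
from transformers import PLBartTokenizer, PLBartForConditionalGeneration

tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-base")
model = PLBartForConditionalGeneration.from_pretrained("uclanlp/plbart-base")

# Encoder-decoder usage: the encoder reads the input sequence,
# the decoder generates the output sequence token by token.
inputs = tokenizer("def add(a, b): return a + b", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```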
- 📏Model Size: 140M; 406M
- 🗂️Data pre-processing
- Data Resource
- We download all the GitHub repositories associated with the Java and Python languages available on Google BigQuery (see the query sketch after this list)
- We collect StackOverflow posts by downloading the data dump from Stack Exchange
- De-duplication: ❌
- Filter Strategies
- /
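A minimal sketch of pulling Java and Python source files from the public GitHub dataset on Google BigQuery, assuming the `google-cloud-bigquery` client and the `bigquery-public-data.github_repos` tables; the paper's exact query is not given in this post:

```python
# Assumption: google-cloud-bigquery is installed and credentials are configured.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT f.repo_name, f.path, c.content
FROM `bigquery-public-data.github_repos.files` AS f
JOIN `bigquery-public-data.github_repos.contents` AS c
  ON f.id = c.id
WHERE f.path LIKE '%.py' OR f.path LIKE '%.java'
LIMIT 1000
"""
for row in client.query(sql).result():
    # content can be NULL for binary or oversized files
    print(row.repo_name, row.path, len(row.content or ""))
```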
- 🍉Tokenizer
- Technology
- SentencePiece (see the tokenizer sketch below)
- Details
- /
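A minimal sketch of learning and applying a SentencePiece model on source code, assuming the `sentencepiece` Python package; the corpus file name and the 50,000 vocabulary size are illustrative, not quoted from this post:

```python
# Assumption: sentencepiece is installed; code_corpus.txt is a hypothetical file
# with one code/text sample per line; vocab_size is illustrative.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="code_corpus.txt",
    model_prefix="plbart_sp",   # writes plbart_sp.model and plbart_sp.vocab
    vocab_size=50000,
)

sp = spm.SentencePieceProcessor(model_file="plbart_sp.model")
print(sp.encode("def add(a, b): return a + b", out_type=str))
```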
- 🧪Hyperparameters (PLBART 406M)
- optimizer: Adam (see the configuration sketch after this list)
- betas: /, 0.98
- eps: 1e-6
- batch size: /
- context window: 768
- gradient accumulation steps: /
- warmup steps: /
- learning rate: 5e-5
- weight decay: /
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- floating-point precision: fp16
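A minimal PyTorch sketch of the optimizer settings listed above; β1 = 0.9 and the use of a gradient scaler for fp16 are assumptions, since this post only reports β2 = 0.98, eps = 1e-6, lr = 5e-5, and fp16 precision:

```python
# Assumptions: beta1 = 0.9 (not reported above) and torch.cuda.amp for fp16.
import torch

model = torch.nn.Linear(768, 768)  # placeholder module standing in for PLBART
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-5,            # learning rate reported above
    betas=(0.9, 0.98),  # beta1 is an assumed default; beta2 = 0.98 reported above
    eps=1e-6,           # reported above
)
scaler = torch.cuda.amp.GradScaler()  # loss scaling for fp16 mixed-precision training
```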
- 🏃♀️Training
- model initialization: /
- training strategies
- denoising autoencoding (token masking, token deletion, token infilling; see the noising sketch below)
- trained tokens/steps: /
- hardware: 8 Nvidia GeForce RTX 2080 Ti GPUs
- training time: /
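A minimal sketch of BART-style token infilling, one of the denoising strategies listed above: random spans are replaced by a single mask token and the model learns to reconstruct the original sequence. The 35% masking ratio and Poisson(3.5) span lengths follow BART and are assumptions here, not values quoted from this post:

```python
# Assumptions: mask_ratio and poisson_lambda follow BART's defaults; this is a
# simplified noiser (later spans may overlap previously masked positions).
import numpy as np

def token_infilling(tokens, mask_ratio=0.35, poisson_lambda=3.5, mask_token="<mask>"):
    """Replace random spans of tokens with a single mask token (BART-style)."""
    tokens = list(tokens)
    num_to_mask = int(round(len(tokens) * mask_ratio))
    masked = 0
    while masked < num_to_mask and len(tokens) > 1:
        span = max(1, int(np.random.poisson(poisson_lambda)))
        span = min(span, num_to_mask - masked, len(tokens) - 1)
        start = np.random.randint(0, len(tokens) - span + 1)
        tokens[start:start + span] = [mask_token]  # whole span -> one mask token
        masked += span
    return tokens

source = "def add ( a , b ) : return a + b".split()
print(token_infilling(source))  # noised encoder input
print(" ".join(source))         # reconstruction target for the decoder
```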