- 📙Paper: PanGu-Coder: Program Synthesis with Function-Level Language Modeling
- 📚Publisher: arXiv
- 🏠Author Affiliation:
Huawei
- 🔑Public: ❌
- 🌐Architecture
- Decoder-Only
- 📏Model Size: 317M; 2.6B
- 🗂️Data pre-processing
- Data Resource
- The initial dataset was collected through GHTorrent, an online tool that collects and stores GitHub public event metadata. To keep the pre-training data comparable to previous work, the authors focused exclusively on GitHub repositories created before May 2021.
- De-duplication: ✅
- Filter Strategies
- To ensure data quality, only collected files meeting all of the following criteria were kept (a minimal filtering sketch follows this list):
- the file size is under 1 MB;
- the code is Python 3 compatible, as determined by parsing its Abstract Syntax Tree (AST);
- there are fewer than 100 characters per line on average;
- there are fewer than 1,000 characters in any single line.
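A minimal sketch of how such file-level filtering could be implemented, assuming locally checked-out `.py` files; the thresholds mirror the criteria above, and the helper name `keep_file` and the `repos/` path are hypothetical:

```python
import ast
from pathlib import Path

MAX_FILE_BYTES = 1_000_000      # file size under 1 MB
MAX_AVG_LINE_CHARS = 100        # fewer than 100 characters per line on average
MAX_SINGLE_LINE_CHARS = 1_000   # fewer than 1,000 characters in any single line


def keep_file(path: Path) -> bool:
    """Return True if a .py file passes the quality filters listed above."""
    if path.stat().st_size >= MAX_FILE_BYTES:
        return False
    try:
        source = path.read_text(encoding="utf-8")
        ast.parse(source)  # Python 3 compatibility check via the AST parser
    except (SyntaxError, ValueError, UnicodeDecodeError):
        return False
    lines = source.splitlines()
    if not lines:
        return False
    if max(len(line) for line in lines) >= MAX_SINGLE_LINE_CHARS:
        return False
    if sum(len(line) for line in lines) / len(lines) >= MAX_AVG_LINE_CHARS:
        return False
    return True


kept = [p for p in Path("repos").rglob("*.py") if keep_file(p)]
```

Exact de-duplication (the ✅ above) would typically be applied on top of this, for example by hashing normalized file contents and keeping one copy per hash.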
- 🍉Tokenizer
- Technology
- Byte-level Byte-Pair-Encoding (BBPE)
- SentencePiece
- Details
- /
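The exact tokenizer configuration is not restated here, so purely as an illustration of the technologies listed above, a byte-level BPE tokenizer could be trained on the filtered Python corpus with the Hugging Face `tokenizers` library; the corpus path, vocabulary size, and special tokens below are assumptions, not values from the paper:

```python
from tokenizers import ByteLevelBPETokenizer

# Assumed corpus path: the filtered .py files from the preprocessing step,
# concatenated into plain-text shards.
files = ["data/python_corpus_shard_000.txt"]

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=files,
    vocab_size=32_000,                           # assumed, not reported above
    min_frequency=2,
    special_tokens=["<pad>", "<bos>", "<eos>"],  # illustrative placeholders
)

print(tokenizer.encode("def add(a, b):\n    return a + b").tokens)
```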
- 🧪Hyperparameters (PanGu-Coder 2.6B)
- optimizer: Adam
- betas: 0.9, 0.95
- eps: /
- batch size: 256
- context window: 1,024
- gradient accumulation steps: /
- warmup steps: /
- learning rate: /
- weight decay: 0.01
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- floating point precision: /
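A minimal sketch of how the hyperparameters listed above could be wired together in PyTorch; the learning rate, schedule horizon, and the toy model are placeholders (those values are not reported here), and cosine decay is just one of the schedules listed:

```python
import torch

model = torch.nn.Linear(1024, 1024)  # toy stand-in for the 2.6B-parameter model

# Adam with betas (0.9, 0.95) and weight decay 0.01, as listed above.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,                         # placeholder: learning rate not reported
    betas=(0.9, 0.95),
    weight_decay=0.01,
)

# Cosine decay with a placeholder horizon (warmup steps are also unreported).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100_000)

for step in range(10):                    # skeleton loop only
    batch = torch.randn(256, 1024)        # batch size 256 x context window 1,024
    loss = model(batch).pow(2).mean()     # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```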
- 🏃‍♀️Training
- model initialization: from scratch
- training strategies
- left-to-right
- fill-in-the-middle
- trained tokens/steps: 387B tokens
- hardware: NVIDIA V100 GPUs
- training time: /
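As a sketch of the left-to-right (causal) language-modeling objective listed under training strategies (not of the paper's full training recipe), the next-token loss can be computed by shifting logits against labels; the toy embedding/projection model, vocabulary size, and batch below are placeholders:

```python
import torch
import torch.nn.functional as F

vocab_size, context = 32_000, 1_024                 # placeholder sizes
batch = torch.randint(0, vocab_size, (2, context))  # dummy token ids

# Stand-in for a decoder-only model: maps token ids to per-position logits.
embedding = torch.nn.Embedding(vocab_size, 256)
head = torch.nn.Linear(256, vocab_size)
logits = head(embedding(batch))                     # (batch, context, vocab)

# Left-to-right objective: position t predicts token t+1,
# so logits and labels are shifted by one position.
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = batch[:, 1:].contiguous()
loss = F.cross_entropy(
    shift_logits.view(-1, vocab_size),
    shift_labels.view(-1),
)
print(loss.item())
```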