
GPT-C

  • 📙Paper: IntelliCode Compose: Code Generation Using Transformer
  • 📚Publisher: FSE
  • 🏠Author Affiliation: Microsoft
  • 🔑Public: ❌
  • 🌐Architecture
    • Decoder-Only (GPT-2-style; config sketch after this card)
  • 📏Model Size
    • 366M
  • 🗂️Data pre-processing
    • Data Resource
      • We collect a large unsupervised source code dataset to train and evaluate the code sequence completion model. It comprises over 1.2 billion lines of source code in Python, C#, JavaScript, and TypeScript. A total of over 52,000 top-starred (non-fork) GitHub projects have been selected, covering libraries from a diverse set of domains, with over 4.7 million source code files.
    • De-duplication: ❌
    • Filter Strategies
      • /
  • 🍉Tokenizer
    • Technology
      • Byte-level Byte-Pair-Encoding (BBPE; training sketch after this card)
      • SentencePiece
    • Details
      • /
  • 🧪Hyperparameters (GPT-C 366M)
    • optimizer: Adam (setup sketch after this card)
      • betas: /
      • eps: /
    • batch size: /
    • context window: 1,024
    • gradient accumulation steps: /
    • warmup steps: /
    • learning rate: 6.25e-5
    • weight decay: /
    • decay schedule
      • Cosine
      • Linear
      • Polynomial
      • Inverse Square Root
    • precision floating point: /
  • 🏃‍♀️Training
    • model initialization: from scratch
    • training strategies
      • left-to-right (next-token prediction; training-step sketch after this card)
    • trained tokens/steps: /
    • hardware: 5 Lambda V100 boxes, each with sixteen V100 GPUs (32 GB HBM2 memory)
    • training time: /
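
To make the Architecture and Model Size entries concrete, here is a minimal sketch of a decoder-only configuration at roughly this scale, using the Hugging Face `transformers` library. The paper's exact depth and width are not recorded in this card; the GPT-2-medium-like dimensions below (24 layers, 16 heads, hidden size 1,024) and the 50k vocabulary are assumptions that land near 366M parameters. Only the 1,024-token context window comes from the card.

```python
# Sketch of a ~366M-parameter decoder-only (GPT-2-style) model.
# Depth/width/vocabulary are assumptions, not values from the paper;
# only the 1,024-token context window is taken from the card above.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50_000,  # assumed vocabulary size
    n_positions=1024,   # context window (from the card)
    n_embd=1024,        # hidden size (GPT-2-medium-like assumption)
    n_layer=24,         # decoder layers (assumption)
    n_head=16,          # attention heads (assumption)
)
model = GPT2LMHeadModel(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```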
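
The tokenizer entry lists both byte-level BPE and SentencePiece as candidate technologies, and the card records no vocabulary size. As one illustration, here is a sketch of training a byte-level BPE vocabulary on source files with the Hugging Face `tokenizers` library; the file paths and the 50k vocabulary size are placeholders, not values from the paper.

```python
# Sketch: train a byte-level BPE (BBPE) vocabulary on source code.
# File paths and vocab_size are placeholders, not values from the paper.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus/a.py", "corpus/b.cs"],  # hypothetical training files
    vocab_size=50_000,
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)
tokenizer.save_model("tokenizer")  # writes vocab.json and merges.txt

# Byte-level BPE falls back to raw bytes, so any source string tokenizes:
print(tokenizer.encode("def add(a, b):\n    return a + b").tokens)
```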
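
The hyperparameter entry pins down Adam and a 6.25e-5 learning rate, but betas, eps, warmup, and the decay schedule are not recorded (four candidate schedules are listed above). A minimal PyTorch setup under those gaps might look as follows, continuing from the config sketch above; the betas/eps defaults and step counts are placeholders, and linear warmup-then-decay is picked arbitrarily from the card's candidates.

```python
# Sketch: Adam at the card's 6.25e-5 learning rate.
# betas/eps and the step counts are placeholders (not stated in the card);
# the linear schedule is one arbitrary pick among the candidates listed.
import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.Adam(
    model.parameters(),  # `model` from the config sketch above
    lr=6.25e-5,          # learning rate (from the card)
    betas=(0.9, 0.999),  # PyTorch defaults; the paper's values are unknown
    eps=1e-8,
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_000,      # placeholder
    num_training_steps=100_000,  # placeholder
)
```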
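
Left-to-right training is standard next-token prediction: each token is predicted from its prefix, so the labels are simply the inputs (the model shifts them internally when computing the cross-entropy loss). Continuing from the sketches above, one training step might look like this, with a random placeholder batch standing in for real tokenized code.

```python
# Sketch of one left-to-right (next-token prediction) training step.
# GPT2LMHeadModel shifts labels internally, so labels == input_ids.
import torch

input_ids = torch.randint(0, config.vocab_size, (2, 1024))  # placeholder batch
outputs = model(input_ids=input_ids, labels=input_ids)
outputs.loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```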
