- 📙Paper: Efficient Training of Language Models to Fill in the Middle
- 📚Publisher: arXiv
- 🏠Author Affiliation: OpenAI
- 🔑Public: ❌
- 🌐Architecture
- Encoder-Decoder
- Decoder-Only
- 📏Model Size: 50M; 77M; 164M; 411M; 844M; 1.4B; 2.8B; 6.9B
- 🗂️Data pre-processing
- Data Resource
- Same as Codex: a 159 GB Python dataset scraped in May 2020.
- De-duplication: ✅
- Filter Strategies
- We filtered out files that were likely auto-generated (see the sketch after this section): average line length greater than 100;
- maximum line length greater than 1000;
- contained a small percentage of alphanumeric characters.
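The filtering heuristics above are simple enough to express directly in code. A minimal sketch, assuming plain-text source files as input; the line-length thresholds come from the list above, while the alphanumeric-fraction cutoff (0.25) and the function names are illustrative assumptions, since the card does not give an exact fraction.

```python
# Sketch of the auto-generated-file filter described above.
# Line-length thresholds are from the card; the alphanumeric-fraction
# cutoff (0.25) is an assumed placeholder, not a value from the paper.
def looks_auto_generated(text: str, alnum_cutoff: float = 0.25) -> bool:
    lines = text.splitlines()
    if not lines:
        return True  # empty files carry no training signal
    lengths = [len(line) for line in lines]
    avg_len = sum(lengths) / len(lengths)
    max_len = max(lengths)
    alnum_fraction = sum(ch.isalnum() for ch in text) / max(len(text), 1)
    return (
        avg_len > 100                     # average line length > 100
        or max_len > 1000                 # maximum line length > 1000
        or alnum_fraction < alnum_cutoff  # few alphanumeric characters
    )

def filter_corpus(files: dict[str, str]) -> dict[str, str]:
    """Keep only files that do not look auto-generated."""
    return {path: text for path, text in files.items()
            if not looks_auto_generated(text)}
```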
- 🍉Tokenizer
- Technology
- Byte-level Byte-Pair-Encoding (BBPE)
- SentencePiece
- Details
- Same as Codex: the GPT-3 tokenizer plus an additional set of tokens for representing whitespace runs of different lengths (see the sketch below).
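A hedged sketch of what the additional whitespace-run tokens can look like in practice, using Hugging Face `transformers` on top of the GPT-2 byte-level BPE vocabulary; the specific run lengths (2 to 25 spaces) are an assumption, as the card does not list them.

```python
# Sketch only: extend a GPT-2-style byte-level BPE tokenizer with dedicated
# tokens for runs of spaces, in the spirit of the Codex tokenizer described
# above. The run lengths (2 to 25 spaces) are assumed, not from the paper.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
whitespace_runs = [" " * n for n in range(2, 26)]  # "  ", "   ", ..., 25 spaces
num_added = tokenizer.add_tokens(whitespace_runs)
print(f"added {num_added} whitespace-run tokens")

# Indented code should now spend far fewer tokens on leading whitespace.
print(tokenizer.tokenize("        return x"))

# If these tokens are added to a pretrained model, its embedding matrix must
# grow to match, e.g. model.resize_token_embeddings(len(tokenizer)).
```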
- 🧪Hyperparameters (FIM 6.9B)
- optimizer: Adam
- betas: /
- eps: /
- batch size: 2M tokens
- context window: 2,048 tokens
- gradient accumulation steps: /
- warmup steps: /
- learning rate: 2.4e-4
- weight decay: /
- decay schedule
- Cosine
- Linear
- Polynomial
- Inverse Square
- precision floating point: /
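The optimizer settings above map naturally onto a standard PyTorch setup. A minimal sketch under assumptions: Adam and the 2.4e-4 learning rate are from the card; betas, eps, weight decay, and warmup steps are marked "/" above, so library defaults and a placeholder warmup length are used; cosine decay is one of the listed schedule options and is picked here purely for illustration.

```python
# Sketch of the hyperparameter block above in PyTorch. Known values: Adam,
# lr = 2.4e-4. Assumptions: default betas/eps (the card marks them "/"),
# placeholder warmup_steps, and cosine decay chosen from the listed options.
import math
import torch

def build_optimizer_and_scheduler(model: torch.nn.Module,
                                  total_steps: int,
                                  warmup_steps: int = 1_000):  # placeholder
    optimizer = torch.optim.Adam(model.parameters(), lr=2.4e-4)

    def lr_lambda(step: int) -> float:
        if step < warmup_steps:                       # linear warmup
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```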
- 🏃‍♀️Training
- model initialization: from scratch
- training strategies
- left-to-right
- fill-in-the-middle (see the sketch after this section)
- trained tokens/steps: 100B tokens
- hardware: /
- training time: /
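The fill-in-the-middle strategy itself is a data transformation rather than an architecture change: with some probability a document is split into a prefix, middle, and suffix, and the pieces are re-ordered as prefix, suffix, middle (PSM) with sentinel tokens, so an ordinary left-to-right model learns to generate the middle last. A minimal character-level sketch follows; the sentinel strings and the 0.5 FIM rate are illustrative placeholders, not the exact tokens or setting from the paper.

```python
# Sketch of document-level fill-in-the-middle (FIM) preprocessing: with
# probability fim_rate, split a document at two random points and emit it in
# PSM order "<PRE> prefix <SUF> suffix <MID> middle"; otherwise leave it as a
# plain left-to-right example. Sentinel strings and the 0.5 rate are assumed.
import random

PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"  # stand-ins for special vocab tokens

def apply_fim(document: str, fim_rate: float = 0.5,
              rng: random.Random | None = None) -> str:
    rng = rng or random.Random()
    if rng.random() >= fim_rate:
        return document  # untouched left-to-right example
    # Two split points chosen uniformly at random over the document.
    i, j = sorted(rng.randrange(len(document) + 1) for _ in range(2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # PSM ordering: the middle is moved to the end, so predicting it is just
    # ordinary next-token prediction for a causal decoder.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"
```

At inference time, infilling then amounts to prompting the trained model with `<PRE>prefix<SUF>suffix<MID>` and letting it generate the missing middle.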