- 📙Paper: LLaMA: Open and Efficient Foundation Language Models
- 📚Publisher: arXiv
- 🏠Author Affiliation: Meta AI
- 🔑Public: √
- 🌐Architecture
- Decoder-Only
- 📏Model Size: 6.7B, 13.0B, 32.5B, 65.2B
- 🗂️Data pre-processing
- Data Resource
- English CommonCrawl (67%), C4 (15%), GitHub via Google BigQuery (4.5%), Wikipedia (4.5%), Gutenberg and Books3 (4.5%), arXiv (2.5%), Stack Exchange (2%)
- De-duplication: √
- Filter Strategies
- CommonCrawl data is preprocessed with the CCNet pipeline: line-level deduplication, fastText language identification to keep English pages, and n-gram language-model quality filtering.
- GitHub files are filtered with heuristics on line length and the proportion of alphanumeric characters, then deduplicated at the file level with exact matches (a dedup sketch follows this section).
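A minimal sketch of the exact-match, file-level deduplication described above, using content hashing; the `corpus_dir` argument and the `dedup_files_exact` helper name are illustrative, not from the paper.

```python
import hashlib
from pathlib import Path

def dedup_files_exact(corpus_dir: str) -> list[Path]:
    """Keep the first copy of every file; drop later files whose bytes are an exact match."""
    seen_hashes: set[str] = set()
    kept: list[Path] = []
    for path in sorted(Path(corpus_dir).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of an earlier file
        seen_hashes.add(digest)
        kept.append(path)
    return kept
```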
- 🍉Tokenizer
- Technology
- Byte-Pair Encoding (BPE), implemented with SentencePiece; numbers are split into individual digits and unknown UTF-8 characters fall back to bytes (a tokenizer sketch follows this section)
- Details
- /
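A minimal sketch of training a comparable SentencePiece BPE tokenizer with digit splitting and byte fallback; the corpus file and model prefix are placeholders, while 32,000 matches the vocabulary size of the released LLaMA tokenizer.

```python
import sentencepiece as spm

# Train a BPE model with the two options the paper highlights:
# split every number into digits and fall back to bytes for unknown UTF-8 characters.
spm.SentencePieceTrainer.train(
    input="corpus.txt",             # placeholder training corpus
    model_prefix="llama_like_bpe",  # placeholder output prefix
    vocab_size=32000,
    model_type="bpe",
    split_digits=True,
    byte_fallback=True,
)

sp = spm.SentencePieceProcessor(model_file="llama_like_bpe.model")
print(sp.encode("LLaMA was trained on 1.4T tokens.", out_type=str))
```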
- 🧪Hyperparameters (LLaMA 65.2B)
- optimizer: AdamW
- betas: 0.9, 0.95
- eps: /
- batch size: 4M tokens
- context window: /
- gradient accumulation steps: /
- warmup steps: 2,000
- learning rate: 1.5e-4
- weight decay: 0.1
- decay schedule
- Cosine (final learning rate set to 10% of the peak; see the sketch after this hyperparameter list)
- precision floating point: /
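A minimal PyTorch sketch of the optimizer and learning-rate schedule implied by the hyperparameters above: AdamW with betas (0.9, 0.95) and weight decay 0.1, 2,000 warmup steps, and cosine decay to 10% of the 1.5e-4 peak. The placeholder model and the derivation of `total_steps` from 1.4T tokens at 4M tokens per batch are assumptions for illustration.

```python
import math
import torch

model = torch.nn.Linear(4096, 4096)  # placeholder; any nn.Module works here

peak_lr = 1.5e-4
warmup_steps = 2_000
total_steps = 1_400_000_000_000 // 4_000_000  # 1.4T tokens / 4M tokens per batch = 350,000 steps

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=peak_lr,
    betas=(0.9, 0.95),
    weight_decay=0.1,
)

def lr_lambda(step: int) -> float:
    """Linear warmup, then cosine decay down to 10% of the peak learning rate."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.1 + 0.9 * 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

In a training loop, `scheduler.step()` would be called after each optimizer step, with gradients clipped to 1.0 as reported in the paper.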
- 🏃♀️Training
- model initialization: /
- training strategies
- left-to-right (standard autoregressive language modeling)
- trained tokens/steps: 1.4T tokens (≈ 350,000 steps at a 4M-token batch size)
- hardware: 2,048 A100 GPUs with 80GB of RAM
- training time: approximately 21 days to train over the 1.4T-token dataset (a throughput sanity check follows this section)
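A quick sanity check of these figures, assuming fully continuous training: dividing 1.4T tokens by 21 days on 2,048 GPUs gives roughly 377 tokens/sec/GPU, consistent with the ~380 tokens/sec/GPU the paper reports for the 65B model.

```python
# Implied per-GPU throughput from the numbers above (assumes no downtime).
total_tokens = 1.4e12
num_gpus = 2048
seconds = 21 * 24 * 3600  # 21 days

tokens_per_sec_per_gpu = total_tokens / (num_gpus * seconds)
print(f"{tokens_per_sec_per_gpu:.0f} tokens/sec/GPU")  # ≈ 377
```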