
LLaMA

  • 📙Paper: LLaMA: Open and Efficient Foundation Language Models
  • 📚Publisher: arXiv
  • 🏠Author Affiliation: Meta AI
  • 🔑Public: √
  • 🌐Architecture
    • Decoder-Only
  • 📏Model Size
    • 6.7B; 13.0B; 32.5B; 65.2B
  • 🗂️Data pre-processing
    • Data Resource
      • English CommonCrawl, C4, GitHub (public dataset on Google BigQuery), Wikipedia, Gutenberg and Books3, ArXiv, Stack Exchange
    • De-duplication: √
    • Filter Strategies
      • CommonCrawl is processed with the CCNet pipeline (line-level de-duplication, language identification, quality filtering); the GitHub data is de-duplicated at the file level with exact matches (see the sketch below).
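
The paper only says the GitHub subset is de-duplicated at the file level with exact matches; it gives no code. A minimal sketch of what such an exact-match pass could look like, assuming plain files on disk and SHA-256 content hashing (the function name and hashing choice are illustrative, not from the paper):

```python
import hashlib
from pathlib import Path

def deduplicate_files(paths):
    """Keep only the first copy of each file whose bytes are an exact match."""
    seen_hashes = set()
    kept = []
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of a file we already kept
        seen_hashes.add(digest)
        kept.append(path)
    return kept
```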
  • 🍉Tokenizer
    • Technology
      • Byte-Pair Encoding (BPE), using the SentencePiece implementation
    • Details
      • Numbers are split into individual digits; unknown UTF-8 characters fall back to bytes (see the sketch below).
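
The paper states only that the tokenizer is BPE via SentencePiece, with digits split into individual pieces and byte fallback for unknown UTF-8. A minimal sketch of training a tokenizer with those options using the `sentencepiece` Python package; the corpus path and model prefix are placeholders, and this is not Meta's actual pipeline (the 32k vocabulary size is from the paper):

```python
import sentencepiece as spm

# Train a BPE tokenizer in the spirit of LLaMA's setup.
spm.SentencePieceTrainer.train(
    input="corpus.txt",          # hypothetical training text, one sentence per line
    model_prefix="llama_like",   # writes llama_like.model / llama_like.vocab
    model_type="bpe",
    vocab_size=32000,            # LLaMA uses a 32k-token vocabulary
    split_digits=True,           # numbers are split into individual digits
    byte_fallback=True,          # unknown UTF-8 characters decompose into bytes
)

sp = spm.SentencePieceProcessor(model_file="llama_like.model")
print(sp.encode("LLaMA was trained on 1.4T tokens", out_type=str))
```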
  • 🧪Hyperparameters (LLaMA 65.2B)
    • optimizer: AdamW
      • betas: (0.9, 0.95)
      • eps: /
    • batch size: 4M tokens
    • context window: 2,048 tokens
    • gradient accumulation steps: /
    • warmup steps: 2,000
    • learning rate: 1.5e-4
    • weight decay: 0.1
    • decay schedule
      • Cosine (decayed to 10% of the peak learning rate; see the sketch after this list)
    • precision floating point: /
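
To make the reported 65B settings concrete, here is a minimal PyTorch sketch of AdamW with linear warmup followed by cosine decay to 10% of the peak learning rate. The tiny stand-in model and the 350,000-step horizon (1.4T tokens ÷ 4M tokens per batch ≈ 350,000 optimizer steps) are assumptions for illustration, not Meta's training code:

```python
import math
import torch

# Stand-in model; the real setup trains a 65.2B-parameter transformer.
model = torch.nn.Linear(512, 512)

peak_lr = 1.5e-4
warmup_steps = 2_000
total_steps = 350_000          # assumed: ~1.4T tokens / 4M tokens per batch
min_lr_ratio = 0.1             # cosine decays to 10% of the peak learning rate

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=peak_lr,
    betas=(0.9, 0.95),
    weight_decay=0.1,
)

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)                    # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr_ratio + (1.0 - min_lr_ratio) * cosine       # decay to 10% of peak

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```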
  • 🏃‍♀️Training
    • model initialization: /
    • training strategies
      • left-to-right (standard autoregressive next-token prediction; see the sketch after this list)
    • trained tokens/steps: 1.4T tokens
    • hardware: 2,048 A100 GPUs with 80GB of RAM
    • training time: approximately 21 days to train over the 1.4T-token dataset (as estimated in the paper)
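
For reference, the left-to-right strategy is ordinary next-token prediction: each position is trained to predict the following token given everything to its left. A toy PyTorch sketch of that loss on random token ids; the embedding-plus-linear "model" is a placeholder, not the LLaMA architecture:

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 32000, 16, 2
tokens = torch.randint(0, vocab_size, (batch, seq_len))

embed = torch.nn.Embedding(vocab_size, 64)
lm_head = torch.nn.Linear(64, vocab_size)

hidden = embed(tokens)        # a real model would apply causal self-attention here
logits = lm_head(hidden)      # (batch, seq_len, vocab_size)

# Shift so position t predicts token t+1; the last position has no target.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss.item())
```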
