Training an LLM involves Transformer architectures, massive datasets that require careful cleaning, and costly compute. Scaling laws guide how that compute is allocated between model size and training data, pointing to compute-optimal token-to-parameter ratios; the overall cost of training remains high. Pre-training optimizes a language-modeling objective; post-training (SFT, then RLHF or DPO) aligns the model with user intent. Evaluation remains challenging: perplexity alone does not capture downstream quality.
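As a minimal sketch of how a scaling law turns a compute budget into a model size and token count, the snippet below assumes the common approximations C ≈ 6·N·D for training FLOPs and a Chinchilla-style ratio of roughly 20 training tokens per parameter; the function name and the example budget are illustrative, not taken from the text above.

```python
def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Split a FLOP budget into (parameters, tokens), assuming C = 6*N*D
    and a fixed ratio D = tokens_per_param * N (Chinchilla-style heuristic)."""
    # C = 6 * N * (tokens_per_param * N)  =>  N = sqrt(C / (6 * tokens_per_param))
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Illustrative budget of 1e23 training FLOPs.
    params, tokens = compute_optimal_split(1e23)
    print(f"~{params / 1e9:.1f}B parameters, ~{tokens / 1e9:.0f}B tokens")
    # Prints roughly "~28.9B parameters, ~577B tokens" under these assumptions.
```

This is only a back-of-the-envelope rule: real runs adjust the ratio for inference cost, data availability, and the specific scaling-law fit used.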