Large Language Models: The Rise of Data
White Paper
Published June 2024
Large Language Models (LLMs) have become widely adopted for Natural Language Processing (NLP) applications because of their greater accuracy, broader vocabulary, and richer creativity compared with more traditional language tools. LLM-powered platforms such as ChatGPT, MidJourney, and Google Bard are demonstrating the potential of these more complex tools in real-world applications.
These large-scale generative AI models with billions or even trillions of parameters can consistently outperform fine-tuned models given the same number of training steps. Between 2019 and 2023, LLMs saw a surge in model size, growing by roughly 1,000x in just a few years. However, model training and inference at this scale are challenging, placing enormous pressure on memory resources.
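To make that memory pressure concrete, a rough rule of thumb is parameters × bytes per parameter for the weight footprint alone, before activations, optimizer state, or KV cache are counted. The sketch below is a minimal, illustrative calculation; the model sizes shown are assumptions for illustration, not measurements from this paper.

```python
# Rough, illustrative estimate of the memory needed just to hold model weights
# (activations, optimizer state, and KV cache add considerably more on top).
def weight_footprint_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for weights alone, in GB (FP16/BF16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param / 1e9

# Example model sizes (illustrative only).
for name, params in [("7B", 7e9), ("70B", 70e9), ("175B", 175e9)]:
    print(f"{name}: ~{weight_footprint_gb(params):,.0f} GB in FP16")

# A 175B-parameter model needs ~350 GB for weights alone -- well beyond the
# capacity of a single GPU, which is why offloading techniques become necessary.
```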
In this white paper, we take a platform approach to accelerating LLMs and show how GPU, memory, and storage collectively contribute to overall LLM performance. We also look at how LLMs can be made more efficient by optimizing resource usage, and how system throughput becomes the key performance factor for techniques such as offloading model weights from GPU memory. By comparing different types of model offloading, we show how DDN’s A³I parallel storage (AI400X2) can accelerate LLM inference throughput:
- 16x faster than traditional NFS storage;
- 1.8x faster than the local RAID storage of a typical GPU-accelerated system;
- Matching the performance of offloading to main CPU memory in some scenarios.
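As a concrete illustration of what weight offloading looks like in practice, the sketch below uses the Hugging Face Transformers/Accelerate convention of splitting layers across GPU memory, CPU RAM, and disk. The model name and offload path are placeholders, and this is not the benchmark configuration used in this paper, only a minimal example of the technique under those assumptions.

```python
# Minimal sketch of weight offloading with Hugging Face Transformers/Accelerate.
# Layers that do not fit in GPU memory are placed in CPU RAM and, beyond that,
# on disk, so the throughput of that storage tier bounds inference speed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"    # placeholder model
offload_dir = "/mnt/ai400x2/offload"        # placeholder path on fast parallel storage

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",           # split layers across GPU, CPU RAM, and disk
    offload_folder=offload_dir,  # spill layers that do not fit to this path
    torch_dtype="auto",
)

inputs = tokenizer("Storage throughput matters because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In a setup like this, generation speed for offloaded layers is limited by how quickly their weights can be streamed back from the offload tier, which is why the storage comparisons above translate directly into inference throughput.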
This white paper further demonstrates how DDN can help customers anticipate the resource requirements of the ever-larger models driven by transformers and NLP, and shows how storage performance can become an accelerator for LLM performance as models continue to evolve.