Trends, Challenges With Solid-State Drives For AI
An executive from Solidigm discusses the trends for solid-state drives (SSDs) in the AI era.
By Mark LaPedus
Greg Matson, senior vice president of strategic planning and marketing at Solidigm, sat down with Semiecosystem to discuss the latest trends with solid-state drives (SSDs), a storage technology used in general-purpose and AI servers as well as other products.
Semiecosystem: Who is Solidigm and what is the company’s charter?
Matson: Solidigm is a leading global provider of NAND flash memory solutions and enterprise SSDs. Our roots are in storage, and we have been at the forefront of storage innovation for decades. Now, our solid-state drives are making a major impact across the industry, as we’ve become the leader in SSDs for AI deployments and have the broadest portfolio of data center SSDs for performance and high capacity. In fact, Solidigm is the QLC (quad-level cell) density leader with more than 100EB (exabytes) of QLC-based product shipped since 2018. Our data center SSDs are also optimized for real-world performance and have industry-leading quality and reliability.
Solidigm’s new 122TB (terabyte) D5-P5336 data center SSD. The SSD has enough storage capacity for 4K-quality copies of every movie theatrically released in the 1990s roughly 2.6 times over. Source: Company
Semiecosystem: SSDs are used in servers, PCs and other products. Can you talk about the overall trends in storage and SSDs in the general server market? That involves non-AI servers. How many SSDs are incorporated in a typical server and how much storage capacity is involved here?
Matson: It depends on the individual server and its storage needs. The number of SSDs deployed in a server can vary greatly by user and usage. Typical configurations may be 4 to 8 SSDs, with an average density for Solidigm customers of about 6 terabytes (TB). That’s higher than the industry average.
On average, Solidigm QLC SSDs can store six times as much data as broadly available HDDs (hard disk drives) and twice as much data as TLC SSDs in the same space. Compared to widely adopted TLC SSDs, Solidigm D5 Series SSDs deliver equivalent read performance and strong write performance, with ample lifetime write endurance for mainstream and read-intensive workloads.
Meanwhile, our newly introduced 122.88TB drives that are shipping to customers now will allow for up to 4 petabytes of storage per single rack unit.
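As a quick sanity check of the "4 petabytes per rack unit" figure, the arithmetic can be sketched as follows. The 32-drives-per-1U density is an illustrative assumption (typical of E1.L "ruler" form-factor servers), not a figure from the interview:

```python
# Back-of-envelope check of the "up to 4 PB per rack unit" claim.
# ASSUMPTION: 32 drives fit in one 1U server (typical E1.L density);
# only the 122.88TB drive capacity comes from the interview.
DRIVE_TB = 122.88
DRIVES_PER_1U = 32

pb_per_1u = DRIVE_TB * DRIVES_PER_1U / 1000  # decimal TB -> PB
print(f"{pb_per_1u:.2f} PB per rack unit")   # -> 3.93 PB, i.e. "up to 4 PB"
```

At that assumed density, 32 drives of 122.88TB each come to roughly 3.93 PB, consistent with the "up to 4 petabytes" figure.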
Semiecosystem: Do you see the storage capacities or the SSD specs changing in the general server market over the next year or two?
Matson: Absolutely. As general storage needs increase with the ever-growing amounts of data that users are producing and storing, SSD providers will need to evolve in order to keep up. This applies to general-purpose servers just as much as those focused on AI. General-purpose servers continue to be vitally important for enterprises, as they support a wide range of workloads like company databases, email servers and internal networks. For consumer use cases, our social media activity, internet banking and online shopping habits are all powered by data stored in the cloud.
As we generate more and more data, whether at work or in our personal lives, we'll need to focus on SSDs with higher storage capacities. SSDs can cut energy use significantly, allowing for up to 80% less storage power and up to four times fewer storage racks than HDDs at the core and at the edge, all while enabling high performance and efficiency. With AI adoption booming, it will be even more crucial for organizations to cut down on storage power and space to ensure they have the capacity to handle advancing AI workloads.
Semiecosystem: AI, particularly generative AI (GenAI), is a hot topic. For this market, data centers are using AI or GPU servers that incorporate the latest GPUs and other devices. Can you talk about the overall trends in storage and SSDs for AI and GPU servers?
Matson: A typical GPU server has eight storage slots inside, and most deployments today fill those with 4TB or 8TB SSDs, although we see the capacities going up over time. Across a standard AI cluster made up of 32 GPU servers, that's 256 drives, for a total of between 1 and 2 petabytes of SSD storage. Of course, big AI data centers use a great many more GPU servers than that.
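The cluster math above can be checked with a short calculation, using only the figures quoted in the interview (8 slots per server, 32 servers per cluster, 4TB or 8TB drives):

```python
# Back-of-envelope check of the cluster storage figures quoted above.
SLOTS_PER_SERVER = 8
SERVERS_PER_CLUSTER = 32

total_drives = SLOTS_PER_SERVER * SERVERS_PER_CLUSTER  # 256 drives

for drive_tb in (4, 8):
    total_pb = total_drives * drive_tb / 1000  # decimal TB -> PB
    print(f"{drive_tb}TB drives: {total_pb:.2f} PB per cluster")
# -> 4TB drives: 1.02 PB per cluster
# -> 8TB drives: 2.05 PB per cluster
```

This matches the quoted range of 1 to 2 petabytes per 32-server cluster.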
Inside the GPU server is where you need the absolute fastest storage technology to keep GPUs highly utilized during AI model training. For that reason, we see many model developers going with PCIe 5.0 SSDs like the Solidigm D7-PS1010, which is capable of random read speeds of up to 3.1 million IOPS. High sequential write performance is also key, because long training runs usually require periodic checkpointing – writing the in-flight state of the model to disk, in case of disaster – which leaves GPUs idle for as long as it takes to write the checkpoints. The D7-PS1010 offers excellent performance in that regard, capable of sequential write speeds of up to 14.5 GB/s.
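To see why sequential write speed matters for checkpointing, a rough stall estimate can be sketched. The checkpoint size and the assumption that writes stripe evenly across all eight drives are illustrative; only the 14.5 GB/s per-drive figure comes from the interview:

```python
# Rough estimate of GPU idle time while a training checkpoint is flushed.
# ASSUMPTIONS: checkpoint size is hypothetical; writes stripe evenly
# across all drives. Only 14.5 GB/s per drive is from the interview.
def checkpoint_stall_seconds(checkpoint_gb: float,
                             drives: int = 8,
                             write_gbps_per_drive: float = 14.5) -> float:
    """Seconds to write a checkpoint across `drives` SSDs in parallel."""
    aggregate_gbps = drives * write_gbps_per_drive
    return checkpoint_gb / aggregate_gbps

# Example: a hypothetical 2,000GB checkpoint across 8 drives.
print(f"{checkpoint_stall_seconds(2000):.1f} s")  # -> 17.2 s
```

The faster the aggregate write bandwidth, the shorter the window in which the GPUs sit idle, which is the point being made above.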
Legacy storage solutions like HDDs lack the scalability, performance, reliability, and data management capabilities needed for modern AI applications, leading to inefficiencies as AI projects progress. When working with AI workloads, to ensure you’re getting the most power and performance while saving on energy costs, SSDs are essential.
Semiecosystem: Besides AI servers, data centers also use network-attached storage (NAS) units. What is a NAS unit and what are the storage/SSD trends here?
Matson: Network-attached storage (NAS) is a way for data to be stored and then shared across multiple users and devices. It can be easy to use for high capacity at low cost, but it should not be relied on as the sole storage option.
Legacy NAS deployments typically consist of higher-performance SSDs, providing a management and speed tier in conjunction with lower-cost and less reliable HDDs, also known as a hybrid environment. This hybrid architecture comes with added layers of latency, which can reduce bandwidth for NAS units, even when working with high-speed networks. The maintenance associated with these units can also be a burden and may require specialized knowledge.
Compared to hybrid arrays using TLC and HDDs, all-QLC NAS solutions can improve power efficiency for new AI data center builds by up to 80% and reduce storage footprint by up to 4X.
Solidigm’s newly introduced 122.88TB D5-P5336 drives offer the highest storage capacity, with up to 4 petabytes of storage per single rack unit. This allows data centers to cut total cost of ownership by nearly 50% and reduce energy costs compared to TLC SSD and hybrid storage options. Our new drives can also deliver read performance more than twice as fast as some TLC SSDs and nearly 10 times faster reads than leading SAS HDDs.
Semiecosystem: Today, we hear that the large language models (LLMs) for GenAI are growing 10x per year. What does that mean for storage and SSDs over the next year and beyond?
Matson: Generative AI use has grown exponentially in the past two years, and data from Statista shows that the market for generative AI is expected to continue to grow at an annual rate of greater than 46%. As consumer and enterprise use grows, so does its need for storage. Organizations looking to get the most out of their AI investments don’t have time to waste on slow data access speeds or power- and space-hungry legacy storage configurations.
HDDs lack the scalability, performance, reliability, data management capabilities and, most importantly, the efficiency that modern solutions provide. This becomes more of an issue as organizations begin deploying AI solutions or developing new AI models, and it will continue to cause problems as these organizations get further along in their AI journey.
High-performance, high-capacity SSDs deliver the computing speed and storage capacity needed to efficiently store, and rapidly access, massive data sets, which is crucial for powering generative AI workloads.
Semiecosystem: What are the challenges in terms of keeping up with the SSD/storage demands for GenAI?
Matson: Two of the greatest storage demands for generative AI are space and power efficiency. High-capacity SSDs are the most efficient option in both cases. Generative AI requires both large amounts of computing power and long-term storage. When data centers are able to fit more drives in their racks, as they can with SSDs, they’re able to increase their storage per square foot, improving power efficiency and freeing up space. SSDs allow for 80% less storage power and four times fewer storage racks than HDDs at the core and at the edge, all while enabling high performance and efficiency. While it takes five hard drives to support one GPU, one SSD can easily support the computing power of as many as eight GPUs.
Semiecosystem: What is Solidigm’s relationship with SK hynix?
Matson: They are our parent company. Solidigm was created when SK hynix purchased Intel’s NAND and SSD business in December 2021. We work in a very collaborative model that leverages each company’s strengths.
Semiecosystem: Who is leading Solidigm?
Matson: Dave Dixon and Kevin Noh are Solidigm’s co-CEOs and bring complementary strengths to the role. Dave came from Intel and has deep experience in NAND technology and SSD engineering. Kevin has been a tenured leader at SK hynix and has extensive M&A and business development experience.
Semiecosystem: Can you provide a brief description of your latest product?
Matson: Our latest product is an improved version of our D5-P5336 data center SSD that doubles the storage of our earlier 61.44TB version. As AI adoption is increasing, data center architects are struggling to power these workloads with the available power and space. The 122TB D5-P5336 makes every watt and every inch in those data centers count with storage efficiency from core data center to the edge.
(Send comments to: mdlapedus@gmail.com Semiecosystem reserves the right to post and edit comments.)