Test Challenges Grow for 2nm, AI Chips, Chiplets
ATE executive discusses the test trends and challenges with AI chips, gate-all-around transistors, chiplets, 3D NAND and other topics.
By Mark LaPedus
Keith Schaub, vice president of technology and strategy for Advantest America, sat down with Semiecosystem to discuss the test challenges for today’s complex AI chips, gate-all-around transistors, chiplets and 3D NAND. Advantest is a major supplier of automatic test equipment (ATE).
Semiecosystem: Chip testing has been around since the inception of the semiconductor industry. Basically, a semiconductor company manufactures a line of chips. The chips are assembled in packages and then tested using various types of ATE. Of course, this process can vary. Nonetheless, many years ago, IC test was more or less an afterthought. Over time, IC test has become a more important part of the semiconductor industry, right? If so, why is IC test important?
Schaub: Once an afterthought, IC test has become a critical component of the semiconductor industry. As semiconductor technology has advanced, the complexity and scale of ICs have increased, making effective testing essential for ensuring performance, reliability, and safety. IC testing is crucial because it verifies that chips meet design specifications, detects defects early, and ensures high yield rates during production.
As devices become more sophisticated and embedded in various critical applications, the cost of failure increases, heightening the importance of thorough testing. Additionally, rigorous IC testing helps manufacturers avoid costly recalls and maintain customer trust by delivering high-quality, reliable products. The growing demand for advanced features and miniaturization has further underscored the necessity of robust testing solutions. Thus, IC testing has become an integral part of the semiconductor industry’s quality assurance and overall success.
Semiecosystem: How has IC test changed over the years?
Schaub: IC test has undergone significant changes over the years, evolving alongside the increasing complexity of semiconductor devices. Initially, testing focused on basic functionality, but as devices grew more intricate, the need for more advanced testing methods emerged. The ATE industry responded by developing new, more capable testers to handle higher pin counts, faster speeds, and greater integration.
There was also a shift from purely functional testing to structural testing, which provides deeper insights into the integrity of the chip’s design and manufacturing process. In recent years, system-level testing (SLT) has become increasingly important, enabling verification of complete systems and ensuring interoperability of components.
Modern IC tests now include sophisticated techniques such as built-in self-test (BIST) and scan testing, enabling more thorough verification. Additionally, the rise of system-on-chip (SoC) and other advanced technologies has driven the development of testers that can handle multiple functions simultaneously. This IC testing evolution reflects the semiconductor industry’s continual pursuit of higher quality, performance, reliability, and efficiency.
(A general chip testing flow using various types of automatic test equipment (ATE). Chips are manufactured in a wafer fab (left). Then, the chips undergo a parametric test step, followed by a wafer probe step, package test and system-level test. Source: Advantest)
Semiecosystem: AI is a hot topic today. In the semiconductor world, chip vendors are developing new AI chips and accelerators. Some of these chips involve complex GPUs with large die sizes and billions of transistors. In addition, we are also seeing new and complex processors for AI and other apps. In general terms, what are the test challenges with these types of devices?
Schaub: Testing AI chips and accelerators presents several significant challenges due to their complexity and scale. These devices often feature large die sizes, billions of transistors, and dozens to hundreds of cores running at different speeds, depending on workloads. This variability increases the importance of advanced thermal management and control. Ensuring performance and reliability in such dense circuitry requires highly sophisticated ATE capable of handling high-speed interfaces and extreme thermal performance.
Hotspots are a critical concern. Both understanding where they occur and being able to predict when they’ll occur during testing are vital for effective thermal control. Comprehensive validation is needed to ensure these chips perform optimally under diverse conditions, particularly in managing power densities and minimizing thermal hotspots.
The integration of AI processors with other system components necessitates thorough SLT to verify overall system functionality and interoperability. Additionally, the rapid evolution of AI technology means testing methodologies must continually adapt to keep pace with new architectures and innovations. Overall, these challenges underscore the need for cutting-edge testing solutions to ensure the reliability and performance of AI semiconductor devices.
Semiecosystem: I assume there are different ways to test these devices. In general, how do you test the latest AI chips and/or processors?
Schaub: Testing the latest AI chips and processors involves several advanced techniques and methodologies to address their complexity and performance requirements. ATE with more channels is used to handle the increased pin counts and parallelism inherent in these devices.
BIST and design-for-testability (DFT) features are incorporated into the chips to enable self-diagnosis and facilitate efficient testing. SLT is crucial for validating the functionality and interoperability of the entire system, ensuring that the AI chips work seamlessly with other components. For example, consider testing an AI chip with hundreds of cores. Because the cores may run at different speeds, precise control and monitoring are essential. The ATE can test multiple cores in parallel, significantly speeding up the process.
During testing, thermal sensors integrated into the chip help identify hotspots. These sensors, combined with advanced thermal management algorithms, ensure that hotspots are controlled and minimized by dynamically adjusting workloads and cooling solutions. Additionally, advanced DFT techniques, such as modern scan fabrics or test over functional high-speed interfaces like PCI Express, are employed to increase test coverage while keeping test times under control. By using a combination of ATE, BIST, SLT, and DFT, manufacturers can ensure that AI chips and processors meet stringent performance and reliability standards before they reach the market.
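As a rough illustration of the sensor-driven control described above, the following Python sketch throttles the workload of any core whose simulated on-die sensor crosses a temperature limit. Everything here is hypothetical: the thresholds, the toy thermal model and the names `Core`, `read_sensor` and `control_step` are assumptions made for the example, not Advantest's algorithm or any tester API. It adjusts workload only and does not model cooling.

```python
from dataclasses import dataclass

# Illustrative assumptions only; not values from any real chip or tester.
TEMP_LIMIT_C = 95.0      # hypothetical hotspot threshold
AMBIENT_C = 40.0         # hypothetical temperature at zero load
HEAT_PER_LOAD_C = 70.0   # toy model: steady-state rise at 100% load
THROTTLE_STEP = 0.1      # load removed each time a core is too hot
MIN_LOAD = 0.1           # never throttle below this load fraction


@dataclass
class Core:
    name: str
    load: float               # workload as a fraction between 0 and 1
    temp_c: float = AMBIENT_C


def read_sensor(core):
    """Toy thermal model: move halfway toward the steady-state temperature."""
    target = AMBIENT_C + HEAT_PER_LOAD_C * core.load
    core.temp_c += 0.5 * (target - core.temp_c)
    return core.temp_c


def control_step(cores):
    """Read every sensor, throttle hot cores, and return the hotspot names."""
    hotspots = []
    for core in cores:
        if read_sensor(core) > TEMP_LIMIT_C:
            hotspots.append(core.name)
            core.load = max(MIN_LOAD, core.load - THROTTLE_STEP)
    return hotspots


def run(cores, steps=50):
    """Run the control loop and return the hotspots seen at each step."""
    return [control_step(cores) for _ in range(steps)]


if __name__ == "__main__":
    cores = [Core("core0", load=1.0), Core("core1", load=0.5)]
    history = run(cores)
    for core in cores:
        print(core.name, round(core.load, 2), round(core.temp_c, 1))
```

In this toy model the fully loaded core is throttled until its temperature settles below the limit, while the half-loaded core is never touched. A production flow would also have to predict hotspots before they appear, which this reactive loop does not attempt.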
(Advantest’s V93000 EXA Scale SoC Test System. SoC, or System-on-Chip, devices integrate multiple different functions into a single chip. The tester addresses the test needs of various advanced semiconductors. Source: Company)
Semiecosystem: I hear a lot about system-level test (SLT). What is SLT and when do you use it in the test flow?
Schaub: SLT is a comprehensive testing methodology used to validate the functionality, performance, and interoperability of semiconductor devices within their intended system environments. Unlike traditional testing methods that focus on individual components, SLT evaluates the entire system, ensuring that all integrated parts work seamlessly together under real-world conditions.
Over the years, SLT has evolved from being an optional insertion to a mandatory step in the test flow, particularly for complex devices such as AI chips, processors, and SoC solutions, where multiple functionalities and high integration levels are involved. SLT is typically used in the latter stages of the test flow after initial component-level tests have been performed.
It follows traditional tests like wafer sort, package test, and burn-in, providing an additional layer of assurance by verifying the complete system’s behavior. For example, in the case of an AI processor, SLT would involve running actual AI workloads and applications to ensure the chip performs correctly within the end-user system. This helps identify any issues related to power management, thermal behavior, and interactions with other system components that might not be detected during earlier test stages.
Semiecosystem: For AI chips and other complex devices, what types of test coverage do you need? Do the test times and test costs increase when testing these complex devices?
Schaub: For AI chips and other complex devices, comprehensive test coverage is essential to ensure their performance, reliability, and functionality. This includes:
1. Functional Testing: Verifies that the device operates according to its specifications.
2. Structural Testing: Checks the integrity of the chip’s internal structures, often using DFT techniques like scan chains and boundary-scan.
3. System-Level Testing: Ensures the device works seamlessly within its intended system, verifying overall system functionality and interoperability.
4. Thermal Testing: Monitors and manages hotspots, ensuring the chip can handle varying workloads without overheating.
5. Power Testing: Evaluates power consumption and management to ensure efficient operation and prevent failures due to power issues.
6. Performance Testing: Assesses the chip’s performance under different workloads to ensure it meets required benchmarks.
7. Reliability Testing: Includes burn-in and stress tests to identify potential long-term failures.
Initially, testing these complex devices often leads to longer test times and, with them, higher test costs. The higher number of cores, intricate architectures, and the need for thorough validation require more sophisticated and time-consuming test procedures. However, incorporating DFT and BIST strategies is crucial in managing these costs. These techniques enable self-diagnosis and more efficient testing processes, reducing reliance on external testing resources.
Additionally, the continual evolution of DFT/BIST strategies, alongside enhanced ATE methodologies, helps optimize test coverage and control costs. Despite these challenges, the investment in extensive testing is crucial to ensure the quality and reliability of AI chips and other advanced semiconductor devices.
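To make the scan-chain technique mentioned under structural testing more concrete, here is a minimal Python sketch of a single scan test: shift a pattern in, capture the logic response, shift it out, and compare against a fault-free run. The four-flop chain, the XOR logic between neighboring flops and the fault model are invented for illustration; real scan architectures, compression schemes and pattern generation are far more involved.

```python
def shift_in(pattern):
    """Serially load a pattern; after len(pattern) clocks the chain holds it."""
    chain = [0] * len(pattern)
    for bit in reversed(pattern):
        chain = [bit] + chain[:-1]
    return chain


def capture(chain, stuck_at=None):
    """One functional clock: each flop captures a toy logic function.

    Hypothetical logic: flop i captures chain[i] XOR chain[i + 1], wrapping
    around at the end. stuck_at is an optional (index, value) pair that
    forces one captured value, modelling a stuck-at fault.
    """
    n = len(chain)
    captured = [chain[i] ^ chain[(i + 1) % n] for i in range(n)]
    if stuck_at is not None:
        index, value = stuck_at
        captured[index] = value
    return captured


def shift_out(chain):
    """Serially unload the chain; the last flop is observed first."""
    chain = list(chain)
    observed = []
    for _ in range(len(chain)):
        observed.append(chain[-1])
        chain = [0] + chain[:-1]
    return observed


def scan_test(pattern, stuck_at=None):
    """Shift in, capture, shift out; return the observed response."""
    return shift_out(capture(shift_in(pattern), stuck_at))


def detects(pattern, fault):
    """A pattern detects a fault if the response differs from the good one."""
    return scan_test(pattern) != scan_test(pattern, fault)


if __name__ == "__main__":
    fault = (2, 0)  # hypothetical: captured value at flop 2 stuck at 0
    for pattern in ([1, 0, 1, 1], [0, 0, 1, 0]):
        print(pattern, "detects fault:", detects(pattern, fault))
```

In this example the first pattern misses the fault and the second one catches it, which is why coverage depends on the pattern set and why pattern volume, and with it test time, grows with design size.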
Semiecosystem: At the 3nm and/or 2nm logic nodes, leading-edge foundries and many design houses are making a major transition on the transistor front. Many are moving from today’s finFET transistors to gate-all-around (GAA) transistors. In other words, we will soon see new chips with GAA transistors at 2nm and beyond. Does that present any new challenges for test?
Schaub: The transition to gate-all-around (GAA) transistors at the 3nm and 2nm logic nodes presents several new challenges for testing. GAA transistors offer improved performance and power efficiency compared to finFETs, but their unique structure and increased density introduce complexities in test processes. One of the primary challenges is ensuring accurate characterization and validation of these advanced transistors, as their electrical properties can be more sensitive to variations in manufacturing processes.
Moreover, the increased device density at these nodes requires more sophisticated ATE with higher resolution and precision. Thermal management becomes even more critical due to the higher power densities, necessitating advanced thermal testing techniques to identify and mitigate hotspots. The integration of GAA transistors also demands enhanced DFT and BIST strategies to ensure comprehensive coverage and efficient testing processes. The rapid evolution of these technologies requires continuous updates to test methodologies to keep pace with the latest advancements in GAA transistor design and fabrication. Overall, while GAA transistors at 2nm and beyond promise significant performance benefits, they also necessitate advanced and adaptive testing solutions to address the new challenges they bring.
Semiecosystem: Chiplets are generating a lot of attention in the industry. What are some of the test challenges and ATE solutions for chiplets?
Schaub: Chiplets are generating significant attention in the semiconductor industry due to their potential to enhance performance and flexibility in chip design. However, they introduce several unique test challenges, such as ensuring seamless integration and communication between multiple chiplets within a single package, requiring rigorous testing of interconnects and interfaces.
Ensuring known good die (KGD) is critical, as a single defective chiplet can render the entire package unusable, leading to high costs. To address this, shift-left strategies are increasingly important, involving early and comprehensive testing during the design and pre-assembly phases, leveraging AI techniques to enhance test coverage and predict potential failures. The heterogeneous nature of chiplets necessitates highly adaptable ATE capable of handling diverse test requirements.
Additionally, SLT is crucial to verify the functionality and interoperability of the combined chiplets under real-world conditions. Thermal management and power delivery are critical, as multiple chiplets within a confined space can lead to hotspots and power distribution issues. Advanced thermal testing techniques and power analysis are required to identify and mitigate these problems. ATE solutions are evolving to provide higher channel counts, greater flexibility, and improved precision. DFT features, such as BIST and boundary-scan, are increasingly integrated into chiplets to facilitate efficient testing. Overall, while chiplets offer exciting possibilities for innovation, their successful implementation hinges on advanced and flexible ATE solutions, ensuring KGD, and employing shift-left strategies enhanced by AI.
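The known-good-die argument above can be illustrated with simple arithmetic. The Python sketch below assumes that chiplet defects are independent, that assembly itself adds no failures, and that the pre-assembly test never rejects a good die. The yield and coverage numbers are made up for the example and do not come from the interview.

```python
from math import prod


def package_yield(die_good_probabilities):
    """Probability that every chiplet in the package is good.

    Assumes independent defects and a perfect assembly step.
    """
    return prod(die_good_probabilities)


def good_given_pass(fab_yield, defect_coverage):
    """Probability that a die which passed pre-assembly test is truly good.

    defect_coverage is the fraction of defective dies the test catches.
    Assumes the test never rejects a good die (no overkill).
    """
    escapes = (1.0 - fab_yield) * (1.0 - defect_coverage)
    return fab_yield / (fab_yield + escapes)


if __name__ == "__main__":
    chiplets = 8           # hypothetical package with eight chiplets
    fab_yield = 0.90       # hypothetical per-die yield before test
    coverage = 0.99        # hypothetical defect coverage of the KGD test

    untested = package_yield([fab_yield] * chiplets)
    screened = package_yield([good_given_pass(fab_yield, coverage)] * chiplets)
    print(f"package yield without KGD screening: {untested:.3f}")
    print(f"package yield with KGD screening:    {screened:.3f}")
```

With these assumed numbers, fewer than half of the unscreened packages would be fully functional, while screening each die first brings the figure to roughly 99%. The gap widens as the chiplet count rises, which is the economic case for shift-left testing.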
Semiecosystem: Last year, Advantest introduced a new memory test system for NAND flash memory. What are the general challenges with testing 3D NAND devices? What do the new testers bring to the party?
Schaub: Testing 3D NAND devices presents several challenges due to their complex architecture and high-density design. One primary challenge is ensuring the accuracy and reliability of tests across the multiple layers of memory cells stacked vertically in 3D NAND. These layers increase the complexity of signal integrity, power delivery, and thermal management during testing. Additionally, as cell dimensions shrink, variability and susceptibility to defects increase, requiring more precise and thorough testing methodologies.
Advantest’s new memory test system, the T5835, addresses these challenges by providing advanced capabilities tailored for 3D NAND testing. The system offers higher channel counts and improved precision, allowing for comprehensive testing of each memory layer. It includes enhanced signal integrity features to manage the intricate interconnections within 3D NAND structures. Advanced thermal management capabilities ensure accurate temperature control and hotspot identification, which is crucial for maintaining device reliability.
Moreover, the new tester integrates sophisticated error detection and correction algorithms to identify and mitigate potential defects early in the test process. It also supports high-speed and parallel testing, significantly reducing test times and improving throughput. Overall, Advantest’s new memory test system brings enhanced accuracy, efficiency, and reliability to the testing of 3D NAND devices, ensuring they meet stringent performance and quality standards.
Semiecosystem: What else?
Schaub: The industry is also moving toward higher bandwidth to meet the requirements of AI technology. Given today’s rapid market growth for AI and large language models (LLMs), many expect to see AI inference processing and disaggregation in data centers. Next-generation storage with multi-lane serial interfaces must be designed to deliver higher bandwidth. Embedded NAND is in a similar situation: supporting edge computation and AI inference processing in the PC/mobile market is accelerating the same trend toward higher bandwidth. Demand for higher speed per die, including interleaving at the interface speed of the 3D NAND die used in these applications, is expected to become more critical, especially in consumer storage (UFS, PCIe BGA) for AI PCs and mobile devices, where device form factor and mounting area are limited.
Semiecosystem: Maybe you would like to briefly discuss what’s happening with Advantest’s V93000 test system.
Schaub: The V93000 single scalable SoC test platform is continually extended to address the evolving test needs of advanced semiconductors. For very large high-performance computing (HPC) and AI devices, the real estate on existing device-under-test (DUT) boards is not always sufficient to accommodate all the external components required for test. The DUT Scale Duo interface available on the V93000 helps with this by increasing the available space by 66% for large DUT interface boards, while maintaining compatibility with existing DUT boards.
For scan test, the tester must efficiently move scan data in and out of the DUT. Test patterns must be deployed at the KGD testing step and at the final test of the complete heterogeneously integrated device. To meet this requirement, Advantest offers the Pin Scale 5000, which provides three gigavectors per pin at up to 5 Gbit/sec.
Compression and memory pooling can scale this capacity to hundreds of gigavectors. As interface speeds between ICs grow, the digital channel cards for the V93000 have been enhanced step by step. The latest Pin Scale Multilevel Serial card can test up to 32 GBaud, including NRZ, PAM3 and PAM4.
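Because baud counts symbols rather than bits, the same 32 GBaud symbol rate corresponds to different raw bit rates for NRZ, PAM3 and PAM4. The Python sketch below computes the theoretical upper bound as symbol rate times log2 of the number of signal levels. These are derived figures, not Advantest specifications, and practical PAM3 line codes carry somewhat less than the bound shown here.

```python
from math import log2

# Number of signal levels per symbol for each modulation scheme.
LEVELS = {"NRZ": 2, "PAM3": 3, "PAM4": 4}


def max_bit_rate_gbps(symbol_rate_gbaud, modulation):
    """Upper bound on raw bit rate, before line coding or protocol overhead."""
    return symbol_rate_gbaud * log2(LEVELS[modulation])


if __name__ == "__main__":
    for modulation in LEVELS:
        rate = max_bit_rate_gbps(32, modulation)
        print(f"{modulation}: up to {rate:.1f} Gbit/s at 32 GBaud")
```

At 32 GBaud the bound is 32 Gbit/s for NRZ and 64 Gbit/s for PAM4, with PAM3 falling in between.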
Advantest now offers the XHC32 high-current power supply, which delivers up to 640A per card and is an ideal complement to the widely used XPS256 universal power supply. In combination, they can serve the needs of power-hungry high-performance digital devices for years to come.