AIResearch
Hardware

AI Adapts Chip Designs as Requirements Change

A new AI framework uses traceability links to update hardware code incrementally, reducing token usage by 23% and maintaining consistency without full regeneration.

AI Research
March 31, 2026
4 min read

In the fast-paced world of chip design, requirements often evolve after the initial specifications are set, forcing engineers to repeatedly rewrite code from scratch. This process is not only time-consuming but also prone to errors, as even minor changes can cause the entire design to drift away from its original structure. A new AI-driven framework called IncreRTL addresses this by enabling incremental updates to hardware code, ensuring that only the affected parts are modified while preserving the overall design integrity. This approach could significantly accelerate hardware development cycles and reduce costs, making it a practical tool for real-world engineering deployment.

The key finding from the research is that IncreRTL achieves higher consistency and efficiency in updating Register-Transfer Level (RTL) code than traditional approaches. By constructing traceability links between natural-language requirements and Verilog code segments, the framework can pinpoint exactly which parts of the design need to change when requirements evolve. In experiments, IncreRTL achieved a consistency score of 0.8123, outperforming baselines that scored 0.7337 for full-code regeneration and 0.2684 for direct generation without context. This means the AI preserves the original design's structure and interfaces more effectively, reducing the risk of unintended changes that could break the hardware's functionality.
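To make the idea of traceability links concrete, here is a minimal, hypothetical sketch in Python. The requirement strings, block names, and data structure are illustrative inventions, not the paper's actual data model; the point is only that a requirement-to-block map lets a change touch a small subset of the design.

```python
# Hypothetical sketch: a traceability map from atomic requirements to the
# Verilog block(s) they govern. All names here are made up for illustration.
trace_links = {
    "REQ-1: output bus shall be 32 bits wide": ["port_decl"],
    "REQ-2: result registered on rising clock edge": ["seq_logic"],
    "REQ-3: active-low synchronous reset": ["seq_logic", "reset_logic"],
}

def impacted_blocks(changed_requirements, links):
    """Return only the code blocks touched by the changed requirements."""
    blocks = set()
    for req in changed_requirements:
        blocks.update(links.get(req, []))
    return sorted(blocks)

# Changing only the bus-width requirement leaves sequential logic untouched.
print(impacted_blocks(["REQ-1: output bus shall be 32 bits wide"],
                      trace_links))
# → ['port_decl']
```

In this toy version the lookup is exact string matching; the framework described in the paper builds the links with LLM-based semantic alignment, but the downstream payoff is the same: only the mapped blocks are regenerated.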

The methodology behind IncreRTL involves three main stages: splitting and structuring the code, constructing traceability links, and performing regeneration and integration. First, the Verilog code is parsed into syntax-preserving blocks, such as port declarations or logic sections, to create manageable units. Then, large language models (LLMs) are used to build semantic alignments between atomic requirements and these code blocks, scoring links based on lexical and semantic matching as detailed in Algorithm 1. Finally, guided by a task template shown in Figure 3, the LLM regenerates only the impacted code snippets, which are then merged back into the original design using a line-offset-based mechanism. This localized approach minimizes token usage and prevents structural drift, as illustrated in Figure 4, where a change from a 32-bit to a 24-bit output updates only the relevant fragment.
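The flavor of the second and third stages can be sketched as follows. This is a loose, assumed approximation: a crude word-overlap score stands in for the paper's lexical-plus-semantic link scoring (Algorithm 1), and a naive list splice stands in for the line-offset-based integration step; the Verilog fragment mirrors the 32-bit to 24-bit example from Figure 4.

```python
def lexical_score(requirement, code_block):
    """Toy stand-in for link scoring: fraction of requirement words that
    appear in the code block. The paper combines lexical matching with
    LLM-based semantic alignment; this is only the lexical half, simplified."""
    req_words = set(requirement.lower().split())
    code_words = set(code_block.lower().replace("_", " ").split())
    return len(req_words & code_words) / max(len(req_words), 1)

def splice(original_lines, start, end, new_lines):
    """Line-offset-based integration: replace lines [start, end) of the
    original design with a regenerated snippet, leaving the rest intact."""
    return original_lines[:start] + new_lines + original_lines[end:]

original = [
    "module adder (",
    "  output [31:0] sum",  # the only line governed by the width requirement
    ");",
]

# Requirement change: 32-bit -> 24-bit output. Only the port line is
# regenerated and spliced back; the rest of the module is untouched.
patched = splice(original, 1, 2, ["  output [23:0] sum"])
print("\n".join(patched))
```

A real merge would track cumulative line offsets as multiple snippets land in one file; the single-splice version above just shows why localized regeneration leaves the surrounding structure byte-for-byte identical.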

Results from the EvoRTL-Bench benchmark, which includes 30 design modules and 120 requirement-change instances, show that IncreRTL reduces relative token usage to 1.46 times that of direct generation, compared to 1.8 times for full-code regeneration. This represents a 23.29% reduction in tokens, making the process more computationally efficient. The framework also maintains strong performance across different types of requirement changes, with consistency scores ranging from 0.6846 for control configuration changes to 0.8109 for interface protocol changes, as shown in Table 3. For major changes like module structural refactoring, consistency drops to 0.6142, but syntax accuracy remains high at 78.26%, indicating the framework's scalability. Additionally, IncreRTL generalizes well across various LLMs, including GPT-5 and Claude-Sonnet-4.5, consistently improving correctness and stability as depicted in Figure 5.
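The token accounting can be sanity-checked with the quoted ratios. The computation below assumes the paper's 23.29% figure measures how many more tokens full regeneration consumes relative to IncreRTL's own usage, which is the interpretation that reproduces the number.

```python
# Relative token cost, normalized to direct generation = 1.0 (per the article).
full_regen = 1.80   # full-code regeneration baseline
incre_rtl = 1.46    # IncreRTL's incremental updates

# Excess of full regeneration over IncreRTL, relative to IncreRTL's usage.
reduction = (full_regen - incre_rtl) / incre_rtl
print(f"{reduction:.2%}")  # → 23.29%
```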

The implications of this research are significant for the hardware industry, where evolving requirements are common due to specification updates or performance adjustments. By enabling incremental updates, IncreRTL can cut down on the time and resources needed for chip design, allowing engineers to focus on innovation rather than repetitive coding. The framework's ability to maintain consistency reduces debugging effort and testing costs, which are critical in safety-sensitive applications like defense or automotive systems. Moreover, the introduction of EvoRTL-Bench provides a standardized way to evaluate AI tools in this domain, fostering further advancements in automated hardware design.

Despite its strengths, IncreRTL has limitations that the paper acknowledges. The framework performs less effectively on major requirement changes, such as system interface refactoring, where functional correctness rates can drop to around 44%, as seen in Table 3. This is because large-scale modifications involve complex dependencies that are harder to localize. Additionally, the traceability links require manual validation to ensure accuracy, which adds overhead, and the framework relies on the quality of the LLM's semantic understanding, which may vary across models. The benchmark also includes instances that fail original testbenches, highlighting gaps in capturing all behavioral deviations. Future work could address these issues by enhancing link automation and handling more intricate design evolutions.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn