System enables automated repair in hardware designs

The new framework lets developers cut down on time spent designing bug fixes for hardware specifications, adapting techniques now used widely in software development.

Researchers at the University of Michigan have devised a framework for automatically repairing defects in computer hardware designs. The system, called CirFix (for circuit fix), works with designs written in hardware description languages like Verilog, enabling hardware designers to automatically generate candidate repairs for a number of common defects.

CirFix was inspired by techniques that have long been available to software developers, say paper authors Hammad Ahmad, Yu Huang, and Prof. Westley Weimer, and it achieves repair success rates comparable to those of its software counterparts.

“There’s been over a decade of research on automated program repair for software and none of that had been applied to hardware descriptions yet,” says CSE PhD student Ahmad, lead author on the paper. 

Automated program repair has become a useful time saver in the software world by cutting down on the developer effort needed to work out a bug fix. While automated tools can't fully fix every bug, they help developers narrow down the potential issues and provide candidate repairs that can be improved faster than they could have been written by hand.

But according to Ahmad, certain key differences in the design processes of software and hardware have kept these techniques from crossing over. First, it’s much easier to conceive of test cases with clear-cut passes and fails for a piece of software than it is for hardware. A program will have inputs with measurably correct outputs, and a candidate software repair is usually assigned a “fitness” value defined as the fraction of test cases it passes. Hardware has a more complex concept of correct behavior.
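To make the software side of that comparison concrete, here is a minimal Python sketch of a pass-ratio fitness score. It is an illustration only, not code from any particular repair tool; the names `pass_ratio`, `buggy_abs`, and `patched_abs` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Each test case pairs an input with its expected output (hypothetical structure).
TestCase = Tuple[int, int]


def pass_ratio(candidate: Callable[[int], int], cases: Sequence[TestCase]) -> float:
    """Return the fraction of test cases for which the candidate gives the expected output."""
    if not cases:
        raise ValueError("at least one test case is required")
    passed = sum(1 for given, expected in cases if candidate(given) == expected)
    return passed / len(cases)


def buggy_abs(x: int) -> int:
    # Deliberate bug for illustration: negative inputs are not negated.
    return x


def patched_abs(x: int) -> int:
    # Candidate repair: negate negative inputs.
    return -x if x < 0 else x


cases = [(3, 3), (0, 0), (-2, 2), (-7, 7)]
print(pass_ratio(buggy_abs, cases))    # 0.5
print(pass_ratio(patched_abs, cases))  # 1.0
```

A candidate that reaches 1.0 passes every test, which corresponds to what Ahmad calls a plausible repair.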

Second, software has clear execution paths that can be used to find out where things went wrong (a process called fault localization). Many automated techniques look at states, or snapshots, of the software mid-execution to assess things like variable values for correctness. This doesn’t map well to hardware, where the entire circuit is typically in use for the duration of a simulation.

The first issue is a barrier to assessing a repair’s correctness, and the second is a barrier to determining where issues arise in the first place.

“That’s one of the biggest things that automated program repair needs,” says Ahmad. “Every time you propose a patch, it checks whether it passes or fails. If you pass all the test cases then you know that you’ve made a plausible repair.”

CirFix aims to provide a means to judge the correctness of a circuit based on recorded metrics, giving hardware an equivalent of the test cases available to software.

CirFix first adds a logging process to a design's testbench, recording the values of registers and wires at different points in a simulation. Once these values are recorded, the system can compare them against an annotation of how the circuit is supposed to behave at each timestamp. This annotation, which the authors call an oracle, must be produced by a developer, but the authors argue it can ultimately save time further along in the process.
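The comparison step can be pictured with a short Python sketch. This is not CirFix's implementation: the trace layout (timestamp, then signal name, then a bit string) and the names `Trace` and `find_mismatches` are assumptions made for illustration, and the sketch assumes each recorded signal has the same width as its oracle entry.

```python
from typing import Dict, List, Tuple

# Hypothetical trace layout: timestamp -> signal name -> bit string.
Trace = Dict[int, Dict[str, str]]


def find_mismatches(recorded: Trace, oracle: Trace) -> List[Tuple[int, str, int]]:
    """Return (timestamp, signal, bit_index) for every bit that differs from the oracle."""
    mismatches = []
    for timestamp in sorted(oracle):
        observed_signals = recorded.get(timestamp, {})
        for signal, expected_bits in oracle[timestamp].items():
            # A signal missing from the recording is treated as all-unknown ("x").
            observed_bits = observed_signals.get(signal, "x" * len(expected_bits))
            for index, (seen, wanted) in enumerate(zip(observed_bits, expected_bits)):
                if seen != wanted:
                    mismatches.append((timestamp, signal, index))
    return mismatches


# Expected behavior of a small counter, written by a developer.
oracle = {
    10: {"count": "0001", "overflow": "0"},
    20: {"count": "0010", "overflow": "0"},
}
# Values logged from a simulation of a defective design.
recorded = {
    10: {"count": "0001", "overflow": "0"},
    20: {"count": "0011", "overflow": "0"},
}
print(find_mismatches(recorded, oracle))  # [(20, 'count', 3)]
```

Because the oracle describes expected behavior rather than any one bug, the same `oracle` dictionary could be reused to check every candidate repair.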

“Once this expected circuit behavior is produced, you can use it for any defect that you might encounter,” says Ahmad. In cases where CirFix finds a repair, this cuts future developer effort down to simply assessing and improving the automated repair. Previously, they would have to do the work of tracking down buggy behavior on their own and formulating a fix from scratch.

Under the hood, CirFix tallies up the bits at each timestamp that don’t match the oracle, records where and when the mismatches happen, and finally computes a fitness value based on the total errors and on pre-assigned importance weights given to different bits. The result is a value from 0 to 1 indicating how well the circuit’s behavior matches the oracle. Developers can change how often the system measures matches and can provide more or less detail in their oracle depending on how granular they want their results to be.
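A simplified version of such a score might look like the following Python sketch. It is a stand-in under stated assumptions, not the formula from the paper: weights here are assigned per signal rather than per bit, the score is simply the matched weight divided by the total weight, and the names `weighted_fitness` and `Trace` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical trace layout: timestamp -> signal name -> bit string.
Trace = Dict[int, Dict[str, str]]


def weighted_fitness(
    recorded: Trace,
    oracle: Trace,
    weights: Optional[Dict[str, float]] = None,
    default_weight: float = 1.0,
) -> float:
    """Return a score in [0, 1]: weighted share of oracle bits the recording matches."""
    weights = weights or {}
    total = 0.0
    matched = 0.0
    for timestamp, expected_signals in oracle.items():
        observed_signals = recorded.get(timestamp, {})
        for signal, expected_bits in expected_signals.items():
            weight = weights.get(signal, default_weight)
            observed_bits = observed_signals.get(signal, "")
            for index, wanted in enumerate(expected_bits):
                total += weight
                # Missing bits count as mismatches.
                if index < len(observed_bits) and observed_bits[index] == wanted:
                    matched += weight
    if total == 0:
        raise ValueError("oracle contains no bits to compare")
    return matched / total


oracle = {
    10: {"count": "0001", "overflow": "0"},
    20: {"count": "0010", "overflow": "0"},
}
recorded = {
    10: {"count": "0001", "overflow": "0"},
    20: {"count": "0011", "overflow": "0"},  # one wrong bit in "count"
}
# Assumed weighting: the overflow flag matters twice as much as a counter bit.
score = weighted_fitness(recorded, oracle, weights={"count": 1.0, "overflow": 2.0})
print(round(score, 4))  # 0.9167
```

In this example the total weight is 12 and a single counter bit of weight 1 is wrong, so the score is 11/12. Giving the oracle fewer timestamps or signals makes the check coarser, which mirrors the granularity trade-off described above.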

The system provided plausible repairs in 21 of the 32 defect scenarios in the authors’ testbed, and completely correct repairs in 16. 

Ahmad hopes to continue this application of automated software repair to hardware, arguing that the domain could potentially benefit even more given the necessity of airtight design before deployment.

“It’s borderline impossible to fix a circuit after deployment,” he says. “Once you deploy a processor model, you’re not going to call it back and fix everything.”

In the future he plans to apply CirFix to larger real-world defect scenarios, as well as adapt it to hardware synthesis languages rather than simulation languages like Verilog. He expects that, much like automated software repair when it was new, there may be challenges to early adoption.

“But over time and as things evolved, enough developers became receptive to the point that Facebook actually has their own in-house automated program repair algorithm that they use for bugs in software.”

CirFix was presented at the 2022 ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 22) in the paper “CirFix: automatically repairing defects in hardware design code.”