Enterprise AI Analysis
Approximate Computing in High-Level Synthesis: From Survey to Practical Implementation
A strategic overview for enterprise leaders on leveraging approximate computing to optimize high-level synthesis.
Executive Impact Summary
Key metrics reflecting the potential gains and considerations for implementing approximate computing within your enterprise.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Core Concept: Approximate Computing
Approximate computing trades small, controlled errors for reductions in power, area, and delay in error-tolerant applications such as image processing, Digital Signal Processing (DSP), and machine learning. It simplifies hardware/software designs, with error effects measured through simulation against predefined thresholds.
Research on approximate computing has seen rapid growth, with a significant increase in publications since 2004, highlighting its rising importance, especially for data-intensive applications. Figure 1 illustrates this trend, showing 455 papers published in 2024 alone on IEEE Xplore.
Using edge detection as an example, approximate computing demonstrates that relaxing the output image quality constraint (e.g., from 40 dB PSNR down to 20 dB) can yield significantly lower-power circuits and larger design savings. This highlights the critical trade-off designers must manage.
HLS as a Key Abstraction Level for Approximation
This survey specifically focuses on High-Level Synthesis (HLS) because approximations applied at this highest abstraction level yield the most significant impact on the resultant circuit's area, power, and performance. HLS is now widely adopted for designing hardware accelerators, which are often ideal candidates for approximation.
Enterprise Process Flow
The HLS process transforms behavioral C/C++ descriptions into efficient hardware, forming a critical part of SoC design. Approximations can be introduced at various stages, as illustrated in the overall design flow, with HLS offering a key point of intervention.
Iterative HLS Approximation Flow
A typical HLS approximation flow is iterative: apply an approximation, synthesize, compute error metrics (MAPE/PSNR) against test vectors and golden outputs, and repeat until the maximum allowable error (Emax) is met, identifying the smallest or lowest power design.
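The selection step at the end of this flow can be sketched in C++. This is a minimal model, not a real tool integration: the `Design` struct and `selectBest` helper are hypothetical stand-ins for the synthesis and error-characterization results each iteration produces.

```cpp
#include <vector>
#include <limits>

// One candidate approximate design, as characterized by a synthesis +
// simulation iteration of the flow.
struct Design {
    double area;   // post-synthesis area estimate
    double error;  // measured error metric (e.g., MAPE) vs. golden outputs
};

// Hypothetical driver for the final selection step: among candidates whose
// measured error stays within Emax, keep the smallest design.
Design selectBest(const std::vector<Design>& candidates, double emax) {
    Design best{std::numeric_limits<double>::infinity(), 0.0};
    for (const Design& d : candidates)
        if (d.error <= emax && d.area < best.area)
            best = d;
    return best;
}
```

In a real flow, each `Design` would come from one iteration of apply-approximation, synthesize, and error measurement against the test vectors.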
Key Error Metrics for Approximate Computing
Accurate measurement of approximation impact requires domain-specific error metrics. Commonly used metrics include Mean Absolute Percentage Error (MAPE) for DSP, Peak Signal to Noise Ratio (PSNR) for image processing, and Bit Error Rate (BER) for communication applications. These quantify output degradation and guide design decisions.
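Two of these metrics are simple to compute from exact and approximate output traces. The sketch below assumes 8-bit image data (peak value 255) for PSNR and nonzero reference values for MAPE.

```cpp
#include <cmath>
#include <vector>
#include <cstddef>

// Mean Absolute Percentage Error: average of |exact - approx| / |exact|,
// expressed in percent. Common for DSP outputs.
double mape(const std::vector<double>& exact_v, const std::vector<double>& approx_v) {
    double sum = 0.0;
    for (std::size_t i = 0; i < exact_v.size(); ++i)
        sum += std::fabs((exact_v[i] - approx_v[i]) / exact_v[i]);
    return 100.0 * sum / exact_v.size();
}

// Peak Signal to Noise Ratio in dB, assuming 8-bit pixels (peak = 255).
// Common for image-processing outputs.
double psnr(const std::vector<double>& golden, const std::vector<double>& test) {
    double mse = 0.0;
    for (std::size_t i = 0; i < golden.size(); ++i) {
        double d = golden[i] - test[i];
        mse += d * d;
    }
    mse /= golden.size();
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

In the iterative flow, these values are compared against the maximum allowable error Emax after each synthesis run.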
Importance of Training Data
The reliability of approximate circuits heavily depends on the training data used. Dynamic workloads differing from training data can lead to unacceptable errors. Using uniform random data for worst-case analysis or specific distributions can affect the area-error trade-off. Hence, the choice of training data is critical.
| Algorithm | Advantages | Disadvantages | Best Used For |
|---|---|---|---|
| CORDIC | Uses only shifts, adds, and a small constant table; no multipliers | Iterative, roughly one result bit per iteration, so latency is high | Trigonometric/hyperbolic functions on multiplier-poor hardware |
| Polynomial Apprx. (Taylor, Chebyshev) | Tunable accuracy; Chebyshev minimizes worst-case error over an interval | Requires multipliers; degree grows with required precision | Smooth functions where DSP multipliers are available |
| LUTs | Result in a single memory lookup; very fast | Table size grows exponentially with input bitwidth | Low-precision functions over small input ranges |
| Bipartite Tables | Much smaller memory than a single LUT (two tables plus an adder) | More complex addressing logic; limited precision | Medium-precision function evaluation |
| Newton-Raphson Iteration | Quadratic convergence (precision roughly doubles per iteration) | Needs a good initial seed and multipliers | Division, reciprocal, and square root |
Approximation 1 involves selecting efficient hardware-friendly algorithms. For trigonometric functions, alternatives like CORDIC, polynomial approximations, and LUTs offer distinct trade-offs in speed, resources, and precision. This initial choice profoundly impacts subsequent approximation opportunities.
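As an illustration of the first option, here is a floating-point model of a CORDIC rotation datapath for sine/cosine. In hardware, the divisions by powers of two become wires (shifts) and the arctangent constants a small ROM; this double-precision sketch only models that shift-add behavior.

```cpp
#include <cmath>

// CORDIC in rotation mode, 16 iterations: computes sin and cos of `angle`
// (radians, |angle| <= pi/2) using only shift-add-style operations plus a
// table of arctan(2^-i) constants. Floating-point model of the datapath.
void cordic_sincos(double angle, double& s, double& c) {
    const int N = 16;
    double x = 0.6072529350088814;  // 1/K: precomputed CORDIC gain correction
    double y = 0.0;
    double z = angle;
    for (int i = 0; i < N; ++i) {
        double d = (z >= 0.0) ? 1.0 : -1.0;     // rotation direction
        double xn = x - d * y / (1 << i);        // hardware: y >> i
        double yn = y + d * x / (1 << i);        // hardware: x >> i
        z -= d * std::atan(1.0 / (1 << i));      // table constant atan(2^-i)
        x = xn;
        y = yn;
    }
    c = x;  // x converges to cos(angle)
    s = y;  // y converges to sin(angle)
}
```

With 16 iterations the result is accurate to roughly 2^-16, illustrating the speed/precision trade-off the table above describes.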
| Type | Name | Bitwidth |
|---|---|---|
| integer | char | 8 bit signed |
| integer | short | 16 bit signed |
| integer | unsigned short int | 16 bit unsigned |
| integer | int | 32 bit signed |
| integer | long long | 64 bit signed |
| float | float | 32 bit |
| float | double | 64 bit |
| float | long double | 128 bit |
Approximation 2 includes adjusting data types and bitwidths, a fundamental method for optimizing hardware. Converting floating-point to fixed-point and scaling integer bitwidths (Table 3) directly reduces circuit area, power, and delay. HLS tools support custom fixed-point (Table 4) and algorithmic C data types for fine-grained control over precision.
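The float-to-fixed conversion can be modeled with plain integers before committing to a tool-specific type. The sketch below assumes a Q2.14 format (16-bit signed, 14 fraction bits); in an HLS tool the equivalent would be a custom fixed-point type such as `ap_fixed<16,2>` or `ac_fixed<16,2>`.

```cpp
#include <cstdint>

// Plain-integer model of Q2.14 fixed point: 16-bit signed, 14 fraction bits.
const int FRAC = 14;

int16_t to_fixed(double v)   { return static_cast<int16_t>(v * (1 << FRAC)); }
double  to_double(int16_t f) { return static_cast<double>(f) / (1 << FRAC); }

// Fixed-point multiply: widen, multiply, shift back to Q2.14.
// In hardware this is one 16x16 integer multiplier instead of a full
// floating-point unit, directly reducing area, power, and delay.
int16_t fx_mul(int16_t a, int16_t b) {
    int32_t p = static_cast<int32_t>(a) * static_cast<int32_t>(b);
    return static_cast<int16_t>(p >> FRAC);
}
```

Shrinking `FRAC` (and the container width) trades precision for smaller hardware, which is exactly the bitwidth-scaling knob described above.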
Loop Skipping / Perforation (Approximation 2)
Loop skipping is an approximation primitive (Approximation 2) that selectively skips loop iterations to enhance performance. This technique, widely used in software, can be integrated into HLS workflows to reduce computation, with frameworks like SpeedGuard dynamically controlling accuracy within specified thresholds.
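A minimal form of loop perforation looks like this in behavioral C++: process every `stride`-th iteration and rescale the result. The rescaling heuristic is an assumption for this sketch, not a prescription from the survey.

```cpp
#include <vector>
#include <cstddef>

// Perforated accumulation: visit every `stride`-th element and extrapolate.
// stride = 1 is the exact loop; stride = 2 skips half the iterations,
// roughly halving the loop's compute at the cost of output error.
double perforated_sum(const std::vector<double>& v, std::size_t stride) {
    double acc = 0.0;
    std::size_t visited = 0;
    for (std::size_t i = 0; i < v.size(); i += stride) {
        acc += v[i];
        ++visited;
    }
    // Rescale so the perforated result estimates the full sum.
    return acc * static_cast<double>(v.size()) / visited;
}
```

A framework that tunes accuracy at runtime would adjust `stride` dynamically while monitoring an error metric against its threshold.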
Variable to Variable (V2V) and Variable to Constant (V2C) Substitution (Approximation 2)
V2V and V2C substitutions (Approximation 2) simplify behavioral code by replacing variables with highly correlated ones or constants, based on statistical analysis of their values. This can lead to significant area/power savings by eliminating complex computations. Source code refactoring can increase opportunities for these powerful approximations.
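A V2C substitution can be shown on a toy gain-control step. The scenario is hypothetical: assume profiling showed the runtime variable `gain` stays near 0.98 across the training set, so it is frozen to that constant, letting HLS strength-reduce the multiply.

```cpp
// Exact version: gain is a runtime variable, requiring a full multiplier.
double scale_exact(double sample, double gain) {
    return sample * gain;
}

// After V2C substitution: gain frozen at its observed mean (hypothetical
// value from statistical analysis of training data). A constant multiply
// can be strength-reduced to shifts and adds by the synthesis tool.
double scale_v2c(double sample) {
    const double GAIN_MEAN = 0.98;
    return sample * GAIN_MEAN;
}
```

The caveat from the training-data discussion applies directly: if deployment workloads drive `gain` far from 0.98, the substitution's error exceeds what was characterized.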
Code Transformations (Approximation 2)
Code transformations (Approximation 2) modify the behavioral description to make it more amenable to approximation or to replace portions with predictive models. Examples include arithmetic expression transformations, function memoization (using LUTs), and substituting complex logic with simplified models like Artificial Neural Networks (ANNs) or linear regression for significant design savings.
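Function memoization, one of the transformations mentioned above, can be sketched as a table-lookup replacement for a transcendental. This assumes a 256-entry table and inputs restricted to [0, 2*pi); table size is the accuracy/area knob.

```cpp
#include <cmath>

// Memoized sine: a 256-entry lookup table over [0, 2*pi). The argument is
// quantized to the nearest table index, trading accuracy for a single
// memory read instead of a full sine computation.
const int    LUT_SIZE = 256;
const double TWO_PI   = 6.283185307179586;
double SIN_LUT[LUT_SIZE];
bool   lut_ready = false;

void build_lut() {
    for (int i = 0; i < LUT_SIZE; ++i)
        SIN_LUT[i] = std::sin(TWO_PI * i / LUT_SIZE);
    lut_ready = true;
}

double sin_lut(double x) {  // expects x in [0, 2*pi)
    if (!lut_ready) build_lut();
    int idx = static_cast<int>(x * LUT_SIZE / TWO_PI + 0.5) % LUT_SIZE;
    return SIN_LUT[idx];
}
```

In an HLS flow the table would be a ROM initialized at compile time rather than built lazily at runtime.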
Functional Unit (FU) Approximations (Approximation 3)
Approximation 3 involves replacing exact arithmetic FUs (like adders and multipliers) with imprecise, approximate versions. These are highly effective for error-tolerant applications and are particularly beneficial in DSP, ML, and image processing where error tolerance is common. Libraries of approximate FUs exist. To maximize impact, loops may need to be unrolled to expose individual operations for targeted approximation.
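A classic example of such an approximate FU is the Lower-part OR Adder (LOA), modeled below in C++. The low `k` bits are combined with a bitwise OR, eliminating the carry chain there; only the upper bits use exact addition.

```cpp
#include <cstdint>

// Lower-part OR Adder (LOA): approximate the low k bits with a bitwise OR
// (no carry propagation), add the upper bits exactly with no carry-in from
// the lower part. Shorter carry chain => smaller, faster, lower-power adder.
uint32_t loa_add(uint32_t a, uint32_t b, unsigned k) {
    uint32_t mask = (1u << k) - 1u;
    uint32_t low  = (a | b) & mask;               // approximate lower part
    uint32_t high = (a & ~mask) + (b & ~mask);    // exact upper part
    return high | low;
}
```

The error is confined to the low `k` bits, which is why such units suit applications that tolerate small-magnitude output noise.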
HLS Scheduling Approximations (Approximation 4)
Approximation 4 modifies the HLS scheduling process to reduce clock steps or latency. This can be achieved by relaxing timing constraints (e.g., modifying the HLS technology library's FU delays) or by manually inserting clock boundary directives in the behavioral description, allowing more operations to chain within a single clock cycle, despite potential timing violations at higher frequencies.
HLS Binding Approximations (Approximation 5)
Approximation 5 occurs during the HLS binding stage, where operations from the Dataflow Graph (DFG) are mapped to available Functional Units (FUs), including approximate FUs. The goal is to select the optimal binding of exact and approximate FUs to minimize area/power while adhering to error constraints. This step is critical for fine-tuning the trade-offs.
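The binding decision can be illustrated with a toy model, under the simplifying assumptions that each operation independently contributes an area cost and an additive error estimate, and that the design space is small enough for exhaustive search.

```cpp
#include <vector>
#include <cstddef>
#include <limits>

// Toy binding model: each DFG operation binds to an exact FU (area only)
// or an approximate FU (smaller area, some error contribution).
struct Op { double areaExact, areaApprox, errApprox; };

// Exhaustively enumerate exact/approx assignments; return the minimum total
// area whose accumulated error estimate stays within emax.
double bestArea(const std::vector<Op>& ops, double emax) {
    std::size_t n = ops.size();
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t m = 0; m < (1u << n); ++m) {  // bit i set => approx FU for op i
        double area = 0.0, err = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            if (m & (1u << i)) { area += ops[i].areaApprox; err += ops[i].errApprox; }
            else               { area += ops[i].areaExact; }
        }
        if (err <= emax && area < best) best = area;
    }
    return best;
}
```

Real binders replace the brute-force loop with heuristics or ILP formulations, and validate the chosen binding by simulation rather than an additive error model.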
Voltage Over-scaling (VOS) and Timing Slack
Voltage Over-scaling (VOS) reduces supply voltage to lower power consumption, but at the cost of introducing timing errors due to slower transistors. HLS can strategically generate hardware implementations with large positive timing slack, making them ideal candidates for aggressive VOS and maximizing power savings.
Optimal Approximation Order
To maximize savings, approximations should be applied in a specific order: starting with algorithmic changes (HW algorithm, code transformations, data types), followed by resource allocation (FU approximation), scheduling, and finally binding. This sequence ensures that high-impact optimizations are performed first, guiding automated HLS approximation flows.
| Feature | Simulation-Based / Post-Design Characterization | Correct-by-Construct |
|---|---|---|
| Verification Method | Extensive simulation against test vectors and golden outputs | Error bounds guaranteed analytically at design time |
| Primary Advantage | Design flexibility; supports aggressive approximations and larger savings | Inherent error guarantees without exhaustive simulation |
| Primary Disadvantage | Costly verification; quality depends on test-vector coverage | Less design flexibility; potentially lower savings |
| Error Behavior | Workload-dependent; characterized empirically | Bounded for all inputs by construction |
| Approximation Primitives | Most primitives (approximate FUs, V2V/V2C, loop perforation, etc.) | Restricted to primitives with provable bounds (e.g., bitwidth truncation) |
A fundamental distinction exists between simulation-based (post-design characterization) and correct-by-construct approximations. Simulation-based methods offer flexibility but require extensive verification, while correct-by-construct techniques provide inherent error guarantees with less design flexibility and potentially lower savings. The choice depends on verification rigor vs. optimization flexibility.
Training Data Dependency Pitfall
A significant pitfall is the reliance of approximate circuits on specific training data. If the operational workload deviates from the training data, output errors can become unacceptable. Future research needs to focus on developing robust, workload-agnostic approximate circuits that are stable across dynamic input conditions.
Code Visibility Limitations in HLS
While HLS improves productivity, it can reduce the visibility of underlying hardware operations compared to direct HDL coding. This 'abstraction penalty' can limit opportunities for approximations. Code transformations and source-to-source compilers are crucial to increase visibility and unlock more approximation potential.
HLS Pragmas: Tool-Dependent Syntax
HLS tools use synthesis directives (pragmas) to control the generation of hardware. However, the syntax for these pragmas is tool-dependent and not standardized. This lack of interoperability means that approximation primitives tied to pragmas (e.g., FU binding) must be re-written for different HLS tools, hindering widespread adoption.
Leveraging Large Language Models (LLMs) for Approximate Design
LLMs present a promising future direction for automating approximate circuit design. They can be fine-tuned to generate approximate code, identify suitable blocks for approximation, suggest primitives, and even generate approximate FUs based on error thresholds, acting as intelligent assistants for HLS designers.
Dynamic Tunable Approximations
To address the risk of changing workloads leading to unacceptable errors, dynamic tunable approximations are being proposed. These architectures allow runtime control over approximation levels, enabling selective power consumption management and adapting to varying operational conditions without requiring static, irreversible design choices.
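One simple realization of a tunable approximation is a multiplier whose operands are truncated under a runtime-controlled level. This sketch is illustrative, not a specific proposed architecture: `level = 0` is exact, and higher levels zero out low-order bits to cut switching activity.

```cpp
#include <cstdint>

// Dynamically tunable multiply: a runtime `level` clears that many low-order
// bits of each operand before multiplying. level = 0 is exact; raising the
// level trades accuracy for reduced switching activity, and can be changed
// at runtime as the workload or power budget varies.
uint32_t tunable_mul(uint32_t a, uint32_t b, unsigned level) {
    uint32_t mask = ~((1u << level) - 1u);  // clear `level` low bits
    return (a & mask) * (b & mask);
}
```

A runtime controller could raise `level` when a monitored error metric has slack and lower it when errors approach the threshold, avoiding a static, irreversible design choice.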
Calculate Your Potential ROI
Estimate the impact of Approximate Computing on your operational efficiency and cost savings.
Your Path to Implementation
A phased approach to integrate approximate computing into your high-level synthesis workflow.
Phase 1: Discovery & Strategy
Assess current HLS practices, identify suitable applications for approximation, and define clear error tolerance metrics (Emax). Develop a tailored strategy based on our analysis and your specific enterprise goals.
Phase 2: Pilot & Proof of Concept
Implement and test initial approximations using commercial HLS tools, focusing on high-impact primitives identified in Phase 1. Validate performance, power, and area gains against defined error thresholds.
Phase 3: Integration & Scaling
Integrate proven approximate computing techniques into your existing HLS design flow. Develop custom libraries of approximate functional units and refine automated approximation flows for broader application across projects.
Phase 4: Continuous Optimization
Establish monitoring and feedback loops to continuously optimize approximate circuits. Explore advanced techniques like dynamic tunable approximations and leverage AI/ML for automated design space exploration.
Ready to Transform Your Hardware Design?
Schedule a free 30-minute consultation with our experts to explore how approximate computing can benefit your enterprise.