Huang, Junqi
(2021)
Energy efficient and runtime based approximate computing techniques for image processing applications: an integrated approach covering circuit to algorithmic level.
PhD thesis, University of Nottingham.
Abstract
Approximate computing is widely used in error-resilient design to improve energy performance: circuit complexity is reduced and circuits are allowed to produce results with an acceptable level of error (approximation). Existing approximate computing techniques have generally been developed and implemented at only one of the algorithmic, logic, or circuit levels, with no means of changing the approximation on the fly at runtime. In contrast, this thesis presents a novel energy-efficient integrated approach that implements approximate computing techniques from the circuit level up to the algorithmic level and allows the approximation to be changed at runtime without any extra hardware. Two new techniques are proposed at the logic/circuit level of abstraction: frequency upscaling (FUS) and voltage overscaling (VOS). They are integrated into two newly proposed algorithmic-level approximate computing techniques: a zigzag low-complexity approximate DCT (ZLCADCT) for image compression and an approximate Newton method for image denoising. Together these form an integrated approach to runtime approximate computing, from the circuit level to the algorithmic level of abstraction, for image processing applications.
In the proposed FUS method, the frequency of the inputs applied to an exact and an approximate (AMA1) full adder cell is increased (upscaled) beyond its maximum operating value, which generates errors in the addition while increasing computational throughput. In the VOS technique, the supply voltage of the exact and approximate adder cells is scaled below the nominal voltage so that the output delay exceeds the worst-case delay, which generates errors in the addition results while reducing energy dissipation. The approximation of a given circuit is thus set at runtime by controlling its operating frequency and supply voltage, without modifying the circuit or adding circuitry.
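The timing-error mechanism described above can be illustrated with a purely behavioural sketch in Python. It is not the transistor-level simulation used in the thesis. It assumes a ripple carry adder built from exact full adders in which every stage has the same unit delay, and it models a shortened clock period (FUS) or a slowed-down circuit (VOS) as a limit, `max_chain`, on how many stages a carry may ripple before the outputs are sampled. A carry that arrives too late is simply treated as 0, which is a further simplification. The function name and its parameters are hypothetical.

```python
def timing_limited_add(a, b, width, max_chain):
    """Add two unsigned integers with a ripple carry adder whose carries
    are lost once they have rippled through more than max_chain stages.

    Illustrative assumption: unit delay per stage, late carries read as 0.
    Returns a (width + 1)-bit result including the carry-out.
    """
    result = 0
    carry = 0
    chain = 0  # stages the current carry has already passed through
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        # A carry that needed more time than the clock period allows is dropped.
        cin = carry if chain <= max_chain else 0
        result |= (x ^ y ^ cin) << i
        if x & y:              # this stage generates a fresh carry
            carry, chain = 1, 1
        elif (x ^ y) and cin:  # this stage propagates the incoming carry
            carry, chain = 1, chain + 1
        else:                  # the carry is killed here
            carry, chain = 0, 0
    cout = carry if chain <= max_chain else 0
    return result | (cout << width)


def count_errors(width, max_chain):
    """Count input pairs whose timing-limited sum differs from the exact sum."""
    return sum(
        timing_limited_add(a, b, width, max_chain) != a + b
        for a in range(1 << width)
        for b in range(1 << width)
    )
```

Calling `count_errors(4, k)` for decreasing `k` shows the qualitative effect: the tighter the timing budget, the more input pairs produce a wrong sum, with no change to the adder itself.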
The FUS and VOS techniques also optimise the approximation of the approximate adder cell while increasing its processing speed and decreasing its energy dissipation. With FUS, the operating frequency of the approximate adder cell can be increased by about 1.4 times (from 11.49 GHz to 16.6 GHz) for the minimum approximation (2 errors), and the maximum approximation (7 errors) is reached by increasing the operating frequency by about 2.5 times (from 11.49 GHz to 29 GHz). The approximation can therefore be varied at runtime without additional hardware, and the processing speed of the approximate adder cell increases as well. Applying FUS to an exact adder cell as well shows that the approximate adder cell sustains a higher operating frequency (around 1.3 times) for the same approximation and dissipates 50% less energy than the exact adder cell. Applying VOS to both exact and approximate adder cells shows that the approximate adder cell dissipates 30% less energy than the exact adder cell at the maximum approximation. In addition, with VOS the approximation of the approximate adder cell can be varied from its minimum to its maximum value at runtime without additional hardware, while saving between 31.1% and 87% of the energy dissipation compared with the exact adder cell.
The proposed techniques are further validated by analyzing the effect of process variations (such as gate length, supply voltage, and input frequency) when both techniques are applied to the adder cell. For FUS, a decrease (increase) in the gate length variation and the supply voltage variation results in reduced (increased) frequency variation for the same number of errors (approximation). A similar trend is observed in the absolute energy variation. For VOS, the energy variation due to changes in gate length is significantly lower for the approximate full adder than for the exact full adder. In addition, mathematical models applicable to both exact and approximate full adders are presented for the FUS and VOS techniques. The results of these models are validated against simulation and are found to be in close agreement.
Further, using exhaustive simulations, the proposed techniques are validated by applying them to 4-bit and 8-bit ripple carry adders (RCAs) and subtractors, followed by the addition of two images using exact and approximate adder cells. The results show that, when the frequency is upscaled, the approximate (AMA1-based) RCA can sustain a frequency 1.18 to 1.37 times higher than the exact full adder-based RCA at the maximum error rate (ER) while keeping lower normalized mean error distance (NMED) and mean relative error distance (MRED) values. With VOS, the ER and NMED of the AMA1-based RCA are significantly lower than those of the exact full adder-based RCA, and the approximate adder-based RCA saves 62% of the energy compared with the exact full adder at the maximum ER level. The PSNR results for the addition of two images show that, under FUS and VOS, the approximate circuit achieves a higher output image quality than the exact circuit.
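The error metrics named above can be sketched in Python using their commonly used definitions in the approximate-arithmetic literature. The abstract does not state the exact formulas used in the thesis, so the normalisation choices here are assumptions: NMED divides the mean error distance by a caller-supplied maximum output value, and MRED skips cases where the exact result is 0. All function names are hypothetical.

```python
import math


def error_rate(exact, approx):
    """Fraction of outputs that differ from the exact result (ER)."""
    wrong = sum(1 for e, a in zip(exact, approx) if e != a)
    return wrong / len(exact)


def nmed(exact, approx, max_output):
    """Mean error distance normalised by the maximum output value (NMED)."""
    med = sum(abs(e - a) for e, a in zip(exact, approx)) / len(exact)
    return med / max_output


def mred(exact, approx):
    """Mean relative error distance (MRED); exact results of 0 are skipped."""
    pairs = [(e, a) for e, a in zip(exact, approx) if e != 0]
    return sum(abs(e - a) / abs(e) for e, a in pairs) / len(pairs)


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

Each function takes plain sequences of integers, so the outputs of an exact and an approximate adder over the same inputs, or two flattened pixel arrays, can be passed in directly.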
Next, at the algorithmic level of approximate computing, two new techniques are proposed: a zigzag low-complexity approximate DCT (ZLCADCT) for image compression and an approximate Newton method using approximate additions for image denoising. FUS and VOS are then applied to both to evaluate performance from the circuit level to the algorithmic level. The proposed ZLCADCT is a deterministic technique that accurately configures the size of the transform matrix (T) according to the number of coefficients retained in the zigzag scanning process. This is achieved by establishing the relationship between the number of retained coefficients and the number of rows of the T matrix. Compared with the approximate DCT (ADCT), ZLCADCT decreases the number of addition operations and the energy consumption while retaining the PSNR of the compressed image. In addition, ZLCADCT eliminates the zigzag scanning process used in ADCT. A detailed mathematical model is provided to characterize the deterministic operation of ZLCADCT. An FPGA-based hardware platform is then used to evaluate and compare the proposed technique experimentally; because it is modular, deterministic, low-latency, and scalable, the proposed technique can accommodate any change in the number of retained coefficients through only a partial reconfiguration of the FPGA resources for the additional hardware required. Extensive simulation and experimental results show superior performance compared with previous ADCT techniques under different metrics.
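The link between the number of retained zigzag coefficients and the number of rows of T can be illustrated with a small Python sketch. The abstract does not give the thesis's actual relation, so this is one plausible reading only: it assumes the standard JPEG-style zigzag order on an n×n block and takes the number of rows as the smallest r for which the top-left r×r corner of the coefficient block contains every retained coefficient. Both function names are hypothetical.

```python
def zigzag_order(n):
    """Return the (row, column) positions of an n x n block in zigzag order."""
    order = []
    for d in range(2 * n - 1):  # d = row + column indexes one anti-diagonal
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        cells = [(r, d - r) for r in rows]
        if d % 2 == 0:
            cells.reverse()  # even diagonals are walked from bottom-left up
        order.extend(cells)
    return order


def rows_needed(retained, n=8):
    """Smallest r such that the first `retained` zigzag coefficients all lie
    in the top-left r x r corner (assumed meaning of 'rows of T')."""
    kept = zigzag_order(n)[:retained]
    return 1 + max(max(r, c) for r, c in kept)
```

Under this assumption, computing only the first `rows_needed(k)` rows of the transform yields every coefficient a zigzag scan would have retained, which is consistent with the abstract's statement that the separate zigzag scanning step is no longer needed.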
Moreover, when FUS and VOS are each applied to ZLCADCT, the approximate full adder can sustain a significantly higher input frequency (around 19.2 GHz with 32 nm adders) and a lower supply voltage (around 0.77 V) than the exact full adder (around 15.4 GHz and 0.83 V) without a significant decrease in PSNR. With FUS, the number of completed DCT operations for ZLCADCT (2.95 to 3.53 for the approximate full adder at 16.6 GHz) is higher than for ADCT (2.3 to 2.77). The total energy dissipation of voltage-overscaled ZLCADCT (9.54E-10 J to 6.52E-10 J for the approximate full adder at 0.76 V) is lower than that of voltage-overscaled ADCT (12.1E-10 J to 8.08E-10 J).
In the proposed approximate Newton method using approximate addition, an additional step length parameter (α) is first introduced into the approximate Newton method using conjugate gradient, so that the number of iterations and the total processing time decrease for total variation-based image denoising. Then, a 32-bit floating-point adder made of approximate or truncated cells is applied to reduce the processing time of each iteration. The proposed technique is tested on a set of images taken from a public domain library, and the number of iterations is found to be lowest when 1.39<α<1.45. Moreover, applying an approximate adder significantly decreases the processing time of an iteration, usually at a very small loss of accuracy and output image quality; the number of iterations remains constant when the number of approximate or truncated cells in the least significant positions (denoted NAB) is below 10. Irrespective of the noise level and adder cell type, the quality of the output images does not degrade significantly when NAB<18. With FUS and VOS, the PSNR of the output images remains nearly unchanged when NAB<15 for both adders. At high NAB values (NAB≥18), the frequency of AMA1 (22.64 GHz) can be scaled up slightly higher than that of the exact full adder (21.04 GHz) while keeping the number of iterations low. With VOS at a high NAB value (e.g. NAB=17 at 50% noise), the number of iterations (12.4 to 16.2) and the energy consumption (8.35 nJ to 14.33 nJ) for AMA1 can be higher than for the exact full adder (6.8 to 8.6 iterations and 7.26 nJ to 9.61 nJ) at low approximation (fewer than 5 output errors).
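The role of NAB can be illustrated with a minimal Python sketch of an adder whose least significant cells are truncated. It is a simplification under stated assumptions: it works on unsigned integers (standing in for the significands of a floating-point adder), a truncated cell outputs 0 and passes no carry upward, and the AMA1 cell is not modelled. The function name and parameters are hypothetical.

```python
def truncated_add(a, b, nab, width=32):
    """Add two unsigned `width`-bit integers while ignoring the `nab` least
    significant bit positions (assumed behaviour of truncated cells)."""
    full_mask = (1 << width) - 1
    high_mask = full_mask & ~((1 << nab) - 1)  # keeps only the exact upper cells
    return (a & high_mask) + (b & high_mask)


def worst_case_error(nab):
    """Exclusive upper bound on |exact - truncated| for a given NAB."""
    return 1 << (nab + 1)
```

Because each operand loses at most 2^NAB - 1, the absolute error stays below 2^(NAB+1). This suggests why small NAB values can leave an iterative method such as the denoising loop largely unaffected, while the error bound doubles with every additional truncated cell.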
Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Kumar, T. Nandha; Al-Murib, Haider Abbas Mohammed
Keywords: approximate computing, low-power design, approximate full adder, ripple carry adder, frequency upscaling, voltage overscaling, inverter equivalent circuit, approximate dct, zigzag scanning, image compression, fpga, vlsi design, inexact newton method, unconstrained optimization, total variation, image denoising, binary adder, binary multiplier, energy dissipation, worst-case delay
Subjects: T Technology > TK Electrical engineering. Electronics Nuclear engineering
Faculties/Schools: University of Nottingham, Malaysia > Faculty of Science and Engineering - Engineering > Department of Electrical and Electronic Engineering
Item ID: 64542
Depositing User: Junqi, Huang
Date Deposited: 04 Aug 2021 04:40
Last Modified: 01 Feb 2024 02:52
URI: https://eprints.nottingham.ac.uk/id/eprint/64542