International Research Journal of Engineering and Technology (IRJET) Volume: 02 Issue: 07 | Oct-2015 www.irjet.net

# VLSI Architecture for Lifting based 3-D Discrete Wavelet Transform

## Faiz Mohammad Karobari<sup>1</sup>, Dr. Bharathi S H<sup>2</sup>

<sup>1</sup> PG Student, Department of ECE, Reva Institute of Technology and Management, Bengaluru-64

<sup>2</sup> Professor, Department of ECE, Reva Institute of Technology and Management, Bengaluru-64

Abstract - In this paper, a lifting-based parallel 3-D Discrete Wavelet Transform (DWT) architecture is proposed. The four parallel temporal and spatial DWT components of the proposed architecture yield a high throughput of 16 results per clock cycle. The temporal and spatial processing is built from 1-D DWT blocks. The use of 4 parallel spatial processors reduces the frame memory needed for the temporal transform. In addition, the higher throughput considerably reduces the number of working cycles, which makes lower-power designs possible. The Register Transfer Level (RTL) description of the proposed architecture is written in Verilog and verified using the ModelSim simulator. The performance of the architecture is analyzed by synthesizing it for the Xilinx Virtex-6 series and extracting the relevant figures from the synthesis report produced by Xilinx ISE.

*Key Words – VLSI, Discrete Wavelet Transform, 3-D DWT, RTL.*

### **1. INTRODUCTION**

The credit for the development of the DWT goes to Alfred Haar, a Hungarian mathematician. For an input of $2^n$ values, the Haar transform pairs up the input values, stores the difference of each pair and passes on the sum. The procedure is applied recursively, pairing up the sums to obtain the succeeding scales, so that $2^n - 1$ differences and one final sum are obtained. In 1988, Ingrid Daubechies, a Belgian mathematician, formulated the most commonly used set of DWTs. Her formulation recursively generates progressively finer discrete samplings, each resolution being twice that of the previous one. Many new variants of the Daubechies wavelets have been added since their invention.
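The recursive pairing described above can be sketched in a few lines of Python. This is a minimal, unnormalised illustration (plain sums and differences, no scaling factors); the function name `haar_transform` is hypothetical and is not taken from the implementation reported in this paper.

```python
def haar_transform(values):
    """Unnormalised Haar transform of a list whose length is a power of two.

    Returns (final_sum, differences), where `differences` is ordered from
    the coarsest scale to the finest scale.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")

    sums = list(values)
    differences = []
    while len(sums) > 1:
        pairs = list(zip(sums[0::2], sums[1::2]))
        # Store the difference of every pair; coarser scales go in front.
        differences = [a - b for a, b in pairs] + differences
        # Pass the sums on to the next (coarser) scale.
        sums = [a + b for a, b in pairs]
    return sums[0], differences


final_sum, diffs = haar_transform([1, 2, 3, 4, 5, 6, 7, 8])
print(final_sum, diffs)
```

For the 8 input values used here ($n = 3$), the sketch returns one final sum and $2^3 - 1 = 7$ differences.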

Popular forms of the DWT include the decimated and the undecimated (downsampling omitted) wavelet transforms, the Newland transform, in which an orthogonal wavelet basis is formed, the complex wavelet transform, the wavelet packet transform, etc.

The 3-D transform is obtained by applying a 1-D DWT in every dimension, i.e. the 3-D DWT is separable. The scaling function and the wavelet functions of the 3-D DWT are built from the 1-D scaling function $\Phi$ and the 1-D wavelet function $\Psi$ as given below:

| Function                                | Type      |
|-----------------------------------------|-----------|
| $\Phi(x,y,z) = \Phi(x)\Phi(y)\Phi(z)$   | Scaling   |
| $\Psi_1(x,y,z) = \Phi(x)\Phi(y)\Psi(z)$ | Wavelet 1 |
| $\Psi_2(x,y,z) = \Phi(x)\Psi(y)\Phi(z)$ | Wavelet 2 |
| $\Psi_3(x,y,z) = \Psi(x)\Phi(y)\Phi(z)$ | Wavelet 3 |
| $\Psi_4(x,y,z) = \Phi(x)\Psi(y)\Psi(z)$ | Wavelet 4 |
| $\Psi_5(x,y,z) = \Psi(x)\Phi(y)\Psi(z)$ | Wavelet 5 |
| $\Psi_6(x,y,z) = \Psi(x)\Psi(y)\Phi(z)$ | Wavelet 6 |
| $\Psi_7(x,y,z) = \Psi(x)\Psi(y)\Psi(z)$ | Wavelet 7 |

The 3-D DWT amounts to a 1-D DWT applied in 3 directions. The data are first transformed in the x-direction. The outputs of the high-pass and low-pass filters are then fed to two further filter pairs, which transform the data along the y-direction. The resulting 4 data streams enter 4 more filter pairs, which perform the final transform along the z-direction. In total, 8 data streams are generated by this process. The input to the next octave is the approximation signal produced by the scaling function. This signal contains roughly 90% of the total energy, while the detail signals are carried by the other seven streams.

In short, a 3-D DWT is the combination of three 1-D DWTs along the x, y and z directions. After a one-level 3-D DWT, the image volume is decomposed into the LLL, LLH, LHL, LHH, HLL, HLH, HHL and HHH sub-bands.
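The x, y, z filtering order described above can be illustrated with a small software model. The sketch below assumes an averaging Haar filter pair (the simplest choice, used here only for illustration) and labels each sub-band with its x, y and z letters in that order; both the filter and the labelling convention are assumptions, and the function names are hypothetical.

```python
from itertools import product


def split_1d(a, b):
    """Averaging Haar step: returns (low-pass, high-pass) for one sample pair."""
    return (a + b) / 2.0, (a - b) / 2.0


def haar_dwt3d(vol):
    """One-level 3-D Haar DWT of vol[z][y][x] (all three sizes even).

    Returns a dict with the 8 sub-bands 'LLL' ... 'HHH'; the letters give
    the filter applied along x, y and z respectively (assumed convention).
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    bands = {
        "".join(key): [[[0.0] * (nx // 2) for _ in range(ny // 2)]
                       for _ in range(nz // 2)]
        for key in product("LH", repeat=3)
    }
    for k, j, i in product(range(nz // 2), range(ny // 2), range(nx // 2)):
        # Stage 1: filter along x (2 streams).
        cx = {}
        for dz, dy in product((0, 1), repeat=2):
            row = vol[2 * k + dz][2 * j + dy]
            cx[dz, dy, "L"], cx[dz, dy, "H"] = split_1d(row[2 * i], row[2 * i + 1])
        # Stage 2: filter along y (4 streams).
        cy = {}
        for dz, bx in product((0, 1), "LH"):
            cy[dz, "L", bx], cy[dz, "H", bx] = split_1d(cx[dz, 0, bx], cx[dz, 1, bx])
        # Stage 3: filter along z (8 streams).
        for by, bx in product("LH", repeat=2):
            low, high = split_1d(cy[0, by, bx], cy[1, by, bx])
            bands[bx + by + "L"][k][j][i] = low
            bands[bx + by + "H"][k][j][i] = high
    return bands


# A volume that only varies along x: energy appears in LLL and HLL only.
ramp = [[[0, 1], [0, 1]], [[0, 1], [0, 1]]]
for name, band in sorted(haar_dwt3d(ramp).items()):
    print(name, band)
```

Because the Haar filter has only two taps, each 2×2×2 block of the volume can be processed independently; longer filters would need neighbouring samples at every stage.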

Transmission coupled with compression via a 3-D transform is of utmost importance for medical data. Better compression is achieved because the DWT exploits the correlation among the images under test. Earlier studies found the 3-D DCT to be more efficient than the 2-D DCT for X-ray CT. Consequently, the 3-D DWT is expected to outperform the 2-D DWT. It should also be noted that wavelets do not create blocking artifacts, which are undesirable in medical images. Hence, the DCT is not well suited to medical image compression.

A moving image sequence can be viewed as multiple 2-D slices and can therefore be coded frame by frame as independent 2-D images. However, the temporal dependence among the frames remains unused in 2-D coding. 3-D sub-band coding has been constructed and validated by a number of researchers owing to its reduced blocking artifacts and enhanced resolution scalability.

We propose a lifting-based 3-D DWT architecture that uses memory blocks for temporal and spatial processing, which improves the overall performance of the design. The proposed method reduces power consumption and latency by deploying 4 temporal processing elements and 4 parallel 2-D DWT processors. Four spatially processed frames are delivered synchronously to the temporal processor, which saves 2 frame memories. The scheme consumes less power because it needs fewer memory read/write, addressing, and refresh operations, which makes it well suited to battery-operated devices.

The Discrete Wavelet Transform finds applications in preconditioning for data compression, acoustics, sub-band coding, astronomy, nuclear engineering, image and signal processing, Magnetic Resonance Imaging (MRI), Electroencephalography (EEG) [11] in neuroscience, music, speech discrimination, earthquake prediction via seismic waveform analysis, fractals, pure mathematics, optics, turbulence, radar, computer vision, etc. In a broader sense, wavelets are also used in quality control, outlier analysis, geophysics, biology and biological computing, astrophysics, imaging technology, traffic modeling in networking, aural signal analysis for medical science, video-signal coding, weather forecasting, etc.

### 2. RELATED WORK

Owing to its decorrelation property, the DWT has gained wide acceptance since its introduction [1] as the transform stage in image and video compression [2]. Zervas, et al. [3] identified 3 architectures for the 2-D DWT, viz. line-based, level-by-level and block-based. Kaur, et al. [4] compared the efficiency of DWT-based and DCT-based methods for image compression and found that the DWT avoids blocking artifacts and achieves higher compression ratios. Darji, et al. [5] proposed a high-throughput, memory-efficient design for the lifting-based 3-D DWT. Jen-Shiun Chiang, et al. [6] proposed a VLSI architecture for the 2-D lifting-based DWT with the 5/3 filter; the main focus was to reduce the silicon area and to achieve full hardware utilization. Mansouri, et al. [7] devised a VLSI architecture for real-time video and image processing using the 2-D DWT, with low control complexity, low power consumption and low space and time complexity. Fry, et al. [8] implemented SPIHT compression for real-time space images from NASA; several DWT variants were tested and the folded DWT design was used exclusively for the study. Mahmoud, et al. [9] compared the efficiency of the Haar and the Daubechies wavelet transforms on an FPGA; the simulation results were compared using the Bit Error Rate (BER) between the audio input signal and the reconstructed output signal. Yong Liu, et al. [10] reported the design of a 2-D biorthogonal DWT using Residue Number System (RNS) arithmetic; the synthesis results confirmed that the entire system fits on a 1,000,000-gate FPGA.

### 3. PROPOSED 3-D DWT ARCHITECTURE

The proposed system architecture is shown in Fig. 1. It is a one-level 3-D DWT architecture, presented as a block-level design of its principal functional components, that performs the spatial transform first and the temporal transform afterwards. The functional blocks are described below.

The proposed 3-D DWT scheme comprises the following major components:

1. Temporal Processor
2. Spatial Processor

The spatial processor has 4 column processing elements ($C_{PE}$) and 4 row processing elements ($R_{PE}$). Every $C_{PE}$ comprises 2 data splitting modules, 2 1-D DWT modules and a column memory module ($C_{MM}$). Every $R_{PE}$ comprises a row memory module ($R_{MM}$), 1 data splitting module and 1 1-D DWT module.
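In a lifting scheme, a data splitting stage separates the even and odd samples, and the 1-D DWT stage then applies predict and update steps to them. The paper does not state which wavelet filter is used, so the sketch below assumes the reversible LeGall 5/3 filter with symmetric boundary extension. It is a software model for illustration only, not the RTL of the proposed design, and the function names are hypothetical.

```python
def lift_53_forward(x):
    """One level of an integer 5/3 lifting DWT on an even-length list of ints.

    Returns (s, d): the low-pass (approximation) and high-pass (detail)
    coefficients.
    """
    even, odd = x[0::2], x[1::2]  # split step
    n = len(odd)
    # Predict: each odd sample minus the floor-average of its even neighbours
    # (the right neighbour is mirrored at the end of the signal).
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update: each even sample plus a rounded quarter of the neighbouring
    # details (the left neighbour is mirrored at the start of the signal).
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d


def lift_53_inverse(s, d):
    """Undo lift_53_forward by reversing the update and predict steps."""
    n = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd  # merge step
    return x


samples = [10, 3, 7, 7, 0, 255, 128, 64]
low, high = lift_53_forward(samples)
assert lift_53_inverse(low, high) == samples  # lifting is exactly invertible
print(low, high)
```

Each lifting step only adds a function of one set of samples to the other set, so the inverse is obtained by subtracting the same quantity in reverse order. This property is what allows an in-place, integer-only hardware realization.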

The input pixel values are passed to the spatial processor, which transforms them in two dimensions with the aid of 2 dedicated components, i.e. the row and the column processing elements. As shown in Fig. 1, the input images are stored in memory, so the memory must be larger than the image. During the main step, the memory control routes the transformed coefficients to the column memory and the coefficients of the band to the row processor. After all the octaves have been calculated, the coefficients are transferred for column processing.

$R_{PE}$ operates on the pixel values of the input image. The output of the row processor divides the input image into 2 bands, H and L. $C_{PE}$ stays idle until the input image has been transformed by $R_{PE}$ and the coefficients have been stored in $C_{MM}$. The column processors then operate on the row-processed output and divide the image into 4 sub-bands: LL, LH, HL and HH.

The temporal processor has 4 temporal processing elements ($T_{PE}$). Every $T_{PE}$ comprises a temporal memory module ($T_{MM}$) and 2 1-D DWT modules.

The spatially transformed data that are to be decomposed in the temporal domain are passed to the $T_{PE}$. Driven by the global clock, at each cycle 2 pixels from 2 different frames but from the same sub-band are passed to the $T_{PE}$ for calculation. In this way, the temporal processing proceeds without any on-chip memory to hold the whole spatially transformed data. $T_{MM}$ is an off-chip memory used to store the temporally processed pixel values. To keep the data flow synchronized with $C_{MM}$, an effective addressing strategy and synchronous clock management are used.
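The pixel-pair processing of the temporal stage can be modelled as a stream operation. The sketch below assumes that each of the 4 $T_{PE}$s handles one spatial sub-band and that an averaging Haar filter is used in the temporal direction; neither assumption is stated in the paper, and the names `temporal_pe` and `temporal_processor` are hypothetical.

```python
SPATIAL_BANDS = ("LL", "LH", "HL", "HH")


def temporal_pe(stream_a, stream_b):
    """Consume co-located coefficients of two frames, one pair per step.

    Yields (low, high) for every pair, so no full frame has to be buffered.
    """
    for a, b in zip(stream_a, stream_b):
        yield (a + b) / 2.0, (a - b) / 2.0


def temporal_processor(frame_a, frame_b):
    """Apply one temporal element per spatial sub-band.

    frame_a, frame_b: dicts mapping 'LL', 'LH', 'HL', 'HH' to iterables of
    coefficients. Returns a dict with the 8 spatio-temporal sub-bands,
    e.g. 'LLL' (temporal low) and 'LLH' (temporal high).
    """
    out = {}
    for band in SPATIAL_BANDS:
        out[band + "L"], out[band + "H"] = [], []
        for low, high in temporal_pe(frame_a[band], frame_b[band]):
            out[band + "L"].append(low)
            out[band + "H"].append(high)
    return out


frame_a = {band: [4, 8] for band in SPATIAL_BANDS}
frame_b = {band: [2, 8] for band in SPATIAL_BANDS}
print(temporal_processor(frame_a, frame_b))
```

Because each output depends only on the current pair of inputs, the model needs no storage for the spatially transformed frames, which mirrors the memory saving claimed for the temporal processor.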



Fig. 1 Proposed System Architecture

### 4. RESULTS AND DISCUSSION

This section presents the results obtained from the implementation.

### 4.1 Simulation Results



Fig. 2 Simulation Waveform of 3-D DWT using ModelSim



Fig. 3 Simulation Waveform of Proposed 3-D DWT using ISIM

Figs. 4 and 5 show the input images and the output images rendered in MATLAB after the 3-D DWT was computed in ModelSim.



Fig. 4 Input image sequence


Fig. 5 Output images after 3-D DWT Computation

### 4.2 Device Utilization Summary Results

Table 1 summarizes the device utilization reported by the synthesis of the structural model. The results show a marked reduction in the FPGA resources allocated to the design when it is modeled structurally. Fig. 6 summarizes the timing results.

| Logic Utilization                 | Used | Available | Utilization |
|-----------------------------------|------|-----------|-------------|
| Number of Slice Registers         | 5919 | 687360    | 0%          |
| Number of Slice LUTs              | 3632 | 343680    | 1%          |
| Number of fully used LUT-FF pairs | 1481 | 8070      | 18%         |
| Number of bonded IOBs             | 325  | 1200      | 27%         |
| Number of Block RAM/FIFO          | 512  | 632       | 81%         |
| Number of BUFG/BUFGCTRLs          | 2    | 32        | 6%          |
| Number of DSP48E1s                | 100  | 864       | 11%         |

Table 1. Device Utilization Summary (estimated values) for 3-D DWT Unit

| Timing Summary (Speed Grade: -1)         | Value                                   |
|------------------------------------------|-----------------------------------------|
| Minimum period                           | 9.286 ns (maximum frequency: 107.689 MHz) |
| Minimum input arrival time before clock  | 2.843 ns                                |
| Maximum output required time after clock | 2.835 ns                                |
| Maximum combinational path delay         | No path found                           |

| Timing constraint: default period analysis for clock 'clk' | Value                                        |
|-------------------------------------------------------------|----------------------------------------------|
| Clock period                                                | 9.286 ns (frequency: 107.689 MHz)            |
| Total number of paths / destination ports                   | 10408312 / 29121                             |
| Delay                                                       | 9.286 ns (levels of logic = 19)              |
| Source                                                      | inst_z4/inst_x/Mram_mem181 (RAM)             |
| Destination                                                 | inst_z4/inst_x/inst_lft/u_final_lift_design/ |
| Source clock                                                | clk rising                                   |
| Destination clock                                           | clk rising                                   |

| Cell:in->out | Fanout | Gate delay (ns) | Net delay (ns) | Logical name (net name)    |
|--------------|--------|-----------------|----------------|----------------------------|
| RAMB36E1     | 0      | 2.500           | 0.000          | inst_z4/inst_x/Mram_mem181 |
| RAMB36E1     | 2      | 0.799           | 0.423          | inst_z4/inst_x/Mram_mem18  |
| LUT2:I1->O   | 1      | 0.068           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:S->O   | 1      | 0.290           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 1      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| MUXCY:CI->O  | 0      | 0.019           | 0.000          | inst_z4/inst_x/inst_lft/   |
| XORCY:CI->O  | 10     | 0.239           | 0.458          | inst_z4/inst_x/inst_lft/   |
| DSP48E1      | 10     | 3.826           | 0.399          | inst_z4/inst_x/inst_lft/   |
| FDCE:D       | 1      | 0.011           |                | inst_z4/inst_x/inst_lft/   |
| Total        |        | 9.286 ns        |                | 8.006 ns logic, 1.280 ns route (86.2% logic, 13.8% route) |

Fig. 6 Snapshot of timing summary
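The figures in the timing summary can be cross-checked with a few lines of arithmetic. The sketch below uses only values reported above (clock period, logic and route delay, 16 results per clock cycle). The peak result rate it prints is an upper bound that assumes 16 results are sustained in every cycle at the maximum frequency; it is not a measured value.

```python
# Values taken from the synthesis report and the abstract.
period_ns = 9.286          # minimum clock period
logic_ns = 8.006           # logic share of the critical path
route_ns = 1.280           # routing share of the critical path
results_per_cycle = 16     # throughput stated for the architecture

f_max_mhz = 1e3 / period_ns
logic_share = 100.0 * logic_ns / period_ns
route_share = 100.0 * route_ns / period_ns

# Upper bound only: assumes every cycle delivers 16 results at f_max.
peak_results_per_s = results_per_cycle * f_max_mhz * 1e6

print(f"f_max       = {f_max_mhz:.3f} MHz")
print(f"logic share = {logic_share:.1f} %")
print(f"route share = {route_share:.1f} %")
print(f"peak rate   = {peak_results_per_s:.3e} results/s")
```

The computed maximum frequency and the logic/route split agree with the report, and the peak rate works out to roughly $1.72 \times 10^9$ results per second under the stated assumption.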

### **5. CONCLUSION**

This work focused on the implementation of a simple yet effective VLSI architecture for the 3-D DWT based on the lifting scheme, motivated by the growing need for such designs in modern electronics. The proposed architecture addresses the throughput limitations of conventional 3-D DWT architectures. This is achieved through a pipelined 1-D DWT design that considerably reduces the critical path delay. ModelSim was used for simulation to validate and verify the functionality. The design was synthesized for the target FPGA using Xilinx ISE to generate the RTL schematic and the device utilization summary. In summary, the proposed design can be used in real-time applications to ease constraints related to latency, bandwidth, transmission delay, storage and compression. It also opens several avenues for further research that builds on the proposed scheme.

### ACKNOWLEDGEMENTS

The authors are immensely grateful for the valuable help rendered by the Dept. of Electronics and Communication, Reva Institute of Technology and Management, Bangalore, Karnataka, India.

### REFERENCES

- [1] Olivier Rioul and Martin Vetterli, "Wavelets and Signal Processing", IEEE Signal Processing Magazine, Vol. 8, Issue 4, pp. 14-38, October 1991.
- [2] D. S. Taubman, "High performance scalable image compression with EBCOT", IEEE Transactions on Image Processing, Vol. 9, No. 7, pp. 1158–1170, July 2000.
- [3] Nikos D. Zervas, Giorgos P. Anagnostopoulos, Vassilis Spiliotopoulos, Yiannis Andreopoulos, and Costas E. Goutis, "Evaluation of Design Alternatives for the 2-D-Discrete Wavelet Transform", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 12, pp. 1246-1262, December 2001.
- [4] Kaur, Amanjot, and Jaspreet Kaur. "Comparison of DCT and DWT of Image Compression Techniques." International Journal of Engineering Research and Development 1.4 (2012): 49-52.
- [5] Darji, Anand, et al. "Hardware Efficient VLSI Architecture for 3-D Discrete Wavelet Transform." 2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems. IEEE, 2014.
- [6] Jen-Shiun Chiang, and Chih-Hsien Hsia, "An Efficient VLSI Architecture for 2-D DWT using Lifting Scheme," IEEE International Conference on Systems and Signals, pp. 528-531, April 2005, Taipei, Taiwan.
- [7] Mansouri, A., A. Ahaitouf, and F. Abdi. "An efficient VLSI architecture and FPGA implementation of high-speed and low power 2-D DWT for (9, 7) wavelet filter." IJCSNS International Journal of Computer Science and Network Security 9.3 (2009): 50-60.
- [8] Fry, Thomas W., and Scott A. Hauck. "SPIHT image compression on FPGAs." Circuits and Systems for Video Technology, IEEE Transactions on 15.9 (2005): 1138-1147.
- [9] Mahmoud, Mohamed I., et al. "Comparison between haar and daubechies wavelet transformations on FPGA technology." World Academy of Science, Engineering and Technology 26 (2007): 68-72.
- [10] Liu, Yong, and E. M. Lai. "Design and implementation of an RNS-based 2-D DWT processor." Consumer Electronics, IEEE Transactions on 50.1 (2004): 376-385.
- [11] Geeta Navalyal, Rahul Gavas, "A dynamic attention assessment and enhancement tool using computer graphics", Human-centric Computing and Information Sciences 2014 4:11, Springer. doi:10.1186/s13673-014-0011-0

### BIOGRAPHIES



**Faiz Karobari** completed his B.E. in ECE from S.L.N. College of Engineering, Raichur, in 2013 and is currently pursuing an M.Tech. in VLSI and ES at Reva Institute of Technology and Management, Bengaluru.



**Dr. Bharathi S H** completed her B.E. in Electronics from Bangalore University in 1992. She received her M.E. in ECE from Bangalore University in 2002 and her PhD from Krishnadevaraya University, AP, in 2012. She is presently working as a professor in the Department of ECE, Reva ITM, Bangalore. Her areas of interest are image and video compression, FPGA, and electromagnetics.