Project 1

Low Power Coarse Grained Reconfigurable Architectures

Staff:

  • Kunjan Patel
  • Chris Bleakley

Motivation

Biosignals are generally one-dimensional and multichannel. They are used in patient monitoring and in the diagnosis and detection of diseases and conditions such as epilepsy. To perform these tasks without interrupting the patient's daily life, portable biosignal processing devices are essential. Current devices, such as EEG Holter monitors, run for 1-2 days before they need recharging. Biosignal data is simply stored; no real-time analysis is possible.

For many biomedical applications, real-time data analysis is desirable. However, the power consumption of programmable DSPs is often too high to allow their inclusion in portable and implantable devices, such as pacemakers. Hard-wired application-specific circuits offer lower power consumption, but at a high cost.

This project investigates the design of a low power Coarse Grained Reconfigurable Array (CGRA) for on-body biosignal processing. CGRA architectures promise low power consumption, can deliver high performance and have sufficient flexibility to allow mapping of common biosignal processing algorithms.

Aims

The overall aim of the project is to develop a low power processor platform for multichannel biosignal processing.

Approach

A simple processor supports execution of irregular software, and an array of configurable MACs supports execution of regular DSP routines. A shared memory interface is provided between the processor and the array. Systolic mapping of DSP algorithms ensures efficient computation. Power is saved by almost eliminating the instruction fetch and decode steps and by reducing the number of RAM accesses. Furthermore, the high degree of parallelism allows aggressive voltage scaling.
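As an illustration of the mapping idea, the following Python sketch models a 5-tap FIR filter laid out on a linear chain of MAC cells in transposed form. It is a toy functional model, not the project's simulator: the names `mac_chain_fir` and `direct_fir`, the coefficient values and the read-counting convention (two operand reads per multiply in the software loop) are all assumptions made for this example. Each cell keeps its coefficient locally, so only one new sample is fetched per output, whereas the plain software loop re-reads every sample and coefficient.

```python
def mac_chain_fir(samples, coeffs):
    """Transposed-form FIR on a chain of MAC cells (hypothetical toy model).

    Returns (outputs, ram_reads). Coefficients stay inside the cells, so the
    only RAM traffic counted is one sample fetch per output.
    """
    n = len(coeffs)
    partial = [0] * n          # partial[k]: sum register passed from cell k+1 to cell k
    outputs, ram_reads = [], 0
    for x in samples:
        ram_reads += 1         # one sample fetched and broadcast to all cells
        outputs.append(coeffs[0] * x + partial[0])
        # every cell performs one multiply-accumulate and passes its sum along
        partial = [coeffs[k] * x + partial[k] for k in range(1, n)] + [0]
    return outputs, ram_reads


def direct_fir(samples, coeffs):
    """Reference FIR as a plain software loop, counting operand reads."""
    outputs, ram_reads = [], 0
    for i in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if i - k >= 0:
                acc += h * samples[i - k]
                ram_reads += 2   # assumed: one sample read plus one coefficient read
        outputs.append(acc)
    return outputs, ram_reads


coeffs = [1, 2, 3, 2, 1]                 # 5 taps, arbitrary example values
samples = [1, 0, 0, 0, 0, 0, 4, -1]
y_chain, reads_chain = mac_chain_fir(samples, coeffs)
y_ref, reads_ref = direct_fir(samples, coeffs)
```

Both functions return the same output sequence; only the assumed memory traffic differs. The counts are properties of this toy model and should not be read as the measurements reported in the tables.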

Figure 1: Main processor and Coarse Grained Reconfigurable Array co-processor.

 Figure 2: 4x4 Matrix Multiplication on CGRA
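A 4x4 matrix multiplication on a systolic array can be sketched with a generic output-stationary scheme. The Python model below is an assumption for illustration and not a description of the project's array: `systolic_matmul` is a hypothetical name, rows of A enter from the left and columns of B from the top with a one-cycle skew per row or column, and each of the n x n cells accumulates one output element.

```python
def systolic_matmul(A, B):
    """Output-stationary systolic array model (illustrative sketch only).

    Returns (C, cycles) for square matrices A and B of equal size.
    """
    n = len(A)
    acc = [[0] * n for _ in range(n)]     # one accumulator per MAC cell
    a_reg = [[0] * n for _ in range(n)]   # operands travelling left to right
    b_reg = [[0] * n for _ in range(n)]   # operands travelling top to bottom
    cycles = 3 * n - 2                    # last product reaches cell (n-1, n-1)
    for t in range(cycles):
        # shift operands one cell to the right / one cell down
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # inject skewed inputs at the array edges (zeros outside the data window)
        for i in range(n):
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        for j in range(n):
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        # every cell performs one multiply-accumulate per cycle
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 0], [0, 1, 0, 3], [1, 0, 2, 0], [0, 1, 0, 1]]
C, cycles = systolic_matmul(A, B)
# plain triple-loop reference used to check the model
C_ref = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
         for i in range(4)]
```

In this model a 4x4 product uses 16 cells and finishes in 3n - 2 = 10 cycles. Each operand is read from memory once at the array edge and then passed between neighbouring cells.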

Results

An array functional model and simulation engine have been implemented in software. A number of key algorithms have been implemented on the simulator and their performance assessed; see Tables 1 and 2. Initial results indicate significant reductions in RAM accesses. RTL implementation of the array is now underway.

Table 1: Performance of selected biosignal processing algorithms.

                                                Total Operations
Algorithm              Description  Iterations  Proposed algorithms  TI C5510  No of CFUs
---------------------  -----------  ----------  -------------------  --------  ----------
FIR                    5 taps       256         6                    6         6
Matrix Multiplication  4x4 by 4x4   25          67                   64        16
Matrix Determinant     3x3          25          15                   17        5
FFT Butterfly          -            25          9                    8         8
Wavelet Filterbank     type db2     256         8                    8         10
DFT using correlation  8 point      8           61                   61        61

Table 2: Register and RAM access comparison for selected biosignal processing algorithms.

                       CGRA        DSP         Memory Access
Algorithm              RGA   RMA   RGA   RMA   Reduction (%)
---------------------  ----  ----  ----  ----  -------------
FIR filter             12    2     12    7     489
Matrix Multiplication  208   42    256   192   368
Matrix Determinant     35    16    26    10    73
FFT Butterfly          25    8     68    5     -51
Wavelet Filterbank     16    2     16    9     679
DFT using correlation  188   15    183   130   780
