Input and output formatters - Accelerator Engine

Hardware Accelerator description

4.2 Accelerator Engine

4.2.2 Input and output formatters

Hardware Accelerator description

DMATRIGEN 1 Yes

Completion trigger to DMA:

This parameter-set register is used to enable DMA channel trigger upon completion of the accelerator operation for that parameter set. This trigger mechanism enables the accelerator to hand-shake with the DMA so that output data samples are copied out of the accelerator local memory. If enabled, the accelerator triggers a specified DMA channel, so that the output samples can be shipped from the local memory to Radar data memory.

ACC2DMA_

CHANNEL_

TRIGDST

4 Yes

DMA channel select for accelerator completion trigger:

This parameter-set register is used to select which of the 16 DMA channels allocated to the accelerator should be trig-gered upon completion of the accelerator operation for that parameter set. This register is to be used in conjunction with DMATRIGEN.

CR42DMATRIG 16 No

Trigger from processor to DMA:

This register can be used by the processor to trigger a DMA channel for the first time, so that a full sequence of repeated operations between the DMA and the accelerator gets kick-started.

Table 4.1: State machine registers. [4]

Hardware Accelerator description

The flexibility of this operational blocks lies in the programmable access pat-terns and the possibility to perform various iterations back-to-back. The patpat-terns are programmed using different parameter-set registers that specify for example how many bytes there are between a sample and the following (^SRCAINDX and

DSTAINDX), how many samples a single operation has to process (SRCACNT and

DSTACNT), how many bytes there are between the starting address of an iteration and the following (SRCBINDXandDSTBINDX) and how many back-to-back iterations have to be performed (^REG_BCNT).

These registers are explained in detail in Table 4.2.

Figure 4.3: Example of source memory with the values of the input formatter registers. [4]

To understand better the operations of the input and output formatters an example from HWA User’s Guide is reported. In Figure 4.3 the source memory in a first dimension FFT case is analysed.

The colored cells represents the samples received from the four RX antennas of the same transmitted chirp. In this case the data are in a 16-bit real and 16-bit imaginary format and the number of samples per RX is 128. The register

Hardware Accelerator description

REG_BCNT is set to three (4 − 1) since four FFT iterations, one for each RX antenna, must be performed. Every clock cycle the input formatter can fetch a complex sample, format it and send it to the core computational unit in order to be processed.

As mentioned, the computational unit works with 24-bit parallelism so each sample has to be scaled or extended to the correct number of bits depending on the initial width. Also this operation can be adjusted by the processor specifying, in the case of 16-bits samples, how many bits to pad at the MSB or, in the case of 32-bits width, how many MSBs to clip. The input formatter will treat the samples according to whether they are signed or not.

These considerations are also valid for the output formatter taking into account that the operation is reversed, that is, when the results need to be stored in 32-bits, a padding is applied while, if the 2 bytes representation is chosen, a clipping is performed.

Another feature of the input formatter is the automatic zero padding when per-forming an FFT. In fact, if the number of input samples is not a power of two, this modules adds zeros until the programmed FFT size is reached so that the core computational unit can operate correctly.

To continue with the same example, the destination local memory with the output formatter parameters is represented in Figure 4.4. As can be seen the output samples are stored in a transposed manner. The offset between consecutive samples is 16 bytes corresponding to the size of the sample (4 bytes) times the number of RX antennas. The index between two consecutive iterations is 32 bits to effectively interleave the processed samples.

Table 4.2 then explains in detail the registers of the two formatters and their purpose.

Hardware Accelerator description

Figure 4.4: Example of destination memory with the values of the output for-matter registers. [4]

SRCADDR

DSTADDR

16 Yes

Source and destination start address:

These registers specify the starting address of the input and output samples. These are byte-address and these 16-bit registers cover the entire address space of the four local memories (4 × 16KB

= 64 KB). The four accelerator local memories are contiguous in the memory address space and any of them can act as the source or destination memory (as long as the same memory bank is not configured to be used as both at the same time).

Table 4.2: Input and output formatters registers. [4]

Hardware Accelerator description

SRCACNT

DSTACNT

12 Yes

Source and destination sample count:

These registers specify the number of samples (minus 1) to be read from or written to the local memory for every iteration. The sample count is in number of samples, not number of bytes. For example, the sample count can be specified as 255 (0x0FF) in a case where 256 samples must be processed. Note however that the sample count register does not always match the FFT size.

This can happen when zero-padding of input samples is required or when only a part of the FFT bins must be written to memory.

SRCAINDX

DSTAINDX

16 Yes

Source and destination sample index increment:

These registers specify the number of bytes separating successive samples in the source memory or to be written to the destination one. For example, a value of DSTAINDX = 16 means that succes-sive samples written to the destination memory should be separated by 16 bytes. The maximum value allowed for these registers is 32767.

REG_BCNT 12 Yes

Number of iterations:

This register specifies the number of times (minus 1) the processing should be repeated. This register can be used to process the four RX chains back-to-back – for example, a value of REG_BCNT = 3 means that the processing (say first dimension FFT processing) is repeated four times. Note the distinction between the NLOOPS register of the state machine block and the REG_BCNT register of the input formatter block. The NLOOPS register specifies how many times the state machine loops through all the configured parameter sets (with each time possibly awaiting a trigger), whereas the register REG_BCNT specifies how many times the input formatter and the computational processing of the accelerator is iterated back-to-back for the current parameter set (without any intermediate triggers).

Table 4.2: Input and output formatters registers. [4]

Hardware Accelerator description

SRCBINDX

DSTBINDX

16 Yes

Source and destination offset per iteration:

These registers specify the number of bytes separating the starting address of samples for successive iterations. For example, when us-ing four iterations to process the four RX chains, these registers can be used to specify the offset in the address between the successive RX chains. Note the distinction that SRCAINDX and DSTAINDX specify the number of bytes separating successive samples for a particular iteration, whereas SRCBINDX and DSTBINDX specify the number of bytes separating the starting address of the first sample for successive iterations. The maximum value allowed for these registers is 32767.

SRCREAL

DSTREAL

1 Yes

Complex or real input and output:

These registers specify whether the input and output samples are real or complex. A value of 0 implies a complex sample and a value of 1 implies real input or output. When real input is selected, the input formatter block automatically feeds zero for the imaginary part, while if real output is selected, the output formatter module automatically stores only the real part into the destination memory.

This is useful when the core computational unit is configured to output magnitude or log-magnitude values.

SRC16b32b

DST16b32b

1 Yes

Software trigger bit:

These registers specify whether the samples are to be read or written as 16-bits or 32-bits wide. A value of 0 implies that the samples are 16-bits wide each (in case of complex data, real and imaginary parts are each 16 bits wide). A value of 1 implies that the samples are 32-bits wide each.

SRCSIGNED

DSTSIGNED

1 Yes

Input and output sign-extension mode:

These registers, when set, specify that the samples are signed numbers and hence, sign-extension or signed-saturation at the MSB is required when converting 16- bit or 32-bit words to the computational unit’s 24-bit wide samples or vice versa.

Table 4.2: Input and output formatters registers. [4]

Hardware Accelerator description

Nel documento Automotive radar processing optimization by exploiting the hardware accelerator of radar sensor chip (pagine 49-55)