# **AXI Protocol Introduction**

Speaker: Jia-Ming Lin

### Outline

- What is AXI?
- AXI Interconnect
- AXI Stream Interface
- Labs
  - Using AXI Stream in HLS and PYNQ

#### Today's System-On-Chips



- A large number of different modules
- Each performs a specialized task, for example:
  - Run computationally intensive tasks on an accelerator
  - Use shared memory to share data
- A complex ensemble of basic IP units
  - How do we connect them together?
  - How do we add new modules?

### Connectivity

- A standard
  - All modules talk based on that standard
  - All modules can talk easily to each other
- Maintenance
  - Design is easily maintained/updated and debugged
- Re-use
  - Units can be easily re-used in other designs

#### System-on-Chip Buses



- All modules are connected by the bus
- Through the bus
  - One module can talk to other modules
- Communication must follow a standard protocol
- Well-known SoC buses
  - IBM CoreConnect, ARM AXI

#### AXI Master / AXI Slave

- Transaction:
  - Transfer of data from one point in the hardware to another point
- Master: Initiates the transaction
- Slave: Responds to the initiated transaction



- AXI Master: CPU
- AXI Slave: Memory
- The CPU initiates transactions to the memory
- The memory responds to these reads and writes
- In the diagram, a rectangle marks a master and a circle marks a slave



#### The 5 Channels of AXI Interface



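- The five channels: Read Address, Read Data, Write Address, Write Data, Write Response
- Each channel contains a set of signals
- Example: Write
  - Initiate: the master puts the memory address on the Write Address channel
  - Data transfer: the master puts the data on the Write Data channel
  - Completion: the slave returns the result to the master on the Write Response channel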
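The write sequence above can be pictured with a small software model. This is a minimal sketch, not a cycle-accurate simulation: the class `SimpleSlave`, the function `axi_write`, and the dictionary used as memory are illustrative assumptions. Only the three channel roles and the `OKAY` response name come from the AXI protocol.

```python
# Toy model of one AXI write: address phase, data phase, response phase.
# SimpleSlave and axi_write are hypothetical names used for illustration only.

class SimpleSlave:
    """A slave that stores words in a dict keyed by address."""

    def __init__(self):
        self.memory = {}
        self._pending_addr = None

    def write_address(self, addr):
        # Write Address channel: the master announces where it will write.
        self._pending_addr = addr

    def write_data(self, data):
        # Write Data channel: the master supplies the value.
        self.memory[self._pending_addr] = data

    def write_response(self):
        # Write Response channel: the slave reports the outcome.
        return "OKAY"


def axi_write(slave, addr, data):
    """Run the three phases of a single-beat write and return the response."""
    slave.write_address(addr)
    slave.write_data(data)
    return slave.write_response()


slave = SimpleSlave()
resp = axi_write(slave, 0x1000, 0xDEADBEEF)
print(resp, hex(slave.memory[0x1000]))
```

A read would use the other two channels in the same spirit: the address travels on Read Address and the data comes back on Read Data.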



http://www.googoolia.com/wp/wp-content/uploads/2014/04/designing\_with\_axi\_in\_vivado\_part\_01.pdf

#### Burst AXI Transaction: Accessing Consecutive Data
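In an incrementing burst the master sends a single start address and the data beats that follow land on consecutive addresses. The sketch below only illustrates that address arithmetic; the function name `incr_burst_addresses` is hypothetical, and it assumes an aligned start address and a fixed number of bytes per beat.

```python
def incr_burst_addresses(start_addr, beats, bytes_per_beat):
    """Addresses touched by an incrementing burst.

    Assumption: start_addr is aligned to bytes_per_beat.
    """
    if start_addr % bytes_per_beat != 0:
        raise ValueError("this sketch only handles aligned bursts")
    return [start_addr + n * bytes_per_beat for n in range(beats)]


# One address phase, then four data beats of 4 bytes each.
addrs = incr_burst_addresses(0x1000, beats=4, bytes_per_beat=4)
print([hex(a) for a in addrs])  # ['0x1000', '0x1004', '0x1008', '0x100c']
```

Compared with four single-beat transactions, the burst needs only one address phase for the same amount of data.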

#### Implementation in HLS





## **AXI Interconnect**

#### **Connecting Masters and Slaves**



- What if we have multiple Masters and Slaves?
- Each Master should be able to initiate transaction to each Slave
- How to connect them?

#### **AXI Interconnect: Connecting Masters and Slaves**
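One way to picture what an interconnect does for memory-mapped traffic is address decoding: each slave owns an address range, and a transaction from any master is routed to the slave whose range contains the target address. The address map, the slave names, and the function `route` below are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical address map: (base address, size in bytes, slave name).
ADDRESS_MAP = [
    (0x0000_0000, 0x1000_0000, "memory"),
    (0x4000_0000, 0x0001_0000, "dma_registers"),
    (0x4001_0000, 0x0001_0000, "accelerator_registers"),
]


def route(addr):
    """Return the name of the slave whose address range contains addr."""
    for base, size, name in ADDRESS_MAP:
        if base <= addr < base + size:
            return name
    # No slave claims this address; hardware would report a decode error.
    return None


for master, addr in [("cpu", 0x0000_2000), ("cpu", 0x4001_0000), ("dma", 0x0800_0000)]:
    print(master, "->", route(addr))
```

Because routing depends only on the address, every master can reach every slave without knowing how the slaves are physically wired.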



#### AXI Interconnect: Flexibility

- Different numbers of master and slave ports
- Width conversion
  - e.g. 32 bits converted to 16 bits
- Protocol conversion, e.g. AXI3 to AXI4
- Clock domain conversion
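The width conversion case can be illustrated in a few lines. This is a sketch of the data reshaping only, assuming the lower half of the word is sent first; the function names are hypothetical and no handshaking or byte strobes are modelled.

```python
def downsize_32_to_16(word):
    """Split one 32-bit beat into two 16-bit beats, lower half first."""
    word &= 0xFFFF_FFFF
    return [word & 0xFFFF, (word >> 16) & 0xFFFF]


def upsize_16_to_32(beats):
    """Pack two 16-bit beats (lower half first) into one 32-bit beat."""
    low, high = beats
    return (high << 16) | low


beats = downsize_32_to_16(0x1234ABCD)
print([hex(b) for b in beats])      # ['0xabcd', '0x1234']
print(hex(upsize_16_to_32(beats)))  # 0x1234abcd
```

The narrow side needs twice as many beats to carry the same data, which is why the interconnect has to handle the conversion rather than simply wiring the buses together.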

## **AXI Stream Interface**

#### **Two Types of AXI Interfaces**

- Memory Mapped
  - Read / write transactions carry a destination address
- Streaming
  - A single AXI channel carries data in one direction, with no address
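On the streaming channel, data moves only in cycles where the producer offers valid data and the consumer is ready to accept it. The sketch below simulates that handshake cycle by cycle. The signal names `TVALID` and `TREADY` come from the AXI4-Stream specification rather than from these slides, and the function `stream_transfer` is a hypothetical helper.

```python
def stream_transfer(data, ready_pattern):
    """Simulate a producer sending `data` to a consumer, one cycle at a time.

    ready_pattern[i] is the consumer's TREADY in cycle i. The producer keeps
    TVALID high while it still has data. Returns (received, cycles_used).
    """
    received = []
    index = 0
    cycles = 0
    for tready in ready_pattern:
        if index == len(data):
            break  # nothing left to send
        cycles += 1
        tvalid = True
        if tvalid and tready:
            # Both sides agree, so one element moves this cycle.
            received.append(data[index])
            index += 1
    return received, cycles


received, cycles = stream_transfer([10, 20, 30], [True, False, True, True, True])
print(received, cycles)  # [10, 20, 30] 4
```

The stalled second cycle shows back-pressure: the consumer delays the producer without any data being lost.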







#### **Applications of AXI Stream**





#### **AXI Ports Naming Styles**

- Memory Mapped



- Streaming



#### AXI Data Mover



#### Data Mover

- Is configured by the host CPU
- Is triggered by the host CPU
- Raises an interrupt when a transfer task is done
  - Example: a frame has been transferred completely
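The configure, trigger, and interrupt sequence can be mimicked in software. The class `ToyDataMover` below is a hypothetical stand-in, not a driver for any real DMA engine: memory is a Python list and the interrupt is modelled as a callback.

```python
class ToyDataMover:
    """Software stand-in for a data mover (hypothetical, not a real driver)."""

    def __init__(self, memory):
        self.memory = memory
        self.src = self.dst = self.length = None
        self.on_done = None

    def configure(self, src, dst, length, on_done):
        # Step 1: the host CPU programs the transfer parameters.
        self.src, self.dst, self.length, self.on_done = src, dst, length, on_done

    def start(self):
        # Step 2: the host CPU triggers the transfer.
        for offset in range(self.length):
            self.memory[self.dst + offset] = self.memory[self.src + offset]
        # Step 3: the engine signals completion (an interrupt in hardware).
        self.on_done()


memory = list(range(8)) + [0] * 8
events = []
mover = ToyDataMover(memory)
mover.configure(src=0, dst=8, length=8, on_done=lambda: events.append("done"))
mover.start()
print(memory[8:], events)
```

In hardware the CPU is free to do other work between the trigger and the interrupt, which is the main reason to use a data mover at all.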

#### Example:

- AXI Central DMA engine

http://www.googoolia.com/wp/wp-content/uploads/2014/04/axi\_stream.pdf

### Summary

- Different modules are connected by buses, e.g. AXI
  - Connectivity: standard, maintenance, re-use
- Types of AXI interface
  - Memory Mapped
    - Full AXI (burst capable)
    - AXI Lite (single beat)
  - Stream
- Next time
  - Textbook (Parallel Programming for FPGAs), Chapter 3: CORDIC

## Labs

#### Prepare the HLS Project

- Download the files (link)
- Create an HLS project and import the files
  - **fir.cpp**: top function
  - **fir.h**: header file
  - **fir\_testbench.cpp**: testbench
- We aim to use the **AXI stream interface**, so keep N small (= 21) for simplicity.
- You should further
  - Validate the hardware results against a Python implementation
  - Compare the performance of the CPU and the hardware design
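For the validation and timing steps, a plain Python reference filter may be enough. The sketch below is an assumption-laden starting point: the coefficients are placeholders (the real N = 21 taps live in the lab's fir.cpp), `hw_output` is a stand-in for the buffer read back from the hardware, and Python integers do not overflow the way 32-bit hardware values do.

```python
import time


def fir_reference(samples, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n-k], with x[n<0] = 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


# Placeholder taps: replace with the 21 coefficients used in the lab design.
coeffs = [1] * 21
samples = list(range(1, 101))

t0 = time.perf_counter()
expected = fir_reference(samples, coeffs)
cpu_seconds = time.perf_counter() - t0

# Stand-in for the hardware result; in the lab this comes from the DMA buffer.
hw_output = list(expected)
print("match:", hw_output == expected, "cpu time (s):", cpu_seconds)
```

Timing the hardware path the same way, around the DMA transfer and wait calls, gives the two numbers to compare.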

#### Using AXI Stream in HLS (1/2)

- AXIS packet type

```
struct axis_t {
    data_t data;
    bool last; // important: marks the last sample of the stream
};
```

- Top function interface

```
void fir_hw(hls::stream<axis_t> &input_val, hls::stream<axis_t> &output_val);
```

- HLS pragmas

```
#pragma HLS INTERFACE axis port=input_val bundle=INPUT_STREAM
#pragma HLS INTERFACE axis port=output_val bundle=OUTPUT_STREAM
```

#### Using AXI Stream in HLS (2/2)

- Read input data from the AXI stream

```
sample = input_val.read().data;
```

- Write output data to the AXI stream

```
axis_t result;
result.data = (data_t)acc;
// Set last on the final output sample to mark the end of the stream
result.last = (i == RUN_LENGTH - 1) ? 1 : 0;
output_val.write(result);
```
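The role of the `last` flag can be seen in a small software model of a stream. This is only a sketch: the stream is a Python `deque` of `(data, last)` pairs, and the helpers `to_axis_packet` and `read_packet` are hypothetical names.

```python
from collections import deque


def to_axis_packet(samples):
    """Wrap samples as (data, last) pairs; last is True only on the final one."""
    return deque((s, i == len(samples) - 1) for i, s in enumerate(samples))


def read_packet(stream):
    """Consume one packet: keep reading until an element has last set."""
    payload = []
    while stream:
        data, last = stream.popleft()
        payload.append(data)
        if last:
            break
    return payload


stream = to_axis_packet([1, 2, 3])
stream.extend(to_axis_packet([4, 5]))  # a second packet queued behind the first
print(read_packet(stream), read_packet(stream))  # [1, 2, 3] [4, 5]
```

Without the flag the receiver has no way to tell where one transfer ends and the next begins, which is why the slide marks it as important.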

#### Prepare Bitstream File

- Create a Vivado Project
- I will show how to construct the block diagram live.



#### **PYNQ Scripts**

```
# In [1]: Import libraries
import pynq
import numpy as np
from pynq import Overlay
from pynq import Xlnk

depth = 100000
input_arr = Xlnk().cma_array(shape=(depth, 1), dtype=np.int32)
output_arr = Xlnk().cma_array(shape=(depth, 1), dtype=np.int32)

# In [2]: Load bitstream
bitstream_path = 'design_1.bit'
overlay = Overlay(bitstream_path)
overlay?

# In [3]: Using the hardware modules
dma = overlay.axi_dma_0
fir_hw = overlay.fir_hw_0

# In [4]: Generate dummy input data
for i in range(depth):
    input_arr[i] = i + 1
input_arr[:5]
# Out[4]: PynqBuffer([[1], [2], [3], [4], [5]])
```

```
# In [5]: DMA initiate
dma.sendchannel.stop()
dma.sendchannel.start()
dma.recvchannel.stop()
dma.recvchannel.start()

# In [6]: Transferring data
dma.sendchannel.transfer(input_arr)
dma.recvchannel.transfer(output_arr)
fir_hw.write(0x00, 0x81)
dma.recvchannel.wait()

# In [7]: Print output
output_arr
# Out[7]: PynqBuffer([[6], [-196600], [15], ..., [0], [0], [0]])
```

#### Trigger FIR IP to start

- The control bus of "fir\_hw"

```
// CONTROL_BUS
// 0x0 : Control signals
//       bit 0 - ap_start (Read/Write/COH)
//       bit 1 - ap_done (Read/COR)
//       bit 2 - ap_idle (Read)
//       bit 3 - ap_ready (Read)
//       bit 7 - auto_restart (Read/Write)
//       others - reserved
```

- fir\_hw.write(0x00, 0x81)
  - 0x81 means bit 0 (ap\_start) and bit 7 (auto\_restart) are high
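To check a control value like this without counting bits by hand, the register layout listed above can be written as a lookup table. The helper names `decode_control` and `encode_control` are hypothetical; the bit positions are the ones from the control bus description.

```python
# Bit positions taken from the control register description of fir_hw.
CONTROL_BITS = {
    0: "ap_start",
    1: "ap_done",
    2: "ap_idle",
    3: "ap_ready",
    7: "auto_restart",
}


def decode_control(value):
    """Return the names of the control bits that are set in value."""
    return [name for bit, name in CONTROL_BITS.items() if value & (1 << bit)]


def encode_control(*names):
    """Build a register value from control bit names."""
    lookup = {name: bit for bit, name in CONTROL_BITS.items()}
    value = 0
    for name in names:
        value |= 1 << lookup[name]
    return value


print(decode_control(0x81))                             # ['ap_start', 'auto_restart']
print(hex(encode_control("ap_start", "auto_restart")))  # 0x81
```

Writing `0x01` instead would set only `ap_start`, so the IP would run once rather than restart automatically.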