Literature Reading (43)

Contents

1 Abbreviations & References
2 Abstract & Introduction
3 Instruction Set Architecture

3.1 Conditional Instructions

4 Microarchitecture

Title: OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks
Year: 2019
Journal: TVLSI
Research group: UCLA / Lei He

1 Abbreviations & References

RME: runtime multiplication and accumulation unit efficiency
OPU: overlay domain-specific processor unit
TCI: trigger condition index

DLA: Compiler and FPGA overlay for neural network inference acceleration 2018 FPL
Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs 2017 DAC

2 Abstract & Introduction

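This paper proposes an OPU implemented in hand-written RTL. A CNN is compiled into instructions that run on the OPU, so the hardware does not need to change from one network to the next.
The compiler optimizes the generated instructions, including operation fusion and fixed-point quantization.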
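To make the fixed-point step concrete, here is a minimal sketch of quantizing a real value to a signed fixed-point integer. The 8-bit width, the number of fractional bits, the round-half-up rule, and the function names are all assumptions for illustration; the notes do not record the exact scheme the OPU compiler uses.

```python
import math


def quantize_fixed(x: float, frac_bits: int, total_bits: int = 8) -> int:
    """Map a real value to a signed fixed-point integer, saturating on overflow.

    total_bits=8 is an assumed word width, not a value taken from the paper.
    """
    scale = 1 << frac_bits
    q = math.floor(x * scale + 0.5)      # round half up
    lo = -(1 << (total_bits - 1))        # most negative representable value
    hi = (1 << (total_bits - 1)) - 1     # most positive representable value
    return max(lo, min(hi, q))


def dequantize_fixed(q: int, frac_bits: int) -> float:
    """Recover the real value that the fixed-point integer represents."""
    return q / (1 << frac_bits)


if __name__ == "__main__":
    weight = 0.3
    q = quantize_fixed(weight, frac_bits=6)
    print(q, dequantize_fixed(q, frac_bits=6))
```

More fractional bits reduce the rounding error but shrink the representable range, which is why a compiler would typically pick the split per layer.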

3 Instruction Set Architecture

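All instructions are either conditional or unconditional. Each instruction block contains one conditional instruction and zero or more unconditional instructions. Instructions are fetched one block at a time and dispatched to the PE modules for execution.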
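The block structure can be sketched as a small data type. This is only an illustration: the assumption that the conditional instruction opens its block, the "C"/"U" tags, and the unconditional instruction names (set_address, set_length) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InstructionBlock:
    """One conditional instruction plus zero or more unconditional ones."""
    conditional: str
    unconditional: List[str] = field(default_factory=list)


def group_into_blocks(stream: List[Tuple[str, str]]) -> List[InstructionBlock]:
    """Group a flat stream of (kind, name) pairs into instruction blocks.

    kind is "C" (conditional) or "U" (unconditional). Assumes each
    conditional instruction starts a new block.
    """
    blocks: List[InstructionBlock] = []
    for kind, name in stream:
        if kind == "C":
            blocks.append(InstructionBlock(conditional=name))
        elif kind == "U":
            if not blocks:
                raise ValueError("unconditional instruction before any conditional one")
            blocks[-1].unconditional.append(name)
        else:
            raise ValueError(f"unknown instruction kind: {kind!r}")
    return blocks


# Hypothetical stream: two blocks, the second with no unconditional part.
stream = [("C", "memory_read"), ("U", "set_address"),
          ("U", "set_length"), ("C", "compute")]
blocks = group_into_blocks(stream)
```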
Each PE computes the inner product of two one-dimensional vectors of length N; in this design N = 16.
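A software model of one PE makes the operation explicit. The sketch below is not the RTL; the helper dot_with_pes, which zero-pads a longer vector and sums per-chunk results, is an assumed way of mapping a longer dot product onto length-16 PEs.

```python
from typing import Sequence

N = 16  # vector length handled by one PE in this design


def pe_inner_product(a: Sequence[int], b: Sequence[int]) -> int:
    """Inner product of two length-N vectors, as one PE would compute it."""
    if len(a) != N or len(b) != N:
        raise ValueError(f"a PE expects two vectors of length {N}")
    return sum(x * y for x, y in zip(a, b))


def dot_with_pes(a: Sequence[int], b: Sequence[int]) -> int:
    """Dot product of longer vectors, built from length-N PE operations.

    Assumed mapping: zero-pad to a multiple of N, then accumulate the
    partial result of each chunk.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    pad = (-len(a)) % N
    a = list(a) + [0] * pad
    b = list(b) + [0] * pad
    total = 0
    for i in range(0, len(a), N):
        total += pe_inner_product(a[i:i + N], b[i:i + N])
    return total
```

For example, a length-40 dot product is padded to 48 and takes three PE operations whose partial results are summed.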

3.1 Conditional Instructions

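memory read: loads data from off-chip memory into on-chip storage
memory write: writes results back to off-chip memory
data fetch: feeds data from the on-chip buffers into the compute units
compute: controls all PEs
post process: covers pooling, activation, data quantization, accumulation of intermediate results, and residual operations
instruction read: reads a new instruction block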
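The six conditional instruction types can be written down as an enumeration. The Python names below are labels chosen for this sketch, and the descriptions simply restate the list above; no opcodes or field layouts are implied.

```python
from enum import Enum


class ConditionalOp(Enum):
    """The six conditional instruction types listed in these notes."""
    MEMORY_READ = "memory read"
    MEMORY_WRITE = "memory write"
    DATA_FETCH = "data fetch"
    COMPUTE = "compute"
    POST_PROCESS = "post process"
    INSTRUCTION_READ = "instruction read"


# One-line role of each instruction type, restated from the notes.
DESCRIPTIONS = {
    ConditionalOp.MEMORY_READ: "loads data from off-chip memory into on-chip storage",
    ConditionalOp.MEMORY_WRITE: "writes results back to off-chip memory",
    ConditionalOp.DATA_FETCH: "feeds data from on-chip buffers into the compute units",
    ConditionalOp.COMPUTE: "controls all PEs",
    ConditionalOp.POST_PROCESS: "pooling, activation, quantization, accumulation, residual",
    ConditionalOp.INSTRUCTION_READ: "reads a new instruction block",
}


def describe(op: ConditionalOp) -> str:
    """Return the short role description for a conditional instruction type."""
    return DESCRIPTIONS[op]
```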

Execution uses a dynamic pipeline.

4 Microarchitecture
