# AI Chip Paper List

## Table of Contents

- [About This Project](#about-this-project)
- [The Listing of Tags](#the-listing-of-tags)
- [The Chronological Listing of Papers](#the-chronological-listing-of-papers)
  - [ISCA](#isca)
  - [ASPLOS](#asplos)
  - [MICRO](#micro)
  - [HPCA](#hpca)

## About This Project

This project aims to help engineers, researchers, and students easily find and learn from the good ideas and designs in AI-related fields, such as AI/ML/DL accelerators, chips, and systems, proposed at the top-tier computer architecture conferences (ISCA, MICRO, ASPLOS, HPCA).

This project was initiated by the Advanced Computer Architecture Lab (ACA Lab) at Shanghai Jiao Tong University in collaboration with Biren Research. Articles from additional sources are being added. Please let us know if you have any comments or would like to contribute.

## The Listing of Tags

For guidance and search purposes, tags and/or notes are assigned to all of these papers. The following tags are used to annotate them.

(tag listing image)

## The Chronological Listing of Papers

We list all of the AI-related articles collected. Links to the paper, slides, and notes are provided under each title, if available. Updating is in progress.

### ISCA

#### 2020

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference; SIMD | High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs (paper, note) | Glenn Henry; Parviz Palangpour | Centaur Technology |
| Inference; dataflow | Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads (paper, note) | Dennis Abts; Jonathan Ross | Groq Inc. |
| Spiking; dataflow; sparsity | SpinalFlow: An Architecture and Dataflow Tailored for Spiking Neural Networks (paper, note) | Surya Narayanan; Karl Taht | University of Utah |
| Inference; benchmarking | MLPerf Inference Benchmark (paper, note) | Vijay Janapa Reddi; Lingjie Xu; et al. | |
| GPU; compression | Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs (paper, note) | Esha Choukse; Michael Sullivan | University of Texas at Austin; NVIDIA |
| Inference; runtime | A Multi-Neural Network Acceleration Architecture (paper, note) | Eunjin Baek; Dongup Kwon; Jangwoo Kim | Seoul National University |
| Inference; dynamic fixed-point | DRQ: Dynamic Region-Based Quantization for Deep Neural Network Acceleration (paper, note) | Zhuoran Song; Naifeng Jing; Xiaoyao Liang | Shanghai Jiao Tong University |
| Training; LSTM; GPU | Echo: Compiler-Based GPU Memory Footprint Reduction for LSTM RNN Training (paper, note) | Bojian Zheng; Nandita Vijaykumar | University of Toronto |
| Inference | DeepRecSys: A System for Optimizing End-to-End At-Scale Neural Recommendation Inference (paper, note) | Udit Gupta; Samuel Hsia; Vikram Saraph | Harvard University; Facebook Inc. |

#### 2019

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference; dataflow | 3D-based Video Recognition Acceleration by Leveraging Temporal Locality (paper, note) | Huixiang Chen; Tao Li | University of Florida |
| Inference; quantum | A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology (paper, note) | Ruizhe Cai; Ao Ren; Nobuyuki Yoshikawa; Yanzhi Wang | Northeastern University |
| Training; reinforcement learning; distributed training | Accelerating Distributed Reinforcement Learning with In-Switch Computing (paper, note) | Youjie Li; Jian Huang | UIUC |
| Training; sparsity | Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks (paper, note) | Jiaqi Zhang; Tao Li | University of Florida |
| Inference; sparsity; bit-serial | Laconic Deep Learning Inference Acceleration (paper, note) | Sayeh Sharify; Andreas Moshovos | University of Toronto |
| Inference; memory; bandwidth-saving; large-scale networks; compression | MnnFast: A Fast and Scalable System Architecture for Memory-Augmented Neural Networks (paper, note) | Hanhwi Jang; Jangwoo Kim | POSTECH; Seoul National University |
| Inference; ReRAM; sparsity | Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks (paper, note) | Tzu-Hsien Yang | National Taiwan University; Academia Sinica; Macronix International |
| Inference; redundant computing | TIE: Energy-efficient Tensor Train-based Inference Engine for Deep Neural Network (paper, note) | Chunhua Deng; Bo Yuan | Rutgers University |
| Training; CNN; floating point | FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision (paper, note) | Mohsen Imani; Tajana Rosing | UC San Diego |
| Training; programming model | Cambricon-F: Machine Learning Computers with Fractal von Neumann Architecture (paper, note) | Yongwei Zhao; Yunji Chen | ICT; Cambricon |

#### 2018

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Training; CNN; RNN | A Configurable Cloud-Scale DNN Processor for Real-Time AI (paper, note) | Jeremy Fowers; Doug Burger | Microsoft |
| Inference; ReRAM | PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms (paper, note) | Prakalp Srivastava; Mingu Kang | University of Illinois at Urbana-Champaign; IBM |
| Inference; dataflow | Computation Reuse in DNNs by Exploiting Input Similarity (paper, slides, note) | Marc Riera; Antonio González | Universitat Politècnica de Catalunya |
| Spiking | Flexon: A Flexible Digital Neuron for Efficient Spiking Neural Network Simulations (paper, note, slides) | Dayeol Lee; Jangwoo Kim | Seoul National University; University of California |
| Space-time computing | Space-Time Algebra: A Model for Neocortical Computation (paper, slides, note) | James E. Smith | University of Wisconsin-Madison |
| Inference; cross-module optimization | RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM (paper, note) | Fengbin Tu; Shaojun Wei | Tsinghua University |
| Inference; datapath: bit-serial | Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks (paper, note) | Charles Eckert; Reetuparna Das | University of Michigan; Intel Corporation |
| Inference; cross-module optimization | EVA²: Exploiting Temporal Redundancy in Live Computer Vision (paper, note, slides) | Mark Buckler; Adrian Sampson | Cornell University |
| Inference; CNN; cross-module optimization; power optimization | Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision (paper, slides, note) | Yuhao Zhu; Paul Whatmough | University of Rochester; ARM Research |
| Inference; GAN; sparsity; MIMD; SIMD | GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks (paper, note) | Amir Yazdanbakhsh; Hadi Esmaeilzadeh | Georgia Institute of Technology; UC San Diego; Qualcomm Technologies |
| Inference; CNN; approximate | SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks (paper, note) | Vahideh Akhlaghi; Hadi Esmaeilzadeh | Georgia Institute of Technology; UC San Diego; Qualcomm |
| Inference; CNN; sparsity | UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition (paper, note) | Kartik Hegde; Christopher W. Fletcher | University of Illinois at Urbana-Champaign; NVIDIA |
| Inference; non-uniform | Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation (paper, note) | Eunhyeok Park; Sungjoo Yoo | Seoul National University |
| Inference; dataflow: dynamic | Prediction Based Execution on Deep Neural Networks (paper, note) | Mingcong Song; Tao Li | University of Florida |
| Inference; datapath: bit-serial | Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network (paper, note) | Hardik Sharma; Hadi Esmaeilzadeh | Georgia Institute of Technology; University of California |
| Training; memory: bandwidth-saving | Gist: Efficient Data Encoding for Deep Neural Network Training (paper, note) | Animesh Jain; Gennady Pekhimenko | Microsoft Research; University of Toronto; University of Michigan |
| Inference; cross-module optimization | The Dark Side of DNN Pruning (paper, note) | Reza Yazdani; Antonio González | Universitat Politècnica de Catalunya |

#### 2017

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference | In-Datacenter Performance Analysis of a Tensor Processing Unit (paper, note) | Norman P. Jouppi | Google |
| Inference; dataflow | Maximizing CNN Accelerator Efficiency Through Resource Partitioning (paper, note) | Yongming Shen | Stony Brook University |
| Training | SCALEDEEP: A Scalable Compute Architecture for Learning and Evaluating Deep Networks (paper, note) | Swagath Venkataramani; Anand Raghunathan | Purdue University; Parallel Computing Lab, Intel Corporation |
| Inference; algorithm-architecture co-design | Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism (paper, note) | Jiecao Yu; Scott Mahlke | University of Michigan; ARM |
| Inference; sparsity | SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks (paper, note) | Angshuman Parashar; William J. Dally | NVIDIA; MIT; UC Berkeley; Stanford University |
| Training; low-bit | Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent (paper, note) | Christopher De Sa; Kunle Olukotun | Stanford University |

#### 2016

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference; sparsity | Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing (paper, note) | Jorge Albericio; Tayler Hetherington | University of Toronto; University of British Columbia |
| Inference; analog | ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars (paper, note) | Ali Shafiee; Vivek Srikumar | University of Utah; Hewlett Packard Labs |
| Inference; PIM | PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory (paper, note) | Ping Chi; Yuan Xie | University of California |
| Inference; sparsity | EIE: Efficient Inference Engine on Compressed Deep Neural Network (paper, note) | Song Han; William J. Dally | Stanford University; NVIDIA |
| Inference; analog | RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision (paper, note) | Robert LiKamWa; Lin Zhong | Rice University |
| Inference; architecture-physical co-design | Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators (paper, note) | Brandon Reagen; David Brooks | Harvard University |
| Inference; dataflow | Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks (paper, note) | Yu-Hsin Chen; Vivienne Sze | MIT; NVIDIA |
| Inference; 3D integration | Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory (paper, note) | Duckhwan Kim; Saibal Mukhopadhyay | Georgia Institute of Technology |
| Inference | Cambricon: An Instruction Set Architecture for Neural Networks (paper, note) | Shaoli Liu; Tianshi Chen | CAS; Cambricon Ltd. |

#### 2015

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference; cross-module optimization | ShiDianNao: Shifting Vision Processing Closer to the Sensor (paper, note) | Zidong Du | ICT |

### ASPLOS

#### 2020

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference; security | Shredder: Learning Noise Distributions to Protect Inference Privacy (paper, note) | Fatemehsadat Mireshghallah; Mohammadkazem Taram; et al. | UCSD |
| Algorithm-architecture co-design; security | DNNGuard: An Elastic Heterogeneous DNN Accelerator Architecture against Adversarial Attacks (paper, note) | Xingbin Wang; Rui Hou; Boyan Zhao; et al. | CAS; USC |
| Programming model; algorithm-architecture co-design | Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators (paper, note) | Xuan Yang; Mark Horowitz; et al. | Stanford; THU |
| Algorithm-architecture co-design; security | DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints (paper, note, codes) | Xing Hu; Yuan Xie; et al. | UCSB |
| Training; distributed computing | Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training (paper, note) | Qinyi Luo; Jiaao He; Youwei Zhuo; Xuehai Qian | USC |
| Compression | PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning (paper) | Wei Niu; Xiaolong Ma; Sheng Lin; et al. | College of William and Mary; Northeastern; USC |
| Power optimization; compute-memory trade-off | Capuchin: Tensor-based GPU Memory Management for Deep Learning (paper, note) | Xuan Peng; Xuanhua Shi; Hulin Dai; et al. | HUST; MSRA; USC |
| Compute-memory trade-off | NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units (paper) | Bongjoon Hyun; Youngeun Kwon; Yujeong Choi; et al. | KAIST |
| Algorithm-architecture co-design | FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System (paper, note, codes) | Size Zheng; Yun Liang; Shuo Wang; et al. | PKU |

#### 2019

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference; ReRAM | PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference (paper, note) | Aayush Ankit; Dejan S. Milojičić; et al. | Purdue; UIUC; HP |
| Reinforcement learning | FA3C: FPGA-Accelerated Deep Reinforcement Learning (paper, note) | Hyungmin Cho; Pyeongseok Oh; Jiyoung Park; et al. | Hongik University; SNU |
| Inference; ReRAM | FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture (paper, note) | Yu Ji; Yuan Xie; et al. | THU; UCSB |
| Inference; bit-serial | Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks (paper, note) | Alberto Delmas Lascorz; Andreas Ioannis Moshovos; et al. | Toronto; NVIDIA |
| Inference; dataflow | TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators (paper, note, codes) | Mingyu Gao; Xuan Yang; Jing Pu; et al. | Stanford |
| Inference; CNN; systolic; sparsity | Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization (paper, codes, note) | Hsiang-Tsung Kung; Bradley McDanel; Sai Qian Zhang | Harvard |
| Training; CNN; distributed computing | Split-CNN: Splitting Window-based Operations in Convolutional Neural Networks for Memory System Optimization (paper, note) | Tian Jin; Seokin Hong | IBM; Kyungpook National University |
| Training; distributed computing | HOP: Heterogeneity-Aware Decentralized Training (paper, note) | Qinyi Luo; Jinkun Lin; Youwei Zhuo; Xuehai Qian | USC; THU |
| Training; compiler | Astra: Exploiting Predictability to Optimize Deep Learning (paper, note) | Muthian Sivathanu; Tapan Chugh; Sanjay S. Singapuram; Lidong Zhou | Microsoft |
| Training; quantization; compression | ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers (paper, note) | Ao Ren; Tianyun Zhang; Shaokai Ye; et al. | Northeastern; Syracuse; SUNY Buffalo; USC |
| Security | DeepSigns: An End-to-End Watermarking Framework for Protecting the Ownership of Deep Neural Networks (paper, note) | Bita Darvish Rouhani; Huili Chen; Farinaz Koushanfar | UCSD |

#### 2018

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Compiler | Bridging the Gap Between Neural Networks and Neuromorphic Hardware with a Neural Network Compiler (paper, slides, note) | Yu Ji; Youhui Zhang; Wenguang Chen; Yuan Xie | Tsinghua; UCSB |
| Inference; dataflow; NoC | MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects (paper, note, slides) | Hyoukjun Kwon; Ananda Samajdar; Tushar Krishna | Georgia Tech |
| Bayesian | VIBNN: Hardware Acceleration of Bayesian Neural Networks (paper, note) | Ruizhe Cai; Ao Ren; Ning Liu; et al. | Syracuse University; USC |

#### 2017

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Dataflow; 3D integration | TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory (paper, note) | Mingyu Gao; Jing Pu; Xuan Yang | Stanford University |
| CNN; algorithm-architecture co-design | SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing (paper, note) | Ao Ren; Zhe Li; Caiwen Ding | Syracuse University; USC; The City College of New York |

#### 2015

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference | PuDianNao: A Polyvalent Machine Learning Accelerator (paper, note) | Daofu Liu; Tianshi Chen; Shaoli Liu | CAS; USTC; Inria |

#### 2014

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference | DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning (paper, note) | Tianshi Chen; Zidong Du; Ninghui Sun | CAS; Inria |

### MICRO

#### 2020

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| PIM/CIM; systolic | Look-Up Table based Energy Efficient Processing in Cache Support for Neural Network Acceleration (paper, note) | Akshay Krishna Ramanathan | Pennsylvania State University; Intel |
| PIM; cache; reconfigurable | FReaC Cache: Folded-Logic Reconfigurable Computing in the Last Level Cache (paper, note) | Ashutosh Dhar | University of Illinois at Urbana-Champaign; IBM Research |
| Bayesian; sparsity | Fast-BCNN: Massive Neuron Skipping in Bayesian Convolutional Neural Networks (paper, note) | Qiyu Wan | ECOMS Lab, University of Houston |
| Low-bit | Non-Blocking Simultaneous Multithreading: Embracing the Resiliency of Deep Neural Networks (paper, note) | Gil Shomron; Uri Weiser | Technion |
| Compiler | ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning (paper, note) | Sheng-Chun Kao; Geonhwa Jeong; Tushar Krishna | Georgia Institute of Technology |
| Algorithm-architecture co-design; cross-module optimization | VR-DANN: Real-Time Video Recognition via Decoder-Assisted Neural Network Acceleration (paper, note) | Zhuoran Song; Feiyang Wu; Xueyuan Liu | Shanghai Jiao Tong University; Biren Research |
| PIM/CIM | Newton: A DRAM-Maker's Accelerator-in-Memory (AiM) Architecture for Machine Learning (paper, note) | Mingxuan He | Purdue University |
| | Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks (paper, note) | Soroush Ghodrati; Byung Hoon Ahn; Joon Kyung Kim | Bigstream Inc.; Kansas University; University of Illinois Urbana-Champaign; NVIDIA Research; Google Inc. |
| Training; sparsity | Procrustes: A Dataflow and Accelerator for Sparse Deep Neural Network Training (paper, note) | Dingqing Yang; Amin Ghasemazar; Xiaowei Ren | The University of British Columbia; Microsoft Corporation |
| GPU; tensor core; compiler; bandwidth saving | Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores (paper, note) | Hyeonjin Kim; Sungwoo Ahn; Yunho Oh | Yonsei University; EcoCloud |
| Algorithm-architecture co-design; compute-memory trade-off | DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture (paper, note) | Liu Liu | UC Santa Barbara |
| Inference; compression | TFE: Energy-Efficient Transferred Filter-Based Engine to Compress and Accelerate Convolutional Neural Networks (paper, note) | Huiyu Mo; Leibo Liu; Wenjing Hu | Tsinghua University; Intel |
| Training; sparsity | TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training (paper, note) | Mostafa Mahmoud; Isak Edo; Ali Hadi Zadeh | University of Toronto; Cerebras Systems; Vector Institute |
| Training; inference; sparsity; CPU | SAVE: Sparsity-Aware Vector Engine for Accelerating DNN Training and Inference on CPUs (paper, note) | Zhangxiaowen Gong; Houxiang Ji | University of Illinois at Urbana-Champaign; Intel |
| NLP; sparsity; bandwidth saving | GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference (paper, note) | Ali Hadi Zadeh; Isak Edo; Omar Mohamed Awad | University of Toronto |
| Training; cross-module optimization | TrainBox: An Extreme-Scale Neural Network Training Server Architecture by Systematically Balancing Operations (paper, note) | Pyeongsu Park; Heetaek Jeong; Jangwoo Kim | Seoul National University |

#### 2019

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Compute-memory trade-off; dataflow | Wire-Aware Architecture and Dataflow for CNN Accelerators (paper, note) | Sumanth Gudaparthi; Surya Narayanan; Rajeev Balasubramonian; Edouard Giacomin; Hari Kambalasubramanyam; Pierre-Emmanuel Gaillardon | Utah |
| Security; compute-memory trade-off | ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning (paper, note) | Shang-Tse Chen; Cory Cornelius; Jason Martin; Duen Horng Chau | Georgia Tech; Intel |
| Inference; NoC; cross-module optimization | Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture (paper, note, slides) | Yakun Sophia Shao; Jason Clemons; Rangharajan Venkatesan; et al. | NVIDIA |
| Compression; ISA; cross-module optimization | ZCOMP: Reducing DNN Cross-Layer Memory Footprint Using Vector Extensions (paper, note) | Berkin Akin; Zeshan A. Chishti; Alaa R. Alameldeen | Google; Intel |
| Algorithm-architecture co-design | Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating (paper, note) | Weizhe Hua; Yuan Zhou; Christopher De Sa; et al. | Cornell |
| Sparsity | SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks (paper, note) | Ashish Gondimalla; Noah Chesnut; et al. | Purdue |
| Power optimization; approximate | EDEN: Enabling Approximate DRAM for DNN Inference using Error-Resilient Neural Networks (paper, note) | Skanda Koppula; Lois Orosa; A. Giray Yağlıkçı; et al. | ETHZ |
| Inference; CNN | eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference (paper, note) | Chao-Tsung Huang; Yu-Chun Ding; Huan-Ching Wang; et al. | NTHU |
| Architecture-physical co-design | TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning (paper, note) | Youngeun Kwon; Yunjae Lee; Minsoo Rhu | KAIST |
| Architecture-physical co-design; dataflow | Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach (paper, note) | Hyoukjun Kwon; Prasanth Chatarasi; Michael Pellauer; et al. | Georgia Tech; NVIDIA |
| Sparsity; inference | MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation (paper, note) | Lillian Pentecost; Marco Donato; Brandon Reagen; et al. | Harvard; Facebook |
| RNN; special operation | Neuron-Level Fuzzy Memoization in RNNs (paper, note) | Franyell Silfa; Gem Dot; Jose-Maria Arnau; et al. | UPC |
| Inference; algorithm-architecture co-design | Manna: An Accelerator for Memory-Augmented Neural Networks (paper, note) | Jacob R. Stevens; Ashish Ranjan; Dipankar Das; et al. | Purdue; Intel |
| PIM | eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing (paper, note) | Elaheh Sadredini; Reza Rahimi; Vaibhav Verma; et al. | Virginia |
| Sparsity | ExTensor: An Accelerator for Sparse Tensor Algebra (paper, note) | Kartik Hegde; Hadi Asghari-Moghaddam; Michael Pellauer | UIUC; NVIDIA |
| Sparsity; algorithm-architecture co-design | Efficient SpMV Operation for Large and Highly Sparse Matrices Using Scalable Multi-Way Merge Parallelization (paper, note) | Fazle Sadi; Joe Sweeney; Tze Meng Low; et al. | CMU |
| Sparsity; algorithm-architecture co-design; compression | Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs (paper, note) | Maohua Zhu; Tao Zhang; Yuan Xie | UCSB; Alibaba |
| Special operation; inference | ASV: Accelerated Stereo Vision System (paper, note, codes) | Yu Feng; Paul Whatmough; Yuhao Zhu | Rochester |
| Algorithm-architecture co-design; special operation | Alleviating Irregularity in Graph Analytics Acceleration: A Hardware/Software Co-Design Approach (paper, note) | Mingyu Yan; Xing Hu; Shuangchen Li; et al. | UCSB; ICT |

#### 2018

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Sparsity | Cambricon-S: Addressing Irregularity in Sparse Neural Networks: A Cooperative Software/Hardware Approach (paper, note) | Xuda Zhou; Zidong Du; Qi Guo; Shaoli Liu; Chengsi Liu; Chao Wang; Xuehai Zhou; Ling Li; Tianshi Chen; Yunji Chen | USTC; CAS |
| Inference; CNN; spatial correlation | Diffy: A Déjà vu-Free Differential Deep Neural Network Accelerator (paper, note) | Mostafa Mahmoud; Kevin Siu; Andreas Moshovos | University of Toronto |
| Distributed computing | Beyond the Memory Wall: A Case for Memory-Centric HPC System for Deep Learning (paper, note) | Youngeun Kwon; Minsoo Rhu | KAIST |
| RNN | Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs (paper, note) | Xingyao Zhang; Chenhao Xie; Jing Wang; et al. | University of Houston; Capital Normal University |
| Training; distributed computing; compression | A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks (paper, note) | Youjie Li; Jongse Park; Mohammad Alian; et al. | UIUC; THU; SJTU; Intel; UCSD |
| Inference; sparsity; compression | PermDNN: Efficient Compressed Deep Neural Network Architecture with Permuted Diagonal Matrices (paper, note) | Chunhua Deng; Siyu Liao; Yi Xie; et al. | City University of New York; University of Minnesota; USC |
| Reinforcement learning; algorithm-architecture co-design | GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware (paper, note) | Ananda Samajdar; Parth Mannan; Kartikay Garg; Tushar Krishna | Georgia Tech |
| Training; PIM | Processing-in-Memory for Energy-Efficient Neural Network Training: A Heterogeneous Approach (paper, note) | Jiawen Liu; Hengyu Zhao; et al. | UCM; UCSD; UCSC |
| GAN; PIM | LerGAN: A Zero-Free, Low Data Movement and PIM-Based GAN Architecture (paper, note) | Haiyu Mao; Mingcong Song; Tao Li; et al. | THU; University of Florida |
| Training; special operation; dataflow | Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture (paper, note) | Byungchul Hong; Yeonju Ro; John Kim | KAIST |
| PIM/CIM | SCOPE: A Stochastic Computing Engine for DRAM-Based In-Situ Accelerator (paper, note) | Shuangchen Li; Alvin Oliver Glova; Xing Hu; et al. | UCSB; Samsung |
| Inference; algorithm-architecture co-design | Morph: Flexible Acceleration for 3D CNN-Based Video Understanding (paper, note) | Kartik Hegde; Rohit Agrawal; Yulun Yao; Christopher W. Fletcher | UIUC |

#### 2017

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Bit-serial | Bit-Pragmatic Deep Neural Network Computing (paper, note) | Jorge Albericio; Alberto Delmás; Patrick Judd; et al. | NVIDIA; University of Toronto |
| CNN; special computing | CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices (paper, note) | Caiwen Ding; Siyu Liao; Yanzhi Wang; et al. | Syracuse University; City University of New York; USC; California State University; Northeastern University |
| PIM | DRISA: A DRAM-based Reconfigurable In-Situ Accelerator (paper, note) | Shuangchen Li; Dimin Niu; et al. | UCSB; Samsung |
| Distributed computing | Scale-Out Acceleration for Machine Learning (paper, note) | Jongse Park; Hardik Sharma; Divya Mahajan; et al. | Georgia Tech; UCSD |
| DNN; sparsity; bandwidth saving | DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-Compute Data Fission (paper, note) | Parker Hill; Animesh Jain; Mason Hill; et al. | University of Michigan; University of Nevada |

#### 2016

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| DNN; compiler; dataflow | From High-Level Deep Neural Models to FPGAs (paper, note) | Hardik Sharma; Jongse Park; Divya Mahajan; et al. | Georgia Institute of Technology; Intel |
| DNN; runtime; training | vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design (paper, note) | Minsoo Rhu; Natalia Gimelshein; Jason Clemons; et al. | NVIDIA |
| Bit-serial | Stripes: Bit-Serial Deep Neural Network Computing (paper, note) | Patrick Judd; Jorge Albericio; Tayler Hetherington; et al. | University of Toronto; University of British Columbia |
| Sparsity | Cambricon-X: An Accelerator for Sparse Neural Networks (paper, note) | Shijin Zhang; Zidong Du; Lei Zhang; et al. | Chinese Academy of Sciences |
| Neuromorphic; spiking; programming model | NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints (paper, note) | Yu Ji; Youhui Zhang; Shuangchen Li; et al. | Tsinghua University; UCSB |
| Cross-module optimization | Fused-Layer CNN Accelerators (paper, note) | Manoj Alwani; Han Chen; Michael Ferdman; Peter Milder | Stony Brook University |
| Power optimization; cross-module optimization | A Patch Memory System for Image Processing and Computer Vision (paper, note) | Jason Clemons; Chih-Chi Cheng; Iuri Frosio; Daniel Johnson; Stephen W. Keckler | NVIDIA; Qualcomm |
| Power optimization | An Ultra Low-Power Hardware Accelerator for Automatic Speech Recognition (paper, note) | Reza Yazdani; Albert Segura; Jose-Maria Arnau; Antonio González | Universitat Politècnica de Catalunya |

#### 2014

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference; CNN | DaDianNao: A Machine-Learning Supercomputer (paper, note) | Yunji Chen; Tao Luo; Shaoli Liu; et al. | CAS; Inria; Inner Mongolia University |

### HPCA

#### 2020

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| ReRAM | Deep Learning Acceleration with Neuron-to-Memory Transformation (paper, note) | Mohsen Imani; Mohammad Samragh Razlighi; Yeseong Kim; et al. | UCSD |
| Graph network | HyGCN: A GCN Accelerator with Hybrid Architecture (paper, note) | Mingyu Yan; Lei Deng; Xing Hu; et al. | ICT; UCSB |
| Training; sparsity | SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training (paper, note, slides) | Eric Qin; Ananda Samajdar; Hyoukjun Kwon; et al. | Georgia Tech |
| Programming model; DNN | PREMA: A Predictive Multi-task Scheduling Algorithm for Preemptible NPUs (paper, note) | Yujeong Choi; Minsoo Rhu | KAIST |
| Sparsity; compute-memory trade-off | ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator (paper, note) | Bahar Asgari; Ramyad Hadidi; Tushar Krishna; et al. | Georgia Tech |
| Sparsity; algorithm-architecture co-design | SpArch: Efficient Architecture for Sparse Matrix Multiplication (paper, note, project) | Zhekai Zhang; Hanrui Wang; Song Han; William J. Dally | MIT; NVIDIA |
| Algorithm-architecture co-design; approximation | A3: Accelerating Attention Mechanisms in Neural Networks with Approximation (paper, note) | Tae Jun Ham; Sung Jun Jung; Seonghak Kim; et al. | SNU |
| Training; architecture-physical co-design | AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerator Arrays (paper, note) | Linghao Song; Fan Chen; Youwei Zhuo; et al. | Duke; USC |
| Special operation; architecture-physical co-design | PIXEL: Photonic Neural Network Accelerator (paper, note) | Kyle Shiflett; Dylan Wright; Avinash Karanth; Ahmed Louri | Ohio; George Washington |
| Capsule; PIM | Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design (paper, note) | Xingyao Zhang; Shuaiwen Leon Song; Chenhao Xie; et al. | Houston |
| Bandwidth saving | Communication Lower Bound in Convolution Accelerators (paper, note) | Xiaoming Chen; Yinhe Han; Yu Wang | ICT; THU |
| Training; distributed computing; algorithm-architecture co-design | EFLOPS: Algorithm and System Co-design for a High Performance Distributed Training Platform (paper, note) | Jianbo Dong; Zheng Cao; Tao Zhang; et al. | Alibaba |
| NoC | Experiences with ML-Driven Design: A NoC Case Study (paper, note) | Jieming Yin; Subhash Sethumurugan; Yasuko Eckert; et al. | AMD |
| Sparsity | Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations (paper, note) | Nitish Srivastava; Hanchen Jin; Shaden Smith; et al. | Cornell; Intel |
| Algorithm-architecture co-design | A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms (paper, note) | Jian Weng; Sihao Liu; Zhengrong Wang; et al. | UCLA |
| Reinforcement learning; NoC; algorithm-architecture co-design | A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study (paper, note) | Ting-Ru Lin; Drew Penney; Massoud Pedram; Lizhong Chen | USC; OSU |
| Power optimization | Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices (paper, note) | Jawad Haj-Yahya; Yanos Sazeides; Mohammed Alser; et al. | ETHZ; Cyprus; CMU |

#### 2019

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Training; compute-memory trade-off | HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array (paper, note) | Linghao Song; Jiachen Mao; Yiran Chen; et al. | Duke; USC |
| RNN; algorithm-architecture co-design | E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs (paper, note) | Zhe Li; Caiwen Ding; Siyue Wang | Syracuse University; Northeastern University; Florida International University; USC; University at Buffalo |
| CNN; bit-serial; sparsity | Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks (paper, note) | Xiaowei Wang; Jiecao Yu; Charles Augustine; et al. | Michigan; Intel |
| Cross-module optimization | Shortcut Mining: Exploiting Cross-layer Shortcut Reuse in DCNN Accelerators (paper, note) | Arash Azizimazreah; Lizhong Chen | OSU |
| PIM/CIM; low-bit; binary | NAND-Net: Minimizing Computational Complexity of In-Memory Processing for Binary Neural Networks (paper, note) | Hyeonuk Kim; Jaehyeong Sim; Yeongjae Choi; Lee-Sup Kim | KAIST |
| Accuracy-latency trade-off | Kelp: QoS for Accelerators in Machine Learning Platforms (paper, note) | Haishan Zhu; David Lo; Liqun Cheng | Microsoft; Google; UT Austin |
| Inference | Machine Learning at Facebook: Understanding Inference at the Edge (paper, note) | Carole-Jean Wu; David Brooks; Kevin Chen; et al. | Facebook |
| Architecture-physical co-design | The Accelerator Wall: Limits of Chip Specialization (paper, note, codes) | Adi Fuchs; David Wentzlaff | Princeton |

#### 2018

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Special operation; approximate | Making Memristive Neural Network Accelerators Reliable (paper, note) | Ben Feinberg; Shibo Wang; Engin Ipek | University of Rochester |
| Algorithm-architecture co-design; GAN | Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-based Deep Learning (paper, note) | Mingcong Song; Jiaqi Zhang; Huixiang Chen; Tao Li | University of Florida |
| Compression; sparsity | Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks (paper, note) | Minsoo Rhu; Mike O'Connor; Niladrish Chatterjee; et al. | POSTECH; NVIDIA; UT Austin |
| Architecture-physical co-design; inference | In-situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems (paper, note) | Mingcong Song; Kan Zhong; Tao Li; et al. | University of Florida; Chongqing University; Capital Normal University |
| Special operation; ReRAM | GraphR: Accelerating Graph Processing Using ReRAM (paper, note) | Linghao Song; Youwei Zhuo; Xuehai Qian | Duke; USC |
| PIM; special operation; dataflow | GraphP: Reducing Communication of PIM-based Graph Processing with Efficient Data Partition (paper, note) | Mingxing Zhang; Youwei Zhuo; Chao Wang; et al. | THU; USC; Stanford |
| Power optimization; PIM | PM3: Power Modeling and Power Management for Processing-in-Memory (paper, note) | Chao Zhang; Tong Meng; Guangyu Sun | PKU |

#### 2017

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Inference; CNN; dataflow | FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks (paper, note) | Wenyan Lu; Guihai Yan; Jiajun Li; et al. | Chinese Academy of Sciences |
| Inference; ReRAM | PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning (paper, note) | Linghao Song; Xuehai Qian; Hai Li; Yiran Chen | University of Pittsburgh; University of Southern California |
| Training | Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures (paper, note) | Mingcong Song; Yang Hu; Huixiang Chen; Tao Li | University of Florida |

#### 2016

| Tags | Title | Authors | Affiliations |
| --- | --- | --- | --- |
| Programming model; training | TABLA: A Unified Template-based Architecture for Accelerating Statistical Machine Learning (paper, note) | Divya Mahajan; Jongse Park; Emmanuel Amaro | Georgia Institute of Technology |
| ReRAM; Boltzmann | Memristive Boltzmann Machine: A Hardware Accelerator for Combinatorial Optimization and Deep Learning (paper, note) | Mahdi Nazm Bojnordi; Engin Ipek | University of Rochester |
