Inference on Embedded Systems
paper-reading
Lastmod: 2019-02-21

Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables (SparseSep).

Key contributions:

* A sparse coding-based approach to optimize deep learning inference execution.
* A convolution kernel separation technique.
* The first to run deep learning models on severely constrained wearable hardware.
* A prototype implementation, evaluated on CNN and DNN models: 11.3x improvement in memory and 13.3x in execution time, at a 5% accuracy loss.

Background: DNN, CNN (a DNN with some convolutional layers).

Local computation: preserves privacy and is independent of network conditions.

Design goals:

* No re-training.
* No cloud offloading.
* Low-resource platforms: only a few MBs of RAM.
* Minimize model changes.

Three techniques:

* Layer Compression Compiler, for fully-connected layers.
* Sparse Inference Runtime: load active layers only.
* Convolution Separation Runtime, for convolutional layers.

Fully-connected layers:

$f(\mathbf{b} + \mathbf{W} \cdot \mathbf{x})$
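A minimal NumPy sketch of that forward pass (function and argument names are mine, not the paper's):

```python
import numpy as np

def fc_forward(W, b, x, f=np.tanh):
    """Fully-connected layer: f(b + W @ x).

    W: (m, n) weight matrix, b: (m,) bias, x: (n,) input.
    f is the nonlinearity; tanh is just a placeholder choice.
    """
    return f(b + W @ x)
```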

Convolutional Layers:

$f\left(\sum_i \mathbf{x}_i \ast \mathcal{K}_i + \mathbf{b}\right)$
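The same as a sketch, with the sum running over input channels (names are mine; SciPy's `convolve2d` stands in for the conv primitive):

```python
import numpy as np
from scipy.signal import convolve2d

def conv_forward(x, kernels, b, f=np.tanh):
    """One output feature map: f(sum_i x[i] * K[i] + b).

    x: (C, H, W) input channels; kernels: (C, d, d); b: scalar bias.
    Sums per-channel 2-D convolutions, then applies the nonlinearity.
    """
    acc = sum(convolve2d(x[i], kernels[i], mode="valid")
              for i in range(len(kernels)))
    return f(acc + b)
```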

Weight Factorization:

For a fully-connected layer we compute $W \cdot x$, where $W$ is $m \times n$. Factorizing $W = U \cdot V$, with $U$ of size $m \times k$ and $V$ of size $k \times n$, compresses the layer whenever $k(m+n) < mn$. Previous work uses SVD for this factorization.
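A sketch of the truncated-SVD factorization (my naming, not the paper's code):

```python
import numpy as np

def svd_factorize(W, k):
    """Rank-k factorization W ~= U @ V via truncated SVD.

    W: (m, n). Returns U: (m, k) and V: (k, n).
    Storage drops from m*n to k*(m+n) values.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]  # fold singular values into U
```

At inference, computing $U \cdot (V \cdot x)$ costs $k(m+n)$ multiplies instead of $mn$.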

This paper improves on that with dictionary learning: factorize $W = B \cdot A$, where $B$ is a dictionary of basis vectors and $A$ is a sparse matrix, so only the non-zero entries of $A$ need to be stored and multiplied at inference time.
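A hedged sketch using scikit-learn's `DictionaryLearning` as a stand-in for the paper's dictionary-learning step (the paper's actual solver may differ):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def sparse_factorize(W, k, alpha=1.0):
    """Factor W ~= A @ B with A sparse, via dictionary learning.

    Treats each row of W (m, n) as a signal; B (k, n) is the learned
    dictionary, A (m, k) the sparse codes. alpha controls sparsity.
    (The paper writes W = B . A; roles are transposed here to match
    scikit-learn's convention.)
    """
    dl = DictionaryLearning(n_components=k, alpha=alpha,
                            transform_algorithm="lasso_lars")
    A = dl.fit_transform(W)   # sparse codes, shape (m, k)
    B = dl.components_        # dictionary, shape (k, n)
    return A, B
```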

Convolution Kernel Separation:

A $d \times d$ convolution kernel is factorized, via SVD, into a $d \times k$ and a $k \times d$ matrix; one 2-D convolution then becomes $k$ pairs of cheaper 1-D convolutions.
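A sketch of the rank-$k$ separation, assuming SciPy for the 1-D convolutions (names mine):

```python
import numpy as np
from scipy.signal import convolve2d

def separable_conv(x, K, k):
    """Approximate x * K using k pairs of 1-D convolutions.

    SVD gives K ~= sum_i s_i * u_i v_i^T; each rank-1 term is a
    column convolution followed by a row convolution. Cost per pixel
    falls from d*d to 2*k*d multiplies.
    """
    U, s, Vt = np.linalg.svd(K)
    out = 0
    for i in range(k):
        col = (U[:, i] * s[i]).reshape(-1, 1)  # d x 1 kernel
        row = Vt[i].reshape(1, -1)             # 1 x d kernel
        out = out + convolve2d(convolve2d(x, col, mode="valid"),
                               row, mode="valid")
    return out
```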

Choosing $k$ for a specified accuracy, memory, and computation-time budget: for each layer, binary search finds the smallest $k$ that meets the targets.
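A sketch of the per-layer search; `accuracy_ok` is a hypothetical evaluation hook, and the search assumes accuracy is monotone in $k$:

```python
def smallest_k(layer, k_max, accuracy_ok):
    """Binary search for the smallest rank k whose factorized layer
    still meets the accuracy target. accuracy_ok(layer, k) -> bool
    evaluates the compressed model (placeholder, not the paper's API).
    """
    lo, hi = 1, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if accuracy_ok(layer, mid):
            hi = mid          # mid works; try smaller
        else:
            lo = mid + 1      # mid fails; need larger k
    return lo
```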

evaluation: …