This is a suggested mathematical notation protocol for machine learning.
The field of machine learning has evolved rapidly in recent years, and communication between researchers and research groups has become increasingly important. A key challenge for communication arises from inconsistent notation across papers. This proposal suggests a standard for commonly used mathematical notation in machine learning. This first version covers only a core set of notations; more remain to be added. The proposal will be updated regularly as the field progresses, and we look forward to suggestions for improving it in future versions.
A dataset $S=\{\bm{z}_i\}_{i=1}^{n}=\{(\bm{x}_i,\bm{y}_i)\}_{i=1}^{n}$ is sampled from a distribution $\mathcal{D}$ over a domain $\mathcal{Z}=\mathcal{X}\times\mathcal{Y}$. Usually, $\mathcal{X}$ is a subset of $\mathbb{R}^{d}$ and $\mathcal{Y}$ is a subset of $\mathbb{R}^{d_{\rm o}}$, where $d$ is the input dimension and $d_{\rm o}$ is the output dimension. $n$ is the number of samples. Without specification, $S$ and $n$ are for the training set.
A hypothesis space is denoted by $\mathcal{H}$. A hypothesis function is denoted by $f_{\bm{\theta}}\in\mathcal{H}$ or $f(\cdot;\bm{\theta})$ with $f_{\bm{\theta}}:\mathcal{X}\to\mathcal{Y}$. $\bm{\theta}$ denotes the set of parameters of $f_{\bm{\theta}}$. If there exists a target function, it is denoted by $f$ or $f^{*}$ satisfying $\bm{y}_i=f^{*}(\bm{x}_i)$ for $i=1,\dots,n$.
A loss function, denoted by $\ell:\mathcal{H}\times\mathcal{Z}\to\mathbb{R}_{+}:=[0,+\infty)$, measures the difference between a predicted label and a true label, e.g., the $L^{2}$ loss

$$\ell(f_{\bm{\theta}},\bm{z})=\|f_{\bm{\theta}}(\bm{x})-\bm{y}\|_{2}^{2},$$

where $\bm{z}=(\bm{x},\bm{y})$. The empirical risk or training loss for a set $S=\{(\bm{x}_i,\bm{y}_i)\}_{i=1}^{n}$ is denoted by $L_{S}(\bm{\theta})$, $L_{n}(\bm{\theta})$, $R_{S}(\bm{\theta})$, or $R_{n}(\bm{\theta})$:

$$L_{S}(\bm{\theta})=\frac{1}{n}\sum_{i=1}^{n}\ell(f_{\bm{\theta}},\bm{z}_i).$$

The population risk or expected loss is denoted by $L_{\mathcal{D}}(\bm{\theta})$ or $R_{\mathcal{D}}(\bm{\theta})$:

$$L_{\mathcal{D}}(\bm{\theta})=\mathbb{E}_{\bm{z}\sim\mathcal{D}}\,\ell(f_{\bm{\theta}},\bm{z}),$$

where $\bm{z}=(\bm{x},\bm{y})$ follows the distribution $\mathcal{D}$.
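As a minimal illustration of the definition above, the empirical risk $L_{S}(\bm{\theta})$ can be computed directly from its formula. The sketch below assumes a NumPy-style hypothesis function `f_theta` and the $L^{2}$ loss; the function and argument names are illustrative, not part of the proposal.

```python
import numpy as np

def empirical_risk(f_theta, X, Y):
    """L_S(theta) = (1/n) * sum_i loss(f_theta, z_i), using the L2 loss
    loss(f_theta, z) = ||f_theta(x) - y||^2 as the per-sample loss."""
    n = len(X)  # number of samples in S
    return sum(np.sum((f_theta(x) - y) ** 2) for x, y in zip(X, Y)) / n
```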
An activation function is denoted by $\sigma(x)$.

Example 1. Some commonly used activation functions are

- ReLU: $\sigma(x)=\max(x,0)$;
- sigmoid: $\sigma(x)=\dfrac{1}{1+\mathrm{e}^{-x}}$;
- $\tanh$: $\sigma(x)=\tanh(x)=\dfrac{\mathrm{e}^{x}-\mathrm{e}^{-x}}{\mathrm{e}^{x}+\mathrm{e}^{-x}}$.
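For concreteness, here is a sketch of these three activations in NumPy (function names are illustrative):

```python
import numpy as np

def relu(x):
    """ReLU: sigma(x) = max(x, 0), applied entry-wise."""
    return np.maximum(x, 0.0)

def sigmoid(x):
    """Sigmoid: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent: sigma(x) = tanh(x)."""
    return np.tanh(x)
```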
The number of neurons in the hidden layer is denoted by $m$. The two-layer neural network is

$$f_{\bm{\theta}}(\bm{x})=\sum_{j=1}^{m}a_j\sigma(\bm{w}_j\cdot\bm{x}+b_j),$$

where $\sigma$ is the activation function, $\bm{w}_j$ is the input weight, $a_j$ is the output weight, and $b_j$ is the bias term. We denote the set of parameters by

$$\bm{\theta}=(a_1,\dots,a_m,\bm{w}_1,\dots,\bm{w}_m,b_1,\dots,b_m).$$
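A minimal NumPy sketch of the two-layer network above, assuming scalar output ($d_{\rm o}=1$); all names are illustrative:

```python
import numpy as np

def two_layer_nn(x, a, W, b, sigma=np.tanh):
    """f_theta(x) = sum_{j=1}^m a_j * sigma(w_j . x + b_j).

    x: (d,) input; W: (m, d) input weights, row j is w_j;
    a: (m,) output weights; b: (m,) bias terms; sigma: activation.
    """
    return a @ sigma(W @ x + b)
```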
The counting of the layer number excludes the input layer. An $L$-layer neural network is denoted by

$$f_{\bm{\theta}}(\bm{x})=\bm{W}^{[L]}\sigma\circ(\bm{W}^{[L-1]}\sigma\circ(\cdots(\bm{W}^{[1]}\bm{x}+\bm{b}^{[1]})\cdots)+\bm{b}^{[L-1]})+\bm{b}^{[L]},$$

where $\bm{W}^{[l]}\in\mathbb{R}^{m_{l}\times m_{l-1}}$, $\bm{b}^{[l]}\in\mathbb{R}^{m_{l}}$, $m_{0}=d$, $m_{L}=d_{\rm o}$, $\sigma$ is a scalar function, and "$\circ$" means entry-wise operation. We denote the set of parameters by

$$\bm{\theta}=(\bm{W}^{[1]},\dots,\bm{W}^{[L]},\bm{b}^{[1]},\dots,\bm{b}^{[L]}).$$

This can also be defined recursively:

$$f^{[0]}_{\bm{\theta}}(\bm{x})=\bm{x},$$

$$f^{[l]}_{\bm{\theta}}(\bm{x})=\sigma\circ(\bm{W}^{[l]}f^{[l-1]}_{\bm{\theta}}(\bm{x})+\bm{b}^{[l]}),\quad 1\le l\le L-1,$$

$$f_{\bm{\theta}}(\bm{x})=f^{[L]}_{\bm{\theta}}(\bm{x})=\bm{W}^{[L]}f^{[L-1]}_{\bm{\theta}}(\bm{x})+\bm{b}^{[L]}.$$
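The recursive definition translates directly into code; a minimal NumPy sketch (names illustrative):

```python
import numpy as np

def deep_nn(x, Ws, bs, sigma=np.tanh):
    """Recursive form: f^[0] = x; f^[l] = sigma(W^[l] f^[l-1] + b^[l])
    for 1 <= l <= L-1; f = W^[L] f^[L-1] + b^[L] (affine output layer).

    Ws[l-1] is W^[l] with shape (m_l, m_{l-1}); bs[l-1] is b^[l].
    """
    f = x                                # f^[0](x) = x
    for W, b in zip(Ws[:-1], bs[:-1]):   # hidden layers 1 .. L-1
        f = sigma(W @ f + b)             # entry-wise activation
    return Ws[-1] @ f + bs[-1]           # output layer L
```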
The VC-dimension of a hypothesis class $\mathcal{H}$ is denoted as $\mathrm{VCdim}(\mathcal{H})$.

The Rademacher complexity of a hypothesis space $\mathcal{H}$ on a sample set $S$ is denoted by $R(\mathcal{H}\circ S)$ or $\mathrm{Rad}_{S}(\mathcal{H})$. The complexity $\mathrm{Rad}_{S}(\mathcal{H})$ is random because of the randomness of $S$. The expectation of the empirical Rademacher complexity over all samples of size $n$ is denoted by

$$\mathrm{Rad}_{n}(\mathcal{H})=\mathbb{E}_{S}\,\mathrm{Rad}_{S}(\mathcal{H}).$$
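The empirical Rademacher complexity can be estimated by Monte Carlo once the supremum over $\mathcal{H}$ is tractable; the sketch below makes the (strong) simplifying assumption that $\mathcal{H}$ is a finite list of scalar-valued hypotheses.

```python
import numpy as np

def rademacher_estimate(hypotheses, X, num_draws=1000, rng=None):
    """Monte Carlo estimate of
    Rad_S(H) = E_sigma sup_{f in H} (1/n) sum_i sigma_i f(x_i),
    assuming H is a finite list of scalar-valued hypotheses."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(X)
    outputs = np.array([[f(x) for x in X] for f in hypotheses])  # (|H|, n)
    total = 0.0
    for _ in range(num_draws):
        signs = rng.choice([-1.0, 1.0], size=n)   # Rademacher variables
        total += np.max(outputs @ signs) / n      # sup over the finite H
    return total / num_draws
```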
Gradient descent is often denoted by GD; stochastic gradient descent is often denoted by SGD. A batch set is denoted by $B$ and the batch size is denoted by $b$. The learning rate is denoted by $\eta$.
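To make the roles of $B$, $b$, and $\eta$ concrete, one SGD step might look like the sketch below; `grad_L_B` is a hypothetical stand-in for $\nabla_{\bm{\theta}}L_{B}(\bm{\theta})$, not part of the proposal.

```python
import numpy as np

def sample_batch(X, Y, b, rng):
    """Draw a batch set B of size b uniformly from the training set S."""
    idx = rng.choice(len(X), size=b, replace=False)
    return X[idx], Y[idx]

def sgd_step(theta, grad_L_B, batch, eta):
    """One SGD update: theta <- theta - eta * grad_theta L_B(theta)."""
    return theta - eta * grad_L_B(theta, batch)
```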
The discretized frequency is denoted by $\bm{k}$, and the continuous frequency is denoted by $\bm{\xi}$.

The convolution operation is denoted by $*$.
| symbol | meaning | Latex | simplified |
| :---: | :---: | :---: | :---: |
| $\bm{x}$ | input | `\bm{x}` | `\mathbf{x}` |
| $\bm{y}$ | output, label | `\bm{y}` | `\vy` |
| $d$ | input dimension | `d` | |
| $d_{\rm o}$ | output dimension | `d_{\rm o}` | |
| $n$ | number of samples | `n` | |
| $\mathcal{X}$ | instances domain (a set) | `\mathcal{X}` | `\fX` |
| $\mathcal{Y}$ | labels domain (a set) | `\mathcal{Y}` | `\fY` |
| $\mathcal{Z}$ | $=\mathcal{X}\times\mathcal{Y}$ | `\mathcal{Z}` | `\fZ` |
| $\mathcal{H}$ | hypothesis space (a set) | `\mathcal{H}` | `\mathcal{H}` |
| $\bm{\theta}$ | a set of parameters | `\bm{\theta}` | `\mathbf{\theta}` |
| $f_{\bm{\theta}}$ | hypothesis function | `f_{\bm{\theta}}` | `f_{\mathbf{\theta}}` |
| $f$, $f^{*}$ | target function | `f, f^*` | |
| $\ell$ | loss function | `\ell` | |
| $\mathcal{D}$ | distribution of $\mathcal{Z}$ | `\mathcal{D}` | `\fD` |
| $\sigma$ | activation function | `\sigma` | |
| $\bm{w}_j$ | input weight | `\bm{w}_j` | `\mathbf{w}_j` |
| $a_j$ | output weight | `a_j` | |
| $b_j$ | bias term | `b_j` | |
| $f_{\bm{\theta}}$ | neural network | `f_{\bm{\theta}}` | `f_{\mathbf{\theta}}` |
| $B$ | batch set | `B` | |
| $b$ | batch size | `b` | |
| $\eta$ | learning rate | `\eta` | |
| $\bm{k}$ | discretized frequency | `\bm{k}` | `\mathbf{k}` |
| $\bm{\xi}$ | continuous frequency | `\bm{\xi}` | `\mathbf{\xi}` |
| $*$ | convolution operation | `*` | |
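The "simplified" column relies on user-defined shorthand macros. A plausible preamble defining them is sketched below; these definitions are an assumption, since the proposal itself does not fix them.

```latex
% Hypothetical preamble: definitions for the shorthand commands in the
% "simplified" column (the proposal does not specify these).
\usepackage{bm}                 % \bm{...} for bold math symbols
\newcommand{\vy}{\bm{y}}        % label vector
\newcommand{\fX}{\mathcal{X}}   % instances domain
\newcommand{\fY}{\mathcal{Y}}   % labels domain
\newcommand{\fZ}{\mathcal{Z}}   % sample domain, X x Y
\newcommand{\fD}{\mathcal{D}}   % data distribution
```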
Notations specific to the general $L$-layer neural network:

| symbol | meaning | Latex | simplified |
| :---: | :---: | :---: | :---: |
| $d$ | input dimension | `d` | |
| $d_{\rm o}$ | output dimension | `d_{\rm o}` | |
| $m_l$ | the number of neurons in the $l$-th layer, $m_0=d$, $m_L=d_{\rm o}$ | `m_l` | |
| $\bm{W}^{[l]}$ | the weight matrix of the $l$-th layer | `\bm{W}^{[l]}` | `\mathbf{W}^{[l]}` |
| $\bm{b}^{[l]}$ | the bias vector of the $l$-th layer | `\bm{b}^{[l]}` | `\mathbf{b}^{[l]}` |
| $\circ$ | entry-wise operation | `\circ` | |
| $\sigma$ | activation function | `\sigma` | |
| $\bm{\theta}$ | the set of parameters | `\bm{\theta}` | `\mathbf{\theta}` |
Please cite this repository in your publications if it helps your research.
```bibtex
@misc{beijing2020Suggested,
  title        = {Suggested Notation for Machine Learning},
  author       = {Beijing Academy of Artificial Intelligence},
  howpublished = {\url{https://github.com/Mayuyu/suggested-notation-for-machine-learning}},
  year         = {2020}
}
```
Acknowledgements: Chenglong Bao (Tsinghua), Zhengdao Chen (NYU), Bin Dong (Peking), Weinan E (Princeton), Quanquan Gu (UCLA), Kaizhu Huang (XJTLU), Shi Jin (SJTU), Jian Li (Tsinghua), Lei Li (SJTU), Tiejun Li (Peking), Zhenguo Li (Huawei), Zhemin Li (NUDT), Shaobo Lin (XJTU), Ziqi Liu (CSRC), Zichao Long (Peking), Chao Ma (Princeton), Chao Ma (SJTU), Yuheng Ma (WHU), Dengyu Meng (XJTU), Wang Miao (Peking), Pingbing Ming (CAS), Zuoqiang Shi (Tsinghua), Jihong Wang (CSRC), Liwei Wang (Peking), Bican Xia (Peking), Zhouwang Yang (USTC), Haijun Yu (CAS), Yang Yuan (Tsinghua), Cheng Zhang (Peking), Lulu Zhang (SJTU), Jiwei Zhang (WHU), Pingwen Zhang (Peking), Xiaoqun Zhang (SJTU), Chengchao Zhao (CSRC), Zhanxing Zhu (Peking), Chuan Zhou (CAS), Xiang Zhou (CityU).