K-SVD

Dictionary learning algorithm

In applied mathematics, k-SVD is a dictionary learning algorithm for creating a dictionary for sparse representations, via a singular value decomposition approach. k-SVD is a generalization of the k-means clustering method, and it works by iteratively alternating between sparse coding the input data based on the current dictionary and updating the atoms in the dictionary to better fit the data. It is structurally related to the expectation–maximization (EM) algorithm.[1][2] k-SVD is widely used in applications such as image processing, audio processing, biology, and document analysis.

k-SVD algorithm

k-SVD is a generalization of k-means, as follows. k-means clustering can itself be regarded as a method of sparse representation: it finds the best possible codebook to represent the data samples $\{y_i\}_{i=1}^{M}$ by nearest neighbor, by solving

$$\min_{D,X}\ \|Y - DX\|_F^2 \qquad \text{subject to}\quad \forall i,\ x_i = e_k \text{ for some } k,$$

which is nearly equivalent to

$$\min_{D,X}\ \|Y - DX\|_F^2 \qquad \text{subject to}\quad \forall i,\ \|x_i\|_0 = 1,$$

which is a variant of k-means that allows "weights", since the single nonzero coefficient is no longer forced to equal 1.

Here $\|\cdot\|_F$ denotes the Frobenius norm. The sparse representation constraint $x_i = e_k$ forces the k-means algorithm to use only one atom (column) of the dictionary $D$ for each sample. To relax this constraint, the k-SVD algorithm instead represents each signal as a linear combination of atoms in $D$.
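To see the connection concretely, here is a small, purely illustrative numpy snippet (random data, our own variable names, not from any reference implementation): the nearest-codeword assignment of k-means produces exactly the one-hot columns $x_i = e_k$, and $\|Y - DX\|_F^2$ is then the usual k-means objective.

```python
import numpy as np

# Illustrative only: random data, hypothetical variable names.
rng = np.random.default_rng(0)
Y = rng.normal(size=(2, 5))     # 5 two-dimensional samples (columns)
D = rng.normal(size=(2, 3))     # codebook of 3 codewords (columns)

# Nearest-codeword assignment: for each sample i, pick k minimizing ||y_i - d_k||_2.
dists = np.linalg.norm(Y[:, :, None] - D[:, None, :], axis=0)   # shape (5, 3)
assignment = np.argmin(dists, axis=1)

# Encode the assignment as one-hot columns x_i = e_k.
X = np.zeros((3, 5))
X[assignment, np.arange(5)] = 1.0

# ||Y - D X||_F^2 is then the sum of squared distances to the assigned
# codewords, i.e. the k-means objective for this codebook.
print(np.linalg.norm(Y - D @ X, "fro") ** 2)
```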

The k-SVD algorithm follows the construction flow of the k-means algorithm. However, in contrast to k-means, in order to achieve a linear combination of atoms in $D$, the sparsity constraint is relaxed so that each column $x_i$ may have more than one nonzero entry, as long as it has no more than $T_0$ of them.

So, the objective function becomes

$$\min_{D,X}\ \|Y - DX\|_F^2 \qquad \text{subject to}\quad \forall i,\ \|x_i\|_0 \le T_0,$$

or, in an alternative form,

$$\min_{D,X}\ \sum_i \|x_i\|_0 \qquad \text{subject to}\quad \|Y - DX\|_F^2 \le \epsilon.$$

In the k-SVD algorithm, $D$ is first held fixed and the best coefficient matrix $X$ is sought. Since finding the truly optimal $X$ is computationally hard, an approximation pursuit method is used instead. Any algorithm, such as orthogonal matching pursuit (OMP), can be used to compute the coefficients, as long as it can supply a solution with a fixed and predetermined number of nonzero entries $T_0$.
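
As a concrete illustration of this stage, the following minimal numpy sketch implements a plain greedy OMP and applies it column by column. The function names (`omp`, `sparse_coding_stage`) are ours, not from any particular library, and the atoms of $D$ are assumed to be $\ell_2$-normalized.

```python
import numpy as np

def omp(D, y, T0):
    """Greedy orthogonal matching pursuit (sketch): approximately solve
    min_x ||y - D x||_2  subject to  ||x||_0 <= T0.
    Assumes the atoms (columns of D) are l2-normalized."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(T0):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit the coefficients on the current support by least squares
        # (this is the "orthogonal" part of OMP).
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coeffs
        residual = y - D @ x
    return x

def sparse_coding_stage(D, Y, T0):
    """Sparse-code every column of Y against the fixed dictionary D."""
    return np.column_stack([omp(D, Y[:, i], T0) for i in range(Y.shape[1])])
```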

After the sparse coding step, the next task is to search for a better dictionary $D$. However, finding the whole dictionary at once is intractable, so the process updates only one column of $D$ at a time, while keeping $X$ fixed. The update of the $k$-th column is done by rewriting the penalty term as

$$\|Y - DX\|_F^2 = \left\|Y - \sum_{j=1}^{K} d_j x_j^{\mathrm{T}}\right\|_F^2 = \left\|\left(Y - \sum_{j \neq k} d_j x_j^{\mathrm{T}}\right) - d_k x_k^{\mathrm{T}}\right\|_F^2 = \left\|E_k - d_k x_k^{\mathrm{T}}\right\|_F^2,$$

where $x_k^{\mathrm{T}}$ denotes the $k$-th row of $X$.

By decomposing the product $DX$ into a sum of $K$ rank-1 matrices, the other $K-1$ terms can be treated as fixed while the $k$-th remains unknown. The minimization problem can then be solved by approximating the $E_k$ term with a rank-1 matrix obtained from its singular value decomposition and updating $d_k$ accordingly. However, the new solution for the row vector $x_k^{\mathrm{T}}$ obtained this way is not guaranteed to be sparse.

To cure this problem, define $\omega_k$ as

$$\omega_k = \{\, i \mid 1 \le i \le N,\ x_k^{\mathrm{T}}(i) \neq 0 \,\},$$

which points to the examples $\{y_i\}_{i=1}^{N}$ that use the atom $d_k$ (equivalently, the indices of the nonzero entries of $x_k^{\mathrm{T}}$). Then, define $\Omega_k$ as a matrix of size $N \times |\omega_k|$, with ones on the $(\omega_k(i), i)$-th entries and zeros elsewhere. The multiplication $\tilde{x}_k^{\mathrm{T}} = x_k^{\mathrm{T}} \Omega_k$ shrinks the row vector $x_k^{\mathrm{T}}$ by discarding its zero entries. Similarly, $\tilde{Y}_k = Y \Omega_k$ is the subset of the examples that currently use the atom $d_k$, and $\tilde{E}_k = E_k \Omega_k$ has the same effect on $E_k$.
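
A quick numpy sanity check (random data, our own variable names) shows that multiplying by $\Omega_k$ is simply column selection over the index set $\omega_k$.

```python
import numpy as np

# Illustrative check: multiplying by the selection matrix Omega_k is the same
# as slicing out the columns indexed by omega_k.
rng = np.random.default_rng(1)
N, K = 6, 4
X = np.zeros((K, N))
X[2, [0, 3, 5]] = rng.normal(size=3)        # row x_2^T is nonzero on samples 0, 3, 5
E_k = rng.normal(size=(5, N))               # some error matrix with N columns

omega = np.nonzero(X[2, :])[0]              # omega_k = [0, 3, 5]
Omega = np.zeros((N, omega.size))
Omega[omega, np.arange(omega.size)] = 1.0   # ones at the (omega_k(i), i)-th entries

assert np.allclose(E_k @ Omega, E_k[:, omega])     # E~_k is just a column subset
assert np.allclose(X[2, :] @ Omega, X[2, omega])   # x~_k^T keeps only the nonzeros
```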

So the minimization problem mentioned above becomes

$$\left\|E_k \Omega_k - d_k x_k^{\mathrm{T}} \Omega_k\right\|_F^2 = \left\|\tilde{E}_k - d_k \tilde{x}_k^{\mathrm{T}}\right\|_F^2$$

and can be solved directly using the SVD. The SVD decomposes $\tilde{E}_k$ into $U \Delta V^{\mathrm{T}}$. The solution for $d_k$ is the first column of $U$, and the coefficient vector $\tilde{x}_k^{\mathrm{T}}$ is the first column of $V$ multiplied by $\Delta(1,1)$. After updating the whole dictionary, the process returns to the sparse-coding stage, and the two stages are repeated iteratively.
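
Putting the atom update together, the following numpy sketch performs one dictionary-update sweep as described above. The function name `ksvd_dictionary_update` is ours, skipping atoms that no sample currently uses is a common practical choice rather than part of the derivation, and slicing with the index set $\omega_k$ plays the role of multiplying by $\Omega_k$.

```python
import numpy as np

def ksvd_dictionary_update(D, X, Y):
    """One K-SVD dictionary-update sweep (sketch): for each atom d_k, restrict
    attention to the samples that currently use it and replace (d_k, x_k^T)
    by the best rank-1 approximation of the restricted error matrix E_k."""
    n_atoms = D.shape[1]
    for k in range(n_atoms):
        omega = np.nonzero(X[k, :])[0]       # samples whose code uses atom k
        if omega.size == 0:
            continue                         # unused atom: skip (a common practical choice)
        # Error without atom k's contribution, restricted to the columns in omega.
        E_k = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
        # Best rank-1 approximation via the SVD: E_k ~= s[0] * u_0 v_0^T.
        U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
        D[:, k] = U[:, 0]                    # updated atom (first left singular vector, unit norm)
        X[k, omega] = s[0] * Vt[0, :]        # updated nonzero coefficients
    return D, X
```

A full K-SVD iteration then alternates the sparse-coding stage (for example, the `sparse_coding_stage` sketch above) with this dictionary-update sweep until the representation error stops decreasing.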

Limitations

Choosing an appropriate "dictionary" for a dataset is a non-convex problem, and k-SVD operates by an iterative update that is not guaranteed to find the global optimum.[2] However, this is common to other algorithms for this purpose, and k-SVD works fairly well in practice.[2][better source needed]


References

  1. Michal Aharon; Michael Elad; Alfred Bruckstein (2006), "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation", IEEE Transactions on Signal Processing, 54 (11): 4311–4322, Bibcode:2006ITSP...54.4311A, doi:10.1109/TSP.2006.881199, S2CID 7477309
  2. Rubinstein, R.; Bruckstein, A. M.; Elad, M. (2010), "Dictionaries for Sparse Representation Modeling", Proceedings of the IEEE, 98 (6): 1045–1057, CiteSeerX 10.1.1.160.527, doi:10.1109/JPROC.2010.2040551, S2CID 2176046