
Continuous Kernel Convolution


Continuous Kernel Convolution was proposed by researchers at Vrije Universiteit Amsterdam in collaboration with the University of Amsterdam in a paper titled ‘CKConv: Continuous Kernel Convolution For Sequential Data‘. The motivation behind it is to propose a model that combines the properties of convolutional neural networks and recurrent neural networks in order to process long data sequences.

Continuous Kernel Convolution (CKC) is a type of convolution operation used in deep learning for processing continuous input signals such as audio, speech, or sensor data. Unlike the standard discrete convolution operation used for processing digital signals, CKC uses a continuous kernel that can smoothly interpolate between different points in the input signal.

  1. In CKC, the continuous kernel is defined as a function of time, which can be represented using a Gaussian or a polynomial function. The kernel is then convolved with the input signal at each time step, producing an output signal that is also continuous.
  2. One of the advantages of CKC over standard discrete convolution is that it can capture fine-grained temporal information in the input signal, allowing the network to learn more precise patterns and features. CKC can also be used to interpolate and upsample signals, which is useful for tasks such as audio generation or speech synthesis.
  3. However, CKC can be computationally expensive and requires specialized hardware and software for efficient implementation. It also requires careful tuning of the kernel parameters to achieve good performance.

CKC has been used in various applications such as speech recognition, audio synthesis, and sensor data processing. It has also been extended to other types of continuous input signals such as video and 3D point clouds.


Advantages:

  1. Continuous kernel convolution can capture fine-grained temporal information in the input signal, allowing the network to learn more precise patterns and features.
  2. It can be used to interpolate and upsample signals, which is useful for tasks such as audio generation or speech synthesis.
  3. It can handle continuous signals such as audio, speech, or sensor data more effectively than discrete convolution.
  4. CKC can also be used to model variable-length input sequences, making it suitable for tasks such as speech recognition or natural language processing.


Disadvantages:

  1. CKC is computationally expensive and requires specialized hardware and software for efficient implementation.
  2. It requires careful tuning of the kernel parameters to achieve good performance.
  3. CKC is aimed at data with an underlying continuous structure, so it offers less benefit for purely discrete or categorical inputs such as text tokens.
  4. It is a relatively new and less explored technique in deep learning, which means that there is less research and practical experience available for practitioners compared to traditional convolutional neural networks.

Convolution Operation

Let x: \mathbb{R} \rightarrow \mathbb{R}^{N_c} and \psi: \mathbb{R} \rightarrow \mathbb{R}^{N_c} be a vector-valued signal and kernel on \mathbb{R}, such that x = \{x_c\}_{c=1}^{N_c} and \psi = \{\psi_c\}_{c=1}^{N_c}. The convolution operation can be defined as:

(x * \psi)(t) = \sum_{c=1}^{N_c} \int_\mathbb{R} x_c(\tau)\psi_c(t-\tau)d\tau
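To make the integral above concrete, here is a small numerical sketch that approximates the continuous convolution at a single point t with a Riemann sum, for a single channel (N_c = 1). The example functions, grid, and variable names are purely illustrative assumptions, not part of the paper.

Python3

# A numerical sanity check of the continuous convolution above (one channel).
# The example functions x(tau) = exp(-tau^2) and psi(tau) = exp(-|tau|), the grid,
# and the variable names are assumptions for illustration only.
import numpy as np

tau = np.linspace(-10, 10, 4001)              # integration grid over (an interval of) R
d_tau = tau[1] - tau[0]
x = np.exp(-tau**2)                           # signal values x(tau)
t = 1.5                                       # point at which we evaluate (x * psi)(t)

psi_shifted = np.exp(-np.abs(t - tau))        # kernel values psi(t - tau)
conv_at_t = np.sum(x * psi_shifted) * d_tau   # Riemann-sum approximation of the integral
print(conv_at_t)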

However, in practice the input signal x is gathered by sampling. Thus, the sampled input signal and kernel can be defined as

  • Input signal: \chi = {x(\tau )}_{\tau=0}^{N_x}
  • Kernel: \kappa = {\psi(\tau)}_{\tau=0}^{N_x}

and the convolution centered around t is given by:

(\chi * \psi)(t) = \sum_{c=1}^{N_c} \sum_{\tau = 0}^{t}x_c\left ( \tau \right )\psi_c\left ( t-\tau \right )
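The discrete, causal sum above can be written directly in a few lines of NumPy. This is a minimal sketch assuming a (channels × time) array layout and illustrative names; it is not the paper's implementation.

Python3

# A minimal sketch of the causal, channel-summed convolution above.
# Shapes and names (x, psi, N_c, N_x) are illustrative assumptions.
import numpy as np

def causal_conv(x, psi):
    """x: (N_c, N_x) sampled signal, psi: (N_c, N_x) sampled kernel.
    Returns y of shape (N_x,), where y[t] = sum_c sum_{tau<=t} x[c, tau] * psi[c, t - tau]."""
    n_c, n_x = x.shape
    y = np.zeros(n_x)
    for t in range(n_x):
        for tau in range(t + 1):          # causal: only past and current samples contribute
            y[t] += np.sum(x[:, tau] * psi[:, t - tau])
    return y

x = np.random.randn(2, 16)                # N_c = 2 channels, N_x = 16 time steps
psi = np.random.randn(2, 16)
print(causal_conv(x, psi).shape)          # (16,)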

Now, for continuous kernel convolution, we use a convolution kernel ψ that is a continuous function parameterized by a small neural network, MLP^ψ. It takes (t−τ) as input and outputs the value of the convolution kernel at that position, ψ(t−τ). The continuous kernel convolution can be formulated as:

(\chi * \psi)(t) = \sum_{c=1}^{N_c} \sum_{\tau = 0}^{t}x_c\left ( \tau \right )MLP^{\psi_c}\left ( t-\tau \right )
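Below is a hedged PyTorch sketch of this idea: a small MLP (hypothetical class name MLPKernel, with an assumed width and ReLU activation) maps each relative position (t−τ) to one kernel value per channel, and the sampled kernel is then used in the same causal sum as before. The official CKConv kernel network differs in detail (e.g. it uses Sine activations), so treat this only as an illustration of the parameterization.

Python3

# A sketch of a kernel parameterized by a small MLP (names, width, and the ReLU
# activation are assumptions; the official CKConv kernel network differs in detail).
import torch
import torch.nn as nn

class MLPKernel(nn.Module):
    def __init__(self, n_channels, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, n_channels),
        )

    def forward(self, rel_pos):            # rel_pos: (L, 1) relative positions (t - tau)
        return self.net(rel_pos)           # (L, n_channels) kernel values psi(t - tau)

n_c, n_x = 2, 16
x = torch.randn(n_c, n_x)                  # sampled input signal
kernel_net = MLPKernel(n_c)

# Sample the continuous kernel at relative positions 0 .. N_x - 1, then convolve causally.
rel_pos = torch.arange(n_x, dtype=torch.float32).unsqueeze(1)
psi = kernel_net(rel_pos).T                # (n_c, n_x)
y = torch.stack([(x[:, :t + 1] * psi[:, :t + 1].flip(-1)).sum() for t in range(n_x)])
print(y.shape)                             # torch.Size([16])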

If the sampling rate at deployment (sr2) differs from the sampling rate used during training (sr1), then the convolution operation can be approximated in the following way:

(\chi * \psi)(t)_{sr2} \approx  \frac{sr2}{sr1}(\chi * \psi)(t)_{sr1}       
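Continuing the hypothetical kernel_net sketch above, the snippet below shows how the same continuous kernel can simply be re-sampled on a denser grid when the deployment sampling rate sr2 differs from the training rate sr1, with the output rescaled according to the relation above. All rates and names are assumed for illustration.

Python3

# Re-using the same (hypothetical) kernel_net from the previous sketch at a
# different sampling rate; sr1, sr2, and the signal below are assumed values.
sr1, sr2 = 1.0, 2.0                                   # training / deployment sampling rates
n_x2 = int(n_x * sr2 / sr1)                           # more samples at the higher rate
x2 = torch.randn(n_c, n_x2)                           # signal sampled at sr2

# Query the continuous kernel at the new relative positions (in original time units).
rel_pos2 = torch.arange(n_x2, dtype=torch.float32).unsqueeze(1) / (sr2 / sr1)
psi2 = kernel_net(rel_pos2).T                         # (n_c, n_x2)

y2 = torch.stack([(x2[:, :t + 1] * psi2[:, :t + 1].flip(-1)).sum() for t in range(n_x2)])
y2 = (sr1 / sr2) * y2     # undo the sr2/sr1 factor from the relation above, matching the training scale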

Recurrent Unit

For the input sequence \chi = {x(\tau )}_{\tau=0}^{N_x}, the recurrent unit is given by:

h(\tau) = \sigma(Wh(\tau-1) + Ux(\tau))

\tilde{y} (\tau) = softmax (Vh(\tau))

where U, W, and V are the input-to-hidden, hidden-to-hidden, and hidden-to-output connections of the unit, h(τ) and ỹ(τ) denote the hidden representation and the output at time-step τ, and σ represents a point-wise non-linearity.
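As a reference, here is a minimal NumPy sketch of this recurrent unit; the dimensions, the random weights, and the choice of tanh for σ are illustrative assumptions.

Python3

# A minimal sketch of the recurrent unit above. Dimensions, random weights, and
# the use of tanh as the point-wise non-linearity sigma are assumptions.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d_in, d_h, d_out, T = 3, 5, 4, 10
rng = np.random.default_rng(0)
U = rng.normal(size=(d_h, d_in))        # input-to-hidden
W = rng.normal(size=(d_h, d_h))         # hidden-to-hidden
V = rng.normal(size=(d_out, d_h))       # hidden-to-output
x = rng.normal(size=(T, d_in))          # input sequence

h = np.zeros(d_h)                       # h(-1), the initial hidden state
for tau in range(T):
    h = np.tanh(W @ h + U @ x[tau])     # h(tau) = sigma(W h(tau-1) + U x(tau))
    y_tilde = softmax(V @ h)            # y~(tau) = softmax(V h(tau))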

Now, we unroll the recurrence relation above for t steps:

h(t) = W^{t+1}h(-1) + \sum_{\tau =0 } ^{t} W^{\tau} U x(t - \tau)

where h(−1) is the initial state of the hidden representation. h(t) can also be represented in following way: 

x =[x(0), x(1), x(2), ....., x(t-1), x(t)] \\ \psi =[U, WU, ...., W^{t-1}U, W^{t}U] \\ h(t) = \sum_{\tau =0 }^{t}x(\tau)\psi(t-\tau) = \sum_{\tau =0 }^{t}x(t - \tau)\psi(\tau)

The above equation provides us with the following conclusions:

  • The vanishing and exploding gradient problems in RNNs are caused by the input x(t−τ), τ steps back in the past, being multiplied by an effective convolution weight ψ(τ) = W^{τ}U.
  • A linear recurrent unit can be viewed as the convolution of the input with an exponential convolution kernel.
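The convolutional view of the linear recurrence can be checked numerically. The sketch below (with assumed small dimensions and no non-linearity) unrolls h(t) = W h(t−1) + U x(t) from h(−1) = 0 and verifies that it equals the convolution with the kernel ψ(τ) = W^{τ}U.

Python3

# A small numerical check (assumed dimensions) that unrolling a *linear* recurrence
# h(t) = W h(t-1) + U x(t), with h(-1) = 0, matches the convolutional form
# h(t) = sum_{tau=0}^{t} W^tau U x(t - tau), i.e. psi(tau) = W^tau U.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h, T = 3, 5, 8
U = rng.normal(size=(d_h, d_in))
W = rng.normal(size=(d_h, d_h)) * 0.5     # scaled down to keep the powers of W well-behaved
x = rng.normal(size=(T, d_in))

# Unrolled linear recurrence
h = np.zeros(d_h)
for tau in range(T):
    h = W @ h + U @ x[tau]

# Convolutional form with kernel psi(tau) = W^tau U
h_conv = sum(np.linalg.matrix_power(W, tau) @ U @ x[T - 1 - tau] for tau in range(T))

print(np.allclose(h, h_conv))             # True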

MLP Continuous Kernel

Let {\Delta \tau_i = (t - \tau_i)}_{i=0}^{N} be a sequence of relative positions. The convolution kernel MLP^{\psi} is parameterized by a conventional L-layer neural network:

h^{(1)}(\Delta \tau_i ) = \sigma\left ( W^{(1)} \Delta \tau_i +  b^{(1)} \right ) \\ h^{(l)}(\Delta \tau_i ) = \sigma\left ( W^{(l)} h^{(l-1)}(\Delta \tau_i) +  b^{(l)} \right ) \\ \psi(\Delta \tau_i) = W^{(L)} h^{(L-1)} (\Delta \tau_i) + b^{(L)}

where \sigma is a point-wise non-linearity such as ReLU.
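A hedged sketch of such an L-layer kernel network is shown below: relative positions Δτ_i go in and kernel values ψ(Δτ_i) come out, with no non-linearity on the last layer, as in the equations above. The depth, width, and helper name make_kernel_mlp are assumptions for illustration.

Python3

# A sketch of the L-layer kernel MLP described above (depth, width, and the
# helper name make_kernel_mlp are assumptions; ReLU stands in for sigma).
import torch
import torch.nn as nn

def make_kernel_mlp(n_channels, hidden=32, n_layers=3):
    layers = [nn.Linear(1, hidden), nn.ReLU()]               # h^(1)
    for _ in range(n_layers - 2):
        layers += [nn.Linear(hidden, hidden), nn.ReLU()]     # h^(l), l = 2 .. L-1
    layers += [nn.Linear(hidden, n_channels)]                # psi = W^(L) h^(L-1) + b^(L), no sigma
    return nn.Sequential(*layers)

mlp_psi = make_kernel_mlp(n_channels=2)
delta_tau = torch.linspace(0.0, 1.0, steps=16).unsqueeze(1)  # relative positions Delta tau_i
psi = mlp_psi(delta_tau)                                     # (16, 2) kernel values psi(Delta tau_i)
print(psi.shape)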

Implementation

  • In this implementation, we will train the CKConv model on the sMNIST dataset. For this, we will use Google Colaboratory.

Python3

# First, we need to clone the ckconv repository from GitHub
! git clone https://github.com/dwromero/ckconv
 
# Now, change the working directory to the ckconv directory
%cd ckconv
 
# Before actually training the model,
# we first need to install the required modules and libraries
! pip install -r requirements.txt
 
# If the above command fails, make sure the following modules are
# installed using the command below
! pip install ml-collections torchaudio mkl-random sktime wandb
 
# Now, to train the model on the sMNIST dataset, run the following command
! python run_experiment.py --config.batch_size=64 --config.clip=0 \
--config.dataset=MNIST --config.device=cuda --config.dropout=0.1 \
--config.dropout_in=0.1 --config.epochs=200 --config.kernelnet_activation_function=Sine \
--config.kernelnet_no_hidden=32 --config.kernelnet_norm_type=LayerNorm \
--config.kernelnet_omega_0=31.09195739463897 --config.lr=0.001 --config.model=CKCNN \
--config.no_blocks=2 --config.no_hidden=30 --config.optimizer=Adam \
--config.permuted=False --config.sched_decay_factor=5 --config.sched_patience=20 \
--config.scheduler=plateau
  • Below are the results of training the CKCNN on the sMNIST data:

CkConv training Result

Conclusion

  • CKConv is able to model very complex and non-linear functions easily.
  • Contrary to RNNs, CKConvs do not rely on any form of recurrence to handle large memory horizons, and they can model global long-term dependencies.
  • CKCNNs do not make use of Back-Propagation Through Time (BPTT). Consequently, CKCNNs can be trained in parallel.
  • CKCNNs can also be deployed at resolutions other than the one on which they are trained.

