Continuous Kernel Convolution

  • Last Updated : 14 Aug, 2021

Continuous Kernel Convolution (CKConv) was proposed by researchers at Vrije Universiteit Amsterdam in collaboration with the University of Amsterdam in a paper titled ‘CKConv: Continuous Kernel Convolution For Sequential Data‘. The motivation is to build a model that combines the strengths of convolutional neural networks and recurrent neural networks in order to process long sequential data.

Convolution Operation

Let x : ℝ → ℝ^{N_c} and ψ : ℝ → ℝ^{N_c} be a vector-valued signal and kernel on ℝ, such that x = {x_c}_{c=1}^{N_c} and ψ = {ψ_c}_{c=1}^{N_c}. The convolution operation can be defined as:


(x * \psi)(t) = \sum_{c=1}^{N_c} \int_\mathbb{R} x_c(\tau)\psi_c(t-\tau)d\tau

However, in practice the input signal x is gathered by sampling. Thus, the sampled input signal and the sampled convolution kernel can be defined as:

  • Input signal: \chi = \{x(\tau)\}_{\tau=0}^{N_x}
  • Convolution kernel: \kappa = \{\psi(\tau)\}_{\tau=0}^{N_x}

and the causal convolution, in which the output at t depends only on inputs at positions τ ≤ t, is given by:

(\chi * \psi)(t) = \sum_{c=1}^{N_c} \sum_{\tau = 0}^{t}x_c\left ( \tau \right )\psi_c\left ( t-\tau \right )
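The double sum above can be written out directly as nested loops. The following is a minimal sketch in plain Python, not the CKConv implementation; the function name `causal_conv` and the example values are chosen here for illustration only.

```python
def causal_conv(x, psi):
    """Causal discrete convolution of a multi-channel signal.

    x   : list of N_c channels, each a list of samples x_c(tau)
    psi : list of N_c channels, each a list of kernel values psi_c(tau)
    Returns y with y[t] = sum_c sum_{tau=0..t} x_c(tau) * psi_c(t - tau).
    """
    n_steps = len(x[0])
    y = []
    for t in range(n_steps):
        total = 0.0
        for x_c, psi_c in zip(x, psi):      # sum over channels c
            for tau in range(t + 1):        # sum over past positions tau <= t
                total += x_c[tau] * psi_c[t - tau]
        y.append(total)
    return y


# Illustrative values: one channel, three time steps.
x = [[1.0, 2.0, 3.0]]
psi = [[1.0, 0.5, 0.25]]
y = causal_conv(x, psi)
print(y)  # [1.0, 2.5, 4.25]
```

Note that the kernel here is a fixed list with one value per position, so it cannot be applied to a longer sequence without adding more weights.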

Now, for continuous kernel convolution, the convolution kernel ψ is a continuous function parameterized by a small neural network called MLP^ψ. It takes the relative position (t−τ) as input and outputs the value of the convolution kernel at that position, ψ(t−τ). The continuous kernel convolution can be formulated as:

(\chi * \psi)(t) = \sum_{c=1}^{N_c} \sum_{\tau = 0}^{t}x_c\left ( \tau \right )MLP^{\psi_c}\left ( t-\tau \right )
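To illustrate why a kernel defined as a function of the relative position is useful, the sketch below replaces the stored kernel values with a callable that is evaluated on demand. `toy_kernel` is a hypothetical stand-in for MLP^ψ (a fixed damped cosine rather than a trained network), and `ck_conv` is an illustrative name, not a function from the ckconv repository.

```python
import math


def toy_kernel(delta):
    """Hypothetical stand-in for MLP^psi: relative position -> kernel value."""
    return math.exp(-delta) * math.cos(2.0 * math.pi * delta)


def ck_conv(x, kernel_fn, step=1.0):
    """Single-channel causal convolution with a kernel evaluated on demand.

    y[t] = sum_{tau=0..t} x[tau] * kernel_fn((t - tau) * step)
    """
    return [
        sum(x[tau] * kernel_fn((t - tau) * step) for tau in range(t + 1))
        for t in range(len(x))
    ]


# The same kernel function serves sequences of any length:
# a unit impulse as input simply reads out the kernel at each position.
short = ck_conv([1.0, 0.0, 0.0], toy_kernel)
long = ck_conv([1.0] + [0.0] * 9, toy_kernel)
print(len(short), len(long))
```

Because the kernel is generated from (t−τ) instead of being stored per position, the number of parameters does not grow with the length of the sequence.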

If the sampling rate at inference time (sr2) differs from the sampling rate used during training (sr1), the convolution output can be rescaled as follows:

(\chi * \psi)(t)_{sr2} \approx  \frac{sr2}{sr1}(\chi * \psi)(t)_{sr1}     
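The approximation can be checked numerically: a sampled convolution is a sum over sr samples per unit of time, so its magnitude grows in proportion to the sampling rate. The sketch below assumes a smooth input and a smooth kernel, both hypothetical choices made only for this check.

```python
import math


def signal(t):
    return math.sin(2.0 * math.pi * t)   # hypothetical smooth input x(t)


def kernel(d):
    return math.exp(-3.0 * d)            # hypothetical smooth kernel psi(d)


def sampled_conv(t_end, sr):
    """sum_k x(tau_k) * psi(t_end - tau_k), with tau_k = k / sr and tau_k <= t_end."""
    n = int(round(t_end * sr))
    return sum(signal(k / sr) * kernel(t_end - k / sr) for k in range(n + 1))


sr1, sr2 = 100, 400
y1 = sampled_conv(1.0, sr1)        # convolution at the "training" sampling rate
y2 = sampled_conv(1.0, sr2)        # convolution at a different sampling rate
rescaled = (sr2 / sr1) * y1        # right-hand side of the approximation
print(y2, rescaled)
```

The two printed values should agree closely, and the gap narrows as both sampling rates increase.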

Recurrent Unit

For the input sequence \chi = \{x(\tau)\}_{\tau=0}^{N_x}, the recurrent unit is given by:

h(\tau) = \sigma(Wh(\tau-1) + Ux(\tau))

\tilde{y} (\tau) = softmax (Vh(\tau))

where U, W, V are the input-to-hidden, hidden-to-hidden and hidden-to-output connections of the unit, h(τ) and ỹ(τ) denote the hidden representation and the output at time-step τ, and σ is a point-wise non-linearity.
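As a concrete reading of these two equations, the following sketch runs a recurrent unit over a short sequence using plain Python lists. It assumes tanh as the non-linearity σ; the matrix sizes and values are hypothetical and chosen only to keep the example small.

```python
import math


def matvec(m, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    shift = max(z)                                  # subtract max for stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def recurrent_unit(xs, U, W, V, h_init):
    """h(tau) = tanh(W h(tau-1) + U x(tau)),  y(tau) = softmax(V h(tau))."""
    h = h_init
    outputs = []
    for x in xs:                                    # strictly sequential over time
        pre = [a + b for a, b in zip(matvec(W, h), matvec(U, x))]
        h = [math.tanh(p) for p in pre]
        outputs.append(softmax(matvec(V, h)))
    return h, outputs


# Hypothetical sizes: 1 input feature, 2 hidden units, 3 output classes.
U = [[0.5], [-0.3]]
W = [[0.1, 0.2], [0.0, 0.4]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
xs = [[1.0], [0.5], [-1.0]]
h_last, ys = recurrent_unit(xs, U, W, V, h_init=[0.0, 0.0])
print(ys[-1])
```

Each step needs the hidden state of the previous step, which is why the loop over time cannot be parallelized.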

Now, assuming a linear recurrent unit (σ is the identity), we unroll the above recurrence for t steps:

h(t) = W^{t+1}h(-1) + \sum_{\tau =0 } ^{t} W^{\tau} U x(t - \tau)

where h(−1) is the initial state of the hidden representation. Assuming h(−1) = 0, h(t) can also be written as a convolution:

x =[x(0), x(1), x(2), \ldots, x(t-1), x(t)] \\ \psi =[U, WU, \ldots, W^{t-1}U, W^{t}U] \\ h(t) = \sum_{\tau =0 }^{t}x(\tau)\psi(t-\tau) = \sum_{\tau =0 }^{t}x(t - \tau)\psi(\tau)

The above equation leads to the following conclusions:

  • The vanishing and exploding gradient problems in RNNs arise because the input x(t−τ), τ steps back in the past, is multiplied by an effective convolution weight ψ(τ) = W^τ U, which shrinks or grows exponentially with τ depending on the eigenvalues of W.
  • A linear recurrent unit can be described as a convolution of the input with an exponentially decaying or growing kernel.
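Both points can be checked in the scalar case, where W and U reduce to numbers w and u. The sketch below assumes h(−1) = 0 and uses illustrative values; it compares the step-by-step recurrence with the convolution against ψ(τ) = w^τ u, then evaluates the effective weight on an input 50 steps in the past.

```python
def linear_recurrence(xs, w, u):
    """h(t) = w * h(t-1) + u * x(t), with h(-1) = 0 (scalar case)."""
    h, hs = 0.0, []
    for x in xs:
        h = w * h + u * x
        hs.append(h)
    return hs


def conv_with_exponential_kernel(xs, w, u):
    """h(t) = sum_{tau=0..t} x(t - tau) * psi(tau), with psi(tau) = w**tau * u."""
    psi = [w ** tau * u for tau in range(len(xs))]
    return [
        sum(xs[t - tau] * psi[tau] for tau in range(t + 1))
        for t in range(len(xs))
    ]


xs = [1.0, -2.0, 0.5, 3.0, 1.5]
rec = linear_recurrence(xs, w=0.9, u=0.7)
conv = conv_with_exponential_kernel(xs, w=0.9, u=0.7)
print(max(abs(a - b) for a, b in zip(rec, conv)))   # difference is at rounding level

# Effective weight psi(50) on an input 50 steps in the past.
vanishing = 0.5 ** 50 * 0.7   # |w| < 1: the contribution decays towards zero
exploding = 1.5 ** 50 * 0.7   # |w| > 1: the contribution grows very large
print(vanishing, exploding)
```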

MLP Continuous Kernel

Let {Δτ_i = (t − τ_i)}_{i=0}^{N} be a sequence of relative positions. The convolution kernel MLP^ψ is parameterized by a conventional L-layer neural network: 


h^{(1)}(\Delta \tau_i) = \sigma\left( W^{(1)} \Delta \tau_i + b^{(1)} \right) \\ h^{(l)}(\Delta \tau_i) = \sigma\left( W^{(l)} h^{(l-1)}(\Delta \tau_i) + b^{(l)} \right) \\ \psi(\Delta \tau_i) = W^{(L)} h^{(L-1)}(\Delta \tau_i) + b^{(L)}


where \sigma is a point-wise non-linearity such as ReLU.
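The layer equations can be sketched as a small forward pass that maps one relative position to one kernel value per channel. This is an untrained, illustrative network in plain Python: the layer sizes, the random initialization and the helper names (`init_layer`, `mlp_kernel`) are assumptions made for this example and are not taken from the ckconv repository.

```python
import random


def relu(v):
    return [max(0.0, a) for a in v]


def affine(W, b, v):
    """Compute W v + b for a matrix W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]


def init_layer(n_out, n_in, rng):
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return W, b


def mlp_kernel(delta, layers):
    """Evaluate psi(delta): hidden layers use ReLU, the last layer is linear."""
    h = [delta]                          # the scalar relative position is the input
    for W, b in layers[:-1]:
        h = relu(affine(W, b, h))
    W, b = layers[-1]
    return affine(W, b, h)               # one kernel value per channel


rng = random.Random(0)
n_hidden, n_channels = 8, 2              # hypothetical sizes
layers = [
    init_layer(n_hidden, 1, rng),
    init_layer(n_hidden, n_hidden, rng),
    init_layer(n_channels, n_hidden, rng),
]
positions = [i / 4 for i in range(5)]    # relative positions 0, 0.25, ..., 1
kernel = [mlp_kernel(d, layers) for d in positions]
print(kernel[0])
```

The kernel can be evaluated at any set of positions, including positions that were never sampled during training.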


  • In this implementation, we train the CKConv model on the sMNIST (sequential MNIST) dataset using Colaboratory, which is provided by Google.


# First, clone the ckconv repository from GitHub
! git clone
# Change the working directory to the ckconv directory
%cd ckconv
# Before training the model, install the required modules and libraries
! pip install -r requirements.txt
# If the above command fails, install the following modules directly
! pip install ml-collections torchaudio mkl-random sktime wandb
# Now, to train the model on the sMNIST dataset, run the following command
! python --config.batch_size=64 --config.clip=0 \
--config.dataset=MNIST --config.device=cuda --config.dropout=0.1 \
--config.dropout_in=0.1 --config.epochs=200 --config.kernelnet_activation_function=Sine \
--config.kernelnet_no_hidden=32 --config.kernelnet_norm_type=LayerNorm \
--config.kernelnet_omega_0=31.09195739463897 --config.model=CKCNN \
--config.no_blocks=2 --config.no_hidden=30 --config.optimizer=Adam \
--config.permuted=False --config.sched_decay_factor=5  --config.sched_patience=20 \
  • Below are the results of the above training of CKCNN on sMNIST data:

[Figure: CKConv training result]


  • CKConv is able to model very complex, non-linear kernel functions.
  • Contrary to RNNs, CKConvs do not rely on any form of recurrence to consider large memory horizons, and they can model global long-term dependencies.
  • CKCNNs do not make use of Back-Propagation Through Time (BPTT). Consequently, CKCNNs can be trained in parallel.
  • CKCNNs can also be deployed at resolutions other than the resolution at which they were trained.

