Continuous Kernel Convolution

  • Last Updated : 14 Aug, 2021

Continuous Kernel Convolution was proposed by researchers at the Vrije Universiteit Amsterdam in collaboration with the University of Amsterdam in a paper titled ‘CKConv: Continuous Kernel Convolution For Sequential Data‘. The motivation behind it is to propose a model that combines the properties of convolutional neural networks and recurrent neural networks in order to process long sequences of data.

Convolution Operation

Let x : \mathbb{R} \rightarrow \mathbb{R}^{N_c} and \psi : \mathbb{R} \rightarrow \mathbb{R}^{N_c} be a vector-valued signal and kernel on \mathbb{R}, such that x = \{x_c\}_{c=1}^{N_c} and \psi = \{\psi_c\}_{c=1}^{N_c}. The convolution operation can be defined as:

(x * \psi)(t) = \sum_{c=1}^{N_c} \int_\mathbb{R} x_c(\tau)\psi_c(t-\tau)d\tau

However, in practice the input signal x is gathered by sampling. Thus, the sampled input signal and convolution kernel can be defined as

  • Input signal: \chi = \{x(\tau)\}_{\tau=0}^{N_x}
  • Convolutional kernel: \kappa = \{\psi(\tau)\}_{\tau=0}^{N_x}

and the discretized convolution, centered at t, is given by:

(\chi * \psi)(t) = \sum_{c=1}^{N_c} \sum_{\tau = 0}^{t}x_c\left ( \tau \right )\psi_c\left ( t-\tau \right )
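
To make the discretized convolution concrete, here is a minimal NumPy sketch of the sum above (the variable names and toy data are purely illustrative):

import numpy as np

# (chi * psi)(t) = sum_c sum_{tau=0..t} x_c(tau) * psi_c(t - tau)
def discrete_conv(x, psi, t):
    # x, psi: arrays of shape (N_c, N_x) holding the sampled signal and kernel
    n_channels = x.shape[0]
    out = 0.0
    for c in range(n_channels):
        for tau in range(t + 1):
            out += x[c, tau] * psi[c, t - tau]
    return out

x = np.random.randn(2, 10)    # N_c = 2 channels, N_x = 10 samples
psi = np.random.randn(2, 10)  # kernel sampled at the same positions
print(discrete_conv(x, psi, t=5))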

Now, for continuous kernel convolution, we use a convolution kernel ψ that is a continuous function parameterized by a small neural network called MLP^ψ. It takes (t − τ) as input and outputs the value of the convolution kernel at that position, ψ(t − τ). The continuous kernel convolution can be formulated as:

(\chi * \psi)(t) = \sum_{c=1}^{N_c} \sum_{\tau = 0}^{t}x_c\left ( \tau \right )MLP^{\psi_c}\left ( t-\tau \right )
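
A minimal PyTorch sketch of this idea is shown below; a small fully-connected network stands in for MLP^ψ, and the class and parameter names (CKConv1d, hidden, and so on) are illustrative rather than taken from the paper's code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CKConv1d(nn.Module):
    # Continuous kernel convolution: an MLP maps relative positions (t - tau)
    # to kernel values, which are then used in an ordinary causal 1D convolution.
    def __init__(self, in_channels, out_channels, hidden=32):
        super().__init__()
        self.in_channels, self.out_channels = in_channels, out_channels
        # MLP^psi : R -> R^(out_channels * in_channels)
        self.kernel_net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_channels * in_channels),
        )

    def forward(self, x):
        # x: (batch, in_channels, length)
        length = x.shape[-1]
        # relative positions (t - tau), normalized to [-1, 1]
        rel_pos = torch.linspace(-1.0, 1.0, length).unsqueeze(-1)      # (length, 1)
        kernel = self.kernel_net(rel_pos)                              # (length, out*in)
        kernel = kernel.t().reshape(self.out_channels, self.in_channels, length)
        x = F.pad(x, (length - 1, 0))    # causal padding: position t only sees tau <= t
        return F.conv1d(x, kernel)

layer = CKConv1d(in_channels=2, out_channels=4)
out = layer(torch.randn(8, 2, 100))      # -> (8, 4, 100)

Because the kernel is generated by a function of continuous positions, it can be evaluated at as many positions as the input has samples, so the same layer can handle sequences of arbitrary length.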

If the sampling rate at inference time (sr2) differs from the sampling rate used during training (sr1), then the convolution operation can be approximated as follows:

(\chi * \psi)(t)_{sr2} \approx  \frac{sr2}{sr1}(\chi * \psi)(t)_{sr1}     
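
The following toy NumPy sketch illustrates this relation; the continuous signal and kernel are arbitrary functions chosen only for the demonstration:

import numpy as np

signal = lambda t: np.sin(t)       # toy continuous signal x
kernel = lambda t: np.exp(-t)      # toy continuous kernel psi

def sampled_conv(t, sr):
    # discrete convolution of the signal and kernel sampled at rate sr
    taus = np.arange(0.0, t, 1.0 / sr)
    return np.sum(signal(taus) * kernel(t - taus))

sr1, sr2, t = 10, 40, 5.0
# (chi * psi)(t) at sr2 is approximately (sr2 / sr1) times the value at sr1
print(sampled_conv(t, sr2), (sr2 / sr1) * sampled_conv(t, sr1))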

Recurrent Unit

For the input sequence \chi = \{x(\tau)\}_{\tau=0}^{N_x}, the recurrent unit is given by:

h(\tau) = \sigma(W h(\tau-1) + U x(\tau))

\tilde{y}(\tau) = softmax(V h(\tau))

where U, W, and V are the input-to-hidden, hidden-to-hidden, and hidden-to-output connections of the unit, h(τ) and ỹ(τ) denote the hidden representation and the output at time-step τ, and σ represents a point-wise non-linearity.
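
A minimal NumPy sketch of this recurrence, using tanh as the point-wise non-linearity σ and random weights purely for illustration:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d_in, d_hidden, d_out = 2, 4, 3
rng = np.random.default_rng(0)
U = rng.normal(size=(d_hidden, d_in))      # input-to-hidden
W = rng.normal(size=(d_hidden, d_hidden))  # hidden-to-hidden
V = rng.normal(size=(d_out, d_hidden))     # hidden-to-output

h = np.zeros(d_hidden)                     # h(-1), the initial hidden state
for x_tau in rng.normal(size=(10, d_in)):  # input sequence of length 10
    h = np.tanh(W @ h + U @ x_tau)         # h(tau) = sigma(W h(tau-1) + U x(tau))
    y = softmax(V @ h)                     # y~(tau) = softmax(V h(tau))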

Now, we unroll the above equation for t steps: 

h(t) = W^{t+1}h(-1) + \sum_{\tau =0 } ^{t} W^{\tau} U x(t - \tau)

where h(−1) is the initial state of the hidden representation. h(t) can also be represented in the following way:

x = [x(0), x(1), x(2), \dots, x(t-1), x(t)] \\ \psi = [U, WU, \dots, W^{t-1}U, W^{t}U] \\ h(t) = \sum_{\tau =0}^{t}x(\tau)\psi(t-\tau) = \sum_{\tau =0}^{t}x(t - \tau)\psi(\tau)

The above equation leads us to the following conclusions:

  • The vanishing and exploding gradient problems in RNNs are caused by the input x(t−τ), τ steps back in the past, being multiplied by an effective convolution weight ψ(τ) = W^τ U, which shrinks or grows exponentially with τ.
  • A linear recurrent unit can therefore be viewed as a convolution between the input and an exponentially decaying (or growing) convolution kernel; a numerical check of this equivalence is sketched below.
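
The sketch below verifies that a linear recurrent unit equals a convolution with kernel ψ(τ) = W^τ U; the non-linearity σ is dropped and the matrix sizes are arbitrary:

import numpy as np

rng = np.random.default_rng(0)
d_hidden, d_in, T = 3, 2, 6
W = 0.5 * rng.normal(size=(d_hidden, d_hidden))
U = rng.normal(size=(d_hidden, d_in))
x = rng.normal(size=(T, d_in))

# linear recurrence: h(t) = W h(t-1) + U x(t), with h(-1) = 0
h = np.zeros(d_hidden)
for t in range(T):
    h = W @ h + U @ x[t]

# convolutional view: h(t) = sum_tau psi(tau) x(t - tau), with psi(tau) = W^tau U
h_conv = sum(np.linalg.matrix_power(W, tau) @ U @ x[T - 1 - tau] for tau in range(T))

print(np.allclose(h, h_conv))  # True: the linear recurrent unit is a convolution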

MLP Continuous Kernel

Let \{\Delta \tau_i = (t - \tau_i)\}_{i=0}^{N} be a sequence of relative positions. The convolution kernel MLP^{\psi} is parameterized by a conventional L-layer neural network:


h^{(1)}(\Delta \tau_i ) = \sigma\left ( W^{(1)} \Delta \tau_i +  b^{(1)} \right ) \\ h^{(l)}(\Delta \tau_i ) = \sigma\left ( W^{(l)} h^{(l-1)}(\Delta \tau_i) +  b^{(l)} \right ) \\ \psi(\Delta \tau_i) = W^{(L)} h^{(L-1)}(\Delta \tau_i) + b^{(L)}


Here, \sigma is a point-wise non-linearity such as ReLU.
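
A minimal PyTorch sketch of such an L-layer kernel network, following the recursion above (the class name, layer sizes, and choice of ReLU are illustrative, not taken from the paper's code):

import torch
import torch.nn as nn

class KernelMLP(nn.Module):
    # MLP^psi: maps a relative position Delta tau_i to the kernel value psi(Delta tau_i)
    def __init__(self, out_channels, hidden=32, num_layers=3):
        super().__init__()
        self.first = nn.Linear(1, hidden)                               # W^(1), b^(1)
        self.middle = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(num_layers - 2)]  # W^(l), b^(l)
        )
        self.last = nn.Linear(hidden, out_channels)                     # W^(L), b^(L)
        self.sigma = nn.ReLU()

    def forward(self, delta_tau):
        h = self.sigma(self.first(delta_tau))    # h^(1) = sigma(W^(1) Delta_tau + b^(1))
        for layer in self.middle:
            h = self.sigma(layer(h))             # h^(l) = sigma(W^(l) h^(l-1) + b^(l))
        return self.last(h)                      # psi = W^(L) h^(L-1) + b^(L)

mlp_psi = KernelMLP(out_channels=4)
positions = torch.linspace(0.0, 1.0, 50).unsqueeze(-1)   # relative positions (t - tau_i)
kernel_values = mlp_psi(positions)                       # kernel evaluated at those positions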


Implementation

  • In this implementation, we will train the CKConv model on the sMNIST (sequential MNIST) dataset. We will use Google Colaboratory for this.


# First, we need to clone the ckconv repository from GitHub
# (the repository URL below is assumed to be the official code release for the paper)
! git clone https://github.com/dwromero/ckconv.git
# Now, we need to change the working directory to the ckconv directory
%cd ckconv
# Before actually training the model,
# we need to first install the required modules and libraries
! pip install -r requirements.txt
# If the above command fails, please make sure the following modules are installed
# using the command below
! pip install ml-collections torchaudio mkl-random sktime wandb
# Now, to train the model on the sMNIST dataset, run the following command
# (the script name run_experiment.py is assumed; check the repository README)
! python run_experiment.py --config.batch_size=64 --config.clip=0 \
--config.dataset=MNIST --config.device=cuda --config.dropout=0.1 \
--config.dropout_in=0.1 --config.epochs=200 --config.kernelnet_activation_function=Sine \
--config.kernelnet_no_hidden=32 --config.kernelnet_norm_type=LayerNorm \
--config.kernelnet_omega_0=31.09195739463897 --config.model=CKCNN \
--config.no_blocks=2 --config.no_hidden=30 --config.optimizer=Adam \
--config.permuted=False --config.sched_decay_factor=5 --config.sched_patience=20
  • Below are the results of training the CKCNN model on the sMNIST data:

[Figure: CKCNN training results on sMNIST]


  • CKConv is able to model very complex and non-linear functions easily.
  • Contrary to RNNs, CKConvs do not rely on any form of recurrence for considering large memory horizons, and they can model global long-term dependencies.
  • CKCNNs do not make use of Back-Propagation Through Time (BPTT). Consequently, CKCNNs can be trained in parallel.
  • CKCNNs can also be deployed at resolutions other than the one on which they were trained.

