Deep parametric Continuous Convolutional Neural Network

  • Last Updated : 15 Sep, 2021

Deep Parametric Continuous Kernel Convolution was proposed by researchers at the Uber Advanced Technologies Group. The motivation behind the paper is that standard CNN architectures assume grid-structured input and use discrete convolution as their fundamental building block. This limits their ability to perform accurate convolution in many real-world applications where the data is not grid-structured. Therefore, the authors propose a convolution method called Parametric Continuous Convolution.

Parametric Continuous Convolution: 

Parametric Continuous Convolution is a learnable operator that operates over non-grid-structured data, using parameterized kernels that span the full continuous vector space. It can handle arbitrary data structures as long as the support structure is computable. The continuous convolution operator is approximated by a discrete sum via Monte Carlo sampling:

h(x) = \int_{-\infty}^{\infty}f(y)g(x-y)\,dy \approx \sum_{i}^{N} \frac{1}{N}f(y_i)g(x - y_i)
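The Monte Carlo approximation above can be sketched numerically. This is a minimal illustration, not code from the paper: the signal f and kernel g below are arbitrary example Gaussians, and because the samples are drawn uniformly (rather than from a density matched to f), the estimate is scaled by the sampling-domain volume.

```python
import numpy as np

# Monte Carlo estimate of h(x) = ∫ f(y) g(x - y) dy.
# f and g are hypothetical example functions for illustration only.
rng = np.random.default_rng(0)

f = lambda y: np.exp(-y**2)          # example signal
g = lambda z: np.exp(-z**2 / 0.5)    # example kernel

def continuous_conv_mc(x, n_samples=100_000, lo=-5.0, hi=5.0):
    # Draw N uniform sample points y_i in [lo, hi].
    y = rng.uniform(lo, hi, n_samples)
    # (1/N) sum f(y_i) g(x - y_i), scaled by the domain volume (hi - lo)
    # to correct for the uniform sampling density.
    return (hi - lo) * np.mean(f(y) * g(x - y))

h0 = continuous_conv_mc(0.0)   # close to the exact integral sqrt(pi/3)
```

For this choice of f and g, the exact value at x = 0 is \int e^{-3y^2}\,dy = \sqrt{\pi/3} \approx 1.023, which the sampled estimate approaches as N grows.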

The next challenge is to define the kernel function g, which must assign a value to each point in the support domain. A naive parameterization is impossible, since it would require defining g over the infinitely many points of a continuous domain.

[Figure: Grid convolution vs. continuous convolution]

Instead, the authors use a multi-layer perceptron as the approximate parametric continuous convolution function, because MLPs are expressive and can approximate continuous functions.

g(z,\theta) = MLP(z, \theta)

The kernel g(z; \theta): \mathbb{R}^D \rightarrow \mathbb{R} spans the full continuous support domain while remaining parameterized by a finite number of parameters.
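A minimal sketch of such an MLP-parameterized kernel follows. The two-layer structure, hidden width, and tanh nonlinearity are illustrative assumptions, not choices taken from the paper:

```python
import numpy as np

# Sketch of an MLP kernel g(z; theta): R^D -> R mapping a relative
# position z in the support domain to a scalar kernel weight.
# Layer sizes and the tanh nonlinearity are illustrative assumptions.
rng = np.random.default_rng(1)
D, H = 3, 16                     # support-domain dim, hidden width

theta = {
    "W1": rng.normal(0, 0.5, (H, D)), "b1": np.zeros(H),
    "W2": rng.normal(0, 0.5, (1, H)), "b2": np.zeros(1),
}

def g(z, theta):
    # Two-layer MLP: a finite parameter set theta defines the kernel
    # value at every point of the continuous domain.
    h = np.tanh(theta["W1"] @ z + theta["b1"])
    return float(theta["W2"] @ h + theta["b2"])

w = g(np.array([0.1, -0.2, 0.3]), theta)   # scalar kernel weight
```

Because g is defined by \theta rather than by a table of values, it can be evaluated at any relative position, which is exactly what the continuous setting requires.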

Parametric Continuous Convolution Layer: 

The Parametric continuous convolution layer has 3 parts:

  • Input feature vectors F = \left \{ f_{in,j} \in \mathbb{R}^{F} \right \}
  • Associated locations in the support domain S = {y_j}
  • Output domain locations O = {x_i}

For each layer, we first evaluate the kernel function: 

g_{d,k}\left ( y_j - x_i ; \theta \right ) \; \forall \, y_j \in S \; and \; x_i \in O, given the parameters \theta. Each element of the output vector is then computed as:

h_{k,i} = \sum_{d}^{F}\sum_{j}^{N} g_{d,k} (y_j - x_i)f_{d,j}

where N is the number of input points, M the number of output points, D the dimensionality of the support domain, and F the predefined input feature dimension; the output feature dimension (indexed by k above) is likewise predefined. Here we can observe the following differences from discrete convolution: 

  • The kernel function is a continuous function given the relative position in the support domain.
  • The input and output points can be arbitrary points in the continuous domain and need not coincide.
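The layer equation above can be sketched end to end. This is a hedged illustration, not the authors' implementation: the kernel MLP is a stand-in with arbitrary sizes, and the kernel outputs an (input-dim × output-dim) weight matrix per relative position so that one einsum realizes h_{k,i} = \sum_d \sum_j g_{d,k}(y_j - x_i) f_{d,j}.

```python
import numpy as np

# Sketch of one parametric continuous convolution layer.
# All sizes below are illustrative assumptions.
rng = np.random.default_rng(2)
N, M, D, F_in, O_out = 8, 5, 3, 4, 2   # in points, out points, dims

y = rng.normal(size=(N, D))      # support locations of input points
x = rng.normal(size=(M, D))      # output locations
f = rng.normal(size=(N, F_in))   # input features f_{d,j}

W1 = rng.normal(0, 0.5, (D, 16))
W2 = rng.normal(0, 0.5, (16, F_in * O_out))

def kernel(z):
    # g(z; theta): relative position -> (F_in x O_out) weight matrix,
    # computed by a tiny stand-in MLP.
    return (np.tanh(z @ W1) @ W2).reshape(z.shape[:-1] + (F_in, O_out))

def pcconv(f, y, x):
    z = y[None, :, :] - x[:, None, :]   # (M, N, D) relative positions
    G = kernel(z)                       # (M, N, F_in, O_out)
    # h[i, k] = sum over j and d of G[i, j, d, k] * f[j, d]
    return np.einsum('ijdk,jd->ik', G, f)

h = pcconv(f, y, x)                     # (M, O_out) output features
```

Note that the output locations x are completely independent of the input locations y, which is the key departure from grid convolution.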


The network takes the input features and their associated positions in the support domain as input. Following standard CNN practice, batch normalization, non-linearities, and residual connections can be added between layers; the authors found residual connections critical for convergence. Pooling can be employed over the support domain to aggregate information. 

[Figure: Deep parametric continuous convolution network architecture]

Locality Enforcing Convolution

Standard discrete convolution enforces locality by limiting the kernel size M. In the continuous setting, locality can instead be enforced by restricting the kernel to points close to x:

g(z,\theta) = MLP(z, \theta)w(z)

where w(\cdot) is a modulating window function that enforces locality, for example by keeping only the k nearest neighbors of each output point. 
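A k-nearest-neighbor window can be sketched as a binary mask over input points. This is one plausible realization of such a window, assumed for illustration; the specific distance metric and binary weighting are my choices, not details from the paper:

```python
import numpy as np

# Sketch of a locality-enforcing kNN window: for each output location,
# keep only the K nearest support points. Sizes are illustrative.
rng = np.random.default_rng(3)
N, M, D, K = 10, 4, 2, 3

y = rng.normal(size=(N, D))   # support locations
x = rng.normal(size=(M, D))   # output locations

def knn_window(x, y, k):
    # Binary window: w[i, j] = 1 if y_j is among the k nearest
    # support points of x_i, else 0.
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)  # (M, N)
    idx = np.argsort(d, axis=1)[:, :k]      # indices of k nearest
    w = np.zeros((x.shape[0], y.shape[0]))
    np.put_along_axis(w, idx, 1.0, axis=1)
    return w

w = knn_window(x, y, K)   # each row keeps exactly K support points
```

Multiplying the kernel output by this mask zeroes the contribution of all distant points, playing the role of the limited kernel size in discrete convolution.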


Since all the building blocks of the model are differentiable within their domains, the backpropagation gradient can be written as:

\frac{\partial h_{k,i}}{\partial \theta} = \frac{\partial h_{k,i}}{\partial g} \cdot \frac{\partial g}{\partial \theta} = \sum_{d}^{F} \sum_{j}^{N} f_{d,j} \frac{\partial g_{d,k}}{\partial \theta}
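This chain rule can be verified numerically on a toy one-dimensional case. Everything below is an illustrative assumption: a scalar kernel g(z; \theta) = tanh(\theta z) with a single parameter, chosen only so the analytic gradient \sum_j f_j \, \partial g/\partial \theta is easy to write down and compare against a finite-difference estimate:

```python
import numpy as np

# Numerical check of dh/dtheta = sum_j f_j * dg/dtheta for a toy
# 1-D kernel g(z; theta) = tanh(theta * z). Illustrative only.
rng = np.random.default_rng(4)
N = 6
y = rng.normal(size=N)          # support locations
f = rng.normal(size=N)          # scalar input features
x_i = 0.3                       # one output location
theta = 0.7

g = lambda z, th: np.tanh(th * z)
h = lambda th: np.sum(g(y - x_i, th) * f)   # one output element

# Analytic gradient: dg/dtheta = z * sech^2(theta * z)
z = y - x_i
analytic = np.sum(f * z * (1.0 / np.cosh(theta * z)) ** 2)

# Central finite-difference estimate of dh/dtheta
eps = 1e-6
numeric = (h(theta + eps) - h(theta - eps)) / (2 * eps)
```

The two gradients agree to numerical precision, confirming that the layer is trainable end to end with standard gradient descent.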


