In natural language, a grammar denotes the syntactic rules of conversation. In formal language theory, a grammar is defined as a set of rules that can generate strings. The set of all strings that can be generated from a grammar is called the language of the grammar.
Context Free Grammar:
We are given a Context Free Grammar G = (V, X, R, S) and a string w, where:
 V is a finite set of variables or nonterminal symbols,
 X is a finite set of terminal symbols,
 R is a finite set of rules,
 S is the start symbol, a distinct element of V, and
 V and X are assumed to be disjoint sets.
The Membership problem is defined as: Grammar G generates a language L(G). Is the given string a member of L(G)?
Chomsky Normal Form:
A Context Free Grammar G is in Chomsky Normal Form (CNF) if each rule of G is of the form:
 A –> BC, [ exactly two nonterminal symbols on the RHS ]
 A –> a, or [ one terminal symbol on the RHS ]
 S –> nullstring. [ the null string, allowed only for the start symbol S ]
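The three rule shapes above can be checked mechanically. The following is a minimal sketch, assuming a grammar is stored as a dictionary that maps each nonterminal to a list of right-hand sides (each a list of symbols). The name `is_cnf` and this representation are illustrative assumptions, and the check covers only the three shapes listed here.

```python
def is_cnf(rules, start):
    """Return True if every rule has one of the three CNF shapes."""
    # Assumption: every nonterminal appears as a key of `rules`.
    nonterminals = set(rules)
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            if len(rhs) == 2 and all(sym in nonterminals for sym in rhs):
                continue  # A -> B C
            if len(rhs) == 1 and rhs[0] not in nonterminals:
                continue  # A -> a
            if len(rhs) == 0 and lhs == start:
                continue  # S -> null string
            return False
    return True


# Hypothetical toy grammars, used only to exercise the check.
good = {"S": [["A", "B"], []], "A": [["a"]], "B": [["b"]]}
bad = {"S": [["A", "B", "A"]], "A": [["a"]], "B": [["b"]]}

print(is_cnf(good, "S"))  # True
print(is_cnf(bad, "S"))   # False
```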
Cocke-Younger-Kasami (CYK) Algorithm
It solves the membership problem using a dynamic programming approach. The algorithm is based on the principle that the solution to problem [i, j] can be constructed from the solution to subproblem [i, k] and the solution to subproblem [k + 1, j]. The algorithm requires the grammar G to be in Chomsky Normal Form (CNF). Note that any context-free grammar can be systematically converted to CNF. This restriction ensures that each problem is divided into exactly two subproblems and not more, which bounds the time complexity.
How does the CYK Algorithm work?
For a string of length N, construct a table T of size N x N. Each cell T[i, j] holds the set of all constituents (nonterminals) that can produce the substring spanning positions i to j, so only cells with i ≤ j are used. The table is filled with the solutions to the subproblems encountered in bottom-up parsing. Therefore, cells are filled from left to right and bottom to top.
i\j  1        2        3        4        5
1    [1, 1]   [1, 2]   [1, 3]   [1, 4]   [1, 5]
2             [2, 2]   [2, 3]   [2, 4]   [2, 5]
3                      [3, 3]   [3, 4]   [3, 5]
4                               [4, 4]   [4, 5]
5                                        [5, 5]
In T[i, j], the row number i denotes the start index and the column number j denotes the end index.
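The fill order (left to right, bottom to top) can be made concrete with a short sketch. The helper name `fill_order` is an assumption for illustration. It yields 1-based cell indices, and the check below it confirms that every cell a split would need has already been visited.

```python
def fill_order(n):
    """Yield 1-based (i, j) cells: columns left to right, rows bottom to top."""
    for j in range(1, n + 1):        # columns, left to right
        for i in range(j, 0, -1):    # from the diagonal cell up to row 1
            yield (i, j)


order = list(fill_order(3))
print(order)  # [(1, 1), (2, 2), (1, 2), (3, 3), (2, 3), (1, 3)]

# Every cell T[i, j] depends on T[i, k] and T[k + 1, j] for i <= k < j;
# confirm both are always filled before T[i, j] is visited.
seen = set()
for i, j in fill_order(5):
    for k in range(i, j):
        assert (i, k) in seen and (k + 1, j) in seen
    seen.add((i, j))
```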
The algorithm considers every possible subsequence of letters and adds K to T[i, j] if the sequence of letters from position i to position j can be generated from the nonterminal K. For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two parts, and checks if there is a rule of the form A –> BC in the grammar where B and C can generate the two parts respectively, based on already existing entries in T. The sentence can be produced by the grammar only if the entire string is matched by the start symbol, i.e., if S is a member of T[1, n].
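The filling rule can also be written as a recurrence. This is only a restatement in notation, where w_i (a symbol introduced here, not used elsewhere in the article) denotes the i-th word of the input w and n ≥ 1 is its length:

```latex
\begin{align*}
T[i, i] &= \{\, A \mid (A \to w_i) \in R \,\}, \\
T[i, j] &= \bigcup_{k=i}^{j-1}
           \{\, A \mid (A \to B\,C) \in R,\; B \in T[i, k],\; C \in T[k+1, j] \,\}
           \quad \text{for } i < j, \\
w \in L(G) &\iff S \in T[1, n].
\end{align*}
```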
Consider a sample grammar in Chomsky Normal Form:
NP –> Det Nom
Nom –> AP Nom
AP –> Adv A
Det –> a | an
Adv –> very | extremely
AP –> heavy | orange | tall
A –> heavy | orange | tall | muscular
Nom –> book | orange | man
Now consider the phrase, “a very heavy orange book”:
a(1) very(2) heavy(3) orange(4) book(5)
Let us start filling up the table from left to right and bottom to top, according to the rules described above:
i\j  1     2     3       4            5
1    Det   –     –       NP           NP
2          Adv   AP      Nom          Nom
3                A, AP   Nom          Nom
4                        Nom, A, AP   Nom
5                                     Nom
The table is filled in the following manner:
 T[1, 1] = {Det} as Det –> a is one of the rules of the grammar.
 T[2, 2] = {Adv} as Adv –> very is one of the rules of the grammar.
 T[1, 2] = {} as no matching rule is observed.
 T[3, 3] = {A, AP} as A –> heavy and AP –> heavy are rules of the grammar.
 T[2, 3] = {AP} as AP –> Adv (T[2, 2]) A (T[3, 3]) is a rule of the grammar.
 T[1, 3] = {} as no matching rule is observed.
 T[4, 4] = {Nom, A, AP} as Nom –> orange and A –> orange and AP –> orange are rules of the grammar.
 T[3, 4] = {Nom} as Nom –> AP (T[3, 3]) Nom (T[4, 4]) is a rule of the grammar.
 T[2, 4] = {Nom} as Nom –> AP (T[2, 3]) Nom (T[4, 4]) is a rule of the grammar.
 T[1, 4] = {NP} as NP –> Det (T[1, 1]) Nom (T[2, 4]) is a rule of the grammar.
 T[5, 5] = {Nom} as Nom –> book is a rule of the grammar.
 T[4, 5] = {Nom} as Nom –> AP (T[4, 4]) Nom (T[5, 5]) is a rule of the grammar.
 T[3, 5] = {Nom} as Nom –> AP (T[3, 3]) Nom (T[4, 5]) is a rule of the grammar.
 T[2, 5] = {Nom} as Nom –> AP (T[2, 3]) Nom (T[4, 5]) is a rule of the grammar.
 T[1, 5] = {NP} as NP –> Det (T[1, 1]) Nom (T[2, 5]) is a rule of the grammar.
We see that T[1, 5] contains NP, the start symbol, which means that this phrase is a member of the language of the grammar G.
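The parse tree of this phrase, read off the table entries above and written in bracketed form, is:
[NP [Det a] [Nom [AP [Adv very] [A heavy]] [Nom [AP orange] [Nom book]]]]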
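A parse tree for the first phrase can also be recovered programmatically. The following is a minimal sketch, assuming each table cell stores one back-pointer per nonterminal: either the matched word or a split (k, B, C). The names `RULES`, `cyk_tree` and `build` are illustrative and not part of the article's implementation. When several derivations exist, only the first one found is kept.

```python
RULES = {
    "NP": [["Det", "Nom"]],
    "Nom": [["AP", "Nom"], ["book"], ["orange"], ["man"]],
    "AP": [["Adv", "A"], ["heavy"], ["orange"], ["tall"]],
    "Det": [["a"], ["an"]],
    "Adv": [["very"], ["extremely"]],
    "A": [["heavy"], ["orange"], ["tall"], ["muscular"]],
}


def cyk_tree(words, rules, start):
    """Return one bracketed parse tree for `words`, or None if none exists."""
    n = len(words)
    # back[i][j] maps a nonterminal to how it derives words[i..j] (0-based):
    # the word itself on the diagonal, or a (k, B, C) split otherwise.
    back = [[{} for _ in range(n)] for _ in range(n)]
    for j in range(n):
        for lhs, alternatives in rules.items():
            for rhs in alternatives:
                if len(rhs) == 1 and rhs[0] == words[j]:
                    back[j][j][lhs] = words[j]
        for i in range(j - 1, -1, -1):
            for k in range(i, j):
                for lhs, alternatives in rules.items():
                    for rhs in alternatives:
                        if (len(rhs) == 2 and rhs[0] in back[i][k]
                                and rhs[1] in back[k + 1][j]):
                            # Keep only the first derivation found.
                            back[i][j].setdefault(lhs, (k, rhs[0], rhs[1]))

    def build(i, j, symbol):
        entry = back[i][j][symbol]
        if isinstance(entry, str):  # diagonal cell: a single word
            return f"[{symbol} {entry}]"
        k, left, right = entry
        return f"[{symbol} {build(i, k, left)} {build(k + 1, j, right)}]"

    if n == 0 or start not in back[0][n - 1]:
        return None
    return build(0, n - 1, start)


tree = cyk_tree("a very heavy orange book".split(), RULES, "NP")
print(tree)
```

For a phrase outside the language, such as "a book very", the function returns None.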
Let us look at another example phrase, “a very tall extremely muscular man”:
a(1) very(2) tall(3) extremely(4) muscular(5) man(6)
We will now use the CYK algorithm to find out whether this string is a member of the language of the grammar G:
i\j  1     2     3       4     5     6
1    Det   –     –       –     –     NP
2          Adv   AP      –     –     Nom
3                AP, A   –     –     Nom
4                        Adv   AP    Nom
5                              A     –
6                                    Nom
We see that T[1, 6] contains NP, the start symbol, which means that this phrase is a member of the language of the grammar G.
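The table for this second phrase can be reproduced with a short sketch that returns the whole table instead of a yes/no answer. The names `RULES` and `cyk_table` are assumptions for illustration. Indices in the code are 0-based, so T[1, 6] in the text corresponds to `table[0][5]`.

```python
RULES = {
    "NP": [["Det", "Nom"]],
    "Nom": [["AP", "Nom"], ["book"], ["orange"], ["man"]],
    "AP": [["Adv", "A"], ["heavy"], ["orange"], ["tall"]],
    "Det": [["a"], ["an"]],
    "Adv": [["very"], ["extremely"]],
    "A": [["heavy"], ["orange"], ["tall"], ["muscular"]],
}


def cyk_table(words, rules):
    """Return the upper-triangular CYK table as a list of lists of sets."""
    n = len(words)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for j in range(n):
        # Diagonal cell: nonterminals that produce the j-th word directly.
        for lhs, alternatives in rules.items():
            for rhs in alternatives:
                if rhs == [words[j]]:
                    table[j][j].add(lhs)
        # Cells above the diagonal in column j, bottom to top.
        for i in range(j - 1, -1, -1):
            for k in range(i, j):
                for lhs, alternatives in rules.items():
                    for rhs in alternatives:
                        if (len(rhs) == 2 and rhs[0] in table[i][k]
                                and rhs[1] in table[k + 1][j]):
                            table[i][j].add(lhs)
    return table


words = "a very tall extremely muscular man".split()
table = cyk_table(words, RULES)

# Print each row from its diagonal cell onward; "-" marks an empty cell.
for i, row in enumerate(table):
    cells = [", ".join(sorted(cell)) or "-" for cell in row[i:]]
    print(i + 1, cells)
```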
Below is the implementation of the above algorithm:
```python
# Python implementation of the CYK algorithm

# Nonterminal symbols
non_terminals = ["NP", "Nom", "Det", "AP", "Adv", "A"]

# Terminal symbols (listed for reference; the parser reads only R)
terminals = ["a", "an", "book", "orange", "man", "tall",
             "heavy", "very", "extremely", "muscular"]

# Rules of the grammar
R = {
    "NP": [["Det", "Nom"]],
    "Nom": [["AP", "Nom"], ["book"], ["orange"], ["man"]],
    "AP": [["Adv", "A"], ["heavy"], ["orange"], ["tall"]],
    "Det": [["a"], ["an"]],
    "Adv": [["very"], ["extremely"]],
    "A": [["heavy"], ["orange"], ["tall"], ["muscular"]],
}


# Function to perform the CYK algorithm
def cykParse(w):
    n = len(w)

    # Initialize the table (0-based indices)
    T = [[set() for j in range(n)] for i in range(n)]

    # Fill in the table column by column
    for j in range(0, n):

        # Iterate over the rules
        for lhs, rule in R.items():
            for rhs in rule:

                # If a terminal rule matches the word at position j
                if len(rhs) == 1 and rhs[0] == w[j]:
                    T[j][j].add(lhs)

        # Move up the column, from the diagonal to row 0
        for i in range(j, -1, -1):

            # Iterate over the split points k with i <= k < j
            # (k = j would index T[j + 1], which may not exist)
            for k in range(i, j):

                # Iterate over the rules
                for lhs, rule in R.items():
                    for rhs in rule:

                        # If a rule A -> B C matches the two parts
                        if len(rhs) == 2 and \
                                rhs[0] in T[i][k] and \
                                rhs[1] in T[k + 1][j]:
                            T[i][j].add(lhs)

    # The word belongs to the language only if the start
    # symbol NP derives the entire input
    if n > 0 and "NP" in T[0][n - 1]:
        print("True")
    else:
        print("False")


# Driver Code

# Given string
w = "a very heavy orange book".split()

# Function Call
cykParse(w)
```
Output:
True
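Time Complexity: O(N^3 · |R|), where |R| is the number of rules in the grammar. For a fixed grammar this is O(N^3).
Auxiliary Space: O(N^2) table cells, each holding at most |V| nonterminals.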
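The cubic term comes from the number of (i, k, j) split points examined. A small sketch, using a hypothetical helper `count_splits`, counts them directly and compares the count with the closed form (N − 1) · N · (N + 1) / 6, which grows as N^3:

```python
def count_splits(n):
    """Count triples (i, k, j) with 0 <= i <= k < j < n."""
    return sum(1 for j in range(n) for i in range(j) for k in range(i, j))


for n in (5, 6, 10):
    closed_form = (n - 1) * n * (n + 1) // 6
    print(n, count_splits(n), closed_form)
```

Each split point is checked against every rule, which gives the |R| factor for implementations that scan the whole rule set.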