# Cocke–Younger–Kasami (CYK) Algorithm

In everyday usage, grammar denotes the syntactic rules of a natural language. In the theory of formal languages, a grammar is defined as a set of rules that can generate strings. The set of all strings that can be generated from a grammar is called the language of the grammar.

Context Free Grammar:
We are given a Context Free Grammar G = (V, X, R, S) and a string w, where:

• V is a finite set of variables or non-terminal symbols,
• X is a finite set of terminal symbols,
• R is a finite set of rules,
• S is the start symbol, a distinct element of V, and
• V and X are assumed to be disjoint sets.
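One way to hold this 4-tuple in code is sketched below. The `Grammar` class, its field names, and the toy grammar are illustrative assumptions made for this example; they do not come from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Illustrative container for G = (V, X, R, S); all names are assumptions."""
    variables: frozenset   # V: non-terminal symbols
    terminals: frozenset   # X: terminal symbols
    rules: tuple           # R: (lhs, rhs) pairs, where rhs is a tuple of symbols
    start: str             # S: start symbol

    def validate(self):
        """Raise ValueError if the tuple violates the definition above."""
        if self.start not in self.variables:
            raise ValueError("start symbol must be a variable")
        if not self.variables.isdisjoint(self.terminals):
            raise ValueError("V and X must be disjoint")
        for lhs, rhs in self.rules:
            if lhs not in self.variables or any(
                s not in self.variables and s not in self.terminals for s in rhs
            ):
                raise ValueError(f"rule uses unknown symbol: {lhs} -> {rhs}")
        return True


# Hypothetical toy grammar that generates only the string "ab".
toy = Grammar(
    variables=frozenset({"S", "A", "B"}),
    terminals=frozenset({"a", "b"}),
    rules=(("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))),
    start="S",
)
print(toy.validate())
```

Frozen sets and tuples are used here only so that a grammar cannot be modified by accident once it has been built.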

The membership problem is defined as follows: given a grammar G that generates a language L(G), is the given string w a member of L(G)?

Chomsky Normal Form:
A Context Free Grammar G is in Chomsky Normal Form (CNF) if each rule of G is of the form:

• A –> BC,        [ exactly two non-terminal symbols on the RHS ]
• A –> a, or      [ exactly one terminal symbol on the RHS ]
• S –> nullstring.             [ the null string, allowed only for the start symbol S ]
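The three rule shapes can be checked mechanically. The following is a minimal sketch, assuming rules are stored as `(lhs, rhs)` pairs with `rhs` a tuple of symbols; the function name `is_cnf` and the sample rules are hypothetical. It checks only the three shapes listed above and nothing else.

```python
def is_cnf(rules, variables, terminals, start):
    """Return True if every rule has one of the three CNF shapes."""
    for lhs, rhs in rules:
        if len(rhs) == 2 and all(s in variables for s in rhs):
            continue  # A -> B C
        if len(rhs) == 1 and rhs[0] in terminals:
            continue  # A -> a
        if len(rhs) == 0 and lhs == start:
            continue  # S -> null string
        return False
    return True


variables = {"S", "A", "B"}
terminals = {"a", "b"}
good = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",)), ("S", ())]
bad = good + [("A", ("B",))]  # the unit rule A -> B is not a CNF shape

print(is_cnf(good, variables, terminals, "S"))
print(is_cnf(bad, variables, terminals, "S"))
```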

Cocke-Younger-Kasami Algorithm
The CYK algorithm solves the membership problem using a dynamic programming approach. It is based on the principle that the solution to problem [i, j] can be constructed from the solutions to subproblem [i, k] and subproblem [k + 1, j], for some split point k with i ≤ k < j. The algorithm requires the grammar G to be in Chomsky Normal Form (CNF). Note that any Context-Free Grammar can be systematically converted to CNF. This restriction ensures that each problem is divided into exactly two subproblems and no more, which bounds the time complexity.
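The subproblem principle can be written directly as a memoized recursion. The sketch below is a top-down variant, not the bottom-up table described in this article, and it uses a hypothetical CNF grammar for strings of the form a^n b^n (n ≥ 1); the names `BINARY`, `UNARY`, and `derives` are assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical CNF grammar: S -> A B | A C,  C -> S B,  A -> a,  B -> b
BINARY = {("A", "B"): {"S"}, ("A", "C"): {"S"}, ("S", "B"): {"C"}}
UNARY = {"a": {"A"}, "b": {"B"}}


def derives(w, start="S"):
    """Return True if the start symbol generates the string w."""

    @lru_cache(maxsize=None)
    def cell(i, j):
        """Non-terminals that generate w[i..j] (0-based, inclusive)."""
        if i == j:
            return frozenset(UNARY.get(w[i], ()))
        found = set()
        for k in range(i, j):  # split into [i, k] and [k + 1, j]
            for b in cell(i, k):
                for c in cell(k + 1, j):
                    found |= BINARY.get((b, c), set())
        return frozenset(found)

    return len(w) > 0 and start in cell(0, len(w) - 1)


print(derives("aabb"))
print(derives("aab"))
```

Because every rule has at most two symbols on its right-hand side, a single split point k is enough to describe how a span is divided.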

How does the CYK Algorithm work?

For a string of length N, construct a table T of size N x N. Each cell in the table T[i, j] is the set of all constituents that can produce the substring spanning from position i to j. The process involves filling the table with the solutions to the subproblems encountered in the bottom-up parsing process. Therefore, cells will be filled from left to right and bottom to top.

In T[i, j], the row number i denotes the start index and the column number j denotes the end index. The algorithm considers every possible subsequence of symbols and adds K to T[i, j] if the subsequence from position i to position j can be generated from the non-terminal K. For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two parts, and checks whether the grammar has a rule of the form A –> BC where B and C can generate the two parts respectively, based on the entries already in T. The sentence can be produced by the grammar only if the entire string is matched by the start symbol, i.e., if S is a member of T[1, N].
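The order in which cells are visited, and the partitions examined for each cell, can be listed with a short generator. This is a sketch using the 1-based, inclusive indexing of T[i, j]; the name `fill_order` is a hypothetical helper.

```python
def fill_order(n):
    """Yield (i, j, splits) for each cell T[i, j], 1-based and inclusive."""
    for j in range(1, n + 1):          # columns, left to right
        for i in range(j, 0, -1):      # rows, from the diagonal up to row 1
            # Each split pairs an earlier cell T[i, k] with a cell T[k + 1, j].
            splits = [((i, k), (k + 1, j)) for k in range(i, j)]
            yield i, j, splits


for i, j, splits in fill_order(3):
    print(f"T[{i}, {j}] <- {splits}")
```

Every cell named in a split is visited before the cell that depends on it, which is why this filling order is safe.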

Consider a sample grammar in Chomsky Normal Form:

NP   -->  Det Nom
Nom  -->  AP Nom
AP   -->  Adv A
Det  -->  a | an
Adv  -->  very | extremely
AP   -->  heavy | orange | tall
A    -->  heavy | orange | tall | muscular
Nom  -->  book | orange | man


Now consider the phrase, “a very heavy orange book”:

a(1) very(2) heavy(3) orange(4) book(5)


Let us fill in the table from left to right and bottom to top, according to the rules described above:

1. T[1, 1] = {Det} as Det –> a is one of the rules of the grammar.
2. T[2, 2] = {Adv} as Adv –> very is one of the rules of the grammar.
3. T[1, 2] = {} as no matching rule is observed.
4. T[3, 3] = {A, AP} as A –> heavy and AP –> heavy are rules of the grammar.
5. T[2, 3] = {AP} as AP –> Adv (T[2, 2]) A (T[3, 3]) is a rule of the grammar.
6. T[1, 3] = {} as no matching rule is observed.
7. T[4, 4] = {Nom, A, AP} as Nom –> orange and A –> orange and AP –> orange are rules of the grammar.
8. T[3, 4] = {Nom} as Nom –> AP (T[3, 3]) Nom (T[4, 4]) is a rule of the grammar.
9. T[2, 4] = {Nom} as Nom –> AP (T[2, 3]) Nom (T[4, 4]) is a rule of the grammar.
10. T[1, 4] = {NP} as NP –> Det (T[1, 1]) Nom (T[2, 4]) is a rule of the grammar.
11. T[5, 5] = {Nom} as Nom –> book is a rule of the grammar.
12. T[4, 5] = {Nom} as Nom –> AP (T[4, 4]) Nom (T[5, 5]) is a rule of the grammar.
13. T[3, 5] = {Nom} as Nom –> AP (T[3, 3]) Nom (T[4, 5]) is a rule of the grammar.
14. T[2, 5] = {Nom} as Nom –> AP (T[2, 3]) Nom (T[4, 5]) is a rule of the grammar.
15. T[1, 5] = {NP} as NP –> Det (T[1, 1]) Nom (T[2, 5]) is a rule of the grammar.

We see that T[1, 5] contains NP, the start symbol, which means that this phrase is a member of the language of the grammar G.
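The walkthrough can be reproduced with a compact table builder. This is a sketch: the rule set mirrors the rules used in the steps above and in the implementation further down, while `RULES` and `cyk_table` are names chosen for illustration. Cells are keyed by 1-based (i, j) pairs to match the T[i, j] notation.

```python
RULES = {
    "NP": [("Det", "Nom")],
    "Nom": [("AP", "Nom"), ("book",), ("orange",), ("man",)],
    "AP": [("Adv", "A"), ("heavy",), ("orange",), ("tall",)],
    "Det": [("a",), ("an",)],
    "Adv": [("very",), ("extremely",)],
    "A": [("heavy",), ("orange",), ("tall",), ("muscular",)],
}


def cyk_table(words):
    """Return {(i, j): set of non-terminals} for 1-based inclusive spans."""
    n = len(words)
    table = {}
    for j in range(1, n + 1):
        for i in range(j, 0, -1):
            cell = set()
            for lhs, alternatives in RULES.items():
                for rhs in alternatives:
                    if i == j and rhs == (words[j - 1],):
                        cell.add(lhs)  # rule of the form A -> word
                    elif len(rhs) == 2 and any(
                        rhs[0] in table[i, k] and rhs[1] in table[k + 1, j]
                        for k in range(i, j)
                    ):
                        cell.add(lhs)  # rule of the form A -> B C
            table[i, j] = cell
    return table


table = cyk_table("a very heavy orange book".split())
for (i, j), cell in table.items():
    print(f"T[{i}, {j}] = {sorted(cell)}")
```

The dictionary keeps insertion order, so the cells are printed in the same order as the numbered steps.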

(Figure: parse tree of this phrase.)

Let us look at another example phrase, “a very tall extremely muscular man”:

a(1) very(2) tall(3) extremely(4) muscular(5) man(6)


We will now use the CYK algorithm to find out whether this string is a member of the language of the grammar G:

We see that T[1, 6] contains NP, the start symbol, which means that this phrase is a member of the language of the grammar G.
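The membership test can be extended to recover a parse tree by remembering, for each non-terminal in a cell, which split and which rule produced it. The sketch below is one possible way to do this and is not part of the implementation that follows; `RULES`, `cyk_parse`, and the nested-tuple tree format are assumptions. It keeps only the first derivation found for each non-terminal.

```python
RULES = {
    "NP": [("Det", "Nom")],
    "Nom": [("AP", "Nom"), ("book",), ("orange",), ("man",)],
    "AP": [("Adv", "A"), ("heavy",), ("orange",), ("tall",)],
    "Det": [("a",), ("an",)],
    "Adv": [("very",), ("extremely",)],
    "A": [("heavy",), ("orange",), ("tall",), ("muscular",)],
}


def cyk_parse(words, start="NP"):
    """Return one parse tree as nested tuples, or None if there is none."""
    n = len(words)
    # back[i][j] maps a non-terminal to how it derives words[i..j] (0-based).
    back = [[{} for _ in range(n)] for _ in range(n)]
    for j in range(n):
        for lhs, alternatives in RULES.items():
            if (words[j],) in alternatives:
                back[j][j][lhs] = words[j]
        for i in range(j - 1, -1, -1):
            for k in range(i, j):
                for lhs, alternatives in RULES.items():
                    for rhs in alternatives:
                        if (len(rhs) == 2 and rhs[0] in back[i][k]
                                and rhs[1] in back[k + 1][j]):
                            # Keep the first derivation found for lhs.
                            back[i][j].setdefault(lhs, (k, rhs[0], rhs[1]))

    def build(i, j, symbol):
        entry = back[i][j][symbol]
        if i == j:
            return (symbol, entry)  # leaf: (non-terminal, word)
        k, left, right = entry
        return (symbol, build(i, k, left), build(k + 1, j, right))

    if n == 0 or start not in back[0][n - 1]:
        return None
    return build(0, n - 1, start)


tree = cyk_parse("a very tall extremely muscular man".split())
print(tree)
```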

Below is the implementation of the above algorithm:

## Python3

```python
# Python implementation of the CYK algorithm

# Non-terminal symbols
non_terminals = ["NP", "Nom", "Det", "AP", "Adv", "A"]
# Terminal symbols
terminals = ["a", "book", "orange", "man", "tall", "heavy",
             "very", "extremely", "muscular"]

# Rules of the grammar
R = {
    "NP": [["Det", "Nom"]],
    "Nom": [["AP", "Nom"], ["book"], ["orange"], ["man"]],
    "AP": [["Adv", "A"], ["heavy"], ["orange"], ["tall"]],
    "Det": [["a"]],
    "Adv": [["very"], ["extremely"]],
    "A": [["heavy"], ["orange"], ["tall"], ["muscular"]],
}

# Start symbol of the grammar
START = "NP"


# Function to perform the CYK algorithm
def cykParse(w):
    n = len(w)

    # Initialize the table: T[i][j] holds the non-terminals
    # that generate w[i..j] (0-based, inclusive)
    T = [[set() for j in range(n)] for i in range(n)]

    # Fill in the table column by column
    for j in range(n):

        # Diagonal cell: rules of the form A -> terminal
        for lhs, rule in R.items():
            for rhs in rule:
                if len(rhs) == 1 and rhs[0] == w[j]:
                    T[j][j].add(lhs)

        # Cells above the diagonal, from bottom to top
        for i in range(j - 1, -1, -1):

            # Split w[i..j] into w[i..k] and w[k+1..j]
            for k in range(i, j):

                # Rules of the form A -> B C
                for lhs, rule in R.items():
                    for rhs in rule:
                        if len(rhs) == 2 and \
                                rhs[0] in T[i][k] and \
                                rhs[1] in T[k + 1][j]:
                            T[i][j].add(lhs)

    # The string is in the language only if the start
    # symbol generates the whole string
    if n > 0 and START in T[0][n - 1]:
        print("True")
    else:
        print("False")


# Driver code

# Given string
w = "a very heavy orange book".split()

# Function call
cykParse(w)
```

Output:

True


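Time Complexity: O(N^3 · |R|), where N is the length of the string and |R| is the number of rules in the grammar; for a fixed grammar this is O(N^3).
Auxiliary Space: O(N^2) table cells, each holding a set of non-terminals.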
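The cubic term comes from the number of (start, split, end) combinations the table-filling loops visit. The sketch below counts them by brute force and compares the count with the closed form (N + 1) · N · (N − 1) / 6; `split_count` is a hypothetical helper, and each counted combination stands for one pass over the binary rules.

```python
def split_count(n):
    """Count triples (i, k, j) with 0 <= i <= k < j < n."""
    return sum(1 for j in range(n) for i in range(j) for k in range(i, j))


for n in (5, 10, 20, 40):
    closed_form = (n + 1) * n * (n - 1) // 6
    print(n, split_count(n), closed_form)
```

Doubling N multiplies the count by roughly eight, which is the growth expected of a cubic algorithm.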
