
# Star Height of Regular Expression and Regular Language

Star height is a notion from the theory of computation (TOC). It measures the structural complexity of regular expressions and regular languages, where complexity means the maximum nesting depth of Kleene stars in a regular expression. A regular language can be represented by many different but equivalent regular expressions, and these expressions may have different star heights depending on how deeply their stars are nested. The star height of a regular language, however, is a unique number: it is the least star height of any regular expression representing that language. A related notion is the generalized star height, which is the minimum nesting depth of Kleene stars needed to describe the language by a generalized regular expression, that is, a regular expression that may also use the complement operator. For example, the language of all strings over the alphabet {a, b}, which contains strings such as “aba”, can be described by both of the following regular expressions:

(a + b)* ...... (1) Star height = 1
(a* b*)* ...... (2) Star height = 2
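The claim that expressions (1) and (2) describe the same language can be spot-checked by brute force. The sketch below is only an illustration, not a proof: it uses Python's `re` module (where union is written `|` rather than `+`) and compares the two expressions on every string up to a chosen length. The names `EXPR_1`, `EXPR_2` and `agree_up_to` are made up for this example.

```python
import re
from itertools import product

# Python's `re` syntax writes union as `|` instead of `+`.
EXPR_1 = re.compile(r"(a|b)*")    # expression (1), star height 1
EXPR_2 = re.compile(r"(a*b*)*")   # expression (2), star height 2


def agree_up_to(max_len):
    """Return True if both expressions accept exactly the same
    strings over {a, b} of length 0..max_len (a finite spot check)."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            accepted_1 = EXPR_1.fullmatch(s) is not None
            accepted_2 = EXPR_2.fullmatch(s) is not None
            if accepted_1 != accepted_2:
                return False
    return True


print(agree_up_to(6))
```

A finite check like this can only reveal a disagreement; it cannot establish equivalence for all strings.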

Since we take the least star height, the star height of this language is one. (It cannot be zero, because an expression without any star describes only a finite language.)

The star height of a regular expression is the maximum nesting depth of Kleene stars appearing in that expression. Formally, the star height h of a regular expression is defined by the following rules:

• h(∅) = 0, where ∅ is the empty set
• h(ε) = 0, where ε is the empty string
• h(t) = 0, where t is any terminal symbol of the alphabet
• h(EF) = max(h(E), h(F)), where E and F denote regular expressions
• h(E + F) = max(h(E), h(F))
• h(E*) = h(E) + 1

Some examples are:

• h(a*(b a*)*) = 2
• h((a b*) + ((a* a b*)*b)*) = 3
• h(a) = 0
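The recursive definition of h translates almost directly into code. The following is a minimal sketch that assumes the regular expression is already available as a syntax tree; the node classes (`Empty`, `Epsilon`, `Sym`, `Union`, `Concat`, `Star`) and the function `star_height` are hypothetical names introduced here for illustration, and no parser is included.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular-expression syntax-tree nodes."""


@dataclass(frozen=True)
class Empty(Regex):      # the empty set
    pass


@dataclass(frozen=True)
class Epsilon(Regex):    # the empty string
    pass


@dataclass(frozen=True)
class Sym(Regex):        # a single terminal symbol
    char: str


@dataclass(frozen=True)
class Union(Regex):      # E + F
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Concat(Regex):     # EF
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):       # E*
    inner: Regex


def star_height(e):
    """Maximum nesting depth of Kleene stars in the expression tree e."""
    if isinstance(e, (Empty, Epsilon, Sym)):
        return 0
    if isinstance(e, (Union, Concat)):
        return max(star_height(e.left), star_height(e.right))
    if isinstance(e, Star):
        return star_height(e.inner) + 1
    raise TypeError(f"not a regular-expression node: {e!r}")


a, b = Sym("a"), Sym("b")

# a*(b a*)*
first = Concat(Star(a), Star(Concat(b, Star(a))))

# (a b*) + ((a* a b*)* b)*
second = Union(
    Concat(a, Star(b)),
    Star(Concat(Star(Concat(Concat(Star(a), a), Star(b))), b)),
)

print(star_height(first), star_height(second), star_height(a))  # 2 3 0
```

Note that this computes the star height of one particular expression only; it says nothing about whether an equivalent expression with fewer nested stars exists.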

Eggan established a relationship between the star height of a language L and the loop complexity (also called cycle rank) of the automata that accept L: the star height of L is equal to the minimal loop complexity of a finite automaton that accepts L. It can also be stated that the loop complexity of an automaton that is both accessible (every state can be reached from the initial state) and co-accessible (from every state q there is a string that leads from q to a final state) is the minimum of the star heights of the regular expressions obtained by the different possible executions of the state elimination method (the BMC algorithm, after Brzozowski and McCluskey).

A regular expression of star height zero can describe only a finite language, so every infinite regular language has star height at least one.

The generalized star height allows the complement operator in expressions, and taking the complement of an expression does not increase its star height. This leads to some interesting results. For example, consider the alphabet {x, y} and the regular language “all strings beginning and ending with x”. The star height of the usual regular expression for this language is

h(x(x + y)*x + x) = 1, since there is only one level of Kleene star nesting

But the same language can also be represented by the generalized regular expression x ∅^c x + x, where ∅^c, the complement of the empty set, denotes the set of all strings over the input alphabet.

Now, h(x ∅^c x + x) = 0, as no Kleene star is present

Therefore the generalized star height of the language is 0 even though its star height is 1.
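Returning to the loop complexity used in Eggan's theorem: it can be computed from the transition graph of an automaton by the usual recursive definition of cycle rank (0 for an acyclic graph, the maximum over strongly connected components otherwise, and 1 plus the best single-vertex removal for a strongly connected graph with a cycle). The sketch below assumes the automaton is given only as a dictionary mapping each state to the set of its successor states, with labels ignored; `cycle_rank` and its helpers are hypothetical names, and the naive recursion takes exponential time, so it is suitable for small examples only.

```python
def reachable(graph, start, allowed):
    """States in `allowed` reachable from `start` using only states in `allowed`."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v in allowed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def strongly_connected_components(graph, vertices):
    """Split `vertices` into strongly connected components (simple quadratic method)."""
    reach = {v: reachable(graph, v, vertices) for v in vertices}
    components, assigned = [], set()
    for v in vertices:
        if v in assigned:
            continue
        # u is in the same component as v if each reaches the other
        comp = frozenset(u for u in reach[v] if v in reach[u])
        components.append(comp)
        assigned |= comp
    return components


def has_cycle(graph, comp):
    """A component contains a cycle if it has several states or a self-loop."""
    if len(comp) > 1:
        return True
    (v,) = comp
    return v in graph.get(v, ())


def cycle_rank(graph, vertices=None):
    """Cycle rank (loop complexity) of the subgraph induced by `vertices`."""
    if vertices is None:
        vertices = set(graph) | {v for succ in graph.values() for v in succ}
    vertices = frozenset(vertices)
    rank = 0
    for comp in strongly_connected_components(graph, vertices):
        if not has_cycle(graph, comp):
            continue  # an acyclic component contributes 0
        # strongly connected with a cycle: remove the best single state
        best = 1 + min(cycle_rank(graph, comp - {v}) for v in comp)
        rank = max(rank, best)
    return rank


# One state with a self-loop: the natural automaton for (a + b)*
print(cycle_rank({0: {0}}))                          # 1

# An automaton shape for x(x + y)*x + x: state 1 loops, state 2 is final
print(cycle_rank({0: {1, 2}, 1: {1, 2}, 2: set()}))  # 1
```

Both example automata have loop complexity 1, which agrees with the star height 1 of the languages they accept. Computing the star height of a language this way would additionally require minimizing over all automata accepting it, which this sketch does not attempt.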

### Advantages of considering the star height:

Complexity analysis: The star height provides a measure of the complexity of repetition in regular expressions or languages. By considering the star height, we can assess the complexity of operations, such as pattern matching, parsing, and automaton construction, associated with regular expressions or languages. It helps in analyzing the efficiency and feasibility of algorithms and implementations.

Expressive power assessment: The star height provides insights into the expressive power of regular expressions or languages. Higher star heights indicate greater nesting of repetition, allowing for the representation of more complex patterns or languages. By considering the star height, we can understand the expressive capabilities of regular expressions or languages and their ability to represent various language structures.

Design considerations: The star height affects the design considerations for implementing algorithms and data structures for regular expressions or languages. It does not change the class of machine that is required: every regular language is accepted by a finite automaton regardless of its star height, so a higher star height never calls for a more expressive formalism such as pushdown automata. What it does indicate, by Eggan's theorem, is that any automaton accepting the language must contain more deeply nested loops, which can influence how expressions are simplified or how automata are constructed.

### Disadvantages of considering the star height:

Limited to regular languages: The concept of star height is only applicable to regular expressions and regular languages. It cannot be directly used for analyzing or comparing the complexity of languages beyond the regular language class, such as context-free or context-sensitive languages. For languages of higher complexity, different measures or techniques need to be employed.

Overemphasis on repetition: The star height focuses specifically on the complexity of repetition in regular expressions or languages. While repetition is a crucial aspect, other language structures, such as nested parentheses, balancing rules, or context-sensitivity, may be equally important in certain applications. The star height may not capture these aspects, leading to a partial understanding of the language complexity.

Computational limitations: The star height of a given regular expression is easy to compute directly from its structure, but determining the star height of a regular language is much harder. The star height problem is decidable (Hashiguchi, 1988), yet the known algorithms have very high complexity and are impractical for large or complex regular expressions or languages. For generalized star height the situation is even less settled: it is an open problem whether every regular language has generalized star height at most one.

Complexity may not correlate with real-world applications: While the star height provides insights into the complexity of repetition, it may not always correlate directly with real-world applications. The practical complexity of regular expressions or languages can depend on various factors, including the nature of the input data, the specific problem domain, and the efficiency of algorithmic optimizations. The star height alone may not fully capture these aspects.