logoHome
Regex Parser cover

Regex Parser

Regex Parser

Features

  • Implement an epsilon non-deterministic finite state automaton (ε-NFA) as the core component of the regular expression parser.
  • Utilize depth first search to retrieve all states within the ε-closure and construct the transition table for the input regular expression.
  • Incorporate stack-based checking to verify balanced brackets within the input regular expression.

Environment

  • Java Version: JDK 11
  • External Libraries: Test code would be run using JUnit 4

Outline

  1. Read in the regular expression
  2. Check the validity of the input regular expression and throw an exception if invalid
  3. Transform the regular expression from infix to postfix
  4. Build the ε-NFA
  5. Put the testing string into the ε-NFA built from the regex

Building Blocks

ε (epsilon)

Epsilon transition block

Symbol Block

Symbol transition block

Union Block

Union transition block

Concatenation Block

Concatenation transition block

Repetition Block (Kleene Star)

Repetition transition block

One-Or-More Block (Kleene Plus)

One-Or-More transition block

Verbose Mode Example

Regular Expression = (a^+|b)(a^*|c)

Verbose Mode Example

Example Diagram

  1. Omit epsilon transition symbol for simplicity
  2. Turn symbol transition red for readability
Example Diagram

Future Improvement

Currently, the ε-NFA interprets the . character as the concatenation operator and does not allow it to appear as a literal symbol. Handling dot symbols with an escape character can resolve this issue.