Regular Expressions

One of the unsung successes of standardization in computer science has been the regular expression (RE), a language for specifying text search strings. This practical language is available in virtually every programming language, word processor, and text-processing tool, such as the Unix tool grep.

🏆 A regular expression is an algebraic notation for characterizing a set of strings.
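To make the "set of strings" idea concrete, here is a minimal sketch using Python's built-in `re` module. The pattern `colou?r` and the helper `in_language` are illustrative names chosen for this example, not something from the references. Note also that Python's regex syntax is its own flavor, although it agrees with extended regex for a simple pattern like this one.

```python
import re

# Illustrative pattern: "colou?r" describes exactly the set {"color", "colour"}.
PATTERN = re.compile(r"colou?r")


def in_language(s: str) -> bool:
    """Return True if the whole string s belongs to the set described by PATTERN."""
    # fullmatch requires the entire string to match, not just a substring.
    return PATTERN.fullmatch(s) is not None


candidates = ["color", "colour", "colouur", "colr"]
members = [s for s in candidates if in_language(s)]
print(members)  # ['color', 'colour']
```

Using `fullmatch` here is a deliberate choice: it tests membership in the set, whereas a search would also accept longer strings that merely contain a member.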

Regular expressions are particularly useful for searching in texts, when we have a pattern to search for and a corpus of texts to search through. A regex search function searches through the corpus and returns all texts that match the pattern. The corpus can be a single document or a collection. For example, the Unix command-line tool grep takes a regex and returns every line of the input document that matches it.
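The grep-style behavior described above can be sketched in a few lines of Python. This is only a rough approximation, not how grep itself is implemented: `grep_lines` is a hypothetical helper, the three-line corpus is made up for illustration, and Python's `re` flavor is assumed to be close enough to extended regex for this simple pattern.

```python
import re
from typing import Iterable, List


def grep_lines(pattern: str, lines: Iterable[str]) -> List[str]:
    """Return every line that contains at least one match of pattern."""
    regex = re.compile(pattern)
    # search() looks for the pattern anywhere in the line, like grep does.
    return [line for line in lines if regex.search(line)]


# Made-up corpus: each string plays the role of one line of a document.
corpus = [
    "The woodchuck chucked wood.",
    "No rodents here.",
    "Two woodchucks sat on a log.",
]

matches = grep_lines(r"woodchucks?", corpus)
for line in matches:
    print(line)
```

Here the first and third lines are returned, since both contain a string matched by `woodchucks?`, while the second line is skipped.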

🚨 RegEx comes in many variants. In this article, we will be discussing extended regex.

Basic Patterns

References:

  1. "Speech & Language Processing" by Jurafsky and Martin; 2021
  2. https://regex101.com/