3.4. Regular Expression

How to use regular expressions for matching in Natex.

Regular expressions provide a powerful way to match patterns in strings. For more details, see:

Chapter 2.1: Regular Expressions, Speech and Language Processing (3rd ed.), Jurafsky and Martin.
Regular Expression HOWTO, Python Documentation.

Syntax

Grouping
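
A minimal sketch of grouping with Python's re module. The pattern and the sample strings are illustrative assumptions: the first group captures the day, while the non-capturing group (?:...) only groups the alternatives separated by |.

```python
import re

# (...) captures a group, (?:...) groups without capturing, and | separates alternatives.
RE_TIME = re.compile(r'(Mon|Tues)day (?:morning|night)')

m = RE_TIME.match('Monday night')
print(m.group(0))  # Monday night (the entire match)
print(m.groups())  # ('Mon',) (only the capturing group is returned)
```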

Repetitions
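
A short sketch of the repetition operators in Python, including the difference between greedy and non-greedy matching. The sample strings are assumptions chosen only to make the behavior visible.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# ? = 0 or 1 time, {m,n} = between m and n times (also: * = 0 or more, + = 1 or more).
optional = re.findall(r'colou?r', 'color colour')
bounded = re.findall(r'\d{2,3}', '7 42 1234')

# Greedy .+ takes as much as possible; non-greedy .+? stops at the first '>'.
greedy = re.findall(r'<.+>', text)
lazy = re.findall(r'<.+?>', text)

print(optional)  # ['color', 'colour']
print(bounded)   # ['42', '123']
print(greedy)    # ['<b>bold</b> and <i>italic</i>']
print(lazy)      # ['<b>', '</b>', '<i>', '</i>']
```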

Special Characters
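
A sketch of a few common special characters in Python. The sample sentence is hypothetical; the point is the contrast between \d, \s, the anchors ^ and $, and the wildcard . versus the escaped literal \. .

```python
import re

s = 'Call 555-1234 at 9 a.m.'

digits = re.findall(r'\d+', s)                 # \d matches a digit
tokens = re.split(r'\s+', s)                   # \s matches a whitespace character
starts = re.search(r'^Call', s) is not None    # ^ anchors at the start of the string
ends = re.search(r'a\.m\.$', s) is not None    # $ anchors at the end of the string

wildcard = re.findall(r'1.3', '123 1.3 1-3')   # . matches any character except a newline
literal = re.findall(r'1\.3', '123 1.3 1-3')   # \. matches only a period

print(digits)             # ['555', '1234', '9']
print(tokens)             # ['Call', '555-1234', 'at', '9', 'a.m.']
print(starts, ends)       # True True
print(wildcard, literal)  # ['123', '1.3', '1-3'] ['1.3']
```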

Functions

Python's re module provides several functions for matching regular expressions.

match()

Let us create a regular expression that matches "Mr." and "Ms.":
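
A minimal sketch of this step using Python's re module. The input string 'Dr. Wang' is an assumption, chosen so that the match fails:

```python
import re

RE_MR = re.compile(r'M[rs]\.')  # "M", then "r" or "s", then a literal period
m = RE_MR.match('Dr. Wang')     # the string starts with "Dr.", so there is no match
print(m)                        # None
```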

#1: imports the regular expression module, re.
#3: compiles the regular expression into the variable RE_MR.

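In Python, a regular expression is usually written as a raw string, r'expression', where the prefix r tells Python to leave backslashes as they are, so that escape sequences such as \. reach the regular expression engine unchanged.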

The above code prints None: match() returns None because the regular expression does not match the string.

#1: since RE_MR matches the string, m is a match object.
#3: evaluates to True since m is a match object (a match object is always truthy).

Currently, no groups are specified in RE_MR:

#1: returns an empty tuple ().
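
A self-contained sketch of the successful case, assuming the sample string 'Ms. Wong'. It shows that a match object is truthy and that groups() is empty when the pattern has no parentheses:

```python
import re

RE_MR = re.compile(r'M[rs]\.')
m = RE_MR.match('Ms. Wong')  # the string starts with "Ms.", so m is a match object

if m:                  # a match object always evaluates to True
    print(m.group())   # Ms.
    print(m.groups())  # () because RE_MR has no groups
```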

What are the differences between a list and a tuple in Python?

It is possible to group specific patterns using parentheses:

#1: there are two groups in this regular expression, (M[rs]) and (\.).
#3: returns a tuple of matched substrings ('Ms', '.') for the two groups in #1.
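
A self-contained sketch of the two-group version; each group can also be read individually with group(n), where group numbering starts at 1:

```python
import re

RE_MR = re.compile(r'(M[rs])(\.)')  # two capturing groups
m = RE_MR.match('Ms.')
print(m.groups())  # ('Ms', '.')
print(m.group(1))  # Ms
print(m.group(2))  # .
```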

The above RE_MR matches "Mr." and "Ms." but not "Mrs." Modify it to match all of them (Hint: use a non-capturing group and |).

In the modified pattern r'(M(?:[rs]|rs))(\.)', the non-capturing group (?:[rs]|rs) matches "r", "s", or "rs", so the first group matches "Mr", "Ms", or "Mrs", respectively.

Since we use the non-capturing group, the following code still prints a tuple of two strings:
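
A minimal sketch, assuming the modified pattern r'(M(?:[rs]|rs))(\.)' from the exercise above. For "Mrs.", the engine first tries [rs], fails to find the period, and backtracks to the alternative rs:

```python
import re

RE_MR = re.compile(r'(M(?:[rs]|rs))(\.)')  # (?:...) adds no group of its own
results = [RE_MR.match(s).groups() for s in ['Mr.', 'Ms.', 'Mrs.']]
print(results)  # [('Mr', '.'), ('Ms', '.'), ('Mrs', '.')]
```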

search()

Let us match the following strings with RE_MR:
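
A self-contained sketch of this comparison. The strings s1 and s2 are illustrative assumptions (s1 starts with "Mr." and also contains "Ms."; s2 contains "Mr." and "Mrs." but starts with "Dr."):

```python
import re
RE_MR = re.compile(r'(M(?:[rs]|rs))(\.)')
s1, s2 = 'Mr. and Ms. Wang', 'Dr. Choi, Mr. and Mrs. Wang'  # illustrative strings
print(RE_MR.match(s1))  # a match object for "Mr." only
print(RE_MR.match(s2))  # None
```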

#4: matches "Mr." but not "Ms."
#5: matches neither "Mr." nor "Mrs."

For s1, only "Mr." is matched because match() returns only the first match. For s2, on the other hand, even "Mr." is not matched because match() requires the pattern to appear at the beginning of the string.

To match a pattern anywhere in the string, we need to search for the pattern instead:

Like match(), search() returns a match object, or None if the pattern is not found.
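
A self-contained sketch with search(), using the same illustrative strings s1 and s2 (assumptions, not from the original). The span shows that the match in s2 is found after the beginning of the string:

```python
import re

RE_MR = re.compile(r'(M(?:[rs]|rs))(\.)')
s1 = 'Mr. and Ms. Wang'             # illustrative strings
s2 = 'Dr. Choi, Mr. and Mrs. Wang'

m1 = RE_MR.search(s1)  # finds "Mr." at the start
m2 = RE_MR.search(s2)  # finds "Mr." after "Dr. Choi, "
print(m1.group(), m1.span())  # Mr. (0, 3)
print(m2.group(), m2.span())  # Mr. (10, 13)
```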

findall()

search() still returns only the first match, so it misses the second substrings, "Ms." and "Mrs.". The following shows how to find all substrings that match the pattern:

findall() returns a list of tuples, where each tuple contains the substrings captured by the groups of one match.
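
A self-contained sketch with findall(), again assuming the illustrative strings s1 and s2. Because the pattern has two capturing groups, each element is a 2-tuple; a pattern without capturing groups would make findall() return a list of strings instead.

```python
import re

RE_MR = re.compile(r'(M(?:[rs]|rs))(\.)')
s1 = 'Mr. and Ms. Wang'             # illustrative strings
s2 = 'Dr. Choi, Mr. and Mrs. Wang'

print(RE_MR.findall(s1))  # [('Mr', '.'), ('Ms', '.')]
print(RE_MR.findall(s2))  # [('Mr', '.'), ('Mrs', '.')]
```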

finditer()

Since findall() returns a list of tuples instead of match objects, there is no direct way to locate the matched results in the original string. To get match objects instead, we need to iterate over the matches of the pattern:

#1: finditer() returns an iterator that yields a match object for each match until no more matches are found.
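
A self-contained sketch with finditer(), assuming the illustrative string s2. Since each item is a match object, its start and end positions in the original string are available:

```python
import re

RE_MR = re.compile(r'(M(?:[rs]|rs))(\.)')
s2 = 'Dr. Choi, Mr. and Mrs. Wang'  # illustrative string

positions = []
for m in RE_MR.finditer(s2):
    # each m is a match object, so we can recover where it occurs in s2
    positions.append((m.group(), m.start(), m.end()))
print(positions)  # [('Mr.', 10, 13), ('Mrs.', 18, 22)]
```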

You can use a list comprehension to store the match objects as a list:

#1: returns a list of all match objects m, in order, produced by finditer().
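
A self-contained sketch of the list comprehension, assuming the illustrative string s1:

```python
import re

RE_MR = re.compile(r'(M(?:[rs]|rs))(\.)')
s1 = 'Mr. and Ms. Wang'  # illustrative string

ms = [m for m in RE_MR.finditer(s1)]  # all match objects, in order
print([m.group() for m in ms])        # ['Mr.', 'Ms.']
print([m.span() for m in ms])         # [(0, 3), (8, 11)]
```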

How is the code above different from the one below?

What are the advantages of using a list comprehension over a for-loop, other than making the code shorter?

Write regular expressions to match the following cases:

Abbreviation: Dr., U.S.A.
Apostrophe: '80, '90s

Natex Integration

The earlier nesting example has a condition as follows (#4):

Write a regular expression that matches the above condition.

Regular expressions can also be used for matching in Natex, where a regular expression is enclosed in forward slashes (/../):

#4: true if the entire input matches the regular expression.

You can put the expression in a sequence to allow a partial match:

#4: the regular expression is put in a sequence [].

When used in Natex, all literals in the regular expression (e.g., "so" and "good" in #4) must be lowercase because Natex matches everything in lowercase. This design choice was made because users tend not to follow standard capitalization in a chat interface, whether it is text- or audio-based.
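
As a rough plain-Python analogy only (this is not how Natex is implemented), matching the entire input resembles re.fullmatch(), a partial match resembles re.search(), and lowercasing the input before matching mirrors Natex's lowercase matching. The pattern and the user input are hypothetical:

```python
import re

PATTERN = r'so good'          # hypothetical pattern with lowercase literals only
user_input = 'It is SO GOOD'  # users often ignore standard capitalization
text = user_input.lower()     # normalize case before matching

whole = re.fullmatch(PATTERN, text)            # the entire input must match
partial = re.search(PATTERN, text) is not None  # matching part of the input is enough
print(whole)    # None
print(partial)  # True
```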

Variable

It is possible to store the matched results of a regular expression in variables. A variable in a regular expression is represented by a name in angle brackets (<..>) inside a capturing group ((?..)).

The following transitions take the user's name and respond with the stored first and last names:

#4: matches the first name and the last name in order and stores them in the variables FIRSTNAME and LASTNAME.
#5: uses FIRSTNAME and LASTNAME in the response.
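
The same idea in plain Python, as a sketch rather than Natex's own variable mechanism: Python's re writes a named group as (?P<name>...), and groupdict() collects the captured values by name. The group names mirror the Natex example; the input 'jane doe' is hypothetical.

```python
import re

# Named groups in Python's syntax: (?P<name>...)
RE_NAME = re.compile(r'(?P<FIRSTNAME>[a-z]+) (?P<LASTNAME>[a-z]+)')

m = RE_NAME.fullmatch('jane doe')  # hypothetical user input, already lowercase
variables = m.groupdict()          # {'FIRSTNAME': 'jane', 'LASTNAME': 'doe'}
response = f"Nice to meet you, {variables['FIRSTNAME'].capitalize()} {variables['LASTNAME'].capitalize()}."
print(response)  # Nice to meet you, Jane Doe.
```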