Available with Data Reviewer license.
With the Regular Expression check, you can validate both the values and the formats of string values. String fields store alphanumeric text. Examples include fields that contain a feature's name, measurements (height, length, width, and area), z-values, and metadata such as a feature's creation date. For example, to find records whose Social Security Number is formatted incorrectly, you could use the expression "\b[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]\b" on the SSN field. The check then returns records whose values do not follow that pattern, such as 123456789, 123-ab-4567, 1123-34-12345, and 123-4567.
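The SSN example can be sketched in Python, with the standard `re` module standing in for the check. This sketch assumes the check flags values that do not match the whole pattern. The function `find_bad_ssns` and the sample records are hypothetical. In Python, `\b` means a word boundary, but the abbreviations table in this topic lists `\b` as white space. For that reason, the sketch drops `\b` and uses `fullmatch` so that the entire value has to match.

```python
import re

# SSN pattern from the example above, without the \b delimiters;
# fullmatch() requires the whole value to match instead.
SSN_PATTERN = re.compile(r"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]")

def find_bad_ssns(values):
    """Return the values that do not follow the ###-##-#### format (hypothetical helper)."""
    return [v for v in values if SSN_PATTERN.fullmatch(v) is None]

records = ["123-45-6789", "123456789", "123-ab-4567", "1123-34-12345", "123-4567"]
print(find_bad_ssns(records))
# ['123456789', '123-ab-4567', '1123-34-12345', '123-4567']
```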
The Regular Expression check can also evaluate text strings against ranges of values. You build these expressions from metacharacters and abbreviations. Metacharacters control what is matched and how often it is matched. Abbreviations are shortcuts for including common types of values, such as digits or letters, in the expression.
The following metacharacters are operators you can use in an expression to control what the Regular Expression check matches.
Metacharacter | Description | Example |
---|---|---|
. | Matches any single character. | x.y.z matches strings such as x1y0z or xaybz. |
[ ] | Matches any one of the characters or value ranges inside the brackets. | [xyz] matches strings that contain x, y, or z. |
^ | At the beginning of an expression, matches the start of the input. At the start of a bracketed list, negates the list so that any character not in it matches. | [^abc] matches any character other than a, b, or c. As a result, bat and bar match (because of t and r), but cab does not. ^[xyz] matches strings that start with x, y, or z. |
- | Inside brackets, indicates a range of values to be matched. | [1-5] matches strings such as 12345 or 26589, but not 6789. |
? | The preceding character, group, or value range is optional: it matches once or not at all. | Sept? matches Sep and Sept, so it finds strings such as Sept and September, but not December. |
+ | The preceding character, group, or value range matches one or more times. | [0-9]+ matches 1, 11, 456, and so forth. |
* | The preceding character, group, or value range matches zero or more times. | 12*3 matches 13, 123, and 1223, but not 223 or 23. |
?? | Non-greedy version of ?: the preceding element is optional, and the match skips it whenever possible. | 6(th)?? matches 6th, but the matched text stops at 6. |
+? | Non-greedy version of +: the preceding element repeats one or more times, but as few times as possible. | Ju+? matches June and July, but not January. |
*? | Non-greedy version of *: the preceding element repeats zero or more times, but as few times as possible. | ea*? matches strings such as each, era, and fare. |
( ) | Groups expressions and values so they are treated as a unit. | (cat) matches strings such as category and concatenate, but not cart. |
\ | Escapes a metacharacter so that it is matched as a literal character. | \+ matches a literal plus sign. |
$ | At the end of an expression, matches the end of the input. | [123]$ matches strings that end with 1, 2, or 3. |
\| | Matches either of two alternatives, such as an alternative phrase or spelling. | I\|international matches International and international. |
! | Negation: the expression that follows ! must not match at that position. | c(a!b) matches cat or can, but not cab. |
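The table's examples can be checked in Python, using the `re` module purely as a stand-in. Its syntax overlaps with the table but is not identical, and the sketch works around two differences:

* Python has no `!` operator, so the sketch uses the negative lookahead `(?!b)` to express what `c(a!b)` describes.
* In Python, `|` has the lowest precedence. Written as `I|international`, it would match any value that contains a capital I, so the sketch groups the alternatives as `(I|i)nternational`.

The sketch calls `re.search` because the table describes strings that contain a match. The `CASES` list is hypothetical.

```python
import re

# (pattern, text) pairs adapted from the metacharacter table, in Python syntax.
CASES = [
    (r"x.y.z", "xaybz"),                       # . matches any single character
    (r"[^abc]", "bat"),                        # t is not a, b, or c
    (r"[^abc]", "cab"),                        # every character is a, b, or c
    (r"[1-5]", "6789"),                        # no digit in the range 1-5
    (r"Sept?", "September"),                   # the t is optional
    (r"12*3", "1223"),                         # zero or more 2s between 1 and 3
    (r"12*3", "223"),                          # no leading 1
    (r"(cat)", "concatenate"),                 # group found inside the word
    (r"\+", "1+1"),                            # escaped, literal plus sign
    (r"[123]$", "A12"),                        # ends with 1, 2, or 3
    (r"(I|i)nternational", "International"),   # grouped alternation
    (r"ca(?!b)", "cab"),                       # lookahead in place of a!b
]

# re.search looks for a match anywhere in the text.
results = {(p, t): re.search(p, t) is not None for p, t in CASES}

for (pattern, text), found in results.items():
    print(f"{pattern:22} {text:15} {found}")

# Non-greedy vs. greedy: ?? prefers to skip the optional group.
print(re.search(r"6(th)??", "6th").group())  # 6
print(re.search(r"6(th)?", "6th").group())   # 6th
```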
Abbreviations are shortcuts for commonly used value ranges and patterns.
Abbreviation | Description |
---|---|
\a | Any alphanumeric character ([a-zA-Z0-9]) |
\b | White space |
\c | Any alphabetic character ([a-zA-Z]) |
\d | Any decimal digit ([0-9]) |
\h | Any hexadecimal digit ([0-9a-fA-F]) |
\n | New line |
\q | A quoted string |
\w | A simple word ([a-zA-Z]+) |
\z | An integer ([0-9]+) |
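Several of these abbreviations are not valid in Python's `re` module or mean something else there. For example, `\b` is a word boundary in Python. To test expressions outside the check, you could first expand the abbreviations into explicit Python character classes. The function `expand_abbreviations` and its mapping are hypothetical. The expansions for `\b` (taken as space or tab), `\n`, and `\q` are assumptions based on the descriptions above. Abbreviations inside `[ ]` brackets are not handled.

```python
import re

# Hypothetical expansions of the abbreviations into Python re syntax.
# \b, \n, and \q are assumptions based on the table's descriptions.
ABBREVIATIONS = {
    "a": "[a-zA-Z0-9]",
    "b": "[ \\t]",
    "c": "[a-zA-Z]",
    "d": "[0-9]",
    "h": "[0-9a-fA-F]",
    "n": "(?:\\r\\n|\\r|\\n)",
    "q": "(?:\"[^\"]*\"|'[^']*')",
    "w": "(?:[a-zA-Z]+)",  # grouped so a following quantifier applies to the whole word
    "z": "(?:[0-9]+)",
}

def expand_abbreviations(expr: str) -> str:
    """Replace abbreviations with Python classes; other escapes are copied unchanged.

    Does not handle abbreviations inside [ ] bracket expressions.
    """
    out = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr):
            nxt = expr[i + 1]
            # Known abbreviation -> expansion; anything else (e.g. \+, \\) stays as-is.
            out.append(ABBREVIATIONS.get(nxt, ch + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

pattern = expand_abbreviations(r"\d\d\d-\d\d\d\d")
print(pattern)                                        # [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]
print(re.fullmatch(pattern, "555-0199") is not None)  # True
```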
Examples of regular expressions are as follows:
String to find | Regular expression |
---|---|
A date in yyyy-mm-dd format between 1900-01-01 and 2099-12-31 | ((19)\|(20))\d\d-((0[1-9])\|(1[012]))-((0[1-9])\|([12][0-9])\|(3[01]))
A value that contains a person's name, together with the text before and after it | ^.*Chris.*$
A string field that contains only alphabetic characters | ^[A-Za-z]+$
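The sketch below runs the three example expressions through Python's `re` module. The date pattern includes the hyphen separators that the yyyy-mm-dd format requires. The alphabetic-only pattern is anchored with `^` and `$` and uses `+`, so it rejects empty values and values that contain any non-alphabetic character. The names `DATE`, `HAS_CHRIS`, `ALPHA_ONLY`, and `is_valid_date` are hypothetical. The date pattern checks only digit ranges, so a value such as 2023-02-31 still matches.

```python
import re

DATE = re.compile(r"((19)|(20))\d\d-((0[1-9])|(1[012]))-((0[1-9])|([12][0-9])|(3[01]))")
HAS_CHRIS = re.compile(r"^.*Chris.*$")   # also matches values such as "Christine"
ALPHA_ONLY = re.compile(r"^[A-Za-z]+$")

def is_valid_date(value: str) -> bool:
    # fullmatch: the whole value must be a date. Days per month are not checked.
    return DATE.fullmatch(value) is not None

print(is_valid_date("1999-12-31"))               # True
print(is_valid_date("19991231"))                 # False (no separators)
print(HAS_CHRIS.search("Dr. Chris Smith") is not None)   # True
print(ALPHA_ONLY.search("Main St") is not None)  # False (contains a space)
```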