Available with Data Reviewer license.
With the Regular Expression check, you can validate both the content and the format of string field values. String fields store alphanumeric strings as their values. They include fields that contain the feature's name, measurements (height, length, width, and area), z-values, and metadata such as a feature's creation date.
When a string does not match the specified format or values, the feature or table row is returned as an error. For example, to find records that have an incorrectly formatted Social Security number (SSN), you can specify \b[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]\b for the SSN field. The check returns features or rows that have incorrect SSN field values, such as 123456789, 123-ab-4567, 1123-34-12345, or 123-4567.
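The SSN example can be tried outside Data Reviewer. The following is a minimal sketch using Python's `re` module, which is not the engine Data Reviewer uses; the function name `find_invalid_ssns` and the sample list are illustrative assumptions. It relies on `re.fullmatch` instead of the `\b` delimiters, because `\b` means a word boundary in Python.

```python
import re

# Same digit layout as the SSN expression in the text: 3 digits, 2 digits, 4 digits.
# Assumption: Python's re syntax stands in for the Data Reviewer dialect here.
SSN_PATTERN = re.compile(r"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]")


def find_invalid_ssns(values):
    """Return the values that do not match the SSN format in full."""
    return [v for v in values if SSN_PATTERN.fullmatch(v) is None]


# One well-formed value followed by the malformed values listed in the text.
sample = ["123-45-6789", "123456789", "123-ab-4567", "1123-34-12345", "123-4567"]
invalid = find_invalid_ssns(sample)
print(invalid)
```

Only the first sample value has the expected format, so the other four are reported, mirroring the rows the check would return as errors.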
The Regular Expression check can also be used to check the text strings based on ranges of values. To format these values, a variety of parameters can be defined. These parameters consist of metacharacters and abbreviations. The metacharacters help you add complexity to your query, while the abbreviations provide shortcuts you can use to include specific types of values in the query.
Metacharacter descriptions
The metacharacters described in the following table are operators that can be used in the query to determine what is to be matched using the Regular Expression check.
Metacharacter | Description | Example |
---|---|---|
. | Matches any single character. | x.y.z matches a string such as x1y0z or xaybz. |
[ ] | Contains individual characters and value ranges to be matched. | [xyz] matches strings that contain x, y, or z. |
^ | Matches the start of the input when it is at the beginning of the expression. When it is the first character inside brackets, it negates the characters that follow. | ^[xyz] matches strings that start with x, y, or z. [^abc] matches any character other than a, b, or c, so bat and bar match (because of t and r), but cab does not. |
- | Indicates a range of values to be matched. | [1-5][1-9][1-9][1-9][1-9] matches strings such as 12345 or 26589, but not 67890. |
? | The preceding character or value range is an optional part of the expression to be matched. | Sept? matches Sep and Sept (and so also strings that contain them, such as September), but not December. |
+ | Preceding characters or value ranges can be matched one or more times. | [0-9]+ matches 1, 11, 456, and so forth. |
* | Preceding characters or value ranges can be matched zero or more times. | 12*3 matches 1223 and 123, but not 223 or 23. |
?? | Matches a minimal part of the optional characters or value ranges. | 6(th)?? matches 6th. |
+? | Matches a minimal part of the characters or range values that can be repeated. The minimal part can be repeated one or more times. | Ju+? matches June and July, but not January. |
*? | Matches a minimal part of the characters or range values that can be repeated. The minimal part can be repeated zero or more times. | ea*? matches strings such as each, era, and fare. |
( ) | Contains a group of expressions and values. | (cat) matches strings such as category and concatenate, but not cart. |
\ | Allows a metacharacter to be used as a literal character. | \+ allows the plus sign to be recognized as such. |
$ | Matches the end of the input. | [123]$ matches strings that end with 1, 2, or 3. |
| | Matches an alternative phrase or spelling. | I|international matches International and international. |
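Several rows of the table can be checked quickly in code. This is a sketch in Python's `re` module, whose syntax for these particular metacharacters agrees with the table; that agreement is an assumption about the Data Reviewer engine, and the helper `matches` is a hypothetical name. It uses `re.search`, so a pattern matches when it is found anywhere in the string.

```python
import re


def matches(pattern, text):
    """True if the pattern is found anywhere in text (search semantics)."""
    return re.search(pattern, text) is not None


# "." stands for any single character.
assert matches(r"x.y.z", "x1y0z")
# "^" anchors to the start; inside brackets it negates the set.
assert matches(r"^[xyz]", "yard") and not matches(r"^[xyz]", "backyard")
assert matches(r"[^abc]", "bat") and not matches(r"[^abc]", "cab")
# "*" allows zero or more repeats of the preceding character.
assert matches(r"12*3", "1223") and not matches(r"12*3", "223")
# "$" anchors to the end of the input.
assert matches(r"[123]$", "Route 3") and not matches(r"[123]$", "Route 34")
# Parentheses group a literal sequence.
assert matches(r"(cat)", "concatenate") and not matches(r"(cat)", "cart")

# Greedy versus minimal (lazy) matching: the same strings match,
# but the portion of the string that is consumed differs.
greedy = re.search(r"6(th)?", "6th").group()
lazy = re.search(r"6(th)??", "6th").group()
print(greedy, lazy)  # prints: 6th 6
```

The last two lines show what "minimal" means in practice: the lazy form still matches 6th, but it consumes only the 6 because the optional group is skipped when nothing after it requires it.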
Shortcuts for value ranges
The following abbreviations provide shortcuts for value ranges:
Abbreviation | Description |
---|---|
\a | Any alphanumeric character (a–zA–Z0–9) |
\b | White space |
\c | Any alphabetic character (a–zA–Z) |
\d | Any decimal digit (0–9) |
\h | Any hexadecimal digit |
\n | New line |
\q | A quoted string |
\w | A simple word ([a-zA-Z]+) |
\z | An integer ([0-9]+) |
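Several of these abbreviations have different meanings in other regular expression dialects (in Python, for example, `\b` is a word boundary and `\w` is a single word character). The sketch below shows one way to translate the abbreviations into explicit character classes before testing a pattern in Python. The table `ABBREVIATIONS` and the function `expand_abbreviations` are hypothetical names, the expansions for `\b` and `\h` are assumptions because the table does not define them exactly, and `\n` and `\q` are left untouched.

```python
import re

# Hypothetical translation table from the abbreviations above to Python syntax.
# \a, \c, \d, \w, and \z follow the definitions in the table; the entries for
# \b and \h are assumptions, because the table does not spell them out.
ABBREVIATIONS = {
    "a": "[a-zA-Z0-9]",
    "b": r"\s",            # assumption: any white-space character
    "c": "[a-zA-Z]",
    "d": "[0-9]",
    "h": "[0-9a-fA-F]",    # assumption: upper- and lowercase hex digits
    "w": "[a-zA-Z]+",
    "z": "[0-9]+",
}


def expand_abbreviations(pattern):
    """Replace each backslash abbreviation with an explicit character class.

    Any other escaped character is left unchanged.
    """
    return re.sub(
        r"\\(.)",
        lambda m: ABBREVIATIONS.get(m.group(1), m.group(0)),
        pattern,
    )


expanded = expand_abbreviations(r"\c\c-\z")
print(expanded)  # prints: [a-zA-Z][a-zA-Z]-[0-9]+
```

Writing the classes out explicitly keeps a pattern readable to people who know a different dialect, at the cost of longer expressions.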
Examples of regular expressions
Examples of regular expressions are as follows:
String to find | Regular expression |
---|---|
A date in yyyy-mm-dd format that is between 1900-01-01 and 2099-12-31 | ((19)|(20))\d\d-((0[1-9])|(1[012]))-((0[1-9])|([12][0-9])|(3[01])) |
A line that contains a person's name (Chris), together with the text before and after it | ^.*Chris.*$ |
A string field that contains only alphabetic characters | ^[A-Za-z]*$ |
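The date and alphabetic examples can be exercised as follows. This is a sketch in Python's `re` module under the assumption that it behaves like the Data Reviewer engine for these constructs; the function names are illustrative. The date pattern is written with hyphen separators so that it agrees with the yyyy-mm-dd description. A regular expression checks only the format, so a second step with `datetime.strptime` is included to reject impossible calendar dates.

```python
import re
from datetime import datetime

# Date pattern from the table, with hyphens between year, month, and day.
# re.ASCII restricts \d to the digits 0-9.
DATE_PATTERN = re.compile(
    r"(19|20)\d\d-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])", re.ASCII
)


def has_date_format(value):
    """True if value looks like yyyy-mm-dd with a year from 1900 to 2099."""
    return DATE_PATTERN.fullmatch(value) is not None


def is_real_date(value):
    """True if value passes the format check and is an actual calendar date."""
    if not has_date_format(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_alphabetic(value):
    """True if every character in value is an ASCII letter (empty strings pass)."""
    return re.fullmatch(r"[A-Za-z]*", value) is not None


print(has_date_format("2023-02-31"), is_real_date("2023-02-31"))  # True False
print(is_alphabetic("Chris"), is_alphabetic("Chris1"))            # True False
```

The alphabetic check matches the whole value on purpose. Without anchoring, `[A-Za-z]*` can match an empty stretch of any string, so `re.search(r"[A-Za-z]*", "12345")` also succeeds.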