Pattern Matching

Overview

xTuple ERP supports the use of "regular expressions" (also known as "Regex") in fields where pattern matching is called for. This regular expression support gives you tremendous flexibility and control whenever you want to retrieve unique patterns from within various data sets (e.g., class codes, product categories, customer types, etc.).

Let's say, for example, you want to generate a report showing Internet sales during a given period. To access this data, you would need to look at sales made to your Internet customers. Since "Internet Customers" are a subset of customer types, you would create a regular expression to retrieve this subset and send it to the report. 

Characters and Meta-characters

Regular expressions are created using different combinations of characters and meta-characters. A character is defined as any alphanumeric character — both upper and lower case — including punctuation marks, white space, and other keyboard symbols. Meta-characters, sometimes referred to as "wildcards," are special characters used to facilitate pattern matching. The most common meta-characters are described in the tables below.

Info: The more you understand the role of meta-characters, the more you will be able to control your pattern matching.

The first thing to understand is that regular expressions match "sub-strings." A sub-string is a subset of a "string," a string being a sequence of characters arranged in a line. For example, the numbers 123 are a sub-string of the string 012345. Similarly, the letters EAT are a sub-string of the strings MEAT, EATERY, and THEATER. Numbers and letters can also be combined to form a sub-string. The pattern INTCUST4 is a sub-string of a particular set of Internet customer strings: INTCUST400, INTCUST401, INTCUST402, and so on. As these examples show, a sub-string may appear anywhere in a string: at the beginning, in the middle, or at the end.
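The sub-string behavior described above can be sketched in a few lines of Python. This is only an illustration: Python's re module is not a POSIX engine and is used here as a stand-in on the assumption that it treats these simple patterns the same way, and the sample lists are made up for the example.

```python
import re

# Sample strings taken from the paragraph above, plus two non-matching
# values (TEA, RETCUST400) invented for contrast.
words = ["MEAT", "EATERY", "THEATER", "TEA"]
customer_types = ["INTCUST400", "INTCUST401", "INTCUST500", "RETCUST400"]

# re.search succeeds when the pattern occurs anywhere in the string,
# which is the sub-string matching the text describes.
with_eat = [w for w in words if re.search("EAT", w)]
internet_4xx = [c for c in customer_types if re.search("INTCUST4", c)]

print(with_eat)
print(internet_4xx)
```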

Info: Regular expressions are case-sensitive: this means there is a difference between "a" and "A." Keep this in mind when using regular expressions for pattern matching.

Meta-characters give you even more control over your sub-string definitions. With meta-characters, you can specify the exact location of a sub-string: beginning of a word or line, end of a word or line, etc. You can also specify ranges of data: customers A-Z, items 1-9, etc. This sort of control is especially vital when searching through large quantities of data. As you can imagine, meta-characters will not only save you time, they will also increase your precision. The following tables describe meta-characters in more detail:

Single Character Meta-characters

.

Matches any single character.
Example: 
* The regular expression 'b.t' would match 'bat,' 'bet,' 'bit,' and 'bot,' but not 'boot,' 'BAT,' 'BET,' etc. Since the sub-string might be part of a longer word, 'b.t' would also match 'bottle,' 'batch,' 'abbot,' etc.
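As a rough check of the example above, the following Python sketch applies 'b.t' to a list of words. Python's re module stands in for the application's matcher here, which is an assumption; the word list comes from the example.

```python
import re

words = ["bat", "bet", "bit", "bot", "boot", "BAT", "bottle", "abbot"]

# The period stands for exactly one character, so "boot" (two characters
# between b and t) fails, and "BAT" fails because matching is case-sensitive.
matched = [w for w in words if re.search("b.t", w)]

print(matched)
```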

[...]

Matches any single character listed between the brackets.
Examples: 
* The regular expression b[aeo]t would match 'bat,' 'bet,' and 'bot.' It would not match 'bit.'
* Likewise, the expression CUST[A-M] would match any string containing the range of patterns from CUSTA to CUSTM. This would include CUSTABC, CUSTA123, CUSTBCD, CUSTB123, etc.
* To match both upper case and lower case occurrences, use the expression CUST[A-Ma-m]. 
* You may also combine ranges of letters and numbers in the same expression. To locate product categories beginning with A-B or 1-5, you would use PRODCAT[A-B1-5]. This expression would match PRODCATABC, PRODCATB123, PRODCAT1ABC, PRODCAT2123, etc.
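The bracket expressions above can be tried out with a short Python sketch. The codes in the lists are hypothetical, and Python's re module is assumed to behave like the application's matcher for these simple patterns.

```python
import re

def select(pattern, values):
    """Return the values that contain a match for pattern."""
    return [v for v in values if re.search(pattern, v)]

# Hypothetical customer and product category codes, for illustration only.
customers = ["CUSTABC", "CUSTB123", "CUSTXYZ", "CUSTabc"]
categories = ["PRODCATABC", "PRODCAT2123", "PRODCATC99", "PRODCAT6XYZ"]

upper_only = select("CUST[A-M]", customers)       # upper case A-M only
either_case = select("CUST[A-Ma-m]", customers)   # A-M or a-m
mixed = select("PRODCAT[A-B1-5]", categories)     # letters A-B or digits 1-5

print(upper_only, either_case, mixed)
```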

[^...]

Matches any single character except those listed between the brackets.
Examples: 
* Functions like the previous meta-character, except that here the caret symbol "^" excludes characters from the pattern. For example, the expression b[^ae]t would match the sub-strings bit and bot, but not bat and bet.
* To locate all product categories except those at the end of the alphabet, you would use PRODCAT[^X-Z]. As long as the character following PRODCAT is always an upper case letter, this expression would return the same results as PRODCAT[A-W]. Keep in mind that [^X-Z] also accepts numbers, lower case letters, and any other character outside the range X-Z.
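The following Python sketch compares a negated list, PRODCAT[^X-Z], with the positive range PRODCAT[A-W]. The category codes are hypothetical, and Python's re module is assumed to match the application's behavior for these patterns.

```python
import re

def select(pattern, values):
    """Return the values that contain a match for pattern."""
    return [v for v in values if re.search(pattern, v)]

words = ["bat", "bet", "bit", "bot"]
# Hypothetical product category codes, for illustration only.
categories = ["PRODCATABC", "PRODCATW10", "PRODCATXYZ", "PRODCATZ01", "PRODCAT1AB"]

not_a_or_e = select("b[^ae]t", words)
negated = select("PRODCAT[^X-Z]", categories)   # anything except X, Y, Z
positive = select("PRODCAT[A-W]", categories)   # only upper case A-W

print(not_a_or_e)
print(negated)
print(positive)
```

The two category lists differ only for PRODCAT1AB: the digit 1 is outside X-Z, so the negated list accepts it, but it is not in the range A-W.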

Note: xTuple ERP supports pattern matching with regular expressions in accordance with the "Portable Operating System Interface" (POSIX) standard.

Quantifiers

?

Matches the preceding element zero or one time.
Definition: The term "element" describes both single characters ("A") and also ranges or lists of characters ("[A-Z]," "[acfrg]," "[^2-7]," etc.).
Examples: 
* The expression bo?ut would match two separate sub-strings: those containing one instance of the preceding "o" and those containing zero instances of the preceding "o." As a result, the following matches would be returned: bout, but, about, butter, etc. 
* CLASS[A-C]? would match any string containing the sub-strings CLASSA..., CLASSB..., CLASSC.... Because zero instances of [A-C] would also be included, matches would be returned for any string containing the root CLASS (for example, CLASS123, CLASSDEF, etc.).
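A short Python sketch of the "?" examples above follows. The extra sample words (boot, boout, KLASSA) are invented to show non-matches, and Python's re module is assumed to stand in for the application's matcher.

```python
import re

def select(pattern, values):
    """Return the values that contain a match for pattern."""
    return [v for v in values if re.search(pattern, v)]

words = ["bout", "but", "about", "butter", "boot", "boout"]
# Hypothetical class codes, for illustration only.
class_codes = ["CLASSA1", "CLASS123", "CLASSDEF", "KLASSA"]

# "o?" allows zero or one "o" between "b" and "ut".
optional_o = select("bo?ut", words)

# "[A-C]?" may match nothing, so any code containing CLASS qualifies.
any_class = select("CLASS[A-C]?", class_codes)

print(optional_o)
print(any_class)
```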

*

Matches the preceding element zero or more times.
Examples: 
* Use a period followed by an asterisk (".*") to match all existing patterns. For example, enter .* to show all customer types where pattern matching by customer type is called for. 
* The expression CUST* would return customer type records for all customer types containing the sub-string CUST. Records would also be returned for all customer types containing the sub-string CUS, since the asterisk also matches the preceding element (in this case, the character "T") zero times. To avoid matching the sub-string CUS, insert a period before the asterisk. The expression CUST.* will match all strings containing CUST, but not strings that contain only CUS.
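The difference between CUST* and CUST.* can be seen in a small Python sketch. The customer type codes are hypothetical, and Python's re module is assumed to behave like the application's matcher for these patterns.

```python
import re

# Hypothetical customer type codes, invented for this illustration.
types = ["CUSTOMER", "CUS100", "ACUSTTT", "VENDOR"]

def select(pattern):
    """Return the codes that contain a match for pattern."""
    return [t for t in types if re.search(pattern, t)]

everything = select(".*")   # every code, since .* can match anything
loose = select("CUST*")     # T may occur zero times: any code containing CUS
strict = select("CUST.*")   # T is required: any code containing CUST

print(everything)
print(loose)
print(strict)
```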

+

Matches the preceding element one or more times.
Example: 
* The expression PRODCAT10+ would match the "0" at the end of the pattern one or more times. Product categories beginning with the numbers "10," "100," "1000," etc. would be found, as in PRODCAT10ABC, PRODCAT100ABC, PRODCAT1000ABC, and so on.
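The "+" example above can be sketched in Python as follows. The last two category codes are invented to show non-matches, and Python's re module is assumed to stand in for the application's matcher.

```python
import re

categories = [
    "PRODCAT10ABC",
    "PRODCAT100ABC",
    "PRODCAT1000ABC",
    "PRODCAT1ABC",    # no "0" after the "1"
    "PRODCAT11ABC",   # "1" is followed by "1", not "0"
]

# "0+" requires at least one "0" directly after "PRODCAT1".
matched = [c for c in categories if re.search("PRODCAT10+", c)]

print(matched)
```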

|

Operates as a choice between alternatives, equivalent to "or."
Example: 
* The expression abc|def would match "abc" or "def."
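A brief Python sketch of the "|" operator follows. The customer type codes are hypothetical, and Python's re module is assumed to behave like the application's matcher here.

```python
import re

def select(pattern, values):
    """Return the values that contain a match for pattern."""
    return [v for v in values if re.search(pattern, v)]

samples = ["abc", "xdefx", "abd", "de"]
# Hypothetical customer type codes, for illustration only.
customer_types = ["CUSTUSA100", "CUSTEUROPE100", "CUSTASIA100"]

either = select("abc|def", samples)

# Each side of "|" is a complete alternative pattern.
usa_or_asia = select("CUSTUSA|CUSTASIA", customer_types)

print(either)
print(usa_or_asia)
```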

{num}

Matches the preceding element num times.
Example: 
* The expression CLASSA{3} would match the "A" at the end of the pattern three times. The following class codes would be found: CLASSAAA1, CLASSAAA2, CLASSAAA3, etc.

{min,max}

Matches the preceding element at least "min" times, but not more than "max" times.
Example: 
* The expression CLASSA{2,3} would match the "A" at the end of the pattern a minimum of two times and a maximum of three times. The following class codes would be found: CLASSAA1, CLASSAA2, etc. and also CLASSAAA1, CLASSAAA2, etc.
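The counted forms {num} and {min,max} can be sketched in Python as shown here. The class codes are hypothetical, and Python's re module is assumed to stand in for the application's matcher. Because matching is by sub-string, a code with four A's still contains three A's in a row; the last pattern shows one possible way to cap the count by requiring a digit immediately afterward.

```python
import re

def select(pattern, values):
    """Return the values that contain a match for pattern."""
    return [v for v in values if re.search(pattern, v)]

# Hypothetical class codes, for illustration only.
class_codes = ["CLASSA1", "CLASSAA1", "CLASSAAA1", "CLASSAAAA1"]

exactly_three = select("CLASSA{3}", class_codes)
two_to_three = select("CLASSA{2,3}", class_codes)

# Requiring a digit right after the A's rules out CLASSAAAA1.
three_then_digit = select("CLASSA{3}[0-9]", class_codes)

print(exactly_three)
print(two_to_three)
print(three_then_digit)
```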

Info: A space between characters is considered a character itself. You should avoid using spaces when writing regular expressions, unless the pattern you are matching expressly calls for them.

Anchors

^

Matches at the start of the line.
Example: 
* Use the caret symbol "^" to specify that the sub-string you are trying to match occurs specifically at the beginning of a line. For example, the expression ^ITEM would match all lines beginning with the characters "ITEM," as in ITEM100, ITEMABC, etc. The expression ^ITEM would not match the following, since "ITEM" does not occur at the beginning of these lines: MFGITEM, PURCHITEM, REFITEM, etc.

$

Matches at the end of the line.
Example: 
* Use the dollar symbol "$" to specify that the sub-string you are trying to match occurs specifically at the end of a line. For example, the expression 999$ would match all lines ending with the characters "999," as in ITEM100999, ITEM200999, ITEM300999, etc. The expression 999$ would not match the following, since "999" does not occur at the end of these lines: ITEM999123, ITEM999ABC, etc.
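Both anchors can be sketched together in Python. The item codes come from the examples above, Python's re module is assumed to stand in for the application's matcher, and the combined pattern at the end is an added illustration of using "^" and "$" in one expression.

```python
import re

def select(pattern, values):
    """Return the values that contain a match for pattern."""
    return [v for v in values if re.search(pattern, v)]

items = ["ITEM100", "ITEMABC", "MFGITEM", "ITEM100999", "ITEM999123"]

starts_with_item = select("^ITEM", items)    # ITEM must begin the string
ends_with_999 = select("999$", items)        # 999 must end the string

# Both anchors together: ITEM followed only by digits, nothing else.
item_plus_digits = select("^ITEM[0-9]+$", items)

print(starts_with_item)
print(ends_with_999)
print(item_plus_digits)
```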

Info: Build regular expressions using trial and error. If your first attempt doesn't yield the desired results, modify the expression and try again.

Hierarchical Structure

To simplify your pattern matching efforts, you should organize your groupings according to a hierarchical structure. Again, the following groupings support pattern matching using regular expressions:

  • Class Codes
  • Planner Codes
  • Item Groups
  • Product Categories
  • Customer Types
  • Customer Groups

Let's consider the customer type grouping to illustrate this point about hierarchies. If your customer types have been arranged hierarchically, the naming convention will exhibit a logical, sequential order. The following list shows an orderly, hierarchical arrangement of customer types:

  • CUSTUSA100, 101, 102, ...
  • CUSTEUROPE100, 101, 102, ...
  • CUSTASIA100, 101, 102, ...

Regular expressions will find and match any pattern, but writing them will be easier if you arrange your groupings hierarchically, as shown in the example above. For more detailed information about pattern matching, numerous websites are available.
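To show why a hierarchical naming convention helps, here is a minimal Python sketch that selects customer types by region and by number. The codes follow the list above, the two patterns are illustrative choices rather than recommended ones, and Python's re module is assumed to stand in for the application's matcher.

```python
import re

def select(pattern, values):
    """Return the values that contain a match for pattern."""
    return [v for v in values if re.search(pattern, v)]

customer_types = [
    "CUSTUSA100", "CUSTUSA101",
    "CUSTEUROPE100",
    "CUSTASIA100", "CUSTASIA102",
]

# One short pattern per level of the hierarchy.
usa_only = select("^CUSTUSA", customer_types)       # one region, all numbers
all_100s = select("^CUST.*100$", customer_types)    # all regions, number 100

print(usa_only)
print(all_100s)
```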