1.2.9. Topic: Pattern Matching with Regular Expressions

xTuple ERP supports the use of Regular Expressions (also known as "Regex") in fields where pattern matching is called for. This Regular Expression support gives you tremendous flexibility and control whenever you want to retrieve unique patterns from within any of the following groupings:

  • Class Codes

  • Planner Codes

  • Item Groups

  • Product Categories

  • Customer Types

  • Customer Groups

  • Vendor Types

Let's say, for example, you want to generate a report showing Internet sales during a given period. To access this data, you would need to look at sales made to your Internet Customers. Because Internet Customers are a subset of Customer Types, you would create a Regular Expression to retrieve this subset and send it to the report.

1.2.9.1. Characters and Metacharacters

Regular Expressions are built from combinations of characters and metacharacters. A character is any letter (upper or lower case), digit, punctuation mark, white space, or other keyboard symbol. Metacharacters, sometimes referred to as "wildcards," are special characters used to facilitate pattern matching. The most common metacharacters are described in the tables below.

Tip

The more you understand the role of metacharacters, the more you will be able to control your pattern matching.

The first thing to understand is that Regular Expressions match "substrings." A substring is a contiguous part of a "string," a string being a sequence of characters arranged in a line. For example, the digits 123 form a substring of the string 012345. Similarly, the letters EAT form a substring of the strings MEAT, EATERY, and THEATER. Numbers and letters can also be combined to form a substring. The pattern INTCUST4 is a substring of a particular set of Internet Customer strings: INTCUST400, INTCUST401, INTCUST402, and so on. As you can see, a substring may appear anywhere in a string: at the beginning, in the middle, or at the end.
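The substring behavior described above can be tried outside xTuple ERP. The following is a minimal sketch in Python, whose re module is not a POSIX implementation but behaves the same way for the simple patterns used in this topic. The Customer Type codes and the helper name matching are assumptions made up for illustration.

```python
import re

# Hypothetical Customer Type codes, used only for illustration.
customer_types = ["INTCUST400", "INTCUST401", "INTCUST500",
                  "RETAIL400", "OLDINTCUST402"]

def matching(pattern, values):
    """Return the values that contain a match for pattern anywhere in the string."""
    return [v for v in values if re.search(pattern, v)]

# The pattern is found at the beginning or in the middle of a string.
internet_400s = matching("INTCUST4", customer_types)
print(internet_400s)  # ['INTCUST400', 'INTCUST401', 'OLDINTCUST402']
```

Note that OLDINTCUST402 is returned as well, because the pattern only has to occur somewhere in the string.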

Tip

Regular Expressions are case-sensitive. This means there is a difference between "a" and "A". Keep this in mind when using Regular Expressions for pattern matching.
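As a rough illustration of case sensitivity, the Python sketch below checks the same hypothetical code against a lower-case and an upper-case pattern. The sample value CUSTA100 is an assumption, and Python's re module stands in for the POSIX matching used by xTuple ERP.

```python
import re

code = "CUSTA100"  # hypothetical Customer Type code

lower_hit = bool(re.search("cust", code))              # lower-case pattern
upper_hit = bool(re.search("CUST", code))              # upper-case pattern
either_hit = bool(re.search("[Cc][Uu][Ss][Tt]", code)) # accepts both cases

print(lower_hit, upper_hit, either_hit)  # False True True
```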

Metacharacters give you even more control over your substring definitions. With metacharacters, you can specify the exact location of a substring: beginning of a word or line, end of a word or line, etc. You can also specify ranges of data: Customers A-Z, Items 1-9, etc. This sort of control is especially vital when searching through large quantities of data. As you can imagine, metacharacters will not only save you time, they will also increase your precision. The following tables describe metacharacters in more detail.

Table 1.3. Single Character Metacharacters

.

Matches any single character.

Example:

  • The Regular Expression b.t would match bat, bet, bit, and bot, but not boot, BAT, BET, etc. Because the substring might be part of a longer word, b.t would also match bottle, batch, abbot, etc.

[...]

Matches any single character listed between the brackets.

Examples:

  • The Regular Expression b[aeo]t would match bat, bet, and bot. It would not match bit.

  • Likewise, the expression CUST[A-M] would match any strings containing the range of patterns from CUSTA to CUSTM. This would include CUSTABC, CUSTA123, CUSTBCD, CUSTB123, etc.

  • To match both upper case and lower case occurrences, use the expression CUST[A-Ma-m].

  • You may also combine ranges of letters and numbers in the same expression. To locate product categories in which PRODCAT is followed by A-B or 1-5, you would use PRODCAT[A-B1-5]. This expression would match PRODCATABC, PRODCATB123, PRODCAT1ABC, PRODCAT2123, etc.

[^...]

Matches any single character except those listed between the brackets.

Examples:

  • Functions like the previous metacharacter, except that here the caret symbol "^" excludes characters from the pattern. For example, the expression b[^ae]t would match the substrings bit and bot, but not bat and bet.

  • To locate all product categories except those at the end of the alphabet, you would use PRODCAT[^X-Z]. If the character following PRODCAT is always an upper case letter, this expression would return the same results as PRODCAT[A-W]. Otherwise, PRODCAT[^X-Z] would also match digits, lower case letters, and any other character that is not X, Y, or Z.
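The single character metacharacters in Table 1.3 can be checked with a short sketch. It uses Python's re module, which agrees with POSIX matching for these constructs. The sample words and product category codes, and the helper name matches, are assumptions for illustration only.

```python
import re

def matches(pattern, values):
    """Return the values that contain a match for pattern anywhere in the string."""
    return [v for v in values if re.search(pattern, v)]

words = ["bat", "bet", "bit", "bot", "boot", "BAT", "bottle"]
any_char = matches("b.t", words)      # any single character between b and t
listed = matches("b[aeo]t", words)    # only a, e, or o between b and t
excluded = matches("b[^ae]t", words)  # anything except a or e between b and t

print(any_char)   # ['bat', 'bet', 'bit', 'bot', 'bottle']
print(listed)     # ['bat', 'bet', 'bot', 'bottle']
print(excluded)   # ['bit', 'bot', 'bottle']

# Hypothetical product category codes.
categories = ["PRODCATABC", "PRODCATB123", "PRODCAT1ABC",
              "PRODCAT6XYZ", "PRODCATXYZ"]
ranged = matches("PRODCAT[A-B1-5]", categories)
not_xyz = matches("PRODCAT[^X-Z]", categories)

print(ranged)   # ['PRODCATABC', 'PRODCATB123', 'PRODCAT1ABC']
print(not_xyz)  # ['PRODCATABC', 'PRODCATB123', 'PRODCAT1ABC', 'PRODCAT6XYZ']
```

The last result shows that a negated list such as [^X-Z] accepts digits as well as letters, so PRODCAT6XYZ is returned.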


Note

xTuple ERP supports pattern matching with Regular Expressions in accordance with the Portable Operating System Interface (POSIX) standard.

Table 1.4. Quantifiers

?

Matches the preceding element zero or one time.

Definition: The term "element" describes both single characters ("A") and also ranges or lists of characters ("[A-Z]", "[acfrg]", "[^2-7]", etc.).

Examples:

  • The expression bo?ut would match two separate substrings: those containing one instance of the preceding "o" and those containing zero instances of the preceding "o". As a result, the following matches would be returned: bout, but, about, butter, etc.

  • CLASS[A-C]? would match any string containing the substrings CLASSA..., CLASSB..., CLASSC.... Because zero instances of [A-C] would also be included, matches would be returned for any string containing the root CLASS, for example, CLASS123, CLASSDEF, etc.

*

Matches the preceding element zero or more times.

Examples:

  • Use a period followed by an asterisk (".*") to match all existing patterns. For example, enter .* to show all Customer Types where pattern matching by Customer Type is called for.

  • The expression CUST* would return Customer Type records for all Customer Types containing the substring CUST. Records would also be returned for all Customer Types containing the substring CUS, since the asterisk also matches the preceding element (in this case, the character "T") zero times. To avoid matching the substring CUS, insert a period before the asterisk. The expression CUST.* will match all strings containing CUST, but not strings that contain only CUS.

+

Matches the preceding element one or more times.

Example:

  • The Regular Expression PRODCAT10+ would match the "0" at the end of the pattern one or more times. Product Categories in which PRODCAT is followed by "10", "100", "1000", etc. would be found, as in PRODCAT10ABC, PRODCAT100ABC, PRODCAT1000ABC, and so on.

{num}

Matches the preceding element num times.

Example:

  • The expression CLASSA{3} would match the "A" at the end of the pattern three times. The following Class Codes would be found: CLASSAAA1, CLASSAAA2, CLASSAAA3, etc.

{min,max}

Matches the preceding element at least min times, but not more than max times.

Example:

  • The expression CLASSA{2,3} would match the "A" at the end of the pattern a minimum of two times and a maximum of three times. The following Class Codes would be found: CLASSAA1, CLASSAA2, etc. and also CLASSAAA1, CLASSAAA2, etc.
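The quantifiers in Table 1.4 can be explored with the sketch below. It is written in Python, whose re module treats ?, *, +, {num}, and {min,max} the same way as POSIX for these patterns. All sample codes and the helper name matches are hypothetical.

```python
import re

def matches(pattern, values):
    """Return the values that contain a match for pattern anywhere in the string."""
    return [v for v in values if re.search(pattern, v)]

optional_o = matches("bo?ut", ["bout", "but", "about", "butter", "boot"])
print(optional_o)  # ['bout', 'but', 'about', 'butter']

cust_codes = ["CUSTA", "CUSA", "CUTA"]
star = matches("CUST*", cust_codes)       # T may occur zero times
dot_star = matches("CUST.*", cust_codes)  # T is required
print(star)      # ['CUSTA', 'CUSA']
print(dot_star)  # ['CUSTA']

prod_codes = ["PRODCAT10ABC", "PRODCAT100ABC", "PRODCAT1ABC", "PRODCAT11"]
one_or_more = matches("PRODCAT10+", prod_codes)
print(one_or_more)  # ['PRODCAT10ABC', 'PRODCAT100ABC']

class_codes = ["CLASSAAA1", "CLASSAA1", "CLASSA1"]
exactly_three = matches("CLASSA{3}", class_codes)
two_to_three = matches("CLASSA{2,3}", class_codes)
print(exactly_three)  # ['CLASSAAA1']
print(two_to_three)   # ['CLASSAAA1', 'CLASSAA1']
```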


Tip

A space between characters is considered a character itself. You should avoid using spaces when writing Regular Expressions, unless the pattern you are matching expressly calls for them.

Table 1.5. Anchors

^

Matches at the start of the line.

Example:

  • Use the caret symbol "^" to specify that the substring you are trying to match occurs specifically at the beginning of a line. For example, the expression ^ITEM would match all lines beginning with the characters "ITEM", as in ITEM100, ITEMABC, etc. The expression ^ITEM would not match the following, since "ITEM" does not occur at the beginning of these lines: MFGITEM, PURCHITEM, REFITEM, etc.

$

Matches at the end of the line.

Example:

  • Use the dollar symbol "$" to specify that the substring you are trying to match occurs specifically at the end of a line. For example, the expression 999$ would match all lines ending with the characters "999", as in ITEM100999, ITEM200999, ITEM300999, etc. The expression 999$ would not match the following, since "999" does not occur at the end of these lines: ITEM999123, ITEM999ABC, etc.
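The anchors in Table 1.5 can be tried with a few lines of Python. This is a sketch under the assumption that each code is a single line with no trailing newline. The item numbers and the helper name matches are invented for illustration.

```python
import re

def matches(pattern, values):
    """Return the values that contain a match for pattern."""
    return [v for v in values if re.search(pattern, v)]

# Hypothetical item numbers.
items = ["ITEM100", "ITEMABC", "MFGITEM", "ITEM100999", "ITEM999123"]

starts_with_item = matches("^ITEM", items)
ends_with_999 = matches("999$", items)
both_anchors = matches("^ITEM.*999$", items)  # anchored at both ends

print(starts_with_item)  # ['ITEM100', 'ITEMABC', 'ITEM100999', 'ITEM999123']
print(ends_with_999)     # ['ITEM100999']
print(both_anchors)      # ['ITEM100999']
```

Combining both anchors with .* in between, as in the last pattern, constrains the whole string rather than just a substring.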


Tip

Build Regular Expressions using trial and error. If your first attempt doesn't yield the desired results, modify the expression and try again.
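One way to practice this trial and error approach is to test a draft expression against a handful of sample codes before entering it in xTuple ERP. The Python sketch below is only an illustration: the helper try_pattern and the sample Customer Types are hypothetical, and Python's re module stands in for the POSIX matching used by the application.

```python
import re

def try_pattern(pattern, samples):
    """Split samples into (matched, unmatched) so a draft pattern can be checked."""
    matched = [s for s in samples if re.search(pattern, s)]
    unmatched = [s for s in samples if s not in matched]
    return matched, unmatched

# Hypothetical Customer Types.
samples = ["CUSTA100", "CUSTM200", "CUSTN300", "OLDCUSTB1"]

# First attempt: also picks up OLDCUSTB1, which may not be wanted.
first = try_pattern("CUST[A-M]", samples)
print(first)   # (['CUSTA100', 'CUSTM200', 'OLDCUSTB1'], ['CUSTN300'])

# Second attempt: anchoring at the start excludes OLDCUSTB1.
second = try_pattern("^CUST[A-M]", samples)
print(second)  # (['CUSTA100', 'CUSTM200'], ['CUSTN300', 'OLDCUSTB1'])
```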

1.2.9.2. Hierarchical Structure

To simplify your pattern matching efforts, you should organize your groupings according to a hierarchical structure. Again, the following groupings support pattern matching using Regular Expressions:

  • Class Codes

  • Planner Codes

  • Item Groups

  • Product Categories

  • Customer Types

  • Customer Groups

  • Vendor Types

Let's consider the Customer Type grouping to illustrate this point about hierarchies. If your Customer Types have been arranged hierarchically, the naming convention will exhibit a logical, sequential order. The following list shows an orderly, hierarchical arrangement of Customer Types:

  • CUSTUSA100, 101, 102, ...

  • CUSTEUROPE100, 101, 102, ...

  • CUSTASIA100, 101, 102, ...

Regular Expressions will find and match any pattern. But writing them will be easier if you arrange your groupings hierarchically, as shown in the example above.
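To see why a hierarchical naming convention keeps expressions short, consider the sketch below. It applies a few patterns to Customer Types named as in the list above. The specific codes and the helper name matches are assumptions, and Python's re module is used in place of the POSIX matching in xTuple ERP.

```python
import re

def matches(pattern, values):
    """Return the values that contain a match for pattern."""
    return [v for v in values if re.search(pattern, v)]

# Customer Types following the hierarchical convention (illustrative values).
customer_types = ["CUSTUSA100", "CUSTUSA101", "CUSTEUROPE100",
                  "CUSTASIA100", "CUSTASIA102"]

all_customers = matches("^CUST", customer_types)          # whole hierarchy
usa_only = matches("^CUSTUSA", customer_types)            # one region
low_numbers = matches("^CUST.*10[0-1]$", customer_types)  # 100-101 in any region

print(usa_only)     # ['CUSTUSA100', 'CUSTUSA101']
print(low_numbers)  # ['CUSTUSA100', 'CUSTUSA101', 'CUSTEUROPE100', 'CUSTASIA100']
```

Because every level of the hierarchy occupies a predictable position in the name, each selection needs only a short anchored pattern.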



 
Copyright © 1998-2008 by xTuple. All rights reserved. 
 
