Regular expressions

What are regular expressions

A regular expression is a search pattern that describes a substring within a string. For instance, you may want to find all words in a text that begin with the letter 'a', or all words of at least four letters. In ZennoPoster, regular expressions help you, for example, to find a confirmation link while processing email, or a text captcha on a web page. Building a parser without regular expressions is rarely practical.
Regular expressions are actually quite simple; you only need to know a bit of syntax (and because ZennoPoster includes a Regexp designer, even that is not strictly required).

Where and how are regular expressions used in ZennoPoster

  • Find a substring in the text file.
  • Find a confirmation email in a mailbox.
  • Find links to confirm account registration.
  • Find strings to remove from the list.
  • Web page parsing.
  • Other useful features.

How to quickly create a regular expression in ZennoPoster

To create regular expressions, you can use the built-in helper, the Regular expressions designer. Open it by pressing the Regexp tester button in the advanced editor menu, or from the receive-mail window by pressing This is not what I need.

Paste the text to be parsed into the left pane of the window that opens. Most often you can build a regular expression from the start or end of the text you are looking for, together with the text that stands immediately before or after it. For this purpose there are four fields below the regular expression; when you edit any of them, the regular expression changes accordingly.
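
The idea of building an expression from the text before and after the target can be sketched in Python. This is an illustration only: ZennoPoster runs on the .NET regex engine, the helper name `between` is hypothetical, and the lookaround pattern is an assumption about the kind of expression such a designer could produce, not its actual output.

```python
import re

def between(before: str, after: str) -> str:
    """Hypothetical helper: pattern for the text between two literal markers."""
    # re.escape keeps special characters in the markers literal.
    # Assumption: a lookbehind/lookahead pair stands in for the "text before"
    # and "text after" fields; the real designer may generate something else.
    return f"(?<={re.escape(before)}).*?(?={re.escape(after)})"

text = 'Confirm here: <a href="http://example.com/confirm?id=42">link</a>'
pattern = between('href="', '"')
links = re.findall(pattern, text)
print(links)
```

Because the markers sit in lookarounds, they are required for a match but are not part of the returned text.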

In the middle of the designer there are options that control how the middle of the search text is matched. If you select Enable line breaks, the search text may contain line breaks; if the box is not checked, the search stays within one line. There is also a Shortest match flag: when it is on, each result is the shortest substring that matches the composed expression; when it is off, the longest.
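
The difference between shortest and longest matching, and the effect of allowing line breaks, can be sketched in Python. Mapping the two checkboxes to a lazy quantifier (`.*?`) and to the `re.DOTALL` flag is an assumption made for illustration; ZennoPoster itself uses the .NET engine.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Lazy quantifier: each match ends at the first possible closing tag.
shortest = re.findall(r"<b>.*?</b>", text)
# Greedy quantifier: the match runs to the last possible closing tag.
longest = re.findall(r"<b>.*</b>", text)

multiline = "<p>first\nsecond</p>"
# By default '.' does not match '\n', so nothing is found across lines.
one_line = re.findall(r"<p>.*?</p>", multiline)
# With DOTALL, '.' also matches '\n' and the match may span lines.
with_breaks = re.findall(r"<p>.*?</p>", multiline, re.DOTALL)

print(shortest, longest, one_line, with_breaks)
```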

Click Test, and the parsing results appear in the right pane (if there are matches). There may be several matches; they are separated by numbers. If everything is correct, including the search text, take the regular expression from the top field. If the result is wrong, try changing your search criteria.

Above the regular expression field there is a History button; it stores the regular expressions you have produced so that you can reuse them later.

The program lets you use groups in regular expressions and save several results at once. The results can then be saved by group into variables, with a choice of the match number, or into a table, with the ability to exclude columns.
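
As a rough illustration of groups, the Python sketch below captures two values per match; each match then behaves like a table row and each group like a column. The sample text and field names are made up for the example.

```python
import re

# Made-up sample data: one record per line.
text = "login: alice, mail: alice@example.com\nlogin: bob, mail: bob@example.com"

# Two groups: group 1 is the login, group 2 is the mail address.
pattern = re.compile(r"login: (\w+), mail: (\S+)")

rows = pattern.findall(text)            # one tuple per match, one item per group
logins = [row[0] for row in rows]       # "column" built from group 1
second_mail = rows[1][1]                # group 2 of the second match

print(rows, logins, second_mail)
```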

Note

A regular expression finds as many matching substrings as the text contains. If you need specific matches only, use ranges.
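
The note above can be sketched in Python: the expression returns every match, and a specific match or a subset is picked afterwards. Python list indexing and slicing only approximate the idea; the exact range syntax used in ZennoPoster is not shown here.

```python
import re

text = "id=10; id=20; id=30; id=40"

matches = re.findall(r"\d+", text)   # every match in the text
first = matches[0]                   # a single specific match
middle = matches[1:3]                # a contiguous subset of matches

print(matches, first, middle)
```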

Basic syntax

You can also try writing a regular expression yourself, using the following tips:

The simplest regular expression can be written as follows: abc
This expression matches the string abc. That is, a regular expression that consists only of letters, without any special characters, searches for that literal text.

Square brackets limit the search to the characters enclosed in them: [abc]
In this case a substring consisting of a single letter a, b or c is found. For example, the regular expression [abc]d finds ad, bd, cd, or nothing (if the text contains no such sequence).
A dot in a regular expression matches any character except '\n'. That is, the regular expression . finds any character except a line break, the regular expression ... finds any three-character string, and the regular expression ab.. finds a four-character substring that starts with ab.
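
A minimal Python sketch of the dot behaviour described above (the sample text is made up; the .NET engine used by ZennoPoster treats the dot the same way by default):

```python
import re

text = "abcd abxy ab\ncd"

# 'ab' followed by any two characters, neither of which may be a line break.
four = re.findall(r"ab..", text)
# A single dot skips the line break between 'a' and 'b'.
singles = re.findall(r".", "a\nb")

print(four, singles)
```

The third `ab` is not matched because the character right after it is a line break.
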
In a regular expression you can use the | symbol as the OR operator. For example, the following regular expression searches a string for the substrings ru, com or net: (ru|com|net)
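
The alternation above, tried in Python on a made-up list of host names:

```python
import re

text = "site.ru shop.com blog.net docs.org"

# Each match is whichever alternative was found at that position.
zones = re.findall(r"(ru|com|net)", text)

print(zones)
```
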
To exclude a set of characters from the search, put the ^ symbol at the beginning of the brackets, for example: [^abcd] (or [^a-d]) matches any character except a, b, c, d. Note: the ^ symbol must be inside the square brackets, since only there does it mean negation.
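
Bracket expressions and their negated form can be checked with a short Python sketch (sample strings are made up):

```python
import re

# One character from the set, followed by 'd'.
in_class = re.findall(r"[abc]d", "ad bd cd dd ed")
# Any single character that is NOT in the range a-d.
not_in_class = re.findall(r"[^a-d]", "abcdef")

print(in_class, not_in_class)
```
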
The number of repetitions can be specified with the symbols +, ? and *, for instance:

a+ - one or more letters a (the strings aaaa and aa match this expression, but the string hello does not)
a? - zero or one letter a. For instance, the regular expression 123a? finds any substring that starts with 123 and may or may not end with a
a* - any number of letters a in a row (including none)
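
The three quantifiers can be compared side by side in Python; the test strings are made up for the example.

```python
import re

# a+ : one or more 'a'
plus_ok = re.fullmatch(r"a+", "aaaa") is not None
plus_none = re.search(r"a+", "hello") is None      # 'hello' contains no 'a'

# a? : zero or one 'a', so both '123a' and plain '123' are found
optional = re.findall(r"123a?", "123a 123b")

# a* : any number of 'a', including none
star = re.findall(r"ba*", "b ba baaa")

print(plus_ok, plus_none, optional, star)
```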

You can specify the exact number of repetitions or a range, for example:
xy{2} - matches a string where x is followed by exactly two y
xy{2,} - matches a string where x is followed by at least two y (there can be more)
xy{2,6} - matches a string where x is followed by two to six y

To specify the number of repetitions of a sequence rather than of a single character, use parentheses:
x(yz){2,6} - matches a string where x is followed by two to six yz sequences;
x(yz)* - matches a string where x is followed by any number of yz sequences in a row (including none);
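
A short Python sketch of the counted quantifiers above. The helper `full` is hypothetical and only wraps `re.fullmatch` so that each check reads as true or false for the whole string.

```python
import re

def full(pattern: str, s: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    return re.fullmatch(pattern, s) is not None

checks = {
    "exactly two y":      full(r"xy{2}", "xyy"),
    "three y is too many": not full(r"xy{2}", "xyyy"),
    "at least two y":     full(r"xy{2,}", "xyyyy"),
    "seven y exceeds six": not full(r"xy{2,6}", "x" + "y" * 7),
    "two yz groups":      full(r"x(yz){2,6}", "xyzyz"),
    "zero yz groups":     full(r"x(yz)*", "x"),
}

print(checks)
```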

In a regular expression you can specify whether a subexpression must occur at the beginning of the line, at the end of the line, or both. The ^ character matches the start of the line, and the dollar sign $ matches the end of the line:
^xy - matches any string that starts with xy. Note that in this case ^ is placed outside the square brackets, for instance ^[a-z]

xy$ - matches any string that ends with xy
When the expression has to match text containing special characters such as $, ^, { etc., put a backslash \ in front of them. For instance, to find $ in a line, the regular expression has to contain \$
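
Anchors and escaping can be tried out in Python as follows. One assumption to keep in mind: in Python, as in .NET, ^ and $ refer to the whole input by default and only apply to every line when a multiline option is switched on (`re.MULTILINE` here).

```python
import re

starts = re.search(r"^xy", "xyz") is not None       # 'xy' at the very start
not_start = re.search(r"^xy", "axy") is None        # 'xy' is not at the start
ends = re.search(r"xy$", "axy") is not None         # 'xy' at the very end

# Without MULTILINE only the first line can match '^'; with it, every line can.
default_mode = re.findall(r"^xy", "xy\nxy")
per_line = re.findall(r"^xy", "xy\nxy", re.MULTILINE)

# A backslash turns '$' into an ordinary character.
prices = re.findall(r"\$\d+", "price: $15, tax: $3")

print(starts, not_start, ends, default_mode, per_line, prices)
```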


A few more wildcards:
\w Word character (letter, digit or underscore)
\W Non-word character (not a letter, digit or underscore)
\d Decimal digit
\D Not a decimal digit
\s Whitespace (space, \f, \n, \r, \t, \v)
\S Not whitespace (not space, not \f, not \n, not \r, not \t, not \v)
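
The shorthand classes above behave as follows in a small Python sketch (the sample string is made up; note that the underscore counts as a word character):

```python
import re

text = "Order 66:\tshipped_ok"

digits = re.findall(r"\d+", text)     # runs of decimal digits
words = re.findall(r"\w+", text)      # runs of letters, digits and underscores
non_words = re.findall(r"\W", text)   # everything that is not a word character
parts = re.split(r"\s+", text)        # split on runs of whitespace

print(digits, words, non_words, parts)
```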

en/creating-a-regular-expressions.txt · Last modified: 2017/12/15 13:02 by deemer