Execute Multiple Regex / What is the operator?

gcomm

Client
Joined
01.03.2011
Messages
332
Thanks
93
Points
28
I have the source of a webpage.

I would like to run multiple regexes on this page:

> pull specific url(s)
> get h1 tags
> get image tags
etc.

How do I format the regex line to run all of these separate regexes in one loop? What is the operator for this?


{-RegExp.RegExp-|-source of webpage-|-regex 1 (operator?) regex 2 (operator?) regex 3 (operator?) etc-|-all-}
 

bigcajones

Client
Joined
09.02.2011
Messages
1 216
Thanks
683
Points
113
Code:
http://introcs.cs.princeton.edu/java/72regular/
and
Code:
http://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator
Don't know if that's what you are looking for. You could just break them down into individual steps.
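
To illustrate the "individual steps" idea, here is a minimal Python sketch rather than the macro syntax from the first post. The pattern names and the patterns themselves are assumptions made for illustration, and regex-based HTML matching like this only holds up on simple, predictable markup.

```python
import re

# Hypothetical patterns, one per kind of data to pull from the page.
PATTERNS = {
    "urls": re.compile(r'href="([^"]+)"', re.IGNORECASE),
    "h1": re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL),
    "images": re.compile(r'<img[^>]*\bsrc="([^"]+)"', re.IGNORECASE),
}


def run_each(source):
    """Apply every pattern to the whole page as its own step."""
    return {name: pattern.findall(source) for name, pattern in PATTERNS.items()}


# Made-up page source, just to show the shape of the result.
page = '<h1>Title</h1><a href="http://example.com/a">a</a><img src="pic.png" alt="">'
results = run_each(page)
```

Each key in `results` holds the matches of one regex, so the URLs, h1 tags and image sources stay apart instead of ending up in one mixed list.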
 

gcomm

Client
Регистрация
01.03.2011
Сообщения
332
Благодарностей
93
Баллы
28
I'm thinking this is the operator:

{DO THIS REGEX} && {DO THIS REGEX} && {DO THIS REGEX} && {DO THIS REGEX} && {DO THIS REGEX}

> TAKE RESULTS > APPEND TO A FILE



MODS?
 

Stereomike

Client
Joined
29.03.2011
Messages
221
Thanks
30
Points
0
If you could &&-combine regexes, you would get fewer results:

yellow cat
yellow dog
green cat
green dog

regexp .*green:
green dog
green cat

regexp .*green && .*dog:
green dog

|| would give more results:
regexp .*green || .*dog:
yellow dog
green dog
green cat

_but_ this would lead to a totally messed-up file: everything you searched for ends up in one file, with no way to tell what is what.
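
The narrowing and widening effect can be checked with a small Python sketch. Python's `re` has no `&&` operator, so the AND case is emulated here with chained lookaheads, and the OR case uses plain alternation (`|`). Both are stand-ins chosen for this example, not operators of the macro language from the first post.

```python
import re

lines = ["yellow cat", "yellow dog", "green cat", "green dog"]

# AND: both lookaheads must succeed, so both words must be in the line.
both = re.compile(r"^(?=.*green)(?=.*dog).*$")
# OR: alternation accepts a line that contains either word.
either = re.compile(r"green|dog")

and_hits = [s for s in lines if both.search(s)]
or_hits = [s for s in lines if either.search(s)]
```

Here `and_hits` keeps only "green dog", while `or_hits` keeps three of the four lines.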

A better way is to do this in several passes (each over the whole page). Or, if it's something like eBay, use a regexp to find the repeating sections (e.g. one paragraph per offer) and hold the current section in a branch. Then take this branch and look more closely for specific items such as your h1 tags, the price, etc. Take all the data derived from that section, save it as a single |- delimited line, and move on to the next paragraph. That's how I do it for really complex stuff (and lots of it).
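
The section-by-section approach could look roughly like this in Python. Everything about the markup is an assumption for the sake of the example: the `<div class="offer">` wrapper, the `price` span, the field patterns and the plain `|` delimiter are all hypothetical, and the non-greedy section pattern would break on nested `<div>` tags.

```python
import re

# Assumed markup: every offer sits in its own <div class="offer"> block.
SECTION = re.compile(r'<div class="offer">(.*?)</div>', re.DOTALL)

# Hypothetical per-section fields, in output column order.
FIELDS = [
    re.compile(r"<h1[^>]*>(.*?)</h1>", re.DOTALL),
    re.compile(r'<span class="price">(.*?)</span>', re.DOTALL),
    re.compile(r'<img[^>]*\bsrc="([^"]+)"'),
]

DELIMITER = "|"  # assumed delimiter; swap in whatever the output needs


def extract_rows(source):
    """Find each repeating section, then pull the fields from that section only."""
    rows = []
    for section in SECTION.findall(source):
        values = []
        for field in FIELDS:
            match = field.search(section)
            # An empty column keeps the row aligned when a field is missing.
            values.append(match.group(1).strip() if match else "")
        rows.append(DELIMITER.join(values))
    return rows


# Made-up page source with two offers, the second one without an image.
page = (
    '<div class="offer"><h1>Red bike</h1>'
    '<span class="price">$50</span><img src="bike.png"></div>'
    '<div class="offer"><h1>Blue car</h1>'
    '<span class="price">$900</span></div>'
)
rows = extract_rows(page)
```

Each row holds the data of exactly one section, so the columns stay identifiable. The rows can then be appended to a file, one per line, by opening it in `"a"` mode.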
 