Need help scraping content and deleting duplicates in Zenno MP

veeco

Client
Registered: 27.05.2011
Posts: 112
Thanks: 1
Points: 18
Hello, I'm working on a simple Zenno project for scraping content. As a test case, I want to scrape Google qualified individuals in a specific country.
Here's the main flow:
1. Go to http://google.starttest.com/ and click "Search for Qualified Individuals".
2. Choose the country manually.
3. Grab the certification types into a list variable.
4. Start looping over each certification.
5. On the first pass, select the first certification.
6. Click "Search".
7. Scrape each row (with regex).
8. Remove duplicates.
9. Click the first name.
10. On each detail page, grab the first name, last name, and company.
11. Go back to the list.
12. Repeat from step 7 until the list ends.
13. Loop back to step 5 with the next selection, until there are none left.

Now I'm stuck on:
No. 4: how do I loop over the list?
No. 7: the First Name and Last Name are the same link. I take the page source (HTML data), apply a regex, and add the results to the list. When I test in debug and click on my list name, it is empty. Why?

That's all for now. I'm still stuck, so I don't have more questions at the moment.
 

veeco

To explain my no. 4 problem:
The page source looks like this:

<select name="Certification">
<option value=""> Select One </option>
<option value="Google Adwords Qualified - Display Advertising">Google Adwords Qualified - Display Advertising</option>
<option value="Google AdWords Qualified - Reporting and Analysis">Google AdWords Qualified - Reporting and Analysis</option>
<option value="Google AdWords Qualified - Search Advertising">Google AdWords Qualified - Search Advertising</option>
<option value="Google Analytics Individual Qualification ">Google Analytics Individual Qualification </option>
<option value="Google Apps Qualified">Google Apps Qualified</option>
</select>

So basically I need Zenno to loop over each option but skip the one where value="" (the "Select One" option).
 

bigcajones

Client
Registered: 09.02.2011
Posts: 1,216
Thanks: 682
Points: 113
(?<=<option\ value=")\w.*(?=")
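
As a quick check of the regex above against the markup quoted earlier in the thread, here is a minimal sketch. It uses Python's `re` module as a stand-in for the ZennoPoster regex tester, which is an assumption; the two engines may not behave identically in every case.

```python
import re

# Sample markup copied from the earlier post in this thread.
html = """<select name="Certification">
<option value=""> Select One </option>
<option value="Google Adwords Qualified - Display Advertising">Google Adwords Qualified - Display Advertising</option>
<option value="Google AdWords Qualified - Reporting and Analysis">Google AdWords Qualified - Reporting and Analysis</option>
<option value="Google AdWords Qualified - Search Advertising">Google AdWords Qualified - Search Advertising</option>
<option value="Google Analytics Individual Qualification ">Google Analytics Individual Qualification </option>
<option value="Google Apps Qualified">Google Apps Qualified</option>
</select>"""

# The regex from this post, unchanged. The \w right after the lookbehind
# rejects value="" because the next character there is a quote.
pattern = re.compile(r'(?<=<option\ value=")\w.*(?=")')
values = pattern.findall(html)

for value in values:
    print(repr(value))  # repr() makes trailing spaces visible
```

Two caveats: the pattern relies on each option sitting on its own line, and it matches options from any dropdown on the page, not only the one named Certification.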
 

veeco

Thanks for the feedback, but that doesn't solve my issue.

It should:
- select based on the name attribute (name="Certification"), because there are other select dropdowns
- count how many <option> elements are in the select element
- start the loop

Any ideas?
 

veeco

I'm pretty sure it works in the regex designer, but it fails in debug. Please see the attached file and let me know where I went wrong.
I'm thinking of parsing this element in three steps:
1. get the innerHTML of the <select>
2. parse the values
3. count how many values exist
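
The three steps above could be sketched roughly like this. It is Python rather than a ZennoPoster action, and the `Country` dropdown in the sample is a hypothetical stand-in for the "other select dropdowns" on the real page, whose markup is not shown in this thread.

```python
import re

# Trimmed sample page. The Country <select> is hypothetical; the
# Certification options are taken from the markup quoted in the thread.
page = """<select name="Country">
<option value=""> Select One </option>
<option value="Indonesia">Indonesia</option>
</select>
<select name="Certification">
<option value=""> Select One </option>
<option value="Google Adwords Qualified - Display Advertising">Google Adwords Qualified - Display Advertising</option>
<option value="Google Apps Qualified">Google Apps Qualified</option>
</select>"""

# Step 1: isolate the inner HTML of the <select> named Certification only.
select_re = re.compile(
    r'<select\s+name="Certification"[^>]*>(.*?)</select>',
    re.DOTALL | re.IGNORECASE,
)
match = select_re.search(page)
inner_html = match.group(1) if match else ""

# Step 2: parse the values. [^"]+ needs at least one character,
# so the value="" placeholder option is skipped.
values = re.findall(r'<option\s+value="([^"]+)"', inner_html)

# Step 3: count the values, then loop over them.
count = len(values)
for index, value in enumerate(values, start=1):
    print(f"{index}/{count}: {value}")
```

Scoping to the named dropdown first keeps options from other dropdowns out of the list, so the count matches the number of loops needed.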

 


bigcajones

New regex: (?<=<option\ value=")Google.*?(?=">). Put all results into a list, then use list processing to count the rows, which tells you how many loops you need.

On the page where the names are, use a regex to find the ID and the code: (?<='id=).*(?='). Put these into a list, remove duplicates, then navigate to https://www.starttest.com/9.0.0.0/searchcert.aspx?cmd=detail&id={-Variable.Name-} and scrape. When the list is empty, navigate back to the main page and use the next <option> result to go to the next set of people.
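
The dedupe-then-visit part of this could look something like the sketch below, written in Python for illustration rather than as ZennoPoster actions. The IDs are made up; on the real page they would come from the id regex, and each one shows up twice because the first name and last name link to the same detail page.

```python
from collections import deque

# Detail URL pattern from this post, with the variable as a placeholder.
DETAIL_URL = "https://www.starttest.com/9.0.0.0/searchcert.aspx?cmd=detail&id={id}"

# Hypothetical IDs, as if scraped from one results page.
scraped_ids = ["1001", "1001", "1002", "1002", "1003", "1003"]

# Remove duplicates while keeping the original row order.
unique_ids = list(dict.fromkeys(scraped_ids))

# Work through the list until it is empty, as described above.
queue = deque(unique_ids)
visited = []
while queue:
    person_id = queue.popleft()
    url = DETAIL_URL.format(id=person_id)
    visited.append(url)  # a real project would navigate and scrape here

print(len(unique_ids), "detail pages to visit")
```

Once the queue is empty, the outer loop would move on to the next certification value.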
 
