Improve scraping from proxy sources

Stickado1

Client
Registered
28.05.2011
Messages
27
Thanks
0
Points
1
Hello,

I would love to have support for scraping proxies from website lists like this one http://www.hidemyass.com/proxy-list/1 or http://www.samair.ru/proxy/proxy-16.htm

Maybe it could follow some links to scrape more proxies. I'm sure you can come up with something innovative to get even more of them.


Right now I'm using hrefer for that task and then using the xproxy file hrefer generates, but that method isn't very good.


//EDIT: Hmm, I just noticed ZennoPoster already has this feature. I just need to get into that regex thing.
 

Stereomike

Client
Registered
29.03.2011
Messages
221
Thanks
30
Points
0
True, just build a template for scraping those lists and run it every x hours.
Your template should save the proxies to a .txt file. Then, in ZP, use the sources wizard to load this file every x hours.
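
Outside of ZennoPoster, the "save them to a .txt file" step could look roughly like the sketch below. It assumes the scraped entries are already `ip:port` strings; the function name `save_proxies` and the file name are made up for illustration and are not part of any ZP template.

```python
from pathlib import Path


def save_proxies(proxies, path):
    """Write unique ip:port entries to a text file, one per line.

    Returns the number of entries written. Order of first appearance is kept.
    """
    # Strip whitespace, drop empty entries, and de-duplicate while keeping order.
    cleaned = (p.strip() for p in proxies)
    unique = list(dict.fromkeys(p for p in cleaned if p))
    text = "\n".join(unique) + ("\n" if unique else "")
    Path(path).write_text(text, encoding="utf-8")
    return len(unique)


if __name__ == "__main__":
    # Hypothetical file name; point the sources wizard at whatever path you use.
    count = save_proxies(["137.165.1.113:3124", "137.165.1.113:3124"], "proxies.txt")
    print(count, "proxies saved")
```

One entry per line keeps the file easy to reload on a schedule.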
 

gcomm

Client
Registered
01.03.2011
Messages
332
Thanks
93
Points
28
Question: here > http://nntime.com/proxy-country/United-States-01.htm the source shows the IP, which is no problem to parse out. However, the port # is bolded, in this case representing port :3124.

Off the top of my head I can't think of how to convert it... Help?

<tr class="even"><td><input type="checkbox" name="c12" id="row12" value="13535377.165.1.11391113413124" onclick="choice()" /></td><td>137.165.1.113<script type="text/javascript">document.write(":"+w+p+k+j)</script></td>
<td>CoDeen/PlanetLab?</td>

Problem: (":"+w+p+k+j) should come out as :3124
 

Stereomike

Client
Registered
29.03.2011
Messages
221
Thanks
30
Points
0
-Get page (text)
-Use this to get all IPs (incl. port):
{-RegExp.RegExp-|-{-FieldData.FieldData-|-proxy-|-pagetext-}-|-(\d{1,3}\.){3}\d{1,3}\:\d+-|-0;end-}
-Save this to a text file.
-Load it with the sources wizard.
-Done!
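
For anyone who wants to try the same pattern outside ZP, here is a minimal Python sketch of the regex step. It assumes you already have the rendered page text (what the browser shows, not the raw HTML); `extract_proxies` and the sample string are illustrative only. The group is made non-capturing because Python's `findall` would otherwise return only the group contents instead of the whole match.

```python
import re

# Same pattern as in the macro above, with a non-capturing group.
PROXY_RE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}:\d+")


def extract_proxies(page_text):
    """Return every ip:port string found in the given page text."""
    return PROXY_RE.findall(page_text)


if __name__ == "__main__":
    # Made-up sample of rendered text, not copied from a real page.
    sample = "137.165.1.113:3124 CoDeen/PlanetLab\n10.0.0.1:8080 other"
    for proxy in extract_proxies(sample):
        print(proxy)
```

Note that the pattern does not check that each octet is at most 255, so it is a loose filter rather than a validator.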
 

Stereomike

Client
Registered
29.03.2011
Messages
221
Thanks
30
Points
0
In fact I had to laugh, because they obviously try to obfuscate the address, but a simple clean-text scrape does the trick!
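
If you only have the raw HTML and not the rendered text, the port has to be rebuilt from the script variables instead. The sketch below is a guess at how that could be done: it assumes the page defines single-digit variables somewhere in a form like `w=3;p=1;k=2;j=4` (that assignment format is an assumption, it is not shown in this thread), and `decode_port` is a made-up helper name.

```python
import re


def decode_port(assignments, expression):
    """Rebuild a port from digit variables.

    assignments: text such as 'w=3;p=1;k=2;j=4' (assumed format).
    expression:  the document.write argument, e.g. '":"+w+p+k+j'.
    """
    # Map each single-character variable name to its single digit.
    digits = dict(re.findall(r"(\w)=(\d)", assignments))
    # Collect the variable names in the order they are concatenated.
    names = re.findall(r"\+(\w)", expression)
    return "".join(digits[name] for name in names)


if __name__ == "__main__":
    # Hypothetical assignments chosen to reproduce the :3124 example.
    port = decode_port("w=3;p=1;k=2;j=4", '":"+w+p+k+j')
    print("137.165.1.113:" + port)
```

Scraping the rendered text is still simpler, since the browser has already run the script for you.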
 

bange

Client
Registered
16.04.2011
Messages
37
Thanks
2
Points
8
Hi, I tried your regex
but can't get it to work. Can you help me, please?
 

Stereomike

Client
Registered
29.03.2011
Messages
221
Thanks
30
Points
0
Sure, what's the problem?
I tested my solution before I posted it, and it worked for me. Did you alter something? Is it a different site?
 
