GET request increases the number of errors

zenrast

Client
Joined: 20.03.2013 · Messages: 57 · Thanks: 1 · Points: 8
Hello,

I have a script that does this:
1) I have a list A of 200 URLs
2) I have a list B of 50 domains

Then for each URL in list A (I open each URL), I check whether any of the domains from list B appears on the page.
In other words, I check every URL against every domain.

Now to the problem:
If I run this inside Zenno in "browser" mode, 100% of the threads finish successfully.
But if I use a GET request instead of the "browser", I get about 20% failures (unfinished threads).

I switched to GET because it should be faster, and indeed I get a 30% increase in speed.
The other thing I noticed is that the CPU is at max when I use GET.

The rest of the script is the same: I just changed "go to browser" to "get", and in both cases I save the result into a variable for processing.

Any idea why I get failures with GET?
 
Last edited:

rostonix

Well-known member
Joined: 23.12.2011 · Messages: 29,067 · Thanks: 5,707 · Points: 113
What error do you get?
 

zenrast

There are no errors displayed while I run the script (not in debug mode).
It's just that there is no result in the results table/file for the "threads" that fail.
 

zenrast

Hi,

I think I found the error. It happens when I request a file that is not HTML. For example, I request a URL that points to a PDF file (like cnn.com/documents/cnn.pdf).
In this case the GET method maxes out the CPU and the script starts to produce failures. I suppose "browser" mode somehow handled this, but for now I need to
exclude PDF-like URLs from my list of URLs to check.
 
Reactions: LightWood

rostonix

Retrieve only the headers and check whether it's a PDF or not.
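A minimal sketch of this idea in plain Python (outside Zenno, so the API differs from the GET action in the project): send a HEAD request so the server returns only headers, then inspect the Content-Type. The URL and the 10-second timeout are just placeholders.

```python
from urllib.request import Request, urlopen

def is_pdf(content_type: str) -> bool:
    """Check whether a Content-Type header value indicates a PDF."""
    # The header may carry parameters, e.g. "application/pdf; charset=binary",
    # so compare only the media type before the first ";".
    return content_type.split(";")[0].strip().lower() == "application/pdf"

def head_content_type(url: str) -> str:
    """Fetch only the headers of a URL via a HEAD request (no body downloaded)."""
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=10) as resp:
        return resp.headers.get("Content-Type", "")
```

Usage would be `if is_pdf(head_content_type(url)): skip the URL` before doing the full GET.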
 

rostonix


zenrast

Nice :-)
Can I also check whether the file is HTML? ...because maybe there are more of those non-HTML types, like .doc etc.
BTW: Is retrieving the headers the same amount of "work" (like 5 sec per request) as retrieving all of the data?
 

rostonix


zenrast

For example, GET on 200 URLs is 200 * 5 sec = 1,000 sec.
GET HEADERS + GET ALL = 200 + 200 = 400 requests * 5 sec = 2,000 sec.

Is that correct?
 

rostonix

You GET the headers, check if it's HTML, and only if it is you GET the body.
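The flow above can be sketched in plain Python (again, a sketch under assumed names, not the Zenno actions themselves): HEAD first, and download the body only when the server reports an HTML content type. Note the HEAD request carries no body, so it is usually much cheaper than the 5 sec per full GET assumed above.

```python
from typing import Optional
from urllib.request import Request, urlopen

def is_html(content_type: str) -> bool:
    """Check whether a Content-Type header value indicates an HTML page."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")

def fetch_if_html(url: str, timeout: int = 10) -> Optional[str]:
    """HEAD first; GET the body only when the server says it's HTML."""
    head = Request(url, method="HEAD")
    with urlopen(head, timeout=timeout) as resp:
        if not is_html(resp.headers.get("Content-Type", "")):
            return None  # skip PDFs, .doc files, images, ...
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Some servers mishandle HEAD, so a fallback to GET on a HEAD failure may be worth adding in practice.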
 

zenrast

Yes, I get it. Unfortunately, in 99% of cases it will be HTML, so I will need to make 2 requests most of the time (roughly doubling the request count).
 
