Scraper sometimes repeats same pages

marl0_stanfield · 21.08.2012

I'm pretty sure I've seen the same bug. It's probably due to the ZP cache bug.

rostonix · 21.08.2012

You can delete duplicates from the result list or table, for example.
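
For anyone post-processing results outside the program, here is a minimal sketch of that kind of duplicate removal in Python. The function name and the sample URLs are made up for illustration; it only assumes the scraped results are available as a plain list of strings.

```python
def remove_duplicates(items):
    """Return items with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


# Hypothetical scraped results where one page was visited twice.
scraped = [
    "http://example.com/page1",
    "http://example.com/page2",
    "http://example.com/page1",
]
print(remove_duplicates(scraped))
```

This cleans up the output but does not stop the scraper from revisiting the same page, so it treats the symptom rather than the cause.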

CaptainObvious · 21.08.2012

1. Pull the page URL before reload
2. Pull the page URL after reload
3. Add the URL variables to a logic action: {oldurl}=={newurl}
4. If the URLs are equal, pass through a counter loop before retrying the page reload (saves your ass from an infinite loop).
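
The steps above can be sketched roughly like this in Python. This is not ZP code: `get_url` and `reload_page` are hypothetical callables standing in for whatever reads the current URL and triggers the reload, and `max_retries` is an assumed cap playing the role of the counter loop.

```python
def reload_until_changed(get_url, reload_page, max_retries=5):
    """Reload until the URL differs from the starting one.

    Returns the new URL, or None if it never changed within
    max_retries reloads (the counter prevents an infinite loop).
    """
    old_url = get_url()              # step 1: URL before reload
    for _ in range(max_retries):     # step 4: bounded retry counter
        reload_page()
        new_url = get_url()          # step 2: URL after reload
        if new_url != old_url:       # step 3: compare old and new
            return new_url
    return None
```

Returning `None` after the cap is reached lets the caller decide whether to skip the page or stop the whole run.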

rostonix · 21.08.2012

Another strange bug I noticed: when I assign URLs to a LIST, I was only able to pull 14 URLs into the list... I swapped the URLs around in the list to see what was causing the truncation, but no matter what I did I could only get 14 loaded... This was very strange, but separate from the other issues... I run this on a very, very fast and heavy-duty notebook, so I couldn't see this as a memory limitation...

I think you mean the previews in settings.
These are just previews.
==
Try using GET requests instead of opening pages, and parse the body that you get back.

drvosjeca · 21.08.2012

You are looking at the wrong GET... You need to scroll down and use the HTTP: Get Request action block.

rostonix · 21.08.2012

Your question is unclear. If you just need to scrape data from a URL, use a GET request and then parse the result with regex, just like DOM code but via
Text processing - Regex
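
To show the idea outside the program, here is a minimal Python sketch of "GET, put the body in a variable, then run a regex on that variable". The function names, the sample HTML, and the link pattern are assumptions for illustration only, and a regex like this is a rough tool rather than a real HTML parser.

```python
import re
import urllib.request


def fetch_body(url, timeout=10):
    """Make a plain HTTP GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(body):
    """Pull absolute http(s) links out of href attributes in the body."""
    return re.findall(r'href="(https?://[^"]+)"', body)


# In real use the variable would be filled by the GET request:
#     body = fetch_body("http://example.com/")
# A hard-coded sample keeps this sketch runnable without a network.
body = '<a href="http://example.com/a">A</a> <a href="/local">B</a>'
print(extract_links(body))
```

The regex never fetches anything itself; it only works on the text that the GET request already stored in `body`.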

drvosjeca · 21.08.2012

It is exactly the same...

Even in the video tutorial I made, you will see that all the grabbed text is put into a variable, and then you work from there.

A variable is just a shorter name for us to look at; the program sees everything, the whole content of the variable, so you don't need to worry about that. Watch that scraping video again, and keep in mind that it is the same here: you just don't see the content at the start, but you put it into a variable the same way.

drvosjeca · 21.08.2012

You've got it all wrong...

1. Regex is not pulling anything; it is more like cleaning data (from the data you have already pulled, you extract what you need with it).

2. GET is not forcing anything; it just makes a simple request for data, just like when you open a site in a normal browser.

3. Like I said before, a variable is the same as the data, just shortened for your eyes so it can be manipulated with less trouble (like when you put all your cookies in a jar, it is easier to move the jar around with all the cookies than to move the cookies one by one).

4. GET has no dropdown with text; that dropdown is the encoding! The text comes from the URL you add there... Again, it is the same as opening the page in a browser, just faster.

5. Please try what was suggested before jumping in with questions, otherwise we can talk forever and you will still not have anything done. Trying ==> Learning

rostonix · 22.08.2012

http://www.mediafire.com/?9lob5ldo0raz5j4

1) GET the source code of Google.com
2) Parse it for the word Google

drvosjeca · 22.08.2012

genetrader said:
Show me where a specific tutorial or video or FAQ is on using GET versus GOTO PAGE??? I have been trying, which is why I am asking for help. There is no downloadable manual for MP and I haven't found any information in version 3 that has helped me understand this. I have been trying and trying... it is totally unclear how you use GET once you 'GET' a page and assign it to a variable. Then what? Please tell me how I can use a regular expression on the HTML I get from a page that is assigned to a variable???? If it was straightforward I would figure it out... but it's not. Yes, I am new to your software, but not to Boolean logic and linear step programming.

PLEASE help me and explain what I am doing. I can't figure it out. If it's written somewhere, show me where to read or watch. It is NOT explained in your scraper video.

Thanks again in advance for the time and effort taken. I guarantee you I am not the only one wondering about this, but I have yet to see it explained in a thread in this forum.

Thanks for your time and help.

Hey... I'm sorry if you didn't understand what I was trying to say there...

You can contact me on Skype and I will explain to you how that works :-)

My Skype ID: dejan.jugovic1

Stroks · 01.02.2014

Just to add, I found that scraping Google using GET requests is way faster than the usual way.
