Dynamic Random Inner Link Clicking

reislet

Client
Joined: 02.04.2012 | Messages: 33 | Thanks received: 1 | Points: 0
Good title huh? :P

My objective:
Go to a website, click a random inner link, then a random inner link on the page that opens, and so on, N times.

The problem:
Each page has a different number of inner links, which is why it needs to be dynamic.

So far:
Thanks to a previous thread in this group, I know how to get the DOM text and how to get the number of regex matches.

Main Problem:
When using Rise event with Click, I don't get the same number of links with the regex.
This is what I was using: href.*html. The parser gives x results, but the branch builder gives 0.
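One possible contributor to mismatched counts (an assumption; the thread never confirms the cause) is that href.*html is greedy: on a line holding several links, ".*" runs to the last "html" on that line, so several links collapse into one match. The sketch below uses Python's re module on a made-up HTML snippet to compare it with a non-greedy pattern that captures only the quoted href value.

```python
import re

# Made-up sample: two links on ONE line (common in minified HTML), one on the next.
html = '<a href="/a.html">A</a> <a href="/b.html">B</a>\n<a href="/c.html">C</a>'

# Greedy: ".*" stretches to the LAST "html" on each line, merging /a.html and /b.html.
greedy = re.findall(r'href.*html', html)

# Non-greedy, and captures only the attribute value between the quotes.
lazy = re.findall(r'href="([^"]*?\.html)"', html)

print(len(greedy), greedy)  # 2 matches for 3 links
print(len(lazy), lazy)      # 3 matches: ['/a.html', '/b.html', '/c.html']
```

Whichever tool runs the regex, a pattern that stops at the closing quote gives one match per link.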

Any ideas or tips are welcome. Is the currency over here boobs?
 

rostonix

Well-known member
Joined: 23.12.2011 | Messages: 29,067 | Thanks received: 5,707 | Points: 113
1) Get the page source
2) Parse all links with a regex (inside <a> tags)
3) Save them to a .txt file
4) Extract a random line from that file
5) Rise event on the <a> tag whose href is the result of step 4's macro
6) Delete the .txt file with the links
7) Loop as many times as you need
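The seven steps can be sketched outside ZennoPoster as a plain Python loop. This is a minimal sketch under several assumptions: `extract_links` and `random_walk` are hypothetical names, a regex stands in for a real HTML parser, an in-memory list replaces the .txt file (steps 3 and 6), and `fetch` is a caller-supplied function that returns a page's HTML. It loads each chosen URL rather than raising a browser click, so it may not register as a click the way step 5 would.

```python
import random
import re
from typing import Callable, List
from urllib.parse import urljoin, urlparse

# Step 2: href values inside <a ...> tags (simple regex, not a full HTML parser).
HREF_RE = re.compile(r'<a\s[^>]*?href="([^"]+)"', re.IGNORECASE)


def extract_links(html: str, base_url: str) -> List[str]:
    """Return same-host absolute URLs for every <a href="..."> on the page."""
    host = urlparse(base_url).netloc
    links = []
    for href in HREF_RE.findall(html):
        absolute = urljoin(base_url, href)       # resolve relative hrefs
        if urlparse(absolute).netloc == host:    # keep inner links only
            links.append(absolute)
    return links


def random_walk(start_url: str, n: int,
                fetch: Callable[[str], str],
                rng: random.Random) -> List[str]:
    """Follow a random inner link n times, re-parsing every new page."""
    visited = [start_url]
    url = start_url
    for _ in range(n):                              # step 7: loop n times
        links = extract_links(fetch(url), url)      # steps 1-2, fresh list per page
        if not links:                               # dead end: no inner links
            break
        url = rng.choice(links)                     # step 4: random link
        visited.append(url)                         # step 5 would click it instead
    return visited
```

For a quick test against a live site, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")`. Because the link list is rebuilt on every page, a link that exists only on an earlier page is never chosen.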

Just a theory
 

reislet

Hey, thanks for the response. I was away for the weekend and didn't check the forum.

The idea is good but I need to modify it a bit.

It goes to domain.com.
From domain.com it can go to -> domain.com/a.html; domain.com/b.html; domain.com/c.html
It randomly picks one (let's say b). From domain.com/b.html it can go to -> domain.com/a.html; domain.com/x.html; domain.com/y.html

So what I'm trying to say is that a link found on one page might not be present on the next one.

I just have to figure out how to avoid clicking on RSS feeds or 301 redirects.
 
Joined: 26.03.2012 | Messages: 44 | Thanks received: 6 | Points: 0
Do you actually want it to register as clicks on the site, or do you just want to visit each of the URLs on the site in random order?

If you start on page A and find links B and C, then click link B but afterwards want to click link C, which does not exist on page B, you could:

Save the URL of link C before going to page B, then load page C by its URL without actually clicking the link
or
Return to page A to click link C
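The first option (remember link C and load it later by URL) amounts to keeping a pool of every link discovered so far instead of only the current page's links. The sketch below is a hypothetical illustration: `walk_with_saved_links` and `get_links` are invented names, and `get_links` is assumed to return the inner links of a page. Each step picks a random unvisited URL from the pool and loads it directly, so nothing is clicked.

```python
import random
from typing import Callable, List, Set


def walk_with_saved_links(start: str, n: int,
                          get_links: Callable[[str], List[str]],
                          rng: random.Random) -> List[str]:
    """Visit up to n pages, picking from links saved on ANY page seen so far."""
    pool: List[str] = []      # saved, not-yet-visited URLs (e.g. link C)
    seen: Set[str] = {start}  # avoids saving the same URL twice
    visited = [start]
    url = start
    for _ in range(n):
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                pool.append(link)
        if not pool:                               # every discovered page visited
            break
        url = pool.pop(rng.randrange(len(pool)))   # load by URL, no click
        visited.append(url)
    return visited
```

Because the pool outlives each page, link C is still reachable after moving to page B, even though B does not link to it.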

Alternatively, if you don't need to click every link on page A and just want it to keep clicking through whatever links are on the current page, you could set up a loop containing the steps Rostonix gave you. On each new page, it would clear out the text file (removing the old links), repeat the steps to gather the new links, and then click one of them at random. Wrap all of this in a loop with a counter that exits once it has clicked as many links as you want it to visit.


As for RSS feeds (and redirects, depending on how they're formatted), you should be able to write a regex that makes sure the link does not contain "rss", "feed" or similar (perhaps using a negative lookahead). I'm new to regex, so I'd have to experiment for a bit to figure out exactly what's needed.
 

reislet

It would be the B alternative, i.e. actually registering the click, and only clicking a link if it's present on the current page. Looping it is what I was thinking as well.

As for filtering out RSS and 301s: the feeds are usually formatted as domain.com/feed or domain.com/rss, and the redirects as domain.com/redir1 ... Thanks for the lookahead idea, I'll check it out :-)
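A negative lookahead can reject those URL shapes inside the link-parsing regex itself. This is a sketch in Python's re syntax, assuming the feed and redirect links always contain /feed, /rss or /redir as in this post; the pattern name and sample HTML are made up.

```python
import re

# (?!...) fails the match when the href value contains /feed, /rss or /redir.
INNER_LINK_RE = re.compile(r'href="(?![^"]*/(?:feed|rss|redir))([^"]+)"')

html = ('<a href="/a.html">a</a><a href="/feed">f</a><a href="/rss">r</a>'
        '<a href="/redir1">x</a><a href="/b.html">b</a>')

print(INNER_LINK_RE.findall(html))  # ['/a.html', '/b.html']
```

Two caveats: the check is substring-based, so a page such as /feedback.html is rejected too, and a 301 whose URL does not follow the /redir pattern cannot be spotted from the link text at all (that would need a look at the HTTP status).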
 

reislet

Hmm, I kind of got stuck at step 5.
Rise event, click:
group: 1; attribute name: href; attribute value: ???; search type: regexp; match#: the random value I got from the previous steps.

EDIT: I made a rough, simplified template. If I had boobs I would bribe you with them, but any help is welcome :-)
 


reislet

Finally I got a solution for this. It ain't pretty, it ain't fast, but it does the job.
The steps:
- Get the DOM
- Parse URLs twice (I didn't know how to combine them into one regex, so I made two)
- Parse the DOM again for .html links
- Check the extracted links against a blacklist
- Clean empty lines from the file
- Get the number of links parsed
- Get a random link from the parsed results
- Turn the random link into a regex and feed it to the Rise event click's attribute value
- Done :-)
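The same pipeline can be sketched in Python to show what the "attribute value" regex might look like. Everything here is an assumption for illustration: `pick_link_regex`, `BLACKLIST`, `URL_RE` and `HTML_RE` are hypothetical names, the blacklist reuses the /feed, /rss and /redir shapes from earlier in the thread, and the de-duplication step is an addition (a link found by both passes would otherwise be twice as likely to be picked). The ^...$ anchors assume the Rise event regexp is tested against the whole href value; how ZennoPoster actually matches it is not confirmed here.

```python
import random
import re
from typing import List, Optional

# Hypothetical blacklist based on the feed/rss/redir URL shapes in the thread.
BLACKLIST = ("/feed", "/rss", "/redir")

URL_RE = re.compile(r'href="(https?://[^"]+)"')   # pass 1: absolute URLs
HTML_RE = re.compile(r'href="([^"]+?\.html)"')    # pass 2: links ending in .html


def pick_link_regex(dom: str, rng: random.Random) -> Optional[str]:
    """Return an anchored regex for one random, non-blacklisted link, or None."""
    # Parse the DOM twice and merge; dict.fromkeys drops duplicates, keeps order.
    found = list(dict.fromkeys(URL_RE.findall(dom) + HTML_RE.findall(dom)))
    # Check against the blacklist and drop empty entries ("clean empty lines").
    links: List[str] = [u.strip() for u in found
                        if u.strip() and not any(b in u for b in BLACKLIST)]
    count = len(links)                     # number of links parsed
    if count == 0:
        return None
    chosen = links[rng.randrange(count)]   # random link
    # Escape ".", "?", "&" and other metacharacters so the URL matches literally.
    return "^" + re.escape(chosen) + "$"
```

Escaping is the part that matters for the "attribute value: ???" question: an unescaped URL such as page.html?id=1 would treat "." and "?" as regex operators and could match the wrong element or none.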
 
