Ranges not Working with Tables?

JanPaul999

Client
Registered
27.06.2013
Messages
56
Thanks
2
Points
8
I first tried: "read cell"...

I set the column to "A", and then experimented with ranges for the rows, like "1-5" or "0-end" or "all", but none of these would work...
I got the error "string was not in correct format".

Ok so I tried "take line" instead. I set it to "specify number" and then set that to a range also like I did with "read cell".
Then I set "to variables", and then I set Column A (which is the only column I need) to be stored in a variable, the problem here though is...

Even though I specified a range, it only puts 1 row of that column into my variable, but I need all of them. :S

So basically I have excel files that have random number of rows (usually a few hundred) and 10 columns. I'm trying to get all the values of column A in a list. I'm not interested in the other columns...

How can I do this?

And why are those methods above not working?
In the documentation it says you can use ranges with tables so I'm confused and have no idea why it won't work.
 

bigcajones

Client
Registered
09.02.2011
Messages
1 216
Thanks
682
Points
113

JanPaul999

I don't know C# bigcajones. I'm attempting to use regex now to extract the column that I need from the CSV file.

My current (only half-working) match pattern is "http.*?(?=")", this gets all URLs in the CSV file.
The problem though is that I only want the first URL on each line, not the second one (each line has 2 URLs).

Can anyone modify this regex to only grab the first URL from each line in the CSV file? Again, my current match pattern is: http.*?(?=")

Would be awesome as I'm struggling with this.

Here is the format of the contents in the CSV file.

"SourceURL","AnchorText","SourceCitationFlow","SourceTrustFlow","FirstIndexedDate","LinkType","LinkSubType","TargetURL","TargetCitationFlow","TargetTrustFlow","FlagRedirect","FlagFrame","FlagNoFollow","FlagImages","FlagDeleted","FlagAltText","LastSeenDate","FlagMention","DateLost","ReasonLost"
"http://twb.oswshop11.nl/shop/pages.php?pageid=8","abc",53,0,08/04/2011 00:00,"TextLink","TextLink_Normal","http://www.abc.nl/",30,16,0,0,0,0,0,0,22/12/2013 00:00,0,,""
"http://likeur.startpagina.nl/","abc",26,24,22/10/2008 00:00,"TextLink","TextLink_Normal","http://www.abc.nl/",30,16,0,0,0,0,0,0,24/12/2013 00:00,0,,""
 
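For comparison with the regex route, here is a minimal Python sketch that reads the first column with a CSV parser instead of a pattern. It assumes the export looks like the sample above (one header row, `SourceURL` as the first field). The function name `first_column` is made up for illustration, and the sample rows are cut down to their first three columns to keep the block short.

```python
import csv
import io

# Rows taken from the sample above, shortened to the first three columns.
SAMPLE = '''"SourceURL","AnchorText","SourceCitationFlow"
"http://twb.oswshop11.nl/shop/pages.php?pageid=8","abc",53
"http://likeur.startpagina.nl/","abc",26
'''


def first_column(csv_text, skip_header=True):
    """Return the first field of every non-empty row as a list."""
    rows = csv.reader(io.StringIO(csv_text))
    if skip_header:
        next(rows, None)  # drop the header row, if there is one
    return [row[0] for row in rows if row]


urls = first_column(SAMPLE)
print(urls)
```

Because the parser splits on the real field boundaries, the number of URLs that appear later in a line does not matter here.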

drvosjeca

Client
Registered
26.10.2011
Messages
512
Thanks
455
Points
63
Did you try this way?

http:.*?(?=",".*http)

It is simple to limit things if they always come in the same format :D
 

JanPaul999

Just tested it out and it's mostly working, but not completely, because in some instances there can be 3 URLs on 1 line, for example this line:

"http://www.kerkgebouwen-in-limburg.nl/view.jsp?content=12427","http://www.elkandre.nl/kapel/kapel.htm",21,10,24/09/2009 00:00,"TextLink","TextLink_Normal","http://www.elkandre.nl/Kapel/kapel.htm",9,6,0,0,0,0,0,0,01/12/2013 00:00,0,,""
Using that modified regex there are two links extracted from that, like this:
http://www.kerkgebouwen-in-limburg.nl/view.jsp?content=12427
http://www.elkandre.nl/kapel/kapel.htm",21,10,24/09/2009 00:00,"TextLink
Any idea how to fix that? I'm trying to understand the regex but not fully grasping it.
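A small Python sketch of what seems to be happening here. The lookahead in `http:.*?(?=",".*http)` only asks for a `","` that is followed by some later `http`, so on a line with three URLs the second URL also qualifies, and the lazy `.*?` stretches it to the next `","` it can find. The pattern and the line are copied from the posts above; running it through Python's `re` module rather than the original tool is an assumption made for illustration.

```python
import re

# drvosjeca's first pattern, unchanged.
PATTERN = re.compile(r'http:.*?(?=",".*http)')

# The three-URL line quoted above.
LINE = (
    '"http://www.kerkgebouwen-in-limburg.nl/view.jsp?content=12427",'
    '"http://www.elkandre.nl/kapel/kapel.htm",21,10,24/09/2009 00:00,'
    '"TextLink","TextLink_Normal","http://www.elkandre.nl/Kapel/kapel.htm",'
    '9,6,0,0,0,0,0,0,01/12/2013 00:00,0,,""'
)

matches = PATTERN.findall(LINE)
for match in matches:
    print(match)
```

The second match ends at `"TextLink` because that is the first point after the second URL where a `","` is still followed by another `http`.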
 

rostonix

Well-known member
Registered
23.12.2011
Messages
29 067
Thanks
5 707
Points
113
You can still use "Read cell" action.
But do not use ranges within it.
Just create a loop and use counter's value in it.
 

lokiys

Moderator
Registered
01.02.2012
Messages
4 770
Thanks
1 182
Points
113
If I understand you correctly, you can do it like this...

Create a variable with a default value of 0.

Use that variable as the row number when you take column A, so on the first pass it reads row 0 (zero).

In the next action, save that A0 value to your pre-defined list.

Then add an action that increases your variable by 1, so it is now 1, and loop back to take A1 and save it to your list as well, and so on.

Hope that helps...
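The counter loop described above can be sketched in Python roughly as follows. The `table` list is only a stand-in for the real table, and the cell values are placeholders rather than data from this thread.

```python
# Stand-in for the table: each inner list is one row, index 0 is column A.
# The values are placeholders for illustration only.
table = [
    ["a1", "b1", "c1"],
    ["a2", "b2", "c2"],
    ["a3", "b3", "c3"],
]

column_a = []  # the pre-defined list that collects the results
counter = 0    # row counter, default value 0

while counter < len(table):   # stop once the last row has been read
    cell = table[counter][0]  # read cell: column A, row = counter
    column_a.append(cell)     # save the cell to the list
    counter += 1              # increase the counter and loop again

print(column_a)
```

The loop condition is what replaces the range: each pass reads exactly one cell, and the counter decides which row.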
 

drvosjeca

I don't quite understand why you guys are trying to make this more complicated.

The thing is simple here: all the lines in his case have the same structure no matter how many URLs are on each line, so the only thing you need to do is extend the regex a bit, and that is all.


Try this Jan Paul and let me know ;-)

http://.*?(?=",".*",".*",")
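A quick way to check this pattern against the sample lines from earlier in the thread, sketched in Python. The pattern and the three lines are copied from the posts above; testing with Python's `re` module instead of the original tool is an assumption. The lookahead now demands three `","` separators after the match, and only the first URL on each line has that many behind it.

```python
import re

# drvosjeca's extended pattern, unchanged.
PATTERN = re.compile(r'http://.*?(?=",".*",".*",")')

# The two sample rows plus the three-URL row quoted earlier.
LINES = [
    '"http://twb.oswshop11.nl/shop/pages.php?pageid=8","abc",53,0,08/04/2011 00:00,'
    '"TextLink","TextLink_Normal","http://www.abc.nl/",30,16,0,0,0,0,0,0,22/12/2013 00:00,0,,""',
    '"http://likeur.startpagina.nl/","abc",26,24,22/10/2008 00:00,'
    '"TextLink","TextLink_Normal","http://www.abc.nl/",30,16,0,0,0,0,0,0,24/12/2013 00:00,0,,""',
    '"http://www.kerkgebouwen-in-limburg.nl/view.jsp?content=12427",'
    '"http://www.elkandre.nl/kapel/kapel.htm",21,10,24/09/2009 00:00,'
    '"TextLink","TextLink_Normal","http://www.elkandre.nl/Kapel/kapel.htm",'
    '9,6,0,0,0,0,0,0,01/12/2013 00:00,0,,""',
]

first_urls = []
for line in LINES:
    first_urls.extend(PATTERN.findall(line))

for url in first_urls:
    print(url)
```

This depends on every line keeping the same column layout; a row with a different number of quoted fields after the first URL could behave differently.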
 

JanPaul999

Thanks, works like a charm :D
 

JanPaul999

I'm bumping up against another regex challenge now: in the HTML below I need to get the first link of this format (domain.com/download_results.php?i=82395&mode=pageranked) after a specified domain in this format (www-domain-nl).

<tr class='row_color1'><td>82395</td>
<td>www-domain-nl.txt</td>// I need the first link after this domain text that has the structure of the link bolded below
<td>87</td>
<td>87</td>
<td>03 Feb</td>
<td>no</td>
<td>finished</td>
<td>
<table cellpadding='0' cellspacing='0'><tr>
<td>download:&nbsp;</td>
<td><a href='http://domain.com/download_results.php?i=82395&mode=all' class='download_link'>all</a></td><td>&nbsp;<a href='http://domain.com/download_results.php?i=82395&mode=pageranked' class='download_link'>pageranked</a></td></tr></table>
</td>
</tr>
<tr class='row_color2'><td>82394</td>
<td>www-domain2-nl.txt</td>
<td>42</td>
<td>42</td>
<td>03 Feb</td>
<td>no</td>
<td>finished</td>
<td>
<table cellpadding='0' cellspacing='0'><tr>
<td>download:&nbsp;</td>
<td><a href='http://domain.com/download_results.php?i=82394&mode=all' class='download_link'>all</a></td><td>&nbsp;<a href='http://domain.com/download_results.php?i=82394&mode=pageranked' class='download_link'>pageranked</a></td></tr></table>
Anyone know how to build this regex?
 

drvosjeca

I don't see a challenge here...

If this is all you have in your code, then take a look again and you will see that in the first case you have that "(http removed for formatting purposes)" before the link and in the other case you don't have it... so that makes it simpler to get what you need.

All you need in that case is: (?<=\)).*?(?=')


Now, if there is more to this code and you just want to pick the link after a specific domain like you said, then all you need is the domain, plus a way to account for the line breaks in between.

in that case regex like this should work: (?<=www-domain-nl[\w\W]*?\)).*?(?=')

no magic needed :D
 

JanPaul999

drvosjeca said:
"If this is all you have in your code, then take a look again and you will see that in the first case you have that "(http removed for formatting purposes)" before the link and in the other case you don't have it... so that makes it simpler to get what you need.

All you need in that case is: (?<=\)).*?(?=')"

With "(http removed for formatting purposes)" I meant that in the normal source file there is http:// there, but the forum was turning that into a link, so I removed the http.

drvosjeca said:
"Now, if there is more to this code and you just want to pick the link after a specific domain like you said, then all you need is the domain, plus a way to account for the line breaks in between.

in that case regex like this should work: (?<=www-domain-nl[\w\W]*?\)).*?(?=')

no magic needed :D"

I tried that one on the above code, but it doesn't seem to work; it doesn't grab anything.

drvosjeca said:
"I don't see a challenge here..."

Yeah, it's probably really easy if you know regex well. For me it's been years since I worked with regex, so I'm very rusty in that department.
 

drvosjeca

Same thing, you just need to remove the brackets and add href in between... :-)

(?<=www-domain-nl[\w\W]*?<a href.*href.*)http.*?pageranked(?=')
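For anyone trying this outside a .NET regex engine: the pattern above relies on a variable-length lookbehind, which .NET accepts but Python's `re` module does not. A rough Python equivalent, sketched here with a capture group swapped in for the lookbehind, could look like this. The function name `pageranked_link` is hypothetical, and the HTML is a shortened copy of the sample posted above.

```python
import re

# Shortened copy of the sample HTML above (only the lines that matter here).
HTML = """<tr class='row_color1'><td>82395</td>
<td>www-domain-nl.txt</td>
<td><a href='http://domain.com/download_results.php?i=82395&mode=all' class='download_link'>all</a></td><td>&nbsp;<a href='http://domain.com/download_results.php?i=82395&mode=pageranked' class='download_link'>pageranked</a></td></tr></table>
<tr class='row_color2'><td>82394</td>
<td>www-domain2-nl.txt</td>
<td><a href='http://domain.com/download_results.php?i=82394&mode=all' class='download_link'>all</a></td><td>&nbsp;<a href='http://domain.com/download_results.php?i=82394&mode=pageranked' class='download_link'>pageranked</a></td></tr></table>
"""


def pageranked_link(html, domain):
    """Return the first 'pageranked' link that follows the given domain text."""
    pattern = re.compile(
        re.escape(domain)                     # the literal domain, e.g. www-domain-nl
        + r"[\s\S]*?"                         # anything, including line breaks, lazily
        + r"(http[^']*mode=pageranked)"       # a URL that ends in mode=pageranked
    )
    match = pattern.search(html)
    return match.group(1) if match else None


print(pageranked_link(HTML, "www-domain-nl"))
print(pageranked_link(HTML, "www-domain2-nl"))
```

One caveat with this sketch: if a row had no pageranked link at all, the lazy match would carry on into the next row and return that row's link instead.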
 
