Need help capturing table cells

djljzenno

Client
Registered: 26.12.2013, Messages: 43, Thanks: 2, Points: 8
Hello,

I am trying to capture the text within the table and place it into a CSV file.

Example table:

https://en.m.wikipedia.org/wiki/List_of_Billboard_Hot_100_top_10_singles_in_2007

Text needed:

Single,Artist(s)
songtitle1,artist1
songtitle2,artist2
songtitle3,artist3
and so on...

For the artist field, I only need the text before "featuring" or "and", if present, not the entire line.
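As a rough sketch of that artist rule (in Python, since the earlier C# code isn't shown here), a case-insensitive regex split on the first " featuring " or " and " keeps only the leading artist. The name `primary_artist` is hypothetical, and the rule would also cut a band name that itself contains the word "and".

```python
import re

# The first " featuring " or " and " (any letter case, surrounded by whitespace)
# marks where the extra artists start.
SEPARATOR_RE = re.compile(r"\s+(?:featuring|and)\s+", re.IGNORECASE)


def primary_artist(artist: str) -> str:
    """Return only the text before 'featuring' or 'and', if either is present."""
    return SEPARATOR_RE.split(artist, maxsplit=1)[0].strip()


print(primary_artist("Fergie featuring will.i.am"))  # Fergie
print(primary_artist("Artist A featuring Artist B and Artist C"))  # Artist A
print(primary_artist("Rihanna"))  # Rihanna
```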

I will then use this to collect the data for all the years.

Someone helped me with this before using C# and regex, but I could not modify that code to make it work again, because the page has changed.

Thank you!

LJ
 

Tobbe

Client
Registered: 01.08.2013, Messages: 428, Thanks: 148, Points: 43
In the regex designer, make a regex that grabs everything between <tr> and </tr>, and put the matches into a list.
Each item in the list will then look like this:
Code:
<td>November 11</td>
<td>"<a href="/wiki/Fergalicious" title="Fergalicious">Fergalicious</a>"</td>
<td><a href="/wiki/Fergie_(singer)" title="Fergie (singer)">Fergie</a> featuring <a href="/wiki/Will.i.am" title="Will.i.am">will.i.am</a></td>
<td align="center">2</td>
<td>January 13</td>
<td align="center">14</td>
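A minimal Python sketch of that first step, assuming the page's HTML has already been downloaded into a string. `extract_rows` and the sample HTML (built around the row quoted above) are illustrative only; the pattern is a Python equivalent of the idea, not output from the regex designer.

```python
import re

# Matches "<tr>" or "<tr attr=...>" (but not tags like "<track>"), then captures
# everything up to the nearest "</tr>". DOTALL lets "." cross line breaks,
# because the page source puts each <td> on its own line.
ROW_RE = re.compile(r"<tr(?:\s[^>]*)?>(.*?)</tr>", re.DOTALL | re.IGNORECASE)


def extract_rows(html: str) -> list[str]:
    """Return the inner HTML of every <tr>...</tr> block, in page order."""
    return [row.strip() for row in ROW_RE.findall(html)]


# Hypothetical sample: a header row plus the row quoted in this thread.
SAMPLE_HTML = """<table class="wikitable">
<tr><th>Single</th><th>Artist(s)</th></tr>
<tr>
<td>November 11</td>
<td>"<a href="/wiki/Fergalicious" title="Fergalicious">Fergalicious</a>"</td>
<td><a href="/wiki/Fergie_(singer)" title="Fergie (singer)">Fergie</a> featuring <a href="/wiki/Will.i.am" title="Will.i.am">will.i.am</a></td>
<td align="center">2</td>
<td>January 13</td>
<td align="center">14</td>
</tr>
</table>"""

rows = extract_rows(SAMPLE_HTML)
print(len(rows))  # 2
print(rows[1].splitlines()[0])  # <td>November 11</td>
```

The header row is captured too; it contains only `<th>` cells, so the next step can simply skip rows that have no `<td>`.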
Take the items one at a time, use a regex again to grab the columns you need, and put that information into a new line in another table/list.
When the first list is empty, you have processed the whole table. Apart from the other tips (see the threads below), this is probably the easiest way to do it yourself.
http://zennolab.com/discussion/threads/trying-to-extract-table-data.15825/
http://zennolab.com/discussion/threads/regex-extraction-help-please.18283/
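The second step (taking the rows one at a time until the list is empty, grabbing the columns, and building the CSV) might be sketched in Python like this. It assumes every data row has the same cell layout as the quoted row (single in the second `<td>`, artist in the third); on the live page, cells that span several rows (`rowspan`) may shift the column positions, so the indexes would need checking. `rows_to_csv`, `cell_text` and their parameters are hypothetical names.

```python
import csv
import html
import io
import re
from collections import deque

# One <td> cell (with or without attributes such as align="center").
CELL_RE = re.compile(r"<td(?:\s[^>]*)?>(.*?)</td>", re.DOTALL | re.IGNORECASE)
# Any remaining tag inside a cell, e.g. <a href=...> and </a>.
TAG_RE = re.compile(r"<[^>]+>")


def cell_text(cell_html: str) -> str:
    """Strip inner tags, decode HTML entities, then trim spaces and quotes."""
    text = html.unescape(TAG_RE.sub("", cell_html))
    return text.strip().strip('"').strip()


def rows_to_csv(rows: list[str], single_col: int = 1, artist_col: int = 2) -> str:
    """Take rows one at a time until none are left and return CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Single", "Artist(s)"])
    pending = deque(rows)  # work queue; the caller's list is not modified
    while pending:  # an empty queue means the whole table has been processed
        row = pending.popleft()
        cells = [cell_text(c) for c in CELL_RE.findall(row)]
        if len(cells) <= max(single_col, artist_col):
            continue  # header rows (<th> only) or rows with fewer cells
        writer.writerow([cells[single_col], cells[artist_col]])
    return out.getvalue()


# The row quoted in this thread (inner HTML of one <tr>...</tr>).
SAMPLE_ROW = """
<td>November 11</td>
<td>"<a href="/wiki/Fergalicious" title="Fergalicious">Fergalicious</a>"</td>
<td><a href="/wiki/Fergie_(singer)" title="Fergie (singer)">Fergie</a> featuring <a href="/wiki/Will.i.am" title="Will.i.am">will.i.am</a></td>
<td align="center">2</td>
<td>January 13</td>
<td align="center">14</td>
"""

print(rows_to_csv([SAMPLE_ROW]), end="")
# Single,Artist(s)
# Fergalicious,Fergie featuring will.i.am
```

The `csv` writer quotes fields that contain commas, so artist names like "Earth, Wind & Fire" stay in one column. To save to disk, the returned text can be written to a file opened with `open("top10_2007.csv", "w", newline="", encoding="utf-8")` (hypothetical file name), and the "featuring"/"and" trimming from the question can be applied to the artist cell before writing.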