Help making a Google Places scraper.

Dr_Scythe

Newbie
Joined: 04.07.2011
Messages: 10
Thanks: 0
Points: 0
I've had ZennoPoster for around a month or two now, and I'm still getting my head around everything it can do and how to go about doing things.

I'm looking for some guidance on creating a scraper that pulls Google Places results into a spreadsheet.

If anyone has some experience with this I'd love some pointers.

Either reply to this thread or hit me up on Skype. ID: grantderepas
 

gcomm

Client
Joined: 01.03.2011
Messages: 332
Thanks: 93
Points: 28
It is possible, although it is a mega parsing job; Zenno can do this no problem. Do you want to export the entire Google Places :cool: or just a specific area / business genre?

Give an example of what you want to accomplish.
 

Dr_Scythe

Newbie
Joined: 04.07.2011
Messages: 10
Thanks: 0
Points: 0
Lol, the entire Google Places might be a bit much.

I was thinking more along the lines of entering a keyword such as city + profession to output a spreadsheet of businesses of X profession in a target area.

The Places info could be scraped into a .txt document in CSV format that could then be opened in Excel?
 

dusk

Client
Joined: 07.06.2011
Messages: 25
Thanks: 2
Points: 0
I was considering creating a places scraper and then wisely (I think) decided it was above my pay grade.

Good Luck!
 

crazyflx

Newbie
Joined: 23.08.2011
Messages: 19
Thanks: 8
Points: 0
EDIT: To aid load time, I have all of the following set NOT to load: images, video, sound, run ActiveX, load ActiveX, popups. I then have the following set to make sure they DO load: scripts, Java & frames.

If you visit this URL: http://maps.google.com/maps

And search something in this format: "pizza clarks summit pa" (Clarks Summit being a town name & PA being Pennsylvania), then transfer the DOM source to the regex portion of ZP. The following regex returns the following data:

(?<=drg:true\,laddr:\")(.*?)(?=\")

----------------------------------- match # 0 -----------------------------------
100 Old Lackawanna Trl, Clarks Summit, PA 18411-9108 (Fiorillo's Pizza)
----------------------------------- match # 1 -----------------------------------
100 East Grove Street, Clarks Summit, PA 18411-1750 (Colarusso's Cafe)
----------------------------------- match # 2 -----------------------------------
1002 South State Street, Clarks Summit, PA 18411-2249 (Dino \x26 Francesco's Pizza-Pasta)
----------------------------------- match # 3 -----------------------------------
100 Highland Avenue, Clarks Summit, PA 18411-1571 (Basilicos Pizzeria)
----------------------------------- match # 4 -----------------------------------
900 South State Street, Clarks Summit, PA 18411-1756 (Pizza Hut)
----------------------------------- match # 5 -----------------------------------
223 Northern Boulevard, S Abington Twp, PA 18411-9304 (Bellissimo Pizzeria and Ristorante)
----------------------------------- match # 6 -----------------------------------
1121 Northern Blvd, South Abington Township, PA 18411 (Domino's Pizza)
----------------------------------- match # 7 -----------------------------------
926 Lackawanna Trl, Clarks Summit, PA 18411-9278 (Wellington's Pub \x26 Eatery)
----------------------------------- match # 8 -----------------------------------
206 Grand Avenue, Clarks Summit, PA 18411-1402 (Jimmy D's Pizza \x26 More)
----------------------------------- match # 9 -----------------------------------
919 Northern Boulevard, S Abington Twp, PA 18411-2241 (Thick N Thin Pizza)

As you can see, it grabs the first 10 results perfectly. You would obviously be able to work with each of these strings individually however you want (I'm thinking replacing "(" with a comma and replacing ")" with nothing, then appending the string to a .txt file would make it open perfectly as a CSV in Excel).
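For anyone who wants to prototype that step outside of ZennoPoster first, here's a minimal Python sketch of the same idea: run the regex above over a saved copy of the maps.google.com page source and turn each match into a CSV line. The file names are placeholders, and it assumes the "\x26" escapes in the source should become "&".

Code:
import re

# Minimal sketch (not the ZennoPoster template itself): apply the regex
# above to a saved copy of the maps.google.com page source and turn each
# "address (Business Name)" match into a CSV-style line.
# "maps_page_source.html" and "places.csv" are placeholder file names.
ADDR_RE = re.compile(r'(?<=drg:true\,laddr:\")(.*?)(?=\")')

with open("maps_page_source.html", encoding="utf-8") as f:
    source = f.read()

rows = []
for match in ADDR_RE.findall(source):
    text = match.replace(r"\x26", "&")                 # the source escapes "&" as \x26
    text = text.replace(" (", ",").replace(")", "")    # "address (name)" -> "address,name"
    rows.append(text)

if rows:
    # Append to a file that Excel can open as CSV
    with open("places.csv", "a", encoding="utf-8") as out:
        out.write("\n".join(rows) + "\n")

One thing to watch: the addresses themselves contain commas, so Excel will split them across several columns unless you quote the fields (Python's csv module handles that).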
 

Dr_Scythe

Newbie
Joined: 04.07.2011
Messages: 10
Thanks: 0
Points: 0
Thanks for that, crazyflx.

I'll have a play around with that tonight :-)
 

Dr_Scythe

Newbie
Joined: 04.07.2011
Messages: 10
Thanks: 0
Points: 0
As advised, (?<=drg:true\,laddr:\")(.*?)(?=\") is bringing up the full address and name of the Places result. How would I modify that to include the phone number?

More importantly, how would I go about working out one of these expressions myself, so I don't have to come back asking silly questions? :p

Also, in my current template it's only taking the first result per page into the .txt file. Would I have to set it up to find a result, (on success) append it to the text file, then on success go back to find the next result, and when finding the next result fails, go to the next page?

Sorry again for all the likely dumb questions!
 

bigcajones

Client
Joined: 09.02.2011
Messages: 1,216
Thanks: 683
Points: 113
To get the phone number, you would have to put in this regex: (?<=drg:true\,laddr:\")(.*?)(?=\")|(?<=sxph:\"\+1).*?(?=\")
The only problem with it is that you will get the address on one match line and the phone number on another. If you want to save all the results on the page, in your step branch you will have to put the 'all' modifier instead of '0'; this will pull all of the matches. You could then make a counter loop and save match 0:1 in the first loop, 2:3 in the second, etc.
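In code terms, a rough Python sketch of that pairing idea (not the ZennoPoster step itself) could look like the following. It assumes the address and phone matches strictly alternate in the page source, and it adds capture groups around both alternatives so each match comes back as a tuple; the file name in the usage comment is a placeholder.

Code:
import re

# The combined regex from above, with capture groups added around both
# alternatives so findall() returns (address, phone) tuples where one
# side is always empty.
COMBINED_RE = re.compile(
    r'(?<=drg:true\,laddr:\")(.*?)(?=\")'   # address + business name
    r'|(?<=sxph:\"\+1)(.*?)(?=\")'          # phone number, after the "+1"
)

def pair_matches(source):
    # Flatten the alternating matches, then take them two at a time:
    # match 0 with match 1, match 2 with match 3, and so on.
    flat = [addr or phone for addr, phone in COMBINED_RE.findall(source)]
    for i in range(0, len(flat) - 1, 2):
        yield flat[i], flat[i + 1]          # (address + name, phone)

# Usage, assuming the page source is saved to a placeholder file:
# source = open("maps_page_source.html", encoding="utf-8").read()
# for address, phone in pair_matches(source):
#     print(address + "," + phone)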

Here's a template to try out that does exactly that. You will need a Gplaces.txt file in your Resources folder containing the keywords you want to search for.

View attachment Gplaces.xml

I'm also doing a video on this because I get a lot of questions about scraping pages and using regular expressions. I will show you how crazyflx came up with his regular expression to find the names and addresses. I'll have it uploaded tomorrow.
 
  • Thanks
Reactions: Dr_Scythe

Dr_Scythe

Newbie
Joined: 04.07.2011
Messages: 10
Thanks: 0
Points: 0
Very much looking forward to that video.

Cheers for the help!
 

bigcajones

Client
Joined: 09.02.2011
Messages: 1,216
Thanks: 683
Points: 113
Here's the video on YouTube that shows you how to parse a page with multiple regular expressions.

Code:
http://www.youtube.com/watch?v=OFdd91R4L9o
 
  • Thanks
Reactions: Dr_Scythe

Dr_Scythe

Newbie
Joined: 04.07.2011
Messages: 10
Thanks: 0
Points: 0
Thanks heaps for that video.

It has really helped me start scraping successfully.

I've got my Google Maps scraper working well.

It has also helped me build a personal scraper to grab all my latest music download links for me :D
 

bigcajones

Client
Joined: 09.02.2011
Messages: 1,216
Thanks: 683
Points: 113
Glad to help.
 
