Need help with RegEx

nitin2003

Client | Registered: 18.09.2012 | Messages: 195 | Thanks: 11 | Points: 18
I'm trying to search for phone numbers. The problem is that they come in various formats, for example:
+852.25301793
+1.4165385457
1 (800) 745-9229
+44 (0) 203 206 2220
+1.801-572-0021
+358.358405860863
650-316-7640
+61.292793305
(480) 624-2599
+1.6506234000

I've been trying a regex like this:
[+0-9./-/(/) ]+

But in addition to the correct results, it also returns fragments like these:
+1.801
572
0021
650
etc.
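A likely explanation for these fragments: in a regex the escape character is `\`, not `/`, so inside the square brackets `/-/` is read as a range from `/` to `/`. The hyphen itself is therefore never in the class, and every hyphenated number is cut at each `-`. The Python sketch below (Python's `re` is used only for illustration; .NET and most other engines parse this class the same way) compares the class as written with a version that lists the hyphen first, where it is always a literal.

```python
import re

# The class as written in the question: "/-/" is parsed as the range "/" to "/",
# so "-" is NOT a member and hyphenated numbers are split into pieces.
ORIGINAL = re.compile(r"[+0-9./-/(/) ]+")

# Same intent, with the hyphen placed first so it is always a literal character.
FIXED = re.compile(r"[-+0-9.() ]+")

for sample in ["+1.801-572-0021", "650-316-7640"]:
    print(sample, ORIGINAL.findall(sample), FIXED.findall(sample))
```

Inside brackets, `(`, `)` and `.` need no escaping at all, which is why the fixed class simply lists them.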

So I want to accept only matches longer than about 7 or 8 characters.

So I tried something like this:
[/+0-9./-/(/) ]+{7,}$

But it doesn't seem to work.

Any help, guys?

Thanks
Nitin
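The second attempt has two separate problems. `+{7,}` puts two quantifiers on the same class, which most engines reject as a syntax error (Python reports "multiple repeat"), and `$` only allows a match that ends at the very end of the input. Replacing `+` with `{7,}` is enough to say "seven or more of these characters in a row". A minimal Python sketch on the sample numbers from the question, assuming one number per line and listing the hyphen first in the class so that it is taken literally:

```python
import re

SAMPLES = """+852.25301793
+1.4165385457
1 (800) 745-9229
+44 (0) 203 206 2220
+1.801-572-0021
+358.358405860863
650-316-7640
+61.292793305
(480) 624-2599
+1.6506234000"""

# {7,} replaces "+": the class must repeat at least 7 times in a row.
# No "$" anchor, so a match may appear anywhere, not only at the end of the input.
# The hyphen is listed first so it is a literal, not a range operator.
MIN_LENGTH = re.compile(r"[-+0-9.() ]{7,}")

# Newlines are not in the class, so each line is matched separately.
found = [m.strip() for m in MIN_LENGTH.findall(SAMPLES)]
print(found)
```

A length threshold alone is still a blunt filter: on the page posted later in the thread, the run of dots in `Created on..............:` and dates such as `1997-04-30` are also seven or more class characters long.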
 

bigcajones

Client | Registered: 09.02.2011 | Messages: 1,216 | Thanks: 683 | Points: 113
\+[0-9].*|\([0-9].*|[0-9].*[0-9]

It depends a lot on what comes before and after the phone numbers on the page.
 

nitin2003

Here is part of the page source:
Code:
<form method="post" name="purchaseForm" id="purchaseForm" action="http://shop.whois.com/domain-registration/index.php?action=check_availability&amp;formaction=domain-name-registration.php">
      <input name="domainaction" value="register" type="hidden">
      <input name="dom_action" value="register" type="hidden">
      <input id="purchaseDomains" name="txtDomainName" value="" type="hidden">
      </form>
      
              <div id="registryBlk">
          <table class="whois_heading" border="0" cellpadding="0" cellspacing="0">
          <tbody><tr>
            <td><h2>hello.com registry whois</h2></td>
            <td class="whois_update">Updated <span id="registryDataAge">3 hours</span> ago - <span id="refreshLink"><a href="javascript:refreshWhois();">Refresh</a></span><span id="refreshStatus"></span></td>
          </tr>
          </tbody></table>
          <div class="whois_result" id="registryData">Domain Name: HELLO.COM<br>Registrar: MARKMONITOR INC.<br>Whois Server: whois.markmonitor.com<br>Referral URL: http://www.markmonitor.com<br>Name Server: NS1.GOOGLE.COM<br>Name Server: NS2.GOOGLE.COM<br>Name Server: NS3.GOOGLE.COM<br>Name Server: NS4.GOOGLE.COM<br>Status: clientDeleteProhibited<br>Status: clientTransferProhibited<br>Status: clientUpdateProhibited<br>Status: serverDeleteProhibited<br>Status: serverTransferProhibited<br>Status: serverUpdateProhibited<br>Updated Date: 30-mar-2013<br>Creation Date: 30-apr-1997<br>Expiration Date: 01-may-2014<br></div>
        </div>
        
        <div id="registrarBlk" style="display: block">
          <table class="whois_heading" border="0" cellpadding="0" cellspacing="0">
          <tbody><tr>
            <td><h2>hello.com registrar whois</h2></td>
            <td class="whois_update">Updated <span id="registrarDataAge">3 hours</span> ago</td>
          </tr>
          </tbody></table>
          <div class="whois_result" id="registrarData">MarkMonitor is the Global Leader in Online Brand Protection.<br><br>MarkMonitor Domain Management(TM)<br>MarkMonitor Brand Protection(TM)<br>MarkMonitor AntiPiracy(TM)<br>MarkMonitor AntiFraud(TM)<br>Professional and Managed Services<br><br>Visit MarkMonitor at www.markmonitor.com<br>Contact us at 1 (800) 745-9229<br>In Europe, at +44 (0) 203 206 2220<br><br>The Data in MarkMonitor.com's WHOIS database is provided by MarkMonitor.com<br>for information purposes, and to assist persons in obtaining information<br>about or related to a domain name registration record.&nbsp;&nbsp;MarkMonitor.com<br>does not guarantee its accuracy.&nbsp;&nbsp;By submitting a WHOIS query, you agree<br>that you will use this Data only for lawful purposes and that, under no<br>circumstances will you use this Data to: (1) allow, enable, or otherwise<br>support the transmission of mass unsolicited, commercial advertising or<br>solicitations via e-mail (spam); or&nbsp;&nbsp;(2) enable high volume, automated,<br>electronic processes that apply to MarkMonitor.com (or its systems).<br>MarkMonitor.com reserves the right to modify these terms at any time.<br>By submitting this query, you agree to abide by this policy.<br><br>Registrant:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;DNS Admin<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Google Inc.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1600 Amphitheatre Parkway<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Mountain View CA 94043<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;US<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img src="/eimg/4/bd/4bdbb992077193a4d9911876b53a4825c65bfd30.png" class="whois_email">@google.com +1.6506234000 Fax: +1.6506188571<br><br>&nbsp;&nbsp;&nbsp;&nbsp;Domain Name: hello.com<br><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Registrar Name: Markmonitor.com<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Registrar 
Whois: whois.markmonitor.com<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Registrar Homepage: http://www.markmonitor.com<br><br>&nbsp;&nbsp;&nbsp;&nbsp;Administrative Contact:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;DNS Admin<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Google Inc.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1600 Amphitheatre Parkway<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Mountain View CA 94043<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;US<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img src="/eimg/4/bd/4bdbb992077193a4d9911876b53a4825c65bfd30.png" class="whois_email">@google.com +1.6506234000 Fax: +1.6506188571<br>&nbsp;&nbsp;&nbsp;&nbsp;Technical Contact, Zone Contact:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;DNS Admin<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Google Inc.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1600 Amphitheatre Parkway<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Mountain View CA 94043<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;US<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img src="/eimg/4/bd/4bdbb992077193a4d9911876b53a4825c65bfd30.png" class="whois_email">@google.com +1.6506234000 Fax: +1.6506188571<br><br>&nbsp;&nbsp;&nbsp;&nbsp;Created on..............: 1997-04-30.<br>&nbsp;&nbsp;&nbsp;&nbsp;Expires on..............: 2014-04-30.<br>&nbsp;&nbsp;&nbsp;&nbsp;Record last updated on..: 2013-03-30.<br><br>&nbsp;&nbsp;&nbsp;&nbsp;Domain servers in listed order:<br><br>&nbsp;&nbsp;&nbsp;&nbsp;ns3.google.com<br>&nbsp;&nbsp;&nbsp;&nbsp;ns2.google.com<br>&nbsp;&nbsp;&nbsp;&nbsp;ns1.google.com<br>&nbsp;&nbsp;&nbsp;&nbsp;ns4.google.com<br>&nbsp;&nbsp;&nbsp;&nbsp;<br><br><br><br>MarkMonitor is the Global Leader in Online Brand Protection.<br><br>MarkMonitor Domain Management(TM)<br>MarkMonitor Brand Protection(TM)<br>MarkMonitor AntiPiracy(TM)<br>MarkMonitor AntiFraud(TM)<br>Professional and Managed 
Services<br><br>Visit MarkMonitor at www.markmonitor.com<br>Contact us at 1 (800) 745-9229<br>In Europe, at +44 (0) 203 206 2220<br><br><br></div>
        </div>
        
                <div id="xdomainsBlk">
          <h2 class="whois_heading">related domain names</h2>
          <div class="whois_xdomain_list">
                        <div class="whois_xdomain_item">
              <a href="/whois/markmonitor.com">markmonitor.com</a>
            </div>
                        <div class="whois_xdomain_item">
              <a href="/whois/google.com">google.com</a>
            </div>
                      </div>
        </div>
                    
</div>
I want to scrape the phone numbers, but the regex you gave also captures letters, and I'm not sure where to make the changes. Can someone please help?

Thanks
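The letters come from `.*`, which matches any character (letters included) up to the end of the line. A tighter alternative, sketched in Python under the assumption that it is run over the raw page HTML: the match must start and end with a digit, may contain only digits, spaces, dots, hyphens and parentheses in between, must not be glued to neighbouring letters or digits (this skips the digit runs inside the `/eimg/...` image file names), and must not look like an ISO date such as `1997-04-30`. These rules are tuned to the page above and are not a general phone-number validator.

```python
import re

PHONE = re.compile(r"""
    (?<![\w.])              # not glued to a preceding letter, digit or dot
    (?!\d{4}-\d\d-\d\d)     # skip ISO dates such as 1997-04-30
    [+(]?                   # optional leading "+" or "("
    \d                      # must start with a digit
    [\d ().-]{5,}           # digits and separators only (spaces kept inside [])
    \d                      # must end with a digit
    (?!\w)                  # not glued to a following letter or digit
""", re.VERBOSE)

# Excerpt copied from the page source posted in this thread.
EXCERPT = (
    'Contact us at 1 (800) 745-9229<br>In Europe, at +44 (0) 203 206 2220<br>'
    '<img src="/eimg/4/bd/4bdbb992077193a4d9911876b53a4825c65bfd30.png" '
    'class="whois_email">@google.com +1.6506234000 Fax: +1.6506188571<br>'
    'Created on..............: 1997-04-30.<br>Mountain View CA 94043<br>'
)

print(PHONE.findall(EXCERPT))
```

The ZIP code `94043` is rejected only because it is too short for `{5,}` plus the two end digits; adjust that minimum if shorter numbers matter.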
 

bigcajones

In this case, Nitin, you need to first scrape the div that contains only the info you want, and then scrape the phone number out of that. For the first regex, which gets the line the phone number is on, I came up with...

(?<=id="registrarData">).*?(?=</div>)

And then to get the phone number from that line...

(?<=<br>.*?)(\d+|\+\d+).*?\d+(?=<br>)

Unfortunately, you will have to get the first match for this and not take 'All' or you will end up with other crap in there that you don't want. I'm at work and just took a glance at it, but if you need more than that, I will study it further at home and see what comes up.
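To try the two-step idea outside the template, here is a rough Python sketch; it is an illustration, not the contents of the attached template. The first regex works unchanged in Python's `re`, since its lookbehind has a fixed width. The second does not: `re` rejects the variable-width lookbehind `(?<=<br>.*?)`, so the sketch approximates it by splitting the div on `<br>` and testing each piece on its own. `PAGE`, `PIECE_NUMBER` and `first_phone` are hypothetical names, and `PAGE` is a shortened stand-in for the posted source.

```python
import re
from typing import Optional

# Hypothetical, shortened stand-in for the page source posted above.
PAGE = (
    '<div class="whois_result" id="registryData">Updated Date: 30-mar-2013<br></div>\n'
    '<div class="whois_result" id="registrarData">MarkMonitor AntiFraud(TM)<br>\n'
    'Contact us at 1 (800) 745-9229<br>In Europe, at +44 (0) 203 206 2220<br>\n'
    'Mountain View CA 94043<br></div>'
)

# Step 1, same pattern as in the post. DOTALL lets "." cross newlines; the lazy .*?
# stops at the first </div>, which works because this div has no nested divs.
REGISTRAR_DIV = re.compile(r'(?<=id="registrarData">).*?(?=</div>)', re.DOTALL)

# Step 2, approximated: a number, anything, then digits that end the piece
# (each piece ended at a <br> in the original text).
PIECE_NUMBER = re.compile(r"(?:\d+|\+\d+).*?\d+$")


def first_phone(page: str) -> Optional[str]:
    div = REGISTRAR_DIV.search(page)
    if div is None:
        return None
    # Keep only pieces with a <br> on both sides, as (?<=<br>...) and (?=<br>) require.
    pieces = div.group(0).split("<br>")[1:-1]
    for piece in pieces:
        m = PIECE_NUMBER.search(piece.strip())
        if m:
            return m.group(0)  # first match only, as the post recommends
    return None


print(first_phone(PAGE))
```

Collecting every piece instead of stopping at the first would, in this sketch, also return values such as the ZIP code `94043`, because the pattern only checks for digits at the start and end.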

Template included:

Attachment: whoIsPhone.xmlz
 
