Regex String Replace Doesn't Work After a Certain Number of Characters?

reislet

Client
Registered
02.04.2012
Posts
33
Thanks
1
Points
0
I was working with string replace with regex and found an interesting bug.
The template removes blacklisted URLs from an original file (blk.txt).
Everything works like a charm until the long URLs show up; they don't get removed :(
Neither does the line that comes immediately after each of them.

blk.txt
Code:
http://thedomainxxxxxx.org/wp-content/themes/socrates/style.css
http://thedomainxxxxxx.org/wp-content/themes/socrates/css/styleRightSide300.css
http://thedomainxxxxxx.org/wp-content/plugins/seo-pressor/templates/css/styles.css?ver=3.3.2
http://thedomainxxxxxx.org/xmlrpc.php?rsd
http://thedomainxxxxxx.org/wp-includes/wlwmanifest.xml
http://thedomainxxxxxx.org/
http://thedomainxxxxxx.org/fish-oil-dosage
http://thedomainxxxxxx.org/xtend-life-scam
http://thedomainxxxxxx.org/feed
http://thedomainxxxxxx.org/feed/rss
http://thedomainxxxxxx.org/feed/atom
http://thedomainxxxxxx.org/comments/feed
http://thedomainxxxxxx.org/xtend-life-scam
http://thedomainxxxxxx.org/fantastic-asthma-ideas-that-can-certainly-help-you
http://thedomainxxxxxx.org/zsa-zsa-cream-a-revolutionary-anti-aging-creme
http://thedomainxxxxxx.org/perhaps-you-have-thought-about-taking-a-look-at-the-turbulence-training-system-you-might-want-to-check-this-out-first
http://thedomainxxxxxx.org/ways-to-havehappier-smiles
http://thedomainxxxxxx.org/the-blackhead-blitz-program-top-product-high-conversions
http://thedomainxxxxxx.org/category/dha
http://thedomainxxxxxx.org/category/fish-oil
http://thedomainxxxxxx.org/category/health-and-fitness
http://thedomainxxxxxx.org/category/xtend-life-2
http://thedomainxxxxxx.org/category/dha
http://thedomainxxxxxx.org/category/fish-oil
http://thedomainxxxxxx.org/category/health-and-fitness
http://thedomainxxxxxx.org/category/xtend-life-2
http://thedomainxxxxxx.org/fantastic-asthma-ideas-that-can-certainly-help-you
http://thedomainxxxxxx.org/zsa-zsa-cream-a-revolutionary-anti-aging-creme
http://thedomainxxxxxx.org/perhaps-you-have-thought-about-taking-a-look-at-the-turbulence-training-system-you-might-want-to-check-this-out-first
http://thedomainxxxxxx.org/ways-to-havehappier-smiles
http://thedomainxxxxxx.org/the-blackhead-blitz-program-top-product-high-conversions
http://thedomainxxxxxx.org
http://thedomainxxxxxx.org
http://thedomainxxxxxx.org/xtend-life-scam
http://thedomainxxxxxx.org/fish-oil-dosage
http://thedomainxxxxxx.org/fish-oil-benefits
http://thedomainxxxxxx.org/fish-oil-side-effects
http://thedomainxxxxxx.org/fish-oil-review
http://thedomainxxxxxx.org/fishoil.php
http://thedomainxxxxxx.org/fishoil.php
http://thedomainxxxxxx.org/fishoil.php
http://thedomainxxxxxx.org/fishoil.php
http://thedomainxxxxxx.org/fishoil.php
http://thedomainxxxxxx.org/fishoil.php
http://thedomainxxxxxx.org/fishoil.php
http://dan.xtend-life.com/product/Omega_3_DHA_Fish_Oil.aspx?id=100111
http://dan.xtend-life.com/product/Omega_3_DHA_Fish_Oil.aspx?id=100111
http://www.socratestheme.com/cb/go.php?id=something
http://thedomainxxxxxx.org/about
http://thedomainxxxxxx.org/contact
http://thedomainxxxxxx.org/disclaimer
http://thedomainxxxxxx.org/privacy-policy
http://thedomainxxxxxx.org/terms-of-use
Problematic URLs
Code:
http://thedomainxxxxxx.org/wp-content/plugins/seo-pressor/templates/css/styles.css?ver=3.3.2
http://thedomainxxxxxx.org/xmlrpc.php?rsd
http://dan.xtend-life.com/product/Omega_3_DHA_Fish_Oil.aspx?id=100111
http://dan.xtend-life.com/product/Omega_3_DHA_Fish_Oil.aspx?id=100111
http://www.socratestheme.com/cb/go.php?id=something
I've attached the template as well.

Question: is there a limit to the regex length, or is the long URL the culprit?
 

Registered
26.03.2012
Posts
44
Thanks
6
Points
0
I haven't looked at the template yet, but the first thing I notice is that those are not the longest URLs. For example:

Code:
http://thedomainxxxxxx.org/perhaps-you-have-thought-about-taking-a-look-at-the-turbulence-training-system-you-might-want-to-check-this-out-first
That strikes me as the longest URL, and you aren't having trouble with it. A quick glance at the problem URLs shows that they all contain a question mark (?), and unless I missed one, they are the only ones in your list that do. So I don't think it can be a length issue, since there are longer URLs that work fine. Perhaps something in the regex doesn't agree with the question marks?
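To make the question-mark hunch concrete, here is a minimal sketch. It uses Python's `re` module as a stand-in for the template's regex step, which is an assumption; the template itself is not reproduced here. In a regex, `?` is a quantifier that makes the preceding character optional, so a URL containing `?` fails to match its own text when it is used as a pattern unchanged, while a long URL without `?` matches fine.

```python
import re

# One of the problem URLs from the thread, used as BOTH pattern and text.
url = "http://thedomainxxxxxx.org/xmlrpc.php?rsd"

# As a pattern, "php?rsd" means "ph", an optional "p", then "rsd".
# The literal "?" in the text is therefore never matched.
unescaped_hit = re.search(url, url)

# A long URL without "?" still matches itself, so length is not the issue.
long_url = (
    "http://thedomainxxxxxx.org/perhaps-you-have-thought-about-taking-a-look-"
    "at-the-turbulence-training-system-you-might-want-to-check-this-out-first"
)
long_hit = re.search(long_url, long_url)

print(unescaped_hit)         # None
print(long_hit is not None)  # True
```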
 
Registered
26.03.2012
Posts
44
Thanks
6
Points
0
I believe I fixed it for you. I'll attach it here. View attachment ish-2.xml

The problem was not with Regex itself; rather, you weren't actually using a regex pattern to identify the string you were replacing. It looks like you were feeding in an exact string to be replaced, and since that string contained characters that have a special meaning inside a regex (such as the question mark), it threw off the Regex macro.

That's my take on it anyway.
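If the regex step had to stay, one alternative would be to escape the exact string before using it as a pattern. The sketch below again assumes Python's `re` module in place of the template's macro, and `remove_literal` is a hypothetical helper name. `re.escape` turns every character with a special meaning into its literal form.

```python
import re


def remove_literal(text: str, literal: str) -> str:
    """Remove every occurrence of `literal` from `text` using a regex.

    re.escape() neutralises characters such as "?" and "." so the
    pattern matches the exact string instead of being interpreted.
    """
    return re.sub(re.escape(literal), "", text)


problem_url = (
    "http://thedomainxxxxxx.org/wp-content/plugins/seo-pressor/"
    "templates/css/styles.css?ver=3.3.2"
)
text = "before\n" + problem_url + "\nafter"

# Without escaping, nothing is removed because "?" acts as a quantifier.
unescaped_result = re.sub(problem_url, "", text)
escaped_result = remove_literal(text, problem_url)

print(unescaped_result == text)  # True: the raw pattern matched nothing
print(repr(escaped_result))      # 'before\n\nafter'
```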

I replaced the RegExp.Replace macro with the String.Replace macro and tested it with this URL:

Code:
http://thedomainxxxxxx.org/wp-content/plugins/seo-pressor/templates/css/styles.css?ver=3.3.2
It seems to have worked.
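For anyone who wants to see the same idea outside the template, here is a rough sketch of plain literal replacement in Python. `str.replace` stands in for the String.Replace macro, which is an assumption about equivalent behaviour; `remove_urls` and the `example.com` URL are made up for illustration. Replacing the longest URLs first is an extra precaution, because the list contains short URLs that are prefixes of longer ones.

```python
from typing import Iterable


def remove_urls(text: str, blacklist: Iterable[str]) -> str:
    """Remove each blacklisted URL from `text` by exact string match."""
    # Longest first, so a short URL that is a prefix of a longer one
    # does not leave the tail of the longer URL behind.
    for url in sorted(set(blacklist), key=len, reverse=True):
        text = text.replace(url, "")
    return text


blacklist = [
    "http://thedomainxxxxxx.org/wp-content/plugins/seo-pressor/templates/css/styles.css?ver=3.3.2",
    "http://thedomainxxxxxx.org/xmlrpc.php?rsd",
    "http://thedomainxxxxxx.org",
]
# "http://example.com/keep-me" is a hypothetical URL that should survive.
sample = "\n".join(blacklist[:2] + ["http://example.com/keep-me"])

cleaned = remove_urls(sample, blacklist)
remaining = [line for line in cleaned.splitlines() if line]
print(remaining)  # ['http://example.com/keep-me']
```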

Also, for the sake of consistency and so that I could test the fix more easily, I changed the last step. You had set the template to locate the text file it reads from relative to the location of the template itself, but the save path was set to an exact location. I changed the save path to be based on the location of the template as well.

I actually hadn't even noticed that option was available before so I learned something useful from that :D
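The same "everything relative to the project" idea can be sketched in Python with `pathlib`. This is only an illustration under assumptions: `clean_file`, the output name `blk_clean.txt`, and the choice to drop whole blacklisted lines are all hypothetical and not taken from the template.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def clean_file(project_dir: Path, blacklist: Iterable[str]) -> Path:
    """Read blk.txt from project_dir and save the result next to it."""
    source = project_dir / "blk.txt"        # input, relative to the project
    target = project_dir / "blk_clean.txt"  # output name is hypothetical
    blocked = set(blacklist)
    kept = [
        line
        for line in source.read_text(encoding="utf-8").splitlines()
        if line not in blocked
    ]
    target.write_text("\n".join(kept), encoding="utf-8")
    return target


# In a real script the base would typically be Path(__file__).resolve().parent;
# a temporary directory is used here so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "blk.txt").write_text(
        "http://thedomainxxxxxx.org/xmlrpc.php?rsd\nhttp://example.com/keep-me\n",
        encoding="utf-8",
    )
    out_path = clean_file(base, ["http://thedomainxxxxxx.org/xmlrpc.php?rsd"])
    out_name = out_path.name
    out_text = out_path.read_text(encoding="utf-8")

print(out_name, repr(out_text))
```

Because both paths are derived from one base directory, moving the project folder does not require editing any path.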
 

reislet

Client
Registered
02.04.2012
Posts
33
Thanks
1
Points
0
Hey, thanks for the fast response. It seems it was a mutual learning experience: I didn't know that String.Replace could do that.
Yeah, I've tested it and it works fine with the new method.

BTW project.directory is great if you are adapting your template for multiple uses, and you can save a lot of time if you are using files.

Thanks again.
 
