Automate everything


6. Fixing errors and making improvements.

After training a module, you should test it. Why is this necessary?
A module test shows you the module's recognition rate and speed, and lets you check the recognition results, i.e. what text it outputs for each captcha.
This information is very important: knowing it, you can improve recognition and speed up the module.

Test settings

You can customize the test settings and run the test several times. The settings affect the recognition rate, so you can change the rate by adjusting them. In this case the core doesn't need to be retrained; you just save the module with the testing results.

The following parameters can be customized for testing:

1) Threads count
2) Threshold filter value
3) Min distance between symbols - a very important parameter! Try increasing it if you get extra symbols in the recognition results, or decreasing it if symbols are missing.
4) Comparison type: with full matching, the recognition result is checked for a full match with the captcha text. Partial matching or substring matching can be selected if the website accepts a partial match as a captcha answer. The module will then count as correct those answers that are actually only partially correct.
5) Range value, used when the comparison type is substring: the number of correctly recognized symbols in a row that is counted as a correct answer to the captcha.
6) Quick recognition: try enabling or disabling it. It may also affect the recognition result.
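
The difference between full and substring comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not CapMonster code: the function names and the value_range argument are hypothetical, and it assumes "symbols in a row" means the longest run of consecutive symbols shared by the expected text and the answer. Partial matching is left out, because this page does not define how it is scored.

```python
def longest_common_run(expected: str, actual: str) -> int:
    """Length of the longest run of consecutive symbols found in both strings."""
    best = 0
    # prev[j] = length of the common run ending at the previous expected
    # symbol and at actual[j - 1]
    prev = [0] * (len(actual) + 1)
    for e in expected:
        cur = [0] * (len(actual) + 1)
        for j, a in enumerate(actual, start=1):
            if e == a:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def is_correct(expected: str, actual: str,
               comparison: str = "full", value_range: int = 1) -> bool:
    """Decide whether an answer counts as correct (assumed semantics)."""
    if comparison == "full":
        return actual == expected
    if comparison == "substring":
        # Assumption: the answer counts if at least value_range symbols
        # in a row were recognized correctly.
        return longest_common_run(expected, actual) >= value_range
    raise ValueError(f"unsupported comparison type: {comparison}")


if __name__ == "__main__":
    # "xaptchz" shares the run "aptch" (5 symbols) with "captcha".
    print(is_correct("captcha", "xaptchz", "full"))
    print(is_correct("captcha", "xaptchz", "substring", value_range=5))
```

With substring comparison, the same test run reports a higher recognition rate than with full matching, so compare rates only between tests that use the same comparison type.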

Types of recognition errors

Before proceeding to captcha recognition errors, let's recall the types of recognition errors:

1) Wrong recognition - the symbol exists, but it is recognized wrongly. For example, you get “c” instead of “a”.
2) Missed symbol - the symbol exists, but the module does not see any symbol. In other words, we get nothing instead of “a”.
3) False recognition - the symbol doesn't exist (for example, in the gap between two symbols), but the module finds something there.
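
If you export test results as pairs of expected text and module answer, the three error types can be counted automatically. The sketch below is not part of CapMonster; it assumes such pairs are available and uses Python's standard difflib module to align the two strings. The function name and the labels "wrong", "missed" and "false" are made up for this illustration, and the alignment is a heuristic, so ambiguous cases may be attributed differently than a human would.

```python
from collections import Counter
from difflib import SequenceMatcher


def classify_errors(expected: str, actual: str) -> Counter:
    """Count wrong, missed and false symbols in one captcha answer."""
    counts = Counter()
    matcher = SequenceMatcher(None, expected, actual, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        exp_len, act_len = i2 - i1, j2 - j1
        if tag == "replace":
            # Symbols paired one-to-one are wrong recognitions (type 1);
            # any surplus on either side is a missed or false symbol.
            paired = min(exp_len, act_len)
            counts["wrong"] += paired
            counts["missed"] += exp_len - paired
            counts["false"] += act_len - paired
        elif tag == "delete":
            counts["missed"] += exp_len   # type 2: symbol exists, not seen
        elif tag == "insert":
            counts["false"] += act_len    # type 3: symbol found where none exists
    return counts


if __name__ == "__main__":
    for answer in ("cagtcha", "cptcha", "camptcha"):
        print(answer, dict(classify_errors("captcha", answer)))
```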

How to improve recognition rate

The main idea behind improving the recognition rate is to balance these three types of recognition errors. That is, ideally, when your module recognizes a captcha incorrectly, its mistakes should be spread across all three types:

1) A correct symbol is replaced with a wrong one. For example, the answer is “cagtcha” instead of “captcha”.
2) A symbol is not seen although it actually exists. For example, the answer is “cptcha” instead of “captcha”.
3) An extra symbol is output in the captcha text. For example, the answer is “camptcha” instead of “captcha”.
Each type of error should occur approximately the same number of times.
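
A quick way to see whether the errors are balanced is to compare the share of each type across a test run. The following is a minimal sketch, assuming you have already counted the errors by type (by hand or with a script). The function names and the tolerance of 0.15 are arbitrary choices for this illustration, not CapMonster settings.

```python
from typing import Dict, Optional


def error_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of all errors that belongs to each error type."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


def dominant_error(counts: Dict[str, int], tolerance: float = 0.15) -> Optional[str]:
    """Return the error type that clearly dominates, or None if balanced.

    A type dominates when its share exceeds an even split by more than
    `tolerance` (a hypothetical cut-off chosen for this example).
    """
    shares = error_shares(counts)
    if not shares:
        return None
    even_split = 1 / len(shares)
    name, share = max(shares.items(), key=lambda item: item[1])
    return name if share - even_split > tolerance else None


if __name__ == "__main__":
    balanced = {"wrong": 10, "missed": 9, "false": 11}
    skewed = {"wrong": 5, "missed": 2, "false": 23}
    print(dominant_error(balanced))  # no type dominates
    print(dominant_error(skewed))    # false recognitions dominate
```

If one type dominates, adjust the setting that influences it (for example, min distance between symbols for extra or missing symbols) and run the test again.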

Common errors and how to avoid them

The number of symbols in the captcha answer is too small or even zero.
1) This happens even though training was successful, i.e. the green line was near the maximum, and the red and yellow lines dropped to zero.
*Probably you have set too high a value for the min distance between centers parameter. In this case, only a few symbols will be recognized correctly, i.e. the answer for the captcha «amcaptchatext» will be something like «aathet».
*Perhaps you applied filters or set mass centers incorrectly. You created a module that is trained well to recognize symbols, but for some reason it doesn't see them in the captcha, or they look different from the collected samples. For example, you changed the captcha size in the filters but forgot to apply the change to the symbols. As a result, the module was trained to recognize small characters but is tested on large ones.
*Perhaps the mass centers are not where you set them when you collected the symbols. You can check this in the mass centers test by clicking the captcha where the mass centers were set: there will be no response there, but the mass centers may turn out to be located below or above that point. Mass centers can be adjusted without retraining the module: increase the mass center variation parameter and the points quantity so that they cover the area where your module sees symbols. Alternatively, train your module again with an increased symbol centers variation parameter instead of (or in addition to) the mass center variation.
2) During training, the green line stayed close to the yellow line, and the red line dropped almost to zero. To fix this problem, see the Training module paragraph.

The number of symbols in the captcha answer is correct, but all (or most) of them are wrong.
If module training was successful but you still get this problem, it may be a false recognition error (type 3). In this case, in the mass center test, you will get a correct response when clicking symbol centers, but a lot of false symbols when clicking mass centers located close to them. This can be fixed by retraining your module with an increased parameter for false recognition.

There are too many symbols in the captcha answer and all of them are incorrect. Some symbols are repeated.
This is the same problem as above, but in addition the “min distance between symbols” parameter is too small.

The recognition result contains the correct answer plus extra symbols.
1) The “min distance between symbols” parameter value is too low.
2) Too many recognition errors of type (3). See how to decrease their number in the Module training paragraph.
3) The recognition threshold is too low.
4) A combination of 1, 2 and 3.
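
To see why these two settings produce extra or missing symbols, here is a toy model in Python. It is not CapMonster's actual algorithm. It assumes the module produces candidate symbols with a horizontal position and a confidence score, drops candidates below the threshold, and keeps only the strongest candidate within each min distance. The Detection type, the read_answer function and all numbers are invented for the illustration.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    x: float      # horizontal position of the candidate symbol center
    symbol: str   # recognized character
    score: float  # confidence between 0 and 1


def read_answer(detections: List[Detection],
                threshold: float, min_distance: float) -> str:
    """Build the answer text from candidate detections (toy model)."""
    kept: List[Detection] = []
    # Strongest candidates first, so weaker neighbours are the ones dropped.
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if det.score < threshold:
            continue  # too weak: filtered out by the threshold
        if all(abs(det.x - k.x) >= min_distance for k in kept):
            kept.append(det)  # far enough from every symbol already kept
    return "".join(d.symbol for d in sorted(kept, key=lambda d: d.x))


if __name__ == "__main__":
    candidates = [
        Detection(10, "c", 0.90),
        Detection(13, "e", 0.55),  # second reading of the same symbol
        Detection(30, "a", 0.80),
        Detection(40, "i", 0.30),  # noise between two symbols
        Detection(50, "t", 0.85),
    ]
    print(read_answer(candidates, threshold=0.5, min_distance=10))  # balanced settings
    print(read_answer(candidates, threshold=0.5, min_distance=2))   # min distance too low
    print(read_answer(candidates, threshold=0.2, min_distance=10))  # threshold too low
    print(read_answer(candidates, threshold=0.5, min_distance=25))  # min distance too high
```

In this model, a min distance that is too low lets a second reading of the same symbol through, a threshold that is too low lets noise through, and a min distance that is too high removes real symbols, which matches the symptoms described above.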

en/addons/capmonster/learning/recognition-testing.txt · Last modified: 2015/07/14 15:51 (external edit)