First, load the solved captcha images on which your module will be trained. There are two ways to do this:
1) Load only the captcha images and solve them through a recognition service. To do this, enter the login and password for one of the services in the program settings and select recognition results. In this case we recommend recognizing captchas in groups. The collection for catching symbols can be recognized in the standard way, but the collections for training and testing should be recognized with 100% accuracy. This can be achieved by sending each captcha image to several people at the same time; some services, for example Anti-Gate, provide this option.
2) Create a ZennoPoster project that downloads and solves multiple captcha images. As a result you should get the captcha images, each with its text stored in a text file of the same name.
Alternatively, store each captcha image with its text as the file name: if the captcha text is “qwe”, the image file should be named “qwe.jpg”. The program accepts such images as well.
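To check a folder of solved captchas before loading it, the two layouts described above (text file with the same name, or text as the file name) can be read with a short script. This is only a minimal sketch and not part of the program: the function name `load_solved_captchas`, the list of image extensions and the UTF-8 encoding of the text files are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the files you actually have.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def load_solved_captchas(folder):
    """Return a list of (image_path, answer) pairs found in folder."""
    pairs = []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        sidecar = image.with_suffix(".txt")
        if sidecar.is_file():
            # Layout 1: the answer is stored in a text file with the same name.
            answer = sidecar.read_text(encoding="utf-8").strip()
        else:
            # Layout 2: the file name itself is the answer, e.g. qwe.jpg.
            answer = image.stem
        pairs.append((image, answer))
    return pairs
```

Calling `load_solved_captchas("captchas")` on a hypothetical folder named `captchas` gives one pair per image, which makes it easy to spot empty or suspicious answers before training.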
If you are creating a module for a simple captcha with little or no symbol distortion, 300 images will be enough. Complex captchas require 1000 images. All these captchas have to be recognized by services, which will cost you from several cents to a couple of dollars. You can, however, work out exactly how many captcha images are needed and save a few tens of cents.
Captcha images are sorted into collections automatically after loading, but you can also sort them manually. Re-sorting is not possible, so if you are not sure how to sort the images into groups, use the automatic option.
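If you do want to plan a manual split in advance, the idea can be sketched as follows. This is a hypothetical illustration only: the function name `split_into_collections` and the 20/60/20 proportions for the symbol-catching, training and testing collections are assumptions, and the program's own automatic sorting may use different proportions.

```python
import random


def split_into_collections(items, percents=(20, 60, 20), seed=0):
    """Shuffle items and split them into three collections.

    percents are assumed shares for the symbol-catching, training and
    testing collections; they are illustrative, not the program's values.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n = len(shuffled)
    first = n * percents[0] // 100
    second = first + n * percents[1] // 100
    return {
        "symbols": shuffled[:first],
        "training": shuffled[first:second],
        "testing": shuffled[second:],  # receives any rounding remainder
    }
```

With 300 images and the assumed proportions, this gives 60 images for catching symbols, 180 for training and 60 for testing, and no image appears in more than one collection.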