Rutube Parser

socnet

The parser includes two templates:
  1. Video description and title parser.
  2. Recommended videos parser.
Rutube Video Parser
It opens the video page in the browser and extracts the title and description.

Rutube ID Parser
To get started, you give it the ID of a single video. The template then parses the IDs of the recommended videos, and those IDs in turn become new starting points.
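Collecting IDs this way amounts to a breadth-first walk over the graph of recommendations. Below is a minimal Python sketch of that idea. The get_recommended callback is hypothetical: it stands in for whatever the template does in the browser to read the recommended IDs from a video page.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_ids(
    seed_id: str,
    get_recommended: Callable[[str], Iterable[str]],
    limit: int,
) -> list[str]:
    """Breadth-first walk over the 'recommended videos' graph.

    get_recommended is a hypothetical callback returning the IDs shown as
    recommendations on a video's page.
    """
    seen = {seed_id}          # every ID discovered so far (deduplication)
    order = [seed_id]         # IDs in the order they were discovered
    queue = deque([seed_id])  # IDs whose recommendations are not fetched yet
    while queue and len(order) < limit:
        current = queue.popleft()
        for rec_id in get_recommended(current):
            if rec_id not in seen:
                seen.add(rec_id)
                order.append(rec_id)
                queue.append(rec_id)
                if len(order) >= limit:
                    break
    return order
```

For example, with a toy graph graph = {"a": ["b", "c"], "b": ["d"]}, calling crawl_ids("a", lambda v: graph.get(v, []), limit=10) returns ["a", "b", "c", "d"]. The seen set matters because recommendations often point back to videos that were already collected.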

First, parse the IDs: I collected about 2.5 million in a week. Then run the video-parsing template. Ideally, run both templates at the same time: parsing IDs is always faster than parsing titles and descriptions, so the video parser will not run out of IDs to process.
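One way to let both templates run at once is to have them meet only in the database. The sketch below assumes a shared videos table in which the ID parser fills only video_id and the video parser treats rows with an empty title as pending. The table name and the title and description columns are hypothetical, and Python's built-in sqlite3 stands in for MySQL so the example runs without a server.

```python
import sqlite3


def fetch_pending(conn: sqlite3.Connection, batch_size: int) -> list[str]:
    """Return up to batch_size IDs stored by the ID parser whose title and
    description have not been parsed yet (title IS NULL, an assumed marker)."""
    rows = conn.execute(
        "SELECT video_id FROM videos WHERE title IS NULL ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    return [row[0] for row in rows]


def save_video(conn: sqlite3.Connection, video_id: str, title: str, description: str) -> None:
    """Store the parsed title and description, taking the row out of the pending set."""
    conn.execute(
        "UPDATE videos SET title = ?, description = ? WHERE video_id = ?",
        (title, description, video_id),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE videos (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "video_id TEXT NOT NULL UNIQUE, title TEXT, description TEXT)"
    )
    # The ID parser only fills video_id.
    conn.executemany("INSERT INTO videos (video_id) VALUES (?)", [("a",), ("b",), ("c",)])
    # The video parser takes a batch, parses each page, and saves the result.
    for vid in fetch_pending(conn, batch_size=2):
        save_video(conn, vid, title=f"title of {vid}", description="...")
    print(fetch_pending(conn, batch_size=10))  # ['c']
```

Because the ID parser only ever inserts new rows and the video parser only updates existing ones, the two never compete for the same work.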

The data is stored in a MySQL database, because keeping this much data in files is impractical. You will need to install MySQL itself and, optionally, phpMyAdmin for convenience.

Installation method #1:
Download the installer from https://dev.mysql.com/downloads/installer/. phpMyAdmin additionally requires PHP and a web server (Apache or nginx); the easiest route is a ready-made bundle such as WAMP or XAMPP.

Installation method #2:
Install Docker Desktop. Prepare a docker-compose.yml file, change to the directory containing it, and run docker-compose up. Once the containers are running, phpMyAdmin is available at localhost:8080.

Create a table with the following structure:

[Image: table structure]


Create a unique index on video_id to avoid duplicates and, of course, a primary key on the auto-increment column.

Create the indexes before you fill the table with data: if duplicate IDs get in first, the unique index can no longer be created.
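Below is a minimal sketch of such a schema and of how the unique index keeps duplicates out. The title and description columns are assumptions, since the original structure is only available as a screenshot, and Python's built-in sqlite3 stands in for MySQL so it runs without a server. In MySQL the equivalents would be an INT AUTO_INCREMENT PRIMARY KEY, a UNIQUE index on video_id, and INSERT IGNORE instead of INSERT OR IGNORE.

```python
import sqlite3

SCHEMA = """
CREATE TABLE videos (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-increment primary key
    video_id    TEXT NOT NULL,                      -- Rutube ID (assumed 32 hex chars)
    title       TEXT,                               -- hypothetical, filled by the video parser
    description TEXT                                -- hypothetical, filled by the video parser
);
-- Created before any data is loaded.
CREATE UNIQUE INDEX idx_videos_video_id ON videos (video_id);
"""


def create_store() -> sqlite3.Connection:
    """Open an in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_ids(conn: sqlite3.Connection, ids: list[str]) -> int:
    """Insert IDs, silently skipping any already present; return how many were new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO videos (video_id) VALUES (?)",
        [(video_id,) for video_id in ids],
    )
    conn.commit()
    return conn.total_changes - before
```

With the unique index in place, the ID parser can insert every ID it sees without checking first; add_ids(conn, ["a", "b", "a"]) stores only two rows.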

The database is now ready. Add one row manually containing the first ID: take any video, for example https://rutube.ru/video/b95b7041ad1c3dbde2730be0496ab7cb/, and copy the ID from its URL (here, b95b7041ad1c3dbde2730be0496ab7cb). Then run Rutube ID Parser. Once at least the first 1000 IDs have been collected, you can start Rutube Video Parser.
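If you want to pull the ID out of a URL programmatically, a regular expression is enough. This sketch assumes that every ID is 32 lowercase hexadecimal characters, which matches the example above but is not documented elsewhere in this post.

```python
import re
from typing import Optional

# Assumption: IDs are 32 lowercase hex characters, as in the example URL.
VIDEO_URL_RE = re.compile(r"rutube\.ru/video/([0-9a-f]{32})")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a Rutube video URL, or None if the URL does not match."""
    match = VIDEO_URL_RE.search(url)
    return match.group(1) if match else None
```

The string it returns is what goes into video_id for the first, manually added row.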
 
Article topic: Parsing
