Text Comparison/Difference Finder

BenP6

Newbie
Registered
08.07.2019
Messages
7
Thanks
0
Points
1
Hello,

I am scraping websites through their sitemaps. To keep my databases of these websites' content up to date, I need to "compare" files containing the sitemaps.

When I say "compare", I mean that I need to find the differences between the two files, because these differences will contain the new URLs/pages I'll have to scrape.
Thus, the script will find, for example, 100 URLs in a 6-million-URL file that aren't already in the previously stored file. These are new pages to scrape. Then it will send those new URLs to a variable for the scraping script to process.

However, I haven't found any feature that could handle this in Zenno's text processing.
How can I do it?

Thanks in advance.
 

EtaLasquera

Client
Registered
02.01.2017
Messages
524
Thanks
112
Points
43

BenP6

Thanks, but I don't see how this helps. The code you sent me in this thread is for processing large lists. Even though my lists are large too, my main trouble is comparing them: returning the differences between two large URL lists in a variable.
I didn't find a solution through Zenno's text processing features. Maybe there is a trick, or something to do in C# or Java? But I do not know how to implement this kind of code, since I know nothing about programming.
 

EtaLasquera

This...
Code:
// Merge both lists, drop duplicates, and write the result back to List1.
var l1 = project.Lists["List1"];
var l2 = project.Lists["List2"];
List<string> l = new List<string>();
l.AddRange(l1);
l.AddRange(l2);
l = l.Distinct().ToList();
l1.Clear();
l1.AddRange(l);
With this C# code, your first list will contain only the distinct elements of List1 and List2 combined.
 

BenP6

Thanks, I tried the code, but unfortunately it did not work. The code ran fine and returned a valid message, but it did not return the differences between the two lists. See here: https://gyazo.com/d8fa2b224035c2e09de767a64ea36c30 (it only returned "ok").
 

EtaLasquera

That code puts the exclusive values in List1.
Well, I think that is what you asked for in the first post.
Do you need to know the differences between two lists?
 

BenP6

Yes.
For example:
List1 contains: paris ; new york ; miami ; shanghai
List2 contains: paris ; tokyo ; new york ; miami ; london ; shanghai
Here the script needs to return "tokyo ; london" in a variable, since those two entries are the only differences between the two lists.
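
The example above is a one-directional set difference: every item of List2 that does not appear in List1. Below is a minimal sketch in plain C#, outside ZennoPoster. The names `ListDiff` and `NewItems` are made up for illustration, and the sketch assumes that an exact, case-sensitive string match is what makes two URLs "the same".

```csharp
using System;
using System.Collections.Generic;

static class ListDiff
{
    // Returns the items of 'current' that are missing from 'previous',
    // in their original order and without repeats.
    public static List<string> NewItems(IEnumerable<string> previous,
                                        IEnumerable<string> current)
    {
        // A HashSet gives constant-time lookups on average, which matters
        // for lists with millions of entries.
        var seen = new HashSet<string>(previous, StringComparer.Ordinal);
        var result = new List<string>();
        foreach (var item in current)
        {
            // Add() returns false when the item is already known, so this
            // also skips repeats inside 'current' itself.
            if (seen.Add(item))
                result.Add(item);
        }
        return result;
    }
}

static class Program
{
    static void Main()
    {
        var list1 = new[] { "paris", "new york", "miami", "shanghai" };
        var list2 = new[] { "paris", "tokyo", "new york", "miami", "london", "shanghai" };
        Console.WriteLine(string.Join(" ; ", ListDiff.NewItems(list1, list2)));
        // prints: tokyo ; london
    }
}
```

Items that exist only in the older list (pages removed from the sitemap) are ignored here, which matches the goal of finding new pages to scrape.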
 

EtaLasquera

Client
Регистрация
02.01.2017
Сообщения
524
Благодарностей
112
Баллы
43
Code:
// Keep in List1 only the items that occur in just one of the two lists.
var l1 = project.Lists["List1"];
var l2 = project.Lists["List2"];
List<string> l = new List<string>();
l.AddRange(l1);
l.AddRange(l2);
l1.Clear();
// Group identical items and count how often each one occurs.
var q = from x in l
        group x by x into g
        select new { Value = g.Key, Count = g.Count() };
foreach (var x in q) {
    // A count of 1 means the item is in only one list
    // (assuming neither list contains duplicates of its own).
    if (x.Count < 2) {
        l1.Add(x.Value);
    }
}
 

BenP6

It still returns the value "ok" in the variable instead of the differences between the lists.
I really need the differences between the lists returned in another list or variable.

Here is a GIF of the script failing (returning "ok"): https://gyazo.com/cd39a711928cdc1e22176ce2233afdfa
 

EtaLasquera

Uncheck the "return value to variable" box; it is not important here.
List1 now holds the exclusive values.
If you want to put the new values into a new list, create a new list and change l1.Add to your new variable.
If you need the values in a variable, change the foreach loop.
If you don't know C#, consider learning it.
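
The suggestions above could be combined roughly as follows. This is a sketch under assumptions, not something tested in ZennoPoster: the helper name `CollectUnique`, the list name "NewUrls" and the " ; " separator are made up for illustration. It also assumes that a project list can be passed where an `IEnumerable<string>` or `ICollection<string>` is expected, and that a value returned from the C# snippet is what gets stored when the "return value to variable" box is checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DiffHelper
{
    // Copies into 'target' every item that occurs exactly once across both
    // lists (the same rule as the x.Count < 2 check), leaves list1 and list2
    // untouched, and returns those items joined into one string.
    public static string CollectUnique(IEnumerable<string> list1,
                                       IEnumerable<string> list2,
                                       ICollection<string> target,
                                       string separator)
    {
        var unique = list1.Concat(list2)
                          .GroupBy(x => x)
                          .Where(g => g.Count() < 2)
                          .Select(g => g.Key)
                          .ToList();
        foreach (var value in unique)
            target.Add(value);
        return string.Join(separator, unique);
    }
}

static class Demo
{
    static void Main()
    {
        var list1 = new List<string> { "paris", "new york", "miami", "shanghai" };
        var list2 = new List<string> { "paris", "tokyo", "new york", "miami", "london", "shanghai" };
        var newUrls = new List<string>();   // stands in for a third project list

        string joined = DiffHelper.CollectUnique(list1, list2, newUrls, " ; ");
        Console.WriteLine(joined);          // prints: tokyo ; london
        Console.WriteLine(newUrls.Count);   // prints: 2
    }
}

// In a ZennoPoster snippet the arguments would presumably be
// project.Lists["List1"], project.Lists["List2"] and project.Lists["NewUrls"]
// (hypothetical list name), with the joined string returned from the snippet.
```

Because the rule is symmetric, URLs that disappeared from the newer list are returned along with the new ones.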
 
