What's a recommended way to compare lists and remove duplicate items?

nycdude

Member
Joined
06.05.2018
Messages
57
Reaction score
1
Points
8
Hello all,

I'm scraping links and data, and sometimes I'll scrape the same links again, which I want to ignore. I keep a running .txt file as a history of all links to compare against.

I know I can loop through the lists to compare and then save or delete, but that seems like too much work. Is there a simpler, faster way?

Thanks.
 

kveldulv

Client
Joined
08.05.2011
Messages
45
Reaction score
16
Points
8
You can call grep from the command line, or within scripts.

Code:
Grep
# keep only the lines of uniq.txt that are NOT in blacklist.txt
# (-F fixed strings, -x whole-line match, -v invert match, -f patterns from file)
grep -Fxvf blacklist.txt uniq.txt >> uniq-new.txt
# the same thing written out long: lines of fileA that are not in fileB
grep -F -x -v -f fileB fileA >> fileC

Awk
# FNR==NR is true only while reading the first file (the filter list);
# store its lines as array keys, then test each line of the second file.
# keep the lines of data.txt that DO appear in filter.txt
awk 'FNR==NR {hash[$0]; next} $0 in hash' filter.txt data.txt > matching.txt
# keep the lines of data.txt that do NOT appear in filter.txt
awk 'FNR==NR {hash[$0]; next} !($0 in hash)' filter.txt data.txt > not_matching.txt

# remove all lines that appear in fileB from fileA
awk 'NR==FNR {a[$0]; next} !($0 in a)' fileB fileA
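
If both lists can be sorted, comm is another option and can be faster on large files. A minimal sketch, assuming history.txt holds the already-scraped links and scraped.txt the new batch (both file names are just placeholders); the process substitution needs bash:

Code:
# comm wants sorted input; -13 hides lines unique to history.txt (column 1)
# and lines common to both files (column 3), leaving only the new links
comm -13 <(sort history.txt) <(sort scraped.txt) > new-links.txt

# append the genuinely new links to the running history for the next run
cat new-links.txt >> history.txt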
 
Last edited:
  • Thanks
Reactions: spyder and nycdude

Grapidly

Newbie
Joined
16.10.2018
Messages
9
Reaction score
2
Points
3
nycdude said:
Hello all,

I'm scraping links and data, and sometimes I'll scrape the same links again, which I want to ignore. I keep a running .txt file as a history of all links to compare against.

I know I can loop through the lists to compare and then save or delete, but that seems like too much work. Is there a simpler, faster way?

Thanks.
Did you figure out the best way to do this? To save time, I would think you could just remove duplicates at the end of the project in a list-processing function.

Curious to know what you ended up using.
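
If deduplicating once at the end is enough, the classic awk one-liner does it in a single pass and keeps the original order; a minimal sketch, assuming all-links.txt is the combined output file (the name is just a placeholder):

Code:
# print each line only the first time it is seen, preserving order
awk '!seen[$0]++' all-links.txt > deduped.txt

# or, when order doesn't matter, sort and drop duplicates in one step
sort -u all-links.txt > deduped.txt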
 
