Saturday 29 August 2015

One liner to remove duplicate entries using awk

Hello Everyone,

Today we will discuss a one-liner for removing duplicate entries from a file.
Sometimes we have a file with lots of repetitive or duplicate entries, and we need to either remove those duplicates or count how many times each entry is repeated.

Let's suppose we have a file with the content below:

my_temp_file:

this
is
a
test
file
a
file
with
repetitive
content.
Ways
to
remove
duplicate
duplicate
lines

Now, to remove the duplicate entries while keeping the first occurrence of each line, use the awk one-liner below:

$ awk '!a[$0]++' my_temp_file
this
is
a
test
file
with
repetitive
content.
Ways
to
remove
duplicate
lines
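
To see why this works, it can help to write the one-liner out in long form. The sketch below is only an illustration: the file name demo_file and the array name seen are made up for this example, and the sample content is a shortened version of the file above. An awk array element that has never been set compares equal to 0, so the test succeeds the first time a line appears and fails for every repeat.

```shell
# Recreate a small sample file (hypothetical name: demo_file) for illustration.
demo_file=$(mktemp)
printf '%s\n' this is a test file a file > "$demo_file"

# Long form of: awk '!a[$0]++'
# seen[$0] is unset (treated as 0) the first time a line appears, so the
# line is printed; the increment afterwards marks that line as already seen.
deduped=$(awk '{ if (seen[$0] == 0) print; seen[$0]++ }' "$demo_file")
printf '%s\n' "$deduped"

rm -f "$demo_file"
```

In the short form, $0 is the whole current line, a[$0]++ returns the old count and then increments it, and the leading ! turns a count of 0 into "true", which makes awk print the line by default.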


Now, if you don't care about the order, you can use sort and uniq in the following manner:

$ sort my_temp_file | uniq
a
content.
duplicate
file
is
lines
remove
repetitive
test
this
to
Ways
with

This also removes the duplicates, but it does not preserve the original order of the lines in the file.
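
The introduction also mentioned counting the repeated entries. A minimal sketch of one way to do that with awk is shown here; the file name demo_file, the array name count and the sample words are assumptions made for this example, not part of the original file.

```shell
# Recreate a small sample file (hypothetical name: demo_file) for illustration.
demo_file=$(mktemp)
printf '%s\n' a file a file with duplicate duplicate a > "$demo_file"

# Count every line, then print only the lines seen more than once.
# The iteration order of "for (line in count)" is unspecified in awk,
# so the result is sorted on the second field to keep the order stable.
counts=$(awk '{ count[$0]++ }
              END { for (line in count) if (count[line] > 1) print count[line], line }' \
              "$demo_file" | LC_ALL=C sort -k2)
printf '%s\n' "$counts"

rm -f "$demo_file"
```

Each output line is a count followed by the repeated entry. If the order of the report does not matter, the final sort can be dropped. Piping the file through sort and then uniq -c is another common way to get per-line counts.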


Thank you

Leave a comment if you have any questions.
