[olug] sort -u vs uniq

Lou Duchez lou at paprikash.com
Mon Mar 13 10:36:30 CDT 2017


This page might have some useful information:

http://unix.stackexchange.com/questions/75341/specify-the-sort-order-with-lc-collate-so-lowercase-is-before-uppercase
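
On the LC_COLLATE question in the quoted message: the variable does
not have to be set explicitly. The lookup order is LC_ALL first, then
LC_COLLATE, then LANG, so the collation can come from LANG alone. A
minimal sketch for checking what is in effect and for forcing plain
byte order (the four sample letters are made up for illustration):

```shell
# Show the effective settings; values printed in quotes are typically
# implied by LANG or LC_ALL rather than set directly.
# (Not every system ships the locale command, hence the "|| true".)
locale | grep -E '^(LANG|LC_ALL|LC_COLLATE)=' || true

# Force the C locale for one command: comparison is then by byte value,
# so uppercase ASCII letters sort before lowercase ones.
printf 'b\nA\na\nB\n' | LC_ALL=C sort
```

In the C locale the second command prints A, B, a, b, one per line.
Other locales may interleave upper and lower case.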

As for what you experienced: I was once surprised to see a PostgreSQL 
statement sort data differently on a Windows server than on a Linux 
server.  It's a vague memory, but I think Linux was evaluating sort 
order by looking at a numeric component at the start of the string (so 
"2beornottobe" was sorting before "1234imdeclaringathumbwar" because 
2 is less than 1234).  Is that what Linux is doing for you?  With a 
string like "108.78.42.145", maybe sort reads that as the number 
108.78 followed by ".42.145", which would explain why only the first 
two octets seem to count.
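
A minimal sketch that reproduces the difference between the two
pipelines, assuming a sort where "." is the decimal point (pinned here
with LC_ALL=C) and using made-up sample addresses:

```shell
# Made-up sample: two distinct addresses share their first two octets,
# and one address appears twice.
ips=$(mktemp)
printf '%s\n' 108.78.42.145 108.78.9.1 69.38.74.12 108.78.42.145 > "$ips"

# Pin the locale so "." is the decimal point (an assumption about your setup).
export LC_ALL=C

# -n compares only the leading number (108.78 or 69.38); -u then keeps
# one line per distinct number, so 108.78.9.1 and 108.78.42.145 collapse.
sort -nu "$ips"          # 2 lines

# Here uniq compares whole lines, so only the exact duplicate is dropped.
sort -n "$ips" | uniq    # 3 lines
```

The gap between the two line counts is the set of addresses that share
their first two octets with some other address.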

Can you foil this nefarious behavior by prefixing a non-numeric 
character to each IP address before sorting?  I bet not even "C" can 
mis-sort "A108.78.42.145" and "A69.38.74.12".
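
A minimal sketch of two ways around this, assuming GNU or BSD sort and
made-up sample addresses. One caveat on the prefix idea: it only helps
if -n is dropped, because under -n a line that starts with a letter has
no leading number and is compared as 0.

```shell
# Made-up sample data, with one exact duplicate.
ips=$(mktemp)
printf '%s\n' 108.78.42.145 108.78.9.1 69.38.74.12 108.78.42.145 > "$ips"
export LC_ALL=C

# With -n, a letter prefix gives every line the numeric value 0,
# so -u collapses the whole file to a single line.
sed 's/^/A/' "$ips" | sort -nu | wc -l

# Option 1: whole-line text comparison; unique, but in text order.
sort -u "$ips"

# Option 2: split on "." and compare each octet numerically; -u then
# treats two lines as equal only when all four octets match.
sort -t . -k1,1n -k2,2n -k3,3n -k4,4n -u "$ips"
```

Option 2 prints 69.38.74.12, 108.78.9.1, 108.78.42.145, one per line,
which is both unique and in numeric order by octet.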


> I'm trying to get a list of unique IP addresses from a log file. I have 
> a list of ALL IP addresses. Using sort -nu and using sort -n | uniq 
> give me 2 different lists.
>
> A stare and compare makes me think that sort -nu only considers the 
> first 2 octets as significant. RTFM of the sort man page indicates 
> sort honors LC_COLLATE.
>
> <appear uninformed>
> LC_COLLATE isn't in env, so I'm assuming it's set at build/compile 
> time when building sort or in the c libraries someplace?
> </appear uninformed -- hardly, stupid probably better tag... and not 
> closed.>
>
> Could this be why the sort -u and uniq return differing output? I 
> don't see anyplace to specify "how much" to consider significant when 
> running sort. Anyone care to offer thoughts?
>
> Thanks.
>
>
> Noel

