Tuesday, 19 November 2019

My understanding of sorting in Postgre

While sorting in Postgre, Collation is taken into account.

SELECT name FROM unnest(ARRAY[
    'a', 'B', 'c', 'D'
]) name ORDER BY name COLLATE "en_US";

It returns 

a
B
c
D

So, with collate "en_US", we can observe that first comes a, then B, then c and then D.

When we try this

SELECT name FROM unnest(ARRAY[
    'apple', 'BANANA', 'carrot', 'ORANGE'
]) name ORDER BY name COLLATE "C";

It returns

B
D
a
c

So, with collate "C", We can observe that all words which start with upper case comes first then the lower case ones.

Some points to be noted:

1) If we don't pass the collation while creating the DB, the collation of the operating system is taken into account which can be for example, "en_US.UTF-8".

2) As recommended in reference link 1, it is good to go with Collation "C" as it delivers consistent results in a platform-independent manner.

References

1) https://simply.name/pg-lc-collate.html

2) https://docs.oracle.com/javadb/10.10.1.2/devguide/cdevcollationexamples.html (Oracle-specific but concepts are applicable to Postgre as well)