Case-Sensitive sorting using Collator

Ravindranath Chowdary
Joined: Nov 08, 2006
Hi Ranchers,
This is regarding a sorting issue I am facing with the java.text.Collator class.
I am using java.text.Collator to sort Strings that can contain English and non-English characters.
If I pass a list with two values {'a', 'A'}, the output should be {'A', 'a'}.

I tried setting the strength to PRIMARY, SECONDARY, TERTIARY and IDENTICAL, but none of them worked for me.
I have also tried RuleBasedCollator, where I can define which character should come after another, like A < B < ... Z < a < b < c ... But this option forces me to provide the rules for all the languages that we support, so I cannot use it.
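
For reference, the RuleBasedCollator attempt described above might look roughly like the sketch below. The class name, the helper method and the rule string are assumptions made for illustration; the rules cover only A-C and a-c, which is exactly why the approach does not scale to every supported language.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UppercaseFirstRules {
    // Illustrative rules only: every uppercase letter listed sorts before every lowercase one.
    static final String RULES = "< A < B < C < a < b < c";

    // Returns a sorted copy of the input using the hand-written rules above.
    static List<String> sortWithRules(List<String> input) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(RULES);
            List<String> sorted = new ArrayList<>(input);
            sorted.sort(collator);
            return sorted;
        } catch (ParseException e) {
            // The constructor rejects malformed rule strings.
            throw new IllegalArgumentException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sortWithRules(Arrays.asList("a", "B", "A", "b")));
    }
}
```

With these rules the list sorts as A, B, a, b, but any letter missing from the rule string would need its own entry.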

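Sample code:
import java.util.*;
import java.text.Collator;

class Test {
    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        // The two values from the example above
        list.add("a");
        list.add("A");

        // Sort using the Collator for the default locale
        Collections.sort(list, Collator.getInstance());
        System.out.println(list);
    }
}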

Output: {a, A}
Expected Output: {A, a}

Could someone suggest how to get case-sensitive sorting using Collator?

Christophe Verré

Joined: Nov 24, 2005
Posts: 14688

Please UseCodeTags the next time you post some code.

David Newton

Joined: Sep 29, 2008
Posts: 12617

I'm not sure there's an easy way to do this.
Mike Simmons
Joined: Mar 05, 2008
For the Collator I get from Collator.getInstance(), the order is case-sensitive. It just doesn't match the order you seem to want - it puts uppercase after lowercase, rather than before. However I'm not sure about that, since you don't really specify enough to tell us what you actually want. 'A' should be before 'a', OK. What about 'B' vs 'a'? Or 'Ä' vs. 'a'? I can't tell from your example what you think the order should be.
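
A small sketch of what that looks like in practice, assuming the English locale is used instead of the default one so the result does not depend on the machine. The class and method names are made up for illustration.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {
    // Compares two strings with an English Collator at the given strength.
    static int compareAt(int strength, String left, String right) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(left, right);
    }

    public static void main(String[] args) {
        // PRIMARY ignores case, so "a" and "A" compare as equal.
        System.out.println(compareAt(Collator.PRIMARY, "a", "A"));
        // TERTIARY (the default) distinguishes case and puts lowercase first.
        System.out.println(compareAt(Collator.TERTIARY, "a", "A"));
        // A different base letter decides the order before case is considered.
        System.out.println(compareAt(Collator.TERTIARY, "a", "B"));
    }
}
```

At PRIMARY strength the comparison returns 0, so a stable sort simply keeps the input order. At TERTIARY strength it returns a negative number, which is why {a, A} comes out with lowercase first.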

One possible solution is to simply reverse the case of all the characters before you sort the list, then reverse again after you sort. E.g.
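
A minimal sketch of that idea, not necessarily the exact code originally posted. The class name and the swapCase and sortUppercaseFirst helpers are hypothetical, and the sketch works on 16-bit chars only.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SwapCaseSort {
    // Hypothetical helper: flips uppercase chars to lowercase and vice versa.
    static String swapCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isUpperCase(c)) {
                sb.append(Character.toLowerCase(c));
            } else if (Character.isLowerCase(c)) {
                sb.append(Character.toUpperCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Swaps case, sorts with the given Collator, then swaps case back.
    static List<String> sortUppercaseFirst(List<String> input, Collator collator) {
        List<String> swapped = new ArrayList<>();
        for (String s : input) {
            swapped.add(swapCase(s));
        }
        swapped.sort(collator);
        List<String> result = new ArrayList<>();
        for (String s : swapped) {
            result.add(swapCase(s));
        }
        return result;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        System.out.println(sortUppercaseFirst(Arrays.asList("b", "A", "a", "B"), collator));
    }
}
```

With an English Collator this gives A, a, B, b. One caveat: swapping case twice does not restore every character (the Turkish dotless ı comes back as a plain i, for example), so this is only safe when the data round-trips cleanly.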

If that doesn't work for you, you probably need to give us more info on how the results are different from what you expect.
Mike Simmons
Joined: Mar 05, 2008
It would probably be more correct to use code points rather than chars, given the limitations of 16-bit chars for Unicode. But this should be close enough to give the idea.
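
A code-point-based variant might look like the sketch below. The names are again hypothetical. As a small variation, it wraps the Collator in a Comparator that compares case-swapped keys, so the original strings are never rewritten.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class CodePointSwapCase {
    // Hypothetical helper: flips case per Unicode code point, so characters
    // outside the Basic Multilingual Plane are handled as single units.
    static String swapCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isUpperCase(cp)) {
                sb.appendCodePoint(Character.toLowerCase(cp));
            } else if (Character.isLowerCase(cp)) {
                sb.appendCodePoint(Character.toUpperCase(cp));
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Orders strings by comparing their case-swapped forms with the given Collator.
    static Comparator<String> uppercaseFirst(Collator collator) {
        return (left, right) -> collator.compare(swapCase(left), swapCase(right));
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "A"));
        list.sort(uppercaseFirst(Collator.getInstance(Locale.ENGLISH)));
        System.out.println(list);
    }
}
```

Because only the comparison keys are swapped, characters that do not survive a double case swap are left untouched in the output.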
