
How to calculate occurrences in a Multimap?

michael stoker
Ranch Hand

Joined: Jul 15, 2010
Posts: 39
Hi everyone,

I'm new to Java and I'm having some problems with the Multimap tools.
I have a text file that I parse to collect some data. From every line, I collect the sequence ID, the gene names with their corresponding alleles, and optionally the comment about the sequence (if there is one).

The first aim of my work is to group the alleles by their corresponding genes (which is easy with the MultiMap, by taking as a parameter an ArrayList that contains the list of all the alleles).
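For readers following along, the grouping step described above can be sketched with plain JDK collections instead of a MultiMap class. This is only a minimal sketch: the class name `AlleleGrouping`, the method `add`, and the gene and allele names are hypothetical, and it assumes Java 8 or later (for `computeIfAbsent`) and that each allele should be listed once per gene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups alleles under their gene, like a multimap would.
class AlleleGrouping {

    /** Records one (gene, allele) pair; an allele is stored only once per gene. */
    static void add(Map<String, List<String>> allelesByGene, String gene, String allele) {
        // Create the allele list the first time a gene is seen.
        List<String> alleles = allelesByGene.computeIfAbsent(gene, g -> new ArrayList<>());
        if (!alleles.contains(allele)) {
            alleles.add(allele);
        }
    }

    public static void main(String[] args) {
        // TreeMap keeps the genes sorted by name (an assumption about the wanted order).
        Map<String, List<String>> allelesByGene = new TreeMap<>();
        add(allelesByGene, "GeneB", "allele1");
        add(allelesByGene, "GeneA", "allele2");
        add(allelesByGene, "GeneA", "allele1");
        add(allelesByGene, "GeneA", "allele2"); // duplicate pair, ignored
        System.out.println(allelesByGene);
    }
}
```

A `LinkedHashMap` could replace the `TreeMap` if the genes should stay in the order in which they first appear in the file.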

Here is an example of my text file (I simplified it so it's easier to understand):

So I need to get something like this:

which I was able to do.

The problem starts here. For every distinct allele, I need to calculate:
-the total number of sequences in which the allele appears
-the number of redundant sequences
-the number of non-redundant sequences
-the number of sequences that contain a comment

To sum up, once I have finished reading the file, I need to be able to say, for every allele, how many sequences are associated with it and, among those sequences, how many are redundant, how many are not, and how many contain a comment.
All this must keep the order defined earlier, that is, the alleles grouped by their corresponding genes.

For example, for allele 1 of Gene A, I need to get something like this as output:

What solutions would you suggest for my problem, please?

Any help will be really appreciated.

PS: Sorry for my English, I'm French.
Martin Vanyavchich
Ranch Hand

Joined: Sep 16, 2008
Posts: 241
I don't know how many of these files you have, or how many genes and sequences we're talking about. I would put all this data into a SQL database and do the sorting and counting there. By the way, how do you know whether an allele is redundant or not?

... oh, and about you being French, I totally forgive you.

I no good English.
michael stoker
Ranch Hand

Joined: Jul 15, 2010
Posts: 39
Hi Martin, and thank you very much for your reply.

Actually, I get this file from another program after submitting a request, so its content changes all the time, but the format always stays the same. That's why I preferred to process the file directly without using a database, as I won't be able to get all the data beforehand anyway.
About the redundancy of the sequences (not the alleles), it's easy (well, in theory): if you have 2 lines with the same allele AND the same sequence, then the sequence is redundant for this allele.
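As a rough illustration of that rule, here is a minimal sketch of one way to keep the four counts per allele while reading the file line by line. Everything named here (`AlleleStats`, `AlleleCounter`, `addLine`, the sample values) is hypothetical, and the sketch rests on assumptions the thread does not confirm: a sequence is identified by a single String, "redundant" means a distinct sequence seen on two or more lines for the same allele, the total counts lines, and a comment counts when it is non-empty. It also assumes Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-allele counters.
class AlleleStats {
    int totalLines;        // every line that mentions this allele
    int linesWithComment;  // lines whose comment field is non-empty
    // sequence -> number of lines on which it appears for this allele
    final Map<String, Integer> linesPerSequence = new LinkedHashMap<>();

    void addLine(String sequence, String comment) {
        totalLines++;
        if (comment != null && !comment.trim().isEmpty()) {
            linesWithComment++;
        }
        linesPerSequence.merge(sequence, 1, Integer::sum);
    }

    /** Distinct sequences seen on two or more lines (assumed meaning of "redundant"). */
    int redundantSequences() {
        int count = 0;
        for (int lines : linesPerSequence.values()) {
            if (lines > 1) {
                count++;
            }
        }
        return count;
    }

    /** Distinct sequences seen on exactly one line. */
    int nonRedundantSequences() {
        return linesPerSequence.size() - redundantSequences();
    }
}

// Hypothetical container: gene -> allele -> counters, both levels sorted by name.
class AlleleCounter {
    final Map<String, Map<String, AlleleStats>> statsByGene = new TreeMap<>();

    /** Call once per (gene, allele) pair found on a parsed line. */
    void addLine(String gene, String allele, String sequence, String comment) {
        statsByGene.computeIfAbsent(gene, g -> new TreeMap<>())
                   .computeIfAbsent(allele, a -> new AlleleStats())
                   .addLine(sequence, comment);
    }
}
```

After the whole file has been read, iterating over `statsByGene` visits the alleles grouped by gene, and each `AlleleStats` gives the total, redundant, non-redundant, and commented counts for one allele. No database is needed, since the counters are updated as each line is parsed.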

Oh, and thanks for forgiving me... it's really not easy to be French every day lol, just kidding.