Help with string - array

P Derlyuk
Ranch Hand

Joined: Feb 17, 2013
Posts: 33
How do I separate a string like "ATGCCACTATGGTAG" into an array like [ATG, CCA, CTA, TGG, TAG]?
Any help would be great!
Wayan Saryada
Ranch Hand

Joined: Feb 05, 2004
Posts: 105


You could split the string with a loop. On each pass, take a substring of three characters and add it to a List. When all the characters have been read, convert the List into an array.
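The loop described above could be sketched roughly like this. The class and method names (CodonList, split) are made up for illustration, and the sketch assumes a shorter leftover group at the end should be kept rather than dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the original post.
class CodonList {

    // Splits dna into consecutive groups of three characters.
    // Assumption: a trailing group of one or two characters is kept as-is.
    static String[] split(String dna) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < dna.length(); i += 3) {
            int end = Math.min(i + 3, dna.length()); // do not run past the end
            parts.add(dna.substring(i, end));
        }
        // Convert the List into an array once all characters are read.
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", split("ATGCCACTATGGTAG")));
    }
}
```

For the string in the question, main prints ATG, CCA, CTA, TGG, TAG.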

You might also want to try a regular expression, doing the split with the String.split() method.
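One possible regular expression for this, as a sketch only: a lookbehind combined with \G (the end of the previous match) makes split() cut after every three characters. The class name RegexCodons is invented here, and the trick relies on how the standard JDK regex engine treats \G inside a lookbehind, so it may not behave the same on other runtimes.

```java
// Hypothetical helper, not from the original post.
class RegexCodons {

    // (?<=\G...) matches the empty position that has exactly three
    // characters between it and the end of the previous match.
    static String[] split(String dna) {
        return dna.split("(?<=\\G...)");
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", split("ATGCCACTATGGTAG")));
    }
}
```

This is compact, but the plain loop is easier to read and does not depend on regex engine details.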

P Derlyuk
Ranch Hand

Joined: Feb 17, 2013
Posts: 33
I did something along those lines.

Henry Wong

Joined: Sep 28, 2004
Posts: 20521

P Derlyuk wrote: I did something along those lines.

You (1) used a for loop and the substring() method to get the three-letter components, so that you can (2) build a string list of components separated by a space, so that you (3) can then call split to get an array of the components? Would it not have been easier to just use a for loop and the substring() method to get the three-letter components?


Campbell Ritchie

Joined: Oct 13, 2005
Posts: 46348
There is a problem with substring and very long Strings, which may bite you if your DNA represents a whole organism, even one as simple as Caenorhabditis elegans. If you simply use substring, the small String keeps a reference to the backing array of the long String, so the whole array stays in memory, which can be unnecessarily expensive. You can sort that out by using
... new String(dna.substring(0, 3));
That problem should not occur on Java 7 update 6 or later, where substring() copies the characters instead of sharing the backing array.

There is another problem which will occur if you use the + operator on Strings repeatedly: memory fills up. Every use of + creates several new objects, and after a few thousand this starts to exhaust your memory. Garbage collection will reclaim that memory, but you can watch your program become slower and slower. 10000 bp: you can see the delay. 1000000 bp: you can leave the program to chunter away to itself while you have dinner.
Suggested solution: put the String into a StringBuilder whose capacity is dna.length() + dna.length() / 3 (you cannot do this for Strings ≥ 1610612736 bp because of overflow errors). Insert " " every 3rd place. I suggest you start inserting 3 places from the end and count backwards; it is easier. I think the first insertion point from the end is dna.length() - 3, because insert() puts the space before the character at that index.
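A minimal sketch of the StringBuilder suggestion, assuming the length of the String is a multiple of 3 (otherwise the groups are counted from the end, not the start). The names SpacedDna and spaced are hypothetical.

```java
// Hypothetical helper, not from the original post.
class SpacedDna {

    // Returns dna with a space between every group of three characters.
    static String spaced(String dna) {
        // Room for the original characters plus one space per group.
        StringBuilder sb = new StringBuilder(dna.length() + dna.length() / 3);
        sb.append(dna);
        // Work backwards so earlier indices are not shifted by later inserts.
        for (int i = dna.length() - 3; i > 0; i -= 3) {
            sb.insert(i, ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(spaced("ATGCCACTATGGTAG"));
    }
}
```

Note that each insert() still shifts the characters after it, so for very long sequences appending three characters and a space at a time to an empty StringBuilder may be cheaper.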

But, as previously stated, you are better off creating an array of length dna.length() / 3 and using new String(dna.substring(i, i + 3)) to populate it. You can predict the length of the array, so you don’t need to go via a list.
Don’t copy-and-paste from this post because I have used nbsp characters.
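Something like the following would do that. It is only a sketch: CodonArray and toCodons are invented names, and because the array length is dna.length() / 3, any one or two characters left over at the end are silently dropped.

```java
// Hypothetical helper, not from the original post.
class CodonArray {

    static String[] toCodons(String dna) {
        // The number of complete triplets is known up front.
        String[] codons = new String[dna.length() / 3];
        for (int i = 0; i < codons.length; i++) {
            int start = i * 3;
            // new String(...) forces a copy, which matters only on old JVMs
            // where substring() shares the backing array of the long String.
            codons[i] = new String(dna.substring(start, start + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(toCodons("ATGCCACTATGGTAG")));
    }
}
```

For the string in the question, main prints [ATG, CCA, CTA, TGG, TAG].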