Read a 5.6 mb text file

Tirthankar Mukherjee
Ranch Hand

Joined: Apr 08, 2006
Posts: 51
I need to read a text file (5.6 MB) with 162849 lines of data and run a regex pattern search on it. I tried reading the whole file into a single String variable and applying the regex, but it's not working fine. I need this to be both fast and memory-efficient.

What is the standard procedure for this kind of job?

Thanks in advance
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42280
What does "its not working fine" mean - what exactly is the code doing, and how is or isn't it working?


Tirthankar Mukherjee
Ranch Hand

Joined: Apr 08, 2006
Posts: 51
Ulf Dittmer wrote: What does "its not working fine" mean - what exactly is the code doing, and how is or isn't it working?


It's taking quite a long time, 25 to 30 seconds. I want to know what the standard methods are, and what different approaches exist for tackling this kind of situation.

Putting the whole 5.6 MB of text in a single String is not possible, I think. Is it? Should I instead break the text into smaller parts, keep them in a List of Strings, and then apply the regex to each part?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42280
So "it's not working fine" means that it's functionally OK, but too slow?

30 seconds would mean about 5000 lines per second, which doesn't seem all that bad. Of course, it depends on the complexity of the regexps. Are you certain that you need regexps, and can't use String methods like indexOf and contains?

Make sure you're using a BufferedReader for reading the file.
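
A minimal sketch of that advice, not the poster's actual code: read line by line through a BufferedReader, compile the Pattern once outside the loop, and run a cheap String check before the regex. The class name LineScanner, the method matchingLines, and the pattern itself are placeholders, since the real regex was not posted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class LineScanner {
    // Hypothetical pattern; compiled once, because recompiling it for
    // every line is a common cause of slow scans.
    private static final Pattern RESULT_TAG =
            Pattern.compile("\\[Result \"1-0\"\\]");

    /** Returns the 1-based numbers of the lines that match RESULT_TAG. */
    static List<Integer> matchingLines(Reader source) throws IOException {
        List<Integer> hits = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                // Cheap literal check first; the regex runs only on candidates.
                if (line.startsWith("[Result")
                        && RESULT_TAG.matcher(line).matches()) {
                    hits.add(lineNo);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader keeps the example self-contained; for a real file,
        // pass something like java.nio.file.Files.newBufferedReader(path).
        String sample = "[Event \"x\"]\n[Result \"1-0\"]\n\n1. e4 c5 1-0\n";
        System.out.println(matchingLines(new StringReader(sample)));
    }
}
```

Only one line is held in memory at a time, so the size of the file matters little for memory use.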
Tirthankar Mukherjee
Ranch Hand

Joined: Apr 08, 2006
Posts: 51
[Event "Sparkassen Gp 1"]
[Site "Dortmund GER"]
[Date "2002.07.11"]
[Round "6"]
[White "Shirov, Alexei"]
[Black "Gelfand, Boris"]
[Result "1-0"]
[ECO "B51"]
[WhiteElo "2697"]
[BlackElo "2710"]
[EventDate "2002.07.06"]
[Annotator "Hathaway, Mark"]

1. e4 c5 { Gelfand plays openings which are ideal for an aggressive player, but he isn't a wild-eyed tactician; he's a planner and classical positional player. } 2. Nf3 Nc6 3. Bb5 d6 ( 3...Qc7 ) ( 3...g6 ) 4. Bxc6+ bxc6 5. O-O Bg4 ( 5...e5 { Kasparov-Polgar, Eurotel, 2002, 1-0 } ) 6. h3 Bh5 7. e5 { White's aim is to open the e-file or to "break" Black's pawn structure. The immediate threat of e5-e6 is reminiscent of an Alekhine's Defense position. If White doesn't play e4-e5 immediately then Black might get sufficient control of e5 to keep White boxed-in on the light squares. } 7...e6 { Black tries to keep control of e5 to prevent White from playing g2-g4 followed by Nf3-e5. } ( 7...d5 8. e6 fxe6 9. Re1 ) ( 7...dxe5 $5 8. g4 ( 8. Re1 $6 f6 ( 8...Qd5 $6 9. g4 Bg6 10. Rxe5 Qd6 ) ) 8...Bg6 9. Nxe5 Qd6 { and it's not clear who's weaknesses are most significant } ) 8. exd6 Bxd6 { While Black gets good piece activity the poor pawn structure must make Gelfand a little nervous. } 9. d3 ( 9. d4 cxd4 10. Qxd4 Bxf3 ( 10...Bh2+ $4 11. Kxh2 Bxf3 12. Qxg7 Qf6 13. Qxf6 Nxf6 14. gxf3 ) 11. Qxg7 Qf6 12. Qxf6 Nxf6 13. gxf3 Rg8+ 14. Kh1 Rb8 { and Black's terrific piece activity should give him the advantage. The immediate threat of ...Rb8-b5-h5xh3# should make White worry. } ) 9...Ne7 10. Nbd2 O-O { Playing a Nh4 or Nd4 to add pressure to the pin on Nf3 would be desirable, but White can step out of the pin with Qd1-e1. Black has to do more, so he castles and prepares to bring more forces into play. } ( 10...Nf5 { is a very uncertain sacrifice offer } 11. Qe1 ( 11. g4 Bg6 12. gxf5 Bxf5 13. Kg2 ) 11...O-O ( 11...Nd4 12. Nxd4 cxd4 13. Qe4 Rc8 { leaves Bh5 striking at thin air and Pd4 is a little weak } ) 12. g4 Bg6 13. gxf5 Bxf5 14. Kg2 ) 11. Ne4 { Though White needs to clear the way for Bc1 to develop it's a little odd to see him weaken Nf3 this way. Does he intend to move Bc1 and then play Ne4-d2 to re-establish the defense of Nf3, or is g2-g4 still being considered? } 11...Nd5 12. 
Re1 { White's pawn structure is quite modest, but from that his pieces might spring forward. Only the pin on Nf3 is troublesome. } 12...Re8 { It appears Black might want to play ...e6-e5 to secure control of d4 and f4, but he might also have in mind to play ...Bd6-f8 to keep the bishop on the board. } ( 12...Rb8 ) 13. Ng3 { So, this is the idea behind Nd2-e4. Black has to either retreat and give up on the pin on Nf3 or give up one of the bishops for a knight. He could trade off Bd6 (13...Bxg3 14. fxg3) or Bh5 (13...Bxf3 14. Qxf3) . } 13...Bg6 { Apparently he didn't see any immediate value to trading, so he keeps the two bishops in hopes of some future time when they'll be especially valuable. } 14. Ne4 Bc7 $2 { I don't understand giving up Pc5 in this situation. What plan does Black follow which demands the bishop be at c7 rather than d6 or f8? } ( 14...Bxe4 $6 15. dxe4 Nb6 16. e5 Bf8 17. Qe2 { and Black is cramped on the king-side by Pe5 and his queen-side is awkward } ) 15. Nxc5 e5 { Black threatens to open the position, possibly with ...f7-f5, ...e5-e4, before White can complete his development. } 16. a3 { White, apparently, is not terribly impressed and simply uses Pa3 to oppose Nd5 and possibly to support b2-b4, whereby Nc5 is defended and Bc1-b2 becomes available. } 16...f5 17. c4 { The resulting weakness at d4 seems trivial. The more important thing is to get rid of Nd5, so Bc1 can be developed to a useful square. } 17...Nf6 18. d4 $5 ( 18. Bg5 ) 18...e4 19. Ne5 ( 19. Nh4 Bh5 20. Qa4 Qd6 21. g3 { when Nh4 is a bit stranded, but Bh5 isn't necessarily very good and Pd4 is weak, but Pc6 is too. There are a lot of positional features which have to be comprehended before it can be said who is better or what their plans are. } ) 19...Bh5 ( 19...Bxe5 20. dxe5 Rxe5 { seems simple enough and a good choice for Black. White is simply giving back the pawn he'd won earlier, to ensure easy development and a queen-side pawn majority for the ending. } 21. 
b4 { secures Nc5 and prepares either Bc1-b2 or Bc1-f4 } ( { not bad, but perhaps not best is } 21. Be3 Bh5 22. Qa4 Qd6 23. b4 Ree8 { when Black threatens ...f5-f4 } ) 21...Bh5 22. Qa4 Qe8 23. Bf4 Re7 24. Bd6 Rf7 ) 20. Qd2 { just keeping Pd4 defended, however awkward it may be } 20...Qd6 { This clearly indicates he intends to get rid of Ne5 with his rook, hoping to retain control of f4 with Qd6 & Bc7. } 21. Qc3 Rad8 ( 21...Rxe5 $4 22. dxe5 Qxc5 23. exf6 ) 22. Be3 { Black has reached an impasse. His queen and rooks and even Bc7 are blocked severely on dark squares by Pd4 & Ne5. } 22...Rxe5 ( 22...f4 $4 23. Bxf4 Qxd4 24. Qxd4 Rxd4 25. Nxc6 ) 23. Nb7 { White keeps Pd4 and Be3 to maintain control of the central dark squares. } ( 23. dxe5 Qxe5 24. Qxe5 Bxe5 25. Rab1 f4 26. Bc1 { and Black has some advantages which compensate for the exchange sacrifice } 26...Bg6 ) 23...Qf8 24. dxe5 { White wins one exchange and now threatens two more! This looks like a catastrophe for Black. } ( 24. Nxd8 Re8 25. Nxc6 f4 ) 24...Rd3 { saving one exchange and gaining time to save the other } 25. Qb4 Bxe5 26. Qxf8+ Kxf8 { At this moment it appears Black has made the necessary breakthrough. Be5 is a powerhouse and Rd3 is also very good. } 27. Nc5 Rd6 28. Nb7 Rd7 29. Nc5 { White seems satisfied to repeat the position. Black may think he has the better of it, so he tries for more. I don't think that's a wise decision. } 29...Re7 30. Rab1 f4 31. Bd2 { This move is what Black allowed when he refused to repeat the position and played 29...Re7. White now threatens Nc5xe4. } 31...Bd6 ( 31...Bd4 32. Bb4 Kf7 33. Nb3 Rd7 34. Nxd4 Rxd4 35. Bc5 Rxc4 36. Bxa7 { and the fight goes on, but Black no longer has the advantage of the two bishops! } ) 32. b4 ( 32. Bb4 $4 a5 ) 32...Kf7 $6 ( { It might be too late for Black to turn back the tide, because of the earlier exchange sacrifice, but I think Black needs to advance some pawns for the purpose of creating contact with the enemy. 
This would cause White to pause in his queen-side advances. } 32...g5 33. Bc3 Be5 34. Nxe4 Nxe4 35. Bxe5 Rxe5 36. f3 Bg6 37. fxe4 Ke7 $16 ) 33. Bc3 e3 ( { Now it's too late for } 33...g5 $4 34. Bxf6 Kxf6 35. Nxe4+ ) 34. Bd4 ( 34. fxe3 fxe3 35. Bd4 Bf4 ) 34...Bxc5 ( 34...exf2+ { Black might have seen this as a bad move because of the implied simplification, but it does get rid of a White pawn near Kg1 and it doesn't just lose a pawn. } 35. Bxf2 $16 ) 35. Bxc5 Re5 ( 35...exf2+ 36. Bxf2 Ne4 37. b5 ) 36. fxe3 f3 37. gxf3 ( 37. g4 $2 { leaves Pf3 on the board, in White's camp and that's dangerous, compared to simply capturing it } ) 37...Bxf3 { White is better on the queen-side and now has a share of the center under control (Bc5 & Pe3 work together well) , but Kg1 is a little exposed, so he should improve that before proceeding with his strengths. } 38. Kh2 ( 38. Rb2 ) 38...Be4 39. Rb2 Rh5 40. Rf1 { activating the piece, preventing ...Nf6-g4+ and in place to defend Ph3 } 40...a6 ( 40...g5 41. Rbf2 Rh6 42. Bd4 g4 43. Rxf6+ $18 ) 41. Bd4 Bf5 { Black does all he can to avoid too many exchanges; he's also threatening Ph3. } 42. Rf3 Ne4 ( 42...Be4 43. Rf4 Bf5 44. h4 ) 43. b5 axb5 44. cxb5 cxb5 45. Rxb5 { How convenient for White, to end these exchanges by pinning Bf5! } 45...g6 46. a4 Nd2 ( 46...Ng5 47. Rfxf5+ gxf5 48. Rxf5+ Kg6 ( 48...Ke8 49. Kg2 h6 50. a5 Kd7 ( 50...Rxh3 $4 51. Rxg5 ) 51. a6 Kc7 52. a7 Kb7 53. Rf8 ) 49. Rf6+ Kg7 50. Rf4+ Kg6 51. h4 ) 47. Rf4 Rxh3+ 48. Kg2 Rh5 49. a5 Ke6 { unpinning Bf5 and threatening ...Bh3+ or ...Be4+ to win Rb5 } 50. Re5+ Kd6 51. a6
1-0


The above is a single chess game. I have a database of 10000 such games, all saved in a single .pgn file.
Now I need to parse this file and get the game notation for each and every game, leaving out the analyst's comments as well as the data about when, where, and who played. I thought of using StringBuilder rather than StringBuffer: as far as I know, StringBuilder is not thread-safe but it is faster, and I am not using any threads in this parsing of the file.
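
One possible way to do this without regex, sketched under assumptions: scan each line character by character, skip tag lines that start with `[`, drop `{ ... }` comments and nested `( ... )` variations, and discard `$n` annotation tokens. The names PgnMainline, extract, and flush are hypothetical, and the sketch assumes a new game always begins with a tag line. Black move numbers such as `7...e6` are kept as they appear.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class PgnMainline {
    /** Reads PGN text and returns one mainline move string per game. */
    static List<String> extract(Reader source) throws IOException {
        List<String> games = new ArrayList<>();
        StringBuilder moves = new StringBuilder();
        boolean inComment = false; // inside { ... }
        int depth = 0;             // nesting level of ( ... ) variations
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // A tag line outside any comment or variation starts a new game.
                if (!inComment && depth == 0 && line.startsWith("[")) {
                    flush(moves, games);
                    continue;
                }
                for (int i = 0; i < line.length(); i++) {
                    char c = line.charAt(i);
                    if (inComment) {
                        if (c == '}') inComment = false;
                    } else if (c == '{') {
                        inComment = true;
                        moves.append(' ');
                    } else if (c == '(') {
                        depth++;
                        moves.append(' ');
                    } else if (c == ')') {
                        if (depth > 0) depth--;
                    } else if (depth == 0) {
                        moves.append(c);
                    }
                }
                moves.append(' '); // a line break counts as whitespace
            }
        }
        flush(moves, games);
        return games;
    }

    /** Normalizes whitespace, drops $n tokens, and stores a non-empty game. */
    private static void flush(StringBuilder moves, List<String> games) {
        StringBuilder clean = new StringBuilder(moves.length());
        for (String token : moves.toString().trim().split("\\s+")) {
            if (token.isEmpty() || token.charAt(0) == '$') continue;
            if (clean.length() > 0) clean.append(' ');
            clean.append(token);
        }
        if (clean.length() > 0) games.add(clean.toString());
        moves.setLength(0);
    }

    public static void main(String[] args) throws IOException {
        // Abridged sample for illustration only, not the full game above.
        String sample = "[Event \"Sparkassen Gp 1\"]\n\n"
                + "1. e4 c5 { a comment } 2. Nf3 Nc6 3. Bb5 d6 ( 3...Qc7 ) 1-0\n";
        System.out.println(extract(new StringReader(sample)));
    }
}
```

A single pass with a StringBuilder avoids backtracking over the long comment text, and the depth counter handles variations nested inside variations, which a simple regex cannot match reliably.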

I have stated my problem clearly, and I am hoping for more suggestions.
Tirthankar Mukherjee
Ranch Hand

Joined: Apr 08, 2006
Posts: 51
Oh yes, I am using BufferedReader.
 