Win a copy of Re-engineering Legacy Software this week in the Refactoring forum
or Docker in Action in the Cloud/Virtualization forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Out of Memory Exception when using Levenshtein Algorithm

 
kaparapu madhu
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I am comparing two html files, using Levenshtein Algorithm. I am reading these two files in arraylist and giving as input to Levenshteing Algorithm. This algorithm uses Edit matrix for comparision with size m*n ( arrayList1.size() * arrayList2.size() ).
It is working good and efficient for small files, but for large file it throws out of memory exception. To overcome this we used -Xmx700000000. But the problem is, it takes lot of time to generate diffReport nearly 15 minutes and above. I am searching for a possible solution to reduce the time
 
Geoffrey Falk
Ranch Hand
Posts: 171
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You do not need to allocate a 2x2 array. It is possible to implement the Levenshtein algorithm using only 1-dimensional arrays.

Calculate the matrix one row at a time. You only need the previous row to determine the values in the next row.

Geoffrey
 
Geoffrey Falk
Ranch Hand
Posts: 171
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This uses JDK 1.5:




[ June 14, 2005: Message edited by: Geoffrey Falk ]
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic