• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Ron McLeod
  • Paul Clapham
  • Bear Bibeault
  • Junilu Lacar
Sheriffs:
  • Jeanne Boyarsky
  • Tim Cooke
  • Henry Wong
Saloon Keepers:
  • Tim Moores
  • Stephan van Hulst
  • Tim Holloway
  • salvin francis
  • Frits Walraven
Bartenders:
  • Scott Selikoff
  • Piet Souris
  • Carey Brown

Five Lines of Code: Algorithm to eliminate code duplication

 
Ranch Hand
Posts: 97
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Christian,

Congrats on your new book.

Lets say we have a Java method to process data by chunks of 20K. Listing 1.
I want to refactor by using the logic in Listing 2. (I prefer not to use recursion)

My questions:
- Do you or anyone knows if the algorithm in Listing 2 was given a name?
- From refactorization guru point of view, what would be your recommendation to eliminate this code duplication?

Listing 1. Code Duplication


Listing 2. Refactorization. The difference is totalCounter == paramArray.length
 
Marshal
Posts: 70234
282
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Do you really still use StringBuffer? Wasn't it superseded for local variable use by StringBuilder? Why are you processing your Strings by the 20,000s? Why are you allocating a new StringBuffer rather than setting its length to 0? Why are you zeroing one counter? If you didn't zero it, you wouldn't need the other counter.
 
Jorge Ruiz-Aquino
Ranch Hand
Posts: 97
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Legacy code from 2000's. No refactor applied.
“You will see still greater abominations that they commit.”  

The goal of the first counter is to count up to 20K (the chunk).
The second counter is to make sure to process the remaining data if less than 20K.
 
Jorge Ruiz-Aquino
Ranch Hand
Posts: 97
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you Campbell Ritchie.

The refactor is not about one specific method, but the logic from Listing 1.
My goal is to make a proposal for my project managers in order to stop doing C&P of that logic for processing by chunks.
 
Author
Posts: 31
8
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Jorge,

Cool, I love looking at code. I actually have recently started live streaming once a week (https://www.twitch.tv/thedrlambda), and eventually, I hope people will submit code like this for us to discuss and refactor live. In the meantime let me give yours a go here.

Edit: Me from the future, it got a little crazy down there, so I made a tiny repo with your code, where you can inspect the code as it looks at each commit: https://github.com/maxipaxi/code-from-jorge

Honestly, I like listing 1 better, because the detail you point out in listing two is a lot easier to miss. However I do agree that this has too many responsibilities, it is also more than five lines ;)

. First I would extract the `if` inside the loop, to something like "processIfFullChunk".
. Then I would extract the last `if`, to something like "finalizeChunk" (edit: I remembered too late that "finalize" is reserved in Java, so this is not the best name).
. Then I would extract the body of the `for`-loop, to something like "addElementToChunk".
. Having multiple things ending in "Chunk" I would create a class called "Chunk".
. Push the three new methods into that class.
. Because multiple of them take `dataString` and `counter` I would make those field variables, and pass them in the constructor. I would also make the 20000 into a parameter to the constructor.
. At this point, I would push the initialization of the two variables into the constructor entirely eliminating them from `processByChunks`.
. I would then extract the code for processing (and resetting the chunk), to "processAndReset". Making "finalizeChunk" call that if `counter` is greater than 0.

This would be my endpoint if I had only the code you provided. However, as you have said that you want to use this structure multiple times you would probably make two or three copies of Chunk and only switching out at least one line. This would motivate unification (chapter 5). Because I don't know what the reuses are I am just going to "guess" at what line will be different and continue the refactoring. (This gets a bit crazy)

. We extract the line with `buildQueryAndProcessFunnyStuff` to its own method called "execute".
. We encapsulate this method in a class called "StuffProcessor".
. We extract an interface from this class.
. We make this "StuffProcessor" a field and parameter to the constructor, thereby moving the initialization of this object to the original method.
. The "StuffProcessorImpl" is pretty trivial right now, so we may as well just make it anonymous.

After all that we get this code:



If your team likes lambdas you can change the anonymous "StuffProcessor", I like them equally.

This got quite a bit technical. I hope it makes sense to you though, otherwise drop by my stream some time and I will show you the steps.
 
Jorge Ruiz-Aquino
Ranch Hand
Posts: 97
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you Christian,

It makes sense to me.
Although I was pointing out the code duplication I do really like your approach to the rest of the code. It also makes sense to a lot of the code in our system.
I hope I can join at today's streaming.
 
Once upon a time there were three bears. And they were visted by a golden haired tiny ad:
Thread Boost feature
https://coderanch.com/t/674455/Thread-Boost-feature
    Bookmark Topic Watch Topic
  • New Topic