
Sorting and rearranging a csv file by deleting duplicates

 
Jason Barry
Greenhorn
Posts: 11
I am looking for some help. I have an application at work that generates a csv with user information on it. I want to take that data, delete duplicate information, rearrange it, and create a spreadsheet. The csv is generated in the following format, but much larger:

21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555
21458952, a1234, Doe, John, technology, support staff, work email, johndoe@whatever.net
21458952, a1234, Doe, John, technology, support staff, work pager, 555-555-5555
99977733, b9999, Smith, Jihn, technology, administration, work phone, 454-555-4444
99977733, b9999, Smith, John, technology, administration, work phone, 454-555-4444
99946133, b9854, Paul, Jane, technology, administration, work phone, 444-444-4444
99946133, b9854, Paul, Jane, technology, administration, work email, janepaul@whatever.net
99946133, b9854, Paul, Jane, technology, administration, work pager, 444-444-4444

I want to delete the duplicates and arrange the data in the appropriate columns within a CSV file.

ID | PIN | Lname | Fname | Dept | team | work px | work email
21458952 a1234 Doe ... ... ... ... ...
99977733 b9999 Smith ... ... ... ... ...
99946133 b9854 Paul ... ... ... ... ...

I have been trying to build arrays with a BufferedReader to store the data, but I am running into difficulties dealing with duplicates and manipulating the data into a table. My Java skills are not very proficient (still working on that) and I need some direction on how this task can be done. Any help is greatly appreciated.


This is the code I have so far. I created a ReadFile class and an Employee class.






 
Campbell Ritchie
Marshal
Posts: 79239
377
Welcome to the Ranch

I think you are better off working out what you want to do before you actually try to do it. Write down the algorithms and things you want to do, and then keep writing until you have got everything in words of one syllable (well, lots of small words, at least). Then it will not be at all hard for you to write code which does what you want.

Too hard for this board so I shall move you to a place where this thread will fit well.
 
Marshal
Posts: 8863
637
Mac OS X VI Editor BSD Java

Jason Barry wrote:I want to take that data, delete duplicate information, rearrange it, and create a spreadsheet.


This job would become very complicated in Java, and your current code is a long way from getting there. It is worth mentioning that creating a spreadsheet involves a lot of complicated work.
When Excel spreadsheets are involved, I'd say the best solution is Excel's built-in macros (VBA).
It is not that difficult to get familiar with the basics of VBA, so you could achieve your task quite easily (compared with Java).

I'm sorry if I have confused you. I might be wrong, and maybe someone else has a different opinion, so hopefully they can help you and come up with something more concrete.
Welcome to the Ranch.
 
Jason Barry
Greenhorn
Posts: 11
Thanks for having me Campbell! I am sure I will be here a lot now, since I want to further my Java skills. I will sit down and write all of the code out and share it.

I can get away with a text file instead of an Excel file. I just need to arrange the words in a particular order. Thanks for the advice, guys.
 
Jason Barry
Greenhorn
Posts: 11
Campbell, is this heading in the right direction, or do I need to be more specific in my goals for this program?


1. Import a text file (file path)
2. Read a text file (bufferedReader)
3. Separate text doc lines (while loop?)
4. Store the lines (arrays, in Employee Class?)
5. Separate the lines into words (while loop?)
6. Store the words (arrays?)
7. Delete duplicates (compare)
8. Arrange the words in a specific order (Employee Class)
9. Write the words to a file (Write)
10. close


Thanks in advance!
 
Campbell Ritchie
Marshal
Posts: 79239
377
Yes and no. You are combining what you want to do and how you are going to do it. I shall quote your post in a minute and delete the parts you don't yet need.
 
Campbell Ritchie
Marshal
Posts: 79239
377

Jason Barry wrote:Campbell, is this heading in the right direction . . .


This is what you are looking for at the present stage:-

1. Import a text file
2. Read a text file
3. Separate text doc lines
4. Store the lines
5. Separate the lines into words
6. Store the words
7. Delete duplicates
8. Arrange the words in a specific order
9. Write the words to a file
10. close
 
Campbell Ritchie
Marshal
Posts: 79239
377
That looks good. Now, start with stage 1. Get that working, then you can think about stage 2. You can read about opening files in the Java® Tutorials; note that section was changed greatly in Java 7, and it shows the most up-to-date way to do it.

While you are reading that section find the sections about formatting and scanning, too.

Write down some more details about how you need to store the individual words, and also what you mean by duplicate. You probably need that to clarify your thought about what to delete and what to retain.
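For stage 1, the Java 7 file API mentioned above could be used roughly as follows. This is only a sketch, assuming Java 8 or later and a UTF-8 text file; the class name `CsvLineReader` and the path `employees.csv` are placeholders, not names from the thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only
class CsvLineReader {

    // Reads every non-blank line of the file into a list, in file order.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            // readLine() is called exactly once per iteration and its result is kept;
            // calling it a second time inside the loop would silently consume a line
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "employees.csv" is a placeholder path
        Path file = Paths.get(args.length > 0 ? args[0] : "employees.csv");
        for (String line : readLines(file)) {
            System.out.println(line);
        }
    }
}
```

Keeping the reading in one small method means stage 1 can be tested on its own before any splitting or duplicate handling is attempted.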
 
Jason Barry
Greenhorn
Posts: 11
Thanks for the guidance Campbell.
 
Campbell Ritchie
Marshal
Posts: 79239
377
You're welcome

Show us how far you have got with stage 1.
 
Jason Barry
Greenhorn
Posts: 11
I am able to import the text file and read. It is being stored in an array.

 
Campbell Ritchie
Marshal
Posts: 79239
377
So far, so good. How did you achieve that?

An array of what? I think you should have a class to encapsulate all those details. There are also data structures in the collections framework which can sort things, or even remove duplicates, all automatically. That may get you out of having to create the array in the first place. You just have to override the equals and hashCode methods the right way, and provide a Comparator (or implement Comparable).
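As a rough illustration of that idea, the sketch below defines a small class with `equals`, `hashCode` and `compareTo`, then lets a `TreeSet` drop duplicates and sort. The class names `ContactEntry` and `DedupDemo` are hypothetical, and the rule that two lines are duplicates when ID, contact type and value all match is an assumption; the thread has not yet settled what counts as a duplicate.

```java
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class: one line of the file reduced to ID, contact type and value
final class ContactEntry implements Comparable<ContactEntry> {
    final String id;
    final String type;   // e.g. "work phone"
    final String value;  // e.g. "555-555-5555"

    ContactEntry(String id, String type, String value) {
        this.id = Objects.requireNonNull(id);
        this.type = Objects.requireNonNull(type);
        this.value = Objects.requireNonNull(value);
    }

    // Assumed duplicate rule: same ID, same contact type, same value
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ContactEntry)) return false;
        ContactEntry other = (ContactEntry) o;
        return id.equals(other.id) && type.equals(other.type) && value.equals(other.value);
    }

    // Must use the same fields as equals so hash-based sets behave correctly
    @Override
    public int hashCode() {
        return Objects.hash(id, type, value);
    }

    // Orders by ID, then type, then value; returns 0 exactly when equals is true
    @Override
    public int compareTo(ContactEntry other) {
        int c = id.compareTo(other.id);
        if (c == 0) c = type.compareTo(other.type);
        if (c == 0) c = value.compareTo(other.value);
        return c;
    }
}

class DedupDemo {
    // A TreeSet keeps one copy of each entry and iterates in compareTo order
    static Set<ContactEntry> unique(List<ContactEntry> entries) {
        return new TreeSet<>(entries);
    }
}
```

Under this rule the two "Smith" lines in the sample count as one entry even though the first name is spelled differently, because the first name is not part of the comparison. A `HashSet` would remove the same duplicates without sorting.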
 
Jason Barry
Greenhorn
Posts: 11
Thanks! This is the code I used




I tried to print the text file out line by line, but it skips lines here and there.

I will look into the collections framework and see if I can work with a hashset.
 
Jason Barry
Greenhorn
Posts: 11
Campbell, that was an interesting read on collections. The issue I have is how to implement those practices. I am still working towards that though.
 
Campbell Ritchie
Marshal
Posts: 79239
377
There seems to be something not quite right about splitting a String into an array using comma as delimiter and then putting the String back together.
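One way to keep the pieces apart instead of joining them again is to split each line once and then work with the individual fields. The sketch below assumes that no field contains a comma or quotation marks, which holds for the sample data in the first post; `LineSplitter` is a hypothetical name.

```java
// Hypothetical helper class for illustration only
class LineSplitter {

    // Splits on commas and drops the spaces around each field.
    // The limit of -1 keeps trailing empty fields instead of discarding them.
    static String[] fields(String line) {
        return line.trim().split("\\s*,\\s*", -1);
    }

    public static void main(String[] args) {
        String[] f = fields("21458952, a1234, Doe, John, technology, support staff, work phone, 555-555-5555");
        // Use the fields by position rather than rebuilding the line
        System.out.println("ID=" + f[0] + " type=" + f[6] + " value=" + f[7]);
    }
}
```

For real-world CSV with quoted fields or embedded commas, a dedicated CSV library would be a safer choice than `split`.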
 
Jason Barry
Greenhorn
Posts: 11
Yep, I guess my logic is screwed up. I need to take a break. I will revisit this in a couple of hours.
 
Jason Barry
Greenhorn
Posts: 11
I am assuming I need to store the correct elements in an array and push certain elements into an Employee class.
 
Campbell Ritchie
Marshal
Posts: 79239
377
You should have created an Employee class ages ago.

You should stop assuming. You have to decide what you mean by duplicates and how you are going to handle them. In the snippet of the text file you showed earlier, lines 1‑3 appear to refer to the same person. What are you going to do? Are you going to put phone number, e‑mail, and pager into one Employee object? Are you going to create three objects, one for each line, and then merge them into one Employee object? Are you going to create a data structure for each mapping: phone number, e‑mail, and pager? Are you going to do something else?
But start by drawing a diagram of your Employee object with those fields in.
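In code, such a diagram might translate into something like the class below. It is one possible layout, not the one from the thread (whose code is not shown): the six fixed fields follow the column headings in the first post, and the contact details are held in a map keyed by contact type so that phone, e‑mail and pager all fit in one object. The method names are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// One possible Employee layout; field and method names are illustrative
class Employee {
    private final String id;
    private final String pin;
    private final String lastName;
    private final String firstName;
    private final String department;
    private final String team;
    // contact type ("work phone", "work email", "work pager") -> value
    private final Map<String, String> contacts = new LinkedHashMap<>();

    Employee(String id, String pin, String lastName, String firstName,
             String department, String team) {
        this.id = id;
        this.pin = pin;
        this.lastName = lastName;
        this.firstName = firstName;
        this.department = department;
        this.team = team;
    }

    String getId() {
        return id;
    }

    // Adding the same contact type twice keeps only the latest value
    void addContact(String type, String value) {
        contacts.put(type, value);
    }

    // Returns an empty string when the employee has no contact of that type
    String getContact(String type) {
        return contacts.getOrDefault(type, "");
    }

    // One output row: the six fixed fields, then phone, e-mail and pager
    String toCsvRow() {
        return String.join(", ", id, pin, lastName, firstName, department, team,
                getContact("work phone"), getContact("work email"), getContact("work pager"));
    }
}
```

With this layout, the three lines for one person become one constructor call followed by three `addContact` calls.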
 
Campbell Ritchie
Marshal
Posts: 79239
377
And now that you have a load of Strings flying around, and you are trying to extract information from part of a String, you will understand what Winston meant when he wrote this “cautionary tale”.
 
Jason Barry
Greenhorn
Posts: 11
Thanks for the input Campbell. What I want to do is create one employee object for each named employee and map all of the employee’s information into that employee object. I want two employee objects (John and Jane). John Doe would have an id, dept., position, work phone, email, and pager, likewise for Jane Paul.
 
Campbell Ritchie
Marshal
Posts: 79239
377
That is what I thought you meant. Now you will have to work out some way of mapping name or ID to phone number, pager, and e‑mail.

Draw a diagram of how you intend to get from the String to such a data structure, and then work out what sort of data structure to use. You may be able to dispose of that data structure when you have created all the Employee objects.
 
Jason Barry
Greenhorn
Posts: 11
I just had some free time. I drew up a diagram and tried to figure out some logic. I am still leaning towards reading the lines and storing the strings in an array, and to stop reading into that array when the ID differs from the current ID. A hash map also looks like it would work, but I am not sure how to use a key to pull the strings I need to store in the employee object.
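On the question of using a key: one possible approach is to use the ID as the map key and look up (or create) the entry for that ID as each line is read. The sketch below assumes Java 8 or later and eight comma-separated fields per line, as in the sample. To stay short it stores each employee as a map of column name to value rather than an Employee object; the class name `GroupById` and the output column order are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only
class GroupById {
    // The first six columns come from fixed positions in each input line
    static final List<String> FIXED = Arrays.asList("ID", "PIN", "Lname", "Fname", "Dept", "team");
    static final List<String> CONTACTS = Arrays.asList("work phone", "work email", "work pager");

    // Builds ID -> (column name -> value); LinkedHashMap keeps IDs in first-seen order
    static Map<String, Map<String, String>> group(List<String> lines) {
        Map<String, Map<String, String>> byId = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s*,\\s*", -1);
            if (f.length != 8) {
                continue; // skip lines that do not have the eight expected fields
            }
            // The ID is the key: fetch the existing entry, or create it the first time
            Map<String, String> row = byId.computeIfAbsent(f[0], id -> new LinkedHashMap<>());
            for (int i = 0; i < FIXED.size(); i++) {
                row.put(FIXED.get(i), f[i]); // a later line overwrites an earlier one
            }
            row.put(f[6], f[7]); // contact type -> contact value
        }
        return byId;
    }

    // Turns the grouped data into output lines: a header, then one row per ID
    static List<String> toRows(Map<String, Map<String, String>> byId) {
        List<String> columns = new ArrayList<>(FIXED);
        columns.addAll(CONTACTS);
        List<String> rows = new ArrayList<>();
        rows.add(String.join(", ", columns));
        for (Map<String, String> row : byId.values()) {
            List<String> cells = new ArrayList<>();
            for (String column : columns) {
                cells.add(row.getOrDefault(column, "")); // empty cell when a contact is missing
            }
            rows.add(String.join(", ", cells));
        }
        return rows;
    }
}
```

Because every line with the same ID lands in the same entry, repeated lines collapse automatically, and the lines for one person do not need to be adjacent in the file, which the stop-when-the-ID-changes array approach would require. The resulting rows could then be written out with `Files.write(path, rows)`.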
 