aspose file tools*
The moose likes Beginning Java and the fly likes help improve this code without the use of HashMap Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Beginning Java
Bookmark "help improve this code without the use of HashMap" Watch "help improve this code without the use of HashMap" New topic
Author

help improve this code without the use of HashMap

Paul Ngom
Ranch Hand

Joined: May 08, 2014
Posts: 316
    
    1
">


I am always surprised at the added knowledge i can get from others when i give my opinion on a topic.
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11479
    
  16

Hi Paul,

Do you have an actual question? Simply posting 100 lines of code really isn't the best way to get help here. You need to ask specific, focused questions.

Even reading the thread title doesn't explain what you need help with. What is wrong with what you have? What do you want it do to? and why are you insisting a HashMap be used? Whenever someone says "How do I do X with Y", my immediate thoughts are "Why do I have to use Y?"

for example:

How do I turn a screw with paper?

How do I write a novel using water?

sometimes the questions make no sense.


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18907
    
    8

I can't quite tell what the HashMap is doing in that code, but you have a comment which suggests you shouldn't use one:



This comment suggests very strongly that you should be using a List, not a Map. An ArrayList would be a perfectly good choice, I think.
Stevens Miller
Ranch Hand

Joined: Jul 26, 2012
Posts: 567
    
    4

fred rosenberger wrote:How do I turn a screw with paper?

How do I write a novel using water?


Well, thanks, fred. I suppose you think I'll have no problem sleeping tonight after reading questions like those.
Paul Ngom
Ranch Hand

Joined: May 08, 2014
Posts: 316
    
    1
@fred, sorry if my title is not so clear as to what this program does. It works fine after i compile it notwithstanding some warnings. This program lists the sub directories of a given directory in a file. Alongside, the HashMap stores the list of the directories so one could count and reference them easily. Note that I am a beginner in Java. I do not want the warnings when i compile and i want the program to run faster. The present code is too slow when there are many directories.
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11479
    
  16

Paul Ngom wrote:I do not want the warnings when i compile and i want the program to run faster. The present code is too slow when there are many directories.

well...what ARE the warnings? Remember that you want to make it as easy as possible for people to help you. So if you provided the full, exact text of those warnings, that means nobody here has to try and copy your code to their machine, compile it, and see what they get. They may not even get the same things you get, depending on what version of Java they are running.


And simply saying "i want the program to run faster" is not a good spec. You may want to skim our performance forum, where you will see tons of threads with advice like

  • make it work right before you worry about speed
  • have defined speed requirements - what EXACTLY is fast enough
  • premature optimization is the root of all evil


  • you can ALWAYS make your programs run faster...but unless you know what is 'fast enough', how do you know when you are done improving speed? You could also get a faster CPU to speed things up. it is possible a faster disk drive would speed things up. Are those options?
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    I am using jdk8
    after i run >javac -Xlint listreps.java
    i get the following warnings but it runs ok:

    listreps.java:17: warning: [rawtypes] found raw type: HashMap
    private HashMap hm = new HashMap(0,90000);
    ^
    missing type arguments for generic class HashMap<K,V>
    where K,V are type-variables:
    K extends Object declared in class HashMap
    V extends Object declared in class HashMap
    listreps.java:17: warning: [rawtypes] found raw type: HashMap
    private HashMap hm = new HashMap(0,90000);
    ^
    missing type arguments for generic class HashMap<K,V>
    where K,V are type-variables:
    K extends Object declared in class HashMap
    V extends Object declared in class HashMap
    listreps.java:25: warning: [rawtypes] found raw type: HashMap
    HashMap getReps(){
    ^
    missing type arguments for generic class HashMap<K,V>
    where K,V are type-variables:
    K extends Object declared in class HashMap
    V extends Object declared in class HashMap
    listreps.java:45: warning: [unchecked] unchecked call to put(K,V) as a member of the raw type HashMap
    hm.put(cnt,pth);
    ^
    where K,V are type-variables:
    K extends Object declared in class HashMap
    V extends Object declared in class HashMap
    listreps.java:68: warning: [unchecked] unchecked call to put(K,V) as a member of the raw type HashMap
    hm.put(0,"file da not closed!");
    ^
    where K,V are type-variables:
    K extends Object declared in class HashMap
    V extends Object declared in class HashMap
    listreps.java:86: warning: [rawtypes] found raw type: HashMap
    HashMap ham=rp.getReps();
    ^
    missing type arguments for generic class HashMap<K,V>
    where K,V are type-variables:
    K extends Object declared in class HashMap
    V extends Object declared in class HashMap
    6 warnings
    Jesper de Jong
    Java Cowboy
    Saloon Keeper

    Joined: Aug 16, 2005
    Posts: 14347
        
      22

    You get those warnings because you are not using generics. Instead of using plain HashMap, you should specify the types of the keys and values in the HashMap - for example: HashMap<Integer, Long>

    Your code is hard to read because the indentation is inconsistent and because you are using very short and cryptic variable names. What does "bk" mean, or "listf", or "nlistf"? Why shorten a name like "path" to "pth"? To save typing one letter, at the expense of making the program harder to understand? It's important to choose good, clear names for variables, methods, classes etc., that will make your program much easier to understand.

    And how do you know that the use of HashMap makes your code slow? (At least, I assume that's what you think, because you want to the program to run faster and you're asking about how to do it without using HashMap). If you want to improve the performance of a program, you must first do some research and measurements (with a profiler, for example) to find out what exactly causes the performance problem.

    I suspect that it's not the HashMap at all that causes a performance problem; it's more likely that disk I/O is causing the problem.

    Java Beginners FAQ - JavaRanch SCJP FAQ - The Java Tutorial - Java SE 8 API documentation
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Ngom wrote:I do not want the warnings when i compile and i want the program to run faster. The present code is too slow when there are many directories.

    And I'd care to bet that it has absolutely nothing to do with the fact that you're using a HashMap, so if that's the reason you want to use something else then think again. HashMap is almost always the fastest choice when you're only concerned with random access or checking whether a structure already contains something; and if order is important (as it might be if you want to list out the contents), then a LinkedHashMap may be better still.

    However, I suspect that your main problem actually lies at line 16 (or actually, 17, according to your error message), because you're creating a HashMap with an initial capacity of 0, and a load factor of 90,000, which means that:
    (a) Your HashMap probably only has one internal bucket, and therefore will work at the speed of a LinkedList.
    (b) It may well have to be re-hashed several times before your task is completed (but maybe not).

    I suspect that you meant to put those parameters the other way around, but even THAT is wrong, because the load factor must be positive.

    My advice:
  • 1. Try:
    private HashMap hm = new HashMap(90000);
    and see if that improves your performance.
  • 2. Next time, read the documentation carefully. You really need to understand what a "load factor" is before you start playing around with it.

  • HIH

    Winston

    Isn't it funny how there's always time and money enough to do it WRONG?
    Articles by Winston can be found here
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Winston
    Thanks for your advice. Load factor is a new term to me. I will have to explore it.
    Paul Clapham
    Bartender

    Joined: Oct 14, 2005
    Posts: 18907
        
        8

    I don't think the slowness has anything at all to do with the minor details of HashMap's implementation. It's just memory usage, after all. It's much more likely to be related to the fact that processing each directory also involves opening and closing a file. (This could be done once instead of thousands of times, which would simplify the code as well as speeding it up.)

    And in my opinion the main problem with the code is this comment:



    This is incorrect: there is no recursive processing going on in that code at all. It's hard to tell how many levels of folders are being processed but it's no more than two. So the comment should be fixed to reflect what the code actually does, because comments which don't describe the code accurately are worse than no comments at all.
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Clapham wrote:I don't think the slowness has anything at all to do with the minor details of HashMap's implementation. It's just memory usage, after all.

    In most cases, I'd agree, but not here. The fact is that that single line will probably cause the HashMap to behave like a LinkedList, which is probably not what you'd choose for this sort of stuff, even as a LIST. And if 90,000 is anywhere near the sort of capacity that is actually being used, then this thing could be doing hundreds of millions of ops (many of which might be lengthy string checks).

    However, (@Paul Ngom) it's an easy thing to check. Just do what I suggested, and see if it runs quicker.

    Winston
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Paul
    Hi Paul, Have you tried the code at all? It will write all sub folders under a given directory in a file no matter their level. Even if the folder is 1000 directories deep, it will be listed. Directories that could not be opened are also listed in a different file.
    Paul Clapham
    Bartender

    Joined: Oct 14, 2005
    Posts: 18907
        
        8

    No, I haven't tried it. But the fact is that it isn't doing anything recursive. So the comment needs to be fixed. And a whole lot of other comments need to be added to explain what it's doing; meaningful variable names would also help. As it is the code is very difficult to understand.

    And if I were going to write code to descend recursively through a directory tree, I would do just that. There's no need for HashMaps or any collections at all, just a simple recursive method.
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Winston Gutkowski wrote:(many of which might be lengthy string checks)

    Just FYI: I note that, as of version 6, String.equals() doesn't use hashCode() as an arbiter of whether two Strings are equal or not; only length() - which strikes me as a major oversight on Sun's (and now Oracle's) part.

    [Edit] And that's because hashCode() is calculated, not cached. Another oversight, IMO.

    Winston
    Campbell Ritchie
    Sheriff

    Joined: Oct 13, 2005
    Posts: 39834
        
      28
    Load factor means how many of the buckets are occupied (on average). If you set load factor at 0.8f (80%) you have 16 buckets, the number of buckets increases to 32 when you have >12.8 pairs in the Map. The default is 75%. There is probably something in the HashMap API documentation.
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Ngom wrote:Have you tried the code at all? It will write all sub folders under a given directory in a file no matter their level. Even if the folder is 1000 directories deep, it will be listed. Directories that could not be opened are also listed in a different file.

    Are you sure? Because, like Paul (the other Paul), I can't see where you're recursing in order for that to happen. What I see is you writing the name of the directory to a file. Now, if you were to continually call that routine until you no longer have any new additions, then you might have a complete subdirectory list; but it's mighty tortuous.

    The fact is that what you're trying to do (I suspect) is precisely what recursion was designed for, viz:or something similar (there are many ways to do it).

    HIH (Sheesh, recursion on a Friday evening; I need to get a life ).

    Winston
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Paul
    Paul, i am not here to re-invent the wheel. If i see better code, i will adopt it. So if you have a better solution, please share it.
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Ngom wrote:Paul, i am not here to re-invent the wheel. If i see better code, i will adopt it. So if you have a better solution, please share it.

    See above. It may not be exactly what you want, but I suspect it will give a base to work from.

    Winston
    Paul Clapham
    Bartender

    Joined: Oct 14, 2005
    Posts: 18907
        
        8

    Winston's code is pretty much the smallest possible recursive method for going through a directory tree. However if you REALLY don't want to reinvent any wheels, then you could use the static Files.walkFileTree method which is built into the standard Java API.
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Winston
    You must be forgetting that this is a beginner forum. How do i try the code you posted? I see that it takes 2 arguments. The first is understandable but i don't know how to pass the second. best regards
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Ngom wrote:You must be forgetting that this is a beginner forum.

    In which case, the question begs: why are you trying something so complex? Trawling directories is not a simple problem, and if you actually have a working solution, I commend you.

    How do i try the code you posted? I see that it takes 2 arguments. The first is understandable but i don't know how to pass the second.

    Simple. Create an empty List<File> and pass it, along with your top level directory, to the method. And if (as it looks) your top level is actually a list itself, call the method once for each path in the list that is a directory.

    Now, what you do with your "unreachable" files is another matter, but it's one that you can deal with after you've got the basic premise working.

    The main thing is: do you understand what it's doing? Recursion is not simple - even for me, after 35+ years in the biz - it takes thought. And (I suggest) a pencil and lots of paper.

    Winston
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Winston
    I tried the piece of code that you supplied but it only listed level 1 sub directories.
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Ngom wrote:I tried the piece of code that you supplied but it only listed level 1 sub directories.

    You'd need to show us the entire code before we could help.

    Winston
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    here it is
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Ngom wrote:here it is

    OK, well:
    1. You haven't implemented what I wrote.
    2. You create a new List every time you trawl a new level. The idea is that you should keep adding to the same List, which is why you pass it as an argument.

    HIH

    Winston
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Winston
    Please, pinpoint to where i am doing it wrong in the code i supplied.
    Tony Docherty
    Bartender

    Joined: Aug 07, 2007
    Posts: 2366
        
      50
    Please, pinpoint to where i am doing it wrong in the code i supplied.

    He just has, you haven't written the code he supplied.
    Your code only takes one parameter and creates a new ArrayList on each iteration, Winston's code creates a single ArrayList which is passed to the method and then for each recursion the same ArrayList is passed as the second paramter.
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    this works fine. Thank you Winston.
    Tim Holloway
    Saloon Keeper

    Joined: Jun 25, 2001
    Posts: 16250
        
      21

    There is a convention in Java (stolen from the Smalltalk language) that class names must start with an upper-case letter. Instance names and similar items such as class properties and methods, start with a lower-case letter. As do the various levels in package names.

    This is a convention, not an absolute, but a lot of Java tools will malfunction if you don't conform to it. Plus it's a clear signal to others that your Java proficiency is probably not that high.


    Customer surveys are for companies who didn't pay proper attention to begin with.
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Tim
    I agree that my java proficiency is not that high and i find no shame to that. I still learn new things everyday. What do you say about the first program i posted(90 lines)? Do you have more advice? They are welcome.
    Tim Holloway
    Saloon Keeper

    Joined: Jun 25, 2001
    Posts: 16250
        
      21

    I did see your admission as to your relative newness to Java and it's true that there's no shame. We've all been there. Although when pressed, I'll probably claim I was born knowing it. Still. most of us don't want to advertise the fact too widely. The JavaRanch is (unfortunately) one of the few places where one can be ignorant and not get flamed for it.

    I'm mostly going to let those who came before serve as your guides, as they're essentially correct. About the only thing I feel obliged to add is a little more background about hashmaps.

    There exists a thing called a "Perfect Hash", which is where in a hash map, all entries are accessible via a single lookup without the existence of overflow chains. If I have my CompSci notation correct, that's expressed as O(1).

    An ideal hash not only has that attribute, but also has no empty hash slots to waste memory, making it both ultra-fast and ultra-compact.

    There are functions that can be invoked that, given a set of data will compute a perfect hash for that data, but in most cases, it's sufficient to just approximate it. A hashtable is usually less likely to generate hash collisions when has a prime number of slots. The no-arguments constructor uses something like 17, but don't rely on my feeble memory if it's important. That's what the javadocs are for!

    Unlike Winston, I find recursion to be no more confusing than iteration. Although, to be fair, it took a long time for me to get the hang of proper iteration. If, as he asserts, Sun doesn't use its hash method to compare strings anymore, it's probably because they did statistical analysis, decided that a size comparison was generally faster than computing a hash and rejected enough of the mismatches to make it a preferred alternative. All either technique does is attempt to winnow obvious mismatches without having to do a character-by-character comparison - the ultimate, but slowest test. Caching the hashcodes of objects would make them take up more memory, plus for mutable objects (which String is not), you risk failing to update the cached hashcode.

    Finally, for what it's worth, some filesystems actually keep their directories on disk as hashes. So a directory tree in such a situation would typically be a hash of hashes. It works in RAM, too, where it's not uncommon.
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Winston
    It is about you code. When it finds a directory which access is denied, the program ends with a NullPointerException.
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Ngom wrote:It is about you code. When it finds a directory which access is denied, the program ends with a NullPointerException.

    OK, but at least you now know that you have code that basically does what you want, and now you can deal with those exceptions.
    It's an important lesson to learn: When you're programming, deal with one thing at a time.

    Just one possibility to solve your problem:but it's only one. You could simply ignore them if you want, but I seem to recall that you wanted to keep a separate list.

    A few other things to note:
  • I've changed the names somewhat, because your logic only stores directory names in the 'files' List, so why not give it a better name?
  • Since we now check whether the File that was passed is a directory right at the top, we can pass any old File to it we want, which actually makes it a bit more robust. The previous version assumed that what was passed was a directory.
  • Since you now have two Lists to maintain, there really isn't much point in returning only one of them, so it returns void. In fact, although I generally prefer methods to return something, when the List (or whatever) you're maintaining spans the entire recursion, there isn't really much point in returning it anyway, as it may confuse someone reading your code as to what's going on. It also, arguably, make your mainline easier to read, viz:

    List<File> directories=new ArrayList<File>();
    List<File> unreadables=new ArrayList<File>();
    listd listdir = new listd();
    listdir.trawl(new File("someDirectory"), directories, unreadables);
    System.out.println(directories);

  • HIH

    Winston
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Winston
    Hi, thanks for the posts. I have noticed that the list directories falls short of one item when non accessible folders are found. Can you find where the problem is?
    Here is the current code:
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Ngom wrote:Hi, thanks for the posts. I have noticed that the list directories falls short of one item when non accessible folders are found. Can you find where the problem is?

    Not really, because I have no idea what your system looks like, or what you're supplying to the program.

    If you print out the contents of those lists, rather than just their size, I suspect you could find out though.

    Winston
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Winston
    I was able to find the solution to the problem when i printed the lists out.
    I am using ubuntu 14.04 and jdk8.
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Ngom wrote:I was able to find the solution to the problem when i printed the lists out.

    Great. what was it?

    I am using ubuntu 14.04 and jdk8.

    A reasonable choice; but it doesn't help in this case, because we still have no idea what the directory tree looks like or contains.

    For example: you've dealt with directories, but what about links to directories?

    Winston
    Paul Ngom
    Ranch Hand

    Joined: May 08, 2014
    Posts: 316
        
        1
    @Winston
    It is true that links to directories are considered as being folders in the program. Is there a way of excluding them?
    My program displays folders that could not be opened in both lists and this why i was having a difference in the count.
    Winston Gutkowski
    Bartender

    Joined: Mar 17, 2011
    Posts: 8223
        
      23

    Paul Ngom wrote:It is true that links to directories are considered as being folders in the program. Is there a way of excluding them?

    If so, I suspect not with java.io.File (there's no File.isLink() method); but it's possible that Files.readAttributes() might give you that information - but to be honest, I don't know (I've never tried it).

    Winston
     
    It is sorta covered in the JavaRanch Style Guide.
     
    subject: help improve this code without the use of HashMap