For some time now I have been thinking about embarking on a project of my own. I would be glad if you could give me your expert suggestions.
I'm sure you know about machine learning algorithms; one implementation of them is Weka.
What I want to do is the following:
1. Develop a browser-based GUI and/or a web-based application with a dashboard component in it. The application will do the following:
a) Read an Excel or comma-separated values (CSV) file that contains customer shopping data
b) Perform data pre-processing (i.e. cleaning of erroneous fields)
c) Apply the Naive Bayes algorithm or any other data mining algorithm after step b
And the dashboard will do the following:
d) Show the result to the user
e) Store the result in a database
f) Allow the user to access the database for past results and to apply the past results in the prediction of the new result
The dashboard should also have different dashlets such as graphs, maps, tabular data from tables in the database, etc.
Now, could you let me know what I need to read and learn in order to develop this web-based application? I did some reading on PrimeFaces, RichFaces, and Java Swing. PrimeFaces and RichFaces confused me a lot, and I do not even know why I was reading about them in the first place; maybe because somewhere in the dark corridors of my mind, the term "framework" keeps echoing! That's why I have also been reading about Vaadin and LiftWeb.
The members of this erudite community have always helped me a lot in the past, and I would really appreciate it if you could help steer me in the right direction.
I am very sorry for the garbled post above. I don't know how it happened.
Let me repost the code again.
My question is: how can I write the same code in JSP and involve Beans in it? Any tutorial or directions will be helpful.
By the way, the above code does not even come close to what I want to achieve as described in the original post.
Going through your requirements, I don't see the need for any fancy UI unless you particularly care about it.
I am just looking at the functional aspect of the requirements. Your CSS skills can still make it look good enough, I guess.
For the graphs and charts, I would recommend JFreeChart. It's really cool and easy to use.
With no disrespect intended, I'd suggest that generating static graphs and charts on the server is a thing of the past. Interactive client-side charting with packages such as HighCharts is a much more modern approach.
Thanks a lot, Amit and Bear for your comments.
Let me go through the 'HighCharts' tutorial and figure out how I will be able to use it.
Meanwhile, do you have any other suggestions on how to perform data cleaning and implement data mining algorithms?
I'm almost tempted to follow Weka's machine learning algorithms for the data mining tasks, but the only problem is that I can't understand some of its algorithms!
Looking forward to your suggestions.
I never used HighCharts, but the demos are pretty cool. Thanks Bear
For data mining, Weka is a powerful tool and supports many algorithms. Moreover, it's open source.
You need to decide which algorithms you want, what type of data you have, what cleaning you wish to perform, etc.
If you are looking to understand a data mining tool or algorithm, RapidMiner Community Edition would be another choice.
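To make the "what cleaning do you wish to perform" question concrete, here is a minimal sketch of one possible cleaning step in plain Java. Everything in it is an assumption for illustration only: the class name `CsvCleaner`, the three-column layout (customer id, item, amount), and the rule that a row is "erroneous" if a field is blank or the amount is not a non-negative number. It also assumes fields never contain quoted commas; a real CSV file may need a proper parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: drops rows that do not look like "customerId,item,amount".
final class CsvCleaner {

    /** Keeps only rows with exactly three non-blank fields and a non-negative numeric amount. */
    static List<String[]> clean(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            // Limit -1 keeps trailing empty fields, so "c1,milk," still yields three fields.
            String[] fields = line.split(",", -1);
            if (fields.length != 3) {
                continue; // wrong number of columns
            }
            boolean hasBlank = false;
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].trim();
                if (fields[i].isEmpty()) {
                    hasBlank = true;
                }
            }
            if (hasBlank) {
                continue; // missing value
            }
            try {
                if (Double.parseDouble(fields[2]) < 0) {
                    continue; // negative amount
                }
            } catch (NumberFormatException e) {
                continue; // amount is not a number (this also drops a header row)
            }
            rows.add(fields);
        }
        return rows;
    }
}
```

Calling `CsvCleaner.clean(...)` on the lines of a file returns only the rows that passed every check, each as a trimmed `String[]`. The rules themselves are the part you would replace once you know what "erroneous" means for your shopping data.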
Thanks Amit for your insight.
Tools like Weka, RapidMiner, etc. are already known to me. I prefer Weka because its source code is open, so it helps me understand the algorithms better.
The algorithms that I'm interested in developing in Java are Naive Bayes, association rules, classification, and clustering. The web has a plethora of source code for these algorithms, but I want to develop my own versions so that I gain a better understanding of them. Java Collections inspire me a lot. There have been several books on data mining, Weka, etc., but the problem is that they are written as if targeting more experienced Java users.
Looking forward to your suggestions on the same.
Honestly, I don't quite get what exactly you are trying to do. If you want to develop your own implementation of an algorithm, then all you need is an understanding of the algorithm itself.
Existing implementations like Weka are handy when you want to test the algorithm against a data set. They may help in verifying the correctness of your algorithm on a large data set.
But be advised, the results may vary especially for clustering algorithms.
Ashish Dutt wrote: The algorithms that I'm interested in developing in Java are Naive Bayes, association rules, classification, and clustering.
I would say take the divide-and-conquer approach here. Association rules can be computed in many different ways, and there are even more ways just to find the frequent itemsets.
So pick an algorithm, pick a strategy for finding the frequent patterns, get the program running, and then move on to the next one.
Usually clustering algorithms like K-means are a good starting point. Association rule mining is a bit tricky, so it comes later on.
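To give a feel for what "finding the frequent itemsets" means, here is a rough sketch of the most naive strategy: count how many transactions contain each pair of items and keep the pairs that reach a minimum support. This is plain brute-force counting, not Apriori or FP-growth, and the class name `FrequentPairs`, the `"a+b"` key format, and the `minSupport` parameter are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: brute-force support counting for 2-item sets.
final class FrequentPairs {

    /** Returns every item pair that occurs together in at least minSupport transactions. */
    static Map<String, Integer> frequentPairs(List<Set<String>> transactions, int minSupport) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> transaction : transactions) {
            // Sort the items so each pair always gets the same key, e.g. "bread+milk".
            List<String> items = new ArrayList<>(new TreeSet<>(transaction));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "+" + items.get(j), 1, Integer::sum);
                }
            }
        }
        // Drop the pairs that are not frequent enough.
        counts.values().removeIf(count -> count < minSupport);
        return counts;
    }
}
```

For the three baskets {milk, bread}, {milk, bread, eggs} and {bread, eggs} with a minimum support of 2, this keeps bread+eggs and bread+milk (each seen twice) and drops eggs+milk (seen once). Once this runs, swapping in a smarter strategy becomes a contained change.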
On a side note, I feel that you are directly going to the final web application design and coding.
That is not the way I would have done it. There is no point in coding the JFileChooser when there is no underlying logic to process the selected file.
I just proposed K-means but there are more clustering algorithms out there. IMO K-means is the simplest to get hold of.
But again, the effectiveness of a clustering algorithm depends on the distribution of the data set.
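Since K-means keeps coming up, here is a minimal sketch of it (Lloyd's algorithm) on one-dimensional data, just to show how little code the core loop needs. The class name `KMeans1D` and the choice to pass the starting centroids in explicitly are my own assumptions for the example; real implementations usually pick the starting centroids randomly, which is one reason results can differ between runs and between tools.

```java
// Hypothetical sketch of K-means on 1-D points with caller-supplied starting centroids.
final class KMeans1D {

    /** Repeats the assign/update steps until the centroids stop moving or maxIterations is reached. */
    static double[] fit(double[] points, double[] initialCentroids, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];

            // Assignment step: each point goes to its nearest centroid.
            for (double p : points) {
                int nearest = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[nearest])) {
                        nearest = c;
                    }
                }
                sum[nearest] += p;
                count[nearest]++;
            }

            // Update step: each centroid moves to the mean of its points.
            boolean changed = false;
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] == 0) {
                    continue; // an empty cluster keeps its old centroid
                }
                double updated = sum[c] / count[c];
                if (updated != centroids[c]) {
                    changed = true;
                }
                centroids[c] = updated;
            }
            if (!changed) {
                break; // converged
            }
        }
        return centroids;
    }
}
```

With the points 1, 2, 3, 10, 11, 12 and starting centroids 0 and 5, the centroids settle at 2 and 11 after a few iterations. Try different starting centroids or a less cleanly separated data set and you will see how much the outcome depends on both.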
Ashish Dutt wrote: What approach would you have taken to design it?
Well, I would have implemented the algorithms and tested them as plain desktop programs first: no GUI, no files, nothing. Then add file input, add post-processing like charting, and finally turn it into a web app.
In one line: "build it bottom up".
Note that I have left out the design and just discussed the implementation. A good design is the base of a good implementation.
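As an illustration of the bottom-up idea, this is roughly what the first step could look like for Naive Bayes: a plain class with no GUI and no file handling that you can exercise from a `main` method or a unit test. It is only a sketch under my own assumptions: categorical (string) features, add-one (Laplace) smoothing, every row having the same number of features, and labels that do not contain the `|` character used in the count keys. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: categorical Naive Bayes with add-one smoothing.
final class NaiveBayes {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // Key "label|featureIndex|value" -> number of training rows with that combination.
    private final Map<String, Integer> featureCounts = new HashMap<>();
    // Distinct values seen for each feature position, needed for the smoothing denominator.
    private final List<Set<String>> valuesPerFeature = new ArrayList<>();
    private int total = 0;

    /** Records one labelled training row. */
    void train(String[] features, String label) {
        classCounts.merge(label, 1, Integer::sum);
        total++;
        for (int i = 0; i < features.length; i++) {
            if (valuesPerFeature.size() <= i) {
                valuesPerFeature.add(new HashSet<>());
            }
            valuesPerFeature.get(i).add(features[i]);
            featureCounts.merge(label + "|" + i + "|" + features[i], 1, Integer::sum);
        }
    }

    /** Returns the label with the highest log-probability, or null if nothing was trained. */
    String predict(String[] features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : classCounts.entrySet()) {
            String label = entry.getKey();
            int rowsWithLabel = entry.getValue();
            double score = Math.log((double) rowsWithLabel / total); // log prior
            for (int i = 0; i < features.length; i++) {
                int count = featureCounts.getOrDefault(label + "|" + i + "|" + features[i], 0);
                int distinctValues = valuesPerFeature.get(i).size();
                // Add-one smoothing avoids log(0) for combinations never seen in training.
                score += Math.log((count + 1.0) / (rowsWithLabel + distinctValues));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

You would call `train` once per cleaned row and then `predict` on a new row. Only when this behaves as expected on small hand-checked examples would I put file input, charting, and a web front end on top of it.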