I have recently been assigned to a new web application project at my company. The application is a data mining application that runs as a batch at least once a day and produces reports based on that data.

We have an existing application that is a total mess of bash scripts, Unix commands, SQL files, and some extremely creative use of pipes and input/output descriptors. After this the business people take the resulting data (the "cleaned" data) and apply another bunch of SQL queries to get the final reports done and distributed. The problem is that this application is totally inflexible and takes forever to run (one of the tables has over 15 million rows).

So the requirements for the new system are roughly like this:

- Build it on a new platform (I'm set on Java/servlets or a web application framework).
- Build it reasonably modular and extensible.
- The batch run has to be optimized considerably (sometimes it takes over two hours to run successfully, and if there is an error somewhere in the input data it has to be rerun). You can imagine what management thinks about not having the reports in their hands at the end of the business day.
- Build interfaces for the BS people to enable them to tweak some parameters and view/edit/add different aspects of source data in specific ways. They also want a query interface with preset queries and result tables into the target data tables.
- It *has* to be done (as in ready for deployment on production servers) in a month and a half. Yep.

The problem is, I have never built a Java web application before. I built web applications in Perl and PHP for almost two years before starting at this company a year ago, so I have the general HTTP/HTML knowledge. Here I have been coding server-side core Java (a socket-based solution) for about 7 months. So no detailed JSP and no servlet knowledge.
I have been reading up on web application frameworks and presentation technologies and I've come up with the following train of thought:

- J2EE is definitely overkill for this, and the time constraint makes it impossible. I do have access to a WebLogic server farm for development and production, but I think I want to develop with Tomcat and test/deploy on WebLogic since this is most probably going to be the production environment.
- Some kind of "light" application container like Spring. Since I will have to build a web tier/data access tier/logic tier, why not make components out of it, right?
- Do I go for raw JSP, Spring templating (Spring *does* have its own templating, right?) or do I use Velocity for templating? I want to focus on the logic, not input data validation and crap like that.
- Can I get a speed improvement in a batch run environment by using ORM products like Hibernate in this context? Besides doing ordinary SQL optimization of course... Or can I get far enough by just going core JDBC or using some kind of DAO pattern from Spring?

I am leaning towards Spring/Spring templating/JDBC to keep it as simple as possible (considering the time constraint and my limited previous knowledge), but still extensible for later. I already know I have to extend it with more functionality at a later stage.

*Any* input on this would be greatly appreciated!

Jonas Larsson, troubled Java developer
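To make the "core JDBC or some kind of DAO" option concrete, here is a minimal sketch of a hand-written DAO that loads rows with JDBC batching inside one transaction. It is only an illustration under stated assumptions: the `SummaryRow` type, the `daily_summary` table, and its columns are hypothetical and do not come from the thread, and whether batching actually speeds up the run depends on the driver and database in use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical row type; the real summary table columns are not given in the thread. */
final class SummaryRow {
    final String accountId;
    final double amount;

    SummaryRow(String accountId, double amount) {
        this.accountId = accountId;
        this.amount = amount;
    }
}

/** Sketch of a small DAO that inserts rows in chunks using JDBC batching. */
final class SummaryDao {
    // Hypothetical table and column names, for illustration only.
    private static final String INSERT_SQL =
            "INSERT INTO daily_summary (account_id, amount) VALUES (?, ?)";

    private final Connection connection;
    private final int batchSize;

    SummaryDao(Connection connection, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.connection = connection;
        this.batchSize = batchSize;
    }

    /** Inserts all rows in a single transaction and returns how many rows were submitted. */
    int insertAll(List<SummaryRow> rows) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false); // commit once at the end, not per row
        try (PreparedStatement ps = connection.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (SummaryRow row : rows) {
                ps.setString(1, row.accountId);
                ps.setDouble(2, row.amount);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // send a full chunk to the database
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // send the final partial chunk
            }
            connection.commit();
            return rows.size();
        } catch (SQLException e) {
            connection.rollback(); // leave the table unchanged if any chunk fails
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A caller would obtain a `Connection` from whatever data source the container provides and call `new SummaryDao(connection, 1000).insertAll(rows)`; the batch size of 1000 is an arbitrary starting point to tune, not a recommendation from the thread.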
Don't even try to write this in Java (and I MEAN that -- and yes, this really is coming from me). Use a commercial data mining tool. They do things you can't even dream of, and will be way faster to use than anything you can build.

You might want to take a look at the products from DB2, Oracle, and Microsoft that can do this kind of thing. If you then decide that you need to build a web front-end to that, then that's fine, but don't try to start this from scratch.

Kyle
Regarding the data mining app comment. The flow of the application is the following:

- Aggregate five or six sources of data streams (text files, Excel files, binary DB dumps) into one DB.
- "Massage" the data by doing calculations, tidying up and moving data around through a series of steps (a series of SQL queries with integrated calculations).
- Aggregate this "massaged" data into a final summary table.
- Use this summary table for data mining. This is the source data for the reports (which I am probably going to do with Crystal Reports). This table is around 100K rows each day, but gets added to a big summary table for historical data. This history table grows with every business day and gets to about 15 million rows at the end of the fiscal year. The BS people mostly need to work with today's data but also need to run a series of preset queries on the history data now and again.

Do I really need a real DM app for this? I do not have access to any kind of data mining applications (besides maybe standard Sybase DB tools). And in my organisation it would take weeks to get access to a license, even if my employer had licenses for DM apps. I don't have time for that... I have to go with what I know here...

So. Plain servlets and prepared statements then? Does anyone have any input on presentation? I do not feel good about mixing HTML and Java if I use JSP... It seems hard to maintain if the app grows...

/Jonas
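One way to read the flow above, together with the earlier complaint that any input error forces a full rerun, is as a chain of named stages where a rerun skips the stages that already finished. The following is only a sketch of that idea, not something proposed in the thread: the class `BatchPipeline` and the stage names are hypothetical, completion is tracked in memory only, and it assumes each stage leaves its results in the database so that skipping it on a retry is safe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a staged batch run that can be resumed after a failure. */
final class BatchPipeline {
    /** One stage of the run; the body is any code that may fail. */
    interface Step {
        void run() throws Exception;
    }

    private final Map<String, Step> steps = new LinkedHashMap<>();
    private final List<String> completed = new ArrayList<>();

    BatchPipeline add(String name, Step step) {
        steps.put(name, step);
        return this;
    }

    /**
     * Runs every stage that has not completed yet, in the order it was added.
     * Returns the name of the stage that failed, or null if all stages finished.
     */
    String run() {
        for (Map.Entry<String, Step> entry : steps.entrySet()) {
            if (completed.contains(entry.getKey())) {
                continue; // finished in an earlier attempt, so skip it
            }
            try {
                entry.getValue().run();
                completed.add(entry.getKey());
            } catch (Exception e) {
                System.err.println("stage " + entry.getKey() + " failed: " + e.getMessage());
                return entry.getKey(); // stop here; later stages depend on this one
            }
        }
        return null;
    }

    List<String> completedSteps() {
        return new ArrayList<>(completed);
    }

    public static void main(String[] args) {
        boolean[] inputFixed = {false}; // stands in for correcting a bad input file
        BatchPipeline pipeline = new BatchPipeline()
                .add("load-sources", () -> System.out.println("loading source files"))
                .add("massage", () -> {
                    if (!inputFixed[0]) {
                        throw new IllegalStateException("bad row in input");
                    }
                    System.out.println("massaging data");
                })
                .add("summarize", () -> System.out.println("building summary table"));

        System.out.println("first attempt failed at: " + pipeline.run());
        inputFixed[0] = true;
        System.out.println("second attempt failed at: " + pipeline.run());
    }
}
```

In a real batch job the list of completed stages would need to be persisted (for example in a status table) so that a rerun in a new process can pick up where the failed run stopped.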
I really hate to say this -- yes, this is exactly what commercial tools like DB2 Information Integrator do. You've described a need for data integration, data cleansing, and historical data management -- exactly what they specialize in.

While you could do this in Java, it is probably VERY worthwhile to approach your management and ask, perhaps politely and softly, if "haste makes waste" in this instance. It really is applying the wrong tool for the job. If you do this in Java, then I will guarantee you that a year or two from now it WILL be rewritten yet again, as it will be very hard to write something like this in a flexible, maintainable way. That's just not what Java is designed to do.

What's more, you've said this is your first real J2EE project (BTW, J2EE != EJB -- J2EE includes Servlets and JSPs as well). Putting those together would make me very concerned about this. If there is ANY way to try to hasten the acquisition of a more specialized tool for this, I'd look into it before starting in Java.

However, if you do end up going with Java, then I would actually recommend something more like Struts + Hibernate (if you decide you want O/R mapping) rather than Spring. You'll get better support for one of the more established open-source frameworks than you will for something like Spring. Just as an example -- there's one book on Spring (Rod Johnson's). There are something like 9 books on Struts alone, and more than 50 that describe Struts in some other context (like my book on WebSphere).

With Struts you don't have to mix HTML and Java -- nearly everything is done with taglibs in Struts. There are also dozens of tools for doing Struts programming, both commercial and open-source.

Kyle

[ November 28, 2003: Message edited by: Kyle Brown ]
subject: Need your input on tech choices for new project.