Razvan Popovici

since Feb 15, 2007

Recent posts by Razvan Popovici

I recently read another book on SQL antipatterns; most, if not all, of the data structure issues it described would go away if the schema were properly normalized (to third normal form). So here are my questions:
- Are there any database design antipatterns that go beyond 3NF?
- Is building queries by string concatenation (instead of using parameterized queries) an antipattern?
- Another antipattern I see each day is "Query within a loop", and it looks like:

The result of this antipattern is a slow program; using a join instead would make it about 20 times faster.
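To make the "query within a loop" shape concrete, here is a minimal JDBC sketch. It is only an illustration, not the code from the post: the tables customers(id, name) and orders(customer_id, amount) and the class name OrderTotals are assumptions made up for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical schema: customers(id, name) and orders(customer_id, amount).
final class OrderTotals {
    static final String CUSTOMER_IDS_SQL = "SELECT id FROM customers";
    static final String TOTAL_FOR_ONE_SQL =
        "SELECT SUM(amount) FROM orders WHERE customer_id = ?";
    static final String JOIN_SQL =
        "SELECT c.id, SUM(o.amount) FROM customers c "
        + "LEFT JOIN orders o ON o.customer_id = c.id GROUP BY c.id";

    // Antipattern: one query for the ids, then one more query per customer.
    static Map<Long, Double> totalsWithLoop(Connection con) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(CUSTOMER_IDS_SQL)) {
            while (rs.next()) ids.add(rs.getLong(1));
        }
        Map<Long, Double> totals = new LinkedHashMap<>();
        try (PreparedStatement ps = con.prepareStatement(TOTAL_FOR_ONE_SQL)) {
            for (long id : ids) {
                ps.setLong(1, id);   // bound parameter, no string concatenation
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();       // SUM without GROUP BY yields exactly one row
                    totals.put(id, rs.getDouble(1)); // a NULL sum reads as 0.0
                }
            }
        }
        return totals;
    }

    // Same result in a single round trip: the database does the matching.
    static Map<Long, Double> totalsWithJoin(Connection con) throws SQLException {
        Map<Long, Double> totals = new LinkedHashMap<>();
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(JOIN_SQL)) {
            while (rs.next()) totals.put(rs.getLong(1), rs.getDouble(2));
        }
        return totals;
    }
}
```

With N customers the loop version issues N + 1 statements, while the join version issues one; the actual speedup depends on the network latency and the database.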
Currently I am looking for an efficient way to perform the Welch two-sample t-test (the t.test function in "R", or TTEST with heteroscedastic populations in Microsoft Excel) on about 100 million pairs of vectors. The elements of the vectors are extracted from a database. Once the results are computed and stored in a database table, a correction for multiple comparisons (such as Bonferroni or FDR) has to be applied to the 100 million computed p-values.
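For reference, the arithmetic behind the two steps above can be sketched in plain Java. This is only a sketch under assumptions: the class and method names are invented for the example, it computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom but not the p-value (that needs the CDF of the t distribution, which the JDK does not provide), and the FDR step uses the Benjamini-Hochberg procedure as one possible choice.

```java
import java.util.Arrays;
import java.util.Comparator;

final class WelchSketch {
    /** Sample mean. */
    static double mean(double[] x) {
        double s = 0;
        for (double v : x) s += v;
        return s / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] x) {
        double m = mean(x), s = 0;
        for (double v : x) s += (v - m) * (v - m);
        return s / (x.length - 1);
    }

    /** Welch t statistic: (mean(a) - mean(b)) / sqrt(var(a)/na + var(b)/nb). */
    static double tStatistic(double[] a, double[] b) {
        return (mean(a) - mean(b))
            / Math.sqrt(variance(a) / a.length + variance(b) / b.length);
    }

    /** Welch-Satterthwaite approximation of the degrees of freedom. */
    static double degreesOfFreedom(double[] a, double[] b) {
        double qa = variance(a) / a.length;
        double qb = variance(b) / b.length;
        return (qa + qb) * (qa + qb)
            / (qa * qa / (a.length - 1) + qb * qb / (b.length - 1));
    }

    /** Benjamini-Hochberg adjusted p-values, returned in the input order. */
    static double[] benjaminiHochberg(double[] p) {
        int n = p.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Indices sorted by ascending p-value.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> p[i]));
        double[] adjusted = new double[n];
        double runningMin = 1.0;
        // Walk from the largest p-value down, keeping the adjusted values monotone.
        for (int rank = n; rank >= 1; rank--) {
            int idx = order[rank - 1];
            runningMin = Math.min(runningMin, p[idx] * n / rank);
            adjusted[idx] = runningMin;
        }
        return adjusted;
    }
}
```

The Bonferroni correction is simpler still: multiply each p-value by the number of tests and cap the result at 1. Note that the sorting step means the Benjamini-Hochberg adjustment needs all p-values at once, so for 100 million values it is a separate pass after the tests are stored.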

My proposed approach includes MySQL and "R"; the problem is the large number of SQL queries to be executed and the lack of out-of-the-box parallelism in both MySQL and "R".

The backup plan considers PostgreSQL and custom PostgreSQL functions, written either in C++ or PL/R.

Would Pentaho make any decisive difference by:
- enabling multicore and/or cluster computation for a more efficient resource usage
- providing a simple, out-of-the-box approach for this class of problems (statistical functions + large data)?

The second question is about reporting: can Pentaho output a report from two databases? It will have to issue a query to each database and merge the results in memory. Since the databases are very large compared with the relatively small report (which would fit in memory), it is impractical to copy the databases to a third database and perform a classic SQL join.

Thank you,

PS. quoted "R" because of getting this message:
We're sorry, but your post appears to contain abbreviations that we don't like people to use at the Ranch. Because JavaRanch is an international forum, many of our members are not native English speakers. For that reason, it's important that we all try to write clear, standard English, and avoid abbreviations and SMS shortcuts. See here for more of an explanation. Thanks for understanding.
If the abbreviation occurs within code, you can use code tags to post it successfully.
The specific error message is: "r" is a silly English abbreviation; use "are" instead.
The data mining part of Pentaho is missing; do you have in mind a new book devoted to data mining only?

Regarding reporting, is it possible to use a variable page size? I had problems with reports developed in Crystal Reports for European and American customers, because of the difference in page width between Letter and A4.

A second question regarding reporting: is drill-down or any other "non-printing" technique permitted in Pentaho?
It seems validators are not triggered if the value is empty. So checking for emptiness with validators is pointless; they will never be reached.

For performance reasons it would be a better idea to perform all the non-session checking on the client side, using JavaScript (empty fields, properly formed emails, ranges, names with at least two letters, etc.). In my opinion, the validator concept only makes sense for server-side checking (for instance, looking to see if the entry is in the database).

Does anybody know how to trigger the text in Message controls from client-side JavaScript in an elegant fashion?
14 years ago

Lasse Koskela wrote: Out of curiosity, I wrote a little micro-benchmark by parsing output from 'top' and determined that for an array of a million "primitive" integers (well, in Ruby everything is an object) my benchmark seemed to consume 4 bytes per integer, which suggests that the interpreter does indeed have some optimizations for integers.

Ah. That was time wasted for a useless benchmark. And fun.

I would like to learn Ruby; can you publish your code?
14 years ago
One of my projects, involving gene sequence alignment, uses a large number of big linear vectors of integers and floats. I have noticed in Java that if I use a language-provided array (such as a[]), the speed improves significantly compared with using the collection classes (List, Set, Vector). Also, by using primitive types (such as int, long) the speed improves up to 10 times compared with using the wrapper classes (Integer, Long, Float). The reason is simple: each Integer is an instance of a class, residing in its own dynamically allocated memory area.
Regarding Ruby, how does Ruby manage the primitive types? I understand that the programmer sees them as classes, but internally, how many memory areas are allocated if, for example, one builds a vector of 1000 integers? Is it one, as with int[1000] in Java, or 1001, as with Integer[1000]: the array object plus the 1000 Integer objects?
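For the Java side of this comparison, a small sketch of the two layouts might look like the following. The class and method names are invented for the example, and it makes no timing claims; it only shows where the boxing and unboxing happen.

```java
import java.util.ArrayList;
import java.util.List;

final class PrimitiveVsBoxed {
    /** One contiguous block of memory holding n ints. */
    static int[] range(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) values[i] = i;
        return values;
    }

    /** A list of references; each element is a separate Integer object. */
    static List<Integer> boxedRange(int n) {
        List<Integer> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) values.add(i); // autoboxing: int -> Integer
        return values;
    }

    /** Sums directly over the primitive values. */
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    /** Each iteration follows a reference and unboxes the Integer. */
    static long sumBoxed(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) sum += v; // auto-unboxing: Integer -> int
        return sum;
    }
}
```

Both sums give the same result; the difference is in memory layout and indirection, which is what a benchmark on large vectors would measure.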
14 years ago
So why is the concurrency in Java 1.4 different from the concurrency in Java 7 (except for a deprecated thread kill function)?
There are two points in concurrency. The first is the capability of spawning threads; as opposed to processes, they share memory with the parent thread. Processes may share some areas of memory, named shared memory, but this is not implemented in Java either. The second is synchronization: the ability of a thread to wait for an event caused by another thread (such as signaling, closure, or the release of a resource), together with semaphores and critical sections.
It would be great if the author would address these two topics.
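As a small illustration of both points, here is a sketch that uses the java.util.concurrent utilities, which were added in Java 5 and so are one concrete difference from Java 1.4. The class name and the choice of two permits are assumptions made for the example, and the lambda syntax requires Java 8 or later.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedMemoryDemo {
    /** Starts the given number of threads and returns the final shared count. */
    static int run(int workers, int incrementsPerWorker) {
        AtomicInteger counter = new AtomicInteger();        // memory shared by all threads
        Semaphore permits = new Semaphore(2);                // at most two workers at a time
        CountDownLatch done = new CountDownLatch(workers);   // event the caller waits for

        for (int w = 0; w < workers; w++) {
            new Thread(() -> {
                try {
                    permits.acquire();                       // wait for a free permit
                    try {
                        for (int i = 0; i < incrementsPerWorker; i++) {
                            counter.incrementAndGet();       // atomic update, no lost writes
                        }
                    } finally {
                        permits.release();                   // release the resource
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();                        // signal completion
                }
            }).start();
        }

        try {
            done.await();                                    // block until every worker signals
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }
}
```

In Java 1.4 the same behaviour had to be built by hand from synchronized blocks and wait/notify; the latch and the semaphore replace that boilerplate.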
14 years ago
I have worked with Hibernate for more than 3 years, but I still have some questions regarding performance:

- deletion: is there any way to delete without reading the data first? I mean:
DELETE FROM customers WHERE name = 'John'
is much faster than selecting the data, fetching it, then deleting each record based on its ID.

- scrollable cursors
If the result of the select query is expected to be huge, so that it is impractical to display all the data, the programmer may decide to fetch the first nnn records to display them in a user interface; then, once the user pages or scrolls the control, the required data is fetched on demand. In practice, the programmer can access the cursor in a random (as opposed to sequential) fashion. How does Hibernate fit into this approach? Please note that this approach depends on the capabilities of the database server; old databases such as Oracle 8 do not support it (at least, I tried a few years ago and didn't succeed).

- insertion issue
It seems the memory used by the process keeps increasing as we add new entries to the database.

- instanceof does not work when using class inheritance. The objects always seem to be instances of the base class, even though Hibernate could 'know' which class to instantiate by using a discriminator.

It seems that, except for Update, my questions have covered all the CRUD operations.

Many times I have had to provide a (J2EE) solution for the following problem:

The client makes a request that is expected to take a long time to complete (3 min to 10 hours). While the request is running, the client should be able to query the status of the task and to terminate it.

It is OK for the client to acquire the status by polling, since EJB has no callback mechanism like those in CORBA or DCOM.

Until now, I have found no standard way to do it. I cannot use a session bean, since the request will time out and the client won't be able to acquire the status. If I use message-driven beans, I also need to save the status in an entity and query it from the client using a session bean, which is the current solution. Still, when working in a cluster, the node executing the task cannot be shut down until the task completes.

Is there any out-of-the-box solution for this pattern in the J2EE servers, particularly GlassFish?
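Outside a container, the submit / poll / terminate shape of this problem can be sketched with plain java.util.concurrent. This is not an EJB or GlassFish facility, only an illustration of the pattern; the TaskRegistry class, its method names and the status strings are assumptions made for the example.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class TaskRegistry {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, Future<?>> tasks = new ConcurrentHashMap<>();

    /** Starts the work in the background and returns a ticket for later polling. */
    String submit(Runnable work) {
        String id = UUID.randomUUID().toString();
        tasks.put(id, pool.submit(work));
        return id;
    }

    /** Polled by the client. RUNNING also covers tasks still waiting in the queue. */
    String status(String id) {
        Future<?> task = tasks.get(id);
        if (task == null) return "UNKNOWN";
        if (task.isCancelled()) return "CANCELLED";
        // isDone() is also true when the task ended with an exception.
        return task.isDone() ? "DONE" : "RUNNING";
    }

    /** Requests termination; the worker thread is interrupted if already running. */
    boolean cancel(String id) {
        Future<?> task = tasks.get(id);
        return task != null && task.cancel(true);
    }

    /** Stops the pool and interrupts any task that is still running. */
    void shutdown() {
        pool.shutdownNow();
    }
}
```

The work itself has to cooperate with cancellation by checking the interrupt flag or using interruptible calls. The sketch keeps the status in the memory of a single JVM, so it does not address the cluster case described above, where the status has to live in shared storage.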


Does your book also cover the WS domain? Most of the web services tutorials and documentation provide only the classic add(a,b) sample. It would be very interesting to know how to:
- use a POJO as a web service interface
- use a POJO as a parameter or a return value in a web service
- in the case above, use class hierarchies, especially when abstract classes are involved
- use lists, arrays, sets.

For me these issues were solved using the XFire servlet within a servlet container. Is any J2EE server, particularly GlassFish, capable of providing this functionality out of the box and in a consistent manner? By consistent I mean that I have about 40 POJOs, generated from a UML model, and I do not want to rewrite the WSDL or any other descriptor file when we change the model.

Mark, both of your questions converge on a single answer:
In order to sell it, you have to think like a buyer. Most managers balance the money budget, the time budget and the quality requirements against deadlines. If you are able to convince them that less money or less time will be needed (since people are paid, less time almost automatically means less money) while still meeting the requirements on time, I think there is no problem. Most managers think short term, since their own work is usually assessed on immediate goals.

16 years ago
Most software projects end with a graphical user interface. It can be web (Ajax or plain), Java Swing or SWT, Windows API based, Qt, .NET, X Window or even a command prompt.
Does your book cover techniques for regression testing such applications? From my experience, testing a session EJB or a web service is relatively easy, but building a test of a user interface has always been difficult.
Do you know an effective method of testing SWT GUIs?

The second question is about testing the non-functional requirements of an application. Does the book cover the testing of security, performance, compatibility, availability or usability?

16 years ago
I think the reason is purely financial. Tests cost money. If you work for a single customer or just a few, correcting a problem at the beginning will cost about the same as correcting it in production; these are the so-called banana products, which are delivered green and ripen at the customer. On the other hand, if you have a lot of customers, you will spend much more money correcting a problem in production than you would spend in development.
Despite this logic, there are companies that don't want to invest more in quality assurance because there is no immediate incentive; they even consider the cost of deployment "normal".
The third category is the CTOs/decision makers who have no idea about testing, or at least about continuous testing.
16 years ago
I think you don't have to mock up the web application either; that's a job for the web designer, not for the architect.
Your task is to model the application's flow and data using UML.