
Meghana Reddy

Ranch Hand
since Jan 29, 2002

Recent posts by Meghana Reddy

Thank you for your responses:

ejaz khan wrote: I would suggest that, before you start processing, you open the file in EditPlus 3.41 and go to Edit --> Delete --> Delete Duplicate Lines.
It will help you remove all the duplicate records quickly.



That is not an option, because this is not a one-time activity. It needs to be automated.

ejaz khan wrote: Another point: do not let your Java program do the duplicate hunting. Instead, create the unique/primary key rules on the DBMS and let the DBMS fail your duplicate records.
Secondly, if the records have no dependencies on each other, you can also use the UNIX split-file technique. That way you will have multiple smaller files to process.
You can then use multiple threads to read the split files in parallel, which will increase your read efficiency.



This is another option I'm considering. But we don't import directly into the transactional database in one shot.
We import the file into a temp table first and then start processing.
This is a demographic file and, luckily, it includes the SSN, which we can use as a unique key on the temp table. The problem is that we need to know the exact values of each duplicated row, so that we can send them back in the rejected file to indicate which records were rejected and why.
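
One way to keep the exact values of every duplicated row is to detect duplicates while streaming the file, before anything reaches the temp table. The following is only a minimal sketch under stated assumptions: the pipe-delimited layout with the SSN as the first field, the class name DuplicateSplitter, and the "duplicate SSN" reason text are all made up for illustration, and it assumes the set of distinct SSNs fits in the heap.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashSet;
import java.util.Set;

final class DuplicateSplitter {

    // Streams rows from 'in'. The first row seen for each SSN goes to 'accepted';
    // every later row with the same SSN goes to 'rejected' with a reason appended.
    // Returns the number of rejected rows.
    static int split(BufferedReader in, Writer accepted, Writer rejected) {
        Set<String> seenSsns = new HashSet<>();
        int rejectedCount = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Assumption: pipe-delimited rows with the SSN as the first field.
                String ssn = line.split("\\|", 2)[0];
                if (seenSsns.add(ssn)) {
                    accepted.write(line + "\n");
                } else {
                    // Keep the full original row so the sender sees exactly what was rejected.
                    rejected.write(line + "|duplicate SSN\n");
                    rejectedCount++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rejectedCount;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the SSNs are placeholders.
        String sample = "000-00-0001|Ann\n000-00-0002|Bob\n000-00-0001|Ann B.\n";
        StringWriter accepted = new StringWriter();
        StringWriter rejected = new StringWriter();
        int count = split(new BufferedReader(new StringReader(sample)), accepted, rejected);
        System.out.println("rejected rows: " + count);
        System.out.print(rejected);
    }
}
```

In a real run the two writers would be file writers, so only the set of SSNs stays in memory rather than the rows themselves.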

Deepak Bala wrote: but this sounds more like a job for an ETL tool to me. Extract the contents of the text file -> Transform the values and eliminate duplicates -> Load to DB.


We don't have any ETL tools or ETL expertise on our team, so we may need to bring in someone who knows ETL to help with this.

Other than this, would some kind of sorting (say, merge sort) help identify the duplicates? I'm trying to find out whether anyone has solved this problem in Java before (however tedious, I'm pretty sure someone has) and what their experience was.
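
On the sorting idea: once rows are sorted by key, duplicates sit next to each other, so a single pass comparing each row with the previous one finds them. Below is a rough sketch of that idea on an in-memory list only. The class name, the keyOf helper, and the pipe-delimited layout with the SSN first are assumptions for illustration; a 4-5 GB file would need an external (on-disk) merge sort in place of List.sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SortedDuplicateFinder {

    // Assumption: pipe-delimited rows with the SSN as the first field.
    static String keyOf(String row) {
        return row.split("\\|", 2)[0];
    }

    // Returns every row whose key already appeared on an earlier row.
    static List<String> findDuplicates(List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        // List.sort is a stable merge sort, so rows with equal keys keep their input order.
        sorted.sort(Comparator.comparing(SortedDuplicateFinder::keyOf));

        List<String> duplicates = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            // After sorting, a duplicate always has the same key as its predecessor.
            if (keyOf(sorted.get(i)).equals(keyOf(sorted.get(i - 1)))) {
                duplicates.add(sorted.get(i));
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("2|b", "1|a", "1|c", "2|d", "3|e");
        System.out.println(findDuplicates(rows));
    }
}
```

Because the sort is stable, the first occurrence in the input is the one that is kept, and the later ones are the ones reported.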

Thanks again guys,
Meghana

9 years ago
Hi

We have a requirement to process a huge demographic file (millions of records, possibly 4-5 GB in size).

The file can contain duplicate records, which must be eliminated. After that we apply some business rules (developed in Java; we are now considering a rules engine) before loading all the records into a DB.

Right now we do everything sequentially and can process only 20 records per second, which does not meet our SLA, so
I'm looking for opportunities/ideas to improve the processing speed.

So I'm thinking of separating out the tasks involved in processing this file to see which of them can be executed in parallel.
I've read about the Map/Reduce approach. Is this use case a good candidate for Map/Reduce?
What is the best approach to eliminating duplicates from such a large data set?
Any other thoughts?
9 years ago
Thanks, Jeff. I have never had to use it before.
9 years ago
Just curious: why bother with a 1000-digit number when Java's maximum value for long (Long.MAX_VALUE) is 9223372036854775807, which is only 19 digits?
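
For reference, numbers beyond the 19 digits of a long can be handled with the standard java.math.BigInteger class. This is only a small illustration of the size difference; the class name BigNumberDemo and the helper tenToThe are made up for the example.

```java
import java.math.BigInteger;

final class BigNumberDemo {

    // Returns 10^exponent as an arbitrary-precision integer.
    static BigInteger tenToThe(int exponent) {
        return BigInteger.TEN.pow(exponent);
    }

    public static void main(String[] args) {
        // Long.MAX_VALUE is 9223372036854775807.
        System.out.println(String.valueOf(Long.MAX_VALUE).length());

        // 10^999 is a 1 followed by 999 zeros, i.e. a 1000-digit number.
        BigInteger big = tenToThe(999);
        System.out.println(big.toString().length());

        // Arithmetic still works at this size.
        System.out.println(big.add(BigInteger.ONE).mod(BigInteger.TEN));
    }
}
```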
9 years ago
The only reason I can think of is that the handler is somehow not attached to the Service in the server environment, especially since it works in standalone mode.

I don't know much about JBoss, but you can check the documentation to see whether any additional handler configuration is needed.

Some time ago, I remember configuring the handlers from the admin page in WebSphere. There is probably a similar configuration in JBoss.
9 years ago
I still haven't got an exact answer to my question.

AFAIK, the system properties will set the connect timeout at the system level which I don't want. I want to set the connect timeout for that specific URLConnection.

I understand the connect and request timeouts.

// This is the one I want to control: it applies when the Service class fetches the WSDL from a URL, before we have a chance to get the port.
CONNECT_TIMEOUT: The maximum time allowed to establish a (socket) connection.

// I'm not concerned about this one, because I can set it after obtaining the port and before invoking the business web service method.
REQUEST_TIMEOUT: The maximum time between establishing a connection and receiving data from the connection.
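
For comparison, this is how the two timeouts are set on one specific connection with the plain java.net API, without touching system properties. It is only a sketch: the class name and the URL are hypothetical, and whether the generated Service class can be made to use such a connection for its WSDL fetch is not something confirmed here. One possible workaround, stated as an assumption, would be to download the WSDL this way and hand the Service a local copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

final class PerConnectionTimeout {

    // Opens (but does not yet connect) a URLConnection with its own timeouts.
    static URLConnection open(String url, int connectMillis, int readMillis) {
        try {
            // openConnection() only creates the connection object; no network I/O happens here.
            URLConnection conn = URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectMillis); // max time to establish the socket connection
            conn.setReadTimeout(readMillis);       // max time to wait for data once connected
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical WSDL location, for illustration only.
        URLConnection conn = open("http://localhost:8080/service?wsdl", 3000, 10000);
        System.out.println(conn.getConnectTimeout() + " / " + conn.getReadTimeout());
    }
}
```

These settings apply only to the connection object they are called on, which is the per-connection behaviour asked about above.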

9 years ago
How are you invoking the setUser() method? That is, at what point in your flow are you instantiating the handler and calling setUser()?

I ask because, in a typical scenario, a handler does not have any setter methods; the framework is supposed to invoke the handler's methods.

Try hard-coding the user value and see what happens.

9 years ago

Abhijit Durge wrote:
This is used when the web service client sends requests (executes a method) to the web service.



Are you sure? I think that when the web service client sends requests, the "com.sun.xml.internal.ws.request.timeout" setting (which is BindingProviderProperties.REQUEST_TIMEOUT) controls this.
9 years ago
Here is the actual generated code that loads the WSDL from URL.



9 years ago
For the sake of argument, let's assume we are loading the WSDL dynamically from a URL.
9 years ago
I'm not familiar with RAD 7.5, so I don't know which framework RAD uses to generate the web service client. If it is JAX-WS, you can pass the credentials as HTTP request headers in the MessageContext (this is typically done in a web service handler), as below:

// context is the MessageContext instance passed to the handler's handleMessage() method
Map<String, List<String>> headers = new HashMap<String, List<String>>();
headers.put("Username", Collections.singletonList("wsclient"));
headers.put("Password", Collections.singletonList("P@$$W0rd"));
context.put(MessageContext.HTTP_REQUEST_HEADERS, headers);
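
Note that standard HTTP Basic authentication carries the credentials in a single Authorization header ("Basic " followed by the Base64 of user:password) rather than in separate Username and Password headers. Here is a minimal sketch of building such a header map using only the JDK; the class name BasicAuthHeaders is made up, and the credentials are placeholders. The resulting map has the same shape as the one stored under MessageContext.HTTP_REQUEST_HEADERS.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BasicAuthHeaders {

    // Builds a header map containing a standard HTTP Basic Authorization header.
    static Map<String, List<String>> build(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));

        Map<String, List<String>> headers = new HashMap<>();
        headers.put("Authorization", Collections.singletonList("Basic " + encoded));
        return headers;
    }

    public static void main(String[] args) {
        // Placeholder credentials, for illustration only.
        System.out.println(build("wsclient", "secret"));
    }
}
```

If the client is JAX-WS, it may also be worth trying the BindingProvider.USERNAME_PROPERTY and BindingProvider.PASSWORD_PROPERTY entries of the request context, which exist for this purpose; whether RAD's generated client honours them is an assumption to verify.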
9 years ago

Paulo Carvalho wrote: URL : http://localhost:8080/Generate/report



What is Generate in the above URL? It should be the name of the WAR file (in other words, the context root of the webapp) that you've deployed in Tomcat.

Thanks
9 years ago
My bad. I thought you had the exception on line 4. Can you post the malformed XML so I can try it?
9 years ago