amit bose

Greenhorn
since Apr 01, 2005

Recent posts by amit bose

Carey Brown wrote:Your best bet is to process the XML in a serial fashion, in that way you have no memory problems. You could use either the STAX or SAX libraries for this.


Secondly, you are keeping 4 copies of the data in memory: sbfContent(twice), result, and sbfValidatedContent.


sbfContent should be emptied before trying to append to it again.


result and sbfValidatedContent should be released before trying to re-read the file.




Thanks for the pointer Carey.

I went through the webpage, and it seems the StAX API is meant for streaming XML data. As my input is not an XML file but rather multiple XML documents concatenated together, I am not sure whether I can use it. Please correct me if I am wrong.

Also, regarding the duplication of data in memory: I will remove the duplicates, but the code fails before it reaches the duplicated content (i.e. sbfValidatedContent etc.)
14 years ago

William Brogden wrote:It looks to me like there is only one pass through the file.

Why don't you write chunks of valid data as they are accumulated?

Bill



Thanks for the pointer Bill.

Actually, I do want to write chunks of valid data as they are accumulated, but first I need to read the input data file, and that is where the code fails. The input is also not an XML file that could be processed easily, but rather multiple XML documents concatenated together.
14 years ago

Somnath Mallick wrote:I think, since you are getting an out of memory error, it would help if you increase your JVM heap size.



Thanks for the pointer Somnath.

However, I am already using a large heap size as below:

14 years ago
Hi All,

Please find below the details of my query.

Problem: I need to process a huge (350 MB) data file in Java. The data file is basically a concatenation of multiple XML documents.

What I need to do is..
(a) check whether there are unwanted characters in between the XML tags
(b) if yes, remove those characters
After the validation stage above, I need to write the file back to disk.

E.g. Input Data file sample (D1)
<?xml version="1.0" encoding="UTF-8"><books><!--- Books1. xml - some more tags go here --></books>some junk here
<?xml version="1.0" encoding="UTF-8"><books><!--- Books2. xml - some more tags go here --></books>
<?xml version="1.0" encoding="UTF-8"><books><!--- Books3. xml - some more tags go here --></books>more junk
<?xml version="1.0" encoding="UTF-8"><books><!--- Books4. xml - some more tags go here --></books>

(Please note that the content of the input data file above appears on a single line; I have added line breaks here for better readability.)


E.g. Output Data file sample (D2)
<?xml version="1.0" encoding="UTF-8"><books><!--- Books1. xml - some more tags go here --></books>
<?xml version="1.0" encoding="UTF-8"><books><!--- Books2. xml - some more tags go here --></books>
<?xml version="1.0" encoding="UTF-8"><books><!--- Books3. xml - some more tags go here --></books>
<?xml version="1.0" encoding="UTF-8"><books><!--- Books4. xml - some more tags go here --></books>

(The text 'some junk here' and 'more junk' have been removed in D2)


Earlier Solution:

I have shared my code below:


The above code threw an OutOfMemoryError for files larger than 100 MB; it happens while I am trying to read the file.

The next thing I tried was reading the file through buffers rather than line by line:



The above code worked fine as long as the data file was 200 MB or smaller. However, I now have a 350 MB data file, and it keeps giving the out-of-memory error.
Increasing the buffer size does not sound like a good option.
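
One way to avoid holding the whole file in memory is to filter it as a character stream. The sketch below is only an illustration, not the code from this post: it assumes every document starts with <?xml and ends with </books> (as in sample D1) and that neither marker occurs inside a document. The class name ConcatenatedXmlCleaner and the method clean are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper: copies each XML document and drops the text between documents.
class ConcatenatedXmlCleaner {
    // Assumed markers, taken from sample D1.
    private static final String START = "<?xml";
    private static final String END = "</books>";

    static void clean(Reader in, Writer out) throws IOException {
        StringBuilder tail = new StringBuilder(); // the last few characters read
        boolean inside = false;                   // true while inside a document
        int c;
        while ((c = in.read()) != -1) {
            String marker = inside ? END : START;
            tail.append((char) c);
            if (tail.length() > marker.length()) {
                tail.deleteCharAt(0);             // keep only marker-length characters
            }
            if (inside) {
                out.write(c);                     // document content is copied as-is
            }
            if (tail.length() == marker.length() && tail.indexOf(marker) == 0) {
                if (!inside) {
                    out.write(START);             // emit the start marker that was held back
                }
                inside = !inside;
                tail.setLength(0);
            }
        }
    }
}
```

Wrapping the input in a BufferedReader and the output in a BufferedWriter keeps memory use constant regardless of file size, since only a handful of characters are retained at any time.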

Let me know if there are any pointers for this problem.


Thanks,
Amit
14 years ago
Hi,

How can I use a SAX Parser to split a huge XML file into multiple parts (I also will need to keep the hierarchy intact):

e.g. if the input XML is:
<institution>
<institution-name>Institute-123</institution-name>
<institution-address>Institute-address</institution-address>
<departments>
<department id="101">
<employee>
<name>Emp1-101</name>
</employee>
<employee>
<name>Emp2-101</name>
</employee>
</department>
<department id="102">
<employee>
<name>Emp1-102</name>
</employee>
<employee>
<name>Emp2-102</name>
</employee>
</department>
</departments>
</institution>

The split XMLs must list the employee names with the department and institution details (with the hierarchy intact). Thus, there will be 4 XMLs in this case, one per employee.

(Emp1-101.xml)

<data>
<institution-name>Institute-123</institution-name>
<institution-address>Institute-address</institution-address>
<department id="101">
<employee>
<name>Emp1-101</name>
</employee>
</department>
</data>

Similarly we need to have XMLs for Emp2-101, Emp1-102, Emp2-102

Please note that I cannot use DOM, as the input XML is very large.
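
A rough sketch of how a SAX handler could do this, assuming the element names from the sample above: the handler remembers the institution details and the current department id, and hands one small document to a callback each time an </employee> end tag is seen. EmployeeSplitter, split and the sink callback are hypothetical names; the callback would typically write the text to a file such as Emp1-101.xml.

```java
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: emits one <data> document per <employee>.
class EmployeeSplitter extends DefaultHandler {
    private final BiConsumer<String, String> sink; // receives (employee name, XML text)
    private final StringBuilder text = new StringBuilder();
    private String institutionName = "";
    private String institutionAddress = "";
    private String departmentId = "";
    private String employeeName = "";

    EmployeeSplitter(BiConsumer<String, String> sink) {
        this.sink = sink;
    }

    static void split(InputSource in, BiConsumer<String, String> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new EmployeeSplitter(sink));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for this element
        if (qName.equals("department")) {
            departmentId = atts.getValue("id");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "institution-name": institutionName = text.toString().trim(); break;
            case "institution-address": institutionAddress = text.toString().trim(); break;
            case "name": employeeName = text.toString().trim(); break;
            case "employee": sink.accept(employeeName, render()); break;
            default: break;
        }
    }

    // Rebuilds the hierarchy for the employee that has just ended.
    private String render() {
        return "<data>\n"
            + "<institution-name>" + esc(institutionName) + "</institution-name>\n"
            + "<institution-address>" + esc(institutionAddress) + "</institution-address>\n"
            + "<department id=\"" + esc(departmentId) + "\">\n"
            + "<employee>\n<name>" + esc(employeeName) + "</name>\n</employee>\n"
            + "</department>\n"
            + "</data>\n";
    }

    // Minimal escaping only; a real writer should handle all XML special characters.
    private static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

Only the current employee's details are held at any time, so memory use does not grow with the size of the input.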

Thanks,
Amit
Hi All,

Prologue:
------------
There are two kinds of entities: Managers and Projects (each project has a monthly financial report).
One Manager might have access to one or more Projects (and hence to their respective financial reports as well).
The Manager is the user who will log in to the application.


Context:
------------
Once the user logs in to the web application, he/she gets a link to the PDF report(s). On clicking a report link, a window opens showing the contents of the report. The PDF reports are stored in the server file system.
e.g. folder structure could be like this:
Reports > Project1 > Report1_PROJ1.pdf
Reports > Project1 > Report2_PROJ1.pdf
Reports > Project2 > Report1_PROJ2.pdf and so on..


Problem:
------------
The user might tamper with the URL and change it to try to read other PDFs which he/she is not authorized to see.
e.g. ManagerXXX is authorized to see only "Project2" reports.
Current URL:
http://server/filelocation/Reports/Project2/Report1_PROJ2.pdf
The Manager can modify this URL to:
http://server/filelocation/Reports/

Then he/she will see all the available Project Reports, even though he/she is not authorized to do so.
The main problem here is that, once the URL has been tampered with, control does not return to the web application; the request goes directly to the file system location for the changed URL.

Already explored solutions:
-----------------------------
JavaScript:
We do not show the address bar/status bar, so there is no chance of URL tampering.
The problem is that if JavaScript is disabled in the browser, there is a serious security issue.


Possible solution:
---------------------
It might be possible to somehow integrate the file system with LDAP. In that case, a user's access rights would be based on the groups to which the user belongs in LDAP.
However, I have been unable to find any material to get started on this approach.
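
A different technique from file-system permissions, sketched here purely as an illustration: keep the Reports folder outside the web-accessible location and let the application hand out a file only after checking the logged-in Manager's project list. The sketch assumes the application already knows which projects the user may see (for example from LDAP groups); ReportAccess and resolve are hypothetical names, and the streaming of the PDF to the browser is left out.

```java
import java.nio.file.Path;
import java.util.Set;

// Hypothetical helper: maps (project, file name) to a file under the reports root,
// refusing anything the user is not authorized for.
class ReportAccess {
    private final Path reportsRoot;

    ReportAccess(Path reportsRoot) {
        this.reportsRoot = reportsRoot.toAbsolutePath().normalize();
    }

    /** Returns the file to serve, or throws SecurityException if access is not allowed. */
    Path resolve(Set<String> authorizedProjects, String project, String fileName) {
        if (!authorizedProjects.contains(project)) {
            throw new SecurityException("Not authorized for project " + project);
        }
        Path projectDir = reportsRoot.resolve(project).normalize();
        Path file = projectDir.resolve(fileName).normalize();
        // Reject names such as "../Project1/Report1_PROJ1.pdf" that escape the project folder.
        if (!file.startsWith(projectDir) || !fileName.endsWith(".pdf")) {
            throw new SecurityException("Illegal report name " + fileName);
        }
        return file;
    }
}
```

Because the browser never receives a URL that maps directly onto the file system, editing the URL only changes the parameters passed to this check.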


Regards,
Amit
[ October 24, 2007: Message edited by: amit bose ]
16 years ago
Hi,

Please let me know which version of JAXB/JWSDK is compatible with JDK1.3?

My finding:
I could locate the earliest version, i.e. JWSDK1.3, but this too requires JDK1.4 at minimum.

Is there any earlier version of JWSDK that is compatible with JDK1.3?

Regards,
Amit
17 years ago
Hi,

I have an XML file, e.g.
<Root>
<CarModel>
<ModelNumber>MXT987</ModelNumber>
<SellCode>HJ-ER-M5</SellCode>
</CarModel>
<Buyers>
<Buyer>
<Name>Smith</Name>
<ReceiptNo>RC123</ReceiptNo>
</Buyer>
<Buyer>
<Name>Marshall</Name>
<ReceiptNo>RC888</ReceiptNo>
</Buyer>
<Buyer>
<Name>John</Name>
<ReceiptNo>RC111</ReceiptNo>
</Buyer>
<Buyer>
...
</Buyer>
</Buyers>
</Root>

I have the following tables:


[TABLE1 ] :CAR
[COLUMN1] :Model_Number
[COLUMN2] :Sell_Code

[TABLE2 ] :CUSTOMER
[COLUMN1] :Model_Number
[COLUMN2] :Name
[COLUMN3] :Receipt_No

Is it possible to have a dynamic mapping between the XML data and the database columns?
Say there is a new XML like:

<Header>
<Item>
<ChasisNo>POU986</ChasisNo>
<TxnCode>KL-78-RT</TxnCode>
</Item>
<Merchants>
<Merchant>
<Title>Alpha</Title>
<BillNo>ML987</BillNo>
</Merchant>
<Merchant>
<Title>Beta</Title>
<BillNo>ML787</BillNo>
</Merchant>
<Merchant>
...
</Merchant>
</Merchants>
</Header>

Then the fields must be mapped to DB columns like:

ChasisNo:Model_Number (i.e. the ChasisNo XML tag must be mapped to the DB column Model_Number)
TxnCode:Sell_Code
Title:Name
BillNo:Receipt_No
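
One possible way to keep such a mapping dynamic is to hold the tag-to-column pairs as data (for example loaded from a properties file) rather than in code. The sketch below assumes the tag values of one record have already been read into a map by whatever parser is in use; TagColumnMapper, toColumns and insertSql are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: translates XML tag names into DB column names using a configurable map.
class TagColumnMapper {
    private final Map<String, String> tagToColumn;

    TagColumnMapper(Map<String, String> tagToColumn) {
        this.tagToColumn = tagToColumn;
    }

    /** Converts tag -> value into column -> value, skipping tags that have no mapping. */
    Map<String, String> toColumns(Map<String, String> tagValues) {
        Map<String, String> row = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> e : tagValues.entrySet()) {
            String column = tagToColumn.get(e.getKey());
            if (column != null) {
                row.put(column, e.getValue());
            }
        }
        return row;
    }

    /** Builds a parameterized INSERT for the mapped columns; values are bound separately. */
    static String insertSql(String table, Map<String, String> row) {
        String columns = String.join(", ", row.keySet());
        String marks = String.join(", ", Collections.nCopies(row.size(), "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + marks + ")";
    }
}
```

Supporting a new XML layout then means supplying a different map, with no change to the code that writes to the CAR and CUSTOMER tables.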

Regards,
Amit
[ February 15, 2007: Message edited by: amit bose ]
17 years ago
Hi,

I require a tool that takes an XML file as input and provides the XSD schema as output. Later on, the obtained XSD schema can be trimmed as per requirements.
Kindly suggest open source tools only, as I have license issues (with freeware like XMLFox, and with shareware).

I would appreciate it if you could drop a mail at amit_mnnit@yahoo.com as well.

Best Regards,
Amit
17 years ago
Hi,

I require a tool that takes an XML file as input and provides the XSD schema as output. Later on, the obtained XSD schema can be trimmed as per requirements.
Kindly suggest open source tools only, as I have license issues (with freeware like XMLFox, and with shareware).

[Request for e-mailing of answers deleted: UseTheForumNotEmail - Paul C]

Best Regards,
Amit
[ February 15, 2007: Message edited by: Paul Clapham ]
Hi Jim,

By that line I meant that the number of such files would be large.
Somewhere in the thousands can safely be assumed for now. The volume is sure to go up in future.

Cheers,
Amit
17 years ago
My input file is not an XML file; it just has some header info present as XML tags.
What I need to extract from this file is
content1, content2, ..., contentN (and)
Data1, Data2, ..., DataN

So should I use regex for this? I am not sure about regex performance, given that each file would not exceed, say, 100 lines. However, the number of input files would be quite enormous.
17 years ago
Hi,

I have a file(*.txt) which is of the format:

<tag1>content1</tag1>
<tag2>content2</tag2>
...Etc till...
<tagN>contentN</tagN>
{1 : Data1}{2 : Data2..
...Etc till....
}

What would be an optimal way of reading this?
Would a plain buffered stream read suffice, or does a better alternative exist?
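
As a rough sketch of the buffered-read option, the example below reads the file line by line and picks out both kinds of entries with regular expressions. It assumes each <tagN>...</tagN> pair and each {n : DataN} entry sits entirely on one line, which the format above does not guarantee; TaggedFileReader and its fields are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for the header tags and the {n : DataN} entries.
class TaggedFileReader {
    // <tag>content</tag>, where the closing tag repeats the opening tag name
    private static final Pattern TAG = Pattern.compile("<(\\w+)>(.*?)</\\1>");
    // {number : data}, with optional spaces around the number and the data
    private static final Pattern DATA = Pattern.compile("\\{\\s*(\\d+)\\s*:\\s*([^{}]*?)\\s*\\}");

    final Map<String, String> headers = new LinkedHashMap<String, String>();
    final Map<Integer, String> data = new LinkedHashMap<Integer, String>();

    static TaggedFileReader read(BufferedReader in) throws IOException {
        TaggedFileReader result = new TaggedFileReader();
        String line;
        while ((line = in.readLine()) != null) {
            Matcher tag = TAG.matcher(line);
            while (tag.find()) {
                result.headers.put(tag.group(1), tag.group(2));
            }
            Matcher entry = DATA.matcher(line);
            while (entry.find()) {
                result.data.put(Integer.valueOf(entry.group(1)), entry.group(2));
            }
        }
        return result;
    }
}
```

The patterns are compiled once and reused, which matters more than the per-file cost when many small files are processed.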

Cheers,
Amit
[ December 16, 2006: Message edited by: amit bose ]
17 years ago
I have a database table T1 (with 20 fields), and based on it I need to generate some XML files, say F1, F2, F3.
I have methods generateXMLFile1(x), generateXMLFile2(y), generateXMLFile3(z), which would create F1, F2, F3.
Also, F1 uses 10 fields from T1, F2 uses 5 and F3 uses 15 fields.

Soln1: Create a bean class 'commonB' based on the 20 fields of T1 and, depending on the XML file being generated,
populate fields and pass 'commonB'.
i.e. when generating the F2 file, just populate commonB with 5 fields and call generateXMLFile2(commonB objCommonB)


Soln2: Create three bean classes B1 (for F1 with 10 fields), B2 (for F2 with 5 fields), B3 (for F3 with 15 fields)
and invoke generateXMLFile1(B1 objB1) for F1 and so on for the others.

(Please note that by bean,I mean simple Java bean and not EJBs)

The downside of solution 1 is that, e.g., in F2 generation we use only 5 of the 20 available bean fields.
I am not sure if this causes performance issues.

The downside of solution 2 is that we create a number of Java classes, i.e. down the line, if we need to create
10 more XML messages, then 10 additional bean classes would be required.

Where should the tradeoff be made so that an optimal design is obtained?
Which approach is more performant?

Thanks,
Amit
amit_mnnit@yahoo.com
17 years ago
Hi,

Please let me know if anyone has implemented Thread pooling with the java.util.concurrent.ThreadPoolExecutor of J2SE 5.0
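
For reference, a minimal sketch of what such usage could look like, written without lambdas so that it also fits J2SE 5.0 syntax. The pool sizes, the class name SquareSummer and the toy task (summing squares) are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: computes 1*1 + 2*2 + ... + n*n on a thread pool.
class SquareSummer {
    static int sumOfSquares(int n) throws Exception {
        // 2 core threads, at most 4, idle extra threads are removed after 30 seconds.
        // With an unbounded queue the pool never grows beyond the core size.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 30L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            List<Future<Integer>> results = new ArrayList<Future<Integer>>();
            for (int i = 1; i <= n; i++) {
                final int value = i;
                results.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return value * value;
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that task has finished
            }
            return total;
        } finally {
            pool.shutdown(); // let the worker threads exit once the queue is drained
        }
    }
}
```

A bounded queue (for example ArrayBlockingQueue) would let the pool grow towards the maximum size under load, at the cost of needing a rejection policy.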

I will be reachable at amit_mnnit@yahoo.com


Thanks,
Amit