JavaRanch » Java Forums » Mobile » Java Micro Edition
Getting data from a webpage

Huang Qingyan
Greenhorn

Joined: Jan 18, 2002
Posts: 23
Hi,
Could you please show me how to extract data from a web page using J2ME?
I have succeeded with the connection part; my main problem now is how to extract the information I want.
For example:
<html>
...
...
]1. the data i wanted<br>
</html>
Now, as you might have guessed, the data I want to extract is "the data i wanted". So how do I go about this?
Warmest regards,
Qingyan
Valentin Crettaz
Gold Digger
Sheriff

Joined: Aug 26, 2001
Posts: 7610
You can use an XML parser to parse the HTML you get and extract the relevant parts you need...
Check out the following articles:
Parsing XML in J2ME
XML and Java Language Programming For Wireless Devices: A Tutorial


SCJP 5, SCJD, SCBCD, SCWCD, SCDJWS, IBM XML
Rishi Tyagi
Ranch Hand

Joined: Feb 14, 2002
Posts: 100
There is no single method for retrieving data from an HTTP link. You can do either of the following:
1- If you are comfortable with XML programming and parsing XML files, use XML tags to send the data in the web page and extract the data from those tags in your J2ME application.
2- Simply print the data string from the web application using its output object and read it in your J2ME application using an input stream object.
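Option 2 could look roughly like the sketch below. It assumes the response is plain single-byte text and that you already have an InputStream for it (in a MIDlet this would typically come from HttpConnection.openInputStream()); the class and method names are only illustrative.

```java
import java.io.IOException;
import java.io.InputStream;

class StreamReaderSketch {

    /** Reads the stream to its end and returns the content as a String. */
    static String readAll(InputStream in) throws IOException {
        StringBuffer sb = new StringBuffer();
        int c;
        // read() returns -1 once the end of the stream is reached
        while ((c = in.read()) != -1) {
            // assumption: one byte per character (ASCII / ISO-8859-1 text)
            sb.append((char) c);
        }
        return sb.toString();
    }
}
```

Reading byte by byte is slow on large pages, so this only suits short data strings of the kind described above.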
Rishi
Michael Yuan
author
Ranch Hand

Joined: Mar 07, 2002
Posts: 1427
I agree with Rishi. You have to know the format to extract any useful data.
But be aware if you want to use an XML parser to parse HTML data: HTML pages are often not well-formed (for example, HTML pages often have <p>s without the matching </p>s). So XML parsers often fail when parsing HTML.


Seam Framework: http://www.amazon.com/exec/obidos/ASIN/0137129394/mobileenterpr-20/
Ringful: http://www.ringful.com/
Stefan Haustein
Author
Greenhorn

Joined: Jul 28, 2002
Posts: 9
Originally posted by Huang Qingyan:

For example:
<html>
...
...
]1. the data i wanted<br>
</html>
Now, as you might have guessed, the data I want to extract is "the data i wanted". So how do I go about this?

Hi Qingyan,
if the data you are looking for is always enclosed by "]1." and "<", you can read the stream until you find "]1." and use this as the trigger for extracting the desired content. Of course, "]1." should not occur elsewhere; otherwise the extraction becomes more complicated. Example code:
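A minimal sketch of this approach, written to match the description given further down in this thread (a trigger string, a match counter, and two loops). The class and method names are assumptions made for illustration, and the input is assumed to be single-byte text with a non-empty trigger.

```java
import java.io.IOException;
import java.io.InputStream;

class TriggerExtractor {

    /**
     * Skips input until the trigger text has been read, then returns
     * everything up to (not including) the next '<'.
     * Returns null if the trigger never occurs.
     */
    static String extract(InputStream in, String trigger) throws IOException {
        int match = 0; // number of trigger characters matched so far
        int i;         // the char just read

        // First loop: read until the complete trigger has been seen.
        while (true) {
            i = in.read();
            if (i == -1) {
                return null; // end of stream, trigger not found
            }
            if (i == trigger.charAt(match)) {
                match++;
                if (match == trigger.length()) {
                    break; // full trigger recognized
                }
            } else {
                // Mismatch: restart, but count this char if it is itself
                // the first trigger char. This simple reset is enough for
                // triggers such as "]1." whose first char does not repeat.
                match = (i == trigger.charAt(0)) ? 1 : 0;
            }
        }

        // Second loop: collect the content until an HTML tag starts.
        StringBuffer result = new StringBuffer();
        while ((i = in.read()) != -1 && i != '<') {
            result.append((char) i);
        }
        return result.toString();
    }
}
```

For the page in the question, extract(in, "]1.") would yield " the data i wanted" with the leading space, so you may want to call trim() on the result.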


Of course, depending on the actual HTML code, this can become more complex or even impossible... A general approach may be to try to work out a logical rule for detecting the content, something like "first <li> after the word 'prices'", then to verify it manually against some examples, and finally to implement the rule in Java.
Please note that this may still fail when the page design changes. When a machine-readable version of the content is available (SOAP, RDF or XML-RPC), this is probably a more stable target for extraction.
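As an illustration only, a rule like "first <li> after the word 'prices'" might be implemented along these lines once the page has been read into a String. The names are hypothetical, and the sketch assumes lowercase tags without attributes.

```java
class RuleExtractor {

    /**
     * Returns the text of the first <li> element that follows the keyword,
     * or null if the keyword or a following <li> cannot be found.
     */
    static String firstItemAfter(String html, String keyword) {
        int k = html.indexOf(keyword);
        if (k < 0) {
            return null; // keyword not on the page
        }
        int start = html.indexOf("<li>", k + keyword.length());
        if (start < 0) {
            return null; // no list item after the keyword
        }
        start += "<li>".length();
        // The item text ends where the next tag begins.
        int end = html.indexOf('<', start);
        if (end < 0) {
            end = html.length();
        }
        return html.substring(start, end).trim();
    }
}
```

A page that writes <LI> or adds attributes to the tag would already defeat this rule, which is why checking it against several real pages matters.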
Best,
Stefan


Stefan Haustein, Co-Author of "Java 2 Micro Edition Application Development" (http://www.amazon.com/exec/obidos/ASIN/0672320959/ref=ase_electricporkchop)
Huang Qingyan
Greenhorn

Joined: Jan 18, 2002
Posts: 23
Hi again,
First of all, thanks for your kind replies. Anyway, I guess I have to start learning XML parsing. I have heard a lot of great stories about XML in the past.
Could you please tell me again which link offers a good tutorial on XML parsing?
Warmest regards,
Qingyan
Michael Yuan
author
Ranch Hand

Joined: Mar 07, 2002
Posts: 1427
As I mentioned in a previous comment, XML parsers often fail to parse legitimate HTML pages. What you really need is a custom-built parser. And Stefan's example shows you exactly what you need to do ...
Rishi Tyagi
Ranch Hand

Joined: Feb 14, 2002
Posts: 100
Huang,
For parsing XML there are several APIs that can be used.
I have worked with the JAXP API from Sun Microsystems, although many people prefer the JClark API for the same purpose.
But note that XML is a tag-based language. If you are using XML for communication between two applications, you first have to define DTDs for the XML files; after that you can send and receive data properly.
For example
<xml>
<user_data id="xxxxx">
<msisdn>9198xxxxxxxx</msisdn>
<age>23</age>
<address>abc, new delhi,India</address>
</user_data>
</xml>
The above example XML contains some user information,
which follows a predefined layout (as the DTD would define it), such as
<xml>
<user_data id="xxxx">
<msisdn>MSISDN</msisdn>
<age>Age of the user</age>
<address>Address of the user</address>
</user_data>
</xml>
and you have to follow the same format at both ends, that is, in the sender as well as in the receiver application.
Then there is not much left to do while parsing: just pick the data inside the specified tags using the parser API.
But if you want to send simple data like
<xml>
<data>Hello how are you</data>
</xml>
I would suggest not using an XML parser in this case; write some simple code that extracts the line of data between the specified tags, as this reduces the length and complexity of the code.
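Such simple code might look like the following sketch. It assumes the whole message is already available as a String and that the tag appears once, without attributes; the class and method names are made up for this example.

```java
class TagText {

    /**
     * Returns the text between <tag> and </tag>,
     * or null if either marker is missing.
     */
    static String between(String doc, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = doc.indexOf(open);
        if (start < 0) {
            return null; // opening tag not found
        }
        start += open.length();
        int end = doc.indexOf(close, start);
        if (end < 0) {
            return null; // closing tag not found
        }
        return doc.substring(start, end);
    }
}
```

For the message above, between(message, "data") gives "Hello how are you".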
Hope it will be helpful for you.
For the XML parser APIs you can go to
for the JAXP API --> java.sun.com
JClark API --> you will have to search the web, as I cannot recall the site name right now
Regards,
Rishi
Stefan Haustein
Author
Greenhorn

Joined: Jul 28, 2002
Posts: 9
p.s.: My Utils4Me XML reader allows you to set the parsing mode to relaxed. In relaxed mode it is less strict concerning the XML encoding rules, so it may be able to read your HTML page. Actually, reading any HTML was the idea behind the relaxed mode, so if it does not, let me know...
Best,
Stefan
Michael Yuan
author
Ranch Hand

Joined: Mar 07, 2002
Posts: 1427
Originally posted by Stefan Haustein:
p.s.: My Utils4Me XML reader allows you to set the parsing mode to relaxed. In relaxed mode it is less strict concerning the XML encoding rules, so it may be able to read your HTML page. Actually, reading any HTML was the idea behind the relaxed mode, so if it does not, let me know...
Best,
Stefan

Thanks for the info, Stefan! It is very useful outside J2ME as well. When I developed for the server side, I used different tools to tidy HTML pages into XML before feeding them to XML parsers. It was such a pain.
Stefan Haustein
Author
Greenhorn

Joined: Jul 28, 2002
Posts: 9
Since I was asked offline to explain the example in more detail, I will try to do so here; perhaps somebody else is interested....
Basically, the example code stores the text that comes just before the searched content in a variable trigger. In the example, this is "]1.".
The variable match is a counter that counts how many characters of trigger have been matched so far.
When the char just read in the first while loop (variable i) matches the first char in trigger ("i==trigger.charAt(match)"), match is incremented. Now match points to the second char in trigger (character indices in strings start at 0). If the next char matches the second char in trigger, match is incremented again, and so on, until match equals the number of characters in trigger ("match==trigger.length()") or a mismatch occurs ("else"). In the first case, the full trigger string was recognized, and the break statement terminates the first loop. In the second case, the match counter is reset to 0 or 1 (1 when the char that caused the mismatch is itself the first char of trigger, 0 otherwise).
The second loop just reads the result string until the first character of an HTML tag (<) is read.
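To make the counter easier to follow, here is a small sketch that only records the value of match after each char, up to the point where the first loop would break. It assumes the reset rule "1 if the mismatching char equals the first trigger char, else 0" and a non-empty trigger; the class and method names are invented for this illustration.

```java
class MatchTrace {

    /**
     * Returns the value of the match counter after each input char,
     * separated by spaces, stopping once the full trigger is matched.
     */
    static String trace(String input, String trigger) {
        StringBuffer sb = new StringBuffer();
        int match = 0;
        for (int n = 0; n < input.length(); n++) {
            char c = input.charAt(n);
            if (c == trigger.charAt(match)) {
                match++; // one more trigger char recognized
            } else {
                // mismatch: start over, counting c if it begins the trigger
                match = (c == trigger.charAt(0)) ? 1 : 0;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(match);
            if (match == trigger.length()) {
                break; // this is where the first loop terminates
            }
        }
        return sb.toString();
    }
}
```

For the input "x]]1. data" and the trigger "]1." this gives "0 1 1 2 3": the second "]" is a mismatch against "1", but since it is the first trigger char the counter goes back to 1 rather than 0.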
Best,
Stefan
 