is there a way to convert html to JSON

bryan lim
Ranch Hand

Joined: Dec 26, 2008
Posts: 140
I know there is a way to convert JSON to HTML.

But does anybody know of any links where I can find some code to convert HTML to JSON? I tried googling, but to no avail.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39537
I wouldn't say that "there is way to convert JSON to html" in general. It's possible to retrieve data in JSON format and present that in an HTML page, is as far as I'd go.

Can you give an example of what it would mean to convert HTML to JSON? What are you trying to achieve by doing so? Lastly, are you trying to do this with Java code?


bryan lim
Ranch Hand

Joined: Dec 26, 2008
Posts: 140
Thank you for your reply. Yes, I hope to do it in Java. I want to parse the HTML from a website and turn it into JSON.
My whole point is to make the data easier to manipulate. Right now I have code that can read JSON and retrieve data from it, and I thought that taking data from HTML directly would be difficult.
I was hoping there was some existing code that can turn a whole HTML website into JSON format to make things easier. Or is there a better way to do this?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39537
There is no general conversion from HTML to JSON, nor do I think it would make much sense to define one (which isn't to say it wouldn't make sense for your purposes). If I wanted to work with HTML I'd convert it to XML (using a library like NekoHTML or TagSoup) and then use XML/DOM APIs to manipulate it.
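
To illustrate the second half of that suggestion, here is a minimal sketch that uses only the XML APIs shipped with the JDK. It assumes the page has already been cleaned into well-formed XML (for example by NekoHTML or TagSoup; that step is not shown), and the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls link targets out of a page that is already well-formed XML.
public class XhtmlLinks {

    // Returns the href value of every <a> element, in document order.
    public static List<String> hrefs(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // XPath selects the attribute nodes directly, so no manual tree walking is needed.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue());
        }
        return result;
    }
}
```

Once the page is a DOM tree, pulling out other data is mostly a matter of changing the XPath expression.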
bryan lim
Ranch Hand

Joined: Dec 26, 2008
Posts: 140
Thanks again for your reply.

Can you elaborate on why you think it doesn't make sense to define an HTML to JSON conversion?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39537
I see JSON's benefit in the transfer of data, and where manipulation of the data by JavaScript is required. Neither is the case for a Java application that parses web pages. So what value would JSON add over using XML/DOM APIs?
Bob Cen
Greenhorn

Joined: Jul 08, 2010
Posts: 4
Ulf Dittmer wrote:There is no general conversion from HTML to JSON, nor do I think it would make much sense to define one (which isn't to say it wouldn't make sense for your purposes). If I wanted to work with HTML I'd convert it to XML (using a library like NekoHTML or TagSoup) and then use XML/DOM APIs to manipulate it.


I think you just defined a need for an HTML to JSON converter. Having to go through 2 additional steps to achieve what could be done in a single step is not the most desirable way of doing something. Having said that, has anyone developed an HTML to JSON converter yet? I could use one, as I need to convert a number of HTML files (without going into my reasons for doing it this way). Thank you.
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 13868
Which question are you asking:

- "is there a way to convert HTML to JSON"
- "is there a way to convert JSON to HTML"

Because in the title you're asking one thing, but in the topic itself you're asking the opposite.


Bob Cen
Greenhorn

Joined: Jul 08, 2010
Posts: 4
Jesper Young wrote:Which question are you asking:

- "is there a way to convert HTML to JSON"
- "is there a way to convert JSON to HTML"

Because in the title you're asking one thing, but in the topic itself you're asking the opposite.


Actually, I meant HTML to JSON conversion. I can see a need for both converters - HTML to JSON and JSON to HTML! Now, do you know of an actual HTML to JSON converter? I am looking for a straight-forward Yes or No answer, not "do 3 backflips, a 3 hour headstand, followed by three more backflips to get there" (as I have seen in a number of answers to this same question). Also, I am not looking for another question as to why I want to do it that way, as that is irrelevant to the question - stating my reasons just brings answers like the ones I just described - "do 3 backflips ...". Thank you.

BTW, the title on this "page" states "is there a way to convert html to JSON?", which I believe was my question. Thanks.
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 13868
Bob Cen wrote:Now, do you know of an actual HTML to JSON converter? I am looking for a straight-forward Yes or No answer, not ...

In that case... no.

If you're not interested why not, then stop reading now. Otherwise...

HTML is a totally different kind of format than JSON. JSON (JavaScript Object Notation) is a way to serialize objects (you have an object, which contains key-value pairs etc.). HTML describes a page layout. That doesn't fit very well with what JSON represents. There is no clear, one-to-one way to map the structure of an HTML document to objects with key-value pairs. They are just two totally different formats, made for different purposes.

What's the reason why you need this? If you explain what your actual goal is, we might be able to help you further.
Bob Cen
Greenhorn

Joined: Jul 08, 2010
Posts: 4
Jesper Young wrote:
Bob Cen wrote:Now, do you know of an actual HTML to JSON converter? I am looking for a straight-forward Yes or No answer, not ...

In that case... no.

If you're not interested why not, then stop reading now. Otherwise...

HTML is a totally different kind of format than JSON. JSON (JavaScript Object Notation) is a way to serialize objects (you have an object, which contains key-value pairs etc.). HTML describes a page layout. That doesn't fit very well with what JSON represents. There is no clear, one-to-one way to map the structure of an HTML document to objects with key-value pairs. They are just two totally different formats, made for different purposes.

What's the reason why you need this? If you explain what your actual goal is, we might be able to help you further.


The format of HTML and JSON (and the purpose of HTML and JSON) is irrelevant. Converters are written all the time, given the definition of what has to be converted. Granted, some conversions are not 100%, but even 99% is better than none. As for my question, your response, which implies there is no way to convert between the two formats, is not exactly accurate. Firefox converts HTML to JSON and JSON to HTML. It does this for bookmarks. When you start with a fresh install of Firefox, you get a bookmarks.html "starter file". As you add bookmarks to this file, Firefox will save the updated bookmarks.html as a JSON file in the bookmarkbackups folder. Later, if the need arises, you can restore a bookmarks.html file from a "saved" JSON backup. Granted, there may be certain rules followed in the process used by Firefox which other HTML <-> JSON applications may not follow, but Firefox shows it can be done in both directions. I may be wrong, but if I have interpreted what I have seen correctly, a 2-way converter could be written. My understanding is that JSON was written as an alternative to XML. Now there are XML <-> HTML converters (XML <-> other formats -- OmniFormat, Crystal Passage, etc.).

Actually, this discussion has made me think about it (as I wasn't thinking of XML in my initial post). There are XML <-> JSON converters. So I could do the conversion in 2 steps, HTML -> XML and then XML -> JSON (and, of course, reverse the procedure to go the other way). Thank you.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39537
Of course it's possible to convert HTML to JSON and back. The easiest way is something like

That way, an HTML page can be transported as JSON. Is this a useful representation? For some purposes - yes, but for most purposes - probably not.
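
One way to read "transported as JSON" is to wrap the whole page as a single JSON string value. The sketch below does that with plain Java; the key name "html" and the class name are assumptions made for the illustration, not something taken from this thread.

```java
// Hypothetical helper: carries an entire HTML page as one JSON string value.
public class HtmlAsJsonString {

    // Escapes quotes, backslashes and control characters as the JSON grammar requires.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Produces {"html":"..."} where "html" is an assumed key name.
    public static String wrap(String html) {
        return "{\"html\":\"" + escape(html) + "\"}";
    }
}
```

The receiver gets the page back by reading that one string, which is why this works for transport but says nothing about the page's structure.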

So I think the salient point of Jesper's post is this:
There is no clear, one-to-one way to map the structure of an HTML document to objects with key-value pairs.

Given that there is an unlimited number of ways to map HTML to JSON, the purpose of wanting to do this becomes important.

The example of the bookmarks is valid, but it's also rather limited - its HTML is well-defined, can be relied upon not to change, and uses only a few HTML elements, so writing converters for both directions is easy. For the general case where the HTML is not under your control (and can contain CSS and JavaScript), that's not the case.
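
As a rough sketch of how small such a special-purpose converter can be, the code below pulls the links out of a bookmarks-style HTML file and emits a JSON array. The output keys "title" and "url" and the class name are assumptions for the example; this is not the schema Firefox uses for its JSON backups, and it ignores folders, HTML entities, and every attribute other than HREF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter for a narrow, well-known HTML shape: a list of <A HREF="...">title</A>.
public class BookmarksToJson {

    // Captures the HREF value (group 1) and the link text (group 2); other attributes are skipped.
    private static final Pattern LINK = Pattern.compile(
        "<a\\s[^>]*?href=\"([^\"]*)\"[^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Wraps a string in quotes, escaping what the JSON grammar requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Returns a JSON array with one object per link, in file order.
    public static String convert(String bookmarksHtml) {
        List<String> items = new ArrayList<>();
        Matcher m = LINK.matcher(bookmarksHtml);
        while (m.find()) {
            items.add("{\"title\":" + quote(m.group(2)) + ",\"url\":" + quote(m.group(1)) + "}");
        }
        return "[" + String.join(",", items) + "]";
    }
}
```

A regular expression is only reasonable here because the input format is fixed; for arbitrary pages it would break quickly, which is the point made above.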
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 13868

Bob Cen wrote:The format of HTML and JSON (and the purpose of HTML and JSON) is irrelevant. Converters are written all the time, given the definition of what has to be converted. Granted, some conversions are not 100%, but even 99% is better than none. As far as my question and your response, which implies there is no way to convert between the two formats, is not exactly accurate. Firefox converts HTML to JSON and JSON to HTML. It does this for bookmarks.

I did not say there is absolutely no way to do this. But there is no clear way to do this in general. Your example with the Firefox bookmarks is a very specific example, with a specific mapping. It's not a general mapping that converts any HTML page to JSON.

Bob Cen wrote:Actually, this discussion has made me think about it (as I wasn't thinking of XML in my initial post). There are XML <-> JSON converters. So I could do the conversion in 2 steps HTML -> XML and then XML -> JSON (and, of course, reverse the procedure to go the other way). Thank you.

If the page is in XHTML (a variant of HTML that conforms to XML syntax - i.e. all elements have open and close tags, all attributes have quoted values etc.) then you don't even need to do the HTML to XML conversion first. But you'd get a JSON document that has the same structure as the HTML, for example:

<html>
<head>
<title>Hello World</title>
</head>
<body>
<p>Some text</p>
</body>
</html>

converts to:

{ "html": { "head": { "title": "Hello World" }, "body": { "p": "Some text" } } }

You could do that if you want to. I'm curious however why you'd want to do this, to me it looks like a strange way to use JSON.
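
A structure-preserving mapping like the one above could be sketched with the JDK's built-in DOM parser. This is only one of many possible mappings and the class name is made up: it ignores attributes and mixed content, and repeated sibling elements (two <p> tags, say) would produce duplicate keys, which shows why there is no obvious general mapping.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical converter: mirrors the element tree of a well-formed XHTML page as nested JSON objects.
public class XhtmlToJson {

    // Wraps a string in quotes, escaping what the JSON grammar requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // An element with child elements becomes an object; otherwise it becomes its trimmed text.
    static String toJson(Element e) {
        StringBuilder members = new StringBuilder();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                if (members.length() > 0) {
                    members.append(",");
                }
                members.append(quote(n.getNodeName())).append(":").append(toJson((Element) n));
            }
        }
        if (members.length() == 0) {
            return quote(e.getTextContent().trim());
        }
        return "{" + members + "}";
    }

    // Parses the page and wraps the root element name as the single top-level key.
    public static String convert(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        Element root = doc.getDocumentElement();
        return "{" + quote(root.getNodeName()) + ":" + toJson(root) + "}";
    }
}
```

Feeding it the Hello World page above yields the same nesting as the JSON shown, just without the extra spaces.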
Bob Cen
Greenhorn

Joined: Jul 08, 2010
Posts: 4
Jesper Young wrote:

I did not say there is absolutely no way to do this. But there is no clear way to do this in general. Your example with the Firefox bookmarks is a very specific example, with a specific mapping. It's not a general mapping that converts any HTML page to JSON.

If the page is in XHTML (a variant of HTML that conforms to XML syntax - i.e. all elements have open and close tags, all attributes have quoted values etc.) then you don't even need to do the HTML to XML conversion first. But you'd get a JSON document that has the same structure as the HTML, for example:

<html>
<head>
<title>Hello World</title>
</head>
<body>
<p>Some text</p>
</body>
</html>

converts to:

{ "html": { "head": { "title": "Hello World" }, "body": { "p": "Some text" } } }

You could do that if you want to. I'm curious however why you'd want to do this, to me it looks like a strange way to use JSON.


Thank you for your response. My initial reason has to do with Firefox and the changes they made in Release 3 and later. I have a number of different bookmark files that I (eventually) need to get in sync. Although Firefox provides an import function, it doesn't work the same way as it did in prior releases. It basically "appends" the import to an existing bookmark file, and you need to "organize bookmarks" and "edit" to get what you want. Just copying your (legitimate) "bookmark" file into the Firefox directory does not work, as Firefox overlays it with the prior bookmark file. Just deleting the backups doesn't solve the problem either (although I haven't tried that in conjunction with deleting the "places.sqlite" file). One way to get what you want would be to convert the HTML file to a JSON file and save it with the proper filename in the Firefox backup directory. Then you can "restore" to an earlier bookmark file (JSON) to get the bookmarks you really want. It is unfortunate that Firefox did not allow for a "blanket replace" (or import) of a bookmark file with the latest releases. Anyway, this is one reason why an HTML to JSON converter would be useful. There are also other situations. My Google searches have turned up a number of people searching for HTML <-> JSON converters, and some searches have come up with "convoluted" ways of doing it.
 