Convert hyphenated tags in XML to camelCase using java regex?

 
vivek kumar verma
Greenhorn
I want to convert

<food>
<fruit-apple>red-apple</fruit-apple>
<good-banana>yellow</good-banana>
</food>


into

<food>
<fruitApple>red-apple</fruitApple>
<goodBanana>yellow</goodBanana>
</food>

I did it using



but I want to implement this using regex?
 
Tim Cooke
Sheriff
Welcome to the Ranch!

vivek kumar verma wrote:but I want to implement this using regex?


Are you asking whether it is possible?

Regular expressions define a search pattern, so you use them for pattern matching in text. A regex by itself does not offer any facility to edit text.

Can you be more specific with your question?
 
vivek kumar verma
Greenhorn
Thanks for the reply. I need an efficient way to write it using java.util.regex.*.
 
Tim Cooke
Sheriff
Regex is not a text editing tool. So with that alone you cannot achieve what you want.

My next question is: Why?

You have a solution that I assume works (I haven't tested it), so why do you need another one? What's wrong with the one you have?
 
Henry Wong
author

The regex replaceFirst() and replaceAll() methods don't offer the ability to do camel case, so you will need to use the lower-level regex appendReplacement() and appendTail() methods.

Regardless, I don't know whether it will be as fast (or as easy to read) as what you currently have.

Henry
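
A minimal sketch of the appendReplacement() / appendTail() loop mentioned above, applied to a single tag name rather than a whole document. The class and method names (HyphenToCamel, toCamelCase) are made up for illustration and do not come from the thread. On Java 9 and later, Matcher.replaceAll() also has an overload that takes a function, which would shorten this; the loop below works on older versions too.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenToCamel {
    // A dash followed by one lower-case ASCII letter; the letter is captured in group 1.
    private static final Pattern DASH_LETTER = Pattern.compile("-([a-z])");

    // Hypothetical helper: "fruit-apple" becomes "fruitApple".
    static String toCamelCase(String name) {
        Matcher m = DASH_LETTER.matcher(name);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Replace "-x" with "X". The replacement is a single letter,
            // so it cannot contain '$' or '\' and needs no quoting.
            m.appendReplacement(out, m.group(1).toUpperCase(Locale.ROOT));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("fruit-apple"));      // fruitApple
        System.out.println(toCamelCase("very-good-banana")); // veryGoodBanana
    }
}
```

Because find() is called in a loop, a name with several hyphens is handled as well.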
 
Campbell Ritchie
Marshal
Welcome again

vivek kumar verma wrote:. . .

but I want to implement this using regex?

Why not use a StringBuilder? I am pretty sure some of your if statements can be got rid of. That is a very bad name for a variable, takeIt. Maybe use insideTag instead. Note that the delete method returns the StringBuilder, so you can daisy-chain multiple method calls. I think that is called a fluent interface, but I am not certain.
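
For readers without the original code, here is a rough sketch of what a regex-free loop with a StringBuilder and an insideTag flag might look like. It is not the original poster's code, which is not shown here; the names ManualCamelCase, convert and upperNext are invented for this example. It assumes the input contains no comments, no attributes with hyphens, and no stray angle brackets in the text.

```java
class ManualCamelCase {
    // Hypothetical helper: drops each dash found between '<' and '>' and
    // upper-cases the character that follows it. Text between tags is untouched.
    static String convert(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        boolean insideTag = false; // true between '<' and the next '>'
        boolean upperNext = false; // true right after a dash inside a tag
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') {
                insideTag = true;
            } else if (c == '>') {
                insideTag = false;
            }
            if (insideTag && c == '-') {
                upperNext = true; // skip the dash, capitalise the next character
                continue;
            }
            out.append(upperNext ? Character.toUpperCase(c) : c);
            upperNext = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("<food><fruit-apple>red-apple</fruit-apple></food>"));
    }
}
```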
 
vivek kumar verma
Greenhorn
I have the same opinion, but my manager insisted on using regex; he even wrote <% -$ %> on the whiteboard. ;)
 
Campbell Ritchie
Marshal
The simplest solution is to redirect the principal electron supply for your computer into the manager's teacup.
Otherwise, ask him why he thinks regexes make for a simpler solution.
 
Henry Wong
author

vivek kumar verma wrote:I have the same opinion, but my manager insisted on using regex; he even wrote <% -$ %> on the whiteboard. ;)



For the most part, it's not very hard. The regex is just a dash followed by a lower-case letter. The code is just a loop: find() using the regex, appendReplacement() with toUpperCase() applied to the lower-case letter, and appendTail() after the loop.

The hard part is ensuring you are within the "< >". You will need to use zero-length look-behind and look-ahead to the nearest "<" and ">" respectively. Unless you are comfortable with regexes, this may be difficult to read.

Henry
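
One possible reading of this advice, sketched for the whole document. It uses a look-ahead only: the dash-plus-letter must be followed by characters that are neither "<" nor ">" and then a ">", which means the nearest angle bracket to the right closes a tag. The look-behind is left out because Java has historically rejected look-behinds without an obvious maximum length. The class name TagCamelCase is invented, and the sketch assumes there are no hyphenated attribute values, no comments, and no unescaped ">" in element text.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagCamelCase {
    // "-x" counts only if the next angle bracket after it is '>',
    // i.e. we are still inside a tag and not in element text.
    private static final Pattern DASH_IN_TAG = Pattern.compile("-([a-z])(?=[^<>]*>)");

    static String convert(String xml) {
        Matcher m = DASH_IN_TAG.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, m.group(1).toUpperCase(Locale.ROOT));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<food><fruit-apple>red-apple</fruit-apple>"
                   + "<good-banana>yellow</good-banana></food>";
        // The value "red-apple" keeps its dash; the tag names lose theirs.
        System.out.println(convert(xml));
    }
}
```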
 
Campbell Ritchie
Marshal
You can of course use a regex to find the tags. Will that regex find a tag containing several hyphens?
 
vivek kumar verma
Greenhorn
I have not written the full code, but this was my solution for removing the dash and converting into camel case.


As you have written, the problem lies in taking "< >".
 
fred rosenberger
lowercase baba

vivek kumar verma wrote:I have the same opinion, but my manager insisted on using regex; he even wrote <% -$ %> on the whiteboard. ;)


If your boss insisted you drill a 1/8" hole in a board, but do it using only a sledgehammer, would you?
 
Campbell Ritchie
Marshal
You drill the hole with a punch instead. That mistake killed several score people when I was young.
 
vivek kumar verma
Greenhorn
If I only want to check whether the tags contain a hyphen, and not the values, how should I do it?

The above solution is not working for me, because there are multiple tags and it also finds the hyphen in the value between the tags, as in <goodApple>fruit-fruit</goodApple>.
I just want to check whether there's a hyphen inside the tags ("< >").
 
Henry Wong
author

vivek kumar verma wrote:I have not written the full code, but this was my solution for removing the dash and converting into camel case.



First, there is no need to match the letters before the dash (or to call toLowerCase() on them), as your original code didn't do that, and hence there is no need for the regex to do it. Furthermore, it doesn't work: you are looking for a lower-case letter before the dash, so it will only match if that letter is already lower case.

vivek kumar verma wrote:
As you have written, the problem lies in taking "< >".



As described in my previous post, your best option is probably to use the zero-length look-ahead and look-behind features. I suggest you look into the regex tutorials on those features.

Henry
 
Henry Wong
author

vivek kumar verma wrote:If I only want to check whether the tags contain a hyphen, and not the values, how should I do it?

The above solution is not working for me, because there are multiple tags and it also finds the hyphen in the value between the tags, as in <goodApple>fruit-fruit</goodApple>.
I just want to check whether there's a hyphen inside the tags ("< >").



First, I don't think you want to use "\\D". This will match anything that is not a digit, which includes punctuation such as a dash. Second, you probably want to use the find() method instead of the matches() method, as there are multiple tags in the string and matches() only succeeds if the whole string matches.

And third, regarding your question, you probably should not use the greedy quantifier, as that will pair the first "<" with the last ">", even if they don't belong to the same pair. You probably want the reluctant quantifier instead, meaning ".*?" instead of ".*".

Henry
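
To make the three points concrete, here is a small sketch comparing matches() with find(), and greedy with reluctant quantifiers, on a sample string. The helper findAll and the class name are invented for this example. It also shows one thing the advice above does not spell out: a reluctant pattern such as "<.*?-.*?>" can still run across a tag boundary, so the last pattern uses a negated character class "[^<>]" instead, which is an alternative to the reluctant quantifier rather than what was suggested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenatedTagFinder {
    // Collects every substring that find() locates for the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String xml = "<good-apple>fruit-fruit</good-apple><pear>x</pear>";

        // matches() must consume the whole input, so this is false.
        System.out.println(Pattern.matches("<[^<>]*-[^<>]*>", xml));

        // Greedy: one match, from the first '<' to the last '>'.
        System.out.println(findAll("<.*>", xml).size());  // 1

        // Reluctant: each tag is matched on its own.
        System.out.println(findAll("<.*?>", xml).size()); // 4

        // Reluctant is not enough once a dash is required: this spans tag and text.
        System.out.println(findAll("<.*?-.*?>", "<a>x-y</a>")); // [<a>x-y</a>]

        // Negated class: only tags that themselves contain a dash.
        System.out.println(findAll("<[^<>]*-[^<>]*>", xml)); // [<good-apple>, </good-apple>]
    }
}
```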
 
Campbell Ritchie
Marshal
All of which shows how right Fred is to liken regular expressions to a sledgehammer for drilling holes.
 
Tim Cooke
Sheriff
At this stage it may be worth highlighting to your manager that you had a working solution 2 days ago and the fruitless endeavour for a regular expression solution is a complete waste of time.
 
g tsuji
Ranch Hand
I would suggest an XSLT approach, which is the most appropriate tool for this kind of administrative work.

This is a quickly put-together XSLT to get the work done.

 
Tim Cooke
Sheriff
That looks absolutely horrendous. How might the OP apply that?
 
g tsuji
Ranch Hand

That looks absolutely horrendous. How might the OP apply that?


Luckily, I am not that miserable manager, who has no chance to defend himself while being criticized behind his back. Chances are he would not be interested in defending himself either, as people may have made up their minds already.

In its simplest form, you can run a batch file or a command line to get the output. Say the source file is food.xml, the XSLT file is conversion.xsl, and the resulting file is foodconverted.xml. The command line would look like:

where you add to the classpath environment variable the jars in the xalan-j package (say v2.7.1, which I have on the box, or any more recent version), namely xalan.jar, serializer.jar, xercesImpl.jar and xml-apis.jar.
(ref https://xml.apache.org/xalan-j/commandline.html)
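
As an alternative to the command line, a stylesheet can also be applied from Java through the standard javax.xml.transform API. The sketch below is not the stylesheet or command from this post, which are not shown here; it embeds a small XSLT 1.0 stylesheet written for this example, with a recursive named template "camel" that rebuilds each element name. It assumes elements without namespace prefixes, and it drops comments and processing instructions because no template copies them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltCamelCase {
    // Illustrative stylesheet: every element is re-created under a camelCased name.
    private static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='*'>"
      +   "<xsl:variable name='newName'>"
      +     "<xsl:call-template name='camel'>"
      +       "<xsl:with-param name='s' select='name()'/>"
      +     "</xsl:call-template>"
      +   "</xsl:variable>"
      +   "<xsl:element name='{$newName}'>"
      +     "<xsl:copy-of select='@*'/>"
      +     "<xsl:apply-templates/>"
      +   "</xsl:element>"
      + "</xsl:template>"
      // Recursive helper: text before the first dash, then the upper-cased
      // next letter, then the same treatment for the remainder.
      + "<xsl:template name='camel'>"
      +   "<xsl:param name='s'/>"
      +   "<xsl:choose>"
      +     "<xsl:when test=\"contains($s, '-')\">"
      +       "<xsl:variable name='rest' select=\"substring-after($s, '-')\"/>"
      +       "<xsl:value-of select=\"substring-before($s, '-')\"/>"
      +       "<xsl:value-of select=\"translate(substring($rest, 1, 1), "
      +         "'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')\"/>"
      +       "<xsl:call-template name='camel'>"
      +         "<xsl:with-param name='s' select='substring($rest, 2)'/>"
      +       "</xsl:call-template>"
      +     "</xsl:when>"
      +     "<xsl:otherwise><xsl:value-of select='$s'/></xsl:otherwise>"
      +   "</xsl:choose>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<food><fruit-apple>red-apple</fruit-apple></food>"));
    }
}
```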
 
Dave Tolls
Rancher
Since this is a transformation, I would consider using the tool designed for that, IMO.

I expect that xslt could be tidied up to make it slightly neater, but g tsuji did say it was quickly thrown together.
 
salvin francis
Saloon Keeper

Tim Cooke wrote:That looks absolutely horrendous. How might the OP apply that?


+1 for that !


vivek kumar, you do know that you have to change both the opening and the closing tag, right? By the way, I am just throwing out a suggestion: is DOM a good idea, or is it overkill?
(Given that this is valid XML, you can use DOM to step through each element, get its name, and then use whatever regex you like to detect the dash.)


 
salvin francis
Saloon Keeper

Here's my attempt :





Of course, I am just renaming everything to "awesome"
 
Stephan van Hulst
Saloon Keeper
Some comments:

You're doing too much in your constructor. Constructors are only for initialization. You can probably move all that code to a static method that accepts a Document instance. You can also make a static method that transforms an xml String to a Document, and another one that transforms a Document to console output. There's no need for a MyTransformer instance.

Your recursivelyRenameNodes() can be singular. Let it accept one Node, and for the rename step use getOwnerDocument(). You can then call it on the Document instance.

You're using magic numbers. Instead of getNodeType() == 1 use getNodeType() == Node.ELEMENT_NODE.
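
A sketch of how these suggestions might fit together. This is not the code under review, which is not shown here; the class and method names (DomCamelCase, parse, renameRecursively, serialize, toCamelCase) are made up. It uses only static methods, a recursive method that takes a single Node and renames through getOwnerDocument(), and Node.ELEMENT_NODE instead of a magic number. It assumes the default, non-namespace-aware parser.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomCamelCase {
    private static final Pattern DASH_LETTER = Pattern.compile("-([a-z])");

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Accepts any Node, so it can be called on the Document itself.
    static void renameRecursively(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String newName = toCamelCase(node.getNodeName());
            if (!newName.equals(node.getNodeName())) {
                // renameNode may return a replacement node, so keep its result.
                node = node.getOwnerDocument()
                           .renameNode(node, node.getNamespaceURI(), newName);
            }
        }
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before the child may be replaced
            renameRecursively(child);
            child = next;
        }
    }

    static String toCamelCase(String name) {
        Matcher m = DASH_LETTER.matcher(name);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, m.group(1).toUpperCase(Locale.ROOT));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<food><fruit-apple>red-apple</fruit-apple></food>");
        renameRecursively(doc);
        System.out.println(serialize(doc));
    }
}
```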
 
salvin francis
Saloon Keeper
Thanks for the criticism, Stephan van Hulst.
However, what do you think about this approach? Is it overkill for the OP's problem?
 
Stephan van Hulst
Saloon Keeper
Not necessarily. I wrote a solution that used a regex, but it might break in situations I haven't foreseen. (For instance, it takes attributes and element content into account, but not xml comments). With the DOM API you can be relatively certain that your solution will work in all kinds of weird situations. The problem is that I really really really dislike JAXP. The design is clunky, verbose, and it invariably leads to nasty code (like checking whether a Node is an Element by checking getNodeType).

This is the xml I used to test:
 
salvin francis
Saloon Keeper

Stephan van Hulst wrote:The problem is that I really really really dislike JAXP. The design is clunky, verbose, and it invariably leads to nasty code (like checking whether a Node is an Element by checking getNodeType).

It's not that bad. Sometimes it's awesome. I had a simple example where an application used an existing configuration in XML format and I had to index some content from there. It took just 10-15 lines of code to eliminate a huge hard-coded list of magic Strings.

Imagine doing this with regex.

That being said, I see most folks moving away from XML and into JSON territory, since it carries much less metadata.
 
Stephan van Hulst
Saloon Keeper
You can write a regular expression just fine to get the regionName as you call it, but only on input that "looks like xml". If the input is actually commented-out XML, the regex will still treat it as if it's regular XML. The regex also assumes that the input is well-formed. You can not write a general purpose XML parser with regular expressions.

The problem is that both XML and regular expressions are incredibly unwieldy general purpose tools. Using one on the other just compounds the problem. Another problem is that XML needs a lot of state information to be parsed correctly, while regular expression engines try to be mostly stateless.
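
A small demonstration of the first point, using a look-ahead regex invented for this example (a dash and a letter whose nearest following angle bracket is ">"). It is not the regex discussed above, which is not shown here. The regex cannot tell a commented-out tag from a live one, and an unescaped ">" in element text fools it as well.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPitfalls {
    // Illustrative pattern: "-x" where the next angle bracket is '>'.
    private static final Pattern DASH_IN_TAG = Pattern.compile("-([a-z])(?=[^<>]*>)");

    static String convert(String xml) {
        Matcher m = DASH_IN_TAG.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, m.group(1).toUpperCase(Locale.ROOT));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The commented-out element is rewritten too; a real parser would skip it.
        System.out.println(convert("<!-- <old-tag>x</old-tag> --><new-tag>y</new-tag>"));
        // prints: <!-- <oldTag>x</oldTag> --><newTag>y</newTag>

        // A raw '>' in the text makes "x-y" look as if it were inside a tag.
        System.out.println(convert("<a>x-y > z</a>"));
        // prints: <a>xY > z</a>
    }
}
```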
 
Dave Tolls
Rancher

Stephan van Hulst wrote:
The problem is that both XML and regular expressions are incredibly unwieldy general purpose tools. Using one on the other just compounds the problem. Another problem is that XML needs a lot of state information to be parsed correctly, while regular expression engines try to be mostly stateless.



That's one of the reasons I think the xslt approach isn't as silly as it might look.
It's actually built to do this sort of thing.
 
Stephan van Hulst
Saloon Keeper
I agree.
 