I need the URL unicode representation of <> angle brackets?

 
Ranch Hand
Posts: 55
So, I'm working on a web scanner. In this project I need to URL-encode special characters like <, >, etc. I can URL-encode these fine with

payload = URLEncoder.encode( payload, "UTF-8" );
This will take <> and turn them into URL-encoded characters, I believe %3C and %3E.

Now my problem: I also need to encode these special characters as Unicode. In particular, the full-width Unicode forms of < and > should become %uff1c and %uff1e. I tried

payload = URLEncoder.encode( payload, "Unicode" );

but this gives me something strange like %ff%2e for angle brackets.

I'll do it by hand if I need to; I just wanted to check first!
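
For reference, doing it by hand could look roughly like the sketch below. It assumes the target format is the non-standard %uXXXX escape (the style produced by JavaScript's legacy escape() function), and that only printable ASCII punctuation should be mapped to its fullwidth form. The class name, method name, and the choice of which characters to convert are all hypothetical, not something confirmed in this thread.

```java
public class FullWidthPercentU {
    // Fullwidth forms U+FF01..U+FF5E mirror ASCII 0x21..0x7E at a fixed offset.
    private static final int FULLWIDTH_OFFSET = 0xFEE0;

    /**
     * Hypothetical helper: replaces printable ASCII punctuation with the
     * %uXXXX escape of its fullwidth counterpart; other characters pass through.
     */
    public static String toFullWidthPercentU(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean printableAscii = c >= 0x21 && c <= 0x7E;
            if (printableAscii && !Character.isLetterOrDigit(c)) {
                // e.g. '<' (0x3C) + 0xFEE0 = 0xFF1C
                out.append(String.format("%%u%04x", c + FULLWIDTH_OFFSET));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toFullWidthPercentU("<>")); // prints %uff1c%uff1e
    }
}
```

Whether the receiving server actually decodes %uXXXX depends on the server; many do not, so this is only useful when the scanner is deliberately probing for that behaviour.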
 
Bartender
Posts: 4179
Are you sure you need to URL-encode them into Unicode? The W3C guidance says URLs should be encoded as UTF-8, or translation problems may occur.

What specifically do you mean by Unicode? There are multiple Unicode encodings, but the IANA site doesn't list any with just a 'Unicode' alias. Do you mean UTF-16? In Java, when I use "Unicode" as the encoding, it appears to produce UTF-16.

For an input of "< >", the output for UTF-16 and Unicode is the same. The characters are:
%FE%FF => The two bytes defining byte order
%00%3C => The two bytes defining < (003C)
+ => Space
%FE%FF => The two bytes defining byte order
%00%3E => The two bytes defining > (003E)

The first byte for < and > is 00 because both code points are below U+0100, so each value fits in the low byte of its two-byte UTF-16 unit.
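
A minimal sketch of how that output can be reproduced, assuming the input string was "< >" (inferred from the breakdown above). It uses the canonical charset name "UTF-16"; the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class Utf16UrlEncodeDemo {
    /** Hypothetical helper: URL-encodes the input using the UTF-16 charset. */
    public static String encodeUtf16(String input) {
        try {
            return URLEncoder.encode(input, "UTF-16");
        } catch (UnsupportedEncodingException e) {
            // UTF-16 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The result begins with the byte order mark (%FE%FF) followed by %00%3C.
        System.out.println(encodeUtf16("< >"));
    }
}
```

The byte order mark appears because the plain UTF-16 charset writes one whenever it starts encoding a run of characters.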

You can drop the byte order mark by using an encoding with an explicit byte order, such as "UTF-16BE", assuming your other side expects big-endian byte order.
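
As a rough sketch of dropping the byte order mark, the example below assumes the "UTF-16BE" charset is what is wanted, since it fixes the byte order to big-endian and writes no mark. The class and method names are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class Utf16BeUrlEncodeDemo {
    /** Hypothetical helper: URL-encodes the input as big-endian UTF-16 without a byte order mark. */
    public static String encodeUtf16Be(String input) {
        try {
            return URLEncoder.encode(input, "UTF-16BE");
        } catch (UnsupportedEncodingException e) {
            // UTF-16BE is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeUtf16Be("< >")); // prints %00%3C+%00%3E
    }
}
```

If the other side expects little-endian instead, "UTF-16LE" would swap the two bytes of each character.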