imran sujoy

since Sep 02, 2008

Recent posts by imran sujoy

I have to say I don't understand why you create a (correct) Reader over the properties file, then create a String from that, then convert that String back to an array of bytes, then create a second Reader over that array of bytes. Why can't ANTLR just be given the first Reader?

Because, first of all, the file reading is part of an existing framework, and its output is used by several different clients, so I don't have much control over it. The file reading belongs to the framework; the ANTLR job belongs to one client.
Secondly, I could have passed the content directly to the ANTLRInputStream, like this:

And that is what I was doing originally. Later I wanted to identify the character encoding from the input stream, so I switched to the reader.

But here is one mysterious thing: just this morning, when I tried the code from the command line again, there were no more ANTLR token recognition errors. I really don't know how it got solved, but for now it is working fine.
5 years ago
I am sorry, which code have I not posted? I posted the code that reads the file; I then pass the file's content to ANTLRInputStream. The BOM belongs to the file the code is reading. The XML is the output; I am not reading a BOM from the XML.

This is the ANTLR code, if it helps, which reads character by character and tries to identify the tokens:

5 years ago
The code to read the file is:

The "file" is

The charset passed to the method is read from the BOM or, if no BOM is present, it defaults to UTF-8.
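For illustration only (this is not the framework code referred to above), BOM-based charset selection with a UTF-8 fallback could be sketched as follows. The class name `BomDecoder` and its method are hypothetical, and only the UTF-8 and UTF-16 BOMs are handled; UTF-32 is ignored in this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: picks a charset from the BOM, falls back to UTF-8,
// and returns the decoded text without the BOM.
final class BomDecoder {

    static String decode(byte[] data) {
        Charset charset = StandardCharsets.UTF_8; // default when no BOM is present
        int skip = 0;                             // number of BOM bytes to drop
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            skip = 3;                             // UTF-8 BOM
        } else if (startsWith(data, 0xFE, 0xFF)) {
            charset = StandardCharsets.UTF_16BE;  // big-endian UTF-16 BOM
            skip = 2;
        } else if (startsWith(data, 0xFF, 0xFE)) {
            charset = StandardCharsets.UTF_16LE;  // little-endian UTF-16 BOM
            skip = 2;
        }
        // Always decode with an explicit charset, never the platform default.
        return new String(data, skip, data.length - skip, charset);
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The point of the sketch is that the charset is named explicitly at the one place where bytes become characters, so the result does not depend on how the JVM was launched.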

So ANTLR is throwing an exception. But it's not an ANTLR issue?

See, if I pass ANTLR a valid Unicode character, it can recognize it. But if I pass it garbage characters produced while reading the file, ANTLR can't help. So I believe it's not an ANTLR issue but rather a problem with how I am reading the file; somewhere I am losing the character encoding. As I mentioned, the same Unicode characters are recognized when I am (the code is) reading the same file by running the Main class from Eclipse. Also, JSON gives me no trouble even when I run it from the terminal.
5 years ago
Sorry. I thought putting "reading" inside quotes would make it clear that it is really the code reading and not me as a person. Probably a non-native-speaker issue. And I would ask you to please read my first post, where the difference between the console and Eclipse is described; it's not an ANTLR issue.
5 years ago
OK, let me describe it further. In ANTLR I describe a grammar; for a properties file it can be something like: key, separator, value. I then define what the key, separator and value are: they can consist of letters, digits and special characters, and the separator can be a space, a colon or an equals sign. The more rigorous and exhaustive my grammar is, the more successful the parsing will be. Now suppose the properties file contains a Greek letter. If I haven't defined Unicode characters as part of my tokens, ANTLR fails to recognize it and throws an error. So I have included the Unicode ranges as well in the characters allowed in keys and values. Now I am reading a properties file that contains that Greek character. From Eclipse I have no problem "reading" the file, but when I call the application from the console, those Greek characters are not read as UTF-8. What I am not sure about is whether that is the cause: the characters may already be passed to ANTLR as meaningless symbols, hence the problem.
I don't know if I have made this clear enough.
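To make the key, separator, value shape concrete without the ANTLR grammar (which is not shown here), the following is a plain-regex stand-in, not the ANTLR approach itself. `PropertyLineScanner` and `Entry` are hypothetical names. The sketch assumes `\n` line endings, requires a separator on every entry line, skips lines whose first non-blank character is `#` or `!`, and ignores line continuations and escapes. Offsets are counted in UTF-16 code units of the already-decoded text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a properties grammar: key, separator, value.
final class PropertyLineScanner {

    static final class Entry {
        final String key;
        final String value;
        final int line;        // 1-based line number
        final int valueOffset; // offset of the value within the whole text

        Entry(String key, String value, int line, int valueOffset) {
            this.key = key;
            this.value = value;
            this.line = line;
            this.valueOffset = valueOffset;
        }
    }

    // Key: any run of characters (Unicode included) other than whitespace, ':' or '='.
    // Separator: one ':' or '=' or whitespace character, with optional padding.
    // Value: the rest of the line.
    private static final Pattern LINE =
            Pattern.compile("^\\s*([^\\s:=#!][^\\s:=]*)\\s*[:=\\s]\\s*(.*)$");

    static List<Entry> scan(String text) {
        List<Entry> entries = new ArrayList<>();
        int lineStart = 0;
        int lineNo = 1;
        for (String line : text.split("\n", -1)) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                entries.add(new Entry(m.group(1), m.group(2), lineNo, lineStart + m.start(2)));
            }
            lineStart += line.length() + 1; // +1 for the '\n' that split() removed
            lineNo++;
        }
        return entries;
    }
}
```

Because the key and value classes are defined by exclusion, a Greek letter is accepted like any other character, provided the text was decoded correctly before it reached the scanner.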
5 years ago
Um, an ANTLR grammar is written to parse a certain kind of file. From the grammar file, the parser, lexer and listener files are generated. Those files are used to read the tokens and extract offsets, text, etc. So ANTLR has nothing to do with Eclipse: it generates Java files from a grammar file, and those files can be used anywhere.

And no, ANTLR has no restriction on recognizing Unicode characters.
5 years ago
It's not about showing the characters in the console. Rather, the problem is that ANTLR could not recognize the Unicode characters when running from the console.

Just now I saw an interesting thing: if the same Unicode characters are placed in a JSON file, parsing it with ANTLR4 gives no problem and all the Unicode characters are recognized.
5 years ago
I have a situation like this:
I am trying to read a properties file and convert it into XML. The properties file contains Unicode characters outside the ASCII range. My properties file contains:

I am using ANTLR4 to parse the properties file (I need the offsets of the values and the line numbers).

When I run the application by calling the Main class from Eclipse, all the Unicode characters (e.g. ü ص © ® °) are interpreted and written to the XML properly. But when I build a jar and run the application from the Windows command line, ANTLR throws an error like:

Could anybody please help me out here? Is there any difference, with respect to character encoding, between calling a class directly and calling the jar from the command line?
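One difference worth checking (an assumption here, since the failing code path is not shown) is the JVM default charset. Calls such as `new String(bytes)` and `String.getBytes()` with no charset argument use `Charset.defaultCharset()`, which on older JVMs follows the `file.encoding` system property or the platform locale, so an IDE launch and a command-line launch can disagree. The sketch below uses the hypothetical class `CharsetRoundTrip` and takes ISO-8859-1 as a stand-in for a legacy single-byte default to show what a mismatch does to non-ASCII text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encode with one charset, decode with another.
final class CharsetRoundTrip {

    // Mimics bytes written in one encoding and read back in another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String sample = "\u00FC \u00A9 \u00AE \u00B0"; // ü © ® °

        // What this JVM would use when no charset is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Same charset on both sides: the text survives.
        String ok = roundTrip(sample, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("utf-8 -> utf-8 intact: " + sample.equals(ok));

        // Mismatched charsets: each UTF-8 byte becomes its own character.
        String broken = roundTrip(sample, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("utf-8 -> iso-8859-1 intact: " + sample.equals(broken));
    }
}
```

If the defaults do differ between the two launches, passing an explicit charset wherever bytes are converted to or from characters removes the dependency. Starting the jar with `java -Dfile.encoding=UTF-8 -jar ...` is a commonly used way to test whether the default is the culprit.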
5 years ago
Thanks Piet.
OK, so do you mean that whether the two threads deadlock depends somewhat on how fast the threads execute, so that if I have a lot of other processing inside a synchronized block, that can make the threads fall into a deadlock? So is this behavior dependent on the underlying system?
Along the same lines, I have the following code, which I thought should deadlock, but it does not. Could you please explain why it does not go into a deadlock?

However many times I ran this code, I got the following output:
Thanks for the reply, Ulf, but I'm not sure how much the EmailValidator class will help when one needs a custom validator (as in my case: + allowed, but ... not allowed, though the second one is a perfectly valid email IMO). Anyway, just after posting here I found the solution myself:

Using () in place of [] actually did the trick.
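The working pattern itself is not shown above. As a hypothetical sketch of the same idea, a group that repeats "one separator followed by at least one letter or digit" (rather than a character class mixing separators and letters) rules out consecutive separators and a separator next to @. The class name `EmailCheck`, the restriction to ASCII letters and digits, and the dotted-domain part are all assumptions of this sketch.

```java
import java.util.regex.Pattern;

// Hypothetical validator: '_', '+' and '.' may appear in the username,
// but never twice in a row and never at its start or end.
final class EmailCheck {

    private static final Pattern EMAIL = Pattern.compile(
            "^[A-Za-z0-9]+"                 // username starts with a letter or digit
            + "(?:[._+][A-Za-z0-9]+)*"      // group: one separator, then letters/digits
            + "@"
            + "[A-Za-z0-9-]+"               // first domain label, so no '.' right after '@'
            + "(?:\\.[A-Za-z0-9-]+)+$");    // one or more further dotted labels

    static boolean isValid(String email) {
        return EMAIL.matcher(email).matches();
    }
}
```

Note that this sketch also rejects mixed neighbours such as `.+`, which may be stricter than the original requirement.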

11 years ago
I have a requirement to validate an email field. Most of it is done, but I'm stuck at one place. Of the special characters, only _, + and . are allowed in the username part (i.e. before @); each may appear once or more, but never consecutively. Also, . can't come immediately before or after @. I couldn't work out the check for consecutive dots or pluses. How do I combine \\w with a dot or a plus? Can somebody kindly help? Thanks in advance.

11 years ago

Please forget about it. I found out why the error was occurring. I was actually using one single PreparedStatement for two different methods (silly me), and when the flow returned from the called method, the variable still held the first PreparedStatement rather than the new one, so it couldn't find the parameter at the proper index position.