JLS formal grammar

Marlene Miller
Ranch Hand

Joined: Mar 05, 2003
Posts: 1391
In JLS 18.1,
IdentifierSuffix:
[ ( ] BracketsOpt . class | Expression ])
I am not interested in the case of optional brackets or class.
IdentifierSuffix:
[ ( ] . Expression ])
I want to apply IdentifierSuffix to a cast expression of this form (T)a().b()
I understand . Expression. What confuses me is the unbalanced brackets and parentheses. Can you please explain the strange syntax?
(Have I chosen an appropriate forum for this question?)
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
(Have I chosen an appropriate forum for this question?)
Absolutely.
What confuses me is the unbalanced brackets and parentheses. Can you please explain the strange syntax?
I had to stare at this a while, looking at the online version. It turns out that the use of italics is key (!!!), so I'll reproduce the original more exactly:
[ ( ] BracketsOpt . class | Expression ])
Note that none of the brackets on this line are in italics, which seems to indicate that they are literal bracket chars rather than indicators that an element is optional. (If you look at the top of the page, or other examples, [foo] indicates zero or one foo, while '[' foo ']' indicates '[' followed by foo followed by ']'.) Meanwhile, the parens in the above are italicized, indicating they are not literal. They didn't bother explaining what non-literal parens mean, but I think we can assume they're used for logical grouping. So, the above expression can be interpreted as (using more Java-like notation):
[code]
'[' + (
    ']' + BracketsOpt + '.' + class
    |
    Expression + ']'
)
[/code]
So, using the above expression as part of the rule for Primary:
Identifier { . Identifier } [ IdentifierSuffix ]
we can see that it could represent
java.lang.String (no IdentifierSuffix)
java.lang.String[].class (no BracketsOpt)
java.lang.String[][][].class (using BracketsOpt)
java.lang.String[n + 1] (using Expression)
Note that
java.lang.String.class
is not covered here; that's handled by a different line of IdentifierSuffix. The part you ask about is just for handling things with at least one bracket in them.
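To make the bracket forms above concrete, here is a small sketch in Java. The class name IdentifierSuffixDemo, its method names, and the values array are made up for illustration. An array variable stands in for the java.lang.String[n + 1] case, since that form matches the grammar but would not type-check with a class name in front of the brackets.

```java
class IdentifierSuffixDemo {
    // '[' ']' BracketsOpt '.' class, with an empty BracketsOpt
    static Class<?> oneDimension() {
        return java.lang.String[].class;
    }

    // '[' ']' BracketsOpt '.' class, with BracketsOpt supplying two more pairs
    static Class<?> threeDimensions() {
        return java.lang.String[][][].class;
    }

    // '[' Expression ']', applied to a (hypothetical) array variable
    static int indexed(int n) {
        int[] values = {10, 20, 30};
        return values[n + 1];
    }

    public static void main(String[] args) {
        System.out.println(oneDimension().getSimpleName());    // String[]
        System.out.println(threeDimensions().getSimpleName()); // String[][][]
        System.out.println(indexed(1));                        // 30
    }
}
```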
I want to apply IdentifierSuffix to a cast expression of this form (T)a().b()
Let's see - that's an Expression, more specifically an Expression1, more specifically an Expression2, more specifically an Expression3. Now we see that "(T)" matches '(' + Type + ')', so the remaining "a().b()" must be another Expression3. How?
We can see that "a()" matches
Primary:
Identifier { . Identifier }[ IdentifierSuffix]

since "a" is an Identifier, and "()" matches
IdentifierSuffix:
Arguments
Arguments:
( [Expression { , Expression }] )

(No Expression needed.) So given that "a()" is a Primary, "a().b()" matches
Expression3:
Primary {Selector} {PostfixOp}

if we can make ".b()" match Selector. Well, we have
Selector:
. Identifier [Arguments]

so yes, "b" matches Identifier, and "()" matches Arguments.
And we begin to see why we normally let compilers handle this sort of thing, rather than humans.
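For what it's worth, the grouping traced above can be checked against a compiler. This is only a sketch: the class CastGroupingDemo, the nested Box type, and the methods a(), aAsObject() and b() are invented for the example. It shows what one compiler accepts, not what the grammar proves.

```java
class CastGroupingDemo {
    static class Box {
        // Declared to return Object, so a caller needs a cast to treat it as a String.
        Object b() {
            return "from b";
        }
    }

    static Box a() {
        return new Box();
    }

    static Object aAsObject() {
        return new Box();
    }

    static String castWholeChain() {
        // Compiles only if the cast covers a().b(); a Box cannot be cast to String.
        return (String) a().b();
    }

    static Object castFirstCallOnly() {
        // Extra parentheses are needed to cast just the result of the first call.
        return ((Box) aAsObject()).b();
    }

    public static void main(String[] args) {
        System.out.println(castWholeChain());
        System.out.println(castFirstCallOnly());
    }
}
```

If the cast applied only to a(), the first method would be rejected, because Box and String are unrelated classes.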
<digression>
I see that while JLS 18 generally uses italics to indicate logical structures and no italics for literals, they use a non-italicized '|' to indicate logical alternation, e.g. (foo | bar ). So how do they indicate a literal '|'? Looking at Infixop, they evidently just put the '|' on a single line by itself and expect you to figure that since there's nothing else there, it can't represent alternation; must be literal. :roll: Gee, thanks for consistency.
Also, what about '.'? Is it italicized, or not? Looking at the HTML source I see that frequently they use an italicized '.' when they want a literal. Which makes no sense at all, except that if you don't look at the source, no one can tell if it's italicized or not, so I guess the italics are just there to confuse people who actually look at the HTML source. Let that be a lesson for us.
And lastly, look at this line from IdentifierSuffix:
. ( class | this | super Arguments | new InnerCreator )
The initial ( is not italicized, while the final one is. Note also that class and this are italicized, but not super. This is a special usage of italics, used to denote the fact that the author is on crack, and should not be trusted. Well, maybe not the original author(s); maybe it was whoever converted to HTML. (I don't have a print copy of the JLS handy - could someone else check this?) The point is though - the JLS can contain errors. Though they're pretty rare. And more annoyingly, there doesn't seem to be any published errata, though Gilad Bracha has acknowledged that they do know about some errors. (Guess I'll send another e-mail now.) Anyway, don't get so locked into literal interpretation of everything that you overlook the possibility that they've simply screwed up (in some very subtle way, no doubt).
</digression>
Cheers...
[ September 13, 2003: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I sent an error report to Sun on this.
Marlene Miller
Ranch Hand

Joined: Mar 05, 2003
Posts: 1391
Thank you very much Jim for your detailed answer. Thank you for following through and discussing the cast expression.
Now I understand it. I have internalized it this way
[ (] BracketsOpt . class | Expression ] )
means [ (] x | y ])
which means either of these two productions
[ ] x
[ y ]
----
Given the expression (T)a().b().c(), I am trying to explain why the parser evaluates all of a().b().c() and then applies the cast operator. I want to explain it using the JLS 18.1 grammar.
I know how to describe the expression in terms of Expression3, Primary, Selector, IdentifierSuffix and Arguments. But I don't know how that explains why the parser does not stop after evaluating a() and then apply the cast.
Question: What is it about the grammar that tells us the parser will evaluate all of a().b().c() and not just a(), before applying the cast?
(My ulterior motive is to show that dot is not an operator in the precedence chart. Instead, you have to apply the grammar. According to the JLS, dot is a separator, not an operator. Then I will be ready to compare new a().b() and (T)a().b())
[ September 13, 2003: Message edited by: Marlene Miller ]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Given the expression (T)a().b().c(), I am trying to explain why the parser evaluates all of a().b().c() and then applies the cast operator. I want to explain it using the JLS 18.1 grammar.
I know how to describe the expression in terms of Expression3, Primary, Selector, IdentifierSuffix and Arguments. But I don't know how that explains why the parser does not stop after evaluating a() and then apply the cast.
Question: What is it about the grammar that tells us the parser will evaluate all of a().b().c() and not just a(), before applying the cast?

Well now, that's an interesting question. I may be wrong, but I don't believe that the grammar in JLS 18.1 does provide you with that level of information. It defines rules for parsing a .java file, which is the first step of compilation - but there's no requirement for the grammar to provide all details of how to compile a language, such as order of precedence, or order of execution. (Recall that these are not the same thing in many cases.) As an example, consider the expression
a + b * c
If we follow the grammar of 18.1, we get a match following this path

(Hope that's clear.) So great. What does that tell us? We know which parts are identifiers, and which are Infixops. OK. Does this tell us anything about order of operation? I don't think so. Try evaluating
a * b + c
instead. You'll end up with the same result:

So nothing in the grammar has told us anything about whether the * is evaluated before the +, or a before b, etc. That info isn't in the grammar, it's elsewhere in the JLS.
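A toy illustration of that point, assuming the flat reading of the 18.1 rules in which an expression is an operand followed by any number of operator-operand pairs. The class FlatShapeDemo and its shape method are hypothetical, and the labeller only knows + and *; it is not a real parser.

```java
import java.util.ArrayList;
import java.util.List;

class FlatShapeDemo {
    // Labels each whitespace-separated token the way the flat rule would:
    // binary operators become "Infixop", everything else becomes "Expression3".
    static String shape(String source) {
        List<String> labels = new ArrayList<>();
        for (String token : source.trim().split("\\s+")) {
            boolean isOperator = token.equals("+") || token.equals("*");
            labels.add(isOperator ? "Infixop" : "Expression3");
        }
        return String.join(" ", labels);
    }

    public static void main(String[] args) {
        // Both lines print: Expression3 Infixop Expression3 Infixop Expression3
        System.out.println(shape("a + b * c"));
        System.out.println(shape("a * b + c"));
    }
}
```

Both inputs collapse to the same sequence of labels, so under this reading the shape alone does not say which operator binds tighter.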
I believe the situation is much the same with your cast question. The grammar rules don't, for example, tell us anything about the cast other than the fact that it's a ( followed by a Type followed by ). It doesn't tell us anything about what that means, never even identifies the cast as a Cast. So I don't think we can expect it to tell us much about when the cast operates, if it doesn't even go so far as to recognize that it is indeed a cast.
Of course there are actually two different grammars contained in the JLS - the one in 18.1 is the "lean, mean" version designed more for compiler efficiency than for explaining things to humans. Alternately, you could use the grammar which is strewn throughout the rest of the JLS ("piecemeal" as they say); this is actually designed to provide more explanatory power along the way. E.g. it does end up implicitly describing the order of precedence of different operators. But as we've previously discussed, that doesn't tell us about order of evaluation, which instead is described in text, in JLS 15.7.
So anyway, I don't think the grammar is going to give you the level of detail you're interested in.
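The difference between order of precedence and order of evaluation can be observed with a few logging calls. A minimal sketch, assuming nothing beyond the standard library; the class EvaluationOrderDemo and the helper operand are names made up here.

```java
import java.util.ArrayList;
import java.util.List;

class EvaluationOrderDemo {
    static final List<String> log = new ArrayList<>();

    // Records that an operand was evaluated, then yields its value.
    static int operand(String name, int value) {
        log.add(name);
        return value;
    }

    static int compute() {
        log.clear();
        // '*' binds tighter than '+', yet the operands are still evaluated left to right.
        return operand("a", 2) + operand("b", 3) * operand("c", 4);
    }

    public static void main(String[] args) {
        System.out.println(compute()); // 14
        System.out.println(log);       // [a, b, c]
    }
}
```

The result reflects precedence (3 * 4 happens before the addition), while the log reflects evaluation order (a is evaluated first).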
(My ulterior motive is to show that dot is not an operator in the precedence chart. Instead, you have to apply the grammar. According to the JLS, dot is a separator, not an operator. Then I will be ready to compare new a().b() and (T)a().b())
Again, interesting. I agree that the JLS does not support the notion that . is an operator. However I've always sort of assumed that was more a choice they made in terms of how they chose to describe it. I think that if they'd wanted to describe it as an operator, they could have, and could have made the language behave the same way that it does. Much like the JLS doesn't really support the notion of an operator precedence table; that's just an alternate way of presenting the information in the JLS, which makes operators more understandable. If I recall correctly, . is an operator in C++ at least, and I tend to think that Java's behavior is close enough to C++ that they could've done the same here if they felt like it. Maybe I'm missing something though. I'll be interested to hear your argument.
Marlene Miller
Ranch Hand

Joined: Mar 05, 2003
Posts: 1391
Thank you for your explanation. I know it takes time to write a clear and extended answer. Thank you for your time. Your table is a nice way to show a grammar expansion.
I am glad to hear a perspective that does not agree with the way I have figured things. I need to know if I am wrong, so that I don't spread incorrect explanations to trusting souls.
Consider (T)a().b()
(T)a() is a
(Type) Expression3
...
(Type) Identifier ()
a().b() is a
Expression3
Primary Selector
...
Identifier () . Identifier ()
I can describe (T)a().b() as
(Type) Expression3
I don't see a way to describe (T)a().b() as
Expression3 . b()
That suggests to me the parser dictates that (T) operates on a().b(). Even if it is reasonable (to some of us), that does not make it correct.
Question 1: What do you think of this argument?
Question 2: How do the Java compiler writers know what to do? Is it time to ask them?
(If you are curious about the reason for these questions - the original problem is to explain new Thread().start(). The Java Programming Language calls both new and dot operators. dot is supposed to have higher precedence than new, as noticed by Gopal Shah. Well, well.)
[ September 14, 2003: Message edited by: Marlene Miller ]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
That suggests to me the parser dictates that (T) operates on a().b(). Even if it is reasonable (to some of us), that does not make it correct.
Question 1: What do you think of this argument?

I agree with both statements. It does suggest that (T) operates on a().b(), but it doesn't tell us for sure.
Question 2: Do you think the only way to know for sure is to query the Java compiler writers?
No. For one thing, that would just tell us what a particular compiler implementation did; it doesn't tell us what other compilers can or should do. I think the proper course here is to look elsewhere in the JLS for this info. Unfortunately, I can't quite seem to find it there either. We have:
15.7.2
The Java programming language also guarantees that every operand of an operator (except the conditional operators &&, ||, and ? : ) appears to be fully evaluated before any part of the operation itself is performed.

That's pretty close, if we can just establish what's the operand, and what's the operator.
15.16
A cast expression converts, at run time, a value of one numeric type to a similar value of another numeric type; or confirms, at compile time, that the type of an expression is boolean; or checks, at run time, that a reference value refers to an object whose class is compatible with a specified reference type.
CastExpression:
( PrimitiveType Dimsopt ) UnaryExpression
( ReferenceType ) UnaryExpressionNotPlusMinus
The type of a cast expression is the type whose name appears within the parentheses. (The parentheses and the type they contain are sometimes called the cast operator.) The result of a cast expression is not a variable, but a value, even if the result of the operand expression is a variable.
...
At run time, the operand value is converted by casting conversion (§5.5) to the type specified by the cast operator.

I emphasized the part that tells us clearly which part the "cast operator" is. That's good. But I can't find anything that actually tells us what an operand is. Or more particularly, what is the operand associated with the cast. We can employ "common sense" and assume (correctly) that the operand in this case is the other stuff, everything after the cast operator. Which in this case means either the UnaryExpression or the UnaryExpressionNotPlusMinus. These are well-defined (should we wish to follow the full chain of productions in the grammar). But the one link of establishing "what's the operand" is never actually established in the JLS, that I can find. We're left to guess. Note that it may seem "obvious", but let's face it, lots of other "obvious" things need to be questioned carefully here to be properly understood. Otherwise it would be obvious to many people that in the expression a + (b + c), b + c is evaluated before a. Which it actually isn't. Also, recall that there are operators (the postfix operators) for which the operand actually precedes the operator. So IMO it's not really that obvious where the operand to an operator is, if they never ever tell us.
OK, I'm probably not going to bother writing another email to Sun on this particular issue, as the truth is, I do know what they mean in this case. But that comes from an external understanding of the term "operand", not from the JLS. They could have fixed this easily with one more line. As it is though, the grammar productions just sort of sit there amidst the rest of the JLS, without explicit meaningful connection to the text. :roll:
Sorry Marlene, you caught me in a grumpy mood, and a sloppily written bit of the JLS is a good target for my ire.
(I have tried to extract the essence of the problem from the context, but I can tell you like to know the context.
Who, me?
The original problem is to explain new Thread().start(). The Java Programming Language calls both new and dot operators. dot is supposed to have higher precedence than new, as noticed by Gopal Shah. Well, well.)
Interesting, I never noticed that. I don't have a copy of TJPL, but I see a table in the Java Tutorial which seems to support this, here. But quite simply, I don't think it's correct. I suspect that the precedence table was inherited from C++, then replaced eventually with the JLS grammar productions, and perhaps no one looked closely enough at the precedence table after that, since it had no official status. If anyone can suggest a reason why the . "operator" is listed as high as it is, above new, I'd be interested in hearing it.
Note also that regardless of "precedence", the parsing rules (from the grammar) do implicitly support associating the "new" with "Thread()" but not the ".start()". It's just a pain to trace through all the connections...
Marlene Miller
Ranch Hand

Joined: Mar 05, 2003
Posts: 1391
Thank you again Jim. Your ideas are very helpful. It is a pleasure to work with someone who can separate intuition from logic.
I think I have figured it out! Or almost. I fear that I have used up my Jim-Yingst units of help for several weeks or months on this one. Just one more idea.
Let's forget the JLS 18.1 grammar.
By JLS 15.14, a Primary (JLS 15.8) has higher precedence than a cast operator.
Given (T)a().b(), a().b() is a Primary expression.
Given new Thread().start(), new Thread() is a Primary Expression.
[ September 15, 2003: Message edited by: Marlene Miller ]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I fear that I have used up my Jim-Yingst units of help for several weeks or months on this one.

Don't worry, I only spend a lot of time if I think the question is interesting, so it's rewarding to me too. Of course, if I don't spend a lot of time on some question in the future, it may indicate I don't think it's interesting - or it may well indicate that I don't have the time then. So don't take it personally.
Just one more idea.
Let's forget the JLS 18.1 grammar.
By JLS 15.14, a Primary (JLS 15.8) has higher precedence than a cast operator.

I'm not seeing the part of 15.14 that indicates this. Do you mean 15.16, with the CastExpression production? Plus a bunch of other definitions in preceding sections which establish that the thing after the ( ReferenceType ) can be a Primary. And 15.8 and 15.12 establish that a().b() can be a Primary.
Given (T)a().b(), a().b() is a Primary expression.
Yes. It's not 100% clear that this means that the Primary a().b() is the operand of the cast operator, but if it is, then it must be evaluated before the cast operation. And common sense says "what else could the operand be?" but that's still not as rigorous as I would like, considering the rigor of the rest of the definition. Oh well...
Given new Thread().start(), new Thread() is a Primary Expression.
Yes. 15.8 and 15.9 establish that new Thread() can be a Primary, and then 15.12 allows us to combine that Primary with the remaining .start() to form a MethodInvocation. And then 15.12.4.1 actually does tell us explicitly that, since the method is not static, the Primary must be evaluated before the method can be executed. (See, they really do have this sort of info here in the JLS, usually.)
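The new Thread().start() grouping can be sketched as follows. The class NewThenStartDemo and its methods are hypothetical, and the latch is only there so the example can confirm that the started thread actually ran. The lambda-style method reference is a convenience of later Java versions and plays no part in the grouping question.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class NewThenStartDemo {
    // Waits briefly for the started thread to signal the latch.
    static boolean await(CountDownLatch done) {
        try {
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    static boolean startDirectly() {
        CountDownLatch done = new CountDownLatch(1);
        // Grouped as (new Thread(...)).start(): the creation expression is the
        // Primary, and start() is invoked on the object it yields.
        new Thread(done::countDown).start();
        return await(done);
    }

    static boolean startWithExplicitParens() {
        CountDownLatch done = new CountDownLatch(1);
        (new Thread(done::countDown)).start();
        return await(done);
    }

    public static void main(String[] args) {
        System.out.println(startDirectly());
        System.out.println(startWithExplicitParens());
    }
}
```

Both forms behave the same, which is consistent with new Thread(...) being treated as a complete Primary before .start() is applied.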
Marlene Miller
Ranch Hand

Joined: Mar 05, 2003
Posts: 1391
This is as far as I can get, trying to explain (T)a().b()
Here are productions from JLS 15.14 and JLS 15.15 (with not-relevant parts removed).

It looks like the second part of a cast expression is a Primary or ExpressionName or another cast expression.
Somehow this implies a Primary composed of Primary's is evaluated before the cast is applied. Somehow this implies evaluation of a().b() does not stop at a().
I concur. Oh well...
[ September 16, 2003: Message edited by: Marlene Miller ]
Marlene Miller
Ranch Hand

Joined: Mar 05, 2003
Posts: 1391
#29 Don't know what you don't know.
It is essential not to profess to know, or to seem to know, or to accept that someone else knows, that which is unknown... At virtually every stage of even the most successful software projects, there are large numbers of very important things that are unknown. It is acceptable--even mandatory--to articulate your ignorance, so that no one misjudges the state of things, how much is still unknown.
--My favorite one from Jim McCarthy's Dynamics of Software Development
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Somehow this implies a Primary composed of Primary's is evaluated before the cast is applied. Somehow this implies evaluation of a().b() does not stop at a().
Well we do have the statement that "every operand of an operator ... appears to be fully evaluated before any part of the operation itself is performed." And the (T) is identified as the cast operator. So clearly the "operand" must have been fully evaluated beforehand. So given
CastExpression:
( PrimitiveType Dimsopt ) UnaryExpression
( ReferenceType ) UnaryExpressionNotPlusMinus
we just need confirmation of the "obvious" assertion that the UnaryExpression or UnaryExpressionNotPlusMinus is the operand. That would tell us that the a().b() must have been fully evaluated (or "appear to be" which means we can't tell the difference if it wasn't). We don't have this confirmation in the JLS, but if we just pretend they actually said what we know is true...
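One way to watch this happen, short of a statement in the JLS, is to make the cast fail and see what ran beforehand. A minimal sketch: CastAfterOperandDemo, Box, a() and b() are invented names, and the outcome only shows how one compiler and JVM behave.

```java
import java.util.ArrayList;
import java.util.List;

class CastAfterOperandDemo {
    static final List<String> log = new ArrayList<>();

    static class Box {
        Object b() {
            log.add("b");
            return "not an Integer";
        }
    }

    static Box a() {
        log.add("a");
        return new Box();
    }

    // Returns a record of what happened up to and including the cast attempt.
    static List<String> callsBeforeFailedCast() {
        log.clear();
        try {
            Integer unused = (Integer) a().b();
            log.add("cast succeeded");
        } catch (ClassCastException e) {
            log.add("cast failed");
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(callsBeforeFailedCast()); // [a, b, cast failed]
    }
}
```

Both a() and b() are logged before the cast is attempted, so evaluation did not stop at a().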
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
It is essential not to profess to know, or to seem to know, or to accept that someone else knows, that which is unknown.
Dang, I saw this just after posting the above. You got me. Ah, well...
Marlene Miller
Ranch Hand

Joined: Mar 05, 2003
Posts: 1391
Oh, that's funny. What a coincidence. Thank you for bouncing ideas with me.
 