
What is parsing?

 
justin smythhe
Ranch Hand
Posts: 107
Can someone give a simple explanation to begin with? The Wikipedia definition is too confusing for me right now.
Please tell me which of these is the better definition:
Answers.com:
In general, parsing is when you take a large chunk of data and break it down into smaller, more useful chunks.

When a compiler or interpreter is turning the source code of a programming language into executable code, it must first parse that source code so it knows what statements the program is trying to use. It can then use that information to translate your source into computer-understandable machine code.


What's the best way to explain parsing to a new programmer?

Wikipedia
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar. Parsing can also be used as a linguistic term, for instance when discussing how phrases are divided up in garden path sentences.

Parsing is also an earlier term for the diagramming of sentences of natural languages, and is still used for the diagramming of inflected languages, such as the Romance languages or Latin. The term parsing comes from Latin pars (ōrātiōnis), meaning part (of speech).
 
Alex Armenteros
Ranch Hand
Posts: 75
Parsing in Java basically means reading text and transforming it into another type.

For example, there is the method Integer.parseInt, which takes text and (if it can) transforms it into a number, so that operations such as subtraction or multiplication become possible.

A more complex example would be a function parseDictionary (this one does not exist) that would transform this:

House:Place where people live;Office:Place where people work;Cat:Feline pet

into an object.

This is my idea of what parsing means.
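As a rough sketch of what such a parseDictionary could look like: the method does not exist in the JDK, so the class name, the method name, and the choice of a Map as the resulting object are all assumptions made for illustration. It also assumes that entries are separated by ';' and that the first ':' in an entry splits the word from its definition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DictionaryParser {

    // Hypothetical helper (not a JDK method): turns "word:definition;word:definition"
    // text into a Map, keeping the entries in their original order.
    static Map<String, String> parseDictionary(String text) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String entry : text.split(";")) {
            if (entry.trim().isEmpty()) {
                continue; // tolerate a stray or trailing ';'
            }
            int colon = entry.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing ':' in entry: " + entry);
            }
            String word = entry.substring(0, colon).trim();
            String definition = entry.substring(colon + 1).trim();
            entries.put(word, definition);
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = parseDictionary(
                "House:Place where people live;Office:Place where people work;Cat:Feline pet");
        System.out.println(dictionary.get("Cat")); // prints: Feline pet
    }
}
```

Once the text is a Map, a lookup such as dictionary.get("Cat") no longer has to search the String by hand.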
 
Winston Gutkowski
Bartender
Posts: 10087
justin smythhe wrote:Can someone give a simple explanation to begin with?

int i = Integer.parseInt("-83456");

Please tell me which one is a good/the best definition:

I quite like the Answers.com one myself; specifically, the first line.

The main thing to remember is that Strings make bad substitutes for other types. That means: if you have a number, make it a number, not a String.

The problem is that sometimes you have no choice. For example, if you're getting input from a user, they probably have to enter it via a keyboard, which is a character-oriented device, so most methods associated with it will return either individual characters or a String (e.g., Scanner.nextLine()). Therefore, if your program wants, say, an integer, it makes sense to parse the String they enter and convert it to an int, viz:
String line = new Scanner(System.in).nextLine();
int userInt = Integer.parseInt(line);


The above is very simplistic, but hopefully it illustrates the point.

Winston

[Edit] Actually looking at the above, it's an explanation of when you parse, rather than what it is. But feel free to come back if you need any more clarification.
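One thing the two-line snippet above leaves out is what happens when the text is not a number at all: Integer.parseInt throws a NumberFormatException. A minimal sketch of handling that follows; the class name and the tryParseInt helper are made up for illustration, and returning an OptionalInt is just one possible design.

```java
import java.util.OptionalInt;

class SafeIntParser {

    // Hypothetical helper: returns the parsed value, or empty if the text is not an int.
    static OptionalInt tryParseInt(String text) {
        try {
            return OptionalInt.of(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            // Thrown for input such as "abc", "", "12.5", or a value outside the int range.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseInt("-83456").getAsInt());  // prints: -83456
        System.out.println(tryParseInt("eighty").isPresent()); // prints: false
    }
}
```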
 
Campbell Ritchie
Sheriff
Posts: 48381
Parsing generally means taking text apart and deciding what the relationships between successive tokens are. It is a grammatical term, and can be used of any language, natural, arithmetic, logic, computer languages, etc. The parseInt() method does something very similar, converting the String to an int.
 
justin smythhe
Ranch Hand
Posts: 107
Campbell Ritchie wrote:Parsing generally means taking text apart and deciding what the relationships between successive tokens are. It is a grammatical term, and can be used of any language, natural, arithmetic, logic, computer languages, etc. The parseInt() method does something very similar, converting the String to an int.


What are these "tokens"? Can you give me a (slightly bigger) example to illustrate your explanation? If possible, please show me something that needs to be parsed, why it needs to be parsed, and what the result is after it has been parsed.
 
Winston Gutkowski
Bartender
Posts: 10087
  • Likes 1
justin smythhe wrote:What are these "tokens"? Can you give me a (slightly bigger) example to illustrate your explanation? If possible, please show me something that needs to be parsed, why it needs to be parsed, and what the result is after it has been parsed.

Here you go. 5 seconds to type "parsing example" into Google, and 2 to pull up the page.
As for your first question, another 10 seconds on Google gave me this.

Winston
 
Stephan van Hulst
Bartender
Posts: 5334
  • Likes 2
Well, let's take a simple mathematical expression:

"-2*x^3 + 5*x - 4".

This expression is in textual form. Parsing it means identifying the various parts of the expression according to some set of rules, and how they relate to each other. For instance, a specific parser may identify the expression as a polynomial and determine that -2*x^3 is the leading term, and that -2 is a coefficient and that 3 is a power. So parsing assigns a meaning to a chunk of data.
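To make that concrete, here is a minimal sketch of a parser for expressions of exactly this shape. It works in two steps: first it splits the text into tokens (numbers, the variable x, and the operators), then it groups the tokens into terms with a coefficient and a power. The class name, the Term type, and the tiny grammar in the comments are assumptions made for illustration; a real expression parser would handle far more cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PolynomialParser {

    /** One parsed term: "-2*x^3" becomes coefficient -2, power 3. */
    static final class Term {
        final int coefficient;
        final int power;

        Term(int coefficient, int power) {
            this.coefficient = coefficient;
            this.power = power;
        }

        @Override
        public String toString() {
            return "Term(coefficient=" + coefficient + ", power=" + power + ")";
        }
    }

    // A token is a number, the variable x, or one of the operators - + * ^
    private static final Pattern TOKEN = Pattern.compile("\\d+|x|[-+*^]");

    /** Step 1: split the text into tokens, rejecting anything unrecognised. */
    static List<String> tokenize(String text) {
        String compact = text.replaceAll("\\s+", "");
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(compact);
        int position = 0;
        while (position < compact.length()) {
            matcher.region(position, compact.length());
            if (!matcher.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + position);
            }
            tokens.add(matcher.group());
            position = matcher.end();
        }
        return tokens;
    }

    /** Step 2: group tokens into terms, assuming: term := [sign] number ['*' 'x' ['^' number]] */
    static List<Term> parse(String text) {
        List<String> tokens = tokenize(text);
        List<Term> terms = new ArrayList<>();
        int i = 0;
        try {
            while (i < tokens.size()) {
                int sign = 1;
                String token = tokens.get(i);
                if (token.equals("+") || token.equals("-")) {
                    sign = token.equals("-") ? -1 : 1;
                    i++;
                }
                int coefficient = sign * Integer.parseInt(tokens.get(i++));
                int power = 0; // a bare number is a constant term
                if (i < tokens.size() && tokens.get(i).equals("*")) {
                    if (!tokens.get(i + 1).equals("x")) {
                        throw new IllegalArgumentException("Expected x after *");
                    }
                    i += 2;
                    power = 1; // "5*x" without an explicit exponent
                    if (i < tokens.size() && tokens.get(i).equals("^")) {
                        power = Integer.parseInt(tokens.get(i + 1));
                        i += 2;
                    }
                }
                terms.add(new Term(coefficient, power));
            }
        } catch (IndexOutOfBoundsException | NumberFormatException e) {
            throw new IllegalArgumentException("Not a polynomial in x: " + text, e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("-2*x^3 + 5*x - 4")); // [-, 2, *, x, ^, 3, +, 5, *, x, -, 4]
        System.out.println(parse("-2*x^3 + 5*x - 4"));
    }
}
```

The tokens are the smallest meaningful pieces of the text; the parse step is what decides that -2 and 3 belong together as the coefficient and power of one term.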

Different parsers use different rules and interpret data differently. "424-555-5555" can be seen as an arithmetical expression by one parser, and can be seen as a Los Angeles phone number by another.
 
Campbell Ritchie
Sheriff
Posts: 48381
  • Likes 1
Stephan van Hulst wrote: . . . an arithmetical expression by one parser, and can be seen as a Los Angeles phone number by another.
Because those parsers are using different grammars.
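A small sketch of that point, feeding the same text "424-555-5555" to two parsers with different grammars. Both grammars are simplified assumptions: the first only understands non-negative integers separated by '-', and the second only accepts the three-three-four digit layout. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoGrammars {

    // Grammar 1 (assumed): non-negative integers separated by '-',
    // read as a left-to-right subtraction.
    static int parseAsSubtraction(String text) {
        String[] operands = text.split("-");
        int result = Integer.parseInt(operands[0].trim());
        for (int i = 1; i < operands.length; i++) {
            result -= Integer.parseInt(operands[i].trim());
        }
        return result;
    }

    // Grammar 2 (assumed): a phone number written as three digits, three digits, four digits.
    private static final Pattern PHONE = Pattern.compile("(\\d{3})-(\\d{3})-(\\d{4})");

    // Returns { area code, exchange, line number }.
    static String[] parseAsPhoneNumber(String text) {
        Matcher matcher = PHONE.matcher(text);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a phone number: " + text);
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String text = "424-555-5555";
        System.out.println(parseAsSubtraction(text));    // prints: -5686
        System.out.println(parseAsPhoneNumber(text)[0]); // prints: 424
    }
}
```

Same characters, two different results, and neither parser is wrong: each one applies its own rules.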
 
Paul Clapham
Sheriff
Posts: 20725
In the XML context, I think of "parsing" as taking an undifferentiated stream of characters (an XML document) and converting that stream of characters into an internal format. Depending on the parser, this format might consist of a stream of SAX events or a DOM tree or something else. In this context the opposite of "parsing" is "serializing", which converts said internal format back to an undifferentiated stream of characters.

Again this is interpreting a text in terms of a formal grammar, but it highlights the common usage that parsing constitutes creating structured data from unstructured data.
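For the DOM case, the round trip might look roughly like the sketch below, which uses the standard javax.xml APIs that ship with the JDK. The sample document, the class name, and the parse/serialize helper names are made up for illustration, and checked exceptions are simply wrapped to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlRoundTrip {

    /** Parsing: a stream of characters in, a DOM tree out. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Serializing: a DOM tree in, a stream of characters out. */
    static String serialize(Document document) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<pets><pet name=\"Cat\">Feline pet</pet></pets>");

        // After parsing, the structure can be navigated instead of scanned as characters.
        Element pet = (Element) document.getElementsByTagName("pet").item(0);
        System.out.println(pet.getAttribute("name")); // prints: Cat
        System.out.println(pet.getTextContent());     // prints: Feline pet

        // Serializing turns the tree back into text.
        System.out.println(serialize(document));
    }
}
```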
 
Campbell Ritchie
Sheriff
Posts: 48381
No, it doesn’t mean creating structured data from unstructured. It is more akin to working out the structure behind the data. The data are actually not changed by parsing, but they can be accessed differently. Parsing is actually only a part of that process.
 