How to identify whether an English word is a number

Em Aiy
Ranch Hand

Joined: May 11, 2006
Posts: 226
I am parsing English sentences that contain numbers like 87,000, 8.302, 45.43e3 or 54BA3E.

How can I check whether a word is an English word or a number?


The difference between failure and success is often being right and being exactly right.
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11475
    

First, you have to define in English what determines if a set of characters is a number or not. What, EXACTLY is allowed, and what EXACTLY is NOT allowed. Just writing down 3 or 4 examples may be enough for your brain, but there are a LOT of implicit assumptions there.

Once you decide what the rules are, then you can start coding them. But until you define what the rules are, writing any code is pointless.


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Em Aiy
Ranch Hand

Joined: May 11, 2006
Posts: 226
fred rosenberger wrote:First, you have to define in English what determines if a set of characters is a number or not. What, EXACTLY is allowed, and what EXACTLY is NOT allowed. Just writing down 3 or 4 examples may be enough for your brain, but there are a LOT of implicit assumptions there.

Once you decide what the rules are, then you can start coding them. But until you define what the rules are, writing any code is pointless.

Let's say the rules are:
numeric values in any format, e.g. 88,000, 88000 or 88,000.00
hexadecimal numbers
floating-point numbers with an exponent ("power") sign or suffix

I could write code that iterates through every character of a word to decide this, but I wanted to ask: is there any built-in support in Java, i.e. some method like isNumber()?
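There is no single isNumber() in the JDK, but each of the three rules above maps onto an existing parser. A minimal sketch, assuming US-style separators (',' for grouping, '.' for decimals) and hex written without a "0x" prefix; the class and method names are hypothetical:

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class BuiltInNumberChecks {

    // Grouped or plain decimals such as "88,000", "88000", "88,000.00".
    // Assumption: US-style separators (',' groups digits, '.' marks decimals).
    static boolean isGroupedDecimal(String s) {
        NumberFormat nf = NumberFormat.getInstance(Locale.US);
        ParsePosition pos = new ParsePosition(0);
        Number n = nf.parse(s, pos);
        // parse() stops at the first unusable character, so insist it consumed everything.
        return n != null && pos.getIndex() == s.length();
    }

    // Floating point with an exponent, such as "45.43e3".
    static boolean isJavaDouble(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Hexadecimal such as "54BA3E". Assumption: no "0x" prefix.
    static boolean isHex(String s) {
        try {
            Long.parseLong(s, 16);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean isNumber(String s) {
        return isGroupedDecimal(s) || isJavaDouble(s) || isHex(s);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"87,000", "88,000.00", "8.302", "45.43e3", "54BA3E", "hello"}) {
            System.out.println(w + " -> " + isNumber(w));
        }
    }
}
```

Each check is loose on purpose: Double.parseDouble also accepts "NaN", "Infinity" and suffixes like "1d", NumberFormat does not verify that grouping commas fall every three digits, and the hex test happily accepts ordinary words such as "DEAD" or "FADE".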
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
There are no built-in methods that do what you're asking for, but there are a few algorithms I can think of to start with. Obviously, there is some ambiguity in the rules you've given so far. Would DEAD or FADE be parsed as a number or a word?


Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peter
Em Aiy
Ranch Hand

Joined: May 11, 2006
Posts: 226
Garrett Rowe wrote:There are no built-in methods that do what you're asking for, but there are a few algorithms I can think of to start with. Obviously, there is some ambiguity in the rules you've given so far. Would DEAD or FADE be parsed as a number or a word?

Actually, I was about to write some code to check whether a word is a number or not, so I thought it would be better to search around rather than reinvent the wheel.

I was confused, since a number like 88000 is easy to detect, but then I would have to tackle these cases as well:
88,000 (comma separated)
88,000.00 (proper decimal notation)

So that's why I was asking this question.

As for the rules, I would again say they are "basic". Imagine you are reading a newspaper, a few numbers may appear in the news, and you have to detect them. You can picture what kinds of numbers could ever appear in a newspaper, plus hexadecimal notation.
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
Well... there is Integer.parseInt(String) to convert a String to an int. If the String isn't parsable, that method throws a NumberFormatException, which you could catch before trying again with another parser. You could also strip the punctuation out of each token so that it doesn't cause failures.
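A minimal sketch of that approach, assuming "punctuation" means thousands-separator commas plus sentence punctuation hanging off the end of a token (both assumptions, as are the class and method names):

```java
class ParseIntCheck {

    // Assumption: strip trailing sentence punctuation (.,;:!?) and all commas,
    // which are treated as thousands separators.
    static String stripPunctuation(String token) {
        String t = token.replaceAll("[.,;:!?]+$", ""); // trailing sentence punctuation
        return t.replace(",", "");                    // thousands separators
    }

    // True if the cleaned token is accepted by Integer.parseInt.
    static boolean isInteger(String token) {
        try {
            Integer.parseInt(stripPunctuation(token));
            return true;
        } catch (NumberFormatException e) {
            return false; // not an int: the caller could try another parser here
        }
    }

    public static void main(String[] args) {
        String sentence = "Sales rose to 87,000 units, up from 45000.";
        for (String token : sentence.split("\\s+")) {
            if (isInteger(token)) {
                System.out.println("number: " + token);
            }
        }
    }
}
```

A token such as "88,000.00" still fails Integer.parseInt; that is where "catch and try again" comes in, for example with Double.parseDouble, or Long.parseLong for values above Integer.MAX_VALUE. Deleting commas blindly also turns "1,2,3" into 123.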

Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18882
    

Okay, so if I encountered the string "two billion" then that would be a number? Or are you only interested in numbers rendered as digits? Is there a limit on the number of digits or would a string of 87 digits be a number?

And what about i (the square root of minus one)? Or e (the base of the natural logarithm)? Or pi?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39784
    
When I saw "word means number", I interpreted that as "word meaning a natural number", so you would include zero, nought, naught, aught, nothing, O, cipher, nil, love, duck, etc. And that is before you have even got to "one".

"Natural number" means a member of the set ℕ, i.e. the non-negative integers 0, 1, 2, 3, ...
Maneesh Godbole
Saloon Keeper

Joined: Jul 26, 2007
Posts: 10519
    

How about stuff like dozen, score, pair?


[How to ask questions] [Donate a pint, save a life!] [Onff-turn it on!]
Vidmantas Maskoliunas
Greenhorn

Joined: Nov 16, 2009
Posts: 22
If I understand your problem well, java.util.Scanner with its methods hasNextInt(), hasNextDouble(), nextInt(), nextDouble() and so on may help.
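A short sketch of the Scanner route, assuming English-style separators (hence the explicit Locale.US); the class and method names are hypothetical. hasNextInt() and hasNextDouble() only peek at the next token, so a word that is neither has to be consumed with next():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class ScannerCheck {

    // Collects every whitespace-separated token that Scanner can read as an int or a double.
    static List<Number> numbersIn(String text) {
        List<Number> found = new ArrayList<>();
        // Locale.US fixes ',' as the grouping separator and '.' as the decimal point.
        try (Scanner sc = new Scanner(text).useLocale(Locale.US)) {
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    found.add(sc.nextInt());
                } else if (sc.hasNextDouble()) {
                    found.add(sc.nextDouble());
                } else {
                    sc.next(); // an ordinary word: skip it
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(numbersIn("Revenue was 87,000 or 88,000.00 dollars, about 45.43e3 by one estimate"));
    }
}
```

Because Scanner applies the locale's grouping rules, "87,000" comes back as an int and "88,000.00" as a double. Hex is not picked up unless you also call hasNextInt(16), which would then accept words like "DEAD" too. Tokens with sentence punctuation attached, such as "dollars,", are simply skipped.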


SCJP 6.0, willing to find Java job in NZ/AU and move there - LinkedIn profile - Java blog
Arjun Abhishek
Ranch Hand

Joined: Jul 08, 2008
Posts: 57
Hi

I have written a solution for identifying whether words containing digits are valid numbers. The solution is simple, but it can be extended by adding more regular expressions to the Pattern.
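A minimal regex sketch in that spirit, with one alternative per accepted shape so that more shapes can be appended later. The class name, the exact patterns, and the choice to accept a hex word only if it contains at least one decimal digit (so that "DEAD" or "FADE" stay words) are all assumptions:

```java
import java.util.regex.Pattern;

class RegexNumberCheck {

    // One alternative per accepted shape; append more alternatives to widen the rules.
    private static final Pattern NUMBER = Pattern.compile(
          "[-+]?\\d{1,3}(,\\d{3})+(\\.\\d+)?"    // grouped: 87,000 or 88,000.00
        + "|[-+]?\\d+(\\.\\d+)?([eE][-+]?\\d+)?" // plain, decimal or exponent: 8.302, 45.43e3
        + "|[0-9A-Fa-f]*\\d[0-9A-Fa-f]*");       // assumption: hex must contain a digit: 54BA3E

    // matches() requires the whole word to fit one of the alternatives.
    static boolean isNumber(String word) {
        return NUMBER.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"87,000", "88,000.00", "8.302", "45.43e3", "54BA3E", "DEAD", "hello"}) {
            System.out.println(w + " -> " + isNumber(w));
        }
    }
}
```

Even so, a digit string is always a number here (years, phone numbers), and short mixed tokens such as "1e" are accepted through the hex alternative.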




Please let me know in which cases this program would fail and, if possible, how that could be avoided.

cheers
K
Em Aiy
Ranch Hand

Joined: May 11, 2006
Posts: 226
Paul Clapham wrote:Okay, so if I encountered the string "two billion" then that would be a number? Or are you only interested in numbers rendered as digits? Is there a limit on the number of digits or would a string of 87 digits be a number?

And what about i (the square root of minus one)? Or e (the base of the natural logarithm)? Or pi?

I only have to detect digits, not words that mean a number.

one million - should not be detected
1,000,000 - should be detected
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11475
    

Again, giving examples like "THIS is good, THIS is bad" is not a way to write programs. You need to define exactly what is allowed, or what causes something to be excluded.

First, how will you get the tokens from the string? Is "100 273" the number 100,273, or is it TWO numbers, 100 and 273?

I'm trying to get you to define the rules. Once you have a well-defined set, you can code to them. Your rules may be:

1) Separate tokens based on the space character.
2) Remove all punctuation from each token, except a '.' between two digits.
3) A '-' is optional as the first character, but allowed nowhere else.
4) There is an optional number of digits or characters A-F (are lowercase allowed?)
5) There is an optional decimal point.
6) There is an optional number of digits or characters A-F (are lowercase allowed?)
7) There is an optional character 'e'.
8) If there is an 'e', then there can be an optional number of digits.

Will this work? I don't know. Do you want to allow "1.738 e4"? The above rules would fail to read this as 17380, since there is a space between the 8 and the 'e'.
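Rules 1 to 8 translate almost line by line into Java. A sketch, assuming only uppercase A-F counts as hex and adding one rule of its own: the part before the 'e' must contain at least one digit or A-F, otherwise the empty string or a bare "e4" would match. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RuleBasedNumberFinder {

    // Rules 3-8: optional '-', digits/A-F, optional '.', digits/A-F, optional 'e' + digits.
    // Added assumption: the lookahead demands at least one digit or A-F before any 'e'.
    private static final Pattern RULES =
        Pattern.compile("-?(?=\\.?[0-9A-F])[0-9A-F]*\\.?[0-9A-F]*(e[0-9]*)?");

    // Rule 2: drop punctuation, keeping '-' (rule 3) and any '.' that sits between two digits.
    static String clean(String token) {
        String t = token.replaceAll("[\\p{Punct}&&[^.\\-]]", "");
        return t.replaceAll("(?<!\\d)\\.|\\.(?!\\d)", "");
    }

    // Rule 1: split on spaces, then test each cleaned token against the rules.
    static List<String> findNumbers(String sentence) {
        List<String> numbers = new ArrayList<>();
        for (String token : sentence.split(" ")) {
            String t = clean(token);
            if (RULES.matcher(t).matches()) {
                numbers.add(t);
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("Sales hit 87,000 units, or 1.738 e4 by another count."));
    }
}
```

The sample sentence shows the "1.738 e4" problem in action: rule 1 makes "1.738" and "e4" separate tokens, so the sketch reports 1.738 rather than 17380. It also still accepts words made only of A-F, such as "FADE", which is exactly the ambiguity raised earlier in the thread.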

We can't tell you what the rules should be because we don't know your requirements. You have to tell us that.


 