How to identify whether an English word is a number

 
Em Aiy
Ranch Hand
Posts: 226
I am parsing text that contains English sentences mixed with numbers such as 87,000, 8.302, 45.43e3 or 54BA3E.

How can I check whether a word is an English word or a number?
 
fred rosenberger
lowercase baba
Posts: 13089
First, you have to define in English what determines if a set of characters is a number or not. What, EXACTLY is allowed, and what EXACTLY is NOT allowed. Just writing down 3 or 4 examples may be enough for your brain, but there are a LOT of implicit assumptions there.

Once you decide what the rules are, then you can start coding them. But until you define what the rules are, writing any code is pointless.
 
Em Aiy
Ranch Hand
Posts: 226

fred rosenberger wrote:First, you have to define in English what determines if a set of characters is a number or not. What, EXACTLY is allowed, and what EXACTLY is NOT allowed. Just writing down 3 or 4 examples may be enough for your brain, but there are a LOT of implicit assumptions there.

Once you decide what the rules are, then you can start coding them. But until you define what the rules are, writing any code is pointless.


Let's say the rules are: numeric values in any format, i.e. 88,000 or 88000 or 88,000.00,
hexadecimal numbers,
and floating-point numbers with a "power" (exponent) sign or suffix.

I can write code that iterates through every character of a word to decide this, but I wanted to ask whether Java has any built-in support, i.e. some method like isNumber().
 
Garrett Rowe
Ranch Hand
Posts: 1296
There are no built-in methods that do what you're asking for, though I can think of a few algorithms to start with. Obviously, there is some ambiguity in the rules you've given so far: would DEAD or FADE be parsed as a number or a word?
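Garrett's DEAD/FADE question is easy to check in code. A minimal sketch, assuming hex tokens are tested with Integer.parseInt(token, 16); the class HexAmbiguity and its "must also contain a decimal digit" rule are hypothetical, just one possible tie-breaker and not something settled in this thread.

```java
class HexAmbiguity {
    // True if Integer.parseInt accepts the token as a base-16 int.
    // Note: parseInt is case-insensitive and also allows a leading sign.
    static boolean parsesAsHex(String token) {
        try {
            Integer.parseInt(token, 16);
            return true;
        } catch (NumberFormatException e) {
            return false; // not hex, or too large for an int
        }
    }

    // Hypothetical tie-breaker: only treat a hex token as a number if it
    // also contains at least one decimal digit, so plain words like DEAD lose.
    static boolean looksLikeHexNumber(String token) {
        return parsesAsHex(token) && token.chars().anyMatch(Character::isDigit);
    }
}
```

Integer.parseInt("DEAD", 16) returns 57005, so hex parsing alone happily accepts ordinary words (lowercase ones such as "add" or "a" too, since parseInt ignores case). The digit rule keeps 54BA3E but rejects DEAD and FADE; it would also reject a genuine hex value like FF, which is the price of that choice. Values above 7FFFFFFF overflow an int, so Long.parseLong(token, 16) would be needed for bigger ones.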
 
Em Aiy
Ranch Hand
Posts: 226

Garrett Rowe wrote:There are no built-in methods that do what you're asking for, though I can think of a few algorithms to start with. Obviously, there is some ambiguity in the rules you've given so far: would DEAD or FADE be parsed as a number or a word?


Actually, I was about to write some code to check whether a word is a number or not, so I thought it better to search around rather than reinvent the wheel.

I was confused because a number like 88000 is easy to detect, but then I would have to tackle these cases as well:
88,000 (comma-separated)
88,000.00 (proper decimal notation)

So that's why I was asking this question.

As for rules, I would again say the rules are "basic". Imagine you are reading a newspaper: there may be a few numbers in the news, and you have to detect them. You can imagine what kinds of numbers could appear in newspapers, plus "the hexadecimal notation".
 
Garrett Rowe
Ranch Hand
Posts: 1296
Well... there is Integer.parseInt(String) to convert a String to an int. If the String isn't parsable, that method throws a NumberFormatException, which you could catch before trying again with another conversion. You could also strip the punctuation out of each token so that it doesn't cause failures.
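The code that originally followed this post did not survive. A minimal sketch of the catch-and-retry idea, assuming Java 7 or later; the class ParseCheck, its classify method and the punctuation-stripping rules are hypothetical. It uses Long.parseLong rather than Integer.parseInt so that whole numbers above about 2.1 billion are not misreported, and falls back to Double.parseDouble when the whole-number parse throws.

```java
class ParseCheck {
    // Hypothetical cleanup rules (assumptions, not from the post):
    // drop punctuation at either end of the token but keep a leading
    // sign, then treat every remaining comma as a thousands separator.
    static String clean(String token) {
        String trimmed = token.replaceAll("^[^\\p{Alnum}+-]+|[^\\p{Alnum}]+$", "");
        return trimmed.replace(",", "");
    }

    // Tries a whole-number parse first; if that throws, catches the
    // exception and tries again as a double (decimals, exponents).
    static String classify(String token) {
        String s = clean(token);
        try {
            Long.parseLong(s);
            return "integer";
        } catch (NumberFormatException notWhole) {
            try {
                Double.parseDouble(s);
                return "decimal";
            } catch (NumberFormatException notNumeric) {
                return "word";
            }
        }
    }
}
```

With these rules "87,000." is an integer, "88,000.00" and "45.43e3" are decimals, and "hello," is a word. Double.parseDouble is more permissive than a newspaper reader would be: it accepts "NaN", "Infinity" and Java literal suffixes, so "3D" (as in "3D printing") comes back as "decimal". The comma rule is naive too, so "1,2,3" becomes 123. Hex such as 54BA3E is deliberately left out; Long.parseLong(s, 16) would add it, along with words like DEAD.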

 
Paul Clapham
Marshal
Posts: 28193
Okay, so if I encountered the string "two billion" then that would be a number? Or are you only interested in numbers rendered as digits? Is there a limit on the number of digits or would a string of 87 digits be a number?

And what about i (the square root of minus one)? Or e (the base of the natural logarithms)? Or pi?
 
Marshal
Posts: 79177
When I saw "word means number", I interpreted that as "word meaning a natural number", so you would include zero, nought, naught, aught, nothing, O, cipher, nil, love, duck, etc. And that is before you have even got to "one".

"Natural number" means a member of the set ℕ, i.e. the non-negative integers 0, 1, 2, 3, …
 
Bartender
Posts: 11497
How about stuff like dozen, score, pair?
 
Greenhorn
Posts: 22
If I understand your problem correctly, java.util.Scanner, with its methods hasNextInt(), hasNextDouble(), nextInt(), nextDouble() and so on, may help.
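A minimal sketch of the Scanner approach, assuming every whitespace-separated token is classified on its own. The class ScannerNumbers is hypothetical, and Locale.US is an assumption: it makes ',' the grouping separator and '.' the decimal point, which is what lets hasNextDouble() accept 87,000 and 88,000.00.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class ScannerNumbers {
    // Collects every whitespace-separated token that Scanner can read as a
    // double. Locale.US (an assumption) fixes ',' as the grouping separator
    // and '.' as the decimal point.
    static List<Double> numbers(String text) {
        List<Double> found = new ArrayList<>();
        try (Scanner sc = new Scanner(text).useLocale(Locale.US)) {
            while (sc.hasNext()) {
                if (sc.hasNextDouble()) {
                    found.add(sc.nextDouble()); // numeric token: parse it
                } else {
                    sc.next();                  // ordinary word: skip it
                }
            }
        }
        return found;
    }
}
```

On "Profits rose from 87,000 to 88,000.00 or 45.43e3 today" it collects 87000.0, 88000.0 and 45430.0, and "one million" yields nothing. The locale choice matters: under Locale.GERMANY '.' is the grouping separator, so "8.302" would be read as 8302. A token with extra punctuation such as "87,000," (trailing comma) is skipped, while the words NaN and Infinity are accepted as doubles. hasNextInt(16) would add hex support but, like parseInt, also accepts words such as DEAD.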
 
Ranch Hand
Posts: 57
Hi

I have written a solution for identifying whether words containing digits are valid numbers. While this solution is simple, it can be extended by adding more regular expressions to the Pattern.

Please let me know in which cases this program would fail and, if possible, how that could be avoided.

cheers
K
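The code from this post did not make it into the thread either. Going only by its description (regular expressions that a token must match), a sketch might look like this. The class PatternListCheck and its five patterns are assumptions based on the formats named earlier in the thread (88000, 88,000, 88,000.00, 45.43e3, 54BA3E), not the poster's code; the hex pattern requires at least one decimal digit so that words like DEAD are not taken as hex.

```java
import java.util.List;
import java.util.regex.Pattern;

class PatternListCheck {
    // One regex per accepted format (an assumed list, based on the thread's examples).
    static final List<Pattern> FORMATS = List.of(
            Pattern.compile("[-+]?\\d+"),                             // 88000
            Pattern.compile("[-+]?\\d{1,3}(,\\d{3})+"),               // 88,000 and 1,000,000
            Pattern.compile("[-+]?(\\d{1,3}(,\\d{3})+|\\d*)\\.\\d+"), // 8.302 and 88,000.00
            Pattern.compile("[-+]?\\d+(\\.\\d+)?[eE][-+]?\\d+"),      // 45.43e3
            Pattern.compile("(?=.*\\d)[0-9A-F]+"));                   // 54BA3E, needs a digit

    // A token is a number if the whole token matches at least one format.
    static boolean isNumber(String token) {
        return FORMATS.stream().anyMatch(p -> p.matcher(token).matches());
    }
}
```

Extending it is a matter of adding another Pattern to FORMATS. Some cases it still gets wrong: a token with trailing punctuation such as "1,000." fails unless the punctuation is stripped first, and an all-digit hex value cannot be told apart from a decimal one.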
 
Em Aiy
Ranch Hand
Posts: 226

Paul Clapham wrote:Okay, so if I encountered the string "two billion" then that would be a number? Or are you only interested in numbers rendered as digits? Is there a limit on the number of digits or would a string of 87 digits be a number?

And what about i (the square root of minus one)? Or e (the base of the natural logarithms)? Or pi?


I only have to detect digits, not words that mean a number.

one million - should not be detected
1,000,000 - should be detected
 
fred rosenberger
lowercase baba
Posts: 13089
Again, giving examples like "THIS is good, THIS is bad" is not a way to write programs. You need to define exactly what is allowed, or what causes a token to be excluded.

First, how will you get the tokens from the string? Is "100 273" the number 100,273, or is it TWO numbers, 100 and 273?

I'm trying to get you to define the rules. Once you have a well-defined set, you can code to them. Your rules might be:

1) Separate tokens based on the space character.
2) Remove all punctuation from each token, except a '.' between two digits.
3) A '-' is optional as the first character, but allowed nowhere else.
4) There is an optional number of digits or characters A-F (are lowercase allowed?)
5) There is an optional decimal point.
6) There is an optional number of digits or characters A-F (are lowercase allowed?)
7) There is an optional character 'e'.
8) If there is an 'e', then there can be an optional number of digits.
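Rules 1 to 8 translate into code almost word for word, which is a quick way to see where they leak. A sketch that makes explicit the assumptions the list leaves open: step 2 also keeps '-' (otherwise rule 3 could never apply), A-F must be uppercase, the exponent marker is a lowercase 'e', and the part before the 'e' must contain at least one digit or A-F so that an empty token or a bare "e4" is rejected. The class name RuleSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RuleSketch {
    // Rules 3-8: optional '-', digits/A-F, optional '.', digits/A-F, optional 'e' + digits.
    // Assumptions: A-F uppercase only, exponent marker lowercase 'e', and the
    // lookahead demands at least one digit or A-F before any 'e'.
    static final Pattern RULES =
            Pattern.compile("-?(?=[0-9A-F.]*[0-9A-F])[0-9A-F]*\\.?[0-9A-F]*(e[0-9]*)?");

    // Rule 2: drop punctuation except '-' and '.', then drop any '.'
    // that does not have a digit on both sides.
    static String clean(String token) {
        String s = token.replaceAll("[\\p{Punct}&&[^.\\-]]", "");
        return s.replaceAll("(?<![0-9])\\.|\\.(?![0-9])", "");
    }

    // Rule 1: split on the space character, then keep cleaned tokens that match.
    static List<String> numbers(String text) {
        List<String> found = new ArrayList<>();
        for (String token : text.split(" ")) {
            String s = clean(token);
            if (RULES.matcher(s).matches()) {
                found.add(s);
            }
        }
        return found;
    }
}
```

On "Sales hit 88,000.00 in 1.738e4 DEAD cases." this returns 88000.00, 1.738e4 and DEAD. That exposes the holes: DEAD counts as a number, so does a capitalised article "A" at the start of a sentence, and "1.738 e4" comes back as the separate token 1.738 while "e4" is rejected.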

Will this work? I don't know. Do you want to allow "1.738 e4"? The above rules would fail to read this as 17380, since there is a space between the 8 and the 'e'.

We can't tell you what the rules should be because we don't know your requirements. You have to tell us that.


 