Parsing a date from arbitrary text.

Basil Bourque
Greenhorn

Joined: Dec 30, 2003
Posts: 7

Is there any robust implementation of a parser for extracting a date from arbitrary text?
By arbitrary, I mean anything a user may type into a text field on an HTML form, for example.
The DateFormat class included with Java parses dates from text, but you must specify the _exact_ format of the text. If I knew the exact format, and the user used it perfectly, I wouldn't need a parser library!
I'm looking for a more robust parser library that can make sense of varied input, especially a locale-aware (i18n) one.
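To illustrate the limitation described above, here is a minimal sketch using `java.text.SimpleDateFormat` (the usual concrete `DateFormat`). The class name `ExactFormatDemo`, the helper `parseOrNull`, and the sample strings are made up for illustration; the pattern is only one of many a form might use.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class ExactFormatDemo {
    // Returns the parsed Date, or null when the text does not match the pattern.
    static Date parseOrNull(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        try {
            return format.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Matches the pattern, so a Date comes back.
        System.out.println(parseOrNull("MM/dd/yyyy", "12/30/2003"));
        // Same day written differently: both calls return null.
        System.out.println(parseOrNull("MM/dd/yyyy", "December 30, 2003"));
        System.out.println(parseOrNull("MM/dd/yyyy", "30.12.2003"));
    }
}
```

The same date typed three ways is accepted only when it happens to match the one pattern the code was given.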
--Basil Bourque
Mark Vedder
Ranch Hand

Joined: Dec 17, 2003
Posts: 624

"Holy un-validated input, Batman!"
Parsing an arbitrary "anything goes" string into a Date object would be pretty hard even if you were limiting it to just a single locale. Throw in I18N and that would be a pretty tall order indeed. Just think of all the possibilities:
  • order of month, day & year
  • abbreviations vs. fully typed month and day names
  • 4 digit year vs. 2 digit year
  • 1 vs. 2 digit month and date
  • different separators ('/' '-' '.' ' ')
  • whether the string represents month & year; month, day & year; or month, day, year & time
  • etc., etc.

(If any math gurus want to calculate the total number of permutations, knock yourself out. Let us know the result.)
Take, for example, the string "200310". Is that:
  • The month of October 2003
  • Oct 1, 2003
  • March 10, 2020
  • March 10, 0020
  • Oct 3, 2020
  • today at 8:03:10pm
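The ambiguity in the list above can be reproduced directly: the sketch below feeds the same six digits to several `SimpleDateFormat` patterns and prints what each one makes of them. The class name `AmbiguousDigitsDemo`, the helper `interpret`, and the choice of patterns are assumptions for illustration; UTC is pinned only so the output does not depend on the machine's time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

class AmbiguousDigitsDemo {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Parses text strictly with one pattern and re-renders the result readably.
    static String interpret(String pattern, String text) {
        SimpleDateFormat parser = new SimpleDateFormat(pattern, Locale.US);
        parser.setLenient(false);
        parser.setTimeZone(UTC);
        SimpleDateFormat printer = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        printer.setTimeZone(UTC);
        try {
            return printer.format(parser.parse(text));
        } catch (ParseException e) {
            return "no match";
        }
    }

    public static void main(String[] args) {
        String text = "200310";
        // yyyyMM reads October 2003; HHmmss reads 20:03:10 on the epoch date.
        // The yy patterns resolve the century relative to the current date.
        for (String pattern : new String[] {"yyyyMM", "yyMMdd", "yyddMM", "HHmmss"}) {
            System.out.println(pattern + " -> " + interpret(pattern, text));
        }
    }
}
```

Every pattern succeeds, and each yields a different answer, so the parser cannot tell which one the user meant without more structure.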


Toss in the ongoing argument between my mother and me as to whether September is abbreviated "Sept" or "Sep" (let alone "Sept.") and the craziness just goes on and on.
IMHO, you really need some kind of structure to the string in order to make parsing it a reasonably surmountable task. And if you are designing the input form, that gives you the opportunity to impose one. It's when you have a raw collection of preexisting data and the data has no common structure or format that things can get very hard and tricky. And even then, it is usually a case of having multiple formats present, not just arbitrary data.
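For the "multiple formats present" case, one common approach is to try a fixed list of patterns in order and take the first that consumes the whole string. This is only a sketch: `MultiFormatParser` and its pattern list are hypothetical, and a real list would have to be chosen from the formats actually seen in the data.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class MultiFormatParser {
    // Hypothetical list of accepted patterns; order matters when patterns overlap.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd", "MM/dd/yyyy", "MMMM d, yyyy", "d MMM yyyy"
    };

    // Returns the first successful parse, or null if no pattern fits.
    static Date parse(String text, Locale locale) {
        String trimmed = text.trim();
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
            format.setLenient(false); // reject impossible dates such as Feb 30
            ParsePosition pos = new ParsePosition(0);
            Date date = format.parse(trimmed, pos);
            // Accept only when the entire input was consumed by this pattern.
            if (date != null && pos.getIndex() == trimmed.length()) {
                return date;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parse("12/30/2003", Locale.US));
        System.out.println(parse("December 30, 2003", Locale.US));
        System.out.println(parse("next Tuesday", Locale.US)); // null
    }
}
```

This does not solve the ambiguity problem (a string like 03/04/2003 still means whatever the first matching pattern says), but it handles a known, finite set of formats.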
Take a look at the JavaAlmanac.com examples "e320. Formatting a Date Using a Custom Format" and "e323. Formatting and Parsing a Date for a Locale" for some guidance and sample code. Also look at the setLenient() method of the DateFormat class in combination with the above examples; note that date parsing is lenient by default. You may be pleasantly surprised how lenient the parser can be. It's not all-knowing, but it does a pretty good job.
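A small sketch of what `setLenient()` changes, assuming a `MM/dd/yyyy` pattern; the class name `LenientDemo` and its helper `parse` are illustrative only. In lenient mode an out-of-range day is rolled forward instead of being rejected.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

class LenientDemo {
    // Parses MM/dd/yyyy text and returns it as yyyy-MM-dd, or "rejected" on failure.
    static String parse(String text, boolean lenient) {
        SimpleDateFormat format = new SimpleDateFormat("MM/dd/yyyy", Locale.US);
        format.setLenient(lenient);
        try {
            return new SimpleDateFormat("yyyy-MM-dd", Locale.US).format(format.parse(text));
        } catch (ParseException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        // Lenient (the default): Feb 30, 2003 rolls over to 2003-03-02.
        System.out.println(parse("02/30/2003", true));
        // Strict: the same input is rejected.
        System.out.println(parse("02/30/2003", false));
    }
}
```

Lenient parsing is forgiving about field values, not about layout: the text still has to follow the pattern's order and separators.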
Personally, I try to avoid using a single "Date" input field/parameter on a user input form. I always use multiple input fields/parameters for Month, Day and Year (and hour & minute if needed), and even then I often use drop-down selections rather than text boxes. Using separate parameters makes validation a lot easier. It also allows for easier I18N in that I can change the order of the fields on the input form as needed for a particular locale. If you do use a single "date" input field, you simply must tell the user what format to use, and then validate that String before passing it on to your DateFormat. (Who among us hasn't entered Feb 30th into a web form just to see if the programmer is using proper validation? And if I am the only one, then maybe I do need to get a life like my dog keeps telling me :roll: )
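With separate Month, Day and Year parameters, the validation step can be very short. The sketch below assumes Java 8 or later for `java.time.LocalDate` (on older JDKs a non-lenient `GregorianCalendar` could play the same role); `DateFields.isValid` is a hypothetical helper, and the three ints stand in for already-parsed form parameters.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

class DateFields {
    // True when year/month/day name a real calendar date.
    static boolean isValid(int year, int month, int day) {
        try {
            LocalDate.of(year, month, day); // throws for Feb 30, month 13, etc.
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(2003, 2, 30)); // false: the Feb 30th test
        System.out.println(isValid(2004, 2, 29)); // true: 2004 is a leap year
        System.out.println(isValid(2003, 13, 1)); // false: no 13th month
    }
}
```

Because the fields arrive separately, there is no ordering or separator ambiguity left to resolve; only the range check remains.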
Those are my thoughts on the subject. Others may have additional comments, or know of something that I am not aware of. I look forward to their opinions as well...
Dirk Schreckmann
Sheriff

Joined: Dec 10, 2001
Posts: 7023

Welcome to JavaRanch, Basil!
I'm moving this to the Other Java APIs forum...

