
How to do regex searches / matches against data from an external file

 
Ranch Hand
Hi Regex & Java Gurus,

I would like to find out whether it is possible, during a regex search in Java, to match patterns against data from an external file. Below is an example of the type of string in Employee.dat:

ID Firstname Surname ………
001 John Smith .………
002 Carmen Brown ……….
….......

The list of first names and surnames I am searching for is becoming too long to include in the regex itself. Is it possible to achieve the following objectives with regex, without splitting each line into individual fields and using multiple regexes:

(i) match first names in the second column against data from an external file (e.g. Patient.dat);
(ii) match surnames in the third column against data from an external file (e.g. Patient.dat).

Keep in mind that the searches must proceed from left to right. Perhaps a lookaround solution is possible, but I am not familiar with how it works.

In general, I have no problem using regex for string pattern searches in Java.

Your assistance would be very much appreciated.

Thanks in advance,

Jack
 
author

The Java regex stuff allows searching for patterns in a buffer. The Java IO stuff allows you to read files into buffers.

So, if you want both, you will have to write the code yourself: code that loads the files into buffers the way you want, and searches them the way you want.


I wouldn't be surprised if there is a library that does it, but in my opinion, it is probably easier to just code it.
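A minimal sketch of what Henry describes, under stated assumptions: the regex engine never reads a file itself, so the program reads the names first and builds the alternation at runtime, quoting each name with `Pattern.quote`. It assumes the names from Patient.dat have already been extracted into two hypothetical files (one first name or surname per line) and that Employee.dat columns are whitespace-separated. `NameLookup`, `alternation`, `employeePattern` and `parse` are illustrative names, not an existing API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class NameLookup {

    // Turns a list of literal names into one non-capturing alternation "(?:a|b|c)".
    static String alternation(Collection<String> names) {
        List<String> cleaned = names.stream()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .distinct()
                // Longest first: when several splits of a line are possible, the engine
                // tries the longest name first and keeps the first one that lets the rest match.
                .sorted(Comparator.comparingInt(String::length).reversed())
                .collect(Collectors.toList());
        if (cleaned.isEmpty()) {
            return "(?!)"; // an empty lookup list should match nothing
        }
        return cleaned.stream()
                .map(Pattern::quote) // characters such as '.' in a name are matched literally
                .collect(Collectors.joining("|", "(?:", ")"));
    }

    // Assumed record layout: "<digits> <first name> <surname> ...", whitespace-separated.
    // Names may contain spaces; backtracking decides where the first name ends.
    static Pattern employeePattern(Collection<String> firstNames, Collection<String> surnames) {
        return Pattern.compile("^(\\d+)\\s+(" + alternation(firstNames) + ")\\s+("
                + alternation(surnames) + ")(?=\\s|$)");
    }

    // Returns {id, firstName, surname}, or null if the line does not match.
    static String[] parse(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical arguments: args[0] = first names, args[1] = surnames
        // (one name per line, extracted from Patient.dat), args[2] = Employee.dat.
        List<String> firstNames = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> surnames = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        Pattern pattern = employeePattern(firstNames, surnames);

        for (String line : Files.readAllLines(Paths.get(args[2]), StandardCharsets.UTF_8)) {
            String[] fields = parse(pattern, line);
            if (fields != null) {
                System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
            }
        }
    }
}
```

Because a quoted name may contain spaces (for example `Sinna Lingam`), the engine's backtracking works out how many words belong to the first name and how many to the surname. Java tries alternatives one after another, so matching cost grows with the size of the name list.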

Henry
 
Ranch Hand
 
Jack Bush
Ranch Hand
Hi Henry,

Thank you for offering your suggestion.

Are you able to provide a simple example of how to search for patterns in a buffer? I can’t seem to find any examples. I am familiar with how Java IO buffering works, but I still cannot picture how the two work together, most likely because I don’t know how pattern searching works with a buffer.

It would be much appreciated if you could provide a little more detail, or an example, of your suggestion.
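To illustrate what searching a buffer means in practice: `java.util.regex.Matcher` works on any `CharSequence`, so a file read into a `String` (or a `StringBuilder` or `CharBuffer`) can be searched directly, and the `MULTILINE` flag lets `^` match at the start of every line in that buffer. This is a minimal sketch assuming whitespace-separated columns and a file small enough to fit in memory; `BufferSearch`, `RECORD` and `findAll` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BufferSearch {

    // MULTILINE makes ^ match at the start of every line inside the buffer,
    // so one Matcher can walk the whole file. [ \t]+ keeps a match on one line.
    static final Pattern RECORD =
            Pattern.compile("^(\\d+)[ \\t]+(\\S+)[ \\t]+(\\S+)", Pattern.MULTILINE);

    // Matcher accepts any CharSequence: a String, StringBuilder or CharBuffer.
    static List<String> findAll(CharSequence buffer) {
        List<String> hits = new ArrayList<>();
        Matcher m = RECORD.matcher(buffer);
        while (m.find()) {
            hits.add(m.group(1) + ":" + m.group(2) + ":" + m.group(3));
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file into memory; only suitable for files that fit in RAM.
        String buffer = new String(Files.readAllBytes(Paths.get("Employee.dat")), StandardCharsets.UTF_8);
        for (String hit : findAll(buffer)) {
            System.out.println(hit);
        }
    }
}
```

The header line `ID Firstname Surname` is skipped because it does not start with digits.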

Thanks again,

Jack
 
Jack Bush
Ranch Hand
Hi Harsha,

Thank you for your response to this post.

Your suggestion would work well if the names were simple, with no exceptions. For instance, it is not uncommon to have two-letter first names or surnames, particularly non-English ones. As a result, I use a combination of quantifiers and lookahead to work out where the names end. However, neither of these regex features is always clear cut, so they cannot be relied on to split the names accurately into their columns for each record.

Consequently, I find regex more flexible for pattern matching, especially for cases that are not clear cut. I gather that regex cannot match a string against data that resides in an external file. Is this true? If not, are you able to provide an example of how it works?

Many thanks again,

Jack
 
Harsha Smith
Ranch Hand
Can you provide us with some uncommon names, so that we can at least try? As far as I know, regex does work well with files.
 
Jack Bush
Ranch Hand
Hi Harsha,

Below are some long and unusual Sri Lankan names as examples:

wijegunawardene, saparamadu;
Sinna Lingam KarupPaiya;
Uda Walawwe Mahim Bandaralage Chanaka Asanga Welegedara;
Warnakulasuriya Patabendige Ushantha Joseph Chaminda Vaas.


Are you able to show how regex matches patterns in the current string / line against data (e.g. names) residing in an external ASCII file such as Patient.dat? I am looking for examples of how Java searches for patterns in a buffer read from a file.

Thanks a lot,

Jack
 
Harsha Smith
Ranch Hand
This is how I usually use regex when working with files. And what separates each column from the next?
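Since the thread later asks what `Scanner` can do, here is a sketch of Scanner-based, line-by-line matching. It is not the code posted in this reply; it assumes whitespace-separated columns, and `ScannerSearch`, `RECORD` and `scan` are hypothetical names. `Scanner` reads its source through an internal buffer, and `findInLine(Pattern)` applies a regex to the current line only, with `match()` returning the groups of that match.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class ScannerSearch {

    // Assumed layout: "<digits> <first name> <surname> ...", whitespace-separated.
    static final Pattern RECORD = Pattern.compile("(\\d+)\\s+(\\S+)\\s+(\\S+)");

    // findInLine only looks up to the next line separator, so each
    // search is confined to the current line of the input.
    static List<String> scan(Scanner in) {
        List<String> hits = new ArrayList<>();
        while (in.hasNextLine()) {
            if (in.findInLine(RECORD) != null) {
                MatchResult r = in.match(); // groups of the findInLine match
                hits.add(r.group(1) + ":" + r.group(2) + ":" + r.group(3));
            }
            if (in.hasNextLine()) {
                in.nextLine(); // skip whatever is left of the current line
            }
        }
        return hits;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Scanner reads the file through its own internal buffer.
        try (Scanner in = new Scanner(new File("Employee.dat"), "UTF-8")) {
            for (String hit : scan(in)) {
                System.out.println(hit);
            }
        }
    }
}
```

The `hasNextLine()` guard before `nextLine()` matters: if the last line has no trailing newline and `findInLine` has already consumed all of it, calling `nextLine()` would throw `NoSuchElementException`.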

 
Jack Bush
Ranch Hand
Hi Harsha,

Thanks for yet another example, but I am beginning to realize that I have not fully explained where my problem lies. So let me try again:

(i) I read the primary file (e.g. Employee.dat) line by line.
(ii) On each line, I search for names from a list that is getting far too long to include in the regex, such as the following:

String regex = "(\\d+) (John|Carmen|.....) (Smith|Brown|........) …..";

(iii) I want to look up the names in Patient.dat from within the regex instead.
(iv) Return true if these names can be found in Patient.dat.

In other words, I am using regex to match patterns between two files: to determine how many words make up the first name, likewise for the surname, and to match both names; it is doing a good job of it. Note that this method differs from the novice approach of directly comparing the same two files: the former reads Patient.dat from within the regex, while the latter reads it with Java I/O.
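For comparison only, here is a sketch of the direct-lookup approach mentioned above. It is a swapped-in technique rather than the regex method described in this post, and it does split each line into words, which the original question hoped to avoid. The names are loaded into `HashSet`s, and for each line every split of the words into first name and surname is tried, longest candidates first. `SplitLookup`, `split` and the three file arguments are hypothetical; names in the lookup files are assumed to have no surrounding whitespace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SplitLookup {

    // Returns {id, firstName, surname} for the first split of the line whose first
    // name and surname both appear in the lookup sets, or null if none does.
    // Assumed layout: "<digits> <first-name words> <surname words> [other columns]".
    static String[] split(String line, Set<String> firstNames, Set<String> surnames) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 3 || !t[0].matches("\\d+")) {
            return null; // header line, blank line, or too few columns
        }
        // First name = t[1..i), surname = t[i..j); longer candidates are tried first.
        for (int i = t.length - 1; i >= 2; i--) {
            String first = String.join(" ", Arrays.copyOfRange(t, 1, i));
            if (!firstNames.contains(first)) {
                continue;
            }
            for (int j = t.length; j > i; j--) {
                String last = String.join(" ", Arrays.copyOfRange(t, i, j));
                if (surnames.contains(last)) {
                    return new String[] {t[0], first, last};
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical arguments: args[0] = first names, args[1] = surnames
        // (one per line, extracted from Patient.dat), args[2] = Employee.dat.
        Set<String> firstNames = new HashSet<>(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        Set<String> surnames = new HashSet<>(Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8));
        for (String line : Files.readAllLines(Paths.get(args[2]), StandardCharsets.UTF_8)) {
            String[] fields = split(line, firstNames, surnames);
            if (fields != null) {
                System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
            }
        }
    }
}
```

Each candidate split costs one hash lookup, so the work per line depends on the number of words in the line rather than on the number of names in Patient.dat.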

By the way, what is the difference between the two examples you provided? Can the Scanner class read the secondary file, Patient.dat, into a buffer so that its contents can be used by the regex as a lookup while matching patterns in the primary file, Employee.dat? Likewise, are those two files cached / buffered during pattern matching?

You seem to show only a one-dimensional, hardcoded regex matching content from a single input file, whereas I am looking for a dynamic regex capable of looking up a secondary file (Patient.dat) to match names in the primary file (Employee.dat).

Have I explained myself more clearly this time?

Many thanks for your patience and persistence to help someone in need.

Jack
 
Harsha Smith
Ranch Hand
As of now, I don't have a solution for your problem. Sorry.

But I hope senior members of the forum will provide you with a good solution. Happy weekend!
 
Jack Bush
Ranch Hand
Sounds like you understand what I am on about, Harsha. That is progress in itself.

I could do with feedback on whether this approach is the right way to go, or whether there is a better way to achieve the same objective.

Thanks all the same,

Jack
 