How to do regex searches / matches data from external file

Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235
Hi Regex & Java Gurus,

I would like to find out whether it is possible to match patterns against data from an external file during a regex search in Java. Below is an example of the type of data in Employee.dat:

ID Firstname Surname ………
001 John Smith .………
002 Carmen Brown ……….
….......

I am searching for a list of firstnames and surnames which is becoming too long to be included in the regex itself. Is it therefore possible to achieve the following objectives with a single regex, without splitting the line into individual fields and using multiple regexes:

( i ) search for firstnames in second column against data from external file (e.g. Patient.dat).
( ii ) search for surnames from third column against data from external file (e.g. Patient.dat).

Keep in mind that the order of searches must be from left to right. Perhaps a lookaround solution is possible, but I am not familiar with how it works.

I have no problem using regex for string pattern searches in Java in general.

Your assistance would be very much appreciated.

Thanks in advance,

Jack
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18108
    


The Java regex stuff allows the searching of patterns in a buffer. The Java IO stuff allows you to read files into buffers.

So, if you want both, you will have to write the code yourself: code that loads files into buffers, how you want it, and searches them, how you want it.


I wouldn't be surprised if there is a library that does it, but in my opinion, it is probably easier to just code it.
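
As a rough illustration of the two halves described above, here is a minimal sketch that reads a file into an in-memory buffer and then searches that buffer with a regex. The class name BufferSearch, the helper findAll, the sample pattern and the temporary stand-in for Employee.dat are all assumptions made for the example, not anything prescribed by the Java API or by this thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BufferSearch {

    // Returns every substring of the buffer that the pattern matches, in order.
    static List<String> findAll(CharSequence buffer, Pattern pattern) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(buffer);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for Employee.dat so the sketch runs on its own.
        Path file = Files.createTempFile("Employee", ".dat");
        Files.write(file, Arrays.asList(
                "ID Firstname Surname",
                "001 John Smith",
                "002 Carmen Brown"), StandardCharsets.UTF_8);

        // Java IO half: load the whole file into one in-memory buffer.
        String buffer = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);

        // Regex half: search the buffer. MULTILINE lets ^ match at each line start.
        Pattern idAndFirstName = Pattern.compile("^\\d+ \\w+", Pattern.MULTILINE);
        for (String hit : findAll(buffer, idAndFirstName)) {
            System.out.println(hit);
        }

        Files.delete(file);
    }
}
```

A Matcher works on any CharSequence, so the same findAll call could be applied per line instead of to the whole file if the file is too large to hold in memory.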

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235
Hi Henry,

Thank you for offering your suggestion.

Are you able to provide a simple example of how to search for patterns in a buffer? I can’t seem to find any examples around. I am familiar with how Java IO buffering works, but still cannot picture how the two functionalities work together, most likely because I don’t know how pattern searching works with buffering.

It would be much appreciated if you could provide a little more detail or an example that illustrates your suggestion.

Thanks again,

Jack
Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235
Hi Harsha,

Thank you for your response to this post.

Your suggestion would work well assuming that the names are simple, without exceptions. For instance, it is not uncommon to have two-letter first or last names, particularly for non-English names. As a result, I use a combination of quantifiers and lookahead to work out where the names end. However, neither of these regex features is always clear cut, and hence they cannot be used to accurately split up the names into their columns for each record.

Consequently, I find regex to be more flexible for pattern matching, especially when it comes to examples that are not clear cut. I gather that regex doesn’t provide pattern matching between the string being matched and data that resides in an external file? Is this true? Otherwise, are you able to provide an example of how it works?

Many thanks again,

Jack
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
Can you provide us with some uncommon names, so that we can at least try? As far as I know, regex does work well with files.
Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235
Hi Harsha,

Below are some long and unusual Sri Lankan names as an example:

wijegunawardene, saparamadu;
Sinna Lingam KarupPaiya;
Uda Walawwe Mahim Bandaralage Chanaka Asanga Welegedara;
Warnakulasuriya Patabendige Ushantha Joseph Chaminda Vaas.


Are you able to show how regex matches patterns from the current string / line against data (e.g. names) that resides in an external ASCII file such as Patient.dat? I am looking for examples of how Java searches for patterns in a buffer that comes from a file.

Thanks a lot,

Jack
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
This is how I usually use regex when working with files. And what separates each column from another?

Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235
Hi Harsha,

Thanks for yet another example, but I am beginning to realize that I have not fully explained where my problem lies. So let’s try it again:

( i ) I am reading the primary file line by line, e.g. Employee.dat.
( ii ) On each line, I search for a long list of names, which is getting far too long to be included in the regex, such as the following:

String regex = "(\\d)+ (\\w(?:John|Carmen|.....))+ (\\w(?:Smith|Brown|........)) …..

( iii ) Look up the names in Patient.dat from within regex instead.
( iv ) Return true if these names can be found in Patient.dat.
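
One way the steps above could be approximated, offered only as a sketch: Java regex has no construct that reads a file during matching, but the alternation can be built at run time from the lookup file before the pattern is compiled. The class NameLookupRegex, its helper methods, and above all the Patient.dat layout used here (one "firstname,surname" pair per line) are assumptions for illustration; the thread does not say how Patient.dat is laid out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class NameLookupRegex {

    // Joins the names into one alternation. Pattern.quote keeps any regex
    // metacharacters inside a name literal.
    static String alternation(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) continue;
            if (sb.length() > 0) sb.append('|');
            sb.append(Pattern.quote(trimmed));
        }
        if (sb.length() == 0) {
            throw new IllegalArgumentException("no names to match against");
        }
        return sb.toString();
    }

    // Matches left to right: an ID, then a known first name, then a known surname.
    static Pattern buildPattern(List<String> firstNames, List<String> surnames) {
        return Pattern.compile(
                "^(\\d+) (" + alternation(firstNames) + ") (" + alternation(surnames) + ")\\b");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for Patient.dat: "firstname,surname" per line (assumed layout).
        Path patients = Files.createTempFile("Patient", ".dat");
        Files.write(patients, Arrays.asList("John,Smith", "Carmen,Brown"), StandardCharsets.UTF_8);

        List<String> firstNames = new ArrayList<>();
        List<String> surnames = new ArrayList<>();
        for (String line : Files.readAllLines(patients, StandardCharsets.UTF_8)) {
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                firstNames.add(parts[0]);
                surnames.add(parts[1]);
            }
        }

        // Compile once, then reuse the pattern for every line of the primary file.
        Pattern record = buildPattern(firstNames, surnames);
        for (String line : Arrays.asList("001 John Smith", "002 Carmen Brown", "003 Alice Jones")) {
            System.out.println(line + " -> " + record.matcher(line).find());
        }

        Files.delete(patients);
    }
}
```

Note that this checks the two columns independently, so a line such as "001 John Brown" would also match. Because the alternatives are quoted literals, multi-word names containing spaces can be listed as they are.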

In other words, I am matching patterns between 2 files, using regex to determine how many words constitute the firstname, and likewise the surname, as well as matching both names, and it is doing a good job of it. Note that this method is different from the novice approach of direct comparison between the same 2 files. The former approach reads Patient.dat from within regex while the latter reads it through Java I/O.

Btw, what is the difference between the two examples you have provided? Does the Scanner class provide the capability of reading the secondary Patient.dat into a buffer, so that it could be used by regex as a lookup while matching patterns from the primary Employee.dat? Likewise, are those 2 files cached / buffered during pattern matching?
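
On the Scanner question, a minimal sketch of one possibility: Scanner can read the secondary file line by line into an in-memory collection, which is then consulted after the regex has captured the columns. The class ScannerLookup, its methods, the one-name-per-line layout of Patient.dat and the single-word name pattern are all assumptions made for this example. It does not handle the multi-word names discussed earlier.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScannerLookup {

    // Assumed record shape: ID, single-word first name, single-word surname.
    private static final Pattern RECORD = Pattern.compile("^(\\d+) (\\w+) (\\w+)");

    // Reads every non-blank line of the file into a set
    // (hypothetical layout: one name per line).
    static Set<String> loadNames(Path file) throws IOException {
        Set<String> names = new HashSet<>();
        try (Scanner in = new Scanner(file, "UTF-8")) {
            while (in.hasNextLine()) {
                String line = in.nextLine().trim();
                if (!line.isEmpty()) {
                    names.add(line);
                }
            }
        }
        return names;
    }

    // The regex captures the columns; the set lookup decides whether both names are known.
    static boolean isKnown(String line, Set<String> names) {
        Matcher m = RECORD.matcher(line);
        return m.find() && names.contains(m.group(2)) && names.contains(m.group(3));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for Patient.dat so the sketch runs on its own.
        Path patients = Files.createTempFile("Patient", ".dat");
        Files.write(patients, Arrays.asList("John", "Smith", "Carmen", "Brown"),
                StandardCharsets.UTF_8);

        // The lookup file is read once and kept in memory for all later lines.
        Set<String> names = loadNames(patients);
        for (String line : Arrays.asList("001 John Smith", "003 Alice Jones")) {
            System.out.println(line + " -> " + isKnown(line, names));
        }

        Files.delete(patients);
    }
}
```

Nothing is cached automatically here; the names stay in memory only because the program holds the set.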

You seem to only show a one-dimensional hardcoded regex matching content from a single input file, while I am looking for a dynamic regex capable of secondary file lookups (Patient.dat) to match patterns against names in the primary file (Employee.dat).

Have I explained myself more clearly this time?

Many thanks for your patience and persistence in helping someone in need.

Jack
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
As of now, I don't have a solution for your problem. Sorry.

But I hope senior members of the forum will provide you with a good solution. Happy Weekend!
Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235
Sounds like you understand what I am on about, Harsha; that is progress in itself.

I could do with feedback on whether this approach is the right way to go, or whether there is a better way of achieving the same objective.

Thanks all the same,

Jack
 