
[Solved] Help With Regular Expressions/Scraping

Brandon Golway
Greenhorn

Joined: Oct 19, 2010
Posts: 23
I'm trying to extract the text inside the <td> tags (such as "West Orange, NJ", "Saint Barnabas Health Care System", and "Manager Field Services North") and the contents of the href attribute from data scraped by a PHP script. The script itself works; I just don't know how to formulate the expressions.
Here's a sample of HTML that contains the job info:




This is what I've tried: $location= '/location (.+?)/'; but it just gives back array(2) { [0]=> string(10) "location j" [1]=> string(1) "j" } j


Here's the scraper too in case you need to see that: curl_scraper.php
Thanks.
Brandon Golway
Greenhorn

Joined: Oct 19, 2010
Posts: 23
I got it to output "West Orange, NJ" using this expression: $regex_location= '/<td class=\"location\">(.+?)<\/td>/';

There's more data in there, since I get array(2) { [0]=> string(41) "<td class="location">West Orange, NJ</td>" [1]=> string(15) "West Orange, NJ" } when I do var_dump($scraped_location_data), but I don't know how to access it.
g tsuji
Ranch Hand

Joined: Jan 18, 2011
Posts: 507
    
>There's more data in there, since I get array(2) { [0]=> string(41) "<td class="location">West Orange, NJ</td>" [1]=> string(15) "West Orange, NJ" } when I do var_dump($scraped_location_data), but I don't know how to access it.
That is the usual return structure of the matches argument. It results from the pattern containing one pair of parentheses, which forms a capturing group (backreference); in this case, it is the "(.+?)" part of the pattern. Index 0 of the array holds the full match, tags included, and index 1 holds the text captured by the group. Accessing it is that simple, unless you have something more sophisticated in mind.
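A minimal sketch of reading the captured group, assuming the scraped page contains a row like the one below. The $html string is a hypothetical stand-in, since the original sample is not shown in the thread; the pattern and the variable name $scraped_location_data are taken from the posts above.

```php
<?php
// Hypothetical sample row standing in for the scraped page.
$html = '<tr><td class="location">West Orange, NJ</td></tr>';

// Pattern as posted above; the single pair of parentheses is capturing group 1.
$regex_location = '/<td class=\"location\">(.+?)<\/td>/';

$scraped_location_data = [];
if (preg_match($regex_location, $html, $scraped_location_data)) {
    // [0] is the full match including the <td> tags, [1] is the captured text.
    $location = $scraped_location_data[1];
    echo $location, "\n";
}
```

If the pattern had a second pair of parentheses, its text would appear at index 2, and so on.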
Brandon Golway
Greenhorn

Joined: Oct 19, 2010
Posts: 23
Got it to work. All I was missing was the _all suffix: I needed preg_match_all instead of preg_match, which stops after the first match.
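For anyone finding this later, here is a rough sketch of the preg_match_all version. The two-row $html sample, its second row, and the href values are made up for illustration, and the href pattern is my own assumption about the markup, not something from the original page.

```php
<?php
// Hypothetical two-row sample; only the "location" class comes from the thread.
$html = <<<HTML
<tr><td class="location">West Orange, NJ</td><td><a href="/jobs/1">Manager Field Services North</a></td></tr>
<tr><td class="location">Newark, NJ</td><td><a href="/jobs/2">Staff Nurse</a></td></tr>
HTML;

// preg_match_all collects every match instead of stopping at the first one.
preg_match_all('/<td class="location">(.+?)<\/td>/', $html, $locations);
preg_match_all('/<a href="([^"]*)"/', $html, $links);

// With the default PREG_PATTERN_ORDER flag, index 1 lists every group-1 capture.
print_r($locations[1]);
print_r($links[1]);
```

With preg_match_all, index 0 is an array of all full matches and index 1 is an array of all group-1 captures, so a foreach over $locations[1] walks every location on the page.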

 