
[Solved] Help With Regular Expressions/Scraping

 
Brandon Golway
Greenhorn
Posts: 23
I'm trying to extract the text inside the <td> tags (such as "West Orange, NJ", "Saint Barnabas Health Care System", and "Manager Field Services North") and the contents of the href attribute from data scraped by a PHP script. The script itself works; I just don't know how to write the regular expressions.
Here's a sample of HTML that contains the job info:




This is what I've tried: $location = '/location (.+?)/'; but it just gives back array(2) { [0]=> string(10) "location j" [1]=> string(1) "j" } j


Here's the scraper too in case you need to see that: curl_scraper.php
Thanks.
 
Brandon Golway
Greenhorn
Posts: 23
I got it to output "West Orange, NJ" using this expression: $regex_location= '/<td class=\"location\">(.+?)<\/td>/';

There's more data in there since I get array(2) { [0]=> string(41) "West Orange, NJ" [1]=> string(15) "West Orange, NJ" } when I do var_dump($scraped_location_data) but I don't know how to access it.
 
g tsuji
Ranch Hand
Posts: 656
>There's more data in there since I get array(2) { [0]=> string(41) "West Orange, NJ" [1]=> string(15) "West Orange, NJ" } when I do var_dump($scraped_location_data) but I don't know how to access it.
That is not an unusual return structure for the matches argument. It results from the pattern containing one pair of parentheses, which forms a capturing group; in this case, it is the "(.+?)" part of the pattern. Element [0] holds the text matched by the whole pattern, and element [1] holds the text captured by the group. Accessing it is that simple, unless you have something more sophisticated in mind.
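To make the structure of the matches array concrete, here is a minimal sketch. The one-cell $html string is a hypothetical stand-in, since the real sample HTML is not shown in this thread; the pattern and the variable names are the ones from the posts above. It also suggests why var_dump reports string(41) for element [0]: that element holds the full match including the tags, which a browser would render invisibly.

```php
<?php
// Hypothetical one-cell sample; the thread's actual HTML is not shown.
$html = '<td class="location">West Orange, NJ</td>';

// Pattern from the post above, with a single capturing group: (.+?)
$regex_location = '/<td class=\"location\">(.+?)<\/td>/';

if (preg_match($regex_location, $html, $scraped_location_data)) {
    // [0] is the whole match, tags included (41 characters here).
    echo strlen($scraped_location_data[0]), "\n";
    // [1] is only the text captured by the parentheses.
    echo $scraped_location_data[1], "\n";
}
```

With this sample, the script prints 41 and then West Orange, NJ.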
 
Brandon Golway
Greenhorn
Posts: 23
Got it to work. All I was missing was the _all: I needed preg_match_all instead of preg_match.
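For anyone finding this later, a rough sketch of the difference: preg_match stops at the first match, while preg_match_all collects every match. The two-row $html below is a hypothetical sample (the "Newark, NJ" row, the "title" class, and the /jobs/ URLs are made up for illustration), and the href pattern is an assumption about how the links are quoted, not the expression actually used in the scraper.

```php
<?php
// Hypothetical rows; only the "location" class comes from the thread.
$html = <<<HTML
<tr><td class="title"><a href="/jobs/1">Manager Field Services North</a></td>
<td class="location">West Orange, NJ</td></tr>
<tr><td class="title"><a href="/jobs/2">Field Technician</a></td>
<td class="location">Newark, NJ</td></tr>
HTML;

// With the default PREG_PATTERN_ORDER flag, $locations[1] is an array
// holding the first capturing group of every match.
preg_match_all('/<td class="location">(.+?)<\/td>/', $html, $locations);

// Assumes double-quoted href values; captures everything between the quotes.
preg_match_all('/href="([^"]*)"/', $html, $links);

print_r($locations[1]); // every captured location
print_r($links[1]);     // every captured href value
```

Here $locations[1] contains both locations and $links[1] contains both href values, in document order.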

 