Shoutbox

[php] Get Meta Data From Page

[php] Get Meta Data From Page by philo23 on 10-15-2005 at 03:14 PM

I would like to know how to get the <link> and <meta> data from pages using PHP. I am making a small blog search engine and would like to index the RSS feed linked from each page. Hope there are a few PHP gurus out there.


RE: [php] Get Meta Data From Page by -dt- on 10-15-2005 at 03:42 PM

Err, look up regular expressions.
Here's a good guide:
http://sitescooper.org/tao_regexps.html

Then use preg_match
with something like /<link [^>]*href="([^"]+)"[^>]*>/i
to match the link and have it returned.
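As a rough sketch of the suggestion above, the snippet below runs preg_match against a small hard-coded HTML string and reads the captured href from the third argument. The sample markup, the example.com URL and the helper name first_link_href are made up for illustration, and the pattern is a loosened variant that tolerates attributes after href. A regular expression is only an approximation of real HTML parsing.

```php
<?php
// Hypothetical helper: returns the href of the first <link> tag, or null.
function first_link_href($html) {
    // Capture group 1 is the href value; [^>]* skips any other attributes.
    $pattern = '/<link\s[^>]*href="([^"]+)"[^>]*>/i';
    if (preg_match($pattern, $html, $matches)) {
        return $matches[1];
    }
    return null;
}

// Made-up sample page head.
$html = '<head><link rel="alternate" type="application/rss+xml" href="http://example.com/feed.xml" /></head>';
echo first_link_href($html), "\n";
```

preg_match stops at the first match, so $matches only ever describes one tag; preg_match_all is the variant that collects every match.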


RE: [php] Get Meta Data From Page by philo23 on 10-15-2005 at 03:59 PM

code:
$filename = $url;
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
fclose($handle);
if (preg_match('<link (?:[^\>]*)href="(.+)">', $content)) {
    $i=0;
    $rss = array();
    while (preg_match('<link (?:[^\>]*)href="(.+)">', $content)) {
        $url = explode('<link (?:[^\>]*)href=', $content);
        $rss[$i] = $url[0];
        $i++;
    }
    $amount = count($rss);
    $i=0;
    while ($amount > $i) {
        $rssquery = "INSERT INTO rss_link (id, site_id, url) VALUES ('', '$site_id', '$rss[$i]')";
        mysql_query($rssquery);
    }
}

I am guessing here, this is straight off the top of my head. Might this work if $url is the URL of the page I am "indexing"?


RE: [php] Get Meta Data From Page by -dt- on 10-15-2005 at 04:40 PM

Never ever use fopen for files outside your web server. (I know that at least my host and my own web server have the URL wrapper, allow_url_fopen, turned OFF, so you can't even do that anyway.)
Use the cURL library instead (which is way better), like this:

code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$foo = curl_exec($ch);
curl_close($ch);

and $foo will be the downloaded data
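For reuse, the same calls could be wrapped in a small function that also reports failure. This is a minimal sketch, assuming the cURL extension is installed; the function name fetch_page and the 10-second default timeout are made up for illustration, while CURLOPT_FOLLOWLOCATION and CURLOPT_TIMEOUT are standard cURL options that the snippet above does not set.

```php
<?php
// Hypothetical wrapper around the cURL calls: returns the body, or null on failure.
function fetch_page($url, $timeoutSeconds = 10) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);         // body only, no response headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the data instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    $data = curl_exec($ch);                          // false when the transfer fails
    // The handle is released when $ch goes out of scope.
    return $data === false ? null : $data;
}
```

A call such as $foo = fetch_page($url); then either holds the page or null, so the caller can skip sites that could not be downloaded.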

And did you even read what preg_match does? The third parameter is a variable which will contain the matches, so why not check that and then loop through it? :-/

And I don't even get why you're exploding the URL by the regexp?

And your last while is a never-ending loop because you never increase $i.
A for loop is a lot better for looping than that last while.
http://php.net/for
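Putting those points together, one possible rewrite collects every match with preg_match_all, walks the results with a for loop, and inserts them through a prepared statement so the URLs are not pasted into the SQL string. This is only a sketch under assumptions: the function names are hypothetical, the regular expression is one guess at a workable pattern, and the PDO handle is assumed to be connected to a database that has the rss_link table, where id is filled in automatically.

```php
<?php
// Hypothetical helper: returns the href of every <link> tag in $html.
function extract_link_hrefs($html) {
    preg_match_all('/<link\s[^>]*href="([^"]+)"[^>]*>/i', $html, $matches);
    return $matches[1]; // capture group 1 of every match
}

// Hypothetical helper: stores each URL for one site and returns the row count.
// Assumes $db is a connected PDO handle and rss_link.id is auto-generated.
function store_links(PDO $db, $siteId, array $links) {
    $stmt = $db->prepare('INSERT INTO rss_link (site_id, url) VALUES (?, ?)');
    $amount = count($links);
    for ($i = 0; $i < $amount; $i++) { // $i advances on every pass, so the loop ends
        $stmt->execute(array($siteId, $links[$i]));
    }
    return $amount;
}
```

With the page source in $foo, the two steps would be chained as store_links($db, $site_id, extract_link_hrefs($foo));.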


RE: [php] Get Meta Data From Page by philo23 on 10-15-2005 at 04:42 PM

Oops, lol, missed the $i++;
What is this cURL?


RE: [php] Get Meta Data From Page by J-Thread on 10-15-2005 at 06:51 PM

The cURL library isn't part of the standard PHP install as far as I know...

Why not use file_get_contents? I don't know of any servers that have disabled the fopen wrappers...
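A minimal sketch of that approach, which checks the allow_url_fopen setting before trying a remote URL, might look like the following. The function name fetch_with_wrapper is hypothetical, and treating only http, https and ftp addresses as remote is a simplifying assumption.

```php
<?php
// Hypothetical helper: fetches $url with file_get_contents, or returns null
// when the URL wrappers are disabled or the read fails.
function fetch_with_wrapper($url) {
    $isRemote = (bool) preg_match('#^(https?|ftp)://#i', $url);
    $wrappersOn = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
    if ($isRemote && !$wrappersOn) {
        return null; // the host has turned the URL wrappers off
    }
    $data = @file_get_contents($url); // @ hides the warning emitted on failure
    return $data === false ? null : $data;
}
```

A null result tells the caller to fall back to another method.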

Otherwise you can use:

code:
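function get_page($host, $path, $get) {
   // Note: $get is accepted but not used when building the request.
   // HTTP/1.0 keeps the server from replying with chunked encoding,
   // which the simple reader below would not decode.
   $out = "GET /$path HTTP/1.0\r\nHost: $host\r\nConnection: Close\r\n\r\n";
   
   $fp = fsockopen($host, 80, $errno, $errstr, 30);
   if (!$fp) {
       return false; // connection failed; see $errno and $errstr
   }
   $in = "";
   
   fwrite($fp, $out);
   $body = false;
   while (!feof($fp)) {
       $s = fgets($fp, 1024);
       if ( $s === false )
           break; // read error or end of stream
       if ( $body )
           $in .= $s;
       elseif ( $s == "\r\n" ) // a blank line ends the headers
           $body = true;
   }
   
   fclose($fp);
   
   return $in;
}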
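Since get_page takes the host and the path separately, a full URL has to be split first. The helper below is a hedged sketch of one way to do that with parse_url; the name split_url and the example URL are made up for illustration.

```php
<?php
// Hypothetical helper: splits a URL into the host and path arguments
// that get_page() expects (the path without its leading slash).
function split_url($url) {
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null; // not an absolute URL
    }
    $path = isset($parts['path']) ? ltrim($parts['path'], '/') : '';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query']; // keep the query string with the path
    }
    return array($parts['host'], $path);
}

print_r(split_url('http://example.com/blog/index.php?page=2'));
```

The result can then be handed on with list($host, $path) = split_url($url); followed by get_page($host, $path, '');.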