[php] Get Meta Data From Page
philo23
Junior Member
PHP Mad Head

Posts: 18
33 / Male / –
Joined: Oct 2005
O.P. [php] Get Meta Data From Page
I would like to know how to get the <link> and <meta> data from pages using PHP. I am making a small blog search engine and would like to index the RSS feed from each page. I hope there are a few PHP gurus out there.
10-15-2005 03:14 PM
-dt-
Scripting Contest Winner
;o

Posts: 1819
Reputation: 74
36 / Male / Flag
Joined: Mar 2004
RE: [php] Get Meta Data From Page
Err, look up regular expressions.
Here's a good guide:
http://sitescooper.org/tao_regexps.html

Then use preg_match
with something like /<link (?:[^\>]*)href="(.+)">/
to match the link and have it returned.
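
As a rough sketch of that suggestion: preg_match() fills its third argument with the captured groups, so the href can be read from index 1. The sample HTML is made up for illustration, and the pattern is a slightly stricter variant (an assumption, not the one posted above) that stops at the closing quote instead of using (.+).

```php
<?php
// Hypothetical sample page; a real script would download this first.
$html = '<html><head>'
      . '<link rel="alternate" type="application/rss+xml" href="http://example.com/feed.xml">'
      . '</head><body></body></html>';

// [^"]+ stops at the closing quote, so trailing attributes are not swallowed.
$pattern = '/<link\s[^>]*href="([^"]+)"/i';

$href = null;
if (preg_match($pattern, $html, $matches)) {
    // $matches[0] is the whole match, $matches[1] the first capture group.
    $href = $matches[1];
}

echo $href === null ? "no link found\n" : $href . "\n";
```

A regular expression is fine for a quick scrape like this, but it can miss tags that use single quotes or unusual attribute spacing.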
10-15-2005 03:42 PM
philo23
O.P. RE: [php] Get Meta Data From Page
code:
$filename = $url;
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
fclose($handle);
if (preg_match('<link (?:[^\>]*)href="(.+)">', $contents)) {
    $i=0;
    $rss = array();
    while (preg_match('<link (?:[^\>]*)href="(.+)">', $contents)) {
        $url = explode('<link (?:[^\>]*)href=', $contents);
        $rss[$i] = $url[0];
        $i++;
    }
    $amount = count($rss);
    $i=0;
    while ($amount > $i) {
        $rssquery = "INSERT INTO rss_link (id, site_id, url) VALUES ('', '$site_id', '$rss[$i]')";
        mysql_query($rssquery);
    }
}

I'm guessing here, this is straight off the top of my head. Would this work if $url is the URL of the page I am "indexing"?
10-15-2005 03:59 PM
-dt-
RE: [php] Get Meta Data From Page
Never, ever use fopen for files outside your web server. (I know at least my host and my own web server have the URL wrapper turned off, so you can't even do that anyway.)
Use the cURL library instead, which is way better:
code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$foo = curl_exec($ch);
curl_close($ch);

$foo will then hold the downloaded data.

Also, did you even read what preg_match does? The third parameter is a variable which will contain the matches, so why not check that and then loop through it? :-/

I also don't get why you're exploding the URL by the regexp.

And your last while is a never-ending loop because you never increase $i.
A for loop is a lot better for looping than that last while:
http://php.net/for
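
To make those points concrete, here is a minimal sketch that collects every RSS link with preg_match_all() and walks the result with a for loop. The page source, the patterns, and the filter on application/rss+xml are assumptions for illustration; the string stands in for the downloaded data ($foo above).

```php
<?php
// Hypothetical page source standing in for the downloaded data ($foo above).
$foo = '<head>'
     . '<link rel="stylesheet" href="/style.css">'
     . '<link rel="alternate" type="application/rss+xml" href="/feed.xml">'
     . '<link rel="alternate" type="application/rss+xml" href="/comments.xml">'
     . '</head>';

// Match every <link ...> tag; PREG_SET_ORDER gives one array per tag.
preg_match_all('/<link\s[^>]*>/i', $foo, $tags, PREG_SET_ORDER);

$rss = array();
for ($i = 0; $i < count($tags); $i++) {   // $i++ guarantees the loop ends
    $tag = $tags[$i][0];
    // Keep only tags that advertise an RSS feed and carry an href.
    if (stripos($tag, 'application/rss+xml') !== false
        && preg_match('/href="([^"]+)"/i', $tag, $m)) {
        $rss[] = $m[1];
    }
}

print_r($rss);
```

Here the stylesheet link is skipped and only the two feed URLs end up in $rss, ready to be inserted into the database one row at a time.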
10-15-2005 04:40 PM
philo23
O.P. RE: [php] Get Meta Data From Page
Oops, lol, I missed the $i++;
What is this curl?
10-15-2005 04:42 PM
J-Thread
Full Member
Posts: 467
Reputation: 8
– / Male / –
Joined: Jul 2004
RE: [php] Get Meta Data From Page
The cURL library isn't part of the standard PHP install as far as I know...

Why not use file_get_contents? I don't know of any servers that have disabled the fopen wrappers...

Otherwise you can use:

code:
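function get_page($host, $path, $get) {
    // Note: $get is not used by this function.
    // HTTP/1.0 keeps the server from replying with chunked encoding,
    // which this simple reader does not decode.
    $out = "GET /$path HTTP/1.0\r\nHost: $host\r\nConnection: Close\r\n\r\n";

    $fp = fsockopen($host, 80, $errno, $errstr, 30);
    if (!$fp) {
        return false; // connection failed; see $errno and $errstr
    }

    fwrite($fp, $out);
    $in = "";
    $body = false;
    while (!feof($fp)) {
        $s = fgets($fp, 1024);
        if ($s === false) {
            break;
        }
        if ($body) {
            $in .= $s;
        } elseif ($s == "\r\n") {
            // The first blank line marks the end of the headers.
            $body = true;
        }
    }

    fclose($fp);

    return $in;
}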
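
The loop above drops everything up to the first blank line, which is where the HTTP headers end. A minimal sketch of the same split on a canned response string, so it can be tried without a network connection; split_http_response and the sample response are hypothetical names and data, not part of the function above.

```php
<?php
// Split a raw HTTP response into its header block and body.
// Returns array($headers, $body); $body is "" if no blank line is found.
function split_http_response($raw) {
    $parts = explode("\r\n\r\n", $raw, 2);
    $headers = $parts[0];
    $body = isset($parts[1]) ? $parts[1] : "";
    return array($headers, $body);
}

// Hypothetical canned response, so no network access is needed.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>";
list($headers, $body) = split_http_response($raw);

echo $body, "\n";
```

Limiting explode() to two parts keeps any later blank lines inside the body intact.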

This post was edited on 10-15-2005 at 06:52 PM by J-Thread.
10-15-2005 06:51 PM