get_meta_tags

(PHP 4, PHP 5, PHP 7, PHP 8)

get_meta_tags — 从一个文件中提取所有的 meta 标签 content 属性，返回一个数组

说明

get_meta_tags ( string $filename , bool $use_include_path = false ) : array

打开 filename 逐行解析文件中的 <meta> 标签。解析工作将在 </head> 处停止。

参数

filename

HTML 文件的路径字符串。此参数可以是本地文件也可以是一个 URL。

Example #1 get_meta_tags() 解析了什么

<meta name="author" content="name">
<meta name="keywords" content="php documentation">
<meta name="DESCRIPTION" content="a php manual">
<meta name="geo.position" content="49.33;-86.59">
</head> <!-- 解析工作在此处停止 -->

（注意回车换行－ PHP 使用一个本地函数来解析输入，所以 Mac 上的文件将不能在 Unix 上正常工作）。

use_include_path

将 use_include_path 设置为 true 将使 PHP 尝试按照 include_path 标准包含路径中的每个指向去打开文件。这只用于本地文件，不适用于 URL。

返回值

返回一个数组，包含所有解析过的 meta 标签。

返回的关联数组以属性 name 的值作为键，属性 content 的值作为值，所以你可以很容易地使用标准数组函数遍历此关联数组或访问某个值。属性 name 中的特殊字符将使用'_'替换，而其它字符则转换成小写。如果有两个 meta 标签拥有相同的 name，则只返回最后出现的那一个。

更新日志

版本	说明
4.0.5	开始支持没有使用引号括起来的 HTML 属性。

范例

Example #2 get_meta_tags() 返回什么些


<?php
// 假设上边的标签是在 www.example.com 中
$tags = get_meta_tags('http://www.example.com/');

// 注意所有的键（key）均为小写，而键中的'.'则转换成了'_'。
echo $tags['author'];       // name
echo $tags['keywords'];     // php documentation
echo $tags['description'];  // a php manual
echo $tags['geo_position']; // 49.33;-86.59
?>

注释

Note:
只有包含 name 属性的 meta 标签才会被解析。

参见

htmlentities() - 将字符转换为 HTML 转义字符
urlencode() - 编码 URL 字符串

User Contributed Notes

bobble bubble 05-Aug-2015 08:07


This regex gets meta tags independent of sequence by capturing inside a lookahead.

Further uses the branch reset feature for different quote styles of values.

The pattern can be tested here: https://regex101.com/r/oE4oU9/1



<?PHP



function getMetaTags($str)

{

  $pattern = '

  ~<\s*meta\s



  # using lookahead to capture type to $1

    (?=[^>]*?

    \b(?:name|property|http-equiv)\s*=\s*

    (?|"\s*([^"]*?)\s*"|\'\s*([^\']*?)\s*\'|

    ([^"\'>]*?)(?=\s*/?\s*>|\s\w+\s*=))

  )



  # capture content to $2

  [^>]*?\bcontent\s*=\s*

    (?|"\s*([^"]*?)\s*"|\'\s*([^\']*?)\s*\'|

    ([^"\'>]*?)(?=\s*/?\s*>|\s\w+\s*=))

  [^>]*>



  ~ix';

  

  if(preg_match_all($pattern, $str, $out))

    return array_combine($out[1], $out[2]);

  return array();

}



// usage

$meta_tags = getMetaTags($str);



?>

LWC 25-Apr-2015 08:38


New version based on mariano at cricava dot com's work with:

1) Support for Meta properties (like Facebook's og tags).

2) Support for Unicode (UTF-8) encoded Meta lines.

3) An option not to convert htmlentities - if you plan to actually use the results and not just display them.



function getUrlData($url, $raw=false) // $raw - enable for raw display

{

    $result = false;

   

    $contents = getUrlContents($url);



    if (isset($contents) && is_string($contents))

    {

        $title = null;

        $metaTags = null;

        $metaProperties = null;

       

        preg_match('/<title>([^>]*)<\/title>/si', $contents, $match );



        if (isset($match) && is_array($match) && count($match) > 0)

        {

            $title = strip_tags($match[1]);

        }

       

        preg_match_all('/<[\s]*meta[\s]*(name|property)="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);

       

        if (isset($match) && is_array($match) && count($match) == 4)

        {

            $originals = $match[0];

            $names = $match[2];

            $values = $match[3];

           

            if (count($originals) == count($names) && count($names) == count($values))

            {

                $metaTags = array();

                $metaProperties = $metaTags;

                if ($raw) {

                    if (version_compare(PHP_VERSION, '5.4.0') == -1)

                         $flags = ENT_COMPAT;

                    else

                         $flags = ENT_COMPAT | ENT_HTML401;

                }

               

                for ($i=0, $limiti=count($names); $i < $limiti; $i++)

                {

                    if ($match[1][$i] == 'name')

                         $meta_type = 'metaTags';

                    else

                         $meta_type = 'metaProperties';

                    if ($raw)

                        ${$meta_type}[$names[$i]] = array (

                            'html' => htmlentities($originals[$i], $flags, 'UTF-8'),

                            'value' => $values[$i]

                        );

                    else

                        ${$meta_type}[$names[$i]] = array (

                            'html' => $originals[$i],

                            'value' => $values[$i]

                        );

                }

            }

        }

       

        $result = array (

            'title' => $title,

            'metaTags' => $metaTags,

            'metaProperties' => $metaProperties,

        );

    }

   

    return $result;

}



function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)

{

    $result = false;

   

    $contents = @file_get_contents($url);

   

    // Check if we need to go somewhere else

   

    if (isset($contents) && is_string($contents))

    {

        preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);

       

        if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)

        {

            if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)

            {

                return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);

            }

           

            $result = false;

        }

        else

        {

            $result = $contents;

        }

    }

   

    return $contents;

}

?>



<?php

$result = getUrlData('http://whatever...', true);



echo '<pre>'; print_r($result, true); echo '</pre>';



?>



Output example:



<?php

Array

(

    [title] => The requested page's title

    [metaTags] => Array

        (

            [description] => Array

                (

                    [html] => <meta name="description" content="Something..." />

                    [value] => Something...

                )

        )

    [metaProperties] => Array

        (

            [og:type] => Array

                (

                    [html] => <meta property="og:type" content="article"/>/>

                    [value] => article

                )

        )

)

?>

richard dot dern at athaliasoft dot fr 02-Oct-2013 08:50


I personally experienced less issues using the DOM functions than regular expressions while trying to fetch meta tags and not using get_meta_tags function (in order to get http-equiv meta tags too).



<?php



$doc = new DOMDocument();

$doc->loadHTML($html);



$xpath = new DOMXPath($doc);



$nodes = $xpath->query('//head/meta');



foreach($nodes as $node) {

    [...]

}



?>

Ebpo 08-Mar-2013 06:31


Be aware that the function looks for the metatags in the whole page. If one of the meta is commented in your code for some reason, it will still be grabed.

jstel at 126 dot com 13-Jul-2009 09:36


this function could get each meta of html content , and stripped all js and css. 





<?php


function get_meta_data($content)


{


    $content = strtolower($content);


    $content = preg_replace("'<style[^>]*>.*</style>'siU",'',$content);  // strip js


    $content = preg_replace("'<script[^>]*>.*</script>'siU",'',$content); // strip css


    $split = explode("\n",$content);


    foreach ($split as $k => $v)


    {


        if (strpos(' '.$v,'<meta')) {


             preg_match_all(


"/<meta[^>]+(http\-equiv|name)=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"[^>]*>/i",


$v, $split_content[],PREG_PATTERN_ORDER);;


        }


    }


    return $split_content;


}


?>

Antonio - Malaga 10-Feb-2009 02:21


It's not work if meta syntax not have trailing slash.

doob_ at gmx dot de 27-Nov-2008 04:12


<?php





/*


** Extracts and formats meta tag content 


*/





function get_meta_data($url, $searchkey='') {    


    $data = get_meta_tags($url);    // get the meta data in an array


    foreach($data as $key => $value) {


        if(mb_detect_encoding($value, 'UTF-8, ISO-8859-1', true) != 'ISO-8859-1') {    // check whether the content is UTF-8 or ISO-8859-1


            $value = utf8_decode($value);    // if UTF-8 decode it


        }


        $value = strtr($value, get_html_translation_table(HTML_ENTITIES));    // mask the content


        if($searchkey != '') {    // if only one meta tag is in demand e.g. 'description'


            if($key == $searchkey) {


                $str = $value;    // just return the value


            }


        } else {    // all meta tags


            $pattern = '/ |,/i';    // ' ' or ','


            $array = preg_split($pattern, $value, -1, PREG_SPLIT_NO_EMPTY);    // split it in an array, so we have the count of words            


            $str .= '<p><span style="display:block;color:#000000;font-weight:bold;">' . $key . ' <span style="font-weight:normal;">(' . count($array) . ' words | ' . strlen($value) . ' chars)</span></span>' . $value . '</p>';    // format data with count of words and chars


        }


    }


    return $str;


}





$content .= get_meta_data("http://www.example.com/");


/*


output looks like this:





description (23 words | 167 chars)


SELFHTML 8.1.2 - Die bekannte Dokumentation zu HTML, JavaScript und CGI/Perl - Tutorial und Referenz, mit etlichen Zusatztips zu Design, Grafik, Projektverwaltung usw.





keywords (13 words | 119 chars)


SELFHTML, HTML, Dynamic HTML, JavaScript, CGI, Perl, Grafik, WWW-Seiten, Web-Seiten, Hilfe, Dokumentation, Beschreibung





etc.





*/





$content .= get_meta_data("http://www.example.com/", "description");


/*


output looks like this:





SELFHTML 8.1.2 - Die bekannte Dokumentation zu HTML, JavaScript und CGI/Perl - Tutorial und Referenz, mit etlichen Zusatztips zu Design, Grafik, Projektverwaltung usw.


*/





?>

diel at caroes dot be 16-Jan-2008 10:38


Quick meta data grabber

[code]

if(get_meta_tags('http://'.$_POST['pagina'])){

        print '<font class="midden">Meta data from http://'.$_POST['pagina'].'</font>';

        $metadata = get_meta_tags('http://'.$_POST['pagina']);

        echo '<table width="100%">';

        print '<tr><td>Meta</td><td>Waarde</td></tr>';

        foreach($metadata as $naam => $waarde){

            echo '<tr><td valign="top">'.$naam.'</td><td>'.$waarde.'</td></tr>';

        }

        print '</table>';

    }else{

        print '

        <div class="red_h">Incorrect</div>

        ';

    }

[/code]

roganty at gmail dot com 19-Aug-2006 03:33


This is a slight amendment to jimmyxx at gmail dot com function



I tried using the regex displayed in his code, and php threw up a couple of errors



Below is the correct regular expression that works

(Please note that I had to split the regex into strings because php.net was complaining about the line being to long)

<?php

preg_match_all(

   "|<meta[^>]+name=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"[^>]+>|i",

   $html, $out,PREG_PATTERN_ORDER);

?>



The problem was due to the quotes being incorrectly escaped.

I hope this helps anyone who has been having problems with his code

jimmyxx at gmail dot com 20-Sep-2005 12:37


I used this as part of my mini php search based search engine - it really slowed the whole thing down. I wrote this function to read HTML (just fetch the file or use something like snoopy) and extract the meta data via a simple regex, works a treat and made my crawler much faster:



<?php



function get_meta_data($html) {



    preg_match_all(

    "|<meta[^>]+name=\\"([^\\"]*)\\"[^>]+content=\\"([^\\"]*)\\"[^>]+>|i",  $html, $out,PREG_PATTERN_ORDER);



    for ($i=0;$i < count($out[1]);$i++) {

        // loop through the meta data - add your own tags here if you need

        if (strtolower($out[1][$i]) == "keywords") $meta['keywords'] = $out[2][$i];

        if (strtolower($out[1][$i]) == "description") $meta['description'] = $out[2][$i];

    }



return $meta;    

}



?>

mariano at cricava dot com 12-Sep-2005 05:18


Based on Michael Knapp's code, and adding some regex, here's a function that will get all meta tags and the title based on a URL. If there's an error, it will return false. Using the function getUrlContents(), also included, it takes care of META REFRESH re-directions, following up to the specified number of redirections. Please note that the regular expressions included were split into strings because php.net was complaining about the line being to long ;)



<?php

function getUrlData($url)

{

    $result = false;

    

    $contents = getUrlContents($url);



    if (isset($contents) && is_string($contents))

    {

        $title = null;

        $metaTags = null;

        

        preg_match('/<title>([^>]*)<\/title>/si', $contents, $match );



        if (isset($match) && is_array($match) && count($match) > 0)

        {

            $title = strip_tags($match[1]);

        }

        

        preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);

        

        if (isset($match) && is_array($match) && count($match) == 3)

        {

            $originals = $match[0];

            $names = $match[1];

            $values = $match[2];

            

            if (count($originals) == count($names) && count($names) == count($values))

            {

                $metaTags = array();

                

                for ($i=0, $limiti=count($names); $i < $limiti; $i++)

                {

                    $metaTags[$names[$i]] = array (

                        'html' => htmlentities($originals[$i]),

                        'value' => $values[$i]

                    );

                }

            }

        }

        

        $result = array (

            'title' => $title,

            'metaTags' => $metaTags

        );

    }

    

    return $result;

}



function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)

{

    $result = false;

    

    $contents = @file_get_contents($url);

    

    // Check if we need to go somewhere else

    

    if (isset($contents) && is_string($contents))

    {

        preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);

        

        if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)

        {

            if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)

            {

                return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);

            }

            

            $result = false;

        }

        else

        {

            $result = $contents;

        }

    }

    

    return $contents;

}

?>



Here's an example of its usage. Check that the included URL has a META REFRESH redirection:



<?php

$result = getUrlData('http://www.marianoiglesias.com.ar/');



echo '<pre>'; print_r($result); echo '</pre>';



?>



For the above code the output would be:



<?php

Array

(

    [title] => Mariano Iglesias: El Eternauta    

    [metaTags] => Array

        (

            [description] => Array

                (

                    [html] => <meta name="description" content="Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well." />

                    [value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well.

                )



            [DC.title] => Array

                (

                    [html] => <meta name="DC.title" content="Mariano Iglesias - Weblog" />

                    [value] => Mariano Iglesias - Weblog

                )



            [ICBM] => Array

                (

                    [html] => <meta name="ICBM" content="-34.6017, -58.3956" />

                    [value] => -34.6017, -58.3956

                )



            [geo.position] => Array

                (

                    [html] => <meta name="geo.position" content="-34.6017;-58.3956" />

                    [value] => -34.6017;-58.3956

                )



            [geo.region] => Array

                (

                    [html] => <meta name="geo.region" content="AR-BA">

                    [value] => AR-BA

                )



            [geo.placename] => Array

                (

                    [html] => <meta name="geo.placename" content="Buenos Aires">

                    [value] => Buenos Aires

                )



        )



)

?>

Michael Knapp 12-Mar-2005 12:47


Tim's code is good (thanks Tim), except it won't work very well if the tag is part of a long non-breaking string.



E.g. try getting the title from Google Maps (http://www.google.com/maps).



A better solution is:



<?php 

$title = "";

    

if ($fp = @fopen( $_POST['url'], 'r' )) {



    $cont = "";

    

    // read the contents

    while( !feof( $fp ) ) {

       $buf = trim(fgets( $fp, 4096 )) ;

       $cont .= $buf;

    }



    // get tag contents

    @preg_match( "/<title>([a-z 0-9]*)<\/title>/si", $cont, $match );

    

    // tag contents

    $title = strip_tags(@$match[ 1 ]); 

} 



?>



Note the strip_tags. Another thing to be careful of is to check for ", <, and >. You will need to strip those out if you are posting the output to a form.



Also, it is probably best to use the /i modifier, because some people might code <TITLE> etc...

rehfeld 05-Feb-2005 06:45


in response to

jp at webgraphe dot com



this function grabs meta tags, not http headers



if you need the headers



<?php



$fp = fopen('http://example.org/somepage.html', 'r');



// the variable $http_response_header magically appears

print_r($http_response_header);



// or

$meta_data = stream_get_meta_data($fp);

print_r($meta_data);



?>

tim dot bennett at haveaniceplay dot com 01-Feb-2005 07:43


If you want to get the contents of tags other than meta you can use:



<?php



$page = "http://www.mysite.com/apage.php";



    // tags

    $start = '<atag>';

    $end = '<\/atag>';



    // open the file

    $fp = fopen( $page, 'r' );



    $cont = "";



    // read the contents

    while( !feof( $fp ) ) {

        $buf = trim( fgets( $fp, 4096 ) );

        $cont .= $buf;

    }

    

    // get tag contents

    preg_match( "/$start(.*)$end/s", $cont, $match );



    // tag contents

    $contents = $match[ 1 ]; 



?>

jp at webgraphe dot com 12-Dec-2003 05:37


If the URL is doing a redirection using the headers (like you would do with PHP function header("Location: URL");), the page has no content (in general). It appears get_meta_tags() doesn't catch that kind of redirection (like cURL would do) and it lead me to a timeout of my script.



I experienced this in a spider I wrote in order to feed my database of all available pages on my site and one link was linking to a page that simply has the following code:



<?php

  header("Location: sections.php?section=home");

  exit();

?>



That made my script hang for a moment and apparently, get_meta_tags() wasn't even able to return me an error.



JP.

20-Dec-2001 11:01


Tested PHP 4.0.6





get_meta_tags() seems to look only in the beginning of a file, meaning that e.g. if there is a lot of PHP code before the HTML header it will return nothing ...


Tested using get_meta_tags() on local files with about 9000 characters of PHP code before HTML HEADER.





Workaround: if possible move code after header or if not: include a file.

Ben dot Davis at furman dot edu 30-Jul-2001 03:29


I have found that for large searches, get_meta_tags is very slow.  I created a large search engine for a website that couldnt use a database and I first tried pulling out the meta tags.  


I have found that it is actually much faster to use eregi to pull out the meta tags.  This code below pulls out the description:





if (eregi ("<meta name=\"description\" content=[^>]*", $contents, $descresult))


                                {


                                    $description = explode("<meta name=\"description\" content=", $descresult[0]);


                                    echo "<font face=\"Arial\" size=2>$description[1]</font>";


                                    


                                }

richard at pifmagazine dot com 21-Apr-2000 05:17


Something that is not mentioned above and should be : When using get_meta_tags on a remote PHP page the page will be parsed before the meta tags are returned - so you can capture meta tags generated dynamically (by PHP??) on the remote end.





This DOES NOT work the same way when getting meta tags on local file systems.  Local files are not parsed through the web server before returning to get_meta_tags().  If the META tag is hard-coded into the page, you'll be fine - but if it dynamically generated you will not be able to capture it unless you use the full URL when calling your local files.

richard at pifmagazine dot com 21-Apr-2000 05:00


An Important Note about META tags and this function :  if your META tag contains newline "\n"  characters, get_meta_tags() will return a NULL value for that name property.  Removing the newlines from the source META tag corrects the problem.