There is an active feature request for an xzopen function. The xz program uses the LZMA family of algorithms (LZMA2 in the .xz format) and typically compresses noticeably better than gzip.
Until xzopen is available, here is my best design for safely reading .xz files when the filename comes from a remote, untrusted user.
Below is a CSV log viewer. There is a lot going on, so an explanation follows the code:
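<?php
// The script's own name (minus ".php") is the only log prefix it will serve.
$PRG_NAME = basename($_SERVER['PHP_SELF'], '.php');
$PRG_LEN = strlen($PRG_NAME);
// basename() strips directory components, so "../" cannot escape /var/app/log.
$z = basename((string)($_GET['CSV'] ?? ''));
// A NUL byte would split the --files0 list into two names, so reject it.
if(strpos($z, "\0") === false && substr($z, 0, $PRG_LEN) == $PRG_NAME)
{
    header('Content-Type: application/csv; charset=gb2312');
    header('Content-Disposition: attachment; filename="' . $z . '.csv"');
    $tmpfname = null;
    if('.xz' == substr($z, -3))
    {
        // Hand the filename to xz through a NUL-terminated list file, never the shell.
        $tmpfname = tempnam("/tmp", "php-log-viewer");
        $fpx = fopen($tmpfname, 'w');
        fprintf($fpx, "/var/app/log/%s\0", $z);
        fclose($fpx);
        $fp = popen("xz -cd --files0=" . escapeshellarg($tmpfname), 'r');
    }
    else $fp = fopen('/var/app/log/' . $z, 'r');
    if($fp !== false)
    {
        // Compare against false so a final line of "0" is not dropped.
        while(($line = fgets($fp)) !== false)
            echo '"' . str_replace('~', '","', rtrim($line)) . "\"\n";
        // Handles from popen() must be closed with pclose().
        if($tmpfname !== null) pclose($fp); else fclose($fp);
    }
    if($tmpfname !== null) unlink($tmpfname);
    exit;
}
?>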
The logs are tilde-delimited files (~) in the /var/app/log directory, which the code above converts to CSV and sends to the browser as a download that Excel can open. They are regularly compressed by cron, but the latest logs will be uncompressed text files. A separate section of my code (not included here) lists them via opendir()/readdir()/stat().
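As a rough sketch of that conversion on its own, the helper below splits one tilde-delimited line and quotes each field. The name tildeLineToCsv is mine for illustration and is not part of the viewer. Unlike the one-liner in the viewer, it also doubles any embedded double quotes, which CSV readers such as Excel expect. It assumes a field never contains a literal tilde.

```php
<?php
// Hypothetical helper: convert one tilde-delimited log line to one CSV line.
function tildeLineToCsv(string $line): string
{
    // Drop the line ending, then split on the tilde delimiter.
    $fields = explode('~', rtrim($line, "\r\n"));
    $quoted = array_map(
        function (string $f): string {
            // CSV escapes a double quote by doubling it.
            return '"' . str_replace('"', '""', $f) . '"';
        },
        $fields
    );
    return implode(',', $quoted) . "\n";
}

echo tildeLineToCsv("2024-01-01~GET~/index.html\n");
// prints "2024-01-01","GET","/index.html"
```

PHP's own fputcsv() would also work if the output goes to a stream such as php://output.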
The file prefix that the viewer will allow the user to see is determined by the name of the script - if I name it FTP-LOG.php, then any file beginning with /var/app/log/FTP-LOG can be read. I am enabling the viewer for different prefixes by making hard links to the script.
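The prefix rule can be sketched as a small standalone check. The function name isAllowedLog and the sample paths are my own assumptions for illustration. Besides the prefix comparison, it refuses any request that carries a directory component or a NUL byte, which is one way to keep a request inside the log directory.

```php
<?php
// Hypothetical helper: may a script at $scriptPath serve the log named $requested?
function isAllowedLog(string $scriptPath, string $requested): bool
{
    // "FTP-LOG.php" yields the prefix "FTP-LOG".
    $prefix = basename($scriptPath, '.php');
    // Reject empty prefixes, NUL bytes, and anything with a directory component.
    if ($prefix === '' || strpos($requested, "\0") !== false
        || $requested !== basename($requested)) {
        return false;
    }
    // The requested name must begin with the script's own name.
    return strncmp($requested, $prefix, strlen($prefix)) === 0;
}

var_dump(isAllowedLog('/viewer/FTP-LOG.php', 'FTP-LOG.2024-01-01.xz'));    // bool(true)
var_dump(isAllowedLog('/viewer/FTP-LOG.php', 'WEB-LOG.2024-01-01'));       // bool(false)
var_dump(isAllowedLog('/viewer/FTP-LOG.php', 'FTP-LOG/../../etc/passwd')); // bool(false)
```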
Since the log might not (yet) be compressed, I check the extension - if .xz is detected, then the gymnastics begin.
It is not safe to pass form content from remote users to a UNIX shell, and I am trying to avoid this. Fortunately, xz has the --files and --files0 options, so I create a temporary file, record the name of the file of interest in it, then open an xz process for reading (otherwise, a simple fopen() will suffice). This way only the temporary file's name, which the script itself generated, ever appears on the command line. Terminating the name with \0 allows safer processing of filenames with embedded newlines (which POSIX allows), and is immediately familiar to fans of "find -print0" and "xargs -0".
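The list-file step can be pulled out into a helper along these lines. This is only a sketch: the name writeFiles0List, its parameters, and the sample log path are my own assumptions. It writes each path followed by a NUL byte and refuses any path that already contains one, since a stray NUL would make xz see two filenames instead of one.

```php
<?php
// Hypothetical helper: write paths to a temporary NUL-terminated list
// suitable for "xz --files0=LIST". The caller must unlink() the result.
function writeFiles0List(array $paths, ?string $tmpDir = null): string
{
    $listFile = tempnam($tmpDir ?? sys_get_temp_dir(), 'xz-list');
    if ($listFile === false) {
        throw new RuntimeException('could not create temporary file');
    }
    $data = '';
    foreach ($paths as $p) {
        // An embedded NUL would split one entry into two filenames.
        if (strpos($p, "\0") !== false) {
            unlink($listFile);
            throw new InvalidArgumentException('path contains a NUL byte');
        }
        $data .= $p . "\0";
    }
    file_put_contents($listFile, $data);
    return $listFile;
}

// Assumed example path; any name (even one with newlines) is stored verbatim.
$list = writeFiles0List(['/var/app/log/FTP-LOG.1.xz']);
// Only the list file's name, generated by this script, reaches the shell.
$cmd = 'xz -cd --files0=' . escapeshellarg($list);
// $fp = popen($cmd, 'r'); ... read with fgets() ... pclose($fp);
unlink($list);
```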
Unfortunately, neither bzip2 nor lzip has a --files[0] option. It is quite useful in this case, and appears to improve security.