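preg_replace (PHP 3 >= 3.0.9, PHP 4, PHP 5) preg_replace -- Perform a regular expression search and replace
Description
mixed preg_replace ( mixed pattern, mixed replacement, mixed subject [, int limit] )
Searches subject for matches to pattern and replaces them with replacement. If limit is specified, only limit matches are replaced; if limit is omitted or is -1, all matches are replaced.
replacement may contain references of the form \\n or (since PHP 4.0.4) $n, with the latter form being the preferred one. Every such reference is replaced by the text captured by the n'th parenthesized subpattern. n can be from 0 to 99, and \\0 or $0 refers to the text matched by the whole pattern. Opening parentheses are counted from left to right (starting from 1) to obtain the number of the capturing subpattern.
When a backreference in the replacement is immediately followed by another digit (that is, a literal number placed right after a matched subpattern), the familiar \\1 notation cannot be used. \\11, for example, would confuse preg_replace(), since it cannot tell whether you want the \\1 backreference followed by a literal 1 or the \\11 backreference. The solution in this case is to use \${1}1. This creates an isolated $1 backreference and leaves the other 1 as a plain literal.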
Example 1. Using backreferences followed by numeric literals
<?php
$string = "April 15, 2003";
$pattern = "/(\w+) (\d+), (\d+)/i";
$replacement = "\${1}1,\$3";
print preg_replace($pattern, $replacement, $string);
/* Output ======
April1,2003
*/
?>
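If matches are found, the new subject is returned; otherwise subject is returned unchanged.
Every parameter to preg_replace() (except limit) can be an array. If both pattern and replacement are arrays, their entries are processed in the order in which the keys appear in the array. This is not necessarily the same as the numerical index order. If you use indexes to identify which pattern should be replaced by which replacement, you should call ksort() on each array before calling preg_replace().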
Example 2. Using indexed arrays with preg_replace()
<?php
$string = "The quick brown fox jumped over the lazy dog.";
$patterns[0] = "/quick/";
$patterns[1] = "/brown/";
$patterns[2] = "/fox/";
$replacements[2] = "bear";
$replacements[1] = "black";
$replacements[0] = "slow";
print preg_replace($patterns, $replacements, $string);
/* Output ======
The bear black slow jumped over the lazy dog.
*/
/* By ksorting patterns and replacements, we should get what we wanted. */
ksort($patterns);
ksort($replacements);
print preg_replace($patterns, $replacements, $string);
/* Output ======
The slow black bear jumped over the lazy dog.
*/
?>
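If subject is an array, the search and replace is performed on every entry of subject, and the return value is an array as well.
If pattern and replacement are both arrays, preg_replace() takes a value from each array in turn and uses them to search and replace on subject. If replacement has fewer values than pattern, an empty string is used for the remaining replacement values. If pattern is an array and replacement is a string, that string is used as the replacement for every value of pattern. The converse would not make sense.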
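The array rules above can be sketched with a few short calls. The sample strings and variable names below are illustrative only and do not come from the manual text.

```php
<?php
// Pattern array with a single replacement string:
// the same string is used for every pattern.
$a = preg_replace(array('/quick/', '/brown/'), 'X', 'The quick brown fox');
// $a is "The X X fox"

// Fewer replacements than patterns:
// the missing replacement is treated as an empty string.
$b = preg_replace(array('/quick /', '/brown /'), array('slow '), 'The quick brown fox');
// $b is "The slow fox"

// Array subject: every element is processed and an array is returned.
$c = preg_replace('/\d+/', '#', array('room 12', 'floor 3'));
// $c is array('room #', 'floor #')
```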
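The /e modifier makes preg_replace() treat the replacement parameter as PHP code after the appropriate backreference substitution has been done. Tip: make sure that replacement constitutes a valid PHP code string, otherwise PHP will report a parse error at the line containing preg_replace().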
Example 3. Replacing several values
<?php
$patterns = array ("/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/",
                   "/^\s*{(\w+)}\s*=/");
$replace = array ("\\3/\\4/\\1\\2", "$\\1 =");
print preg_replace ($patterns, $replace, "{startDate} = 1999-5-27");
?>
This example will produce:
$startDate = 5/27/1999
Example 4. Using the /e modifier
<?php
preg_replace ("/(<\/?)(\w+)([^>]*>)/e",
              "'\\1'.strtoupper('\\2').'\\3'",
              $html_body);
?>
This would convert all HTML tags in the input text to upper case.
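The /e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, so on current versions the same effect needs preg_replace_callback(). The following is a minimal sketch of Example 4 in that style; the sample value of $html_body is made up for illustration.

```php
<?php
// Hypothetical sample input; the manual's $html_body is not defined in the example.
$html_body = '<p class="x">Hello <b>world</b></p>';

// The callback receives the match array: $m[1] is "<" or "</",
// $m[2] is the tag name, $m[3] is the rest of the tag up to ">".
$upper = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function ($m) {
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html_body
);

echo $upper; // <P class="x">Hello <B>world</B></P>
```

Because the callback works on the captured strings directly, no quoting of the matched text is needed, which is the usual source of trouble with evaluated replacements.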
Example 5. Convert HTML to text
<?php
// $document should contain an HTML document.
// This example removes HTML tags, javascript sections
// and white space. It also converts some common
// HTML entities to their text equivalent.
$search = array ("'<script[^>]*?>.*?</script>'si", // Strip out javascript
                 "'<[\/\!]*?[^<>]*?>'si",          // Strip out HTML tags
                 "'([\r\n])[\s]+'",                // Strip out white space
                 "'&(quot|#34);'i",                // Replace HTML entities
                 "'&(amp|#38);'i",
                 "'&(lt|#60);'i",
                 "'&(gt|#62);'i",
                 "'&(nbsp|#160);'i",
                 "'&(iexcl|#161);'i",
                 "'&(cent|#162);'i",
                 "'&(pound|#163);'i",
                 "'&(copy|#169);'i",
                 "'&#(\d+);'e");                   // Evaluate as PHP code
$replace = array ("",
                  "",
                  "\\1",
                  "\"",
                  "&",
                  "<",
                  ">",
                  " ",
                  chr(161),
                  chr(162),
                  chr(163),
                  chr(169),
                  "chr(\\1)");
$text = preg_replace ($search, $replace, $document);
?>
Note: The limit parameter was added after PHP 4.0.1pl2.
See also preg_match(), preg_match_all(), and preg_split().
bwooster47 at gmail dot com
31-Oct-2006 07:46
Tripped over a subtle issue in preg_replace!
Eventually got the right answers from the readers at alt.comp.lang.php!
preg_replace assumes a /g - it replaces all occurrences.
So, when you need a replacement to force a single / character at end of string, i.e., replace 0 or more / characters with a single /, here's what you need:
$t = preg_replace('!/*$!', '/', $s, 1);
Note the all important 1 (limit) at the end. The default is -1, and with it /*$ matches twice at the end of the string: once for the run of slashes, and once more for the empty string that remains right before the end. So if $s == aa//, the output is still aa// with limit -1.
But with 1 as the limit, all values of $s such as aa or aa/ or aa// or aa//// end up like aa/, just as needed.
[That is also why a//// ends up as a// with limit -1: each of the two matches is replaced by one /.]
So, final story, to get a single trailing / from /*$, use the limit of 1.
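The behaviour in this note can be reproduced with a short script. With the default limit, /*$ first matches the run of slashes and then matches the empty string at the very end, so two replacements happen. The sketch below compares the default limit, a limit of 1, and a plain rtrim() alternative; ensure_trailing_slash() is an illustrative helper name, not something from the note.

```php
<?php
// ensure_trailing_slash() is a hypothetical helper: no regex needed.
function ensure_trailing_slash($s)
{
    return rtrim($s, '/') . '/';
}

foreach (array('aa', 'aa/', 'aa////') as $s) {
    $all  = preg_replace('!/*$!', '/', $s);    // default limit -1: may match twice
    $once = preg_replace('!/*$!', '/', $s, 1); // limit 1: first match only
    echo $s, ' => ', $all, ' | ', $once, ' | ', ensure_trailing_slash($s), "\n";
}
```

For the input aa//// the three columns are aa//, aa/ and aa/.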
welniana at i64 dot pl
16-Oct-2006 04:46
Use preg_replace to verify input parameters:
function verify_loginname($s) { /* a-z, A-Z, 0-9 */
return(preg_replace('@[^a-zA-Z0-9]@','',$s));
}
function verify_integer($i) { /* 0-9 */
return(preg_replace('@[^0-9]@','',$i));
}
function verify_number($n) { /* 0-9 + , + . */
return(preg_replace('@[^0-9,\.]@','',$n));
}
--
Welniana
http://posciel-welniana.pl/
wjaspers [at] nuaire [dot] com
05-Oct-2006 12:42
If you need to remove excess slashes from links (or convert incorrect ones), you may want to use this function:
<?php
/**
* Replaces excess / slashes, and converts \ slashes into /
*
* Author: William Jaspers, wjapers[at]nuaire[dot]com
* Created: 04 Oct 2006 11:38 AM
* @param string $input // The string to strip and replace slashes in
* @returns string
*/
function fixSlashes($input)
{
return preg_replace('/(\/|\\\)++/','/',$input);
}
?>
norten at gmail dot com
01-Oct-2006 11:16
In Hungary we use the "," character to separate the integer and the fraction part of numbers. This little script replaces all "," between two numbers to the English-style ".":
<?php
$text = "The result is 23,3 m.";
$text = preg_replace("/([0-9])([,])([0-9])/","$1.$3",$text);
print $text;
//output: The result is 23.3 m.
?>
Alexey Lebedev
07-Sep-2006 05:21
Wasted several hours because of this:
$str='It&#039;s a string with HTML entities';
preg_replace('~&#(\d+);~e', 'code2utf($1)', $str);
This code must convert numeric html entities to utf8. And it does, with a little exception. It mishandles codes starting with &#0
The reason is that code2utf will be called with a leading zero, exactly what the pattern matches - code2utf(039).
And it does matter! PHP treats 039 as an octal number.
Try print(011);
Solution:
preg_replace('~&#0*(\d+);~e', 'code2utf($1)', $str);
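With a callback instead of /e, the captured digits arrive as a string and can be cast to an integer, so the octal problem does not arise. This is only a sketch: decode_numeric_entities() is a made-up name, and html_entity_decode() stands in for the note's code2utf() helper, whose source is not shown.

```php
<?php
// Sketch: decode decimal numeric entities with a callback instead of /e.
function decode_numeric_entities($str)
{
    return preg_replace_callback(
        '~&#(\d+);~',
        function ($m) {
            // (int) "039" is 39: the string is read as decimal, never as octal.
            $code = (int) $m[1];
            // Stand-in for code2utf(): let PHP decode the normalized entity.
            return html_entity_decode('&#' . $code . ';', ENT_QUOTES, 'UTF-8');
        },
        $str
    );
}

echo decode_numeric_entities('It&#039;s a string with HTML entities');
// It's a string with HTML entities
```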
06-Sep-2006 06:21
Note that it is important for the $count parameter to be initialized with "some" value (it doesn't matter what value or data type you choose).
Otherwise preg_replace() will leave $count untouched.
See this example:
preg_replace('/a/', 't', 'x', -1, $test);
var_dump($test);
Will output NULL.
While this:
$test = 'foo';
preg_replace('/a/', 't', 'x', -1, $test);
var_dump($test);
Will output int(0) as you may have expected.
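For a case where replacements actually occur, a small sketch follows. $replaced is an illustrative variable name; initializing it first, as the note recommends, is harmless either way.

```php
<?php
// The fifth argument is passed by reference and receives the number of replacements.
$replaced = 0;
$result = preg_replace('/a/', 't', 'banana', -1, $replaced);

echo $result, ' ', $replaced; // btntnt 3
```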
Vadtec
05-Sep-2006 06:45
Sevendust at aaoclan dot be, you have an error in your regex array. At least in the $BBCode array. Look at $BBCode[1].
$BBCode[1] = '/[/b]/';
To the regex engine, this will appear as the pattern /[/ followed by the modifiers b]/, which is why PHP complains about an unknown modifier. This is because you enclosed the regex in / /, as you should, so the first unescaped / inside it ends the pattern. The simple solution is to escape the extra /, as in '/[\/b]/'. However, [ and ] are regex metacharacters too (they form a character class), so to be sure that the regex matches the literal brackets you should also escape them, as in '/\[\/b\]/', which is incredibly ugly. Dropping the / / altogether is not a real fix: PHP then takes the [ and ] as the delimiters, so '[/b]' becomes the pattern /b rather than a literal match for the whole tag.
I tried using your class last night and kept getting errors such as "Unknown modifier 'b' in <file here> on line X". Removing the / / from the quoted string made the errors go away, but only because the brackets were then read as delimiters. For a plain literal replacement like this, str_replace() is the simpler tool.
As for my reason for using preg_replace(), I had a much more challenging task. I wanted to be able to match BB code with its start and end BB tags to convert it to HTML for display, but also to convert the HTML back into BB code if necessary. I did this so that I could store the HTML code directly in a DB rather than having to parse the code as I queried the text from the DB. I know this is rather complicated, but I will need to be able to pull the text from the DB without the BB code for display on a remote site which would not have my BB code to decode the BB tags. (Plus, I just wanted to see if I could do it anyways.)
Assume that you are using CSS to format the elements of a page. Using <span class="whatever">whatever</span> is an excellent way to do this. So assume you have a CSS class called "term" and you want to use the BB tag [t]whatever[/t] to mark it, but also want the code to replace <span class="term">whatever</span> with the proper BB tags. Using the method from Sevendust, you can only go from BB tags to HTML tags, not HTML tags to BB tags. This is because tags like </span> could apply to any number of BB tags.
So, after messing with the regexes for a while, I came up with this solution.
<?php
function convert_bb($text) {
    // explode() is used here because split() is no longer available in current PHP.
    $temp = explode("\n", $text);
    $bb_code[0] = '/(\[t\])(.*)(\[\/t\])/'; // Notice how you have to escape each regex identifier so that it will match literally.
    $bb_replace[0] = '<span class="term">${2}</span>';
    // With an array subject, preg_replace() processes each line and returns an array.
    $temp2 = preg_replace($bb_code, $bb_replace, $temp);
    $temp2 = implode("\n", $temp2);
    return $temp2;
}
function convert_html($text) {
    $temp = explode("\n", $text);
    $html_code[0] = '/(<span class="term">)(.*)(<\/span>)/';
    $html_replace[0] = '[t]${2}[/t]';
    $temp2 = preg_replace($html_code, $html_replace, $temp);
    $temp2 = implode("\n", $temp2);
    return $temp2;
}
?>
Yes, I know it's ugly and the regexes are hard to read, but it will match the start and end of any set of BB or HTML tags and convert them accordingly. As you can see, the BB tags will be matched like so:
[t](any amount of text here)[/t]
will be replaced with
<span class="term">(whatever text was between the [t] and [/t] tags)</span>
Going from HTML tags to BB tags works exactly in reverse:
<span class="term">(any amount of text here)</span>
will be replaced with
[t](whatever text was between the <span class="term"> and </span> tags)[/t]
Note that if you want the regex to match just <span> you will have to change it to just /(<span>)(.*)(<\/span>)/.
Put this in a PHP script and give it a try. For every regex you add to the $bb_code/$bb_replace set, it will convert that BB tag set to its corresponding HTML tag set while preserving whatever is between the BB tags, and vice versa.
I hope this helps someone out.
Vadtec
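As a compact illustration of the escaping issue discussed in this note, here is a sketch of a few BBCode patterns with the brackets escaped and # as the delimiter, so the / in closing tags needs no escaping. bb_to_html() and its tag list are illustrative, not the code of either poster, and the non-greedy (.*?) is a deliberate change from the (.*) used above so that two tags on one line are not merged into a single match.

```php
<?php
// Sketch only: a hypothetical helper covering three simple tags.
function bb_to_html($text)
{
    // Escape HTML first so user input cannot inject markup of its own.
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    $patterns = array(
        '#\[b\](.*?)\[/b\]#s',
        '#\[i\](.*?)\[/i\]#s',
        '#\[url\](.*?)\[/url\]#s',
    );
    $replacements = array(
        '<b>$1</b>',
        '<i>$1</i>',
        '<a href="$1">$1</a>',
    );
    return preg_replace($patterns, $replacements, $text);
}

echo bb_to_html('[b]bold[/b] and [url]http://example.com/[/url]');
// <b>bold</b> and <a href="http://example.com/">http://example.com/</a>
```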
Sevendust at aaoclan dot be
19-Aug-2006 10:43
<?php
// BBCode parser by Sevendust
Class BBParse
{
Var $InputString;
Var $OutputString;
Function BBParse ( $InputString, $OutputString )
{
If ( $This -> Input == '' )
{
return_error ( 'You have to provide at least a message of 20 characters!', true );
}
else
{
// Define some default BB code tags, such as bold, italic, and url.
$BBCode[0] = '/[b]/';
$BBCode[1] = '/[/b]/';
$BBCode[2] = '/[i]/';
$BBCode[3] = '/[/i]/';
$BBCode[4] = '/[url]/';
$BBCode[5] = '/[/url]/';
// Replacement strings, in HTML ofcourse.
$BBReplace[0] = '<b>';
$BBReplace[1] = '</b>';
$BBReplace[2] = '<i>';
$BBReplace[3] = '</i>';
$BBReplace[4] = '<a href=' . $InputString . '>';
$BBReplace[5] = '</a>';
$BBParsedOutput = Preg_Replace ( $BBCode, $BBReplace, $InputString );
}
}
}
?>
tim at t-network dot nl
19-Jul-2006 01:58
This function has a little quirk.
When you are trying to use backreferences in the pattern, you MUST use \\n, and not $n. $n doesn't work.
kurt at yachthub dot com
09-Jul-2006 10:20
Fix up for most common bad punctuation round commas and fullstops, remove white space, make the first letter of a sentence uppercase and replace dubious characters like &, ", ', etc. with special html characters.
<?
$string=" harry's house.4. 2m long 3.5m wide.63\" but great . seating , for 7.pjljk.cost is $12, 00.00.Good buy .0.0. 1.1m\n";
echo "<pre>$string</pre><p>";
$pat[0] = '/\./';
$pat[1] = '/ \./';
$pat[2] = '/\,/';
$pat[3] = '/ \,/';
$pat[4] = '/\n /';
$pat[5] = '/ +/';
$repl[0] = '. ';
$repl[1] = '.';
$repl[2] = ', ';
$repl[3] = ', ';
$repl[4] = '\n';
$repl[5] = ' ';
$string=split("\. ",trim(ucfirst(stripslashes(htmlspecialchars(preg_replace($pat, $repl, $string),ENT_QUOTES)))));
foreach ($string as $key=>$word) {
$string[$key] = ucfirst($word);
}
$string = implode ('. ', $string);
$i=0;
while($i < 10){
$pat[$i] = '/'.$i.'\. /';
$repl[$i] = ''.$i.'.';
$i++;
}
while($i < 36){
$b=$i+55;
$pat[$i] = '/\.'.chr($b).'/';
$repl[$i] = '. '.chr($b).'';
$i++;
}
while($i < 46){
$b=$i-36;
$pat[$i] = '/'.$b.'\, /';
$repl[$i] = ''.$b.',';
$i++;
}
$string = preg_replace($pat, $repl, $string);
echo $string;
?>
www.humer.biz
06-Jul-2006 03:09
@Graham: Your function from March 16th only works without errors if the tag in your source (even if there is only one) is closed.
So I added a pseudo-tag around the source and everything runs well ;)
function strip_styles($source=NULL)
{
# add pseudo-tag
$source = '<parse>'.$source.'</parse>';
[...] rest of function
# and return it this way:
return (str_replace(array('<parse>','</parse>'),"",$source));
}
Sune Rievers
25-May-2006 01:58
Updated version of the link script, since the other version didn't work with links in beginning of line, links without http:// and emails. Oh, and a bf2:// detection too for all you gamers ;)
function make_links_blank($text)
{
return preg_replace(
array(
'/(?(?=<a[^>]*>.+<\/a>)
(?:<a[^>]*>.+<\/a>)
|
([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+)
)/iex',
'/<a([^>]*)target="?[^"\']+"?/i',
'/<a([^>]+)>/i',
'/(^|\s)(www.[^<> \n\r]+)/iex',
'/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+)
(\\.[A-Za-z0-9-]+)*)/iex'
),
array(
"stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
'<a\\1',
'<a\\1 target="_blank">',
"stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))",
"stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))"
),
$text
);
}
robvdl at gmail dot com
21-Apr-2006 08:15
For those of you that have ever had the problem where clients paste text from MS Word into a CMS, where Word has placed all those fancy quotes throughout the text, breaking the XHTML validator... I have created a nice regular expression that replaces ALL high UTF-8 characters with HTML entities, such as &#8217;.
Note that most user examples on php.net I have read only replace selected characters, such as single and double quotes. This replaces all high characters, including Greek characters, Arabic characters, smilies, whatever.
It took me ages to get it down to just two regular expressions, but it handles all high level characters properly.
$text = preg_replace('/([\xc0-\xdf].)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 192) * 64 + (ord(substr('$1', 1, 1)) - 128)) . ';'", $text);
$text = preg_replace('/([\xe0-\xef]..)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 224) * 4096 + (ord(substr('$1', 1, 1)) - 128) * 64 + (ord(substr('$1', 2, 1)) - 128)) . ';'", $text);
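The same arithmetic can be written with preg_replace_callback(), which avoids evaluating matched text as code. This is a sketch under the same assumptions as the two expressions above: it covers two-byte and three-byte UTF-8 sequences only, and high_utf8_to_entities() is an illustrative name.

```php
<?php
function high_utf8_to_entities($text)
{
    // Two-byte sequences: lead byte 0xC0-0xDF plus one continuation byte.
    $text = preg_replace_callback(
        '/[\xc0-\xdf][\x80-\xbf]/',
        function ($m) {
            $code = (ord($m[0][0]) - 192) * 64 + (ord($m[0][1]) - 128);
            return '&#' . $code . ';';
        },
        $text
    );

    // Three-byte sequences: lead byte 0xE0-0xEF plus two continuation bytes.
    return preg_replace_callback(
        '/[\xe0-\xef][\x80-\xbf]{2}/',
        function ($m) {
            $code = (ord($m[0][0]) - 224) * 4096
                  + (ord($m[0][1]) - 128) * 64
                  + (ord($m[0][2]) - 128);
            return '&#' . $code . ';';
        },
        $text
    );
}

echo high_utf8_to_entities("It\xE2\x80\x99s a caf\xC3\xA9");
// It&#8217;s a caf&#233;
```

Requiring continuation bytes in the 0x80-0xBF range, rather than any byte, keeps a stray lead byte from swallowing the ASCII character that follows it.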
heppa(at)web(dot)de
20-Apr-2006 11:37
I just wanted to give an example for some people that have the problem, that their match is taking away too much of the string.
I wanted to have a function that extracts only wanted parameters out of a http query string, and they had to be flexible, eg 'updateItem=1' should be replaced, as well as 'updateCategory=1', but i sometimes ended up having too much replaced from the query.
example:
my query string: 'updateItem=1&itemID=14'
ended up in a query string like this: '4' , which was not really covering the plan ;)
i was using this regexp:
preg_replace('/&?update.*=1&?/','',$query_string);
i discovered, that preg_replace matches the longest possible string, which means that it replaces everything from the first u up to the 1 after itemID=
I assumed, that it would take the shortest possible match.
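The effect described here is greedy matching: .* takes as much as it can and only gives characters back until =1 can match. A minimal sketch of the problem and one possible narrower pattern follows; the \w* restriction and the lookahead are my own choices for illustration, not part of the note.

```php
<?php
$query_string = 'updateItem=1&itemID=14';

// Greedy: ".*" runs on to the last "=1" it can find, swallowing "&itemID".
$greedy = preg_replace('/&?update.*=1&?/', '', $query_string);

// Restricting the name to word characters keeps the match inside one parameter,
// and the lookahead makes sure the value is exactly 1 (not 14, for example).
$narrow = preg_replace('/&?update\w*=1(?=&|$)&?/', '', $query_string);

echo $greedy, "\n"; // 4
echo $narrow, "\n"; // itemID=14
```

A lazy quantifier (.*?) would also stop at the first =1, but it would still accept a value such as 14, which is why the lookahead is used here.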
Ritter
19-Apr-2006 05:08
for those of you with multiline woes like I was having, try:
$str = preg_replace('/<tag[^>](.*)>(.*)<\/tag>/ims','<!-- edited -->', $str);
Eric
10-Apr-2006 02:54
Here recently I needed a way to replace links (<a href="blah.com/blah.php">Blah</a>) with their anchor text, in this case Blah. It might seem simple enough for some..or most, but at the benefit of helping others:
<?php
$value = '<a href="http://www.domain.com/123.html">123</a>';
echo preg_replace('/<a href="(.*?)">(.*?)<\\/a>/i', '$2', $value);
//Output
// 123
?>
sesha_srinivas at yahoo dot com
08-Apr-2006 04:13
If you have a form element displaying the amounts using "$" and ",". Before posting it to the db you can use the following:
$search = array('/,/','/\$/');
$replace = array('','');
$data['amount_limit'] = preg_replace($search,'',$data['amount_limit']);
ciprian dot amariei Mtaiil gmail * com
06-Apr-2006 01:21
I found some situations where my function below doesn't perform as expected. Here is the new version.
<?php
function make_links_blank( $text )
{
return preg_replace(
array(
'/(?(?=<a[^>]*>.+<\/a>)
(?:<a[^>]*>.+<\/a>)
|
([^="\'])((?:https?|ftp):\/\/[^<> \n\r]+)
)/iex',
'/<a([^>]*)target="?[^"\']+"?/i',
'/<a([^>]+)>/i'
),
array(
"stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
'<a\\1',
'<a\\1 target="_blank">'
),
$text
);
}
?>
This function replaces links (http(s)://, ftp://) with respective html anchor tag, and also makes all anchors open in a new window.
ae at instinctive dot de
28-Mar-2006 11:40
Something innovative for a change ;-) For a news system, I have a special format for links:
"Go to the [Blender3D Homepage|http://www.blender3d.org] for more Details"
To get this into a link, use:
$new = preg_replace('/\[(.*?)\|(.*?)\]/', '<a href="$2" target="_blank">$1</a>', $new);
c_stewart0a at yahoo dot com
18-Mar-2006 06:35
In response to elaineseery at hotmail dot com
[quote]if you're new to this function, and getting an error like 'delimiter must not alphanumeric backslash ...[/quote]
Note that if you use arrays for search and replace then you will want to quote your searching expression with / or you will get this error.
However, if you use a single string to search and replace then you will not receive this error if you do not quote your regular expression in /
Graham Dawson <graham at imdanet dot com>
16-Mar-2006 06:46
I said there was a better way. There is!
The regexp is essentially the same but now I deal with problems that it couldn't handle, such as urls, which tended to screw things up, and the odd placement of a : or ; in the body text, by using functions. This makes it easier to expand to take account of all the things I know I've not taken account of. But here it is in its essential glory. Or mediocrity. Take your pick.
<?php
define('PARSER_ALLOWED_STYLES_',
'text-align,font-family,font-size,text-decoration');
function strip_styles($source=NULL) {
$exceptions = str_replace(',', '|', @constant('PARSER_ALLOWED_STYLES_'));
/* First we want to fix anything that might potentially break the style stripper, so we try and replace
* in-text instances of : with its html entity replacement.
*/
function Replacer($text) {
$check = array (
'@:@s',
);
$replace = array(
'&#58;',
);
return preg_replace($check, $replace, $text[0]);
}
$source = preg_replace_callback('@>(.*)<@Us', 'Replacer', $source);
$regexp =
'@([^;"]+)?(?<!'. $exceptions. ')(?<!\>\w):(?!\/\/(.+?)\/|<|>)((.*?)[^;"]+)(;)?@is';
$source = preg_replace($regexp, '', $source);
$source = preg_replace('@[a-z]*=""@is', '', $source);
return $source;
}
?>
rybasso
16-Mar-2006 05:33
A "Document contains no data" message in FF and 'This page could not be found' in IE occur when you pass too long a subject string to preg_replace() with the default limit.
Increase the limit to be sure it's larger than the subject length.
Ciprian Amariei
16-Mar-2006 06:50
Here is a function that replaces the links (http(s)://, ftp://) with respective html anchor, and also makes all anchors open in a new window.
function make_links_blank( $text )
{
return preg_replace( array(
"/[^\"'=]((http|ftp|https):\/\/[^\s\"']+)/i",
"/<a([^>]*)target=\"?[^\"']+\"?/i",
"/<a([^>]+)>/i"
),
array(
"<a href=\"\\1\">\\1</a>",
"<a\\1",
"<a\\1 target=\"_blank\" >"
),
$text
);
}
felipensp at gmail dot com
13-Mar-2006 01:02
Sorry, I don't know English.
Replacing each letter of a bad word with a given character.
Example:
<?php
function censured($string, $aBadWords, $sChrReplace) {
foreach ($aBadWords as $key => $word) {
// Regexp for case-insensitive and use the functions
$aBadWords[$key] = "/({$word})/ie";
}
// to substitue badwords for definite character
return preg_replace($aBadWords,
"str_repeat('{$sChrReplace}', strlen('\\1'))",
$string
);
}
// To show modifications
print censured('The nick of my friends are rand, v1d4l0k4, P7rk, ferows.',
array('RAND', 'V1D4L0K4', 'P7RK', 'FEROWS'),
'*'
);
?>
Graham Dawson graham_at_imdanet_dot_com
07-Mar-2006 05:32
Inspired by the query-string cleaner from greenthumb at 4point-webdesign dot com and istvan dot csiszar at weblab dot hu. This little bit of code cleans up any "style" attributes in your tags, leaving behind only styles that you have specifically allowed. Also conveniently strips out nonsense styles. I've not fully tested it yet so I'm not sure if it'll handle features like url(), but that shouldn't be a difficulty.
<?php
/* The string would normally be a form-submitted html file or text string */
$string = '<span style="font-family:arial; font-size:20pt; text-decoration:underline; sausage:bueberry;" width="200">Hello there</span> This is some <div style="display:inline;">test text</div>';
/* Array of styles to allow. */
$except = array('font-family', 'text-decoration');
$allow = implode('|', $except);
/* The monster beast regexp. I was up all night trying to figure this one out. */
$regexp = '@([^;"]+)?(?<!'.$allow.'):(?!\/\/(.+?)\/)((.*?)[^;"]+)(;)?@is';
print str_replace('<', '&lt;', $regexp).'<br/><br/>';
$out = preg_replace($regexp, '', $string);
/* Now lets get rid of any unwanted empty style attributes */
$out = preg_replace('@[a-z]*=""@is', '', $out);
print $out;
?>
This should produce the following:
<span style="font-family:arial; text-decoration:underline;" width="200">Hello there</span> This is some <div >test text</div>
Now, I'm a relative newbie at this so I'm sure there's a better way to do it. There's *always* a better way.
elaineseery at hotmail dot com
15-Feb-2006 10:44
if you're new to this function, and getting an error like
'delimiter must not alphanumeric backslash ...
note that whatever is in $pattern (and only $pattern, not $string, or $replacement) must be enclosed by '/ /' (note the forward slashes)
e.g.
$pattern = '/and/';
$replacement = 'sandy';
$string = 'me and mine';
generates 'me sandy mine'
seems to be obvious to everyone else, but took me a while to figure out!!
jsirovic at gmale dot com
08-Feb-2006 01:23
If the lack of &$count is aggravating in PHP 4.x, try this:
$replaces = 0;
$return .= preg_replace('/(\b' . $substr . ')/ie', '"<$tag>$1<$end_tag>" . (substr($replaces++,0,0))', $s2, $limit);
no-spam@idiot^org^ru
05-Feb-2006 04:21
decodes ie`s escape() result
<?
function unicode_unescape(&$var, $convert_to_cp1251 = false){
$var = preg_replace(
'#%u([\da-fA-F]{4})#mse',
$convert_to_cp1251 ? '@iconv("utf-16","windows-1251",pack("H*","\1"))' : 'pack("H*","\1")',
$var
);
}
//
$str = 'to %u043B%u043E%u043F%u0430%u0442%u0430 or not to %u043B%u043E%u043F%u0430%u0442%u0430';
unicode_unescape($str, true);
echo $str;
?>
leandro[--]ico[at]gm[--]ail[dot]com
05-Feb-2006 01:40
I've found out a really odd error.
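When I try to use the 'empty' function in the replacement string (when using the 'e' modifier, of course) the regexp interpreter gets stuck at that point. (empty() is a language construct that, before PHP 5.5, accepts only variables, so empty(123) is a parse error in the evaluated code.)
An example of this failure: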
<?php
echo $test = preg_replace( "/(bla)/e", "empty(123)", "bla bla ble" );
# it should print something like:
# "1 1 ble"
?>
Very odd, huh?
04-Feb-2006 12:00
fairly useful script to replace normal html entities with ordinal-value entities. Useful for writing to xml documents where entities aren't defined.
<?php
$p='#(\&[\w]+;)#e';
$r="'&#'.ord(html_entity_decode('$1')).';'";
$text=preg_replace($p,$r,$_POST['data']);
?>
01-Feb-2006 02:23
Uh-oh. When I looked at the text in the preview, I had to double the number of backslashes to make it look right.
I'll try again with my original text:
$full_text = preg_replace('/\[p=(\d+)\]/e',
"\"<a href=\\\"./test.php?person=$1\\\">\"
.get_name($1).\"</a>\"",
$short_text);
I hope that it comes out correctly this time :-)
leif at solumslekt dot org
01-Feb-2006 12:24
I've found a use for preg_replace. If you've got e.g. a database with persons associated with numbers, you may want to input links in a kind of shorthand, like [p=12345], and have it expanded to a full url with a name in it.
This is my solution:
$expanded_text = preg_replace('/\\[p=(\d+)\\]/e',
"\\"<a href=\\\\\\"./test.php?person=$1\\\\\\">\\".get_name($1).\\"</a&>\\"",
$short_text);
It took me some time to work out the proper number of quotes and backslashes.
regards, Leif.
SG_01
20-Jan-2006 08:43
Re: wcc at techmonkeys dot org
You could put this in 1 replace for faster execution as well:
<?php
/*
* Removes all blank lines from a string.
*/
function removeEmptyLines($string)
{
return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}
?>
05-Jan-2006 06:09
First, I have no idea about regexp; all I did has been through trial and error.
I wrote this function, which tries to clean crappy MS Word html. I use it to clean code that users paste from MS Word into online wysiwyg editors.
There's huge room for improvement. I post it here because after searching I could not find any pure PHP solution. The best alternative is tidy, but for those of us who are still using PHP 4 and do not have access to the server, this could be an alternative. Use it at your own risk... once again, it was a quickie and I know there can be much better ways to do this:
function decraper($htm, $delstyles=false) {
$commoncrap = array('&quot;'
,'font-weight: normal;'
,'font-style: normal;'
,'line-height: normal;'
,'font-size-adjust: none;'
,'font-stretch: normal;');
$replace = array("'");
$htm = str_replace($commoncrap, $replace, $htm);
$pat = array();
$rep = array();
$pat[0] = '/(<table\s.*)(width=)(\d+%)(\D)/i';
$pat[1] = '/(<td\s.*)(width=)(\d+%)(\D)/i';
$pat[2] = '/(<th\s.*)(width=)(\d+%)(\D)/i';
$pat[3] = '/<td( colspan="[0-9]+")?( rowspan="[0-9]+")?
( width="[0-9]+")?( height="[0-9]+")?.*?>/i';
$pat[4] = '/<tr.*?>/i';
$pat[5]=
'/<\/st1:address>(<\/st1:\w*>)?
<\/p>[\n\r\s]*<p[\s\w="\']*>/i';
$pat[6] = '/<o:p.*?>/i';
$pat[7] = '/<\/o:p>/i';
$pat[8] = '/<o:SmartTagType[^>]*>/i';
$pat[9] = '/<st1:[\w\s"=]*>/i';
$pat[10] = '/<\/st1:\w*>/i';
$pat[11] = '/<p class="MsoNormal"[^>]*>(.*?)<\/p>/i';
$pat[12] = '/ style="margin-top: 0cm;"/i';
$pat[13] = '/<(\w[^>]*) class=([^ |>]*)([^>]*)/i';
$pat[14] = '/<ul(.*?)>/i';
$pat[15] = '/<ol(.*?)>/i';
$pat[17] = '/<br \/> <br \/>/i';
$pat[18] = '/ <br \/>/i';
$pat[19] = '/<!-.*?>/';
$pat[20] = '/\s*style=(""|\'\')/';
$pat[21] = '/ style=[\'"]tab-interval:[^\'"]*[\'"]/i';
$pat[22] = '/behavior:[^;\'"]*;*(\n|\r)*/i';
$pat[23] = '/mso-[^:]*:"[^"]*";/i';
$pat[24] = '/mso-[^;\'"]*;*(\n|\r)*/i';
$pat[25] = '/\s*font-family:[^;"]*;?/i';
$pat[26] = '/margin[^"\';]*;?/i';
$pat[27] = '/text-indent[^"\';]*;?/i';
$pat[28] = '/tab-stops:[^\'";]*;?/i';
$pat[29] = '/border-color: *([^;\'"]*)/i';
$pat[30] = '/border-collapse: *([^;\'"]*)/i';
$pat[31] = '/page-break-before: *([^;\'"]*)/i';
$pat[32] = '/font-variant: *([^;\'"]*)/i';
$pat[33] = '/<span [^>]*><br \/><\/span><br \/>/i';
$pat[34] = '/" "/';
$pat[35] = '/[\t\r\n]/';
$pat[36] = '/\s\s/s';
$pat[37] = '/ style=""/';
$pat[38] = '/<span>(.*?)<\/span>/i';
//empty (no attribs) spans
$pat[39] = '/<span>(.*?)<\/span>/i';
//twice, nested spans
$pat[40] = '/(;\s|\s;)/';
$pat[41] = '/;;/';
$pat[42] = '/";/';
$pat[43] = '/<li(.*?)>/i';
$pat[44] =
'/(<\/b><b>|<\/i><i>|<\/em><em>|
<\/u><u>|<\/strong><strong>)/i';
$rep[0] = '$1$2"$3"$4';
$rep[1] = '$1$2"$3"$4';
$rep[2] = '$1$2"$3"$4';
$rep[3] = '<td$1$2$3$4>';
$rep[4] = '<tr>';
$rep[5] = '<br />';
$rep[6] = '';
$rep[7] = '<br />';
$rep[8] = '';
$rep[9] = '';
$rep[10] = '';
$rep[11] = '$1<br />';
$rep[12] = '';
$rep[13] = '<$1$3';
$rep[14] = '<ul>';
$rep[15] = '<ol>';
$rep[17] = '<br />';
$rep[18] = '<br />';
$rep[19] = '';
$rep[20] = '';
$rep[21] = '';
$rep[22] = '';
$rep[23] = '';
$rep[24] = '';
$rep[25] = '';
$rep[26] = '';
$rep[27] = '';
$rep[28] = '';
$rep[29] = '';
$rep[30] = '';
$rep[31] = '';
$rep[32] = '';
$rep[33] = '<br />';
$rep[34] = '""';
$rep[35] = '';
$rep[36] = '';
$rep[37] = '';
$rep[38] = '$1';
$rep[39] = '$1';
$rep[40] = ';';
$rep[41] = ';';
$rep[42] = '"';
$rep[43] = '<li>';
$rep[44] = '';
if($delstyles===true){
$pat[50] = '/ style=".*?"/';
$rep[50] = '';
}
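ksort($pat);
ksort($rep);
// Apply the patterns; without this call the arrays built above are never used.
$htm = preg_replace($pat, $rep, $htm);
return $htm;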
}
Hope it helps, critics are more than welcome.
kyle at vivahate dot com
23-Dec-2005 04:08
Here is a regular expression to "slashdotify" html links. This has worked well for me, but if anyone spots errors, feel free to make corrections.
<?php
$url = '<a attr="garbage" href="http://us3.php.net/preg_replace">preg_replace - php.net</a>';
$url = preg_replace( '/<.*href="?(.*:\/\/)?([^ \/]*)([^ >"]*)"?[^>]*>(.*)(<\/a>)/', '<a href="$1$2$3">$4</a> [$2]', $url );
?>
Will output:
<a href="http://us3.php.net/preg_replace">preg_replace - php.net</a> [us3.php.net]
istvan dot csiszar at weblab dot hu
21-Dec-2005 05:53
This is an addition to the previously sent removeEvilTags function. If you don't want to remove the style tag entirely, just certain style attributes within that, then you might find this piece of code useful:
<?php
function removeEvilStyles($tagSource)
{
// this will leave everything else, but:
$evilStyles = array('font', 'font-family', 'font-face', 'font-size', 'font-size-adjust', 'font-stretch', 'font-variant');
$find = array();
$replace = array();
foreach ($evilStyles as $v)
{
$find[] = "/$v:.*?;/";
$replace[] = '';
}
return preg_replace($find, $replace, $tagSource);
}
function removeEvilTags($source)
{
$allowedTags = '<h1><h2><h3><h4><h5><a><img><label>'.
'<p><br><span><sup><sub><ul><li><ol>'.
'<table><tr><td><th><tbody><div><hr><em><b><i>';
$source = strip_tags(stripslashes($source), $allowedTags);
return trim(preg_replace('/<(.*?)>/ie', "'<'.removeEvilStyles('\\1').'>'", $source));
}
?>
triphere
18-Dec-2005 01:13
to remove Bulletin Board Code (remove bbcode)
$body = preg_replace("[\[(.*?)\]]", "", $body);
jcheger at acytec dot com
09-Dec-2005 04:16
Escaping quotes may be very tricky. Magic quotes and preg_quote are not protected against double escaping. This means that an escaped quote will get a double backslash, or even more. preg_quote ("I\'m using regex") will return "I\\'m using regex".
The following example escapes only unescaped single quotes:
<?php
$a = "I'm using regex";
$b = "I\'m using regex";
$patt = "/(?<!\\\)\'/";
$repl = "\\'";
print "a: ".preg_replace ($patt, $repl, $a)."\n";
print "b: ".preg_replace ($patt, $repl, $b)."\n";
?>
and prints:
a: I\'m using regex
b: I\'m using regex
Remark: matching a backslash requires a triple backslash (\\\).
urbanheroes {at} gmail {dot} com
16-Aug-2005 04:00
Here are two functions to trim a string down to a certain size.
"wordLimit" trims a string down to a certain number of words, and adds an ellipsis after the last word, or returns the string if the limit is larger than the number of words in the string.
"stringLimit" trims a string down to a certain number of characters, and adds an ellipsis after the last word, without truncating any words in the middle (it will instead leave it out), or returns the string if the limit is larger than the string size. The length of a string will INCLUDE the length of the ellipsis.
<?php
function wordLimit($string, $length = 50, $ellipsis = '...') {
return count($words = preg_split('/\s+/', ltrim($string), $length + 1)) > $length ?
rtrim(substr($string, 0, strlen($string) - strlen(end($words)))) . $ellipsis :
$string;
}
function stringLimit($string, $length = 50, $ellipsis = '...') {
return strlen($fragment = substr($string, 0, $length + 1 - strlen($ellipsis))) < strlen($string) + 1 ?
preg_replace('/\s*\S*$/', '', $fragment) . $ellipsis : $string;
}
echo wordLimit(' You can limit a string to only so many words.', 6);
// Output: "You can limit a string to..."
echo stringLimit('Or you can limit a string to a certain amount of characters.', 32);
// Output: "Or you can limit a string to..."
?>
avizion at relay dot dk
25-Apr-2005 03:04
Just a note for all FreeBSD users wondering why this function is not present after installing php / mod_php (4 and 5) from ports.
Remember to install:
/usr/ports/devel/php4-pcre (or the php5 equivalent for PHP 5 ;)
That's all... enjoy, and save yourself the 30 minutes I lost :D
jhm at cotren dot net
19-Feb-2005 06:04
It took me a while to figure this one out, but here is a nice way to use preg_replace to convert a hex encoded string back to clear text
<?php
$text = "PHP rocks!";
$encoded = preg_replace(
"'(.)'e"
,"dechex(ord('\\1'))"
,$text
);
print "ENCODED: $encoded\n";
?>
ENCODED: 50485020726f636b7321
<?php
print "DECODED: ".preg_replace(
"'([\S,\d]{2})'e"
,"chr(hexdec('\\1'))"
,$encoded)."\n";
?>
DECODED: PHP rocks!
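The /e modifier used above was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, and dechex() does not zero-pad, so a byte below 0x10 would encode to a single digit and break the decoding step. Here is a minimal sketch of the same round trip with preg_replace_callback(). The names hex_encode, hex_decode, hex_encode_cb and hex_decode_cb are made up for this illustration.

```php
<?php
// Hypothetical helper names; only preg_replace_callback(), sprintf(),
// ord(), chr() and hexdec() are real PHP functions.

// Callback: turn one matched byte into two lowercase hex digits.
function hex_encode_cb($m) {
    return sprintf('%02x', ord($m[1]));
}

// Callback: turn one pair of hex digits back into a byte.
function hex_decode_cb($m) {
    return chr(hexdec($m[1]));
}

function hex_encode($text) {
    // The /s modifier lets "." match newlines as well.
    return preg_replace_callback('/(.)/s', 'hex_encode_cb', $text);
}

function hex_decode($hex) {
    return preg_replace_callback('/([0-9a-f]{2})/i', 'hex_decode_cb', $hex);
}

print "ENCODED: " . hex_encode("PHP rocks!") . "\n";
print "DECODED: " . hex_decode(hex_encode("PHP rocks!")) . "\n";
```

If all you need is the encoding half, the built-in bin2hex() produces the same string without any regex.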
gbaatard at iinet dot net dot au
15-Feb-2005 01:56
on the topic of implementing forum code ([b][/b] to <b></b> etc), i found this worked well...
<?php
$body = preg_replace('/\[([biu])\]/i', '<\\1>', $body);
$body = preg_replace('/\[\/([biu])\]/i', '</\\1>', $body);
?>
First line replaces [b] [B] [i] [I] [u] [U] with the appropriate html tags(<b>, <i>, <u>)
Second one does the same for closing tags...
For urls, I use...
<?php
$body = preg_replace('/\s(\w+:\/\/)(\S+)/', ' <a href="\\1\\2" target="_blank">\\1\\2</a>', $body);
?>
and for urls starting with www., i use...
<?php
$body = preg_replace('/\s(www\.)(\S+)/', ' <a href="http://\\1\\2" target="_blank">\\1\\2</a>', $body);
?>
Pop all these lines into a function that receives and returns the text you want 'forum coded' and away you go:)
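A rough sketch of what that wrapper function could look like. The name forum_code is hypothetical, and the htmlspecialchars() call at the top is my own assumption (not part of the note above), added so that raw HTML typed by the poster is escaped before the tags are generated.

```php
<?php
// forum_code() is a hypothetical name; the four patterns are taken from the note above.
function forum_code($body) {
    // Assumption: escape raw HTML first so only the tags generated below survive.
    $body = htmlspecialchars($body);
    // [b], [i], [u] (any case) become <b>, <i>, <u>
    $body = preg_replace('/\[([biu])\]/i', '<\\1>', $body);
    // [/b], [/i], [/u] become the closing tags
    $body = preg_replace('/\[\/([biu])\]/i', '</\\1>', $body);
    // URLs with a scheme, preceded by whitespace
    $body = preg_replace('/\s(\w+:\/\/)(\S+)/', ' <a href="\\1\\2" target="_blank">\\1\\2</a>', $body);
    // Bare www. addresses, preceded by whitespace
    $body = preg_replace('/\s(www\.)(\S+)/', ' <a href="http://\\1\\2" target="_blank">\\1\\2</a>', $body);
    return $body;
}

echo forum_code("[b]Hi[/b] see www.example.com now"), "\n";
```

Note that both URL patterns require whitespace in front of the address, so a URL at the very start of the text is left unlinked.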
tash at quakersnet dot com
30-Jan-2005 08:25
A better way to do link & email conversion, I think. :)
<?php
function change_string($str)
{
$str = trim($str);
$str = htmlspecialchars($str);
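// \S+ keeps each match inside one whitespace-delimited token instead of swallowing the whole line
$str = preg_replace('#(\S+)@(\S+)\.(\S+)#','<a href="mailto:\\1@\\2.\\3">Send email</a>',$str);
// the dot after www is escaped so it only matches a literal "."
$str = preg_replace('=(\S*)(www\.)(\S*)=','<a href="http://\\2\\3" target=\'_new\'>\\2\\3</a>',$str);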
return $str;
}
?>
jw-php at valleyfree dot com
26-Jan-2005 12:28
Note that if you want to replace all backslashes in a string with double backslashes (like addslashes() does, but only for backslashes and not quotes, etc.), you'll need the following:
$new = preg_replace('/\\\\/','\\\\\\\\',$old);
Note that the pattern uses 4 backslashes and the replacement uses 8! The reason for the 4 backslashes in the pattern has already been explained on this page, but nobody has yet mentioned that the same logic applies to the replacement, where backslashes are also parsed twice: once by PHP and once by the PCRE extension. So the eight backslashes break down to four sent to PCRE, which then puts two in the final output.
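To make the two layers visible, here is a small sketch that prints what actually reaches PCRE. The sample value in $old is made up for illustration.

```php
<?php
// Single-quoted PHP literals: each pair of backslashes becomes one character.
$pattern     = '/\\\\/';      // PCRE receives /\\/  : one literal backslash
$replacement = '\\\\\\\\';    // PCRE receives \\\\  : two literal backslashes

$old = 'C:\\temp\\file';      // actual value: C:\temp\file
$new = preg_replace($pattern, $replacement, $old);

echo strlen($replacement), "\n"; // 4 characters reach PCRE
echo $new, "\n";                 // C:\\temp\\file

// For a fixed string no regex is needed; str_replace() only has PHP's own layer.
$same = str_replace('\\', '\\\\', $old);
```

When the search text is a fixed string like this, str_replace() avoids the second level of escaping altogether.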
Nick
21-Jan-2005 07:05
Here is a more secure version of the link conversion code, which hopefully makes cross-site scripting attacks more difficult.
<?php
function convert_links($str) {
$replace = <<<EOPHP
'<a href="'.htmlentities('\\1').htmlentities('\\2').//remove line break
'">'.htmlentities('\\1').htmlentities('\\2').'</a>'
EOPHP;
$str = preg_replace('#(http://)([^\s]*)#e', $replace, $str);
return $str;
}
?>
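On PHP versions where /e is no longer available, the same idea can be sketched with preg_replace_callback(). The names link_cb and convert_links_cb are hypothetical, and I have swapped htmlentities() for htmlspecialchars() with ENT_QUOTES, which is an assumption on my part rather than the original author's choice.

```php
<?php
// link_cb() and convert_links_cb() are hypothetical names for this sketch.
function link_cb($m) {
    // $m[0] is the whole matched URL, exactly as it appeared in the text.
    $url = htmlspecialchars($m[0], ENT_QUOTES);
    return '<a href="' . $url . '">' . $url . '</a>';
}

function convert_links_cb($str) {
    return preg_replace_callback('#http://[^\s]*#', 'link_cb', $str);
}

echo convert_links_cb('see http://example.com/?a=1&b="x" ok'), "\n";
```

The callback gets the raw match, so the escaping happens in ordinary PHP code instead of inside an evaluated replacement string.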
ignacio paz posse
22-Oct-2004 04:22
I needed to treat long URLs differently from shorter ones, for which my client preferred to have the complete address displayed. Here's the function I ended up with:
<?php
function auto_url($txt){
# (1) catch those with url larger than 71 characters
$pat = '/(http|ftp)+(?:s)?:(\\/\\/)'
.'((\\w|\\.)+)(\\/)?(\\S){71,}/i';
$txt = preg_replace($pat, "<a href=\"\\0\" target=\"_blank\">$1$2$3/...</a>",
$txt);
# (2) replace the other short urls provided that they are not contained inside an html tag already.
$pat = '/(?<!href=\")(http|ftp)+(s)?:'
.'(\\/\\/)((\\w|\\.)+) (\\/)?(\\S)/i';
$txt = preg_replace($pat,"<a href=\"$0\" target=\"_blank\">$0</a> ",
$txt);
return $txt;
}
?>
Note the negative lookbehind expression added in the second pattern; it exempts URLs that are preceded by ' href=" ' (meaning that they were already put inside the appropriate HTML tags by the previous expression).
(get rid of the space between question mark and the last parenthesis group in both regex, I need to put it like that to be able to post this comment)
gabe at mudbuginfo dot com
19-Oct-2004 04:39
It is useful to note that the 'limit' parameter, when used with 'pattern' and 'replace' which are arrays, applies to each individual pattern in the patterns array, and not the entire array.
<?php
$pattern = array('/one/', '/two/');
$replace = array('uno', 'dos');
$subject = "test one, one two, one two three";
echo preg_replace($pattern, $replace, $subject, 1);
?>
If limit were applied to the whole array (which it isn't), it would return:
test uno, one two, one two three
However, in reality this will actually return:
test uno, one dos, one two three
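If you do want one overall limit across all patterns, something like the following sketch might work. preg_replace_total_limit is a hypothetical helper, not part of PHP. It assumes both arrays use matching keys and relies on the by-reference count argument that preg_replace() accepts as its fifth parameter (PHP 5.1.0 and later).

```php
<?php
// preg_replace_total_limit() is a hypothetical helper, not part of PHP.
// It relies on the by-reference $count argument of preg_replace().
function preg_replace_total_limit($patterns, $replacements, $subject, $limit) {
    $remaining = $limit;
    foreach ($patterns as $i => $pattern) {
        if ($remaining <= 0) {
            break; // overall budget used up
        }
        // A missing replacement falls back to the empty string, as preg_replace() does.
        $replacement = isset($replacements[$i]) ? $replacements[$i] : '';
        $subject = preg_replace($pattern, $replacement, $subject, $remaining, $count);
        $remaining -= $count;
    }
    return $subject;
}

$pattern = array('/one/', '/two/');
$replace = array('uno', 'dos');
$subject = "test one, one two, one two three";
echo preg_replace_total_limit($pattern, $replace, $subject, 1), "\n";
// test uno, one two, one two three
```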
silasjpalmer at optusnet dot com dot au
19-Mar-2004 10:00
Using preg_replace to return extracts without breaking words in the middle
(useful for search results)
<?php
$string = "Don't split words";
echo substr($string, 0, 10); // Returns "Don't spli"
$pattern = "/(^.{0,10})(\W+.*$)/";
$replacement = "\${1}";
echo preg_replace($pattern, $replacement, $string); // Returns "Don't"
?>
steven -a-t- acko dot net
09-Feb-2004 01:45
People using the /e modifier with preg_replace should be aware of the following weird behaviour. It is not a bug per se, but can cause bugs if you don't know it's there.
The example in the docs for /e suffers from this mistake in fact.
With /e, the replacement string is a PHP expression. So when you use a backreference in the replacement expression, you need to put the backreference inside quotes, or otherwise it would be interpreted as PHP code. Like the example from the manual for preg_replace:
preg_replace("/(<\/?)(\w+)([^>]*>)/e",
"'\\1'.strtoupper('\\2').'\\3'",
$html_body);
To make this easier, the data in a backreference with /e is run through addslashes() before being inserted in your replacement expression. So if you have the string
He said: "You're here"
It would become:
He said: \"You\'re here\"
...and be inserted into the expression.
However, if you put this inside a set of single quotes, PHP will not strip away all the slashes correctly! Try this:
print ' He said: \"You\'re here\" ';
Output: He said: \"You're here\"
This is because the sequence \" inside single quotes is not recognized as anything special, and it is output literally.
Using double-quotes to surround the string/backreference will not help either, because inside double-quotes, the sequence \' is not recognized and also output literally. And in fact, if you have any dollar signs in your data, they would be interpreted as PHP variables. So double-quotes are not an option.
The 'solution' is to manually fix it in your expression. It is easiest to use a separate processing function, and do the replacing there (i.e. use "my_processing_function('\\1')" or something similar as replacement expression, and do the fixing in that function).
If you surrounded your backreference by single-quotes, the double-quotes are corrupt:
$text = str_replace('\"', '"', $text);
People using preg_replace with /e should at least be aware of this.
I'm not sure how it would be best fixed in preg_replace. Because double-quotes are a really bad idea anyway (due to the variable expansion), I would suggest that preg_replace's auto-escaping is modified to suit the placement of backreferences inside single-quotes (which seemed to be the intention from the start, but was incorrectly applied).
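One way to sidestep the quoting problem entirely is preg_replace_callback(), which hands the callback the raw matched text with no addslashes() step in between. A minimal sketch of the manual's tag-uppercasing example in that style follows; upper_tag_cb is a hypothetical callback name and the sample HTML is made up.

```php
<?php
// upper_tag_cb() is a hypothetical callback name.
function upper_tag_cb($m) {
    // $m[1] = "<" or "</", $m[2] = tag name, $m[3] = attributes and ">"
    return $m[1] . strtoupper($m[2]) . $m[3];
}

$html_body = '<a title="You\'re here">x</a>';
$result = preg_replace_callback('/(<\/?)(\w+)([^>]*>)/', 'upper_tag_cb', $html_body);
echo $result, "\n";
// <A title="You're here">x</A>
```

Both kinds of quotes come through untouched because nothing is evaluated as PHP source.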
Peter
02-Nov-2003 09:00
Suppose you want to match '\n' (that's backslash-n, not newline). The pattern you want is not /\\n/ but /\\\\n/. The reason for this is that before the regex engine can interpret the \\ into \, PHP interprets it. Thus, if you write the first, the regex engine sees \n, which it reads as a newline. So you have to escape your backslashes twice: once for PHP, and once for the regex engine.
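A quick sketch to see the difference for yourself; the subject string and the [NL] / [BS-N] markers are made up for the demonstration.

```php
<?php
$subject = 'a\nb' . "\n";   // literal backslash-n, followed by a real newline

// '/\\n/' reaches PCRE as /\n/ and matches the real newline.
$first = preg_replace('/\\n/', '[NL]', $subject);
echo $first, "\n";          // a\nb[NL]

// '/\\\\n/' reaches PCRE as /\\n/ and matches a backslash followed by n.
$second = preg_replace('/\\\\n/', '[BS-N]', $subject);
echo $second;               // a[BS-N]b, then the untouched newline
```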
Travis
18-Oct-2003 06:37
I spent some time fighting with this, so hopefully this will help someone else.
Escaping a backslash (\) really involves not two, not three, but four backslashes to work properly.
So to match a single backslash, one should use:
preg_replace('/(\\\\)/', ...);
or to, say, escape single quotes not already escaped, one could write:
preg_replace("/([^\\\\])'/", "\$1\'", ...);
Anything else, such as the seemingly correct
preg_replace("/([^\\])'/", "\$1\'", ...);
gets evaluated as escaping the ] and resulting in an unterminated character class.
I'm not exactly clear on this issue of backslash proliferation, but it seems to involve the combination of PHP string processing and PCRE processing.