【Question title】:Generate SEO friendly URLs (slugs) [closed]
【Posted】:2011-07-15 10:06:31
【Question】:

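Definition

From Wikipedia:

A slug is the part of a URL that uses human-readable keywords.

To make the URL easier for users to type, special characters are often removed or replaced as well. For instance, accented characters are usually replaced by letters from the English alphabet; punctuation marks are generally removed; and spaces (which have to be encoded as %20 or +) are replaced by dashes (-) or underscores (_), which are more aesthetically pleasing.

Context

I developed a photo-sharing website where users can upload, share, and view photos.

All pages are generated automatically, without my having any control over the titles. Because a photo's title or a user's name may contain accented characters or spaces, I need a function that automatically creates slugs and keeps the URLs readable.

I created the following function, which replaces accented characters (âèêëçî), removes punctuation and bad characters (#@&~^!), and converts spaces into dashes.

Questions

  • What do you think of this function?
  • Do you know any other functions for creating slugs?

Code: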

function sluggable($str) {

    $before = array(
        'àáâãäåòóôõöøèéêëðçìíîïùúûüñšž',
        '/[^a-z0-9\s]/',
        array('/\s/', '/--+/', '/---+/')
    );
 
    $after = array(
        'aaaaaaooooooeeeeeciiiiuuuunsz',
        '',
        '-'
    );

    $str = strtolower($str);
    $str = strtr($str, $before[0], $after[0]);
    $str = preg_replace($before[1], $after[1], $str);
    $str = trim($str);
    $str = preg_replace($before[2], $after[2], $str);
 
    return $str;
}

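【Comments】:

  • The French like slugs a' la escargot
  • How about using code that is already written: code.google.com/p/php-slugs ?
  • You may want to delete this question here and repost it on codereview.stackexchange.com, where feedback and improvement requests are more on-topic.
  • @maniator: wiki: Slug
  • French does not use áâãäåòóõöøðìíñšž. (Swedish, Czech, etc. do, but not French.)

Tags: php string seo friendly-url slug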


【Solution 1】:

I like the php-slugs code on Google Code. But if you want something simpler that works with UTF-8:

function format_uri( $string, $separator = '-' )
{
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = array( '&' => 'and', "'" => '');
    $string = mb_strtolower( trim( $string ), 'UTF-8' );
    $string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
    $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
    $string = preg_replace("/[^a-z0-9]/u", "$separator", $string);
    $string = preg_replace("/[$separator]+/u", "$separator", $string);
    return $string;
}

So

echo format_uri("#@&~^!âèêëçî");

outputs

-and-aeeeci

【Comments】:

  • Here's gets converted to here-039-s. A better option is simply to remove the apostrophe.
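
The here-039-s result appears when htmlentities() runs with ENT_QUOTES before the non-alphanumeric replacement: the apostrophe becomes &#039; and its digits survive the cleanup. The following is a minimal sketch of the effect and of the suggested fix. Both function names (naive_slug, apostrophe_safe_slug) are hypothetical and not part of the answer; the "'" => '' entry in $special_cases plays the same role in format_uri().

```php
<?php
// Illustrative only: both function names are hypothetical.

// Encodes entities first, so ' becomes &#039; and "039" leaks into the slug.
function naive_slug(string $text): string
{
    $encoded = htmlentities(strtolower($text), ENT_QUOTES, 'UTF-8');
    return trim(preg_replace('/[^a-z0-9]+/', '-', $encoded), '-');
}

// Drops apostrophes before encoding, as the comment suggests.
function apostrophe_safe_slug(string $text): string
{
    return naive_slug(str_replace("'", '', $text));
}

echo naive_slug("Here's"), "\n";           // here-039-s
echo apostrophe_safe_slug("Here's"), "\n"; // heres
```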
【Solution 2】:

Some people have already linked to "php-slugs" on code.google.com, but the page looks a bit messy these days, so here is the code in case anyone needs it:

// source: https://code.google.com/archive/p/php-slugs/

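function my_str_split($string)
{
    $sArray = array();
    $slen = strlen($string);
    for ($i = 0; $i < $slen; $i++)
    {
        // Square brackets: the $string{$i} offset syntax was removed in PHP 8.
        $sArray[$i] = $string[$i];
    }
    return $sArray;
}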

function noDiacritics($string)
{
    //cyrylic transcription
    $cyrylicFrom = array('А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ё', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я', 'а', 'б', 'в', 'г', 'д', 'е', 'ё', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я');
    $cyrylicTo   = array('A', 'B', 'W', 'G', 'D', 'Ie', 'Io', 'Z', 'Z', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', 'Ch', 'C', 'Tch', 'Sh', 'Shtch', '', 'Y', '', 'E', 'Iu', 'Ia', 'a', 'b', 'w', 'g', 'd', 'ie', 'io', 'z', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'ch', 'c', 'tch', 'sh', 'shtch', '', 'y', '', 'e', 'iu', 'ia'); 


    $from = array("Á", "À", "Â", "Ä", "Ă", "Ā", "Ã", "Å", "Ą", "Æ", "Ć", "Ċ", "Ĉ", "Č", "Ç", "Ď", "Đ", "Ð", "É", "È", "Ė", "Ê", "Ë", "Ě", "Ē", "Ę", "Ə", "Ġ", "Ĝ", "Ğ", "Ģ", "á", "à", "â", "ä", "ă", "ā", "ã", "å", "ą", "æ", "ć", "ċ", "ĉ", "č", "ç", "ď", "đ", "ð", "é", "è", "ė", "ê", "ë", "ě", "ē", "ę", "ə", "ġ", "ĝ", "ğ", "ģ", "Ĥ", "Ħ", "I", "Í", "Ì", "İ", "Î", "Ï", "Ī", "Į", "IJ", "Ĵ", "Ķ", "Ļ", "Ł", "Ń", "Ň", "Ñ", "Ņ", "Ó", "Ò", "Ô", "Ö", "Õ", "Ő", "Ø", "Ơ", "Œ", "ĥ", "ħ", "ı", "í", "ì", "i", "î", "ï", "ī", "į", "ij", "ĵ", "ķ", "ļ", "ł", "ń", "ň", "ñ", "ņ", "ó", "ò", "ô", "ö", "õ", "ő", "ø", "ơ", "œ", "Ŕ", "Ř", "Ś", "Ŝ", "Š", "Ş", "Ť", "Ţ", "Þ", "Ú", "Ù", "Û", "Ü", "Ŭ", "Ū", "Ů", "Ų", "Ű", "Ư", "Ŵ", "Ý", "Ŷ", "Ÿ", "Ź", "Ż", "Ž", "ŕ", "ř", "ś", "ŝ", "š", "ş", "ß", "ť", "ţ", "þ", "ú", "ù", "û", "ü", "ŭ", "ū", "ů", "ų", "ű", "ư", "ŵ", "ý", "ŷ", "ÿ", "ź", "ż", "ž");
    $to   = array("A", "A", "A", "AE", "A", "A", "A", "A", "A", "AE", "C", "C", "C", "C", "C", "D", "D", "D", "E", "E", "E", "E", "E", "E", "E", "E", "G", "G", "G", "G", "G", "a", "a", "a", "ae", "ae", "a", "a", "a", "a", "ae", "c", "c", "c", "c", "c", "d", "d", "d", "e", "e", "e", "e", "e", "e", "e", "e", "g", "g", "g", "g", "g", "H", "H", "I", "I", "I", "I", "I", "I", "I", "I", "IJ", "J", "K", "L", "L", "N", "N", "N", "N", "O", "O", "O", "OE", "O", "O", "O", "O", "CE", "h", "h", "i", "i", "i", "i", "i", "i", "i", "i", "ij", "j", "k", "l", "l", "n", "n", "n", "n", "o", "o", "o", "oe", "o", "o", "o", "o", "o", "R", "R", "S", "S", "S", "S", "T", "T", "T", "U", "U", "U", "UE", "U", "U", "U", "U", "U", "U", "W", "Y", "Y", "Y", "Z", "Z", "Z", "r", "r", "s", "s", "s", "s", "ss", "t", "t", "b", "u", "u", "u", "ue", "u", "u", "u", "u", "u", "u", "w", "y", "y", "y", "z", "z", "z");


    $from = array_merge($from, $cyrylicFrom);
    $to   = array_merge($to, $cyrylicTo);

    $newstring=str_replace($from, $to, $string);
    return $newstring;
}

function makeSlugs($string, $maxlen=0)
{
    $newStringTab=array();
    $string=strtolower(noDiacritics($string));
    if(function_exists('str_split'))
    {
        $stringTab=str_split($string);
    }
    else
    {
        $stringTab=my_str_split($string);
    }

    $numbers=array("0","1","2","3","4","5","6","7","8","9","-");
    //$numbers=array("0","1","2","3","4","5","6","7","8","9");

    foreach($stringTab as $letter)
    {
        if(in_array($letter, range("a", "z")) || in_array($letter, $numbers))
        {
            $newStringTab[]=$letter;
        }
        elseif($letter==" ")
        {
            $newStringTab[]="-";
        }
    }

    if(count($newStringTab))
    {
        $newString=implode($newStringTab);
        if($maxlen>0)
        {
            $newString=substr($newString, 0, $maxlen);
        }

        $newString = removeDuplicates('--', '-', $newString);
    }
    else
    {
        $newString='';
    }

    return $newString;
}


function checkSlug($sSlug)
{
    if(preg_match("/^[a-zA-Z0-9]+[a-zA-Z0-9\-]*$/", $sSlug) == 1)
    {
        return true;
    }

    return false;
}

function removeDuplicates($sSearch, $sReplace, $sSubject)
{
    $i=0;
    do{

        $sSubject=str_replace($sSearch, $sReplace, $sSubject);
        $pos=strpos($sSubject, $sSearch);

        $i++;
        if($i>100)
        {
            die('removeDuplicates() loop error');
        }

    }while($pos!==false);

    return $sSubject;
}

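【Comments】:

  • Rather than providing a huge, terrible, and incomplete list of replacements, it would be better to normalize the string and then remove the non-ASCII characters.
  • @BlueRaja-DannyPflughoeft Since this is Google's original code, I'm not going to edit it. I encourage you to add another answer that improves on this code.
  • I edited the mapping of the German umlauts. I think Ä should be AE, Ü should be UE, and so on.
  • @SirDerpington I wonder whether this answer should be editable at all, since it is really a copy-paste of code.google.com/archive/p/php-slugs
  • @rybo111 Yes, I see what you mean. I think it should be, because (I don't know why) some characters in the $to and $from arrays were lost. It just showed "?" instead of the actual characters.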
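
The "normalize, then strip" idea from the first comment could look roughly like the sketch below. It assumes the intl extension is loaded (for the Normalizer class), and slug_via_normalization() is a hypothetical name, not code from php-slugs. Letters that have no canonical decomposition (ø, ł, ß, Cyrillic, and so on) are simply dropped by this approach, so it is not a full replacement for a transliteration table.

```php
<?php
// Sketch only: requires the intl extension (Normalizer class).
// slug_via_normalization() is a hypothetical name used for illustration.
function slug_via_normalization(string $text): string
{
    // Decompose accented letters into base letter + combining mark (NFD).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        $decomposed = $text; // normalization failed: fall back to the raw input
    }

    // Drop the combining marks, leaving the base letters.
    // preg_replace() returns null on invalid UTF-8, hence the fallback.
    $ascii = preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $decomposed;

    // Lowercase, then turn every run of other characters into one dash.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));

    return trim($slug, '-');
}

echo slug_via_normalization('Crème brûlée à la carte'), "\n"; // creme-brulee-a-la-carte
```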
【Solution 3】:
    setlocale(LC_ALL, 'en_US.UTF8');

        function slugify($text)
        {
          // replace non letter or digits by -
          $text = preg_replace('~[^\\pL\d]+~u', '-', $text);

          // trim
          $text = trim($text, '-');

          // transliterate
          $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

          // lowercase
          $text = strtolower($text);

          // remove unwanted characters
          $text = preg_replace('~[^-\w]+~', '', $text);

          if (empty($text))
          {
            return 'n-a';
          }

          return $text;
        }


$slug = slugify($var);

【Comments】:

    【Solution 4】:

    I found this online; it does exactly what you want, but preserves case.

    function sluggable($p) {
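        // The /u modifier makes PCRE read pattern and subject as UTF-8; without it these ranges match single bytes.
        $ts = array("/[À-Å]/u","/Æ/u","/Ç/u","/[È-Ë]/u","/[Ì-Ï]/u","/Ð/u","/Ñ/u","/[Ò-ÖØ]/u","/×/u","/[Ù-Ü]/u","/[Ý-ß]/u","/[à-å]/u","/æ/u","/ç/u","/[è-ë]/u","/[ì-ï]/u","/ð/u","/ñ/u","/[ò-öø]/u","/÷/u","/[ù-ü]/u","/[ý-ÿ]/u");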
        $tn = array("A","AE","C","E","I","D","N","O","X","U","Y","a","ae","c","e","i","d","n","o","x","u","y");
        return preg_replace($ts,$tn, $p);
    }
    

    source

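    【Comments】:

    • This isn't very robust, since it only handles the characters listed. What about Cyrillic? Hebrew? Other obscure non-ASCII symbols such as ²º?
    • Also, preg_replace() is slower than strtr().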
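
For comparison, here is a rough sketch of the strtr() variant the second comment alludes to. The function name and the deliberately tiny map are assumptions for illustration only, and no timing claim is made here. The array form of strtr() matches whole keys, so multi-byte UTF-8 characters are handled correctly, which is not true of the three-argument string form.

```php
<?php
// Hypothetical helper; the map is deliberately tiny and only illustrative.
function replace_accents_strtr(string $text): string
{
    // With an array argument, strtr() replaces whole (multi-byte) keys
    // in a single pass and never re-scans its own replacements.
    static $map = [
        'À' => 'A', 'É' => 'E', 'Æ' => 'AE', 'Ç' => 'C',
        'à' => 'a', 'é' => 'e', 'è' => 'e', 'æ' => 'ae', 'ç' => 'c', 'ñ' => 'n',
    ];

    return strtr($text, $map);
}

echo replace_accents_strtr('Éclair à la crème, señor'), "\n"; // Eclair a la creme, senor
```

Like the answer's version, this keeps the original case and leaves unlisted characters untouched.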
    【Solution 5】:

    This works really well. It returns a correct, clean URL slug.

    $string = '(1234) S*m@#ith S)&+*t `E}{xam)ple?>land   - - 1!_2)#3)(*4""5';
    
    // remove all non alphanumeric characters except spaces
    $clean =  preg_replace('/[^a-zA-Z0-9\s]/', '', strtolower($string)); 
    
    // replace one or multiple spaces into single dash (-)
    $clean =  preg_replace('!\s+!', '-', $clean); 
    
    echo $clean; // 1234-smith-st-exampleland-12345
    

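    【Comments】:

    • This code strips every character that is not in the regular expression; it works like a whitelist. Be careful, though: most international programmers need a solution that turns "café" into "cafe", not into "caf" as this code does.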
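
One way to address the "café" concern is to transliterate before applying the whitelist. The sketch below assumes the intl extension is available and uses a hypothetical function name (whitelist_slug); the two regex steps are the same as in the answer.

```php
<?php
// Sketch only: requires the intl extension. whitelist_slug() is a hypothetical name.
function whitelist_slug(string $text): string
{
    // Map accented letters to ASCII first (é -> e), so the whitelist keeps them.
    $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);
    if ($ascii === false) {
        $ascii = $text; // transliteration failed: fall back to the raw input
    }

    // Same two steps as the answer: whitelist, then collapse whitespace into dashes.
    $clean = preg_replace('/[^a-z0-9\s]/', '', strtolower($ascii));

    return preg_replace('/\s+/', '-', trim($clean));
}

echo whitelist_slug('Café Olé 2011'), "\n"; // cafe-ole-2011
```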
    【Solution 6】:
    function seourl($phrase, $maxLength = 100000000000000) {
            $result = strtolower($phrase);
    
            $result = preg_replace("~[^A-Za-z0-9-\s]~", "", $result);
            $result = trim(preg_replace("~[\s-]+~", " ", $result));
            $result = trim(substr($result, 0, $maxLength));
            $result = preg_replace("~\s~", "-", $result);
    
            return $result;
        }
    

    【Comments】:

      【Solution 7】:
      function remove_accents($string)
      {
          $a = 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûýýþÿŔŕ';
          $b = 'aaaaaaaceeeeiiiidnoooooouuuuybsaaaaaaaceeeeiiiidnoooooouuuyybyRr';
          $string = strtr(utf8_decode($string), utf8_decode($a), $b);
          return utf8_encode($string);
      }
      
      function format_slug($title)
      {
          $title = remove_accents($title);
          $title = trim(strtolower($title));
          $title = preg_replace('#[^a-z0-9\\-/]#i', '_', $title);
          return trim(preg_replace('/-+/', '-', $title), '-/');
      }
      

      Usage: echo format_slug($var);

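      【Comments】:

        【Solution 8】:

        Here is the class we use. While it can perform each operation separately, it can also convert a string (or a path) into a slug (only a-z0-9- appear in the final output). It also does a few extra things, such as converting the ampersand (&) into the word and.

        Usage: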

        echo (new Str('My Cover Letter & Résumé'))->slugify()->__toString();
        

        my-cover-letter-and-resume

        The Str class:

        <?php
        
        use RuntimeException;
        use Transliterator;
        
        class Str
        {
            /**
             * Will hold an instance of Transliterator
             * for removing accents from characters.
             * Same instance for all instances of this class is fine.
             */
            private static $accent_transliterator;
            private $string;
        
            public function __construct(string $string)
            {
                $this->string = $string;
            }
        
            public function __toString()
            {
                return $this->string;
            }
        
            public function cleanForUrlPath(): self
            {
                $path = '';
        
                // Loop through path sections (separated by `/`)
                // and slugify each section.
                foreach (explode('/', $this->string) as $section) {
                    $section = (new static($section))->slugify()->__toString();
                    if ($section !== '') {
                        $path .= "/$section";
                    }
                }
        
                // Save the cleaned path
                $this->string = "$path/";
        
                return $this;
            }
        
            public function cleanUpSlugDashes(): self
            {
                // Remove extra dashes
                $this->string = preg_replace('/--+/', '-', $this->string);
        
                // Remove leading and trailing dashes
                $this->string = trim($this->string, '-');
        
                return $this;
            }
        
            /**
             * Replace symbols with word replacements.
             * Eg, `&` becomes ` and `.
             */
            public function convertSymbolsToWords(): self
            {
                $this->string = strtr($this->string, [
                    '@' => ' at ',
                    '%' => ' percent ',
                    '&' => ' and ',
                ]);
        
                return $this;
            }
        
            public static function getSpacerCharacters(
                array $with = [],
                array $without = []
            ): array {
                return array_unique(array_diff(array_merge([
                    ' ', // space
                    '…', // ellipsis
                    '–', // en dash
                    '—', // em dash
                    '/', // slash
                    '\\', // backslash
                    ':', // colon
                    ';', // semi-colon
                    '.', // period
                    '+', // plus sign
                    '#', // pound sign
                    '~', // tilde
                    '_', // underscore
                    '|', // pipe
                ], array_values($with)), array_values($without)));
            }
        
            public function lower(): self
            {
                $this->string = strtolower($this->string);
        
                return $this;
            }
        
            /**
             * Replaces all accented characters
             * with similar ASCII characters.
             */
            public function removeAccents(): self
            {
                // If no accented characters are found,
                // return the given string as-is.
                if (!preg_match('/[\x80-\xff]/', $this->string)) {
                    return $this;
                }
        
                // Instantiate Transliterator if we haven't already
                if (!isset(self::$accent_transliterator)) {
                    self::$accent_transliterator = Transliterator::create(
                        'Any-Latin; Latin-ASCII;'
                    );
        
                    if (self::$accent_transliterator === null) {
                        // @codeCoverageIgnoreStart
                        throw new RuntimeException(
                            'Could not create a transliterator'
                        );
                        // @codeCoverageIgnoreEnd
                    }
                }
        
                // Save transliterated string
                $this->string = (self::$accent_transliterator)->transliterate(
                    $this->string
                );
        
                return $this;
            }
        
            public function replace($search, $replace)
            {
                $this->string = str_replace($search, $replace, $this->string);
        
                return $this;
            }
        
            public function replaceRegex($pattern, $replacement): self
            {
                $this->string = preg_replace($pattern, $replacement, $this->string);
        
                return $this;
            }
        
            /**
             * @param int $length number of bytes to shorten the string to
             */
            public function shorten(int $length): self
            {
                // If the string is already `$length` or shorter,
                // return it as-is.
                if (strlen($this->string) <= $length) {
                    return $this;
                }
        
                // Shorten by 2 additional characters
                // to account for the three periods that are appended.
                // Only need to shorten by 2
                // as there's always at least one character (space) removed
                // when the last word is popped off of the array.
                $length -= 2;
        
                // Shorten the string to `$length` and split into words
                $words = explode(' ', substr($this->string, 0, $length));
        
                // Discard the last word as it's a partial word,
                // or empty if the last character happened to be a space.
                // If there's only one word,
                // then it was longer than `$length`
                // and the truncated version should be returned.
                if (count($words) > 1) {
                    array_pop($words);
                }
        
                // Save the shortened string with "..." appended
                $this->string = rtrim(implode(' ', $words), ':').'...';
        
                return $this;
            }
        
            public function slugify(): self
            {
                // If the string is already a slug
                if (preg_match('/^[a-z0-9\\-]+$/', $this->string)) {
                    return $this;
                }
        
                // - Normalize accents
                // - Normalize symbols
                // - Lowercase
                // - Replace space characters with dashes
                // - Remove non-slug characters
                // - Clean up leading, trailing, and consecutive dashes
                return $this
                    ->removeAccents()
                    ->convertSymbolsToWords()
                    ->lower()
                    ->spacersToDashes()
                    ->replaceRegex('/([^a-z0-9\\-]+)/', '')
                    ->cleanUpSlugDashes();
            }
        
            public function spacersToDashes(): self
            {
                return $this->replace(static::getSpacerCharacters(), '-');
            }
        }
        

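        【Comments】:

        • @NorbertBoros It has been more than 7 years since I posted this, and while most of it has stayed the same (some cleanup, and moving it into a standalone class), the biggest change is that remove_accents() has been completely rewritten to take advantage of PHP's Transliterator class. Keep the first if statement; the rest of the function can then be replaced with $transliterator = Transliterator::create('Any-Latin; Latin-ASCII;'); return $transliterator->transliterate($string);. I'll try to update the answer too.
        • I actually store $transliterator on the class to avoid rebuilding it every time.
        • @NorbertBoros The answer is updated if you want the cleaned-up version. At a glance, I believe it works on PHP 7.0+.
        • Thanks! I'll test it as soon as I can.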