【Question Title】: .htaccess rule to remove from URL if present?
【Posted】: 2020-02-08 11:04:01
【Question】:

We're having a problem on one of our sites where we occasionally see a long URL that doesn't correspond to any real page on the site. For example, the URL should be

https://example.com/browse 

but about one time in a hundred we get this instead:

https://example.com/index.php/module/action/param1/static/PFBC/js/jquery/rss/signup/static/js/jquery/templates/themes/love/img/icon/asset/css/legal/user/album/tipocorneo/me/browse

It isn't always the same URL — it changes from time to time — but my question is: is there a rewrite rule we could use to simply strip the extra parts of the URL if they are present?

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

<IfModule mod_rewrite.c>
    <IfModule mod_negotiation.c>
        Options -MultiViews -Indexes
    </IfModule>
    Options +FollowSymLinks

</IfModule>

<IfModule mod_php5.c>
    php_flag allow_url_include Off
    php_flag expose_php Off
</IfModule>

### Security and Spam ###

# Protect the repository directory
<IfModule mod_rewrite.c>
    RewriteRule "(^|/)\.git" - [F,L]
</IfModule>

ServerSignature Off

# Deny access to all CGI, Perl, Python, Bash, SQL, Template, INI configuration, cache, log, temporary and text files
<FilesMatch "\.(cgi|pl|py|sh|bash|sql|tpl|ini|cache|log|tmp|txt)$">
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
</FilesMatch>

# Leave open the humans.txt and robots.txt file
<FilesMatch "humans\.txt|robots\.txt">
    <IfModule mod_authz_core.c>
        Require all granted
    </IfModule>
</FilesMatch>

# Deny access for "composer.json"
<FilesMatch "composer\.json|sample\.htaccess">
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
</FilesMatch>

# Prevent .htaccess/.htpasswd from being downloaded
<Files ~ "^\.ht">
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
</Files>

<Limit GET POST PUT DELETE HEAD>
    <IfModule mod_authz_core.c>
        <RequireAll>
            Require all granted
            Require not env bad_bot
        </RequireAll>
    </IfModule>
</Limit>

ErrorDocument 400 /error/http/index?code=400
ErrorDocument 401 /error/http/index?code=401
ErrorDocument 402 /error/http/index?code=402
ErrorDocument 403 /error/http/index?code=403
ErrorDocument 404 /error
ErrorDocument 405 /error/http/index?code=405
ErrorDocument 500 /error/http/index?code=500
ErrorDocument 501 /error/http/index?code=501
ErrorDocument 502 /error/http/index?code=502
ErrorDocument 504 /error/http/index?code=504
ErrorDocument 505 /error/http/index?code=505

# URL Rewrite
<IfModule mod_rewrite.c>
    <IfModule mod_env.c>
        # Tell PHP that the mod_rewrite module is ENABLED.
        SetEnv HTTP_MOD_REWRITE On
    </IfModule>

    # Uncomment the following only if HTTPS is enabled. HSTS header increases security of your website & SEO
    # <IfModule mod_headers.c>
    # Header set Strict-Transport-Security "max-age=31536000; preload" env=HTTPS
    # </IfModule>

    # Remove www subdomain in the URL
    # RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
    # RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

    # Force the URL to be https (only if you have an SSL certificate). May not be necessary if HSTS is enabled
    # RewriteCond %{SERVER_PORT} 80
    # RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d

    RewriteRule ^(.*)$ index.php?$1 [L,QSA]

    # Start Bad Bot Prevention
    RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
    RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
    RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
    RewriteCond %{HTTP_USER_AGENT} ^libghttp [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
    RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
    RewriteCond %{HTTP_USER_AGENT} libwww-perl.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
    RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus
    RewriteRule ^.* - [F,L]
    # End Bad Bot Prevention
</IfModule>

# Modify Headers
<IfModule mod_headers.c>
    # Cache files
    <FilesMatch "\.(jpe?g|png|gif|ico|webp|swf|mp3|mp4|flv|webm|pdf)$">
        Header set Cache-Control "public"
        Header set Expires "Mon, 20 Apr 2060 20:00:00 GMT"
        Header unset Last-Modified
    </FilesMatch>

    # Cache JavaScript & CSS
    <FilesMatch "\.(js|css)$">
        Header set Cache-Control "public"
        Header set Expires "Mon, 20 Apr 2060 20:00:00 GMT"
        Header unset Last-Modified
    </FilesMatch>
</IfModule>

# Compress files
<IfModule mod_deflate.c>
    # Insert filter
    SetOutputFilter DEFLATE
    <IfModule mod_setenvif.c>
        # Netscape 4.x has some problems...
        BrowserMatch ^Mozilla/4 gzip-only-text/html
        # Netscape 4.06-4.08 have some more problems
        BrowserMatch ^Mozilla/4\.0[678] no-gzip
        # MSIE masquerades as Netscape, but it is fine
        BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
        # Don't compress images/archives/music/video/etc
        SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
        SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar)$ no-gzip dont-vary
        SetEnvIfNoCase Request_URI \.(?:avi|mov|mp3|mp4|rm|flv|swf|mpe?g)$ no-gzip dont-vary
    </IfModule>
    <IfModule mod_headers.c>
        # Make sure proxies don't deliver the wrong content
        Header append Vary User-Agent env=!dont-vary
    </IfModule>
</IfModule>

# Enable Expirations
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
    # expire images/css/js/swf files after a month in the client's cache
    ExpiresByType text/css "access plus 31 days"
    ExpiresByType text/javascript "access plus 31 days"
    ExpiresByType application/javascript "access plus 31 days"
    ExpiresByType application/x-javascript "access plus 31 days"
    ExpiresByType application/x-gzip "access plus 31 days"
    ExpiresByType image/gif "access plus 31 days"
    ExpiresByType image/jpeg "access plus 31 days"
    ExpiresByType image/png "access plus 31 days"
    ExpiresByType application/x-shockwave-flash "access plus 31 days"
    ExpiresByType image/vnd.microsoft.icon "access plus 31 days"
    ExpiresByType image/x-icon "access plus 1 year"
</IfModule>

# For the videos extensions
#AddType video/ogg .ogg
AddType video/webm .webm
AddType video/mp4 .mp4
AddType application/rss+xml .xml

【Comments】:

  • How are you currently routing URLs in .htaccess? (What do you currently have in .htaccess? Please add this to your question.) Are you using path-info to route URLs? Presumably index.php is a valid file? What response does this erroneous URL return? Have you confirmed it isn't being caused by a bug on your site? Where are these requests coming from?
  • You would want a fallback rule at the end of your rewrite rule set that catches all requests not already handled.
  • Yes, index.php is a valid file, and the URL returns a 404 page not found. I'll add the .htaccess file to the question.
  • Yes, a fallback was my thought as well.
  • "Remove the extra parts" — what exactly do you mean? Everything in the URL path before the final path segment? Although a 404 is arguably the preferred response for a malformed request like this? If you confirm this isn't the result of a misconfiguration of your site (which you'd have to resolve first), then it could even be a malicious request. (Although it does look rather like a misconfiguration?) Where are these requests coming from — that should give you an important clue as to the cause?

Tags: .htaccess url


【Solution 1】:

To redirect /index.php/foo/bar/baz/something to /something, you can place the following at the very top of your .htaccess file:

RewriteRule ^index\.php/[\w/]+/([\w]+)$ https://%{HTTP_HOST}/$1 [R=302,L]

The shorthand character class \w matches word characters (a-zA-Z0-9_), so this matches the example URL given.
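As a quick sanity check that the pattern captures the final segment of the example URL, the same regex can be exercised with Python's re module (a sketch — Python's regex syntax matches PCRE for the constructs used here; note that in .htaccess the matched URL-path has no leading slash):

```python
import re

# The pattern from the rewrite rule above.
pattern = re.compile(r"^index\.php/[\w/]+/(\w+)$")

path = ("index.php/module/action/param1/static/PFBC/js/jquery/rss/signup/"
        "static/js/jquery/templates/themes/love/img/icon/asset/css/legal/"
        "user/album/tipocorneo/me/browse")

m = pattern.match(path)
print(m.group(1))  # the final segment the rule redirects to: "browse"
```

The greedy [\w/]+ consumes up to the last slash, so the capture group is always the final path segment.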

This redirect also canonicalizes the scheme, so place this directive before your HTTP-to-HTTPS redirect to avoid a second redirect when the request comes in over HTTP.
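Concretely, the top of the file could then look like this (a sketch combining the new rule with the HTTP-to-HTTPS redirect already shown in the question):

```apache
RewriteEngine On

# Strip the bogus /index.php/... prefix first; since this also
# redirects to https, an HTTP request needs only one redirect.
RewriteRule ^index\.php/[\w/]+/(\w+)$ https://%{HTTP_HOST}/$1 [R=302,L]

# Existing HTTP-to-HTTPS redirect.
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
```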

However, URLs of this nature do suggest a misconfiguration in your site/application. It could also be a malicious request, although the non-spammy content of the URL doesn't really suggest that.

You should check the Referer in your server's access logs for these requests; that should give you a clue as to where they are coming from.
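Something along these lines could help dig through the logs (a sketch — the log path and the combined log format are assumptions; adjust for your host. Demonstrated here against a fabricated sample line):

```shell
# Stand-in for the real access log (e.g. /var/log/apache2/access.log).
log=/tmp/sample_access.log
printf '%s\n' '1.2.3.4 - - [08/Feb/2020:11:04:01 +0000] "GET /index.php/module/action/param1/x/browse HTTP/1.1" 200 512 "https://referrer.example/page" "SomeBot/1.0"' > "$log"

# For each bogus request, print the Referer and User-Agent
# (the 4th and 6th double-quoted fields in the combined log format).
grep 'index\.php/module/action' "$log" | awk -F'"' '{print $4, $6}'
```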

【Discussion】:

  • I came very close to solving this myself; after hours of trial and error I finally managed to write a rule that stripped the segments in a rule tester, but as soon as it was applied to the .htaccess file the server hung. I later found someone online who helped write it (I'll add his name and the rule later). But as some of you said, all I could think was misconfiguration. I spent nearly 10 hours hunting for the problem and finally found it... it came down to an RSS caching function that was getting messed up by the hosting provider's inode limit, which explains why it only happened some of the time.
【Solution 2】:

Here is the correct answer, written by donatJ:

    RewriteEngine On

    RewriteRule ^index\.php/module/action/param1/.*/(.*+)$ /$1 [L,R=301]

【Discussion】:
