
Sitemap: exclude URL patterns via regex

Samael500 9 years ago
parent
commit
871c3bec7f
2 changed files with 23 additions and 9 deletions
  1. sitemap/Readme.rst (+13 -6)
  2. sitemap/sitemap.py (+10 -3)

+ 13 - 6
sitemap/Readme.rst

@@ -1,11 +1,10 @@
 Sitemap
 -------
 
-The sitemap plugin generates plain-text or XML sitemaps. You can use the
-``SITEMAP`` variable in your settings file to configure the behavior of the
-plugin.
+This plugin generates plain-text or XML sitemaps. You can use the ``SITEMAP``
+variable in your settings file to configure the behavior of the plugin.
 
-The ``SITEMAP`` variable must be a Python dictionary and can contain three keys:
+The ``SITEMAP`` variable must be a Python dictionary and can contain these keys:
 
 - ``format``, which sets the output format of the plugin (``xml`` or ``txt``)
 
@@ -31,8 +30,16 @@ The ``SITEMAP`` variable must be a Python dictionary and can contain three keys:
 
   Valid frequency values are ``always``, ``hourly``, ``daily``, ``weekly``, ``monthly``,
   ``yearly`` and ``never``.
-  
-You can exclude URLs from being included in the sitemap by adding them to the sitemapExclude array in format 'sitemapExclude = ['login.html', 'signup.html']`.  
+
+You can exclude URLs from being included in the sitemap via regular expressions.
+For example, to exclude all URLs containing ``tag/`` or ``category/`` you can
+use the following ``SITEMAP`` setting.
+
+.. code-block:: python
+
+    SITEMAP = {
+        'exclude': ['tag/', 'category/']
+    }
 
 If a key is missing or a value is incorrect, it will be replaced with the
 default value.
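
A note on matching semantics: the accompanying change to ``sitemap/sitemap.py`` tests each pattern with ``re.match``, which only matches at the start of the URL. With the README example, ``'tag/'`` would therefore exclude ``tag/python.html`` but not ``blog/tag/python.html``, which is narrower than "containing". The sketch below compares ``re.match`` with ``re.search`` (which matches anywhere in the string). The sample URLs and the ``excluded_with`` helper are made up for illustration and are not part of the commit.

```python
import re

# Patterns from the README example.
patterns = ['tag/', 'category/']

# Hypothetical sample URLs, not taken from the commit.
urls = [
    'tag/python.html',
    'blog/tag/python.html',
    'category/news.html',
    'about.html',
]


def excluded_with(matcher, url):
    """Return True if any pattern matches `url` under the given re function."""
    return any(matcher(pattern, url) for pattern in patterns)


# Print, for each URL, whether re.match and re.search would exclude it.
for url in urls:
    print(url, excluded_with(re.match, url), excluded_with(re.search, url))
```

If "containing" is the intended behavior, a pattern such as ``'.*tag/'`` would work with ``re.match`` as committed.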

+ 10 - 3
sitemap/sitemap.py

@@ -8,6 +8,7 @@ The sitemap plugin generates plain-text or XML sitemaps.
 
 from __future__ import unicode_literals
 
+import re
 import collections
 import os.path
 
@@ -81,6 +82,8 @@ class SitemapGenerator(object):
             'pages': 0.5
         }
 
+        self.sitemapExclude = []
+
         config = settings.get('SITEMAP', {})
 
         if not isinstance(config, dict):
@@ -89,6 +92,7 @@ class SitemapGenerator(object):
             fmt = config.get('format')
             pris = config.get('priorities')
             chfreqs = config.get('changefreqs')
+            self.sitemapExclude = config.get('exclude', [])
 
             if fmt not in ('xml', 'txt'):
                 warning("sitemap plugin: SITEMAP['format'] must be `txt' or `xml'")
@@ -163,10 +167,13 @@ class SitemapGenerator(object):
         pageurl = '' if page.url == 'index.html' else page.url
         
         #Exclude URLs from the sitemap:
-        sitemapExclude = []
-
         if self.format == 'xml':
-            if pageurl not in sitemapExclude:
+            flag = False
+            for regstr in self.sitemapExclude:
+                if re.match(regstr, pageurl):
+                    flag = True
+                    break
+            if not flag:
                 fd.write(XML_URL.format(self.siteurl, pageurl, lastmod, chfreq, pri))
         else:
             fd.write(self.siteurl + '/' + pageurl + '\n')
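
Two observations on the hunk above: the flag loop can be condensed with ``any()``, and as committed the ``txt`` branch still writes every URL without consulting ``exclude``. The following is a minimal sketch under stated assumptions, not the plugin's actual code: ``is_excluded`` and ``format_entry`` are hypothetical helpers, and ``XML_URL`` here is a simplified stand-in for the plugin's template. It keeps ``re.match`` as in the commit and applies the check to both formats.

```python
import re

# Simplified stand-in for the plugin's XML_URL template (assumption).
XML_URL = '<url><loc>{0}/{1}</loc></url>\n'


def is_excluded(pageurl, patterns):
    """Return True if any regex in `patterns` matches the start of `pageurl`."""
    return any(re.match(pattern, pageurl) for pattern in patterns)


def format_entry(fmt, siteurl, pageurl, patterns):
    """Return the sitemap line for `pageurl`, or '' if it is excluded."""
    if is_excluded(pageurl, patterns):
        return ''
    if fmt == 'xml':
        return XML_URL.format(siteurl, pageurl)
    # Plain-text format: one absolute URL per line.
    return siteurl + '/' + pageurl + '\n'


# Usage with a hypothetical site URL: the excluded page yields an empty string.
print(repr(format_entry('txt', 'http://example.com', 'tag/python.html', ['tag/'])))
print(repr(format_entry('txt', 'http://example.com', 'about.html', ['tag/'])))
```

Returning an empty string for excluded pages lets the caller pass the result straight to ``fd.write`` without an extra branch.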