Add permalinks plugin

This plugin enables a kind of permalink which can be used to refer to a
piece of content and which is resistant to the file being moved or renamed.

It does this by creating additional output HTML in `PERMALINK_PATH`
(default permalinks/) which includes redirect code to point the user at
the original page.

To work, each page has to have an additional piece of metadata with the
key `permalink_id` (configurable with `PERMALINK_ID_METADATA_KEY`),
which should remain static even through renames and should also
be unique on the site.

This can be generated automatically with the filetime_from_git module
and the `GIT_GENERATE_PERMALINK` option.

This should always be used with `GIT_HISTORY_FOLLOWS_RENAME` to ensure
the ID persists across renames.

Includes a refactor of the filetime_from_git module, moving it towards
a more generic module for useful git stuff.
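
The settings described in this message could be combined roughly as follows. This is a minimal sketch, not the author's configuration: it assumes both plugins are enabled under their directory names and uses the setting names registered in the code of this commit (`GIT_GENERATE_PERMALINK`, `GIT_HISTORY_FOLLOWS_RENAME`, `PERMALINK_PATH`, `PERMALINK_ID_METADATA_KEY`). The `PLUGIN_PATHS` value is a placeholder.

```python
# pelicanconf.py (sketch; the plugin path below is an assumption)
PLUGIN_PATHS = ['plugins']  # hypothetical checkout location of the plugins
PLUGINS = ['filetime_from_git', 'permalinks']

# filetime_from_git: derive a permalink id from git history
GIT_GENERATE_PERMALINK = True      # registered default is False
GIT_HISTORY_FOLLOWS_RENAME = True  # registered default is True

# permalinks: output directory for redirect pages and the metadata key to read
PERMALINK_PATH = 'permalinks'               # registered default
PERMALINK_ID_METADATA_KEY = 'permalink_id'  # registered default
```

Only `GIT_GENERATE_PERMALINK` differs from the registered defaults here; the other values are spelled out to make the dependency between the two plugins visible.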
Chris Scutcher 8 years ago
parent
commit
f5d8976329

+ 2 - 0
Readme.rst

@@ -200,6 +200,8 @@ pelicanfly                Lets you type things like ``i ♥ :fa-coffee:`` in you
 
 
 Photos                    Add a photo or a gallery of photos to an article, or include photos in the body text. Resize photos as needed.
 
+permalink                 Enables a kind of permalink using html redirects.
+
 Pin to top                Pin Pelican's article(s) to top "Sticky article"
 
 PlantUML                  Allows you to define UML diagrams directly into rst documents using the great PlantUML tool

+ 28 - 40
filetime_from_git/README.rst

@@ -1,10 +1,8 @@
 Use Git commit to determine page date
 ======================================
-
-If your blog content is versioned via Git, this plugin will set articles'
-and pages' ``metadata['date']`` to correspond to that of the Git commit.
-This plugin depends on the ``gitpython`` python package, which can be
-installed via::
+If your blog content is versioned in a Git repository, this plugin sets articles'
+and pages' ``metadata['date']`` from the Git commit history. It depends on the
+``gitpython`` Python package, which can be installed via::
 
 
     pip install gitpython
 
@@ -27,46 +25,36 @@ operations like copy and move will not affect the generated results.
 If you don't want a given article or page to use the Git time, set the
 metadata to ``gittime: off`` to disable it.
 
-You can also set ``GIT_FILETIME_FOLLOW`` to ``True`` in your settings to
-make the plugin follow file renames — i.e., ensure the creation date matches
-the original file creation date, not the date it was renamed.
-
-FAQ
----
-
-### Q. I get a GitCommandError: 'git rev-list ...' when I run the plugin. Why?
-Be sure to use the correct gitpython module for your distro's Git binary.
-Using the ``GIT_FILETIME_FOLLOW`` option to ``True`` may also make your
-problem go away, as that optino uses a different method to find commits.
-
-Some notes on Git
-~~~~~~~~~~~~~~~~~~
+Other options
+-------------
 
 
-* How to check if a file is managed by Git?
+### GIT_HISTORY_FOLLOWS_RENAME (default True)
+Set GIT_HISTORY_FOLLOWS_RENAME to True in your Pelican config to
+make the plugin follow file renames, i.e. ensure the creation date matches
+the original file creation date, not the date it was renamed.
 
 
-.. code-block:: sh
+### GIT_GENERATE_PERMALINK (default False)
+Use in combination with the permalink plugin to generate permalinks from the
+original (oldest) commit SHA.
 
 
-   git ls-files $file --error-unmatch
+### GIT_SHA_METADATA (default True)
+Adds the SHA of the newest and oldest commit to metadata (``gitsha_newest``, ``gitsha_oldest``)
 
 
-* How to check if a file has changes?
+### GIT_FILETIME_FROM_GIT (default True)
+Enable filetime from git behaviour
 
 
-.. code-block:: sh
+Content specific options
+------------------------
+Adding metadata `gittime` = False will prevent the plugin from trying to set the filetime for this
+content.
 
 
-   git diff $file            # compare staging area with working directory
-   git diff --cached $file   # compare HEAD with staged area
-   git diff HEAD $file       # compare HEAD with working directory
+Adding metadata `git_permalink` = False will prevent the plugin from adding permalink for this
+content.
 
 
-* How to get commits related to a file?
-
-.. code-block:: sh
-
-   git status $file
-
-With ``gitpython`` package, it's easier to parse committed time:
-
-.. code-block:: python
+FAQ
+---
 
 
-   repo = Git.repo('/path/to/repo')
-   commits = repo.commits(path='path/to/file')
-   commits[-1].committed_date    # oldest commit time
-   commits[0].committed_date     # latest commit time
+### Q. I get a GitCommandError: 'git rev-list ...' when I run the plugin. What's up?
+Be sure to use the correct gitpython module for your distro's git binary.
+Setting the GIT_HISTORY_FOLLOWS_RENAME option to True may also make your problem go away, as it uses
+a different method to find commits.

+ 1 - 1
filetime_from_git/__init__.py

@@ -1 +1 @@
-from .filetime_from_git import *
+from .registration import *

+ 108 - 0
filetime_from_git/actions.py

@@ -0,0 +1,108 @@
+# -*- coding: utf-8 -*-
+import base64
+import hashlib
+import os
+import logging
+from pelican.utils import strftime
+from .utils import string_to_bool
+from .utils import datetime_from_timestamp
+from .registration import content_git_object_init
+
+
+logger = logging.getLogger(__name__)
+
+
+@content_git_object_init.connect
+def filetime_from_git(content, git_content):
+    '''
+    Update modification and creation times from git
+    '''
+    if not content.settings['GIT_FILETIME_FROM_GIT']:
+        # Disabled for everything
+        return
+
+    if not string_to_bool(content.metadata.get('gittime', 'yes')):
+        # Disable for this content
+        return
+
+    path = content.source_path
+    fs_creation_time = datetime_from_timestamp(os.stat(path).st_ctime, content)
+    fs_modified_time = datetime_from_timestamp(os.stat(path).st_mtime, content)
+
+    # 1. file is not managed by git
+    #    date: fs time
+    # 2. file is staged, but has no commits
+    #    date: fs time
+    # 3. file is managed, and clean
+    #    date: first commit time, update: last commit time or None
+    # 4. file is managed, but dirty
+    #    date: first commit time, update: fs time
+    if git_content.is_managed_by_git():
+        if git_content.is_committed():
+            content.date = git_content.get_oldest_commit_date()
+
+            if git_content.is_modified():
+                content.modified = fs_modified_time
+            else:
+                content.modified = git_content.get_newest_commit_date()
+        else:
+            # File isn't committed
+            content.date = fs_creation_time
+    else:
+        # file is not managed by git
+        content.date = fs_creation_time
+
+    # Clean up content attributes
+    if not hasattr(content, 'modified'):
+        content.modified = content.date
+
+    if hasattr(content, 'date'):
+        content.locale_date = strftime(content.date, content.date_format)
+
+    if hasattr(content, 'modified'):
+        content.locale_modified = strftime(
+            content.modified, content.date_format)
+
+
+@content_git_object_init.connect
+def git_sha_metadata(content, git_content):
+    '''
+    Add sha metadata to content
+    '''
+    if not content.settings['GIT_SHA_METADATA']:
+        return
+
+    if not git_content.is_committed():
+        return
+
+    content.metadata['gitsha_newest'] = str(git_content.get_newest_commit())
+    content.metadata['gitsha_oldest'] = str(git_content.get_oldest_commit())
+
+
+@content_git_object_init.connect
+def git_permalink(content, git_content):
+    '''
+    Add git based permalink id to content metadata
+    '''
+    if not content.settings['GIT_GENERATE_PERMALINK']:
+        return
+
+    if not string_to_bool(content.metadata.get('git_permalink', 'yes')):
+        # Disable for this content
+        return
+
+    if not git_content.is_committed():
+        return
+
+    permalink_hash = hashlib.sha1()
+    permalink_hash.update(str(git_content.get_oldest_commit()))
+    permalink_hash.update(str(git_content.get_oldest_filename()))
+    git_permalink_id = base64.urlsafe_b64encode(permalink_hash.digest())
+    permalink_id_metadata_key = content.settings['PERMALINK_ID_METADATA_KEY']
+
+    if permalink_id_metadata_key in content.metadata:
+        content.metadata[permalink_id_metadata_key] = (
+            ','.join((
+                content.metadata[permalink_id_metadata_key], git_permalink_id)))
+    else:
+        content.metadata[permalink_id_metadata_key] = git_permalink_id
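
The ID derivation in `git_permalink` above can be tried in isolation. The following is a sketch of the same idea (SHA-1 over the oldest commit SHA and the oldest filename, then URL-safe base64), written for Python 3, where `hashlib` requires bytes. The function name `make_permalink_id` and the sample inputs are hypothetical and are not part of the plugin.

```python
import base64
import hashlib


def make_permalink_id(oldest_commit_sha, oldest_filename):
    """Hash the oldest commit SHA and original filename into a URL-safe id."""
    digest = hashlib.sha1()
    # hashlib needs bytes on Python 3, so encode the text inputs explicitly.
    digest.update(oldest_commit_sha.encode('utf-8'))
    digest.update(oldest_filename.encode('utf-8'))
    # urlsafe_b64encode returns bytes; decode so the id can be used in paths.
    return base64.urlsafe_b64encode(digest.digest()).decode('ascii')


# Made-up inputs, for illustration only.
first_id = make_permalink_id('0' * 40, 'content/first-post.rst')
same_id = make_permalink_id('0' * 40, 'content/first-post.rst')
other_id = make_permalink_id('0' * 40, 'content/renamed-post.rst')
```

Because both inputs come from the oldest point in the file's history, the ID stays the same after a rename only if the history lookup actually follows renames.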

+ 99 - 0
filetime_from_git/content_adapter.py

@@ -0,0 +1,99 @@
+# -*- coding: utf-8 -*-
+"""
+Wraps a content object to provide some git information
+"""
+import logging
+from pelican.utils import memoized
+from .git_wrapper import git_wrapper
+
+DEV_LOGGER = logging.getLogger(__name__)
+
+
+class GitContentAdapter(object):
+    """
+    Wraps a content object to provide some git information
+    """
+    def __init__(self, content):
+        self.content = content
+        self.git = git_wrapper('.')
+        self.tz_name = content.settings.get('TIMEZONE', None)
+        self.follow = content.settings['GIT_HISTORY_FOLLOWS_RENAME']
+
+    @memoized
+    def is_committed(self):
+        '''
+        Is committed
+        '''
+        return len(self.get_commits()) > 0
+
+    @memoized
+    def is_modified(self):
+        '''
+        Has content been modified since last commit
+        '''
+        return self.git.is_file_modified(self.content.source_path)
+
+    @memoized
+    def is_managed_by_git(self):
+        '''
+        Is content stored in a file managed by git
+        '''
+        return self.git.is_file_managed_by_git(self.content.source_path)
+
+    @memoized
+    def get_commits(self):
+        '''
+        Get all commits involving this filename
+        :returns: List of commits newest to oldest
+        '''
+        if not self.is_managed_by_git():
+            return []
+        return self.git.get_commits(self.content.source_path, self.follow)
+
+    @memoized
+    def get_oldest_commit(self):
+        '''
+        Get oldest commit involving this file
+
+        :returns: Oldest commit
+        '''
+        return self.git.get_commits(self.content.source_path, self.follow)[-1]
+
+    @memoized
+    def get_newest_commit(self):
+        '''
+        Get newest commit involving this file
+
+        :returns: Newest commit
+        '''
+        return self.git.get_commits(self.content.source_path, follow=False)[0]
+
+    @memoized
+    def get_oldest_filename(self):
+        '''
+        Get the original filename of this content. Implies follow
+        '''
+        commit_and_name_iter = self.git.get_commits_and_names_iter(
+            self.content.source_path)
+        _commit, name = list(commit_and_name_iter)[-1]  # log is newest-first
+        return name
+
+    @memoized
+    def get_oldest_commit_date(self):
+        '''
+        Get datetime of oldest commit involving this file
+
+        :returns: Datetime of oldest commit
+        '''
+        oldest_commit = self.get_oldest_commit()
+        return self.git.get_commit_date(oldest_commit, self.tz_name)
+
+    @memoized
+    def get_newest_commit_date(self):
+        '''
+        Get datetime of newest commit involving this file
+
+        :returns: Datetime of newest commit
+        '''
+        newest_commit = self.get_newest_commit()
+        return self.git.get_commit_date(newest_commit, self.tz_name)

+ 0 - 80
filetime_from_git/filetime_from_git.py

@@ -1,80 +0,0 @@
-# -*- coding: utf-8 -*-
-
-import os
-from pelican import signals, contents
-from pelican.utils import strftime, set_date_tzinfo
-from datetime import datetime
-from .git_wrapper import git_wrapper
-
-
-def datetime_from_timestamp(timestamp, content):
-    """
-    Helper function to add timezone information to datetime,
-    so that datetime is comparable to other datetime objects in recent versions
-    that now also have timezone information.
-    """
-    return set_date_tzinfo(
-        datetime.fromtimestamp(timestamp),
-        tz_name=content.settings.get('TIMEZONE', None))
-
-
-def filetime_from_git(content):
-    if isinstance(content, contents.Static):
-        return
-
-    git = git_wrapper('.')
-    tz_name = content.settings.get('TIMEZONE', None)
-
-    gittime = content.metadata.get('gittime', 'yes').lower()
-    gittime = gittime.replace("false", "no").replace("off", "no")
-    if gittime == "no":
-        return
-
-    # 1. file is not managed by git
-    #    date: fs time
-    # 2. file is staged, but has no commits
-    #    date: fs time
-    # 3. file is managed, and clean
-    #    date: first commit time, update: last commit time or None
-    # 4. file is managed, but dirty
-    #    date: first commit time, update: fs time
-    path = content.source_path
-    if git.is_file_managed_by_git(path):
-        commits = git.get_commits(
-            path, follow=content.settings.get('GIT_FILETIME_FOLLOW', False))
-
-        if len(commits) == 0:
-            # never commited, but staged
-            content.date = datetime_from_timestamp(
-                os.stat(path).st_ctime, content)
-        else:
-            # has commited
-            content.date = git.get_commit_date(
-                commits[-1], tz_name)
-
-            if git.is_file_modified(path):
-                # file has changed
-                content.modified = datetime_from_timestamp(
-                    os.stat(path).st_ctime, content)
-            else:
-                # file is not changed
-                if len(commits) > 1:
-                    content.modified = git.get_commit_date(
-                        commits[0], tz_name)
-    else:
-        # file is not managed by git
-        content.date = datetime_from_timestamp(os.stat(path).st_ctime, content)
-
-    if not hasattr(content, 'modified'):
-        content.modified = content.date
-
-    if hasattr(content, 'date'):
-        content.locale_date = strftime(content.date, content.date_format)
-
-    if hasattr(content, 'modified'):
-        content.locale_modified = strftime(
-            content.modified, content.date_format)
-
-
-def register():
-    signals.content_object_init.connect(filetime_from_git)

+ 29 - 5
filetime_from_git/git_wrapper.py

@@ -2,9 +2,10 @@
 """
 Wrap python git interface for compatibility with older/newer version
 """
+import itertools
 import logging
 import os
-from time import mktime, altzone
+from time import mktime
 from datetime import datetime
 from pelican.utils import set_date_tzinfo
 from git import Git, Repo
@@ -12,6 +13,15 @@ from git import Git, Repo
 DEV_LOGGER = logging.getLogger(__name__)
 
 
+def grouper(iterable, n, fillvalue=None):
+    '''
+    Collect data into fixed-length chunks or blocks
+    '''
+    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
+    args = [iter(iterable)] * n
+    return itertools.izip_longest(fillvalue=fillvalue, *args)
+
+
 class _GitWrapperCommon(object):
     '''
     Wrap git module to provide a more stable interface across versions
@@ -51,9 +61,23 @@ class _GitWrapperCommon(object):
         :param path: Path which we will find commits for
         :returns: Sequence of commit objects. Newest to oldest
         '''
-        commit_shas = self.git.log(
-            '--pretty=%H', '--follow', '--', path).splitlines()
-        return [self.repo.commit(shas) for shas in commit_shas]
+        return [
+            commit for commit, _ in self.get_commits_and_names_iter(
+                path)]
+
+    def get_commits_and_names_iter(self, path):
+        '''
+        Get all commits including a given path following renames
+        '''
+        log_result = self.git.log(
+            '--pretty=%H',
+            '--follow',
+            '--name-only',
+            '--',
+            path).splitlines()
+
+        for commit_sha, _, filename in grouper(log_result, 3):
+            yield self.repo.commit(commit_sha), filename
 
 
     def get_commits(self, path, follow=False):
         '''
@@ -87,7 +111,7 @@ class _GitWrapperLegacy(_GitWrapperCommon):
         Get datetime of commit comitted_date
         '''
         return set_date_tzinfo(
-            datetime.fromtimestamp(mktime(commit.committed_date) - altzone),
+            datetime.fromtimestamp(mktime(commit.committed_date)),
             tz_name=tz_name)
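
The `grouper` helper and `get_commits_and_names_iter` in this file rely on the log output arriving as three lines per commit (SHA, blank line, filename). The sketch below reproduces that parsing on a made-up string so it can be checked without a repository. It targets Python 3, where `itertools.izip_longest` is named `zip_longest`. The `parse_log_output` name and the sample text are assumptions for illustration, and the three-line layout is taken from the diff rather than verified against every git version.

```python
import itertools


def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length chunks, padding the last one."""
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)


def parse_log_output(log_text):
    """Yield (sha, filename) pairs from sha / blank / filename triples."""
    for sha, _blank, filename in grouper(log_text.splitlines(), 3):
        yield sha, filename


# Made-up log output in the layout the diff assumes (newest commit first).
sample = "bbbb\n\ncontent/new-name.rst\naaaa\n\ncontent/old-name.rst"
pairs = list(parse_log_output(sample))
oldest_sha, oldest_name = pairs[-1]
```

Since the log is ordered newest to oldest, the original filename is the last pair, not the first.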
 
 
 
 

+ 30 - 0
filetime_from_git/registration.py

@@ -0,0 +1,30 @@
+# -*- coding: utf-8 -*-
+"""
+Handle registration and setup for plugin
+"""
+import logging
+from blinker import signal
+from .content_adapter import GitContentAdapter
+from pelican import signals
+
+DEV_LOGGER = logging.getLogger(__name__)
+
+content_git_object_init = signal('content_git_object_init')
+
+def send_content_git_object_init(content):
+    content_git_object_init.send(content, git_content=GitContentAdapter(content))
+
+
+def setup_option_defaults(pelican_inst):
+    pelican_inst.settings.setdefault('GIT_FILETIME_FROM_GIT', True)
+    pelican_inst.settings.setdefault('GIT_HISTORY_FOLLOWS_RENAME', True)
+    pelican_inst.settings.setdefault('GIT_SHA_METADATA', True)
+    pelican_inst.settings.setdefault('GIT_GENERATE_PERMALINK', False)
+
+
+def register():
+    signals.content_object_init.connect(send_content_git_object_init)
+    signals.initialized.connect(setup_option_defaults)
+
+    # Import actions
+    from . import actions

+ 39 - 0
filetime_from_git/utils.py

@@ -0,0 +1,39 @@
+# -*- coding: utf-8 -*-
+"""
+Utility functions
+"""
+from datetime import datetime
+import logging
+from pelican.utils import set_date_tzinfo
+
+DEV_LOGGER = logging.getLogger(__name__)
+
+
+STRING_BOOLS = {
+    'yes': True,
+    'no': False,
+    'true': True,
+    'false': False,
+    '0': False,
+    '1': True,
+    'on': True,
+    'off': False,
+}
+
+
+def string_to_bool(string):
+    '''
+    Convert a string to a bool based on the STRING_BOOLS table
+    '''
+    return STRING_BOOLS[string.strip().lower()]
+
+
+def datetime_from_timestamp(timestamp, content):
+    """
+    Helper function to add timezone information to datetime,
+    so that datetime is comparable to other datetime objects in recent versions
+    that now also have timezone information.
+    """
+    return set_date_tzinfo(
+        datetime.fromtimestamp(timestamp),
+        tz_name=content.settings.get('TIMEZONE', None))
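
As written, `string_to_bool` raises `KeyError` for any value outside `STRING_BOOLS`, for example a misspelled `gittime` entry. A more forgiving variant might look like the sketch below. The `default` parameter and the pass-through for real booleans are assumptions added for illustration and are not part of this commit.

```python
STRING_BOOLS = {
    'yes': True, 'no': False,
    'true': True, 'false': False,
    '1': True, '0': False,
    'on': True, 'off': False,
}


def string_to_bool(value, default=True):
    """Map a metadata string to a bool, using `default` for unknown values."""
    if isinstance(value, bool):
        # Metadata readers may already hand back a real boolean.
        return value
    return STRING_BOOLS.get(str(value).strip().lower(), default)
```

Whether an unknown value should fall back silently or fail loudly is a design choice; the strict lookup in the diff surfaces typos early at the cost of aborting the build.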

+ 25 - 0
permalinks/README.md

@@ -0,0 +1,25 @@
+permalink
+=========
+
+This plugin enables a kind of permalink which can be used to refer to a piece
+of content which is resistant to the file being moved or renamed.
+
+It does this by creating additional output HTML in `PERMALINK_PATH`
+(default permalinks/) which includes redirect code to point the user at the
+original page.
+
+To work, each page has to have an additional piece of metadata with the key
+`permalink_id` (configurable with `PERMALINK_ID_METADATA_KEY`),
+which should remain static even through renames and should also
+be unique on the site.
+
+This can be generated automatically with the filetime_from_git module and
+the `GIT_GENERATE_PERMALINK` option.
+This should always be used with `GIT_HISTORY_FOLLOWS_RENAME` to ensure the ID
+persists across renames.
+
+
+Hacky redirects
+---------------
+To make this work with things like github.io I'm forced to use HTML and
+Javascript redirects rather than HTTP redirects which is obviously suboptimal.

+ 1 - 0
permalinks/__init__.py

@@ -0,0 +1 @@
+from .permalinks import register

+ 149 - 0
permalinks/permalinks.py

@@ -0,0 +1,149 @@
+# -*- coding: utf-8 -*-
+"""
+This plugin enables a kind of permalink which can be used to refer to a piece
+of content which is resistant to the file being moved or renamed.
+"""
+import logging
+import itertools
+import os
+import os.path
+from pelican import signals
+from pelican.generators import Generator
+from pelican.utils import mkdir_p
+from pelican.utils import clean_output_dir
+
+logger = logging.getLogger(__name__)
+
+
+def article_url(content):
+    '''
+    Get the URL for an item of content
+    '''
+    return '{content.settings[SITEURL]}/{content.url}'.format(
+        content=content).encode('utf-8')
+
+
+REDIRECT_STRING = '''
+<!DOCTYPE HTML>
+<html lang="en-US">
+    <head>
+        <meta charset="UTF-8">
+        <meta http-equiv="refresh" content="0;url={url}">
+        <script type="text/javascript">
+            window.location.href = "{url}"
+        </script>
+        <title>Page Redirection to {title}</title>
+    </head>
+    <body>
+        If you are not redirected automatically, follow the
+        <a href='{url}'>link to {title}</a>
+    </body>
+</html>
+'''
+
+
+class PermalinkGenerator(Generator):
+    '''
+    Generate a redirect page for every item of content with a
+    permalink_id metadata
+    '''
+    def generate_context(self):
+        '''
+        Setup context
+        '''
+        self.permalink_output_path = os.path.join(
+            self.output_path, self.settings['PERMALINK_PATH'])
+        self.permalink_id_metadata_key = self.settings['PERMALINK_ID_METADATA_KEY']
+
+    def generate_output(self, writer=None):
+        '''
+        Generate redirect files
+        '''
+        logger.info(
+            'Generating permalink files in %r', self.permalink_output_path)
+
+        clean_output_dir(self.permalink_output_path, [])
+        mkdir_p(self.permalink_output_path)
+        for content in itertools.chain(
+                self.context['articles'], self.context['pages']):
+
+            for permalink_id in content.get_permalink_ids_iter():
+                permalink_path = os.path.join(
+                    self.permalink_output_path, permalink_id) + '.html'
+
+                redirect_string = REDIRECT_STRING.format(
+                    url=article_url(content),
+                    title=content.title)
+                open(permalink_path, 'w').write(redirect_string)
+
+
+def get_permalink_ids_iter(self):
+    '''
+    Method to get permalink ids from content. To be bound to the class last thing
+    '''
+    permalink_id_key = self.settings['PERMALINK_ID_METADATA_KEY']
+    permalink_ids_raw = self.metadata.get(permalink_id_key, '')
+
+    for permalink_id in permalink_ids_raw.split(','):
+        if permalink_id:
+            yield permalink_id.strip()
+
+
+def get_permalink_ids(self):
+    '''
+    Method to get permalink ids from content. To be bound to the class last thing
+    '''
+    return list(self.get_permalink_ids_iter())
+
+def get_permalink_path(self):
+    """Get just path component of permalink."""
+    try:
+        first_permalink_id = next(self.get_permalink_ids_iter())
+    except StopIteration:
+        return None
+
+    return '/{settings[PERMALINK_PATH]}/{first_permalink}'.format(
+        settings=self.settings, first_permalink=first_permalink_id)
+
+
+def get_permalink_url(self):
+    '''
+    Get a permalink URL
+    '''
+    return "/".join((self.settings['SITEURL'], self.get_permalink_path()))
+
+
+PERMALINK_METHODS = (
+    get_permalink_ids_iter,
+    get_permalink_ids,
+    get_permalink_url,
+    get_permalink_path,
+)
+
+
+def add_permalink_methods(content_inst):
+    '''
+    Add permalink methods to object
+    '''
+    for permalink_method in PERMALINK_METHODS:
+        setattr(
+            content_inst,
+            permalink_method.__name__,
+            permalink_method.__get__(content_inst, content_inst.__class__))
+
+def add_permalink_option_defaults(pelican_inst):
+    '''
+    Add Pelican defaults
+    '''
+    pelican_inst.settings.setdefault('PERMALINK_PATH', 'permalinks')
+    pelican_inst.settings.setdefault('PERMALINK_ID_METADATA_KEY', 'permalink_id')
+
+
+def get_generators(_pelican_object):
+    return PermalinkGenerator
+
+
+def register():
+    signals.get_generators.connect(get_generators)
+    signals.content_object_init.connect(add_permalink_methods)
+    signals.initialized.connect(add_permalink_option_defaults)
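
To see what the generator produces for a given metadata value, the ID splitting and path building can be sketched without Pelican. This is an illustrative approximation under stated assumptions: `REDIRECT_TEMPLATE` is a shortened stand-in for `REDIRECT_STRING`, and `split_permalink_ids` and `redirect_pages` are hypothetical names. Unlike `get_permalink_ids_iter`, this variant strips each ID before testing it, so whitespace-only entries are dropped.

```python
import os

# Shortened stand-in for the plugin's REDIRECT_STRING template.
REDIRECT_TEMPLATE = (
    '<!DOCTYPE html>\n'
    '<meta http-equiv="refresh" content="0;url={url}">\n'
    '<a href="{url}">link to {title}</a>\n'
)


def split_permalink_ids(raw_value):
    """Split a comma-separated metadata value into clean, non-empty ids."""
    return [part.strip() for part in raw_value.split(',') if part.strip()]


def redirect_pages(raw_ids, url, title, output_dir='permalinks'):
    """Map each output path to the redirect HTML that would be written there."""
    html = REDIRECT_TEMPLATE.format(url=url, title=title)
    return {
        os.path.join(output_dir, permalink_id + '.html'): html
        for permalink_id in split_permalink_ids(raw_ids)
    }


# Made-up metadata value and URL, for illustration only.
pages = redirect_pages('abc123, def456,', 'https://example.com/post.html', 'Post')
```

Returning a mapping instead of writing files keeps the sketch free of side effects; the plugin itself writes one `.html` file per ID into the permalink output directory.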