
Add permalinks plugin

This plugin enables a kind of permalink that refers to a piece of content
and keeps working when the underlying file is moved or renamed.

It does this by creating additional output HTML in `PERMALINK_PATH`
(default permalinks/) which includes redirect code pointing the user at
the original page.

To work, each page needs an additional piece of metadata with the
key `permalink_id` (configurable with `PERMALINK_ID_METADATA_KEY`),
which should remain static even through renames and should also
be unique on the site.

This can be generated automatically with the filetime_from_git module
and the `GIT_GENERATE_PERMALINK` option.

This should always be used with `GIT_HISTORY_FOLLOWS_RENAME` enabled so
that the id persists across renames.
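
As a rough sketch of how the two plugins might be wired together in a
Pelican config: the setting names below are the ones that registration.py
and permalinks.py in this commit register defaults for, while
`PLUGIN_PATHS` and its value are assumptions about a local checkout.

```python
# pelicanconf.py (fragment): an illustrative sketch, not the author's config.
# PLUGIN_PATHS is an assumed checkout location; adjust it to your setup.
PLUGIN_PATHS = ['pelican-plugins']
PLUGINS = ['filetime_from_git', 'permalinks']

# filetime_from_git: derive a permalink id from the oldest commit and
# keep following the file across renames.
GIT_GENERATE_PERMALINK = True
GIT_HISTORY_FOLLOWS_RENAME = True

# permalinks: where redirect pages are written and which metadata key
# holds the comma-separated permalink ids.
PERMALINK_PATH = 'permalinks'
PERMALINK_ID_METADATA_KEY = 'permalink_id'
```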

Also refactors the filetime_from_git module into a more generic
module for useful git functionality.
Chris Scutcher, 7 years ago
Parent
Commit
f5d8976329

+ 2 - 0
Readme.rst

@@ -200,6 +200,8 @@ pelicanfly                Lets you type things like ``i ♥ :fa-coffee:`` in you
 
 Photos                    Add a photo or a gallery of photos to an article, or include photos in the body text. Resize photos as needed.
 
+permalink                 Enables a kind of permalink using html redirects.
+
 Pin to top                Pin Pelican's article(s) to top "Sticky article"
 
 PlantUML                  Allows you to define UML diagrams directly into rst documents using the great PlantUML tool

+ 28 - 40
filetime_from_git/README.rst

@@ -1,10 +1,8 @@
 Use Git commit to determine page date
 ======================================
-
-If your blog content is versioned via Git, this plugin will set articles'
-and pages' ``metadata['date']`` to correspond to that of the Git commit.
-This plugin depends on the ``gitpython`` python package, which can be
-installed via::
+If your blog content is versioned with Git, this plugin sets articles'
+and pages' ``metadata['date']`` from the Git commit history. It depends
+on the ``gitpython`` Python package, which can be installed via::
 
     pip install gitpython
 
@@ -27,46 +25,36 @@ operations like copy and move will not affect the generated results.
 If you don't want a given article or page to use the Git time, set the
 metadata to ``gittime: off`` to disable it.
 
-You can also set ``GIT_FILETIME_FOLLOW`` to ``True`` in your settings to
-make the plugin follow file renames — i.e., ensure the creation date matches
-the original file creation date, not the date it was renamed.
-
-FAQ
----
-
-### Q. I get a GitCommandError: 'git rev-list ...' when I run the plugin. Why?
-Be sure to use the correct gitpython module for your distro's Git binary.
-Using the ``GIT_FILETIME_FOLLOW`` option to ``True`` may also make your
-problem go away, as that optino uses a different method to find commits.
-
-Some notes on Git
-~~~~~~~~~~~~~~~~~~
+Other options
+-------------
 
-* How to check if a file is managed by Git?
+### GIT_HISTORY_FOLLOWS_RENAME (default True)
+Set GIT_HISTORY_FOLLOWS_RENAME to True in your Pelican config to
+make the plugin follow file renames, i.e. ensure the creation date matches
+the original file creation date, not the date it was renamed.
 
-.. code-block:: sh
+### GIT_GENERATE_PERMALINK (default False)
+Use in combination with the permalinks plugin to generate permalinks using the original
+commit sha.
 
-   git ls-files $file --error-unmatch
+### GIT_SHA_METADATA (default True)
+Adds sha of current and oldest commit to metadata
 
-* How to check if a file has changes?
+### GIT_FILETIME_FROM_GIT (default True)
+Enable filetime from git behaviour
 
-.. code-block:: sh
+Content specific options
+------------------------
+Adding metadata `gittime` = False will prevent the plugin from trying to set the file time for this
+content.
 
-   git diff $file            # compare staging area with working directory
-   git diff --cached $file   # compare HEAD with staged area
-   git diff HEAD $file       # compare HEAD with working directory
+Adding metadata `git_permalink` = False will prevent the plugin from adding a permalink for this
+content.
 
-* How to get commits related to a file?
-
-.. code-block:: sh
-
-   git status $file
-
-With ``gitpython`` package, it's easier to parse committed time:
-
-.. code-block:: python
+FAQ
+---
 
-   repo = Git.repo('/path/to/repo')
-   commits = repo.commits(path='path/to/file')
-   commits[-1].committed_date    # oldest commit time
-   commits[0].committed_date     # latest commit time
+### Q. I get a GitCommandError: 'git rev-list ...' when I run the plugin. What's up?
+Be sure to use the correct gitpython module for your distro's Git binary.
+Setting the GIT_HISTORY_FOLLOWS_RENAME option to True may also make your problem go away, as it uses
+a different method to find commits.

+ 1 - 1
filetime_from_git/__init__.py

@@ -1 +1 @@
-from .filetime_from_git import *
+from .registration import *

+ 108 - 0
filetime_from_git/actions.py

@@ -0,0 +1,108 @@
+# -*- coding: utf-8 -*-
+import base64
+import hashlib
+import os
+import logging
+from pelican.utils import strftime
+from .utils import string_to_bool
+from .utils import datetime_from_timestamp
+from .registration import content_git_object_init
+
+
+logger = logging.getLogger(__name__)
+
+
+@content_git_object_init.connect
+def filetime_from_git(content, git_content):
+    '''
+    Update modification and creation times from git
+    '''
+    if not content.settings['GIT_FILETIME_FROM_GIT']:
+        # Disabled for everything
+        return
+
+    if not string_to_bool(content.metadata.get('gittime', 'yes')):
+        # Disable for this content
+        return
+
+    path = content.source_path
+    fs_creation_time = datetime_from_timestamp(os.stat(path).st_ctime, content)
+    fs_modified_time = datetime_from_timestamp(os.stat(path).st_mtime, content)
+
+    # 1. file is not managed by git
+    #    date: fs time
+    # 2. file is staged, but has no commits
+    #    date: fs time
+    # 3. file is managed, and clean
+    #    date: first commit time, update: last commit time or None
+    # 4. file is managed, but dirty
+    #    date: first commit time, update: fs time
+    if git_content.is_managed_by_git():
+        if git_content.is_committed():
+            content.date = git_content.get_oldest_commit_date()
+
+            if git_content.is_modified():
+                content.modified = fs_modified_time
+            else:
+                content.modified = git_content.get_newest_commit_date()
+        else:
+            # File isn't committed
+            content.date = fs_creation_time
+    else:
+        # file is not managed by git
+        content.date = fs_creation_time
+
+    # Clean up content attributes
+    if not hasattr(content, 'modified'):
+        content.modified = content.date
+
+    if hasattr(content, 'date'):
+        content.locale_date = strftime(content.date, content.date_format)
+
+    if hasattr(content, 'modified'):
+        content.locale_modified = strftime(
+            content.modified, content.date_format)
+
+
+@content_git_object_init.connect
+def git_sha_metadata(content, git_content):
+    '''
+    Add sha metadata to content
+    '''
+    if not content.settings['GIT_SHA_METADATA']:
+        return
+
+    if not git_content.is_committed():
+        return
+
+    content.metadata['gitsha_newest'] = str(git_content.get_newest_commit())
+    content.metadata['gitsha_oldest'] = str(git_content.get_oldest_commit())
+
+
+@content_git_object_init.connect
+def git_permalink(content, git_content):
+    '''
+    Add git based permalink id to content metadata
+    '''
+    if not content.settings['GIT_GENERATE_PERMALINK']:
+        return
+
+    if not string_to_bool(content.metadata.get('git_permalink', 'yes')):
+        # Disable for this content
+        return
+
+    if not git_content.is_committed():
+        return
+
+    permalink_hash = hashlib.sha1()
+    permalink_hash.update(str(git_content.get_oldest_commit()))
+    permalink_hash.update(str(git_content.get_oldest_filename()))
+    git_permalink_id = base64.urlsafe_b64encode(permalink_hash.digest())
+    permalink_id_metadata_key = content.settings['PERMALINK_ID_METADATA_KEY']
+
+    if permalink_id_metadata_key in content.metadata:
+        content.metadata[permalink_id_metadata_key] = (
+            ','.join((
+                content.metadata[permalink_id_metadata_key], git_permalink_id)))
+    else:
+        content.metadata[permalink_id_metadata_key] = git_permalink_id
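
The permalink id above is built with Python 2 string semantics. A minimal Python 3 sketch of the same derivation follows, assuming the commit sha and filename are already available as text. The helper names `make_permalink_id` and `append_permalink_id` are hypothetical and not part of this commit.

```python
import base64
import hashlib


def make_permalink_id(oldest_commit_sha, oldest_filename):
    """Hash the oldest commit sha and original filename into a URL-safe id."""
    digest = hashlib.sha1()
    # hashlib only accepts bytes on Python 3, so encode explicitly
    digest.update(oldest_commit_sha.encode('utf-8'))
    digest.update(oldest_filename.encode('utf-8'))
    # urlsafe_b64encode returns bytes; decode so the id can be used in paths
    return base64.urlsafe_b64encode(digest.digest()).decode('ascii')


def append_permalink_id(metadata, key, new_id):
    """Append new_id to the comma-separated list under key, as the diff does."""
    if key in metadata:
        metadata[key] = ','.join((metadata[key], new_id))
    else:
        metadata[key] = new_id
    return metadata
```

Because only the oldest commit and the original filename feed the hash, the id stays the same after later commits or renames, provided both inputs are resolved from the start of the file's history.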

+ 99 - 0
filetime_from_git/content_adapter.py

@@ -0,0 +1,99 @@
+# -*- coding: utf-8 -*-
+"""
+Wraps a content object to provide some git information
+"""
+import logging
+from pelican.utils import memoized
+from .git_wrapper import git_wrapper
+
+DEV_LOGGER = logging.getLogger(__name__)
+
+
+class GitContentAdapter(object):
+    """
+    Wraps a content object to provide some git information
+    """
+    def __init__(self, content):
+        self.content = content
+        self.git = git_wrapper('.')
+        self.tz_name = content.settings.get('TIMEZONE', None)
+        self.follow = content.settings['GIT_HISTORY_FOLLOWS_RENAME']
+
+    @memoized
+    def is_committed(self):
+        '''
+        Is committed
+        '''
+        return len(self.get_commits()) > 0
+
+    @memoized
+    def is_modified(self):
+        '''
+        Has content been modified since last commit
+        '''
+        return self.git.is_file_modified(self.content.source_path)
+
+    @memoized
+    def is_managed_by_git(self):
+        '''
+        Is content stored in a file managed by git
+        '''
+        return self.git.is_file_managed_by_git(self.content.source_path)
+
+    @memoized
+    def get_commits(self):
+        '''
+        Get all commits involving this filename
+        :returns: List of commits newest to oldest
+        '''
+        if not self.is_managed_by_git():
+            return []
+        return self.git.get_commits(self.content.source_path, self.follow)
+
+    @memoized
+    def get_oldest_commit(self):
+        '''
+        Get oldest commit involving this file
+
+        :returns: Oldest commit
+        '''
+        return self.git.get_commits(self.content.source_path, self.follow)[-1]
+
+    @memoized
+    def get_newest_commit(self):
+        '''
+        Get newest commit involving this file
+
+        :returns: Newest commit
+        '''
+        return self.git.get_commits(self.content.source_path, follow=False)[0]
+
+    @memoized
+    def get_oldest_filename(self):
+        '''
+        Get the original filename of this content. Implies follow
+        '''
+        commit_and_name_iter = self.git.get_commits_and_names_iter(
+            self.content.source_path)
+        _commit, name = list(commit_and_name_iter)[-1]
+        return name
+
+    @memoized
+    def get_oldest_commit_date(self):
+        '''
+        Get datetime of oldest commit involving this file
+
+        :returns: Datetime of oldest commit
+        '''
+        oldest_commit = self.get_oldest_commit()
+        return self.git.get_commit_date(oldest_commit, self.tz_name)
+
+    @memoized
+    def get_newest_commit_date(self):
+        '''
+        Get datetime of newest commit involving this file
+
+        :returns: Datetime of newest commit
+        '''
+        newest_commit = self.get_newest_commit()
+        return self.git.get_commit_date(newest_commit, self.tz_name)
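
The adapter's predicates (`is_managed_by_git`, `is_committed`, `is_modified`) feed the four-case date rule listed in actions.py. As a pure function the rule could be sketched like this, assuming the content carries no `modified` value of its own. `choose_dates` and its parameters are hypothetical names for illustration.

```python
from datetime import datetime


def choose_dates(managed, committed, modified, fs_created, fs_modified,
                 oldest_commit=None, newest_commit=None):
    """Return (date, modified) following the four cases listed in actions.py."""
    if managed and committed:
        date = oldest_commit
        # dirty working copy: trust the filesystem; clean: trust git
        changed = fs_modified if modified else newest_commit
    else:
        # untracked, or staged without any commit: fall back to fs time
        date = fs_created
        changed = date
    return date, changed


first = datetime(2017, 1, 1)
last = datetime(2017, 6, 1)
on_disk = datetime(2018, 1, 1)
clean = choose_dates(True, True, False, on_disk, on_disk, first, last)
dirty = choose_dates(True, True, True, on_disk, on_disk, first, last)
```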

+ 0 - 80
filetime_from_git/filetime_from_git.py

@@ -1,80 +0,0 @@
-# -*- coding: utf-8 -*-
-
-import os
-from pelican import signals, contents
-from pelican.utils import strftime, set_date_tzinfo
-from datetime import datetime
-from .git_wrapper import git_wrapper
-
-
-def datetime_from_timestamp(timestamp, content):
-    """
-    Helper function to add timezone information to datetime,
-    so that datetime is comparable to other datetime objects in recent versions
-    that now also have timezone information.
-    """
-    return set_date_tzinfo(
-        datetime.fromtimestamp(timestamp),
-        tz_name=content.settings.get('TIMEZONE', None))
-
-
-def filetime_from_git(content):
-    if isinstance(content, contents.Static):
-        return
-
-    git = git_wrapper('.')
-    tz_name = content.settings.get('TIMEZONE', None)
-
-    gittime = content.metadata.get('gittime', 'yes').lower()
-    gittime = gittime.replace("false", "no").replace("off", "no")
-    if gittime == "no":
-        return
-
-    # 1. file is not managed by git
-    #    date: fs time
-    # 2. file is staged, but has no commits
-    #    date: fs time
-    # 3. file is managed, and clean
-    #    date: first commit time, update: last commit time or None
-    # 4. file is managed, but dirty
-    #    date: first commit time, update: fs time
-    path = content.source_path
-    if git.is_file_managed_by_git(path):
-        commits = git.get_commits(
-            path, follow=content.settings.get('GIT_FILETIME_FOLLOW', False))
-
-        if len(commits) == 0:
-            # never commited, but staged
-            content.date = datetime_from_timestamp(
-                os.stat(path).st_ctime, content)
-        else:
-            # has commited
-            content.date = git.get_commit_date(
-                commits[-1], tz_name)
-
-            if git.is_file_modified(path):
-                # file has changed
-                content.modified = datetime_from_timestamp(
-                    os.stat(path).st_ctime, content)
-            else:
-                # file is not changed
-                if len(commits) > 1:
-                    content.modified = git.get_commit_date(
-                        commits[0], tz_name)
-    else:
-        # file is not managed by git
-        content.date = datetime_from_timestamp(os.stat(path).st_ctime, content)
-
-    if not hasattr(content, 'modified'):
-        content.modified = content.date
-
-    if hasattr(content, 'date'):
-        content.locale_date = strftime(content.date, content.date_format)
-
-    if hasattr(content, 'modified'):
-        content.locale_modified = strftime(
-            content.modified, content.date_format)
-
-
-def register():
-    signals.content_object_init.connect(filetime_from_git)

+ 29 - 5
filetime_from_git/git_wrapper.py

@@ -2,9 +2,10 @@
 """
 Wrap python git interface for compatibility with older/newer version
 """
+import itertools
 import logging
 import os
-from time import mktime, altzone
+from time import mktime
 from datetime import datetime
 from pelican.utils import set_date_tzinfo
 from git import Git, Repo
@@ -12,6 +13,15 @@ from git import Git, Repo
 DEV_LOGGER = logging.getLogger(__name__)
 
 
+def grouper(iterable, n, fillvalue=None):
+    '''
+    Collect data into fixed-length chunks or blocks
+    '''
+    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
+    args = [iter(iterable)] * n
+    return itertools.izip_longest(fillvalue=fillvalue, *args)
+
+
 class _GitWrapperCommon(object):
     '''
     Wrap git module to provide a more stable interface across versions
@@ -51,9 +61,23 @@ class _GitWrapperCommon(object):
         :param path: Path which we will find commits for
         :returns: Sequence of commit objects. Newest to oldest
         '''
-        commit_shas = self.git.log(
-            '--pretty=%H', '--follow', '--', path).splitlines()
-        return [self.repo.commit(shas) for shas in commit_shas]
+        return [
+            commit for commit, _ in self.get_commits_and_names_iter(
+                path)]
+
+    def get_commits_and_names_iter(self, path):
+        '''
+        Get all commits including a given path following renames
+        '''
+        log_result = self.git.log(
+            '--pretty=%H',
+            '--follow',
+            '--name-only',
+            '--',
+            path).splitlines()
+
+        for commit_sha, _, filename in grouper(log_result, 3):
+            yield self.repo.commit(commit_sha), filename
 
     def get_commits(self, path, follow=False):
         '''
@@ -87,7 +111,7 @@ class _GitWrapperLegacy(_GitWrapperCommon):
         Get datetime of commit comitted_date
         '''
         return set_date_tzinfo(
-            datetime.fromtimestamp(mktime(commit.committed_date) - altzone),
+            datetime.fromtimestamp(mktime(commit.committed_date)),
             tz_name=tz_name)
 
 
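`get_commits_and_names_iter` relies on `git log --pretty=%H --follow --name-only` printing three lines per commit: the sha, a blank line, and the filename at that commit. A small Python 3 sketch of that parsing step follows. `SAMPLE` is hand-written illustrative text, not captured git output, and `parse_log_names` is a hypothetical helper.

```python
import itertools


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks: 'ABCDEFG', 3 -> ABC DEF G.."""
    args = [iter(iterable)] * n
    # Python 3 spelling of itertools.izip_longest
    return itertools.zip_longest(*args, fillvalue=fillvalue)


def parse_log_names(log_output):
    """Yield (sha, filename) pairs in log order, i.e. newest commit first."""
    for sha, _blank, filename in grouper(log_output.splitlines(), 3):
        yield sha, filename


# Assumed shape of the log text for a file renamed once (illustrative only).
SAMPLE = "bbb222\n\ncontent/new-name.md\naaa111\n\ncontent/old-name.md"

pairs = list(parse_log_names(SAMPLE))
newest_sha, newest_name = pairs[0]
oldest_sha, oldest_name = pairs[-1]
```

Since the log runs newest to oldest, the original filename is the last pair produced, not the first.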

+ 30 - 0
filetime_from_git/registration.py

@@ -0,0 +1,30 @@
+# -*- coding: utf-8 -*-
+"""
+Handle registration and setup for plugin
+"""
+import logging
+from blinker import signal
+from .content_adapter import GitContentAdapter
+from pelican import signals
+
+DEV_LOGGER = logging.getLogger(__name__)
+
+content_git_object_init = signal('content_git_object_init')
+
+def send_content_git_object_init(content):
+    content_git_object_init.send(content, git_content=GitContentAdapter(content))
+
+
+def setup_option_defaults(pelican_inst):
+    pelican_inst.settings.setdefault('GIT_FILETIME_FROM_GIT', True)
+    pelican_inst.settings.setdefault('GIT_HISTORY_FOLLOWS_RENAME', True)
+    pelican_inst.settings.setdefault('GIT_SHA_METADATA', True)
+    pelican_inst.settings.setdefault('GIT_GENERATE_PERMALINK', False)
+
+
+def register():
+    signals.content_object_init.connect(send_content_git_object_init)
+    signals.initialized.connect(setup_option_defaults)
+
+    # Import actions
+    from . import actions
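
The registration module fans one Pelican signal out to several independent actions that share a single git adapter. The pattern can be sketched with a tiny stand-in for a blinker signal. The `Signal` class below is an illustration only, not blinker's implementation, and the dictionaries stand in for the content object and the adapter.

```python
class Signal(object):
    """Tiny stand-in for a blinker signal (illustration only)."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)
        return receiver  # returning it lets connect() work as a decorator

    def send(self, sender, **kwargs):
        # call every receiver with the sender plus keyword arguments
        return [(receiver, receiver(sender, **kwargs))
                for receiver in self.receivers]


content_git_object_init = Signal()


@content_git_object_init.connect
def record_sha(content, git_content):
    content['gitsha_oldest'] = git_content['oldest']


@content_git_object_init.connect
def record_permalink(content, git_content):
    content['permalink_id'] = 'id-' + git_content['oldest']


article = {}
content_git_object_init.send(article, git_content={'oldest': 'aaa111'})
```

Each action only sees the content and the shared adapter, so new git-derived behaviour can be added by connecting another receiver.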

+ 39 - 0
filetime_from_git/utils.py

@@ -0,0 +1,39 @@
+# -*- coding: utf-8 -*-
+"""
+Utility functions
+"""
+from datetime import datetime
+import logging
+from pelican.utils import set_date_tzinfo
+
+DEV_LOGGER = logging.getLogger(__name__)
+
+
+STRING_BOOLS = {
+    'yes': True,
+    'no': False,
+    'true': True,
+    'false': False,
+    '0': False,
+    '1': True,
+    'on': True,
+    'off': False,
+}
+
+
+def string_to_bool(string):
+    '''
+    Convert a string to a bool based on the STRING_BOOLS table
+    '''
+    return STRING_BOOLS[string.strip().lower()]
+
+
+def datetime_from_timestamp(timestamp, content):
+    """
+    Helper function to add timezone information to datetime,
+    so that datetime is comparable to other datetime objects in recent versions
+    that now also have timezone information.
+    """
+    return set_date_tzinfo(
+        datetime.fromtimestamp(timestamp),
+        tz_name=content.settings.get('TIMEZONE', None))
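
`string_to_bool` raises `KeyError` for any value missing from the table, for example a typo in an article's `gittime` metadata. A lenient variant could look like the sketch below. `lenient_string_to_bool` and its `default` parameter are hypothetical and not part of this commit.

```python
STRING_BOOLS = {
    'yes': True, 'no': False,
    'true': True, 'false': False,
    '1': True, '0': False,
    'on': True, 'off': False,
}


def lenient_string_to_bool(string, default=True):
    """Look the value up case-insensitively; fall back to default if unknown."""
    return STRING_BOOLS.get(string.strip().lower(), default)


disabled = lenient_string_to_bool('Off')      # recognised, any case
enabled = lenient_string_to_bool(' yes ')     # surrounding whitespace ignored
fallback = lenient_string_to_bool('maybe')    # unknown value uses the default
```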

+ 25 - 0
permalinks/README.md

@@ -0,0 +1,25 @@
+permalink
+=========
+
+This plugin enables a kind of permalink which can be used to refer to a piece
+of content which is resistant to the file being moved or renamed.
+
+It does this by creating additional output HTML in `PERMALINK_PATH`
+(default permalinks/) which includes redirect code pointing the user at the
+original page.
+
+To work, each page needs an additional piece of metadata with the key
+`permalink_id` (configurable with `PERMALINK_ID_METADATA_KEY`),
+which should remain static even through renames and should also
+be unique on the site.
+
+This can be generated automatically with the filetime_from_git module and
+the `GIT_GENERATE_PERMALINK` option.
+This should always be used with `GIT_HISTORY_FOLLOWS_RENAME` enabled so that
+the id persists across renames.
+
+
+Hacky redirects
+---------------
+To make this work with things like github.io I'm forced to use HTML and
+JavaScript redirects rather than HTTP redirects, which is obviously suboptimal.

+ 1 - 0
permalinks/__init__.py

@@ -0,0 +1 @@
+from .permalinks import register

+ 149 - 0
permalinks/permalinks.py

@@ -0,0 +1,149 @@
+# -*- coding: utf-8 -*-
+"""
+This plugin enables a kind of permalink which can be used to refer to a piece
+of content which is resistant to the file being moved or renamed.
+"""
+import logging
+import itertools
+import os
+import os.path
+from pelican import signals
+from pelican.generators import Generator
+from pelican.utils import mkdir_p
+from pelican.utils import clean_output_dir
+
+logger = logging.getLogger(__name__)
+
+
+def article_url(content):
+    '''
+    Get the URL for an item of content
+    '''
+    return '{content.settings[SITEURL]}/{content.url}'.format(
+        content=content).encode('utf-8')
+
+
+REDIRECT_STRING = '''
+<!DOCTYPE HTML>
+<html lang="en-US">
+    <head>
+        <meta charset="UTF-8">
+        <meta http-equiv="refresh" content="0;url={url}">
+        <script type="text/javascript">
+            window.location.href = "{url}"
+        </script>
+        <title>Page Redirection to {title}</title>
+    </head>
+    <body>
+        If you are not redirected automatically, follow the
+        <a href='{url}'>link to {title}</a>
+    </body>
+</html>
+'''
+
+
+class PermalinkGenerator(Generator):
+    '''
+    Generate a redirect page for every item of content with a
+    permalink_id metadata
+    '''
+    def generate_context(self):
+        '''
+        Setup context
+        '''
+        self.permalink_output_path = os.path.join(
+            self.output_path, self.settings['PERMALINK_PATH'])
+        self.permalink_id_metadata_key = self.settings['PERMALINK_ID_METADATA_KEY']
+
+    def generate_output(self, writer=None):
+        '''
+        Generate redirect files
+        '''
+        logger.info(
+            'Generating permalink files in %r', self.permalink_output_path)
+
+        clean_output_dir(self.permalink_output_path, [])
+        mkdir_p(self.permalink_output_path)
+        for content in itertools.chain(
+                self.context['articles'], self.context['pages']):
+
+            for permalink_id in content.get_permalink_ids_iter():
+                permalink_path = os.path.join(
+                    self.permalink_output_path, permalink_id) + '.html'
+
+                redirect_string = REDIRECT_STRING.format(
+                    url=article_url(content),
+                    title=content.title)
+                open(permalink_path, 'w').write(redirect_string)
+
+
+def get_permalink_ids_iter(self):
+    '''
+    Method to get permalink ids from content. To be bound to the class last thing
+    '''
+    permalink_id_key = self.settings['PERMALINK_ID_METADATA_KEY']
+    permalink_ids_raw = self.metadata.get(permalink_id_key, '')
+
+    for permalink_id in permalink_ids_raw.split(','):
+        if permalink_id:
+            yield permalink_id.strip()
+
+
+def get_permalink_ids(self):
+    '''
+    Method to get permalink ids from content. To be bound to the class last thing
+    '''
+    return list(self.get_permalink_ids_iter())
+
+def get_permalink_path(self):
+    """Get just path component of permalink."""
+    try:
+        first_permalink_id = self.get_permalink_ids_iter().next()
+    except StopIteration:
+        return None
+
+    return '/{settings[PERMALINK_PATH]}/{first_permalink}'.format(
+        settings=self.settings, first_permalink=first_permalink_id)
+
+
+def get_permalink_url(self):
+    '''
+    Get a permalink URL
+    '''
+    return "/".join((self.settings['SITEURL'], self.get_permalink_path()))
+
+
+PERMALINK_METHODS = (
+    get_permalink_ids_iter,
+    get_permalink_ids,
+    get_permalink_url,
+    get_permalink_path,
+)
+
+
+def add_permalink_methods(content_inst):
+    '''
+    Add permalink methods to object
+    '''
+    for permalink_method in PERMALINK_METHODS:
+        setattr(
+            content_inst,
+            permalink_method.__name__,
+            permalink_method.__get__(content_inst, content_inst.__class__))
+
+def add_permalink_option_defaults(pelican_inst):
+    '''
+    Add Pelican setting defaults
+    '''
+    pelican_inst.settings.setdefault('PERMALINK_PATH', 'permalinks')
+    pelican_inst.settings.setdefault('PERMALINK_ID_METADATA_KEY', 'permalink_id')
+
+
+def get_generators(_pelican_object):
+    return PermalinkGenerator
+
+
+def register():
+    signals.get_generators.connect(get_generators)
+    signals.content_object_init.connect(add_permalink_methods)
+    signals.initialized.connect(add_permalink_option_defaults)
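
`add_permalink_methods` attaches plain functions to each content instance by calling the function's `__get__`, which produces a bound method. A standalone sketch of that technique follows. The `Content` class is a stand-in for a Pelican content object and `add_methods` is a hypothetical helper.

```python
import types


class Content(object):
    """Stand-in for a Pelican content object (hypothetical, for illustration)."""

    def __init__(self, metadata):
        self.metadata = metadata


def get_permalink_ids(self):
    raw = self.metadata.get('permalink_id', '')
    return [part.strip() for part in raw.split(',') if part.strip()]


def add_methods(instance, functions):
    """Bind each function to this one instance under its own name."""
    for function in functions:
        # function.__get__(instance, cls) returns a bound method, equivalent
        # to types.MethodType(function, instance)
        setattr(instance, function.__name__,
                function.__get__(instance, type(instance)))


page = Content({'permalink_id': 'abc, def'})
add_methods(page, [get_permalink_ids])
ids = page.get_permalink_ids()

other = Content({})  # instances that were not patched stay untouched
```

Binding per instance leaves the class itself unmodified, at the cost of repeating the work for every content object the signal delivers.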