[Pelican Comment System] add Blogger export script

MinchinWeb 7 years ago
parent
commit
3d153ef57a

+ 4 - 0
pelican_comment_system/CHANGELOG.md

@@ -2,6 +2,10 @@
 All notable changes to this project will be documented in this file.
 This project adheres to [Semantic Versioning](http://semver.org/).
 
+## 1.3.0 - 2017-01-10
+### Added
+- add [blogger_comment_export.py](import/blogger_comment_export.py) script to export comments from a Blogger XML export and [associated documentation](doc/import.md)
+
 ## 1.2.2 – 2016-12-19
 ### Fixed
 - Correct jQuery expression in cancelReply method  [PR #820](https://github.com/getpelican/pelican-plugins/pull/820)

+ 1 - 0
pelican_comment_system/Readme.md

@@ -24,6 +24,7 @@ Bernhard Scheirle  | <http://bernhard.scheirle.de> | <https://github.com/Scheirl
 
  - [Quickstart Guide](doc/quickstart.md)
  - [Installation and basic usage](doc/installation.md)
+ - [Import existing comments](doc/import.md)
  - [Avatars and identicons](doc/avatars.md)
  - [Comment Atom feed](doc/feed.md)
  

+ 47 - 0
pelican_comment_system/doc/import.md

@@ -0,0 +1,47 @@
+# Importing Comments
+
+**Note**: Contributions to this section are welcomed!
+
+When moving to Pelican and the Pelican Comment System, it may be desirable to move over your comments as well.
+
+The scripts to support this are found in the `import` directory.
+
+## Blogger
+
+Blogger is good in that it will give you an export of everything, but the bad news is it's one giant XML file. XML is great if you're a computer, but a bit of a pain if you're a human. 
+
+The code I used to export my comments from Blogger is found at [blogger_comment_export.py](https://github.com/MinchinWeb/pelican-plugins/tree/master/pelican_comment_system/import/blogger_comment_export.py).
+
+To use it
+yourself, you will need to first adjust the constants at the beginning of the 
+script (lines 26-33) to point to your Blogger XML export and where you want
+the comments to be exported to. You will also need to install `untangle`
+(available through pip -- `pip install untangle`).
+
+Comments will be exported into folders matching
+the Blogger slug of the post. The email for all authors will be `noreply@blogger.com`. The other file created will be `authors.txt`
+which lists the various comment authors, and a link to the profile
+picture used on Blogger. These pictures will need to be manually downloaded
+and then configured using the `PELICAN_COMMENT_SYSTEM_AUTHORS` setting.
+In my case, that looked like this:
+
+```python
+# in pelicanconf.py
+PELICAN_COMMENT_SYSTEM_AUTHORS = {
+    ('PROTIK KHAN', 'noreply@blogger.com'): "images/authors/rabiul_karim.webp",
+    ('Matthew Hartzell', 'noreply@blogger.com'): "images/authors/matthew_hartzell.webp",
+    ('Jens-Peter Labus', 'noreply@blogger.com'): "images/authors/jens-peter_labus.png",
+    ('Bridget', 'noreply@blogger.com'): "images/authors/bridget.jpg",
+    ('melissaclee', 'noreply@blogger.com'): "images/authors/melissa_lee.jpg",
+    ('Melissa', 'noreply@blogger.com'): "images/authors/melissa_lee.jpg"
+}
+```
+
+The script was developed for Python 3.6. It uses `Path.write_text()` and
+`Path.mkdir(exist_ok=True)`, which Python 3.4 lacks, so 3.4 needs changes.
+
+For more information on this script, you can read my
+[blog post](http://blog.minchin.ca/2016/12/blogger-comments-exported.html)
+where I introduced it.
+
+-- Wm. Minchin (@MinchinWeb), January 10, 2017
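
Note on `doc/import.md`: the documentation leaves the step from `authors.txt` to the `PELICAN_COMMENT_SYSTEM_AUTHORS` setting as manual work. A minimal sketch of how that step might be automated follows. It assumes the two-tab layout written by `export_authors()`. The helper name `build_authors_setting`, the `images/authors` directory, and the file-naming scheme are hypothetical and not part of this commit.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

BLOGGER_EMAIL = 'noreply@blogger.com'


def build_authors_setting(authors_text, image_dir='images/authors'):
    """Map (author, email) to an assumed local avatar path."""
    setting = {}
    for line in authors_text.splitlines():
        if not line.strip():
            continue
        # export_authors() separates the name and picture URL with two tabs
        name, _, picture_url = line.partition('\t\t')
        # keep the extension of the remote picture, defaulting to .jpg
        extension = PurePosixPath(urlparse(picture_url).path).suffix or '.jpg'
        # hypothetical naming scheme: lower-case name, non-word runs become '_'
        stem = re.sub(r'\W+', '_', name.lower()).strip('_')
        setting[(name, BLOGGER_EMAIL)] = '{}/{}{}'.format(
            image_dir, stem, extension)
    return setting
```

The returned dictionary only proposes paths. The pictures still have to be downloaded to those locations, and the names reviewed, before the result is pasted into `pelicanconf.py`.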

+ 165 - 0
pelican_comment_system/import/blogger_comment_export.py

@@ -0,0 +1,165 @@
+#! python3.6
+"""
+Export Comments from Blogger XML
+
+Takes in a Blogger export XML file and spits out each comment in a separate
+file, so that it can be used with the [Pelican Comment System]
+(https://bernhard.scheirle.de/posts/2014/March/29/static-comments-via-email/).
+
+May be simple to extend to export posts as well.
+
+For a more detailed description, read my blog post at
+    http://blog.minchin.ca/2016/12/blogger-comments-exported.html
+
+Author: Wm. Minchin -- minchinweb@gmail.com
+License: MIT
+Changes:
+
+ - 2016.12.29 -- initial release
+ - 2017.01.10 -- clean-up for addition in Pelican Comment System repo
+"""
+
+from pathlib import Path
+
+import untangle
+
+###############################################################################
+# Constants                                                                   #
+###############################################################################
+
+BLOGGER_EXPORT = r'c:\tmp\blog.xml'
+COMMENTS_DIR = 'comments'
+COMMENT_EXT = '.md'
+AUTHORS_FILENAME = 'authors.txt'
+
+###############################################################################
+# Main Code Body                                                              #
+###############################################################################
+
+authors_and_pics = []
+
+
+def main():
+    obj = untangle.parse(BLOGGER_EXPORT)
+
+    templates = 0
+    posts = 0
+    comments = 0
+    settings = 0
+    others = 0
+
+    for entry in obj.feed.entry:
+        try:
+            full_type = entry.category['term']
+        except TypeError:
+            # if a post is under multiple categories
+            for my_category in entry.category:
+                full_type = my_category['term']
+                if '#' in full_type:
+                    break
+            else:
+                others += 1
+                continue  # no kind term found; already counted as "other"
+
+        simple_type = full_type[full_type.find('#')+1:]
+
+        if 'settings' == simple_type:
+            settings += 1
+        elif 'post' == simple_type:
+            posts += 1
+            # process posts here
+        elif 'comment' == simple_type:
+            comments += 1
+            process_comment(entry, obj)
+        elif 'template' == simple_type:
+            templates += 1
+        else:
+            others += 1
+
+    export_authors()
+
+    print('''
+            {} template
+            {} posts (including drafts)
+            {} comments
+            {} settings
+            {} other entries'''.format(templates,
+                                       posts,
+                                       comments,
+                                       settings,
+                                       others))
+
+
+def process_comment(entry, obj):
+    # e.g. "tag:blogger.com,1999:blog-26967745.post-4115122471434984978"
+    comment_id = entry.id.cdata
+    # in ISO 8601 format, usable as is
+    comment_published = entry.published.cdata
+    comment_body = entry.content.cdata
+    comment_post_id = entry.thr_in_reply_to['ref']
+    comment_author = entry.author.name.cdata
+    comment_author_pic = entry.author.gd_image['src']
+    comment_author_email = entry.author.email.cdata
+
+    # add author and pic to global list
+    global authors_and_pics
+    authors_and_pics.append((comment_author, comment_author_pic))
+
+    # use this for a filename for the comment
+    # e.g. "4115122471434984978"
+    comment_short_id = comment_id[comment_id.find('post-')+5:]
+
+    comment_text = "date: {}\nauthor: {}\nemail: {}\n\n{}\n"\
+                        .format(comment_published,
+                                comment_author,
+                                comment_author_email,
+                                comment_body)
+
+    # article
+    for entry in obj.feed.entry:
+        entry_id = entry.id.cdata
+        if entry_id == comment_post_id:
+            article_entry = entry
+            break
+    else:
+        print("No matching article for comment", comment_id, comment_post_id)
+        # don't process comment further
+        return
+
+    # article slug
+    for link in article_entry.link:
+        if link['rel'] == 'alternate':
+            article_link = link['href']
+            break
+    else:
+        article_title = article_entry.title.cdata
+        print('Could not find slug for', article_title)
+        article_link = article_title.lower().replace(' ', '-')
+
+    article_slug = article_link[article_link.rfind('/')+1:]
+    article_slug = article_slug.split('.html')[0]  # tolerate a missing '.html'
+
+    comment_filename = Path(COMMENTS_DIR).resolve()
+    # folder; if it doesn't exist, create it
+    comment_filename = comment_filename / article_slug
+    comment_filename.mkdir(parents=True, exist_ok=True)
+    # write the comment file
+    comment_filename = comment_filename / (comment_short_id + COMMENT_EXT)
+    comment_filename.write_text(comment_text, encoding='utf-8')
+
+
+def export_authors():
+    to_export = set(authors_and_pics)
+    to_export = list(to_export)
+    to_export.sort()
+
+    str_export = ''
+    for i in to_export:
+        str_export += (i[0] + '\t\t' + i[1] + '\n')
+
+    authors_filename = Path(COMMENTS_DIR).resolve() / AUTHORS_FILENAME
+    authors_filename.write_text(str_export, encoding='utf-8')
+
+
+if __name__ == "__main__":
+    main()
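
Note on `blogger_comment_export.py`: `main()` sorts each entry by the text after `#` in its category term. The sketch below shows the same idea in isolation, using the standard library's `xml.etree.ElementTree` instead of `untangle` so it runs without extra packages. The `SAMPLE` feed is a hypothetical, heavily trimmed stand-in for a real Blogger export, and `entry_kind` and `count_kinds` are illustrative names that do not appear in the script.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ATOM = '{http://www.w3.org/2005/Atom}'

# Hypothetical, heavily trimmed stand-in for a Blogger export.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <category term="http://schemas.google.com/blogger/2008/kind#post"/>
    <category term="travel"/>
  </entry>
  <entry>
    <category term="http://schemas.google.com/blogger/2008/kind#comment"/>
  </entry>
  <entry>
    <category term="travel"/>
  </entry>
</feed>"""


def entry_kind(entry):
    """Return the text after '#' in the first category term that has one."""
    for category in entry.findall(ATOM + 'category'):
        term = category.get('term', '')
        if '#' in term:
            return term.split('#', 1)[1]
    # no category carried a kind marker
    return 'other'


def count_kinds(xml_text):
    """Count the entries of each kind in an Atom feed given as a string."""
    root = ET.fromstring(xml_text)
    return Counter(entry_kind(e) for e in root.findall(ATOM + 'entry'))
```

Calling `count_kinds(SAMPLE)` yields one `post`, one `comment`, and one `other`. Returning the kind from a single function keeps an entry without a kind term from being counted twice.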

+ 3 - 0
pelican_comment_system/pelican_comment_system.py

@@ -23,6 +23,9 @@ from . comment import Comment
 from . import avatars
 
 
+__version__ = "1.3.0"
+
+
 _all_comments = []
 _pelican_writer = None
 _pelican_obj = None