Explicitly set the html parser to make sure no extra tags get added.

BeautifulSoup supports multiple HTML parsers. Some of those parsers
try to make the HTML valid by adding or removing tags [1]. This can
leave unwanted html, head and body tags in the final document. Explicitly
setting the parser to 'html.parser' avoids this behaviour.

[1] http://www.crummy.com/software/BeautifulSoup/bs4/doc/#differences-between-parsers
bas smit, 11 years ago
Parent
Current commit
8d0e643637
1 file changed, 1 insertion and 1 deletion

+ 1 - 1
extract_toc/extract_toc.py

@@ -14,7 +14,7 @@ from pelican import signals, readers, contents
 def extract_toc(content):
     if isinstance(content, contents.Static):
         return
-    soup = BeautifulSoup(content._content)
+    soup = BeautifulSoup(content._content,'html.parser')
     filename = content.source_path
     extension = path.splitext(filename)[1][1:]
     toc = ''
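
The behaviour this change relies on can be checked with a minimal sketch, assuming the `bs4` package is installed. The helper name `render_fragment` and the sample fragment are illustrative only and are not part of the plugin.

```python
from bs4 import BeautifulSoup


def render_fragment(html, parser="html.parser"):
    """Parse an HTML fragment and serialise it back to a string.

    Hypothetical helper for illustration; the plugin itself calls
    BeautifulSoup(content._content, 'html.parser') directly.
    """
    return str(BeautifulSoup(html, parser))


# An article body is a fragment, not a complete HTML document.
fragment = "<h2 id='intro'>Intro</h2><p>Body text</p>"

# With the parser named explicitly, the fragment is not wrapped
# in html, head or body tags.
result = render_fragment(fragment)
print(result)
```

If no parser is named, BeautifulSoup picks the best one installed, so the output depends on the environment. According to the parser comparison in [1], lxml wraps a fragment in html and body tags, and html5lib also adds an empty head.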