Explicitly set the HTML parser to make sure no extra tags get added.

BeautifulSoup supports multiple HTML parsers. Some of those parsers
try to make the HTML valid by adding or removing tags[1]. This can leave
unwanted <html>, <head> and <body> tags in the final document. Explicitly
setting the parser to 'html.parser' avoids this behaviour.

[1] http://www.crummy.com/software/BeautifulSoup/bs4/doc/#differences-between-parsers
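
The behaviour described above can be checked with a short sketch. It assumes the `bs4` package is installed; the sample fragment and the variable names are made up for illustration and are not part of the plugin.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, similar in shape to rendered article content.
fragment = "<h2>Intro</h2><p>Body text</p>"

# 'html.parser' is the parser from Python's standard library. Given a
# fragment, it does not wrap it in <html>, <head> or <body> elements.
soup = BeautifulSoup(fragment, "html.parser")

rendered = str(soup)
has_wrapper = soup.html is not None or soup.body is not None

print(rendered)     # the fragment, unchanged
print(has_wrapper)  # False
```

When no parser is named, BeautifulSoup picks the best one installed, so the same call can produce different output on different machines. Naming the parser makes the result reproducible.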
bas smit 11 years ago
parent
commit
8d0e643637
1 changed file with 1 addition and 1 deletion

+ 1 - 1
extract_toc/extract_toc.py

@@ -14,7 +14,7 @@ from pelican import signals, readers, contents
 def extract_toc(content):
     if isinstance(content, contents.Static):
         return
-    soup = BeautifulSoup(content._content)
+    soup = BeautifulSoup(content._content,'html.parser')
     filename = content.source_path
     extension = path.splitext(filename)[1][1:]
     toc = ''