
Explicitly set the HTML parser to make sure no extra tags get added.

BeautifulSoup supports multiple HTML parsers. Some of those parsers
try to make the HTML valid by adding or removing tags[1]. This can lead
to unwanted html, head and body tags in the final document. Explicitly
setting the parser to 'html.parser' avoids this behaviour.

[1] http://www.crummy.com/software/BeautifulSoup/bs4/doc/#differences-between-parsers
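
For illustration, here is a minimal sketch of the behaviour described above. It assumes the `bs4` package is installed; the `fragment` string and the variable names are made up for this example and do not come from the plugin. According to the Beautiful Soup documentation, parsers such as lxml or html5lib may wrap a bare fragment in html/body tags, whereas the built-in 'html.parser' leaves it unwrapped.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, similar in shape to rendered article content.
fragment = "<h2 id='intro'>Intro</h2><p>Some text.</p>"

# Name the parser explicitly so the result does not depend on which
# optional parsers (lxml, html5lib) happen to be installed.
soup = BeautifulSoup(fragment, "html.parser")

# Serialise the tree back to a string and check for wrapper tags.
rendered = str(soup)
has_wrapper_tags = any(
    soup.find(name) is not None for name in ("html", "head", "body")
)

print(rendered)
print("wrapper tags added:", has_wrapper_tags)
```

Without the second argument, BeautifulSoup picks the "best" parser available in the environment, so the same code can produce different output on different machines.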
bas smit 11 years ago
parent
commit
8d0e643637
1 changed file with 1 addition and 1 deletion

+ 1 - 1
extract_toc/extract_toc.py

@@ -14,7 +14,7 @@ from pelican import signals, readers, contents
 def extract_toc(content):
     if isinstance(content, contents.Static):
         return
-    soup = BeautifulSoup(content._content)
+    soup = BeautifulSoup(content._content,'html.parser')
     filename = content.source_path
     extension = path.splitext(filename)[1][1:]
     toc = ''