Mind Flow Productions

Systems & Productivity

The BeautifulSoup method prettify() will format your HTML nicely for human readability. The problem is that prettify() indents each nesting level by only a single space, which still leaves the HTML a little hard to read. First I tried to see if the output formatter parameter would help, but a formatter function is handed each string before any indentation is applied, so it has no information about how deep the indent should be.

Formatting HTML for Humans

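from bs4 import BeautifulSoup

def indent(text):
    # Debugging formatter: show how many leading spaces each string has
    print('LS:', len(text) - len(text.lstrip(' ')), text)
    input('---')    # pause so each string can be inspected
    return text

# raw_html is the HTML source as a string
tree_soup = BeautifulSoup(raw_html, 'html.parser')
pretty_html = tree_soup.prettify(formatter=indent)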

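So I loaded the prettified HTML into a StringIO object so I could process it line by line. Then I counted the leading spaces on each line and did some simple math to get a four-space indent: prettify() uses one space per level, so adding three extra spaces for every existing one gives four per level.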

from io import StringIO
from bs4 import BeautifulSoup

# raw_html is the HTML source as a string
tree_soup = BeautifulSoup(raw_html, 'html.parser')
pretty_html = tree_soup.prettify()

htmlio = StringIO(pretty_html)
beautiful_html = ''
for line in htmlio:
    count = len(line) - len(line.lstrip(' '))      # count leading spaces
    beautiful_html += (count * 4 - count) * ' ' + line    # add 3 spaces per existing space

The ‘beautiful_html’ variable now contains the prettified HTML with a four-space indent!
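
If you need this in more than one place, the same leading-space math can be wrapped in a small helper. This is only a sketch: the function name reindent and its width parameter are my own, not part of BeautifulSoup, and it assumes the input uses one space per level the way prettify() output does. The sample string is hand-written to look like prettify() output so the snippet runs without any HTML of your own.

```python
def reindent(pretty_html, width=4):
    """Widen a one-space-per-level indent to `width` spaces per level.

    Hypothetical helper, not a BeautifulSoup API. Assumes every leading
    space in the input stands for exactly one nesting level.
    """
    lines = []
    for line in pretty_html.splitlines(keepends=True):
        stripped = line.lstrip(' ')
        depth = len(line) - len(stripped)          # leading spaces == nesting depth
        lines.append(' ' * (depth * width) + stripped)
    return ''.join(lines)


# Hand-written stand-in for prettify() output (one space per level)
sample = "<html>\n <body>\n  <p>\n   Hi\n  </p>\n </body>\n</html>\n"
result = reindent(sample)
print(result)
```

Calling reindent(pretty_html, width=2) would give a two-space indent instead. One thing to check for your own pages: any line that starts with spaces for some other reason (for example, text you expect to stay verbatim) gets widened too, since the helper only looks at leading spaces.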