Python Logging: The Missing Guide
Apr 25, 2020Python loggers can be configured a million ways. That’s too much.
We can ease the pain by sticking to consistent practices. Outlined below are the most common issues that I see projects run into.
Naming
Like modules, loggers are hierarchical. Their names should follow the import path of where they’re running. Loggers that stray from this are hard to work with. Use __name__
everywhere to get this for free.
Follow this guideline for both libraries and CLI entry-points. Technically the name of __main__
’s logger doesn’t matter, but inconsistencies tend to spread.
# Module: foo/bar/baz.py
# GOOD: These are equivalent. Use the first one.
LOG = logging.getLogger(__name__)
LOG = logging.getLogger('foo.bar.baz')
# BAD: Cannot predict based on import path.
LOG = logging.getLogger('my_foo_baz')
# BAD: This is the root logger of the hierarchy.
# : Libraries that use this cannot have their
# : logs filtered by level without affecting
# : everything else.
LOG = logging.getLogger()
LOG = logging.getLogger('')
LOG = logging.getLogger(None)
As a rule of thumb, just don’t use the root logger. Attach handlers, sure. But don’t use.
Hierarchy: Handlers & Filters
As a refresher, these settings are available to loggers:
- Min Level
- Handlers
- Filters
Log names, split by period, make up the hierarchy. When logs are issued, they bubble up, visiting each handler, unless stopped by a filter. Or the minimum level. (Details)
This makes it handy to do things like:
- Exclude noisy logs from an import
- Include helpful logs from an import
- Format and print all program logs to
stderr
Let’s demonstrate how this works:
import logging
# Set up silly handler that prints a name it's given when a log arrives
class DumbHandler(logging.Handler):
def __init__(self, title: str, *args, **kwargs):
self.title = title
super().__init__(*args, **kwargs)
def emit(self, record: logging.LogRecord):
print(f"{self.title}: {record.getMessage()}")
# True
logging.root == logging.getLogger('')
logging.root.setLevel(logging.DEBUG)
logging.root.addHandler(DumbHandler('root'))
logging.getLogger('project.foo').addHandler(DumbHandler('project.foo'))
logging.root.info('test')
# root: test
logging.getLogger('project.foo').info('test')
# project.foo: test
# root: test
logging.getLogger('project.foo.bar').setLevel(logging.WARNING)
logging.getLogger('project.foo.bar').info('test')
# <nothing>
Avoid the self.logger
Pattern
All calls to logging.getLogger(name)
will return the same logger instance. Storing in an instance variable pollutes the logging call-site — which should be frequent!
# GOOD: At module level after imports
LOG = logging.getLogger(__name__)
def foo():
LOG.info('...')
# BAD: Adds 8 more chars before message
class Foo:
def __init__(self):
self.logger = logging.getLogger(__name__)
def bar(self):
self.logger.info('...')
# BAD
def foo(logger: logging.Logger):
logger.info('...')
If you need this to add unique handlers for contextual info, keep reading.
Adding Context
This can be done in too many ways. For simplicity, I’ll show one method: my favorite.
It optimizes for:
- Zero special logger instances
- Fewest number of Handlers (1)
- No changes at logging call-site
.
"""
Use `contextvars` to share metadata with your log handler.
This works between imports, threads, and is `asyncio`-friendly.
The main gotcha is putting `copy_context().run` at the
right entry-point(s). And setting the variables soon
enough.
"""
import logging
import time
from contextvars import ContextVar, copy_context
from random import randint
LOG = logging.getLogger(__name__)
ReqID = ContextVar('req_id')
class ContextHandler(logging.Handler):
def emit(self, record: logging.LogRecord):
msg = record.getMessage()
if ReqID.get(None) is not None:
msg = f"[{ReqID.get()}] {msg}"
# Don't print.
# Combine with logging over the network or use
# a custom formatter attached to StreamHandler
print(msg)
def do_work():
LOG.info("starting request...")
ReqID.set(randint(100, 500))
LOG.info("request finished")
# If you want to capture more logs from the program, attach handler
# at a higher level in the hierarchy.
#
# Perhaps even logging.root.
LOG.addHandler(ContextHandler())
LOG.setLevel(logging.INFO)
LOG.info("starting program")
copy_context().run(do_work)
copy_context().run(do_work)
copy_context().run(do_work)
# starting program
#
# starting request...
# [484] request finished
#
# starting request...
# [105] request finished
#
# starting request...
# [381] request finished
For more information on contextvars
, see: https://docs.python.org/3/library/contextvars.html
Logging Over the Network
Use a queue with a flusher thread.
For every log issued, all handlers are run serially in the same thread as the call-site. Uploading messages to a database in-line turns microseconds into 20ms, 100ms, 500ms, and so on.
A project I worked on did this once. Jobs that landed in Europe took significantly longer to run. We figured out it was due to 500-1000ms round-trips for each log message.
import logging
import threading
from queue import Queue
LOG = logging.getLogger(__name__)
class DBHandler(logging.Handler):
def __init__(self, q: Queue, *args, **kwargs):
self.q = q
super().__init__(*args, **kwargs)
def emit(self, record: logging.LogRecord):
# Format, convert log record to row, and send to queue
# for insertion
self.format(record)
row = make_row(record)
self.q.put(row)
def make_row(record: logging.LogRecord) -> "Row":
# Use more columns than this
return {"msg": record.message}
def flusher(q: Queue):
# Upload row instead of print. Perhaps flush every n-seconds
# w/ buffer to have an upper-bound on inserts.
while True:
row = q.get()
print("woohoo, recived:", row["msg"])
q = Queue()
threading.Thread(target=flusher, daemon=True, args=(q,)).start()
LOG.setLevel(logging.INFO)
LOG.addHandler(DBHandler(q))
LOG.info("starting program")
LOG.info("doing some stuff")
LOG.info("mighty cool")
# woohoo, recived: starting program
# woohoo, recived: doing some stuff
# woohoo, recived: mighty cool
For asyncio
in the flusher thread, use janus.Queue
. It’s both asyncio
-compatible and thread-safe: https://gist.github.com/coxley/14a1e3d98898132fa89c4e025cfa840f
Duplicate Logs in Terminal
When logs print more than once, there are too many StreamHandlers
attached. Use logging.removeHandler
for a deterministic starting point.
for h in logging.root.handlers:
if (
isinstance(h, logging.StreamHandler) and
getattr(h, "stream", None) in (sys.stderr, sys.stdout)
):
logging.root.removeHandler(h)
# Add the handler (and formatter) you want
Don’t attach StreamHandler
within libraries. Configure the root logger from the program entry-point.