This weekend a catastrophic bug in log4j2 was
disclosed,
leading to the potential for remote code execution on a huge number of
unpatched endpoints.
In this specific case, it turns out there was not really any safe way to use
the API. Initially it might appear that the issue was the treatment of an
apparently fixed format string as a place to put variable user-specified data,
but as it turns out it just recursively expands the log data forever, looking
for code to execute.
So perhaps the lesson here is nothing technical, just that we should remain
ready to patch, or that
we should pay the
maintainers.
Still, it’s worth considering that injection vulnerabilities of this type exist
pretty much everywhere, usually
in places where the supposed defense against getting catastrophically RCE’d is
to carefully remember that the string that you pass in isn’t that kind of
string.
While not containing anything nearly so pernicious as a place to put a URL that
lets you execute arbitrary attacker-controlled code, Python’s
logging
module does contain a fair amount of confusing indirection around its log
message. Sometimes — if you’re passing a non-zero number of *args — the
parts of the logging module will interpret msg as a format string; other
times it will interpret it as a static string. This is in some sense a
reasonable compromise; you can have format strings and defer formatting if you
want, but also log.warning(f”hi, {attacker_controlled_data}”) is fairly
safe by default. It’s still a somewhat muddled and difficult to document
situation.
Similarly, Twisted’s logging system does always treat its string
argument
as a format string, which is more consistent. However, it does let attackers
put garbage into the log wherever the developer might not have understood the
documentation.1
This is to say nothing of the elephant in the room here: SQL. Almost every SQL
API takes a bunch of strings, and the ones that make you declare an object in
advance (i.e. Java’s PreparedStatement) don’t mind at all if you create one
at runtime.
In the interest of advancing the state of the art just a little here, I’d like
to propose a pattern to encourage the idiomatic separation of user-entered
data (i.e. attacker-controlled payloads) from pre-registration of static,
sensitive data, whether it’s SQL queries, format strings, static HTML or
something else. One where copying and pasting examples won’t instantly subvert
the intended protection. What I suggest would look like this:
2
3
4
5
6
7
8
9
10
11
12
13
with sql_statements.declarations() as d:
create_table = d.declare(“create table foo (bar int, baz str)”)
save_foo = d.declare(“insert into foo values (?, ?)”)
load_by_bar = d.declare(“select * from foo where bar = :bar”)
# later, inside a function
con = sqlite3.connect(“:memory:”)
cur = con.cursor()
create_table.run(cur)
save_foo.run(cur, 3, “hello”)
save_foo.run(cur, 4, “goodbye”)
print((list(load_by_bar.run(cur, bar=3))))
The idea here is that sql_statements.declarations() detects which module it’s
in, and only lets you write those declarations once. Attempting to stick
that inside your function and create some ad-hoc formatted string should
immediately fail with a loud exception; copying this into the wrong part of
your code just won’t work, so you won’t have a chance to create an injection
vulnerability.
If this idea appeals to you, I’ve written an extremely basic prototype here on
github and uploaded it to PyPI
here.
I’m not dropping a 0day on you, there’s not a clear vulnerability here;
it only lets you draw data from explicitly-specified parameters into the
log. If you use it wrong, you just might get an “Unable to format event”
type error, which we’ll go out of our way to not raise back to you as an
exception. It just makes some ugly log messages. ↩