Dynamic Variant Analysis with Python
In this post I will present a way to find performance issues within Python code with the help of pytest by doing variant analysis.
Variant analysis is the process of using a known vulnerability as a seed to find similar problems in your code.
After looking at some of the work Semmle is doing on variant analysis, I started wondering whether the same technique could be used for non-security problems in code. Even more, I wondered whether there was an easier way to reach the same goal without having to parse the Python code, generate an AST and then run the analysis on it.
That’s when I remembered the example in the pytest monkeypatch docs that prevents the requests library from performing remote operations, and realized that I could perform a kind of dynamic variant analysis by replacing functions, classes or methods with a custom implementation via monkeypatching. This way I could add security, performance or other kinds of checks to the original functions.
Dynamic variant analysis has a nice benefit over static variant analysis: the number of false positives should be close to zero, because every finding comes from a call that actually happened while the code was running.
To make this dynamic analysis feasible we need a way to automatically exercise all the potentially vulnerable places in the code. This might look like a problem at first sight, but for most projects it is not: these days most projects have test suites that execute a good percentage of the codebase, and they usually rely on a CI system to run these test suites effortlessly. If the test coverage is high, doing the analysis this way should not be a problem at all.
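The idea of swapping a function for a checked version can be sketched without any framework. The snippet below is only an illustration using the standard library: library, LazyRows, guard and FlaggedArgumentError are hypothetical names made up for the example, and unittest.mock stands in for pytest’s monkeypatch fixture.

```python
import functools
import types
from unittest import mock


class FlaggedArgumentError(TypeError):
    """Raised when an instrumented function receives a flagged argument type."""


def guard(original, flagged_type):
    """Wrap `original` so that calls with a `flagged_type` argument fail loudly."""
    @functools.wraps(original)
    def checked(value, *args, **kwargs):
        if isinstance(value, flagged_type):
            raise FlaggedArgumentError(
                f"{original.__name__}() called with {type(value).__name__}"
            )
        # Anything else is delegated to the untouched original.
        return original(value, *args, **kwargs)
    return checked


class LazyRows:
    """Hypothetical lazy collection that is expensive to evaluate fully."""
    def __init__(self, rows):
        self.rows = rows

    def __len__(self):
        return len(self.rows)


# Hypothetical stand-in for a third-party module exposing a `size` function.
library = types.SimpleNamespace(size=len)


def application_code(value):
    # Looks up library.size at call time, so it sees the patched version.
    return library.size(value)


def run_checks():
    """Exercise the application code while the instrumented function is active."""
    with mock.patch.object(library, "size", guard(library.size, LazyRows)):
        ok = application_code([1, 2, 3])
        try:
            application_code(LazyRows([1, 2, 3]))
        except FlaggedArgumentError as exc:
            flagged = str(exc)
        else:
            flagged = None
    # Leaving the `with` block restores the original function.
    return ok, flagged
```

Calls with ordinary arguments behave exactly as before, while the flagged type produces an error that points at the offending call. In a real project the test suite plays the role of run_checks.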
The goal
I’m going to focus on performance-related issues here, in particular on finding performance issues in Django applications. Although this example targets Django, the technique could be used by any Python project with pytest support.
The concrete bug I will be chasing is calls to Django’s length template filter with a queryset. As can be seen in the Django documentation, calling length on a queryset triggers a full evaluation of the queryset (the filter calls len() on its argument), which can be a performance hit when the queryset is big enough. This kind of bug might go unnoticed because:
- The database used for local development is usually a subset of the one used in production, so evaluating a small queryset is fast enough to go unseen.
- Most projects at an early stage will probably have a small database, but as time passes and the database grows, the page will get slower and slower.
Show me the code
The buggy template
The view calling the template
The test triggering the call to the template filter
The instrumentation
Let’s explain what this little piece of code is doing. First of all, I’m defining a pytest fixture, template_length_check, with autouse=True. This causes the fixture to be called before the execution of each test, which means that each test will run with our custom implementation of the length template filter, queryset_check_length.
What do we want our custom implementation to do?
We want it to raise an exception when the argument passed to the length template filter is of type QuerySet; otherwise it calls the default length implementation. Raising an exception will cause the test to fail, so we will be able to find all the buggy templates by checking the failing tests.
Keep in mind that we are doing this only to find bugs. Keeping this kind of fixture in your test suite permanently is discouraged, since it modifies the original behaviour of the function.
Results
When running the test suite, this is what the error looks like:
The full code example can be found here