
Django: How to profile and improve startup time

2023-03-21

Your Django project’s startup time impacts how smooth it is to work with. Django has to restart your project every time you run a management command and when runserver reloads. This involves importing all your apps, and thus all the modules that they import.

As you add more code and dependencies to your project, you may notice its startup time increasing. Development can slow down to a crawl; reports of 10 second startup times are, sadly, not unusual.

In this post, we'll cover tools for profiling startup time, and tactics for reducing it.

How to profile startup time

Here are several tools for profiling startup time.

-X importtime: Python's import profiler

Python 3.7 introduced the -X importtime option, a specialized profiler that times all imports. It prints a table to stderr, summarizing microsecond timing information for each imported module:

```
$ python -X importtime -c 'import django'
import time: self [us] | cumulative | imported package
import time:       136 |        136 | _io
import time:        20 |         20 | marshal
import time:       256 |        256 | posix
import time:       261 |        672 | _frozen_importlib_external
...
import time:       499 |       6898 | subprocess
import time:       309 |        309 | weakref
import time:        60 |         60 | org
import time:        10 |         70 | org.python
import time:         7 |         77 | org.python.core
import time:       187 |        572 | copy
import time:       228 |        799 | django.utils.functional
import time:       183 |        982 | django.utils.regex_helper
import time:       340 |      12228 | django.utils.version
import time:       206 |      12434 | django
```

This table totals 72 rows, snipped with ... to keep the blog post readable. Read from the bottom up to see the "import tree":

- Running the code in the django module itself takes 206 microseconds (its self time).
- Cumulatively, all the imports required for django take 12434 microseconds.
- The last module that django imported was django.utils.version. This took 340 microseconds inside itself, and a cumulative 12228 microseconds in its imports.
- Continuing up, you can see other modules as they were imported: django.utils.regex_helper, django.utils.functional, copy, etc.

importtime-waterfall: Make pretty charts from -X importtime

The raw data from -X importtime is useful, but reading through a table of numbers doesn't make it easy to spot the slow modules. This is where the importtime-waterfall tool by Anthony Sottile can help.
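If you just want the biggest offenders from the raw table, a few lines of Python can sort it for you. This is a rough sketch of my own (the regex and function names are not from any existing tool), printing the modules with the highest self time:

```python
import re
import subprocess
import sys

# Matches data rows of `python -X importtime` output, skipping the header.
LINE_RE = re.compile(r"import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(.+)")


def parse_importtime(output):
    """Parse -X importtime stderr into (self_us, cumulative_us, module) tuples."""
    rows = []
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append(
                (int(match.group(1)), int(match.group(2)), match.group(3).strip())
            )
    return rows


def slowest_by_self_time(output, n=10):
    """Return the n modules with the highest self time, slowest first."""
    return sorted(parse_importtime(output), reverse=True)[:n]


if __name__ == "__main__":
    module = sys.argv[1] if len(sys.argv) > 1 else "django"
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    for self_us, cum_us, name in slowest_by_self_time(result.stderr):
        print(f"{self_us:>8} | {cum_us:>10} | {name}")
```

Sorting loses the tree structure, though, which is exactly what the waterfall view below preserves.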
importtime-waterfall wraps python -X importtime to capture the data, and can output it as a HAR (HTTP Archive) file. The HAR format is designed for profiles of HTTP requests made by a website, but it neatly doubles up for containing module import times as fake "requests". There are then many viewers you can use to read the data in a waterfall chart.

To make a HAR file with importtime-waterfall, install it in your virtual environment, and then invoke it with --har and the module name to profile:

```
$ importtime-waterfall --har django > django.har
```

You can then visualize this file with one of many HAR viewer tools. The one linked in the docs is the HTTP Archive Viewer by Jan Odvarko. Drop the file directly onto its web page to view it as a waterfall chart. Each module has two timings, expanded on hover:

- "Waiting" time (purple): the time spent waiting for dependencies to import.
- "Receiving" time (grey): the time spent executing the code in the module ("self" time).

Modules with high "Receiving" time are normally the most susceptible to optimization.

Beware: the timings are wrong by a factor of 1000. This is because python -X importtime records in microseconds, but HAR files only support integer milliseconds. Rather than lose precision, importtime-waterfall multiplies timings by 1000. Use the waterfall chart to compare relative values, rather than absolute ones.

The example makes it clear that when running import django, most of the time is spent actually importing the subprocess module.

Run importtime-waterfall on your project

importtime-waterfall can only profile a module import, not a full command like ./manage.py runserver. So to profile a Django project, you'll need to create a short module that initializes your project at import time. Let's look at doing so with an example project that reproduces a performance problem I recently encountered.

Below is a template.
Put it in a file like it.py (standing for import time) next to your manage.py, and update the os.environ line to point to your settings file:

```python
import os

import django
from django.urls import resolve

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "example.settings")
django.setup()
resolve("/")
```

django.setup() initializes Django for your project, loads all apps and their models, and eventually calls all AppConfig.ready() methods. This is normally done automatically in manage.py and your ASGI/WSGI entrypoint.

The resolve() call forces Django to load your full URLconf and all referred-to modules. Django loads URLs and views lazily, but you probably care about profiling their import time too, at least to improve runserver restart time.

With the wrapper module in place, you can run importtime-waterfall like so:

```
$ importtime-waterfall --har it > it.har
```

The output HAR file will have a lot of entries. For the example project with one app, I got 503 imported modules; on a large client project, I found 4403.

A subsection of the example project's profile shows example.views importing distutils.util, taking about 26 milliseconds (26 "seconds" in the viewer, divided by 1000). distutils is suspect because it's used for creating Python packages, and rarely needed at runtime. (It's also deprecated and slated for removal in Python 3.12.)

Let's take a look at what example.views imports from distutils:

```python
from distutils.util import strtobool


def index(request):
    show_referral_banner = strtobool(request.GET.get("referred", "false"))
    ...
```

Ah, only the strtobool() function. This casts strings like "True" or "0" to booleans. The function is not really related to packaging, but is simply an internal convenience function that happens to do something required in this view.

Imagine that after examining the project's code, you determine that the referred query parameter will only ever be "true" or "false".
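An aside: if you could not pin down the accepted values, another option is keeping a local copy of the function, as covered under "Copy relevant bits into your project" below. A sketch based on the truth values distutils documents:

```python
# Local replacement for distutils.util.strtobool(), returning a real bool.
# The accepted strings mirror those documented by distutils.
_TRUE_VALUES = {"y", "yes", "t", "true", "on", "1"}
_FALSE_VALUES = {"n", "no", "f", "false", "off", "0"}


def strtobool(value):
    normalized = value.strip().lower()
    if normalized in _TRUE_VALUES:
        return True
    if normalized in _FALSE_VALUES:
        return False
    raise ValueError(f"Invalid truth value: {value!r}")
```

Here, though, the parameter is limited to "true" and "false", so even that copy is unnecessary.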
So you don't need the complexity of strtobool(), and all the modules it drags in with it. Instead, you can directly compare with "true":

```python
def index(request):
    show_referral_banner = request.GET.get("referred", "false") == "true"
    ...
```

After removing the distutils import and re-profiling, the example project drops from 503 to 409 modules, with a 24% / 26 millisecond time saving. Not bad.

cProfile: More detailed profiling

It's often not so easy to spot why an import is slow. Modules with a high "self" time have module-level code that takes a while, but it may be unclear what's taking time. To determine what's slow, you can use a "traditional" profiler that captures the runtime of each function call.

A simple but effective combo is Python's built-in profiler module cProfile, along with the visualization conversion tool gprof2dot. cProfile times each function execution, and gprof2dot outputs a graph of time spent per function, with colourization of the "hot paths". Here's a quick run-through of using those.

First, profile the problematic module with cProfile's -o option to save the output file, and -m option to select the module:

```
$ python -m cProfile -o example_views.prof -m example.views
```

(Using the slow version of example.views that still imports distutils.)

Then install the yelp fork of gprof2dot (which runs on new Pythons), and pipe its output into dot from Graphviz:

```
$ gprof2dot example_views.prof | dot -Tsvg -o example_views.svg
```

If you don't have dot installed, you can instead copy and paste the output into Graphviz Online, and download the SVG.

You can then open the SVG in your browser, for example with Firefox on macOS:

```
$ open -a Firefox example_views.svg
```

Zoomed out, the chart tends to be humongous. Each box represents a function or module-level execution, and there can be a lot of them. Red boxes have the most execution time and blue ones have the least, with green ones in the middle.
Arrows point from callers to callees. Look for red or green boxes as the "hot spots" in the program. Zoom in on suspect areas and trace execution to try and understand what's going on.

After a little inspection of the example graph, I found one frequently-called function shown in teal. The first line in each box is the final name in the module path, line number, and function or class name. You can hover over this for the browser's label that shows the full module path. Very useful for boxes with a module labelled something unspecific like __init__.

In this case, the __init__ module in question is from Python's re module. The function is compile, which compiles a regular expression string. So we can see that 21 milliseconds, almost the entire import time of distutils, is from dependencies creating regular expression objects. (The boxes pointing to compile come from functions in setuptools and logging.)

Regular expression compilation is often expensive: at the end of the day, it is another programming language. To avoid import time cost, Django mostly uses a lazy regular expression wrapper that compiles on first use. Looking at this graph, it might make sense to use a similar pattern in logging or setuptools.

Ways to improve startup time

Slow startup time is often a "death by a thousand papercuts" situation. There's no single module import that dominates it. Instead, lots of little things add up to slowness.

Correspondingly, improving startup time normally takes a bunch of piecewise targeted changes. Here are some techniques that can help, roughly ordered by their approximate utility. Enjoy!

Upgrade Python

Upgrading Python is groundwork for any optimization effort. Python keeps getting faster, especially with the focus of the Faster CPython project from Python 3.11+.

Python 3.11 is ~25% faster than Python 3.10, and Python 3.12 promises to be notably faster again.
Before you chase the harder gains from improving individual modules, upgrade to Python 3.11+, which requires Django 4.1+. (If only there were a tool to help you upgrade Django….)

Remove unused code

Removing old features or dependencies will provide the easiest wins. When looking through your import waterfall, first look for the dead weight that you can cut, particularly unused dependencies.

Swap libraries for leaner alternatives

Sometimes your code uses a huge library where smaller alternatives exist. For example, if you're importing numpy for only one linear regression function, statistics.linear_regression() in the standard library is a faster-to-import alternative, albeit less flexible.

Copy relevant bits into your project

If you're importing a large library for a few small classes or functions, consider including custom versions of those things in your project. For example, in the real-world strtobool() case, I couldn't determine which strings needed to be supported, so I copied the function from distutils into the project.

Use fewer libraries

Sometimes it's better to avoid a dependency altogether. Imagine you need the ACME API, but only a few endpoints. Its official package covers all endpoints and edge cases, and so is slow to import. You might be better off calling their API directly, say with urllib3.

Avoid import-time database queries

It is particularly slow to run database queries at import time. They're also a potential source of bugs, since data fetched at startup can go stale.

One pattern is to preload some rarely-changing data:

```python
COUNTRIES = dict(Country.objects.values_list("id", "name"))
```

Works great until something changes…

You can avoid doing the query at import time by wrapping it in a function:

```python
def get_countries():
    return dict(Country.objects.values_list("id", "name"))
```

Code that uses this will then need to call the function, and avoid calling it multiple times per request. It might be easier to add caching to the function.
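With only the standard library, functools.lru_cache covers the simple case, though without expiry. A minimal sketch, with a stand-in for the real query:

```python
from functools import lru_cache


# Stand-in for the real ORM query, which would be:
#     dict(Country.objects.values_list("id", "name"))
def _query_countries():
    return {1: "Andorra", 2: "United Arab Emirates"}


@lru_cache(maxsize=1)
def get_countries():
    # The query runs on the first call only; later calls return the cached dict.
    return _query_countries()
```

The drawback is that this cache never expires, so stale data sticks around until the next restart.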
A useful library to do so is cachetools, which has a TTL (time-to-live) cache decorator:

```python
from cachetools.func import ttl_cache


@ttl_cache(maxsize=1, ttl=600)  # 10 minutes
def get_countries():
    return dict(Country.objects.values_list("id", "name"))
```

For more on that technique, see my previous post.

In Ticket #33143, I proposed that Django deprecate the ability to run queries until after AppConfig.ready() methods run. No action has yet been taken on that ticket—maybe you could give it a go? Until then, we rely on our wits, and profiling, to spot this source of startup slowness.

Defer imports

Doing some imports "lazily" can reduce startup time, at the cost of slowing down the first usage of the relevant module. Since it's normally better to prioritize speed for users, this technique is not always applicable. But it can be fine for dependencies used in already "slow" features.

Say you were using openpyxl to generate a Microsoft Excel spreadsheet in a data export view. You might be able to tuck its import within the view function:

```python
def export_data(request):
    import openpyxl

    ...
```

If nothing else imports openpyxl, it will only get loaded on the first call to export_data(). But the import can then also fail at runtime, so caution and testing are advisable.

For example, see Hugo van Kemenade's pypistats PR #289. By deferring some rarely used imports, plus swapping one dependency, he sped up this CLI tool nearly 10 times.

For an example in Django, see PR #13443, in which I deferred the import of jinja2 in django.forms. This saves the ~20ms from importing jinja2 in environments where it's installed but not used within the Django project (a fairly common occurrence).

(PEP 690 proposed a way to automatically make all imports in a module lazy, but it was rejected by the Python Steering Council for being too complex.
It remains to be seen if any such feature will be added to Python.)

Do other kinds of work lazily

You can defer all kinds of work away from import time, not just database queries or large module imports. If you see something slow happening in a module, try wrapping it in a function to defer it until needed.

Say you were using an ACME API client, which could take a little while to instantiate:

```python
from acme import AcmeClient

acme_client = AcmeClient()
```

You can defer this work by wrapping the instantiation in a function with the functools.cache() decorator:

```python
from functools import cache

from acme import AcmeClient


@cache
def get_client():
    return AcmeClient()
```

This way, the client will only be created on the first call to get_client().

Similarly, try to avoid pre-populating large data structures, and instead compute them lazily. For example, take this large dictionary:

```python
widget_codes = {n: get_widget_code(n) for n in range(100_000)}
```

A @cache-wrapped function is just as convenient:

```python
from functools import cache


@cache
def get_widget_code(n):
    return ...
```

Cache is king.

Move work to another process

Another option is to extract work into another process, where it can run independently of your main startup. For example, take custom system checks. These are good for improving your code quality, but since they run on nearly every management command, they need to be fast. Slow system checks are better off in other processes, such as linting rules or tests.

I've seen custom system checks that parse the source code of each Django model, in order to check their ASTs. This is an unreasonable burden to place on startup time. A better place to do this kind of check is within a flake8 plugin. Since flake8 is a linter, it already parses code and presents plugins with the AST, and it runs separately to your Django process.

Anders Hovmöller's django-fastdev project moves Django's system checks into a thread (using monkey-patching).
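The general shape of that idea, applied to a custom check of your own, might look something like this (a sketch under names of my own invention, not django-fastdev's actual code):

```python
import threading


def start_background_check(check, report):
    """Run a slow check off the main thread so startup isn't blocked.

    `check` returns a list of error strings; `report` is called with the
    list if any errors are found.
    """

    def runner():
        errors = check()
        if errors:
            report(errors)

    thread = threading.Thread(target=runner, daemon=True)
    thread.start()
    return thread
```

The daemon flag means an in-flight check won't keep a finished command alive.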
Moving the checks to a thread means that commands start faster, but can fail after they have started processing. I feel this is a bit hacky to apply to Django's system checks, but you may be able to use the technique for custom checks.

Monkey-patch a dependency

Say you find a useful dependency wastes time setting up stuff that you don't need. Rather than find an alternative or copy-paste code, you could instead use a targeted monkey-patch to avoid that work.

David Ormsbee shared on Twitter how the EdX Platform project monkey-patches their old version of kombu. When kombu loads its entrypoints, it takes a while, but EdX doesn't need that functionality. In this PR they added lines to monkey-patch entrypoints() within kombu.utils, replacing it with a no-op function:

```python
# We monkey patch Kombu's entrypoints listing because scanning through this
# accounts for the majority of LMS/Studio startup time for tests, and we don't
# use custom Kombu serializers (which is what this is for). Still, this is
# pretty evil, and should be taken out when we update Celery to the next version
# where it looks like this method of custom serialization has been removed.
#
# FWIW, this is identical behavior to what happens in Kombu if pkg_resources
# isn't available.
import kombu.utils

kombu.utils.entrypoints = lambda namespace: iter([])
```

"Pretty evil", maybe.

Monkey-patching is generally questionable, because the target module may not work properly with the patch, now or in a future version. Adding tests is advisable, as is finding or opening an upstream issue to see if the package can improve.

Beware the cost of import-time code generation

Many "syntactic sugar" tools basically generate Python code at import time, for example collections.namedtuple(), dataclasses, and attrs. This approach can make them notably slower than vanilla classes.
This isn't to say that such tools aren't worth the cost, but it's good to be aware of it.

For example, a Point dataclass takes about 103 microseconds to create:

```python
In [1]: from dataclasses import dataclass

In [2]: %%timeit
   ...: @dataclass
   ...: class Point:
   ...:     x: int
   ...:     y: int
   ...:
103 µs ± 887 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

That's not huge: you could import about 10,000 such classes per second. But compared to the competition, it could be considered slow.

A namedtuple with the same fields takes a fifth of the time, about 18 microseconds:

```python
In [6]: from collections import namedtuple

In [7]: %timeit namedtuple('Point', 'x y')
18 µs ± 96.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

namedtuples are probably worth avoiding though, since they add normally-unexpected indexing and iteration over fields.

A vanilla class is 30 times faster, at ~3 microseconds:

```python
In [13]: %%timeit
    ...: class Point:
    ...:     def __init__(self, x: int, y: int) -> None:
    ...:         self.x = x
    ...:         self.y = y
3.38 µs ± 49.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

These examples aren't exactly apples-to-apples: you'd use dataclasses or attrs for the extra features they give your classes. But if you aren't using any of those features, it could be worth using vanilla classes (or figuring out a way to pre-generate the resultant code).

Fork CPython to implement "strict modules"

Forking CPython may not be practical for you, but it was for Instagram. As detailed in this 2019 blog post, they found their Django project's startup time grew up to 60(!) seconds. Much of their startup time came from developers accidentally adding import-time work that could be deferred.

To overcome the issue, they created "strict modules", which restrict which operations can be done at import time. This lives in their CPython fork, Cinder, so it's not available to us mere mortals.
But perhaps one day we'll see a similar feature in main CPython, or maybe a linter could enforce "strictness".

Fin

May your startup time start going down,

—Adam

