Let’s discuss the following Python code. It is a simple piece of code one could come up with to solve a minor daily task: the script conducts a simple grid search over a 4D parameter space, computes values, and stores parameter combinations of interest. It is not wrong to write it that way, and it works; classic throw-away code. The story changes when you want to scan big parameter spaces: you will realize that your program needs an increasing amount of time to run. What to do now? A simple solution would be to parallelize it. But! The first step before parallelization should always be an optimization of your program.
Lines 20-26 define the ranges and some arbitrary parameters for our computation, as well as an empty data structure to store some of the computed values. The following quadruple loop (starting at line 28) scans the parameter space: the integers from b = -200 to t = 200, resulting in (t - b)^4 = 400^4 = 2.56 · 10^10 scanned values. Embedded in it are the computation function and a subsequent test that conditionally saves the results of interest into the aforementioned data structure (lines 33-37). The computation routine in lines 4 to 18 does some computations to bring the grid parameters into relation to each other, then checks the computed values and either returns early or does further computation and returns those results.
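The original listing is not reproduced here. As a rough sketch of the structure described above, such a script might look like the following; all names, the trigonometric constants, and the interest criterion are illustrative assumptions, not the author's exact code:

```python
import math

import numpy as np


def missmatch(w, x, y, z):
    # bring the grid parameters into relation to each other;
    # the static trigonometric arguments (pi/4) are assumptions
    a = math.cos(0.25 * math.pi) * w - math.sin(0.25 * math.pi) * x
    b = math.sin(0.25 * math.pi) * y + math.cos(0.25 * math.pi) * z
    la = a**2 + x**2
    lb = b**2 + z**2
    A = np.array([[a, x], [b, z]])
    d = np.linalg.det(A)
    if d <= 0:
        return None  # combination of no interest
    # further computation, normalised by the determinant
    return la / math.sqrt(d), lb / math.sqrt(d), la / lb


def scan(b=-200, t=200):
    results = []  # parameter combinations of interest
    for w in range(b, t):
        for x in range(b, t):
            for y in range(b, t):
                for z in range(b, t):
                    r = missmatch(w, x, y, z)
                    if r is not None:
                        results.append((w, x, y, z) + r)
    return results


# a tiny range keeps this demo fast; the article scans b = -200, t = 200
points = scan(-3, 3)
```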
We will adhere to a couple of simple rules to optimize the shown script.
Avoid re-computation of values if their computation is expensive. Typical expensive operations are the trigonometric functions (e.g. sin(x), cos(x), acos(x), etc.), the exponential exp(x), the square root sqrt(x), powers x**a, divisions x/y, and more.
Avoid expensive operations if you can use simpler counterparts. An example here is the power operation x**2, which sometimes results in the use of an expensive algorithm instead of just x*x.
Avoid calling functions or libraries if the operation itself is very basic. Function calls in scripting languages have a considerable cost, which can easily outweigh the simple operation performed inside (prominent examples of such trivial functions are npm's isEven and isOdd).
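The call-overhead effect is easy to observe in Python itself. A small sketch with the standard-library timeit module, comparing a trivial helper function against the inlined expression (the helper is hypothetical, in the spirit of npm's isEven):

```python
import timeit

def is_even(n):
    # a trivial helper in the spirit of npm's isEven
    return n % 2 == 0

n = 12345

# calling the helper pays Python's function-call overhead on every use ...
t_call = timeit.timeit("is_even(n)", globals=globals(), number=1_000_000)

# ... while the inlined modulo expression does not
t_inline = timeit.timeit("n % 2 == 0", globals=globals(), number=1_000_000)

print(f"function call: {t_call:.3f} s, inlined: {t_inline:.3f} s")
```

On a typical CPython build the inlined expression is noticeably faster, even though both compute the same thing.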
Those rules hold for this example and for Python and other scripting languages; there are exceptions, and other languages behave differently.
So let's have a look at the example. Lines 5 and 6 contain trigonometric computations with static values.
We compute them by hand, or pre-compute them if they have an unhandy representation.
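As a sketch of this change (the angle pi/4 is an illustrative assumption, not necessarily the constant in the original script):

```python
import math

# before: the static trigonometric terms are re-evaluated on every call
def rotate_before(w, x):
    return math.cos(0.25 * math.pi) * w - math.sin(0.25 * math.pi) * x

# after: computed once up front; cos(pi/4) = sin(pi/4) = sqrt(2)/2,
# so these could even be written down by hand
COS_PI4 = math.cos(0.25 * math.pi)
SIN_PI4 = math.sin(0.25 * math.pi)

def rotate_after(w, x):
    return COS_PI4 * w - SIN_PI4 * x
```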
The two subsequent lines contain power-of-two operations, which are easily replaceable with their cheaper multiplication counterparts.
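A minimal sketch of the replacement, with illustrative values:

```python
a, x = 3.5, -2.0  # illustrative values

# before: the generic power operator may dispatch to an expensive algorithm
la_pow = a**2 + x**2

# after: the cheaper multiplication counterpart
la_mul = a * a + x * x

print(la_pow, la_mul)
```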
At lines 14 and 15 we divide by the square root of our determinant, which costs two avoidable operations (/ and sqrt()) on every use. Both are replaceable by 'remembering' that value in a local variable (the same can be done with la/lb further down).
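A sketch of the 'remembering' step, with illustrative values for the determinant and the two lengths:

```python
import math

d = 7.3           # the (positive) determinant, illustrative value
la, lb = 2.0, 5.0

# before: two divisions and two square-root evaluations
r1 = la / math.sqrt(d)
r2 = lb / math.sqrt(d)

# after: 'remember' the expensive sub-expression once and reuse it;
# this trades two sqrt() calls and two divisions for one of each
# plus two cheap multiplications
inv_sqrt_d = 1.0 / math.sqrt(d)
s1 = la * inv_sqrt_d
s2 = lb * inv_sqrt_d
```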
The last change will be the biggest one. Our function missmatch computes the determinant of a 2×2 matrix and calls a library function for it, which is unnecessary given the simple nature of that computation. We replace the call to the det() function, together with the helper data structure created for it, with the written-out definition. This also enables us to reuse a variable (though reusing variables brings dangers with it and should generally be avoided for the sake of code clarity).
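A sketch of the inlined determinant, with illustrative matrix entries (the entry names are assumptions):

```python
import numpy as np

a, x, b, z = 1.0, 2.0, 3.0, 4.0  # illustrative matrix entries

# before: build a helper array only to hand it to the library routine
A = np.array([[a, x], [b, z]])
d_lib = float(np.linalg.det(A))

# after: the determinant of a 2x2 matrix written out by hand;
# no array construction, no function call
d_inline = a * z - x * b
```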
Every optimization should be accompanied by time measurements to verify improvements or to undo bad changes. The measurements for the proposed changes are displayed in the following diagram; the bars represent the average time needed for one data point in our 4D grid.
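As a sketch, such per-version measurements can be collected with Python's standard-library timeit module; the two versions below are stand-ins, not the article's actual code:

```python
import timeit

# stand-ins for two versions of the routine under test
def version_original():
    return sum(i**2 for i in range(1000))

def version_optimized():
    return sum(i * i for i in range(1000))

for name, fn in [("original", version_original),
                 ("optimized", version_optimized)]:
    # repeat() runs the measurement several times; the minimum is the
    # least noise-contaminated estimate
    best = min(timeit.repeat(fn, number=200, repeat=3))
    print(f"{name}: {best / 200 * 1e6:.1f} us per call")
```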
All of our changes had a positive effect on the runtime. The most significant jump was accomplished by circumventing the matrix construction and the calculation of its determinant. Check the estimated run times to compute all 400^4 data points (hover over the respective bars); you will see that those small changes can add up to hours! The main conclusions you should take with you are:
It is fine to just code something up.
But you should clean it up as soon as you want to scale it (or there is the possibility that someone else will want to), as those small numbers can hurt you, and the planet, in the long run.
And lastly, avoid library functions in scripting languages for very simple calculations. The overhead could be much costlier than the computation they do.
I included a Julia solution to show what is possible with other languages (and because of my poor Python skills); not to compare unfairly, but to demonstrate that there is much room for improvement. Two possible changes would be to avoid the pandas data structure, or to use a different intermediate data structure such as a linked list. Another would be to compile the essential pieces to avoid the overhead of arbitrary variable types and of the scripting language itself.
Here are the six respective sources for the measured versions. The Julia version uses a linked list for the data accumulation and some helper functions (a possible call to run it: julia --threads 1 --eval 'include("tests/parameter_search.jl"); test_csv(100);').
(The six source listings appeared here: the original script, the version with pre-computed trigonometric values, the **2 version, the / and sqrt() version, the A and det() version, and the Julia version.)