Brandon Rohrer on LinkedIn: Numba rule of thumb #5: Use @njit rather than @jit. (2024)

Brandon Rohrer

Data scientist

Numba rule of thumb #5: Use @njit rather than @jit.

This tip is already outdated, showing how active Numba development is. In version 0.58 and earlier, the default behavior of the compiler was to silently fall back to object mode, which runs at roughly regular Python speed, if anything frustrated the Numba compiler. A small glitch like a data type mismatch could turn a bullet-fast Numba-jitted function into a slower-than-tar Python for loop. And the bad part is that there would be no error, no hint to the developer or user that anything was wrong, other than a mysterious performance drop.

The way to get around this was to use @jit(nopython=True) as the decorator for Numba functions. This was so commonly used that it got its own nickname, @njit. Compiling with @njit ensured that if Numba compilation failed, an error would be thrown. It embodied the software engineering best practice of having all failures be noisy.

It was so useful, in fact, that as of Numba 0.59 (released January 2024) @jit now defaults to nopython=True. Changing the default behavior of the decorator may be a breaking change for some code bases, but it comes with the benefit of better-engineered code for many others. And as a bonus, if you are using a recent version of Numba, you can stop worrying about this issue entirely and just use @jit.
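For reference, a minimal sketch of the two equivalent spellings (add_one() is just a placeholder body):

from numba import jit, njit

@jit(nopython=True)   # the pre-0.59 spelling that forces noisy failures
def add_one(x):
    return x + 1

@njit                 # the shorthand; identical behavior
def add_one_again(x):
    return x + 1

On Numba 0.59 and later, a bare @jit compiles in nopython mode by default as well.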

More Relevant Posts

Numba rule of thumb #7: Pass return variables in as input arguments.

This avoids initializing a fresh array each time, shaving off precious microseconds.

It's natural to write a function that looks like this

@njit
def add(a, b):
    c = np.zeros(a.size)
    for i in range(a.size):
        c[i] = a[i] + b[i]
    return c

where the result array, c, is created and initialized before it is populated.

Often, functions are called repeatedly with arguments of the same shape. (The fact that they are called so often is what makes them appealing targets for speeding up with Numba.) When that is the case, it's possible to use a shortcut.

@njit
def add(a, b, c):
    for i in range(a.size):
        c[i] = a[i] + b[i]

where the result array, c, is created just once, outside the function, and re-used. This way the memory space is pre-allocated and the function can get right to the business at hand.

This is such a useful trick that NumPy uses it too. Most NumPy functions have an optional `out` parameter that you can use to pass a pre-allocated results array.

The difference is typically just a small fraction of the total compute time, but it's a freebie: an optimization that comes with simpler code and logic. There's no downside! That's a rare thing. Letting it go unclaimed is like leaving the last bite of cheesecake just sitting on the table.
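As a sketch of the NumPy side of this trick, using NumPy's real `out` parameter (array sizes are arbitrary):

import numpy as np

a = np.random.sample(1_000_000)
b = np.random.sample(1_000_000)
c = np.empty_like(a)   # allocate the result array once, up front

np.add(a, b, out=c)    # writes into c instead of allocating a new array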

Numba rule of thumb #6: Call Numba-jitted functions once before kicking off the program.

This avoids awkward hiccups in execution. Numba functions are so fast because they are pre-compiled to machine code, but this compiling step takes a few moments to complete. The compiler is also "lazy": it waits until the last possible second. It is a "just in time" or JIT compiler. The upside of this is that it doesn't add any latency to program startup and avoids unnecessary compilations. The downside is that it can cause an unexpected several-second pause the first time the function is called.

That unscheduled pause can knock processes out of synchronization or make for a bumpy user experience. To take back control of when it occurs, you can make a gratuitous first call to your Numba functions during startup, when nothing important is going on yet and a user will be least annoyed by it. For example, when I'm timing a Numba-jitted function, including the first call in the timing estimate would grossly overestimate the average execution time, so I make sure to call it once outside the loop. This "warms up" the functions, so that they are already compiled by the time they are encountered in the natural flow of the program.

It's a small thing, but small things add up.
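A minimal sketch of the warm-up pattern, reusing the add() function from rule #7 (the array size and loop count are arbitrary):

import time

import numpy as np
from numba import njit

@njit
def add(a, b, c):
    for i in range(a.size):
        c[i] = a[i] + b[i]

a = np.random.sample(1_000_000)
b = np.random.sample(1_000_000)
c = np.zeros(a.size)

add(a, b, c)  # gratuitous first call: compilation happens now, not later

start = time.perf_counter()
for _ in range(100):
    add(a, b, c)  # every call here runs pre-compiled machine code
print(f"{(time.perf_counter() - start) / 100 * 1e3:.2f} ms per call")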

    Sometimes the tests test the code and sometimes the code tests the tests

Numba rule of thumb #4: Don't write your own matrix multiplication.

The widest, best-paved road in scientific computing is matrix multiplication. NumPy's matrix multiplication has been optimized for your system in ways Numba can't match. Comparing a straightforward Numba for-loop implementation to NumPy's matmul() is sobering.

@njit
def matmul_numba(a, b, c):
    n_i, n_j = a.shape
    n_j, n_k = b.shape
    for i in range(n_i):
        for j in range(n_j):
            for k in range(n_k):
                c[i, k] += a[i, j] * b[j, k]

For a pair of 2000 x 2000 matrices, my system shows that matmul_numba() takes 2800 ms, compared to numpy.matmul()'s 125 ms, a more than 20X speedup. You can't beat NumPy's matmul(). But don't let that stop you from trying! One trick you can use is

@njit(parallel=True)

and substituting Numba's prange() for range(). prange() is a special variant of range() that supports parallelization. Together these instruct Numba to parallelize the matrix operation across multiple threads, as NumPy does. For me, this reduces Numba's run time by a factor of four, to 720 ms. It's still 5X slower than NumPy, but we've closed the gap a bit.

There are two good lessons here. The first is that there are tricks to speed up Numba even more. The second is that Numba is not the right tool for every job. For large, optimized calculations there may be a better tool. numpy.matmul() is one of these.
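A sketch of the parallel variant the post describes, assuming the same calling convention as matmul_numba():

from numba import njit, prange

@njit(parallel=True)
def matmul_parallel(a, b, c):
    n_i, n_j = a.shape
    n_k = b.shape[1]
    for i in prange(n_i):      # outer loop is spread across threads
        for j in range(n_j):
            for k in range(n_k):
                c[i, k] += a[i, j] * b[j, k]

Parallelizing the outer loop keeps each row of c owned by a single thread, so the accumulation stays race-free.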

Numba rule of thumb #3: Don't create intermediate arrays.

It's a fine point, but you can shave precious time off your Numba execution by not creating extra arrays. Intermediate arrays can make code more readable, but Numba takes them literally. It takes the extra time to allocate the memory for the intermediate variables.

Here's an example from physics simulations: calculating all the pairwise distances between two groups of points. These two functions are identical, except that one makes several stops along the way to the final result. For 5000 points in each group, the distances_intermediate() function takes 600 ms on my machine, while distances_direct() takes 90 ms. This is a contrived example, but it shows how those intermediate arrays can bog you down.

@njit
def distances_intermediate(x1, y1, x2, y2, d):
    dx = np.zeros((x1.size, x2.size))
    for i in range(x1.size):
        for j in range(x2.size):
            dx[i, j] = x1[i] - x2[j]
    dy = np.zeros((y1.size, y2.size))
    for i in range(y1.size):
        for j in range(y2.size):
            dy[i, j] = y1[i] - y2[j]
    dx_squared = np.zeros((x1.size, x2.size))
    for i in range(x1.size):
        for j in range(x2.size):
            dx_squared[i, j] = dx[i, j] ** 2
    dy_squared = np.zeros((y1.size, y2.size))
    for i in range(y1.size):
        for j in range(y2.size):
            dy_squared[i, j] = dy[i, j] ** 2
    d_squared = np.zeros((x1.size, x2.size))
    for i in range(x1.size):
        for j in range(x2.size):
            d_squared[i, j] = dx_squared[i, j] + dy_squared[i, j]
    for i in range(x1.size):
        for j in range(x2.size):
            d[i, j] = d_squared[i, j] ** .5

@njit
def distances_direct(x1, y1, x2, y2, d):
    for i in range(x1.size):
        for j in range(x2.size):
            d[i, j] = ((x1[i] - x2[j]) ** 2 + (y1[i] - y2[j]) ** 2) ** .5
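A minimal driver for reproducing the comparison, assuming the two functions above and sizes matching the post:

import numpy as np

n = 5000
x1, y1 = np.random.sample(n), np.random.sample(n)
x2, y2 = np.random.sample(n), np.random.sample(n)
d = np.zeros((n, n))  # pre-allocated result array, per rule #7

distances_direct(x1, y1, x2, y2, d)        # warm-up compile, per rule #6
distances_intermediate(x1, y1, x2, y2, d)  # then time each with your tool of choice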

Numba rule of thumb #2: Avoid NumPy array operations and functions

This is a repeat of rule #1 about preferring for loops, but it is so counterintuitive that it bears repeating.

Avoid doing any NumPy operations in a Numba-jitted function. Don't create new arrays, don't broadcast existing arrays, don't reshape() or transpose() or concatenate(). (We'll talk about exceptions to this in later rules.)

NumPy is fast because it uses pre-compiled, optimized C code. Numba is fast because it compiles Python code in a highly optimized way. But Numba can't change the optimized NumPy code, so it's stuck trying to shove a square peg into a round hole, and some performance is lost.

To demonstrate, here are NumPy and Numba functions that multiply three one-dimensional arrays to get a three-dimensional array, then sum it along its second dimension.

def numpy_version(a, b, c, d):
    # broadcast to shape (a.size, b.size, c.size), then sum over axis 1
    d[:, :] = np.sum(
        a[:, np.newaxis, np.newaxis]
        * b[np.newaxis, :, np.newaxis]
        * c[np.newaxis, np.newaxis, :],
        axis=1,
    )

@njit
def numba_version(a, b, c, d):
    for i in range(a.size):
        for j in range(b.size):
            for k in range(c.size):
                d[i, k] += a[i] * b[j] * c[k]

With these input arguments

a = np.random.sample(200)
b = np.random.sample(300)
c = np.random.sample(400)
d = np.zeros((200, 400))

I get 46.0 ms for numpy_version() and 2.6 ms for numba_version(), a speedup of more than 17X. That factor only grows as a, b, and c get larger.
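A quick sanity check, sketched with the arrays above, that the two versions agree:

d_np = np.zeros((a.size, c.size))
d_nb = np.zeros((a.size, c.size))
numpy_version(a, b, c, d_np)
numba_version(a, b, c, d_nb)
assert np.allclose(d_np, d_nb)  # same result, very different run times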

Numba rule of thumb #1: Try for loops first

Young Python programmers quickly get for loops beaten out of them. Large for loops are glacially slow. Instead, we are taught vectorization: to put our numbers into arrays before working with them. This allows NumPy's under-the-hood optimizations to speed things up.

When working in Numba, it's the opposite. Within a Numba function, for loops generally perform better than array operations. For instance, check out these two functions.

@njit
def add_arrays(a, b, c):
    c[:] = a + b

@njit
def add_for_loop(a, b, c):
    for i in range(a.size):
        c[i] = a[i] + b[i]

For 10 million element arrays, the add_arrays() function runs in 35 milliseconds on my machine. The add_for_loop() function runs in 12.6 milliseconds.

Numba loves for loops. Even though it operates naturally on NumPy arrays as input arguments, I've found that it runs fastest when I avoid using any array operations in the function. For loops and base Python are your friends. I'm not sure why, but my best guess is that the optimizations NumPy has already performed conflict with Numba's compile-time optimizations. (If you know more about this, please drop your insights into the comments.)

Your first Numba function

If you're new to Numba, not to worry. It's not nearly as intimidating as it sounds. Imagine you have two arrays and you want to add them. You can of course use NumPy's array operations.

import numpy as np

n = 10_000_000
a = np.random.sample(size=n)
b = np.random.sample(size=n)
c = a + b

This typically takes 15 ms on my box. But if you need to go even faster you can use Numba. First you'll need to make sure you have it. For me this happens at the command line:

python3 -m pip install numba

Then write a function that uses Numba's just-in-time compiler.

from numba import jit

@jit
def add(a, b, c):
    for i in range(a.size):
        c[i] = a[i] + b[i]

c = np.zeros(n)
add(a, b, c)

The first time this function is called it takes a little time to compile, but after that it runs in about 12 ms for me. Faster than even NumPy! In my experience, the more complex the calculation, the greater the benefit of moving to Numba.

It's fun to compare this against base Python to see how far we've come. Without the @jit decorator, add() takes 2300 ms to run. Numba makes it almost 200 times faster.
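To see the compile-time pause for yourself, here is a small sketch that times the first call against the second, using the add() function and arrays above:

import time

t0 = time.perf_counter()
add(a, b, c)   # first call: includes compilation
t1 = time.perf_counter()
add(a, b, c)   # second call: just the machine code
t2 = time.perf_counter()
print(f"first call: {t1 - t0:.3f} s, second call: {(t2 - t1) * 1e3:.1f} ms")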
