Brandon Rohrer on LinkedIn: Numba rule of thumb #5: Use @njit rather than @jit. (2024)

Brandon Rohrer

Data scientist

Numba rule of thumb #5: Use @njit rather than @jit.

This tip is already outdated, showing how active Numba development is. In version 0.58 and earlier, the default behavior of the compiler was to silently fall back to object mode, which runs at roughly regular Python speed, if anything frustrated the Numba compiler. A small glitch like a data type mismatch could turn a bullet-fast Numba-jitted function into a slower-than-tar Python for loop. And the bad part is that there would be no error, no hint to the developer or user that anything was wrong, other than a mysterious performance drop.

The way to get around this was to use @jit(nopython=True) as the decorator for Numba functions. This was so commonly used that it got its own nickname, @njit. Compiling with @njit ensured that if Numba compilation failed, an error would be thrown. It embodied the software engineering best practice of having all failures be noisy.

It was so useful, in fact, that as of Numba 0.59 (released January 2024) @jit now defaults to nopython=True. Changing the default behavior of the decorator may be a breaking change for some code bases, but it comes with the benefit of better-engineered code for many others. And as a bonus, if you are using a recent version of Numba, you can stop worrying about this issue entirely and just use @jit.
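For reference, a minimal sketch of the two equivalent spellings (add_one() is just a placeholder body):

from numba import jit, njit

@jit(nopython=True)   # the pre-0.59 spelling that forces noisy failures
def add_one(x):
    return x + 1

@njit                 # the shorthand; identical behavior
def add_one_again(x):
    return x + 1

On Numba 0.59 and later, a bare @jit compiles in nopython mode by default as well.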

More Relevant Posts

Numba rule of thumb #7: Pass return variables in as input arguments.

This avoids initializing a fresh array each time, shaving off precious microseconds.

It's natural to write a function that looks like this

@njit
def add(a, b):
    c = np.zeros(a.size)
    for i in range(a.size):
        c[i] = a[i] + b[i]
    return c

where the result array, c, is created and initialized before it is populated.

Often, functions are called repeatedly with arguments of the same shape. (The fact that they are called so often is what makes them appealing targets for speeding up with Numba.) When that is the case, it's possible to use a shortcut.

@njit
def add(a, b, c):
    for i in range(a.size):
        c[i] = a[i] + b[i]

where the result array, c, is created just once, outside the function, and re-used. This way the memory space is pre-allocated and the function can get right to the business at hand.

This is such a useful trick that NumPy uses it too. Most NumPy functions have an optional `out` parameter that you can use to pass a pre-allocated results array.

The difference is typically just a small fraction of the total compute time, but it's a freebie: an optimization that comes with simpler code and logic. There's no downside! That's a rare thing. Letting it go unclaimed is like leaving the last bite of cheesecake just sitting on the table.
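As a sketch of the NumPy side of this trick, using NumPy's real `out` parameter (array sizes are arbitrary):

import numpy as np

a = np.random.sample(1_000_000)
b = np.random.sample(1_000_000)
c = np.empty_like(a)   # allocate the result array once, up front

np.add(a, b, out=c)    # writes into c instead of allocating a new array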

Numba rule of thumb #6: Call Numba-jitted functions once before kicking off the program.

This avoids awkward hiccups in execution. Numba functions are so fast because they are pre-compiled to machine code, but this compiling step takes a few moments to complete. The compiler is also "lazy": it waits until the last possible second. It is a "just in time" or JIT compiler. The upside of this is that it doesn't add any latency to program startup and avoids unnecessary compilations. The downside is that it can cause an unexpected several-second pause the first time the function is called.

That unscheduled pause can knock processes out of synchronization or make for a bumpy user experience. To take back control of when it occurs, you can make a gratuitous first call to your Numba functions during startup, when nothing important is going on yet and a user will be least annoyed by it. For example, when I'm timing a Numba-jitted function, including the first call in the timing estimate would grossly overestimate the average execution time, so I make sure to call it once outside the loop. This "warms up" the functions, so that they are already compiled by the time they are encountered in the natural flow of the program.

It's a small thing, but small things add up.
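A minimal sketch of the warm-up pattern, reusing the add() function from rule #7 (the array size and loop count are arbitrary):

import time

import numpy as np
from numba import njit

@njit
def add(a, b, c):
    for i in range(a.size):
        c[i] = a[i] + b[i]

a = np.random.sample(1_000_000)
b = np.random.sample(1_000_000)
c = np.zeros(a.size)

add(a, b, c)  # gratuitous first call: compilation happens now, not later

start = time.perf_counter()
for _ in range(100):
    add(a, b, c)  # every call here runs pre-compiled machine code
print(f"{(time.perf_counter() - start) / 100 * 1e3:.2f} ms per call")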

    Sometimes the tests test the code and sometimes the code tests the tests

Numba rule of thumb #4: Don't write your own matrix multiplication.

The widest, best-paved road in scientific computing is matrix multiplication. NumPy's matrix multiplication has been optimized for your system in ways Numba can't match. Comparing a straightforward Numba for-loop implementation to NumPy's matmul() is sobering.

@njit
def matmul_numba(a, b, c):
    n_i, n_j = a.shape
    n_j, n_k = b.shape
    for i in range(n_i):
        for j in range(n_j):
            for k in range(n_k):
                c[i, k] += a[i, j] * b[j, k]

For a pair of 2000 x 2000 matrices, my system shows that matmul_numba() takes 2800 ms, compared to numpy.matmul()'s 125 ms, a more than 20X speedup. You can't beat NumPy's matmul(). But don't let that stop you from trying! One trick you can use is

@njit(parallel=True)

and substituting Numba's prange() for range(). prange() is a special variant of range() that supports parallelization. Together these instruct Numba to parallelize the matrix operation across multiple threads, as NumPy does. For me, this reduces Numba's run time by a factor of four, to 720 ms. It's still 5X slower than NumPy, but we've closed the gap a bit.

There are two good lessons here. The first is that there are tricks to speed up Numba even more. The second is that Numba is not the right tool for every job. For large, optimized calculations there may be a better tool. numpy.matmul() is one of these.
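A sketch of the parallel variant the post describes, assuming the same calling convention as matmul_numba():

from numba import njit, prange

@njit(parallel=True)
def matmul_parallel(a, b, c):
    n_i, n_j = a.shape
    n_k = b.shape[1]
    for i in prange(n_i):      # outer loop is spread across threads
        for j in range(n_j):
            for k in range(n_k):
                c[i, k] += a[i, j] * b[j, k]

Parallelizing the outer loop keeps each row of c owned by a single thread, so the accumulation stays race-free.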

Numba rule of thumb #3: Don't create intermediate arrays.

It's a fine point, but you can shave precious time off your Numba execution by not creating extra arrays. Intermediate arrays can make code more readable, but Numba takes them literally. It takes the extra time to allocate the memory for the intermediate variables.

Here's an example from physics simulations: calculating all the pairwise distances between two groups of points. These two functions are identical, except that one makes several stops along the way to the final result. For 5000 points in each group, the distances_intermediate() function takes 600 ms on my machine, while distances_direct() takes 90 ms. This is a contrived example, but it shows how those intermediate arrays can bog you down.

@njit
def distances_intermediate(x1, y1, x2, y2, d):
    dx = np.zeros((x1.size, x2.size))
    for i in range(x1.size):
        for j in range(x2.size):
            dx[i, j] = x1[i] - x2[j]
    dy = np.zeros((y1.size, y2.size))
    for i in range(y1.size):
        for j in range(y2.size):
            dy[i, j] = y1[i] - y2[j]
    dx_squared = np.zeros((x1.size, x2.size))
    for i in range(x1.size):
        for j in range(x2.size):
            dx_squared[i, j] = dx[i, j] ** 2
    dy_squared = np.zeros((y1.size, y2.size))
    for i in range(y1.size):
        for j in range(y2.size):
            dy_squared[i, j] = dy[i, j] ** 2
    d_squared = np.zeros((x1.size, x2.size))
    for i in range(x1.size):
        for j in range(x2.size):
            d_squared[i, j] = dx_squared[i, j] + dy_squared[i, j]
    for i in range(x1.size):
        for j in range(x2.size):
            d[i, j] = d_squared[i, j] ** .5

@njit
def distances_direct(x1, y1, x2, y2, d):
    for i in range(x1.size):
        for j in range(x2.size):
            d[i, j] = ((x1[i] - x2[j]) ** 2 + (y1[i] - y2[j]) ** 2) ** .5
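A minimal driver for reproducing the comparison, assuming the two functions above and sizes matching the post:

import numpy as np

n = 5000
x1, y1 = np.random.sample(n), np.random.sample(n)
x2, y2 = np.random.sample(n), np.random.sample(n)
d = np.zeros((n, n))  # pre-allocated result array, per rule #7

distances_direct(x1, y1, x2, y2, d)        # warm-up compile, per rule #6
distances_intermediate(x1, y1, x2, y2, d)  # then time each with your tool of choice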

Numba rule of thumb #2: Avoid NumPy array operations and functions

This is a repeat of rule #1 about preferring for loops, but it is so counterintuitive that it bears repeating.

Avoid doing any NumPy operations in a Numba-jitted function. Don't create new arrays, don't broadcast existing arrays, don't reshape() or transpose() or concatenate(). (We'll talk about exceptions to this in later rules.)

NumPy is fast because it uses pre-compiled, optimized C code. Numba is fast because it compiles Python code in a highly optimized way. But Numba can't change the optimized NumPy code, so it's stuck trying to shove a square peg into a round hole, and some performance is lost.

To demonstrate, here are NumPy and Numba functions that multiply three one-dimensional arrays to get a three-dimensional array, then sum it along its second dimension.

def numpy_version(a, b, c, d):
    # broadcast to shape (a.size, b.size, c.size), then sum over axis 1
    d[:, :] = np.sum(
        a[:, np.newaxis, np.newaxis]
        * b[np.newaxis, :, np.newaxis]
        * c[np.newaxis, np.newaxis, :],
        axis=1,
    )

@njit
def numba_version(a, b, c, d):
    for i in range(a.size):
        for j in range(b.size):
            for k in range(c.size):
                d[i, k] += a[i] * b[j] * c[k]

With these input arguments

a = np.random.sample(200)
b = np.random.sample(300)
c = np.random.sample(400)
d = np.zeros((200, 400))

I get 46.0 ms for numpy_version() and 2.6 ms for numba_version(), a speedup of more than 17X. That factor only grows as a, b, and c get larger.
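A quick sanity check, sketched with the arrays above, that the two versions agree:

d_np = np.zeros((a.size, c.size))
d_nb = np.zeros((a.size, c.size))
numpy_version(a, b, c, d_np)
numba_version(a, b, c, d_nb)
assert np.allclose(d_np, d_nb)  # same result, very different run times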

Numba rule of thumb #1: Try for loops first

Young Python programmers quickly get for loops beaten out of them. Large for loops are glacially slow. Instead, we are taught vectorization: to put our numbers into arrays before working with them. This allows NumPy's under-the-hood optimizations to speed things up.

When working in Numba, it's the opposite. Within a Numba function, for loops generally perform better than array operations. For instance, check out these two functions.

@njit
def add_arrays(a, b, c):
    c[:] = a + b

@njit
def add_for_loop(a, b, c):
    for i in range(a.size):
        c[i] = a[i] + b[i]

For 10 million element arrays, the add_arrays() function runs in 35 milliseconds on my machine. The add_for_loop() function runs in 12.6 milliseconds.

Numba loves for loops. Even though it operates naturally on NumPy arrays as input arguments, I've found that it runs fastest when I avoid using any array operations in the function. For loops and base Python are your friends. I'm not sure why, but my best guess is that the optimizations NumPy has already performed conflict with Numba's compile-time optimizations. (If you know more about this, please drop your insights into the comments.)

Your first Numba function

If you're new to Numba, not to worry. It's not nearly as intimidating as it sounds. Imagine you have two arrays and you want to add them. You can of course use NumPy's array operations.

import numpy as np

n = 10_000_000
a = np.random.sample(size=n)
b = np.random.sample(size=n)
c = a + b

This typically takes 15 ms on my box. But if you need to go even faster you can use Numba. First you'll need to make sure you have it. For me this happens at the command line:

python3 -m pip install numba

Then write a function that uses Numba's just-in-time compiler.

from numba import jit

@jit
def add(a, b, c):
    for i in range(a.size):
        c[i] = a[i] + b[i]

c = np.zeros(n)
add(a, b, c)

The first time this function is called it takes a little time to compile, but after that it runs in about 12 ms for me. Faster than even NumPy! In my experience, the more complex the calculation, the greater the benefit of moving to Numba.

It's fun to compare this against base Python to see how far we've come. Without the @jit decorator, add() takes 2300 ms to run. Numba makes it almost 200 times faster.
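To see the compile-time pause for yourself, here is a small sketch that times the first call against the second, using the add() function and arrays above:

import time

t0 = time.perf_counter()
add(a, b, c)   # first call: includes compilation
t1 = time.perf_counter()
add(a, b, c)   # second call: just the machine code
t2 = time.perf_counter()
print(f"first call: {t1 - t0:.3f} s, second call: {(t2 - t1) * 1e3:.1f} ms")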
