Point in polygon calculation using vector geometric methods with application to geospatial data. (2024)

Eyram SchwingerDepartment of Mathematics, University of GhanaRalph TwumDepartment of Mathematics, University of GhanaThomas KatsekporDepartment of Mathematics, University of GhanaGladys SchwingerInstitute of Environment and Sanitation Studies, University of Ghana

Abstract

In this work, we designed algorithms for the point in polygon problem based on the ray casting algorithm using equations from vector geometry. The algorithms were implemented using the python programming language. We tested the algorithm against the point in polygon algorithms used by the shapely (and by extension geopandas) library and the OpenCV library using points from the google Open Buildings project. Our algorithm in pure python performed much better than the shapely implementation. It also performed better than the OpenCV implementation when combined with the Numba optimization library. We also performed simulations to verify that our algorithm performance was of the order $\mathcal{O}(n).$

Keywords— point in polygon, ray casting, vector geometry, shapely, Numba

Introduction

The point in polygon problem is a fundamental geometric problem with a wide range of applications in computer graphics, painting simulation, robotics, geosciences, remote sensing, and geographic information systems. Therefore, efficient numerical algorithms for solving point in polygon computations are essential.

Many algorithms have been proposed (for instance in [7, 5, 10, 2, 8, 6]).Some of the methods include the winding number method [7, 10, 2, 6], which measures the number of times a polygon encloses a point, thesum of angles method [8, 6], where the sum of the interior angles of the polygon is used, and the ray castingmethod [8, 5]. The ray casting method involvesshooting a ray from the point in question in any direction and countinghow many times the ray intersects the boundaries of the polygon.Most of these methods assume the polygon is convex.

Several of the algorithms are computationally expensive and are sensitive to rounding and truncation errors.[6]states that the time complexities of the preprocessing step rangesfrom $\mathcal{O}(n)$ to $\mathcal{O}(n^{2}).$ In this paper, we design an algorithm for point in polygon computationusing vector geometric methods, with the advantage that the algorithmcan take advantage of processor parallelization.

The work is broken down in the following sections: The method section where we develop our algorithm from the vector geometric definition of the line. In the implementation section, we write the python code for the algorithm developed in methods then in the simulation section we test the algorithm on real world data and against other algorithms.

Method

Our proposed algorithm is based on the ray casting method. Suppose we wantto determine whether a point $P(x,y)$ lies within a closed polygon $R$ with vertices $R_{0},R_{1},\ldots,R_{n}$ , where $R_{0}=R_{n}.$ In the ray casting method, a ray $l$ is drawn moving out from thepoint $P$ in a specific direction. The point $P$ lies inside the polygon $R$ if $l$ intersects the edges of $R$ an odd number of times.In vector geometry, one way of writing out the equation of a lineis the normal form. Let $\boldsymbol{n}$ be a normal vector to theline. Note that a line in the 2D plane has two normals going in oppositedirections that can be denoted $\boldsymbol{n}_{1}$ and $\boldsymbol{n}_{2}$ with $\boldsymbol{n}_{1}=-\boldsymbol{n}_{2},$ so $\boldsymbol{n}$ can be either $\boldsymbol{n}_{1}$ or $\boldsymbol{n}_{2}.$ Assuming $\boldsymbol{a}$ is a known point on the line and $\boldsymbol{r}$ is any point on the line, the equation of the line can be writtenas

\left(\boldsymbol{r}-\boldsymbol{a}\right)\cdot\boldsymbol{n}=0.

(1)

Point in polygon calculation using vector geometric methods with application to geospatial data. (1)

Every point on the line satisfies equation 1.Points that do not lie on the line do not satisfy the equation, hence $\left(\boldsymbol{r}-\boldsymbol{a}\right)\cdot\boldsymbol{n}\neq 0$ with points on opposite sides of the line having different signs.This is illustrated in Figure 1. The point $C$ is onone side of the line while the point $D$ is on the opposite side.Using the concept of perpendicular distance, the position vectorsof the points $C$ and $D$ can be written as $\boldsymbol{c}=\boldsymbol{e}+\lambda\boldsymbol{\hat{n}}_{1}$ and $\boldsymbol{d}=\boldsymbol{f}+\mu\boldsymbol{\hat{n}}_{2},$ where $\hat{\boldsymbol{n}}$ represents the unit vector in the directionof $\boldsymbol{n},$ and $\lambda$ and $\mu$ are positive constants.Let $\boldsymbol{n}=\boldsymbol{n}_{1}.$ Then $\boldsymbol{c}=\boldsymbol{e}+\lambda\boldsymbol{\hat{n}}$ and $\boldsymbol{d}=\boldsymbol{f}-\mu\boldsymbol{\hat{n}}.$ Substituting $\boldsymbol{c}$ and $\boldsymbol{d}$ into the equation of the linegives

	$\displaystyle\left(\boldsymbol{c}-\boldsymbol{a}\right)\cdot\boldsymbol{n}$	$\displaystyle=\left(\boldsymbol{e}+\lambda\boldsymbol{\hat{n}}-\boldsymbol{a}\right)\cdot\boldsymbol{n}$
		$\displaystyle=\left(\boldsymbol{e}-\boldsymbol{a}\right)\cdot\boldsymbol{n}+\lambda\boldsymbol{\hat{n}}\cdot\boldsymbol{n}$
		$\displaystyle=\lambda\boldsymbol{\hat{n}}\cdot\boldsymbol{n}.$

Now $\left(\boldsymbol{e}-\boldsymbol{a}\right)\cdot\boldsymbol{n}=0$ ,since the points $A$ and $E$ lie on the line. Similarly

	$\displaystyle\left(d-\boldsymbol{a}\right)\cdot\boldsymbol{n}$	$\displaystyle=\left(\boldsymbol{f}-\mu\boldsymbol{\hat{n}}-\boldsymbol{a}\right)\cdot\boldsymbol{n}$
		$\displaystyle=\left(\boldsymbol{f}-\boldsymbol{a}\right)\cdot\boldsymbol{n}-\mu\boldsymbol{\hat{n}}\cdot\boldsymbol{n}$
		$\displaystyle=-\mu\boldsymbol{\hat{n}}\cdot\boldsymbol{n}.$

In our work, we choose to shoot a ray from the point $P$ with directionvector $\boldsymbol{d}=(1,0).$ This is a ray shot horizontally andto the right of the point $P.$ Then the normal vectors to the lineare $\boldsymbol{n}=(0,\pm 1).$ We choose the vector $\boldsymbol{n}=(0,1).$ Since the source of the ray is $P,$ the vector $\boldsymbol{p}$ is a point on the line. This gives the equation of the line $l:\left(\boldsymbol{r}-\boldsymbol{p}\right)\cdot\boldsymbol{n}=0.$ Thus the line is the locus of points $\boldsymbol{r}\left(x,y\right)$ such that $\left(\boldsymbol{r}-\boldsymbol{p}\right)\cdot\boldsymbol{n}=0.$ This equation reduces to

	$\displaystyle\left(\boldsymbol{r}-\boldsymbol{p}\right)\cdot\boldsymbol{n}$	$\displaystyle=0$
	$\displaystyle(x-p_{x},y-p_{y})\cdot\left(n_{x},n_{y}\right)$	$\displaystyle=0$
	$\displaystyle\left(x-p_{x}\right)n_{x}+\left(y-p_{y}\right)n_{y}$	$\displaystyle=0$
	$\displaystyle\left(x-p_{x}\right)\left(0\right)+\left(y-p_{y}\right)\left(1\right)$	$\displaystyle=0$
	$\displaystyle y-p_{y}$	$\displaystyle=0.$

For the first constraint which is looking for ray crossings, if theray $l$ crosses the polygon $R$ at the edge connecting the vertices $R_{i}$ and $R_{i+1},$ then the points $R_{i}$ and $R_{i+1}$ willbe on opposite sides of the ray $l,$ hence will have opposite signs.Let $f\left(\boldsymbol{r}\right)=\left(\boldsymbol{r}-\boldsymbol{p}\right)\cdot\boldsymbol{n}=r_{y}-p_{y}.$ Hence $f\left(\boldsymbol{r}_{i}\right)\cdot f\left(\boldsymbol{r}_{i+1}\right)\leq 0$ if the ray crosses the edge $R_{i},R_{i+1}$ , otherwise $f\left(\boldsymbol{r}_{i}\right)\cdot f\left(\boldsymbol{r}_{i+1}\right)>0.$ This test will however test for all edge crossings of the ray to theleft and right of $P$ .

The second constraint of the ray being cast towards the right fromthe point $P$ will be enforced by testing for the vertices with $x-$ componentsgreater or equal to the $x-$ component of the point $P;$ $r_{x_{i}}\geq p_{x}.$ We tested this contraint in two ways:

1.
Assume the points $R_{i}$ and $R_{i+1}$ are the endpoints of theedge where the ray cast from $P$ intersects the polygon $R.$ Theconstraint is satisfied if $P$ is on the left of the line $R_{i}R_{i+1}.$ To test which side of the line the point $P$ lies, define the directionvector of the line $\boldsymbol{d}=\boldsymbol{r}_{i+1}-\boldsymbol{r}_{i}=(d_{x},d_{y}),$ then $\boldsymbol{n}=\left(-d_{y},d_{x}\right)$ or $\boldsymbol{n}=\left(d_{y},-d_{x}\right).$ Since we want to test which side of the line $P$ lies on, we choose $\boldsymbol{n}$ to be the vector that forms an acute angle withthe vector $\left(1,0\right).$ We obtain the equation of the lineto be $l_{2}:\left(\boldsymbol{r}-\boldsymbol{r}_{i}\right)\cdot\boldsymbol{n}=0.$ Note that the vector $\boldsymbol{r}_{i}$ can be replaced by thevector $\boldsymbol{r}_{i+1}$ in the equation of the line $l_{2}.$ Then the point lies to the left of the edge if $l_{2}:\left(\boldsymbol{p}-\boldsymbol{r}_{i}\right)\cdot\boldsymbol{n}\leq 0.$

The vector form of the equation of the line is given as $\boldsymbol{r}=\boldsymbol{r}_{i}+\lambda\boldsymbol{d}.$ This means that

$\displaystyle(x,y)$	$\displaystyle=(x_{i},y_{i})+\lambda(d_{x},d_{y})$	(2)
$\displaystyle x$	$\displaystyle=x_{i}+\lambda d_{x}$	(3)
$\displaystyle y$	$\displaystyle=y_{i}+\lambda d_{y}.$	(4)

Since the ray is a horizontal line passing through the point $P$ we know that the $y-$ component of the point of intersection betweenthe edge and the ray is $p_{y}.$ This gives

\lambda=\dfrac{p_{y}-y_{i}}{d_{y}}.

(5)

We then substitute $\lambda$ to get $x=x_{i}+\lambda d_{x}.$ Theray is then understood to cross the edge if $x\geq p_{x}.$

This helps us to develop the following algorithm:

1.
Input: $p_{x},$ $p_{y},$ vertices of the polygon $R$ as a 2-dimensionalarray.
2.
Find the values $f_{i}=r_{y}^{i}-p_{y}$ where $r_{y}^{i}$ is the $y-$ component of $R_{i}.$
3.
Calculate sgn_chg ${}_{i}=f_{i}\cdot f_{i+1}.$
4.
Find $i$ where sgn_chg ${}_{i}\leq 0.$ This means there is a ray crossingbetween the vertices $R_{i}$ and $R_{i+1}.$
See Also
Optimizing Performance in Numba: Advanced Techniques for Parallelization - GeeksforGeeks Brandon Rohrer on LinkedIn: Numba rule of thumb #5: Use @njit rather than @jit. This tip is already…Brandon Rohrer on LinkedIn: Numba rule of thumb #6: Call Numba-jitted functions once before kicking…Python Tokenizer.sequences_to_matrix Examples, keras.preprocessing.text.Tokenizer.sequences_to_matrix Python Examples

For each sgn_chg_i found in 4, calculate

	$\displaystyle\lambda$	$\displaystyle=\dfrac{p_{y}-r_{y}^{i}}{d_{y}}$		(6)
	$\displaystyle x$	$\displaystyle=r_{x}^{i}+\lambda d_{x}.$		(7)

6.
Find $c=x\geq p_{x}.$
7.
If $c$ is odd, then $R$ contains $P.$

Implementation

The defined algorithm was implemented in python. The basic implementationis shown in Algorithms 1 and 2. In bothimplementations, lines 2-4 perform the border crossing test usingthe sign change algorithm. Once that is done, lines 6-7 in algorithm1 calculate the normal vector $\boldsymbol{n}$ to thepoint pair where the sign change occured, then lines 9-12 determineif the calculated normal is in the same direction as the vector $(1,0).$ If it is not, the vector $-\boldsymbol{n}$ replaces $\boldsymbol{n}.$ Line 14 then calculates the sign of the point $(x,y)$ in relationto each of the lines. Since we are only shooting the rays in one direction,line 16 counts the number of polygon boundaries that are to the rightof the point $(x,y)$ or contain $(x,y).$ In algorithm 2,line 6 calculates the direction vectors of the point pair $(x_{i},y_{i}),(x_{i+1},y_{i+1})$ where the sign change occured, then line 7 calculates the parameter $\lambda$ in equation 2 using equation 6.Line 8 then calculates the corresponding $x$ using equation 7.Line 9 then counts how many of the calculated $x$ are the same orto the right of the input $x.$

⬇

def pip_test_2(x,y,poly):

rec = (poly[:,1]) - y

rec = where((rec[1:]*rec[:-1]<=0))[0]

res = poly[rec]

res2 = poly[rec+1]

n_x = res2[:,1] - res[:,1]

n_y = res[:,0] - res2[:,0]

\parfor i in range(len(n_x)):

if n_x[i] < 0:

n_x[i] = -n_x[i]

n_y[i] = -n_y[i]

\parline_pos = (x-res[:,0])*n_x+(y-res[:,1])*n_y

\parres = count_nonzero(line_pos <= 0)

return res

⬇

def pip_test_1(x,y,poly):

rec = (poly[:,1]) - y

rec = where((rec[1:]*rec[:-1]<=0))[0]

res = poly[rec]

\pard = poly[rec+1] - res

l = (y - res[:,1])/d[:,1]

x_new = res[:,0] + l*d[:,0]

res = count_nonzero(x_new > x)

return res

Simulation

To test the algorithms, we downloaded data from the Google Open Buildingsproject [12] and shapefile for the map of the Republic of Ghana from the Database of Global Administrative Areas [1]. From [12], we wanted data only for Ghana, so we downloaded the following cells: 0fd_buildings,0e3_buildings, 11d_buildings, 103_buildings, and 0ff_buildings as can be seen in Figure 2. The initial data containedapproximately 54 million points. Initial preprocessing to reduce thenumber of points was done by restricting to only points within thebounding box of Ghana downloaded from [1].

Point in polygon calculation using vector geometric methods with application to geospatial data. (2)

Point in polygon calculation using vector geometric methods with application to geospatial data. (3)

This reduced the number of points to 17,097,823. We run our algorithm with these points as the $x,y$ points and the map of Ghana as the polygon. Figure 3shows the polygon with the map of Ghana and the points to be tested.Figures 4 and 5 show theresults of all points for which our algorithm returned True and Falserespectively. It can be seen that the algorithm works very well.

Point in polygon calculation using vector geometric methods with application to geospatial data. (4)

Point in polygon calculation using vector geometric methods with application to geospatial data. (5)

Point in polygon calculation using vector geometric methods with application to geospatial data. (6)

Point in polygon calculation using vector geometric methods with application to geospatial data. (7)

The algorithm was also tested against other algorithms. Our initialsimulation was done on a Dell XPS 15 9570 with an Intel i7-8750H 6core/12 thread processor and 32GB of RAM running a Jupyter notebook withpython 3.10.8. On this setup, we tested our algorithm against thepoint in polygon algorithms used by shapely [4], geopandas[9] and OpenCV [3].Table 1 shows the execution time of our algorithmcompared to the three other algorithms. It can be seen that shapelyand geopandas took in excess of 16 hours to test their point in polygonalgorithms on the 17,097,823 points. In both shapely and geopandasthe contains method of the polygon and the withinmethod of the point were all tested with consistent results. Notethat geopandas uses shapely’s point in polygon algorithm, hence thesimilarity in execution times. It is only the OpenCV point in polygonalgorithm that was faster than our algorithm. The OpenCV library iswritten in C++ and takes advantage of code optimization features.

Point in polygon algorithm	Execution time
Algorithm 1	18min 34s
Algorithm 2	18min 30s
shapely	16h 2min 36s
geopandas	16h 38min 23s
OpenCV	13min 20s

We therefore combined our algorithm with numba, a high performancepython compiler [11] with both parallel and non-parallelprocessing. These can be seen in Algorithms 3 for numbawithout parallelization and 4 for numba with parallelization.Algorithm 2 was also tested on python 3.11 whichis claimed to have faster execution of code based on its built-in optimizations.The results are shown in Table 2. Fromthe table it can be seen that our algorithm lends itself well to parallelization,as the execution time was cut sharply from 18min 30s down to 3min30s when all 12 processor threads are utilized. Even without parallelization,there was an improvement in execution time from 18min 30s to 13min17s which is slightly better than the execution time for the OpenCVimplementation. Again, our algorithm made some speed gains when runin python 3.11. Our algorithm was also run on a Dell Optiplex 9020with 8GB of RAM and an Intel i5-4590 CPU with 4 cores and 4 threads runningpython 3.10.8. The results are shown in Table 4.From the table it can be seen that despite the lower specificationsin processor and memory, the algorithm still performed appreciablywell. Optimizing the algorithm with numba on 4 cores produces runtimes of 8min 59s and in 16min 3s without parallelization.

⬇

@njit()

def numba_pip(pts,poly):

pip = zeros(pts.shape[0])

for i in prange(pts.shape[0]):

x,y = pts[i,0],pts[i,1]

rec = (poly[:,1]) - y

rec = where((rec[1:]*rec[:-1]<=0))[0]

res = poly[rec]

d = poly[rec+1] - res

l = (y - res[:,1])/d[:,1]

x_new = res[:,0] + l*d[:,0]

res = count_nonzero(x_new > x)

pip[i] = (res return pip

⬇

@njit(parallel=True)

def numbapip_p(pts,poly):

pip = zeros(pts.shape[0])

for i in prange(pts.shape[0]):

x,y = pts[i,0],pts[i,1]

rec = (poly[:,1]) - y

rec = where((rec[1:]*rec[:-1]<=0))[0]

res = poly[rec]

d = poly[rec+1] - res

l = (y - res[:,1])/d[:,1]

x_new = res[:,0] + l*d[:,0]

res = count_nonzero(x_new > x)

pip[i] = (res return pip

Algorithm	Platform	Execution Time
Algorithm 3	numba without parallelization (python 3.10.8)	13min 17s
Algorithm 4	numba with parallelization (python 3.10.8)	3min 30s
Algorithm 2	python 3.11	16min 25s
OpenCV	python 3.10.8	13min 20s
OpenCV	python 3.11	16min 45s

Tables 3-8and Figures 6-7show the execution times of the various algorithms for different numberof points from 10 to 17,097,823. It can be seen that generally, shapelyand by extension geopandas are the worst performing methods. In Figure 6(a), the steep slope of the shapely algorithm makes it difficult tosee the performance of the other algorithms. The shapely algorithmis therefore removed in the (b) and (c) parts of the figure to properlyshow the performance of the other algorithms. Figure 6(b) shows the performance of the algorithm for logarithmically spacedpoints from 10–10,000,000 and for 17,097,823 points, while the (c) part shows theperformance of the algorithm for the logarithmically spaced points10, 100 and 1000 points which are not visible in (b). The shapelyexecution time is removed from subsequent plots to allow the executiontimes from the other algorithms to be visualized. It can be seen that,consistently the numba implementation with parallelization performsthe fastest, followed by the numba implementation without parallelization,then the OpenCV implementation, then Algorithm 2, followed by Algorithm1. In Figure 7, the executiontimes for the numba implementations are absent because numba was notsupported on the new python 3.11 as at the time of performing thesimulations. In Tables 7-8,we show the execution times of the algorithms on a linearly spacedscale from 10–10,000 in Table 7 andfrom 1,908,647 to 17,097,823 in Table 8.We perform a linear least squares fit of the data in Table 7to find the equation of the form $ax+b=y$ , where $x$ is the numberof points and $y$ is the execution time using the matrix equation

\left[\begin{array}[]{c|c}\boldsymbol{x}&\boldsymbol{1}\end{array}\right]\left[\begin{array}[]{c}a\\b\end{array}\right]=\left[\boldsymbol{y}\right].

(8)

In Equation 8, $\boldsymbol{x}$ is the vector of the number of points thatwere tested and $\boldsymbol{y}$ is the vector of execution times. $\boldsymbol{1}$ is the vector of ones with the same length as $\boldsymbol{x}$ and $\boldsymbol{y}.$ The slopes and intercepts of the least squareslines of best fit are shown in Table 9. Figure 8shows the least squares lines and the original data used to calculatethem. Figure 9 shows the least squareslines overlaid on the execution times in Table 8to verify the linearity of the algorithms. It can be seen that thelargest deviations in execution times come from the numba algorithmwith parallelization which may be due to the calls to the multipleprocessor threads. Tables 10 and 11show the absolute and relative errors respectively in the executiontimes projected by the least squares lines for the points in Table8 and the actual execution times inthe table. We can say that it is possible to use the least squaresfit of the execution times in Table 7with between 10 and 10,000 points to predict the execution times fora larger number of points as contained in Table 8.In our case, the largest errors in absolute terms were in the opencv implementation 44.48 seconds error while the numba implementation with parallelization had the highest relative error. However and the least error was in the numba implementation without parallelization which was 11 seconds.

Algorithm	Algorithm_1	Algorithm_2	opencv	numba	numba_p	shapely
10	0.000674	0.000804	0.000792	0.000455	0.000193	0.039866
100	0.006780	0.007234	0.005199	0.004525	0.001524	0.384787
1000	0.067971	0.064967	0.050449	0.045648	0.014742	3.987965
10000	0.674826	0.660946	0.496855	0.458177	0.131555	37.005583
100000	6.749296	6.394145	5.031192	4.663205	1.239017	348.450866
1000000	67.799494	64.238579	49.027873	45.881565	12.510255	3460.231409
10000000	642.754256	636.792082	467.686868	453.594473	121.763848	34361.098605
17097823	1117.207423	1055.401415	800.032984	788.796931	202.510374	58475.832875

Point in polygon calculation using vector geometric methods with application to geospatial data. (8)

Algorithm	Execution Time
Algorithm 1	23min 57s
Algorithm 2	23min 2s
Algorithm 2 with numba without parallelization	16min 3s
Algorithm 2 with numba with parallelization	8min 59s

Points	Algorithm_1	Algorithm_2	opencv
10	0.000711	0.000719	0.000720
100	0.006478	0.005915	0.005638
1000	0.062355	0.056861	0.055172
10000	0.616584	0.566055	0.548117
100000	6.154858	5.680473	5.431181
1000000	61.618874	56.637895	56.335213
10000000	606.914380	566.261798	588.910343
17097823	1078.766585	1013.177792	1005.053892

Point in polygon calculation using vector geometric methods with application to geospatial data. (9)

Points	Algorithm_1	Algorithm_2	opencv	numba	numba_p
10	0.001023	0.003242	0.001362	0.002346	0.000979
100	0.009219	0.011926	0.006372	0.007208	0.001654
1000	0.086663	0.081737	0.044268	0.056429	0.015674
10000	0.865499	0.818598	0.441505	0.561596	0.157564
100000	8.678028	8.170648	4.419931	5.635208	1.825092
1000000	85.399976	82.598725	44.758200	55.967566	31.157190
10000000	850.251691	815.232475	421.485714	559.662381	309.712137
17097823	1459.900754	1387.815551	720.694399	951.991756	534.855275

Points	Algorithm_2	Algorithm_1	numba	numba_p	opencv	shapely
10	0.000797	0.000678	0.000455	0.000193	0.000885	0.040110
1120	0.085122	0.080887	0.053803	0.011901	0.060541	3.921390
2230	0.160281	0.170801	0.100509	0.021441	0.119104	7.746753
3340	0.218364	0.224381	0.150193	0.030485	0.176672	11.664765
4450	0.289474	0.298039	0.204831	0.047858	0.239670	15.478461
5560	0.366708	0.391248	0.251546	0.049396	0.283920	19.261769
6670	0.424220	0.449986	0.298660	0.058415	0.331706	23.099383
7780	0.497226	0.523772	0.348369	0.070565	0.387975	27.049710
8890	0.572252	0.640621	0.413792	0.081335	0.448136	30.752390
10000	0.639463	0.667802	0.446130	0.090945	0.494594	34.666056

Points	Algorithm_2	Algorithm_1	numba	numba_p	opencv
1908647	120.244296	129.822878	86.105501	17.027383	93.905902
3807294	239.848545	258.961195	171.758836	33.963956	187.311751
5705941	359.452793	388.099512	257.412170	50.900529	280.717599
7604588	479.057041	517.237829	343.065504	67.837102	374.123447
9503235	598.661289	646.376145	428.718838	84.773674	467.529295
11401882	718.265538	775.514462	514.372172	101.710247	560.935143
13300529	837.869786	904.652779	600.025506	118.646820	654.340991
15199176	957.474034	1033.791096	685.678841	135.583393	747.746839
17097823	1077.078283	1162.929413	771.332175	152.519965	841.152687

Algorithm	Slope	Intercept
Algorithm_2	0.000063	0.010103
Algorithm_1	0.000068	0.004402
numba	0.000045	0.001039
numba_p	0.000009	0.001607
opencv	0.000049	0.008094
shapely	0.003462	0.041119

Point in polygon calculation using vector geometric methods with application to geospatial data. (10)

Point in polygon calculation using vector geometric methods with application to geospatial data. (11)

Points	Algorithm_2	Algorithm_1	numba	numba_p	opencv
1908647	0.260000	0.010000	2.070000	5.350000	0.980000
3807294	1.700000	5.630000	4.950000	11.700000	6.710000
5705941	2.800000	12.620000	8.540000	13.730000	11.750000
7604588	3.680000	20.350000	8.090000	11.360000	17.100000
9503235	4.500000	21.730000	11.970000	14.110000	22.460000
11401882	5.150000	26.140000	6.060000	16.990000	27.720000
13300529	2.730000	30.980000	2.080000	23.980000	33.250000
15199176	13.500000	28.640000	4.270000	27.470000	39.010000
17097823	27.740000	33.020000	11.230000	28.680000	44.480000

Points	Algorithm_2	Algorithm_1	numba	numba_p	opencv
1908647	0.000000	0.000000	0.020000	0.240000	0.010000
3807294	0.010000	0.020000	0.030000	0.260000	0.040000
5705941	0.010000	0.030000	0.030000	0.210000	0.040000
7604588	0.010000	0.040000	0.020000	0.140000	0.050000
9503235	0.010000	0.030000	0.030000	0.140000	0.050000
11401882	0.010000	0.030000	0.010000	0.140000	0.050000
13300529	0.000000	0.040000	0.000000	0.170000	0.050000
15199176	0.010000	0.030000	0.010000	0.170000	0.060000
17097823	0.030000	0.030000	0.010000	0.160000	0.060000

Discussion

The implementation of a point in polygon computation using basic vector equations leads to a very useful algorithm which compares very well to the current algorithms that are in use. Moreover, we can scale the algorithm to higher dimensions by considering vector equations in these dimensions and modifying the algorithm accordingly.

We used a linear least squares fit for the values in Table 7 because the values for each point appear to show a linear relationship between the number of points and the execution times of the algorithm. We can infer that the algorithm running time is of linear order, but with an extremely small slope, which allows it to scale very well.

References

[1]Global administrative areas, gadm database of global administrative areas,version 2.0.www.gadm.org, 2012.www.gadm.org.
[2]DAlciatore and Rick Miranda.A winding number and point-in-polygon algorithm.Glaxo Virtual Anatomy Project Research Report, Department ofMechanical Engineering, Colorado State University, 1995.
[3]G.Bradski.The OpenCV Library.Dr. Dobb’s Journal of Software Tools, 2000.
[4]Sean Gillies etal.Shapely: manipulation and analysis of geometric objects, 2007–.
[5]Eric Haines.Graphics gems IV, chapter Point in polygon strategies, page24–46.1994.
[6]Jianqiang Hao, Jianzhi Sun, YiChen, Qiang Cai, and LiTan.Optimal reliable point-in-polygon test and differential codingboolean operations on polygons.Symmetry, 10:477, 2018.
[7]Kai Hormann and Alexander Agathos.The point in polygon problem for arbitrary polygons.Computational Geometry, 20(3):131–144, November 2001.
[8]Chong-Wei Huang and Tian-Yuan Shih.On the complexity of point-in-polygon algorithms.Computers and Geosciences, 23(1):109–118, 1997.
[9]Kelsey Jordahl, JorisVan den Bossche, Martin Fleischmann, Jacob Wasserman,James McBride, Jeffrey Gerard, Jeff Tratner, Matthew Perry, AdrianGarciaBadaracco, Carson Farmer, GeirArne Hjelle, AlanD. Snow, Micah Cochran, SeanGillies, Lucas Culbertson, Matt Bartos, Nick Eubank, maxalbert, AlekseyBilogur, Sergio Rey, Christopher Ren, Dani Arribas-Bel, Leah Wasser,LeviJohn Wolf, Martin Journois, Joshua Wilson, Adam Greenhall, ChrisHoldgraf, Filipe, and François Leblanc.geopandas/geopandas: v0.12.2, 12 2022.
[10]G.Naresh Kumar and Mallikarjun Bangi.An extension to winding number and point-in-polygon algorithm.IFAC-PapersOnLine, 51(1):548–553, 2018.
[11]SiuKwan Lam, Antoine Pitrou, and Stanley Seibert.Numba: A llvm-based python jit compiler.In Proceedings of the Second Workshop on the LLVM CompilerInfrastructure in HPC, pages 1–6, 2015.
[12]WSirko, SKashubin, MRitter, AAnnkah, YBouchareb, YDauphin, DKeysers,MNeumann, MCisse, and JQuinn.Continental-scale building detection from high resolution satelliteimagery.Technical report, 2021.
[13]J.C. Tallack.Introduction to Vector Analysis.Cambridge University Press, 1970.Reprinted in 2009.
[14]Stefan Vander Walt, JohannesL Schönberger, Juan Nunez-Iglesias,François Boulogne, JoshuaD Warner, Neil Yager, Emmanuelle Gouillart,and Tony Yu.scikit-image: image processing in python.PeerJ, 2:e453, 2014.