Point in polygon calculation using vector geometric methods with application to geospatial data. (2024)

Eyram SchwingerDepartment of Mathematics, University of GhanaRalph TwumDepartment of Mathematics, University of GhanaThomas KatsekporDepartment of Mathematics, University of GhanaGladys SchwingerInstitute of Environment and Sanitation Studies, University of Ghana

Abstract

In this work, we designed algorithms for the point in polygon problem based on the ray casting algorithm using equations from vector geometry. The algorithms were implemented using the python programming language. We tested the algorithm against the point in polygon algorithms used by the shapely (and by extension geopandas) library and the OpenCV library using points from the google Open Buildings project. Our algorithm in pure python performed much better than the shapely implementation. It also performed better than the OpenCV implementation when combined with the Numba optimization library. We also performed simulations to verify that our algorithm performance was of the order π’ͺ​(n).π’ͺ𝑛\mathcal{O}(n).

Keywordsβ€” point in polygon, ray casting, vector geometry, shapely, Numba

Introduction

The point in polygon problem is a fundamental geometric problem with a wide range of applications in computer graphics, painting simulation, robotics, geosciences, remote sensing, and geographic information systems. Therefore, efficient numerical algorithms for solving point in polygon computations are essential.

Many algorithms have been proposed (for instance in [7, 5, 10, 2, 8, 6]).Some of the methods include the winding number method [7, 10, 2, 6], which measures the number of times a polygon encloses a point, thesum of angles method [8, 6], where the sum of the interior angles of the polygon is used, and the ray castingmethod [8, 5]. The ray casting method involvesshooting a ray from the point in question in any direction and countinghow many times the ray intersects the boundaries of the polygon.Most of these methods assume the polygon is convex.

Several of the algorithms are computationally expensive and are sensitive to rounding and truncation errors.[6]states that the time complexities of the preprocessing step rangesfrom π’ͺ​(n)π’ͺ𝑛\mathcal{O}(n) to π’ͺ​(n2).π’ͺsuperscript𝑛2\mathcal{O}(n^{2}).In this paper, we design an algorithm for point in polygon computationusing vector geometric methods, with the advantage that the algorithmcan take advantage of processor parallelization.

The work is broken down in the following sections: The method section where we develop our algorithm from the vector geometric definition of the line. In the implementation section, we write the python code for the algorithm developed in methods then in the simulation section we test the algorithm on real world data and against other algorithms.

Method

Our proposed algorithm is based on the ray casting method. Suppose we wantto determine whether a point P​(x,y)𝑃π‘₯𝑦P(x,y) lies within a closed polygonR𝑅R with vertices R0,R1,…,Rnsubscript𝑅0subscript𝑅1…subscript𝑅𝑛R_{0},R_{1},\ldots,R_{n}, where R0=Rn.subscript𝑅0subscript𝑅𝑛R_{0}=R_{n}.In the ray casting method, a ray l𝑙l is drawn moving out from thepoint P𝑃P in a specific direction. The point P𝑃P lies inside the polygonR𝑅R if l𝑙l intersects the edges of R𝑅R an odd number of times.In vector geometry, one way of writing out the equation of a lineis the normal form. Let 𝒏𝒏\boldsymbol{n} be a normal vector to theline. Note that a line in the 2D plane has two normals going in oppositedirections that can be denoted 𝒏1subscript𝒏1\boldsymbol{n}_{1} and 𝒏2subscript𝒏2\boldsymbol{n}_{2}with 𝒏1=βˆ’π’2,subscript𝒏1subscript𝒏2\boldsymbol{n}_{1}=-\boldsymbol{n}_{2}, so 𝒏𝒏\boldsymbol{n}can be either 𝒏1subscript𝒏1\boldsymbol{n}_{1} or 𝒏2.subscript𝒏2\boldsymbol{n}_{2}. Assuming𝒂𝒂\boldsymbol{a} is a known point on the line and 𝒓𝒓\boldsymbol{r}is any point on the line, the equation of the line can be writtenas

(π’“βˆ’π’‚)⋅𝒏=0.⋅𝒓𝒂𝒏0\left(\boldsymbol{r}-\boldsymbol{a}\right)\cdot\boldsymbol{n}=0.(1)
Point in polygon calculation using vector geometric methods with application to geospatial data. (1)

Every point on the line satisfies equation 1.Points that do not lie on the line do not satisfy the equation, hence(π’“βˆ’π’‚)⋅𝒏≠0⋅𝒓𝒂𝒏0\left(\boldsymbol{r}-\boldsymbol{a}\right)\cdot\boldsymbol{n}\neq 0with points on opposite sides of the line having different signs.This is illustrated in Figure 1. The point C𝐢C is onone side of the line while the point D𝐷D is on the opposite side.Using the concept of perpendicular distance, the position vectorsof the points C𝐢C and D𝐷D can be written as 𝒄=𝒆+λ​𝒏^1π’„π’†πœ†subscriptbold-^𝒏1\boldsymbol{c}=\boldsymbol{e}+\lambda\boldsymbol{\hat{n}}_{1}and 𝒅=𝒇+μ​𝒏^2,π’…π’‡πœ‡subscriptbold-^𝒏2\boldsymbol{d}=\boldsymbol{f}+\mu\boldsymbol{\hat{n}}_{2},where 𝒏^^𝒏\hat{\boldsymbol{n}} represents the unit vector in the directionof 𝒏,𝒏\boldsymbol{n}, and Ξ»πœ†\lambda and ΞΌπœ‡\mu are positive constants.Let 𝒏=𝒏1.𝒏subscript𝒏1\boldsymbol{n}=\boldsymbol{n}_{1}. Then 𝒄=𝒆+λ​𝒏^π’„π’†πœ†bold-^𝒏\boldsymbol{c}=\boldsymbol{e}+\lambda\boldsymbol{\hat{n}}and 𝒅=π’‡βˆ’ΞΌβ€‹π’^.π’…π’‡πœ‡bold-^𝒏\boldsymbol{d}=\boldsymbol{f}-\mu\boldsymbol{\hat{n}}. Substituting𝒄𝒄\boldsymbol{c} and 𝒅𝒅\boldsymbol{d} into the equation of the linegives

(π’„βˆ’π’‚)⋅𝒏⋅𝒄𝒂𝒏\displaystyle\left(\boldsymbol{c}-\boldsymbol{a}\right)\cdot\boldsymbol{n}=(𝒆+λ​𝒏^βˆ’π’‚)⋅𝒏absentβ‹…π’†πœ†bold-^𝒏𝒂𝒏\displaystyle=\left(\boldsymbol{e}+\lambda\boldsymbol{\hat{n}}-\boldsymbol{a}\right)\cdot\boldsymbol{n}
=(π’†βˆ’π’‚)⋅𝒏+λ​𝒏^⋅𝒏absentβ‹…π’†π’‚π’β‹…πœ†bold-^𝒏𝒏\displaystyle=\left(\boldsymbol{e}-\boldsymbol{a}\right)\cdot\boldsymbol{n}+\lambda\boldsymbol{\hat{n}}\cdot\boldsymbol{n}
=λ​𝒏^⋅𝒏.absentβ‹…πœ†bold-^𝒏𝒏\displaystyle=\lambda\boldsymbol{\hat{n}}\cdot\boldsymbol{n}.

Now (π’†βˆ’π’‚)⋅𝒏=0⋅𝒆𝒂𝒏0\left(\boldsymbol{e}-\boldsymbol{a}\right)\cdot\boldsymbol{n}=0,since the points A𝐴A and E𝐸E lie on the line. Similarly

(dβˆ’π’‚)⋅𝒏⋅𝑑𝒂𝒏\displaystyle\left(d-\boldsymbol{a}\right)\cdot\boldsymbol{n}=(π’‡βˆ’ΞΌβ€‹π’^βˆ’π’‚)⋅𝒏absentβ‹…π’‡πœ‡bold-^𝒏𝒂𝒏\displaystyle=\left(\boldsymbol{f}-\mu\boldsymbol{\hat{n}}-\boldsymbol{a}\right)\cdot\boldsymbol{n}
=(π’‡βˆ’π’‚)β‹…π’βˆ’ΞΌβ€‹π’^⋅𝒏absentβ‹…π’‡π’‚π’β‹…πœ‡bold-^𝒏𝒏\displaystyle=\left(\boldsymbol{f}-\boldsymbol{a}\right)\cdot\boldsymbol{n}-\mu\boldsymbol{\hat{n}}\cdot\boldsymbol{n}
=βˆ’ΞΌβ€‹π’^⋅𝒏.absentβ‹…πœ‡bold-^𝒏𝒏\displaystyle=-\mu\boldsymbol{\hat{n}}\cdot\boldsymbol{n}.

In our work, we choose to shoot a ray from the point P𝑃P with directionvector 𝒅=(1,0).𝒅10\boldsymbol{d}=(1,0). This is a ray shot horizontally andto the right of the point P.𝑃P. Then the normal vectors to the lineare 𝒏=(0,Β±1).𝒏0plus-or-minus1\boldsymbol{n}=(0,\pm 1). We choose the vector 𝒏=(0,1).𝒏01\boldsymbol{n}=(0,1).Since the source of the ray is P,𝑃P, the vector 𝒑𝒑\boldsymbol{p}is a point on the line. This gives the equation of the line l:(π’“βˆ’π’‘)⋅𝒏=0.:𝑙⋅𝒓𝒑𝒏0l:\left(\boldsymbol{r}-\boldsymbol{p}\right)\cdot\boldsymbol{n}=0.Thus the line is the locus of points 𝒓​(x,y)𝒓π‘₯𝑦\boldsymbol{r}\left(x,y\right)such that (π’“βˆ’π’‘)⋅𝒏=0.⋅𝒓𝒑𝒏0\left(\boldsymbol{r}-\boldsymbol{p}\right)\cdot\boldsymbol{n}=0.This equation reduces to

(π’“βˆ’π’‘)⋅𝒏⋅𝒓𝒑𝒏\displaystyle\left(\boldsymbol{r}-\boldsymbol{p}\right)\cdot\boldsymbol{n}=0absent0\displaystyle=0
(xβˆ’px,yβˆ’py)β‹…(nx,ny)β‹…π‘₯subscript𝑝π‘₯𝑦subscript𝑝𝑦subscript𝑛π‘₯subscript𝑛𝑦\displaystyle(x-p_{x},y-p_{y})\cdot\left(n_{x},n_{y}\right)=0absent0\displaystyle=0
(xβˆ’px)​nx+(yβˆ’py)​nyπ‘₯subscript𝑝π‘₯subscript𝑛π‘₯𝑦subscript𝑝𝑦subscript𝑛𝑦\displaystyle\left(x-p_{x}\right)n_{x}+\left(y-p_{y}\right)n_{y}=0absent0\displaystyle=0
(xβˆ’px)​(0)+(yβˆ’py)​(1)π‘₯subscript𝑝π‘₯0𝑦subscript𝑝𝑦1\displaystyle\left(x-p_{x}\right)\left(0\right)+\left(y-p_{y}\right)\left(1\right)=0absent0\displaystyle=0
yβˆ’py𝑦subscript𝑝𝑦\displaystyle y-p_{y}=0.absent0\displaystyle=0.

For the first constraint which is looking for ray crossings, if theray l𝑙l crosses the polygon R𝑅R at the edge connecting the verticesRisubscript𝑅𝑖R_{i} and Ri+1,subscript𝑅𝑖1R_{i+1}, then the points Risubscript𝑅𝑖R_{i} and Ri+1subscript𝑅𝑖1R_{i+1} willbe on opposite sides of the ray l,𝑙l, hence will have opposite signs.Let f​(𝒓)=(π’“βˆ’π’‘)⋅𝒏=ryβˆ’py.𝑓𝒓⋅𝒓𝒑𝒏subscriptπ‘Ÿπ‘¦subscript𝑝𝑦f\left(\boldsymbol{r}\right)=\left(\boldsymbol{r}-\boldsymbol{p}\right)\cdot\boldsymbol{n}=r_{y}-p_{y}.Hence f​(𝒓i)β‹…f​(𝒓i+1)≀0⋅𝑓subscript𝒓𝑖𝑓subscript𝒓𝑖10f\left(\boldsymbol{r}_{i}\right)\cdot f\left(\boldsymbol{r}_{i+1}\right)\leq 0if the ray crosses the edge Ri,Ri+1subscript𝑅𝑖subscript𝑅𝑖1R_{i},R_{i+1}, otherwise f​(𝒓i)β‹…f​(𝒓i+1)>0.⋅𝑓subscript𝒓𝑖𝑓subscript𝒓𝑖10f\left(\boldsymbol{r}_{i}\right)\cdot f\left(\boldsymbol{r}_{i+1}\right)>0.This test will however test for all edge crossings of the ray to theleft and right of P𝑃P.

The second constraint of the ray being cast towards the right fromthe point P𝑃P will be enforced by testing for the vertices with xβˆ’limit-fromπ‘₯x-componentsgreater or equal to the xβˆ’limit-fromπ‘₯x-component of the point P;𝑃P; rxiβ‰₯px.subscriptπ‘Ÿsubscriptπ‘₯𝑖subscript𝑝π‘₯r_{x_{i}}\geq p_{x}.We tested this contraint in two ways:

  1. 1.

    Assume the points Risubscript𝑅𝑖R_{i} and Ri+1subscript𝑅𝑖1R_{i+1} are the endpoints of theedge where the ray cast from P𝑃P intersects the polygon R.𝑅R. Theconstraint is satisfied if P𝑃P is on the left of the line Ri​Ri+1.subscript𝑅𝑖subscript𝑅𝑖1R_{i}R_{i+1}.To test which side of the line the point P𝑃P lies, define the directionvector of the line 𝒅=𝒓i+1βˆ’π’“i=(dx,dy),𝒅subscript𝒓𝑖1subscript𝒓𝑖subscript𝑑π‘₯subscript𝑑𝑦\boldsymbol{d}=\boldsymbol{r}_{i+1}-\boldsymbol{r}_{i}=(d_{x},d_{y}),then 𝒏=(βˆ’dy,dx)𝒏subscript𝑑𝑦subscript𝑑π‘₯\boldsymbol{n}=\left(-d_{y},d_{x}\right) or 𝒏=(dy,βˆ’dx).𝒏subscript𝑑𝑦subscript𝑑π‘₯\boldsymbol{n}=\left(d_{y},-d_{x}\right).Since we want to test which side of the line P𝑃P lies on, we choose𝒏𝒏\boldsymbol{n} to be the vector that forms an acute angle withthe vector (1,0).10\left(1,0\right). We obtain the equation of the lineto be l2:(π’“βˆ’π’“i)⋅𝒏=0.:subscript𝑙2⋅𝒓subscript𝒓𝑖𝒏0l_{2}:\left(\boldsymbol{r}-\boldsymbol{r}_{i}\right)\cdot\boldsymbol{n}=0.Note that the vector 𝒓isubscript𝒓𝑖\boldsymbol{r}_{i} can be replaced by thevector 𝒓i+1subscript𝒓𝑖1\boldsymbol{r}_{i+1} in the equation of the line l2.subscript𝑙2l_{2}.Then the point lies to the left of the edge if l2:(π’‘βˆ’π’“i)⋅𝒏≀0.:subscript𝑙2⋅𝒑subscript𝒓𝑖𝒏0l_{2}:\left(\boldsymbol{p}-\boldsymbol{r}_{i}\right)\cdot\boldsymbol{n}\leq 0.

  2. 2.

    The vector form of the equation of the line is given as 𝒓=𝒓i+λ​𝒅.𝒓subscriptπ’“π‘–πœ†π’…\boldsymbol{r}=\boldsymbol{r}_{i}+\lambda\boldsymbol{d}.This means that

    (x,y)π‘₯𝑦\displaystyle(x,y)=(xi,yi)+λ​(dx,dy)absentsubscriptπ‘₯𝑖subscriptπ‘¦π‘–πœ†subscript𝑑π‘₯subscript𝑑𝑦\displaystyle=(x_{i},y_{i})+\lambda(d_{x},d_{y})(2)
    xπ‘₯\displaystyle x=xi+λ​dxabsentsubscriptπ‘₯π‘–πœ†subscript𝑑π‘₯\displaystyle=x_{i}+\lambda d_{x}(3)
    y𝑦\displaystyle y=yi+λ​dy.absentsubscriptπ‘¦π‘–πœ†subscript𝑑𝑦\displaystyle=y_{i}+\lambda d_{y}.(4)

    Since the ray is a horizontal line passing through the point P𝑃Pwe know that the yβˆ’limit-from𝑦y-component of the point of intersection betweenthe edge and the ray is py.subscript𝑝𝑦p_{y}. This gives

    Ξ»=pyβˆ’yidy.πœ†subscript𝑝𝑦subscript𝑦𝑖subscript𝑑𝑦\lambda=\dfrac{p_{y}-y_{i}}{d_{y}}.(5)

    We then substitute Ξ»πœ†\lambda to get x=xi+λ​dx.π‘₯subscriptπ‘₯π‘–πœ†subscript𝑑π‘₯x=x_{i}+\lambda d_{x}. Theray is then understood to cross the edge if xβ‰₯px.π‘₯subscript𝑝π‘₯x\geq p_{x}.

This helps us to develop the following algorithm:

  1. 1.

    Input: px,subscript𝑝π‘₯p_{x}, py,subscript𝑝𝑦p_{y}, vertices of the polygon R𝑅R as a 2-dimensionalarray.

  2. 2.

    Find the values fi=ryiβˆ’pysubscript𝑓𝑖superscriptsubscriptπ‘Ÿπ‘¦π‘–subscript𝑝𝑦f_{i}=r_{y}^{i}-p_{y} where ryisuperscriptsubscriptπ‘Ÿπ‘¦π‘–r_{y}^{i} is theyβˆ’limit-from𝑦y-component of Ri.subscript𝑅𝑖R_{i}.

  3. 3.

    Calculate sgn_chg=ifiβ‹…fi+1.{}_{i}=f_{i}\cdot f_{i+1}.

  4. 4.

    Find i𝑖i where sgn_chg≀i0.{}_{i}\leq 0. This means there is a ray crossingbetween the vertices Risubscript𝑅𝑖R_{i} and Ri+1.subscript𝑅𝑖1R_{i+1}.

  5. 5.

    For each sgn_chgi found in 4, calculate

    Ξ»πœ†\displaystyle\lambda=pyβˆ’ryidyabsentsubscript𝑝𝑦superscriptsubscriptπ‘Ÿπ‘¦π‘–subscript𝑑𝑦\displaystyle=\dfrac{p_{y}-r_{y}^{i}}{d_{y}}(6)
    xπ‘₯\displaystyle x=rxi+λ​dx.absentsuperscriptsubscriptπ‘Ÿπ‘₯π‘–πœ†subscript𝑑π‘₯\displaystyle=r_{x}^{i}+\lambda d_{x}.(7)
  6. 6.

    Find c=xβ‰₯px.𝑐π‘₯subscript𝑝π‘₯c=x\geq p_{x}.

  7. 7.

    If c𝑐c is odd, then R𝑅R contains P.𝑃P.

Implementation

The defined algorithm was implemented in python. The basic implementationis shown in Algorithms 1 and 2. In bothimplementations, lines 2-4 perform the border crossing test usingthe sign change algorithm. Once that is done, lines 6-7 in algorithm1 calculate the normal vector 𝒏𝒏\boldsymbol{n} to thepoint pair where the sign change occured, then lines 9-12 determineif the calculated normal is in the same direction as the vector (1,0).10(1,0).If it is not, the vector βˆ’π’π’-\boldsymbol{n} replaces 𝒏.𝒏\boldsymbol{n}.Line 14 then calculates the sign of the point (x,y)π‘₯𝑦(x,y) in relationto each of the lines. Since we are only shooting the rays in one direction,line 16 counts the number of polygon boundaries that are to the rightof the point (x,y)π‘₯𝑦(x,y) or contain (x,y).π‘₯𝑦(x,y). In algorithm 2,line 6 calculates the direction vectors of the point pair (xi,yi),(xi+1,yi+1)subscriptπ‘₯𝑖subscript𝑦𝑖subscriptπ‘₯𝑖1subscript𝑦𝑖1(x_{i},y_{i}),(x_{i+1},y_{i+1})where the sign change occured, then line 7 calculates the parameterΞ»πœ†\lambda in equation 2 using equation 6.Line 8 then calculates the corresponding xπ‘₯x using equation 7.Line 9 then counts how many of the calculated xπ‘₯x are the same orto the right of the input x.π‘₯x.

⬇

def pip_test_2(x,y,poly):

rec = (poly[:,1]) - y

rec = where((rec[1:]*rec[:-1]<=0))[0]

res = poly[rec]

res2 = poly[rec+1]

n_x = res2[:,1] - res[:,1]

n_y = res[:,0] - res2[:,0]

\parfor i in range(len(n_x)):

if n_x[i] < 0:

n_x[i] = -n_x[i]

n_y[i] = -n_y[i]

\parline_pos = (x-res[:,0])*n_x+(y-res[:,1])*n_y

\parres = count_nonzero(line_pos <= 0)

return res

⬇

def pip_test_1(x,y,poly):

rec = (poly[:,1]) - y

rec = where((rec[1:]*rec[:-1]<=0))[0]

res = poly[rec]

\pard = poly[rec+1] - res

l = (y - res[:,1])/d[:,1]

x_new = res[:,0] + l*d[:,0]

res = count_nonzero(x_new > x)

return res

Simulation

To test the algorithms, we downloaded data from the Google Open Buildingsproject [12] and shapefile for the map of the Republic of Ghana from the Database of Global Administrative Areas [1]. From [12], we wanted data only for Ghana, so we downloaded the following cells: 0fd_buildings,0e3_buildings, 11d_buildings, 103_buildings, and 0ff_buildings as can be seen in Figure 2. The initial data containedapproximately 54 million points. Initial preprocessing to reduce thenumber of points was done by restricting to only points within thebounding box of Ghana downloaded from [1].

Point in polygon calculation using vector geometric methods with application to geospatial data. (2)

Point in polygon calculation using vector geometric methods with application to geospatial data. (3)

This reduced the number of points to 17,097,823. We run our algorithm with these points as the x,yπ‘₯𝑦x,y points and the map of Ghana as the polygon. Figure 3shows the polygon with the map of Ghana and the points to be tested.Figures 4 and 5 show theresults of all points for which our algorithm returned True and Falserespectively. It can be seen that the algorithm works very well.

Point in polygon calculation using vector geometric methods with application to geospatial data. (4)

Point in polygon calculation using vector geometric methods with application to geospatial data. (5)

Point in polygon calculation using vector geometric methods with application to geospatial data. (6)

Point in polygon calculation using vector geometric methods with application to geospatial data. (7)

The algorithm was also tested against other algorithms. Our initialsimulation was done on a Dell XPS 15 9570 with an Intel i7-8750H 6core/12 thread processor and 32GB of RAM running a Jupyter notebook withpython 3.10.8. On this setup, we tested our algorithm against thepoint in polygon algorithms used by shapely [4], geopandas[9] and OpenCV [3].Table 1 shows the execution time of our algorithmcompared to the three other algorithms. It can be seen that shapelyand geopandas took in excess of 16 hours to test their point in polygonalgorithms on the 17,097,823 points. In both shapely and geopandasthe contains method of the polygon and the withinmethod of the point were all tested with consistent results. Notethat geopandas uses shapely’s point in polygon algorithm, hence thesimilarity in execution times. It is only the OpenCV point in polygonalgorithm that was faster than our algorithm. The OpenCV library iswritten in C++ and takes advantage of code optimization features.

Point in polygon algorithmExecution time
Algorithm 118min 34s
Algorithm 218min 30s
shapely16h 2min 36s
geopandas16h 38min 23s
OpenCV13min 20s

We therefore combined our algorithm with numba, a high performancepython compiler [11] with both parallel and non-parallelprocessing. These can be seen in Algorithms 3 for numbawithout parallelization and 4 for numba with parallelization.Algorithm 2 was also tested on python 3.11 whichis claimed to have faster execution of code based on its built-in optimizations.The results are shown in Table 2. Fromthe table it can be seen that our algorithm lends itself well to parallelization,as the execution time was cut sharply from 18min 30s down to 3min30s when all 12 processor threads are utilized. Even without parallelization,there was an improvement in execution time from 18min 30s to 13min17s which is slightly better than the execution time for the OpenCVimplementation. Again, our algorithm made some speed gains when runin python 3.11. Our algorithm was also run on a Dell Optiplex 9020with 8GB of RAM and an Intel i5-4590 CPU with 4 cores and 4 threads runningpython 3.10.8. The results are shown in Table 4.From the table it can be seen that despite the lower specificationsin processor and memory, the algorithm still performed appreciablywell. Optimizing the algorithm with numba on 4 cores produces runtimes of 8min 59s and in 16min 3s without parallelization.

⬇

@njit()

def numba_pip(pts,poly):

pip = zeros(pts.shape[0])

for i in prange(pts.shape[0]):

x,y = pts[i,0],pts[i,1]

rec = (poly[:,1]) - y

rec = where((rec[1:]*rec[:-1]<=0))[0]

res = poly[rec]

d = poly[rec+1] - res

l = (y - res[:,1])/d[:,1]

x_new = res[:,0] + l*d[:,0]

res = count_nonzero(x_new > x)

pip[i] = (res return pip

⬇

@njit(parallel=True)

def numbapip_p(pts,poly):

pip = zeros(pts.shape[0])

for i in prange(pts.shape[0]):

x,y = pts[i,0],pts[i,1]

rec = (poly[:,1]) - y

rec = where((rec[1:]*rec[:-1]<=0))[0]

res = poly[rec]

d = poly[rec+1] - res

l = (y - res[:,1])/d[:,1]

x_new = res[:,0] + l*d[:,0]

res = count_nonzero(x_new > x)

pip[i] = (res return pip

AlgorithmPlatformExecution Time
Algorithm 3numba without parallelization (python 3.10.8)13min 17s
Algorithm 4numba with parallelization (python 3.10.8)3min 30s
Algorithm 2python 3.1116min 25s
OpenCVpython 3.10.813min 20s
OpenCVpython 3.1116min 45s

Tables 3-8and Figures 6-7show the execution times of the various algorithms for different numberof points from 10 to 17,097,823. It can be seen that generally, shapelyand by extension geopandas are the worst performing methods. In Figure 6(a), the steep slope of the shapely algorithm makes it difficult tosee the performance of the other algorithms. The shapely algorithmis therefore removed in the (b) and (c) parts of the figure to properlyshow the performance of the other algorithms. Figure 6(b) shows the performance of the algorithm for logarithmically spacedpoints from 10–10,000,000 and for 17,097,823 points, while the (c) part shows theperformance of the algorithm for the logarithmically spaced points10, 100 and 1000 points which are not visible in (b). The shapelyexecution time is removed from subsequent plots to allow the executiontimes from the other algorithms to be visualized. It can be seen that,consistently the numba implementation with parallelization performsthe fastest, followed by the numba implementation without parallelization,then the OpenCV implementation, then Algorithm 2, followed by Algorithm1. In Figure 7, the executiontimes for the numba implementations are absent because numba was notsupported on the new python 3.11 as at the time of performing thesimulations. In Tables 7-8,we show the execution times of the algorithms on a linearly spacedscale from 10–10,000 in Table 7 andfrom 1,908,647 to 17,097,823 in Table 8.We perform a linear least squares fit of the data in Table 7to find the equation of the form a​x+b=yπ‘Žπ‘₯𝑏𝑦ax+b=y, where xπ‘₯x is the numberof points and y𝑦y is the execution time using the matrix equation

[π’™πŸ]​[ab]=[π’š].delimited-[]𝒙1delimited-[]π‘Žπ‘delimited-[]π’š\left[\begin{array}[]{c|c}\boldsymbol{x}&\boldsymbol{1}\end{array}\right]\left[\begin{array}[]{c}a\\b\end{array}\right]=\left[\boldsymbol{y}\right].(8)

In Equation 8, 𝒙𝒙\boldsymbol{x} is the vector of the number of points thatwere tested and π’šπ’š\boldsymbol{y} is the vector of execution times.𝟏1\boldsymbol{1} is the vector of ones with the same length as 𝒙𝒙\boldsymbol{x}and π’š.π’š\boldsymbol{y}. The slopes and intercepts of the least squareslines of best fit are shown in Table 9. Figure 8shows the least squares lines and the original data used to calculatethem. Figure 9 shows the least squareslines overlaid on the execution times in Table 8to verify the linearity of the algorithms. It can be seen that thelargest deviations in execution times come from the numba algorithmwith parallelization which may be due to the calls to the multipleprocessor threads. Tables 10 and 11show the absolute and relative errors respectively in the executiontimes projected by the least squares lines for the points in Table8 and the actual execution times inthe table. We can say that it is possible to use the least squaresfit of the execution times in Table 7with between 10 and 10,000 points to predict the execution times fora larger number of points as contained in Table 8.In our case, the largest errors in absolute terms were in the opencv implementation 44.48 seconds error while the numba implementation with parallelization had the highest relative error. However and the least error was in the numba implementation without parallelization which was 11 seconds.

AlgorithmAlgorithm_1Algorithm_2opencvnumbanumba_pshapely
100.0006740.0008040.0007920.0004550.0001930.039866
1000.0067800.0072340.0051990.0045250.0015240.384787
10000.0679710.0649670.0504490.0456480.0147423.987965
100000.6748260.6609460.4968550.4581770.13155537.005583
1000006.7492966.3941455.0311924.6632051.239017348.450866
100000067.79949464.23857949.02787345.88156512.5102553460.231409
10000000642.754256636.792082467.686868453.594473121.76384834361.098605
170978231117.2074231055.401415800.032984788.796931202.51037458475.832875
Point in polygon calculation using vector geometric methods with application to geospatial data. (8)
AlgorithmExecution Time
Algorithm 123min 57s
Algorithm 223min 2s
Algorithm 2 with numba without parallelization16min 3s
Algorithm 2 with numba with parallelization8min 59s
PointsAlgorithm_1Algorithm_2opencv
100.0007110.0007190.000720
1000.0064780.0059150.005638
10000.0623550.0568610.055172
100000.6165840.5660550.548117
1000006.1548585.6804735.431181
100000061.61887456.63789556.335213
10000000606.914380566.261798588.910343
170978231078.7665851013.1777921005.053892
Point in polygon calculation using vector geometric methods with application to geospatial data. (9)
PointsAlgorithm_1Algorithm_2opencvnumbanumba_p
100.0010230.0032420.0013620.0023460.000979
1000.0092190.0119260.0063720.0072080.001654
10000.0866630.0817370.0442680.0564290.015674
100000.8654990.8185980.4415050.5615960.157564
1000008.6780288.1706484.4199315.6352081.825092
100000085.39997682.59872544.75820055.96756631.157190
10000000850.251691815.232475421.485714559.662381309.712137
170978231459.9007541387.815551720.694399951.991756534.855275
PointsAlgorithm_2Algorithm_1numbanumba_popencvshapely
100.0007970.0006780.0004550.0001930.0008850.040110
11200.0851220.0808870.0538030.0119010.0605413.921390
22300.1602810.1708010.1005090.0214410.1191047.746753
33400.2183640.2243810.1501930.0304850.17667211.664765
44500.2894740.2980390.2048310.0478580.23967015.478461
55600.3667080.3912480.2515460.0493960.28392019.261769
66700.4242200.4499860.2986600.0584150.33170623.099383
77800.4972260.5237720.3483690.0705650.38797527.049710
88900.5722520.6406210.4137920.0813350.44813630.752390
100000.6394630.6678020.4461300.0909450.49459434.666056
PointsAlgorithm_2Algorithm_1numbanumba_popencv
1908647120.244296129.82287886.10550117.02738393.905902
3807294239.848545258.961195171.75883633.963956187.311751
5705941359.452793388.099512257.41217050.900529280.717599
7604588479.057041517.237829343.06550467.837102374.123447
9503235598.661289646.376145428.71883884.773674467.529295
11401882718.265538775.514462514.372172101.710247560.935143
13300529837.869786904.652779600.025506118.646820654.340991
15199176957.4740341033.791096685.678841135.583393747.746839
170978231077.0782831162.929413771.332175152.519965841.152687
AlgorithmSlopeIntercept
Algorithm_20.0000630.010103
Algorithm_10.0000680.004402
numba0.0000450.001039
numba_p0.0000090.001607
opencv0.0000490.008094
shapely0.0034620.041119
Point in polygon calculation using vector geometric methods with application to geospatial data. (10)
Point in polygon calculation using vector geometric methods with application to geospatial data. (11)
PointsAlgorithm_2Algorithm_1numbanumba_popencv
19086470.2600000.0100002.0700005.3500000.980000
38072941.7000005.6300004.95000011.7000006.710000
57059412.80000012.6200008.54000013.73000011.750000
76045883.68000020.3500008.09000011.36000017.100000
95032354.50000021.73000011.97000014.11000022.460000
114018825.15000026.1400006.06000016.99000027.720000
133005292.73000030.9800002.08000023.98000033.250000
1519917613.50000028.6400004.27000027.47000039.010000
1709782327.74000033.02000011.23000028.68000044.480000
PointsAlgorithm_2Algorithm_1numbanumba_popencv
19086470.0000000.0000000.0200000.2400000.010000
38072940.0100000.0200000.0300000.2600000.040000
57059410.0100000.0300000.0300000.2100000.040000
76045880.0100000.0400000.0200000.1400000.050000
95032350.0100000.0300000.0300000.1400000.050000
114018820.0100000.0300000.0100000.1400000.050000
133005290.0000000.0400000.0000000.1700000.050000
151991760.0100000.0300000.0100000.1700000.060000
170978230.0300000.0300000.0100000.1600000.060000

Discussion

The implementation of a point in polygon computation using basic vector equations leads to a very useful algorithm which compares very well to the current algorithms that are in use. Moreover, we can scale the algorithm to higher dimensions by considering vector equations in these dimensions and modifying the algorithm accordingly.

We used a linear least squares fit for the values in Table 7 because the values for each point appear to show a linear relationship between the number of points and the execution times of the algorithm. We can infer that the algorithm running time is of linear order, but with an extremely small slope, which allows it to scale very well.

References

  • [1]Global administrative areas, gadm database of global administrative areas,version 2.0.www.gadm.org, 2012.www.gadm.org.
  • [2]DAlciatore and Rick Miranda.A winding number and point-in-polygon algorithm.Glaxo Virtual Anatomy Project Research Report, Department ofMechanical Engineering, Colorado State University, 1995.
  • [3]G.Bradski.The OpenCV Library.Dr. Dobb’s Journal of Software Tools, 2000.
  • [4]Sean Gillies etal.Shapely: manipulation and analysis of geometric objects, 2007–.
  • [5]Eric Haines.Graphics gems IV, chapter Point in polygon strategies, page24–46.1994.
  • [6]Jianqiang Hao, Jianzhi Sun, YiChen, Qiang Cai, and LiTan.Optimal reliable point-in-polygon test and differential codingboolean operations on polygons.Symmetry, 10:477, 2018.
  • [7]Kai Hormann and Alexander Agathos.The point in polygon problem for arbitrary polygons.Computational Geometry, 20(3):131–144, November 2001.
  • [8]Chong-Wei Huang and Tian-Yuan Shih.On the complexity of point-in-polygon algorithms.Computers and Geosciences, 23(1):109–118, 1997.
  • [9]Kelsey Jordahl, JorisVan den Bossche, Martin Fleischmann, Jacob Wasserman,James McBride, Jeffrey Gerard, Jeff Tratner, Matthew Perry, AdrianGarciaBadaracco, Carson Farmer, GeirArne Hjelle, AlanD. Snow, Micah Cochran, SeanGillies, Lucas Culbertson, Matt Bartos, Nick Eubank, maxalbert, AlekseyBilogur, Sergio Rey, Christopher Ren, Dani Arribas-Bel, Leah Wasser,LeviJohn Wolf, Martin Journois, Joshua Wilson, Adam Greenhall, ChrisHoldgraf, Filipe, and FranΓ§ois Leblanc.geopandas/geopandas: v0.12.2, 12 2022.
  • [10]G.Naresh Kumar and Mallikarjun Bangi.An extension to winding number and point-in-polygon algorithm.IFAC-PapersOnLine, 51(1):548–553, 2018.
  • [11]SiuKwan Lam, Antoine Pitrou, and Stanley Seibert.Numba: A llvm-based python jit compiler.In Proceedings of the Second Workshop on the LLVM CompilerInfrastructure in HPC, pages 1–6, 2015.
  • [12]WSirko, SKashubin, MRitter, AAnnkah, YBouchareb, YDauphin, DKeysers,MNeumann, MCisse, and JQuinn.Continental-scale building detection from high resolution satelliteimagery.Technical report, 2021.
  • [13]J.C. Tallack.Introduction to Vector Analysis.Cambridge University Press, 1970.Reprinted in 2009.
  • [14]Stefan Vander Walt, JohannesL SchΓΆnberger, Juan Nunez-Iglesias,FranΓ§ois Boulogne, JoshuaD Warner, Neil Yager, Emmanuelle Gouillart,and Tony Yu.scikit-image: image processing in python.PeerJ, 2:e453, 2014.
Point in polygon calculation using vector geometric methods with application to geospatial data. (2024)
Top Articles
Latest Posts
Article information

Author: Kerri Lueilwitz

Last Updated:

Views: 6043

Rating: 4.7 / 5 (47 voted)

Reviews: 86% of readers found this page helpful

Author information

Name: Kerri Lueilwitz

Birthday: 1992-10-31

Address: Suite 878 3699 Chantelle Roads, Colebury, NC 68599

Phone: +6111989609516

Job: Chief Farming Manager

Hobby: Mycology, Stone skipping, Dowsing, Whittling, Taxidermy, Sand art, Roller skating

Introduction: My name is Kerri Lueilwitz, I am a courageous, gentle, quaint, thankful, outstanding, brave, vast person who loves writing and wants to share my knowledge and understanding with you.