Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Fixing invalid geometry #652

Closed
dr-jts opened this issue Dec 13, 2020 · 29 comments
Closed

RFC: Fixing invalid geometry #652

dr-jts opened this issue Dec 13, 2020 · 29 comments
Assignees

Comments

@dr-jts
Copy link
Contributor

dr-jts commented Dec 13, 2020

It is desirable to have a way to convert an invalid geometry into a valid one that preserves the character of the input (to some extent). This allows the fixed geometry to be used in JTS operations.

See GeometryFixer prototype code.

Goals

  • Allow converting any geometry to one which is valid, preserving as much as possible of the original appearance

Non-Goals

  • Preserving all vertices in input - some vertices may be dropped to produce a "natural" repair
  • Precision reduction - can be done subsequent to processing
  • Normalization - can be done subsequent to processing
  • Simplification - can be done before processing

Motivating Cases

Descriptions use following terminology: "touch" means touches in a point, "adjacent" means touches along a line.

See XML file for example data test cases.

  • Bow-tie polygon
  • Bow-tie polygon with multiple crossings
  • Checkerboard polygon
  • Polygon with zero area and collapses to a line
  • Polygon with zero area and collapses to a point
  • Polygon with portion of the shell boundary which collapses to a line (internal or external)
  • Polygon with portion of the shell boundary which collapses to a line and encloses area
  • Polygon which overlaps itself
  • Polygon with linework separating it into two adjacent regions (i.e cutting it in two)
  • Polygon with collapsed linework that encloses area (i.e. with an inverted boundary that is adjacent)
  • Polygon with overlapping holes (intersecting in points)
  • Polygon with overlapping holes (intersecting along edges)
  • Polygon with adjacent holes
  • Polygon with holes forming a separated region
  • Polygon with hole adjacent to shell
  • Polygon with holes adjacent to shell and separating a region
  • Polygon with hole overlapping shell
  • Polygon with hole surrounding shell
  • Polygon with bowtie hole overlapping shell - intersection inside
  • Polygon with bowtie hole overlapping shell - intersection outside
  • Polygon with nested holes - disjoint
  • Polygon with nested holes - touching
  • Polygon with nested holes - adjacent
  • MultiPolygon with overlapping elements
  • MultiPolygon with adjacent elements
  • MultiPolygon with polygon with hole overlapping shell and other polygon
  • GeometryCollection with multiple invalid elements
  • Line that collapses to a point
  • Geometry with NaN ordinates

Issues

  • In PostGIS the semantic is to preserve all input vertices if possible. To do this, collapsed edges in polygonal inputs may result in output containing LineStrings. Is this desirable? Perhaps should be an alternate mode?

Semantics

  • Result geometry is always a new object
  • Validity is not checked - input geometry is always processed
    • A validity check can be performed by client, if needed
    • Valid inputs will still have minor transformations applied
  • Empty atomic geometries are valid and are returned unchanged (copied)
  • Remove invalid coordinates (i.e. non-finite X or Y: NaN, Inf, -Inf)
  • Remove repeated points in sequences
  • Point - keep valid coordinate, or EMPTY
  • LineString - fix coordinate list. If left with 1 or 0 coordinates then EMPTY
  • LinearRing - fix coordinate list. If valid return as a ring, otherwise return as a LineString.
  • Polygon- transform into a valid polygon, preserving as much of the extent and vertices as possible
  • MultiPolygon- fix each element, then ensure collection is non-overlapping (via union)
  • GeometryCollection- fix each element

Polygon Repair

There are two approaches for repairing self-intersecting polygons. Both of them start by forming the fully noded arrangement of the polygon linework.

  • the Even-Odd strategy treats every delimited region as a potential interior or exterior area, alternating inwards from the exterior.
  • the Winding Number strategy uses the winding number of each delimited region to determine interior/exterior.
    • this can be extended to use both winding number polarities to determine interior. This has the effect of identifying all "lobes" as interior areas.

See Subramaniam for further explanation.

The nominal shell and hole ring(s) are processed separately, and then the results are combined via a different operation to produce the final result.

A further refinement is to identify holes which are disconnected from (outside) the shell, and treat them separately as additional shells. This aligns with the above semantic of treating all linework lobes as polygons interiors.

Processing Options

The following semantic options may be provided as required.

  • Empty elements in Collections:
    • Option 1: (default) remove them
    • Option 2: keep them
  • Collapsed geometry (keepCollapsed)
    • Option 1: (default) remove them. Result has same dimension as input.
    • Option 2: convert to a representative lower-dimension geometry (note this may result in mixed collections)
  • Single-element collections
    • Option 1: (default) convert to atomic geometry
    • Option 2: Keep result as a single-element collection
    • LinearRing fixing
    • An alternative is to fix linework, node and return all resulting rings.
      • This can be done by converting ring to polygon, fixing polygon, and extracting rings

Prior Art

Algorithm

Polygons

  • For each Polygon ring
    • Convert self-intersecting ring to a valid set of non-adjacent rings, using the Winding Number of enclosed regions
      • This can be done by combining the results of two buffer(0) operations using opposite choices for winding number polarity
  • Disconnected outside holes are converted to shells (and do not participate in subsequent hole removal)
  • Union all repaired holes
  • Difference holes from original shell Polygon
  • Combine with converted outside holes

MultiPolygons

  • Fix each element polygon
  • Union fixed polygons
@mukoki
Copy link
Contributor

mukoki commented Dec 15, 2020

@dr-jts
Copy link
Contributor Author

dr-jts commented Dec 15, 2020

That's a very interesting paper (especially because it talks about JTS and GEOS!).

However, I'm not convinced that a CDT is the best way to fix invalid polygons. One obvious issue is that it requires writing a CDT algorithm, which is a big undertaking. And it's not clear that the complexity provides any better results than the node-polygonize approach above (and likely has worse performance).

The examples of invalidity in the paper are useful. I wonder if that data is available?

@mukoki
Copy link
Contributor

mukoki commented Dec 15, 2020

Didn't JTS already include a CDT algorithm ?
There is a repo on github with a few examples of invalid polygons: https://github.com/tudelft3d/prepair

The approach you propose seems not too far from OpenJUMP's MakeValidOp ;-).
Seems that you consider that shell vs hole information is reliable. What if you get a Polygon with a hole enclosing the shell. Should the hole "erases" the polygon or should we consider that shell and hole must be swapped ?

@dr-jts
Copy link
Contributor Author

dr-jts commented Dec 15, 2020

Didn't JTS already include a CDT algorithm ?

Nope, it has Conforming DT, which is definitely not what is needed.

The whole CDT approach just seems way more complicated than needed, and I don't think it answers the hard questions about how to create valid polygons.

There is a repo on github with a few examples of invalid polygons: https://github.com/tudelft3d/prepair

Great, could be handy.

The approach you propose seems not too far from OpenJUMP's MakeValidOp ;-).

No doubt!

Seems that you consider that shell vs hole information is reliable. What if you get a Polygon with a hole enclosing the shell. Should the hole "erases" the polygon or should we consider that shell and hole must be swapped ?

Yes, that's a good example. Either could be considered "right". Although it does feel suspect to convert to an empty geometry - that throws away all the input information. I've added this case to the examples list. The approach is definitely subject to change.

There's a similar question about a hole surrounding another hole. Should the inner hole turn into an island, or disappear?

@dr-jts dr-jts changed the title RFC: MakeValid RFC: Fixing invalid geometry Mar 16, 2021
@kleunen
Copy link

kleunen commented Jun 13, 2021

I have developed a polygon correct approach (remove self-intersection, correct order, correct inners), based on boost geometry:
https://github.com/kleunen/boost_geometry_correct

The approach for calculating and removing self-intersection is vastly simpler and more efficient than existing libraries. It was developed for tilemaker:
https://github.com/systemed/tilemaker

But can be used for other projects as well.

@dr-jts
Copy link
Contributor Author

dr-jts commented Jun 13, 2021

I have developed a polygon correct approach (remove self-intersection, correct order, correct inners), based on boost geometry:
The approach for calculating and removing self-intersection is vastly simpler and more efficient than existing libraries.

Thanks, this looks interesting.

@kleunen
Copy link

kleunen commented Jun 13, 2021

What is approach that is used now for removing self-intersection from polygons ?
I looked through the code, but could not find an approch that is used.

@dr-jts
Copy link
Contributor Author

dr-jts commented Jun 13, 2021

What is approach that is used now for removing self-intersection from polygons ?

The buffer-by-zero heuristic is used to turn self-intersecting rings into valid geometry. Code is here. I this produces a similar semantic to the one advocated in the Boost blog post you referenced.

@kleunen
Copy link

kleunen commented Jun 13, 2021

Ah yes. It is common and does work well in many practical cases. But in cases with holes. It will not generate the second polygon. The approach i took does generate these multipolygons.

@dr-jts
Copy link
Contributor Author

dr-jts commented Jun 13, 2021

But in cases with holes. It will not generate the second polygon. The approach i took does generate these multipolygons.

Not sure what you mean - do you have an example?

Note the buffer-by-zero call is just the lowest level of GeometryFixer, to fix invalid single rings. For Polygons with holes and MultiPolygons each ring is processed separately, and then recombined using difference and union.

@dr-jts
Copy link
Contributor Author

dr-jts commented Jun 13, 2021

@kleunen I think GeometryFixer handles the boost_geometry_correct examples in the same way:

POLYGON ((5 0, 2.5 9, 9.5 3.5, 0.5 3.5, 7.5 9, 5 0))

image

POLYGON ((55 10, 141 237, 249 23, 21 171, 252 169, 24 89, 266 73, 55 10))

image

POLYGON ((0 0, 10 0, 0 10, 10 10, 0 0, 5 0, 5 10, 0 10, 0 5, 10 5, 10 0, 0 0))

image

@kleunen
Copy link

kleunen commented Jun 13, 2021

Yes that is the same. I still have to add rebuilding with difference and union. Boost geometry buffer does not allow buffer with 0. So maybe there is difference in implementation there.

@kleunen
Copy link

kleunen commented Jun 13, 2021

Boost buffer add many new points. It seems jts buffer does not. Or is simplification run afterwards?

@dr-jts
Copy link
Contributor Author

dr-jts commented Jun 13, 2021

Boost geometry buffer does not allow buffer with 0. So maybe there is difference in implementation there.

Using buffer-by-zero was just a quick way to get this functionality rolled out (since it is available in JTS). You could duplicate that semantic by using a node-and-build-topology approach, of course. But then you have to implement the noding and topology graph building code, which is non-trivial.

In fact I would prefer the GeometryFixer implementation to not depend on buffer, since that introduces substantial complexity and might cause breakages at some point (although buffer has been stable for a long time). But that's a fairly major project, so not sure when it will get done.

@dr-jts
Copy link
Contributor Author

dr-jts commented Jun 13, 2021

Boost buffer add many new points. It seems jts buffer does not. Or is simplification run afterwards?

Do you mean repeated points? If so, that's a result of de-duplication in the buffer and overlay functions.

@kleunen
Copy link

kleunen commented Jun 13, 2021

Here you can find discussion regarding dissolve/make_valid approach for tilemaker:
systemed/tilemaker#246

No, with buffer i get many additional points around the corners. To "round" them. For every corner on polygon you get 3 dots, maybe this is implementation of boost geometry. JTS implementation probably more efficient, generating less support points, because these additional points are not needed.

@dr-jts
Copy link
Contributor Author

dr-jts commented Jun 13, 2021

No, with buffer i get many additional points around the corners. To "round" them. For every corner on polygon you get 3 dots, maybe this is implementation of boost geometry. JTS implementation probably more efficient, generating less support points, because these additional points are not needed.

Yes, that must be Boost geometry buffer behaviour. In the JTS buffer algorithm, buffer distance = 0 is treated as a special case, and the original geometry is used as the offset curve with no other modification. (In contrast, a buffer distance >0 invokes several heuristics to simplify the input linework, and then generates buffer fillets and endcaps as per the input parameters).

@dr-jts
Copy link
Contributor Author

dr-jts commented Jun 13, 2021

@kleunen One question I have about your algorithm (which I presume also applies to the papers mentioned, although I haven't verified that): how robust is it? Numerical precision limitations can cause nodes computed in a line arrangement to "jump" across one another, which leads to inconsistent topology. JTS contains a lot of code to handle this situation (in particular, buffer checks for consistent noding and falls back to using snap-rounding if it occurs).

However, perhaps this is a case where the effects aren't so severe - although I would be surprised if that was the case.

Unfortunately it's hard to reproduce robustness issues, and they tend to manifest differently for each individual algorithm. Generally the only way to find them is by running many real-world examples. And to detect failure cases, you need an independent way to verify the correctness of computed outputs - which is usually not easy to create. For this fix-invalid-geometry situation, you would need a way not only to verify that the computed result is valid, but that it's valid in a sensible way, according to the desired semantics of the algorithm (since otherwise trivial or wildly malformed output that happened to be valid would be accepted).

@kleunen
Copy link

kleunen commented Jun 14, 2021

There are no robustness analysis in the papers. The approach basicly reuses the existing points on the ring and add additional points on the ring on the intersections.

One thing i noticed you start traversing this ring and then follow the intersections to detect the non intersecting sub polygons. When doing this, you have to detect you are back at the starting point. For this i used a distance between the current point and the start point. If you do not detect this case, the algorithm keeps traversing the same ring infinitly.

https://github.com/kleunen/boost_geometry_correct/blob/main/correct.hpp#L230

Also, there is no proof that by generating these pseudo vertices you can follow the subrings and always end up at the starting point. It is like you says. It works in practice but it is not proven you can not end up in a subring which does not have the starting point. And thus start looping infinitly.

Maybe this will make the algorithm more robust. Instead of detecting you are returned at the starting point, you should stop following the ring if you return to a point you already visited.

Moreover duplicate points should be filteren out. But this is same for other algorithms.

@kleunen
Copy link

kleunen commented Jun 14, 2021

In the case of multiple intersections at the same point, there is a choice to be made to which edge to follow, see below:
image

On the left case, I start at the intersection which was the first on the original ring, and then I follow the intersection which leads back the fastest to the end of the original ring. This leads back to the starting point.

On the right case, i take an edge to a different sub-ring. Getting stuck in this top-left ring, never being able to return to the original ring and starting point, without following a self-intersecting point.

There is no proof, why the first approach always works and why i do not need to perform an exhaustive search at an intersection to find out which route to take to get on the ring which leads back to the starting point. I think this is because if you look at where the intersections are on the position of the original ring, they are actually nested, the subrings.

@kleunen
Copy link

kleunen commented Jun 14, 2021

Also note I calculate "where" the intersections are on the original edge of the ring:
https://github.com/kleunen/boost_geometry_correct/blob/main/correct.hpp#L118-L119

This is used to order them such, that they are ordered according to how they appear on the original ring:
https://github.com/kleunen/boost_geometry_correct/blob/main/correct.hpp#L58-L71

So possibly, numerical stability might cause two intersection to swap places, but I think this is not an issue, because they are ordered according to the comparable distance on the original edge.

@kleunen
Copy link

kleunen commented Jun 14, 2021

Validating the output is indeed laborious. If the algorithms drops an invalid inner or all inners or returns an empty polygon or multi polygon. The geometry is valid, but clearly is not acceptable. But how many nodes and inners should be in the output? You can not really calculate and automatically test. But i guess you are very familiar with these difficulties.

@dr-jts
Copy link
Contributor Author

dr-jts commented Jun 14, 2021

So possibly, numerical stability might cause two intersection to swap places, but I think this is not an issue, because they are ordered according to the comparable distance on the original edge.

The core problem is that the interesection point of two nearly-collinear lines might be computed to be the opposite side of another line which crosses the two lines. This leads to the computed noded line arrangement still having non-noded intersections. The only solution I've found to handle this is to introducing snapping into the noding subsystem.

@dr-jts
Copy link
Contributor Author

dr-jts commented Jun 14, 2021

Validating the output is indeed laborious. If the algorithms drops an invalid inner or all inners or returns an empty polygon or multi polygon. The geometry is valid, but clearly is not acceptable. But how many nodes and inners should be in the output? You can not really calculate and automatically test. But i guess you are very familiar with these difficulties.

One way to validate is to use a "point probe" algorithm, which is completely different and simpler, and hence a good guarantee of correctness. This involves generating points near every section of input and output linework, and then determining whether they are correctly placed relative to the original linework. If the Non-Zero Winding number rule is used it should be possible to determine the winding number for a point and then confirm it has an appropriate value. This is compute-intensive but relatively straightforward - which computers are good at!

@kleunen
Copy link

kleunen commented Jun 15, 2021

So possibly, numerical stability might cause two intersection to swap places, but I think this is not an issue, because they are ordered according to the comparable distance on the original edge.

The core problem is that the interesection point of two nearly-collinear lines might be computed to be the opposite side of another line which crosses the two lines. This leads to the computed noded line arrangement still having non-noded intersections. The only solution I've found to handle this is to introducing snapping into the noding subsystem.

Yes i think the approach in boost geometry is to simply ignore this. Going the approach that double float or even int have a limit on precision and near this limit issues might occur. The user can always pre snap the geometry before applying a make valid if he/she needs this precision.

But i guess that is a matter of preference. This can occur clearly, because the intersection detection is needed. So if you need this to work correctly, you can pre-snap the nodes. But once the intersections are detected, you only need to detect if you are returned to the starting point when walking the sub-ring. For this, you can use a threshold as well.

@kleunen
Copy link

kleunen commented Jun 15, 2021

The NZW or odd-even can be configured/adjusted. The inner rings are detected, i just convert the orientation before performing the union. The ring detection generates three rings, which need to be combined:

image
1 = CW, 2 = CW, 3 = CCW

I now apply NZW:
image

image

@13661172102
Copy link

i have an multipolygon, wkt like this :
MULTIPOLYGON (((104.84831240300008 26.577005471000064, 104.84889922500008 26.57747835400005, 104.84948523000008 26.576550350000048, 104.84877030700005 26.57614817800004, 104.84817763600006 26.577031617000046, 104.84889922500008 26.57747835400005, 104.84831240300008 26.577005471000064, 104.84886598700007 26.57734819600006, 104.84935045000009 26.57658099500003, 104.84880356000008 26.576273349000076, 104.84831240300008 26.577005471000064)))

image

when i use GeometryFixer, it bacame to this :
POLYGON ((104.84889922500008 26.57747835400005, 104.84948523000008 26.576550350000048, 104.84877030700005 26.57614817800004, 104.84817763600006 26.577031617000046, 104.84889922500008 26.57747835400005))

image

while use postgis, it bacame to this:

GEOMETRYCOLLECTION (POLYGON ((104.84948523000008 26.576550350000048, 104.84877030700005 26.57614817800004, 104.84817763600006 26.577031617000046, 104.84889922500008 26.57747835400005, 104.84948523000008 26.576550350000048),
(104.84880356000008 26.576273349000076, 104.84935045000009 26.57658099500003, 104.84886598700007 26.57734819600006, 104.84831240300008 26.577005471000064, 104.84880356000008 26.576273349000076)),
LINESTRING (104.84831240300008 26.577005471000064, 104.84889922500008 26.57747835400005))

image

i saw this issue:
In PostGIS the semantic is to preserve all input vertices if possible. To do this, collapsed edges in polygonal inputs may result in output containing LineStrings. Is this desirable? Perhaps should be an alternate mode?

for me , i think the one which postgis maked is correct. and an alternate mode will make it more adaptable in a lot of scenarios.

@dr-jts
Copy link
Contributor Author

dr-jts commented May 13, 2024

Fixed by #704

@dr-jts dr-jts closed this as completed May 13, 2024
@dr-jts dr-jts reopened this May 13, 2024
@dr-jts dr-jts closed this as completed May 13, 2024
@13661172102
Copy link

13661172102 commented May 13, 2024 via email

luontola added a commit to luontola/territory-bro that referenced this issue Aug 18, 2024
Why:
- Geometry union will fail if the geometries are not valid. A typical
  cause of invalid geometry is a polygon with a self-intersection. JTS
  will then throw an exception like the following:

org.locationtech.jts.geom.TopologyException: found non-noded intersection between LINESTRING ( 0 0, 1 1 ) and LINESTRING ( 1 0, 0 1 ) [ (0.5, 0.5, NaN) ]
 at org.locationtech.jts.noding.FastNodingValidator.checkValid (FastNodingValidator.java:139)
    org.locationtech.jts.geomgraph.EdgeNodingValidator.checkValid (EdgeNodingValidator.java:80)
    org.locationtech.jts.geomgraph.EdgeNodingValidator.checkValid (EdgeNodingValidator.java:45)
    org.locationtech.jts.operation.overlay.OverlayOp.computeOverlay (OverlayOp.java:229)
    org.locationtech.jts.operation.overlay.OverlayOp.getResultGeometry (OverlayOp.java:181)
    org.locationtech.jts.operation.overlay.OverlayOp.overlayOp (OverlayOp.java:84)
    org.locationtech.jts.operation.overlay.snap.SnapIfNeededOverlayOp.getResultGeometry (SnapIfNeededOverlayOp.java:75)
    org.locationtech.jts.operation.overlay.snap.SnapIfNeededOverlayOp.overlayOp (SnapIfNeededOverlayOp.java:37)
    org.locationtech.jts.geom.GeometryOverlay.overlay (GeometryOverlay.java:76)
    org.locationtech.jts.geom.GeometryOverlay.union (GeometryOverlay.java:157)
    org.locationtech.jts.geom.Geometry.union (Geometry.java:1369)
    territory_bro.gis.geometry$union$invoke__Geometry__DOT_union__3537657.invoke (geometry.clj:49)
    ...
    territory_bro.gis.geometry$union.invokeStatic (geometry.clj:51)
    territory_bro.gis.geometry$union.invoke (geometry.clj:49)

- JTS contains a GeometryFixer class which can be used to fix invalid
  geometries automatically. An older fix was to call Geometry.buffer(0),
  but that will lose part of the polygon. Related issues:
    locationtech/jts#657
    locationtech/jts#652
- This commit also improves logging when there is a crash in the
  projections. Debugging them is easier when the error message shows
  which event caused the crash.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants