An Introduction To GIS Problems

By: Blake Thompson

About me:

Developer for the open source mapping library Mapnik


There is no magic bullet in GIS.

GIS IS HARD


Common Problems:

Examples:

Using WMS + Quick Moving Map

Generating Image Tiles On Demand from a Terabyte-Sized Database in PostGIS

Rendering a Polygon with 10 Million Points into a Single 256x256 Image Tile

Making good data for a specific application is hard.

Types of GIS Data:

Points

Stored as:

X, Y

Lines

Stored as:

Array of Xs, Ys

Array of Points

Lines have a direction.

Polygons

Stored as:

Array of Xs, Ys

or an:

Array of Arrays of Xs, Ys
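As a rough illustration, the three base types above could be held in plain Python structures like this. The layouts (tuples for points, lists of tuples for lines and rings, and "the first ring is the exterior" for multi-ring polygons) are assumptions for this sketch, not the format of any particular library or file type.

```python
# Illustrative layouts only; real formats (WKT, GeoJSON, shapefiles) differ in detail.

# Point: a single X, Y pair
point = (10.0, 20.0)

# Line: an ordered array of points; the order gives the line its direction
line = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
reversed_line = list(reversed(line))  # same shape, opposite direction

# Polygon as a single array of X, Y pairs (one ring)
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

# Polygon as an array of arrays (rings); here we assume the first ring is the
# exterior and any later rings are holes
square_with_hole = [
    [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    [(4.0, 4.0), (4.0, 6.0), (6.0, 6.0), (6.0, 4.0)],
]


def ring_count(polygon):
    """Return the number of rings, accepting either polygon layout above."""
    # A single ring is a list of tuples; a multi-ring polygon is a list of lists
    if polygon and isinstance(polygon[0], tuple):
        return 1
    return len(polygon)


print(ring_count(square))            # 1
print(ring_count(square_with_hole))  # 2
```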

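Polygons are either:

Implicitly Closed: the last point is assumed to connect back to the first

Explicitly Closed: the first point is repeated as the last point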

Closing Requirement Changes Depending on Data Format and Software
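A small sketch of converting between the two closing styles, using list-of-tuples rings as an assumed layout. GeoJSON (RFC 7946), for example, requires the first and last positions of a ring to be identical, so implicitly closed data would need something like `close_ring` before being written out. The helper names are hypothetical.

```python
def is_explicitly_closed(ring):
    """True if the first point is repeated as the last point."""
    return len(ring) > 1 and ring[0] == ring[-1]


def close_ring(ring):
    """Return an explicitly closed copy of a ring."""
    if is_explicitly_closed(ring):
        return list(ring)
    return list(ring) + [ring[0]]


def open_ring(ring):
    """Return an implicitly closed copy of a ring (repeated last point dropped)."""
    if is_explicitly_closed(ring):
        return list(ring[:-1])
    return list(ring)


triangle = [(0, 0), (4, 0), (0, 3)]
print(close_ring(triangle))             # [(0, 0), (4, 0), (0, 3), (0, 0)]
print(open_ring(close_ring(triangle)))  # [(0, 0), (4, 0), (0, 3)]
```

Both helpers are idempotent, so they are safe to call on data whose closing style is unknown.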

Winding order is the direction the path travels (clockwise or counterclockwise)

It is determined by Curve Orientation

Curve Orientation is determined by calculating the signed area of the polygon

The calculated area can be negative

Positive area is Counter Clockwise UNLESS your positive Y-axis is downward, then it is Clockwise

Negative area is Clockwise UNLESS your positive Y-axis is downward, then it is Counter Clockwise.
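The signed area is usually computed with the shoelace formula, A = 1/2 * sum over i of (x_i * y_(i+1) - x_(i+1) * y_i), with the last point wrapping around to the first. A minimal sketch, assuming rings are lists of (x, y) tuples and using a hypothetical `y_down` flag for screen-style coordinates:

```python
def signed_area(ring):
    """Shoelace formula: half the sum of cross products of consecutive points.

    The index wraps around, so implicitly closed rings work; for explicitly
    closed rings the repeated point just adds a zero term.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def orientation(ring, y_down=False):
    """Classify winding order from the sign of the area."""
    area = signed_area(ring)
    if area == 0:
        return "degenerate"  # zero area, e.g. all points collinear
    counter_clockwise = area > 0
    if y_down:
        # With the positive Y-axis pointing down, the meaning flips
        counter_clockwise = not counter_clockwise
    return "counterclockwise" if counter_clockwise else "clockwise"


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(signed_area(square))                  # 100.0
print(orientation(square))                  # counterclockwise
print(orientation(list(reversed(square))))  # clockwise
print(orientation(square, y_down=True))     # clockwise
```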

Polygons can have holes!

image

Holes are determined by Counting Curve Orientations!

image
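One way this can work, assuming a format where exterior rings and holes are wound in opposite directions: compute each ring's signed area and read its role from the sign. GeoJSON (RFC 7946) specifies counterclockwise exteriors and clockwise holes, while shapefiles use the opposite convention, hence the hypothetical `exterior_is_ccw` flag. A convenient side effect is that summing the signed areas subtracts the holes.

```python
def signed_area(ring):
    """Shoelace formula; positive means counterclockwise with Y pointing up."""
    n = len(ring)
    return sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
               for i in range(n)) / 2.0


def classify_rings(rings, exterior_is_ccw=True):
    """Label each ring 'exterior' or 'hole' from the sign of its area."""
    labels = []
    for ring in rings:
        is_ccw = signed_area(ring) > 0
        labels.append("exterior" if is_ccw == exterior_is_ccw else "hole")
    return labels


def filled_area(rings):
    """Assuming holes are wound opposite to their exterior, they carry the
    opposite sign, so the sum subtracts them automatically."""
    return abs(sum(signed_area(ring) for ring in rings))


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]  # counterclockwise
hole = [(4, 4), (4, 6), (6, 6), (6, 4)]        # clockwise
print(classify_rings([outer, hole]))           # ['exterior', 'hole']
print(filled_area([outer, hole]))              # 96.0
```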

Ways of determining where a polygon fills:

image
image
image
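Two widely used fill rules are even-odd (a point is inside if a ray from it crosses the boundary an odd number of times) and nonzero winding (a point is inside if the edges wind around it a nonzero number of times). The sketch below, assuming rings as lists of (x, y) tuples, computes both for a ray cast toward +X. It shows why orientation matters: a hole wound the same way as its exterior is still a hole under even-odd, but gets filled under nonzero.

```python
def crossings_and_winding(rings, px, py):
    """Cast a ray from (px, py) toward +X and inspect every edge it crosses.

    Returns (crossings, winding): the raw crossing count used by the even-odd
    rule and the signed count used by the nonzero rule.
    """
    crossings = 0
    winding = 0
    for ring in rings:
        n = len(ring)
        for i in range(n):
            x1, y1 = ring[i]
            x2, y2 = ring[(i + 1) % n]
            # Half-open test so a ray through a shared vertex is counted once
            if (y1 <= py) != (y2 <= py):
                x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                if x_cross > px:
                    crossings += 1
                    winding += 1 if y2 > y1 else -1  # upward edge +1, downward -1
    return crossings, winding


def inside_even_odd(rings, px, py):
    return crossings_and_winding(rings, px, py)[0] % 2 == 1


def inside_nonzero(rings, px, py):
    return crossings_and_winding(rings, px, py)[1] != 0


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]  # counterclockwise
hole_ccw = [(4, 4), (6, 4), (6, 6), (4, 6)]    # same direction as outer
hole_cw = list(reversed(hole_ccw))             # opposite direction

print(inside_even_odd([outer, hole_ccw], 5, 5), inside_nonzero([outer, hole_ccw], 5, 5))  # False True
print(inside_even_odd([outer, hole_cw], 5, 5), inside_nonzero([outer, hole_cw], 5, 5))    # False False
```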

Invalid Polygons:

Self intersections are typically the worst and most common problem in polygon data.

Most rendering problems come from invalid polygons.

Rendering polygons can make these problems easy to see

image
image
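A brute-force way to find self-intersections is to test every pair of non-adjacent edges of a ring with the standard orientation-based segment intersection test. This sketch assumes rings as lists of (x, y) tuples and is O(n^2), so it only suits small rings; libraries such as GEOS provide robust, faster validity checks.

```python
def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    val = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (val > 0) - (val < 0)


def on_segment(a, b, c):
    """For collinear c, True if c lies within the extent of segment a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 and segment q1-q2 share at least one point."""
    d1, d2 = orient(q1, q2, p1), orient(q1, q2, p2)
    d3, d4 = orient(p1, p2, q1), orient(p1, p2, q2)
    if d1 != d2 and d3 != d4:
        return True  # general case: each segment straddles the other's line
    # Collinear special cases: an endpoint lies on the other segment
    return ((d1 == 0 and on_segment(q1, q2, p1)) or (d2 == 0 and on_segment(q1, q2, p2))
            or (d3 == 0 and on_segment(p1, p2, q1)) or (d4 == 0 and on_segment(p1, p2, q2)))


def find_self_intersections(ring):
    """Return index pairs of non-adjacent edges that intersect (O(n^2) scan)."""
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # treat explicitly closed rings as implicitly closed
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    hits = []
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                continue  # adjacent edges always share a vertex
            if segments_intersect(*edges[i], *edges[j]):
                hits.append((i, j))
    return hits


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
bowtie = [(0, 0), (10, 10), (10, 0), (0, 10)]
print(find_self_intersections(square))  # []
print(find_self_intersections(bowtie))  # [(0, 2)]
```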

Collections:

Multi-geometries (MultiPoint, MultiLineString, MultiPolygon) are collections of the same base type tagged with the same metadata

Other collections, such as a GeometryCollection, can contain a variety of base types

Other GIS problems relate to how we find data within our systems

How do we find our data? Spatial Indexes

A fancy way of saying directions to your data

They help you find your way to one part of your data fast

So how does this work?

Step 1: Find Bounding Box of Data

image
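Step 1 in code is just a min/max scan over the coordinates. This sketch assumes a (min_x, min_y, max_x, max_y) tuple layout and includes the cheap box-overlap test an index relies on.

```python
def bounding_box(coords):
    """Return (min_x, min_y, max_x, max_y) for a sequence of (x, y) points."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


line = [(0, 0), (5, 5), (10, 0)]
print(bounding_box(line))                                  # (0, 0, 10, 5)
print(boxes_intersect(bounding_box(line), (8, 4, 12, 9)))  # True
```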

Step 2: Make a Tree Structure

How? Create the tree by making bounding boxes of your bounding boxes: group nearby boxes from your data, wrap each group in a larger box, and repeat until a single root box remains

image
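A very small version of this idea: wrap each item in its box, group a handful of nearby boxes under a parent box, and repeat until one root box remains. Grouping here is done by sorting on the box center's X coordinate, which is a simplification; real R-tree bulk loaders such as Sort-Tile-Recursive use both axes. The names `Node`, `build_tree` and `query` are hypothetical, and the `box_checks` counter exists only to show how many boxes a query actually touches.

```python
def bbox_union(boxes):
    """Smallest box containing all the given (min_x, min_y, max_x, max_y) boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def boxes_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, bbox, children=None, item=None):
        self.bbox = bbox
        self.children = children or []  # empty for leaf entries
        self.item = item                 # set only on leaf entries


def build_tree(entries, max_children=4):
    """entries: non-empty list of (item_id, bbox); item_id must not be None."""
    level = [Node(bbox, item=item) for item, bbox in entries]
    while len(level) > 1:
        # Sort by box center X so nearby boxes tend to share a parent
        level.sort(key=lambda node: (node.bbox[0] + node.bbox[2]) / 2)
        level = [Node(bbox_union([c.bbox for c in level[i:i + max_children]]),
                      children=level[i:i + max_children])
                 for i in range(0, len(level), max_children)]
    return level[0]


def query(node, box, stats):
    """Return item ids whose boxes intersect `box`, pruning whole branches."""
    stats["box_checks"] += 1
    if not boxes_intersect(node.bbox, box):
        return []  # nothing below this node can match
    if node.item is not None:
        return [node.item]
    found = []
    for child in node.children:
        found.extend(query(child, box, stats))
    return found


# 100 unit boxes laid out in a row along the X axis
entries = [(i, (i, 0, i + 1, 1)) for i in range(100)]
root = build_tree(entries)
stats = {"box_checks": 0}
print(sorted(query(root, (10.5, 0.2, 12.5, 0.8), stats)))  # [10, 11, 12]
print(stats["box_checks"] < 100)                           # True
```

A poorly grouped tree (for example, one where every parent box spans the whole dataset) would force the query down every branch, which is the slow case described next.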

Performance depends on how fast you can find data in the tree

Once you narrow down to a section of a tree, you still must iterate through every bounding box in that branch to check for intersection.

Next, you must iterate through EVERY point in any candidate line or polygon to make sure that it actually intersects

This is done as part of checking for intersection (for example, finding all data at a point)

This can be very slow if the tree is not well organized

This also means that finding what intersects a polygon in your tree can be much slower, because you must iterate through that polygon repeatedly
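The two phases described above (a cheap bounding-box filter, then an exact test that walks every vertex of each candidate) can be sketched as follows. This assumes polygons stored as a dict mapping hypothetical ids to single rings, uses even-odd ray casting for the exact step, and skips the tree entirely so the refine cost is easy to see; `vertices_visited` counts that cost.

```python
def bounding_box(ring):
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def point_in_ring(ring, px, py):
    """Even-odd ray casting: walks every edge of the ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 <= py) != (y2 <= py):
            if x1 + (py - y1) * (x2 - x1) / (y2 - y1) > px:
                inside = not inside
    return inside


def features_at_point(polygons, px, py):
    """Return (ids containing the point, vertices walked during refinement)."""
    hits, vertices_visited = [], 0
    for fid, ring in polygons.items():
        # Computed on the fly here; an index would store these boxes ahead of time
        min_x, min_y, max_x, max_y = bounding_box(ring)
        # Filter: the cheap box test
        if not (min_x <= px <= max_x and min_y <= py <= max_y):
            continue
        # Refine: the box only says "maybe", so walk every vertex to be sure
        vertices_visited += len(ring)
        if point_in_ring(ring, px, py):
            hits.append(fid)
    return hits, vertices_visited


polygons = {
    "triangle": [(0, 0), (10, 0), (0, 10)],  # box contains (8, 8), shape does not
    "small_square": [(5, 5), (9, 5), (9, 9), (5, 9)],
    "far_square": [(20, 20), (30, 20), (30, 30), (20, 30)],
}
print(features_at_point(polygons, 8, 8))  # (['small_square'], 7)
```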

Common Problems in Spatial Indexes:

More points in a polygon means more time checking for intersections.

This is worse in polygons with lots of holes!

Lots of overlapping polygons cause big tree branches.

Multipolygons, multilines, or multipoints! Each has only one bounding box for all of its parts!
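A quick illustration of why one bounding box per multi-geometry hurts: two small, far-apart parts produce a single box that is mostly empty space, so queries anywhere between them become candidates that must be refined. The coordinates are made up for the example; splitting multi-geometries into single parts before indexing is one common mitigation.

```python
def bounding_box(coords):
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


# Hypothetical multipolygon: two 1x1 islands at opposite corners of a 100x100 area
island_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
island_b = [(99, 99), (100, 99), (100, 100), (99, 100)]
multipolygon = [island_a, island_b]

# One box for the whole multipolygon versus one box per part
whole = bounding_box([pt for part in multipolygon for pt in part])
parts_area = sum(box_area(bounding_box(part)) for part in multipolygon)
print(box_area(whole))  # 10000
print(parts_area)       # 2
```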

Make your indexes work for you!

You might have too much data! Break apart your geometry data first by some metadata.

Breaking data out into separate pieces of space is the magic of Vector Tiles.
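As a sketch of splitting data into tiles of space, the common XYZ ("slippy map") tiling scheme in Web Mercator maps a longitude/latitude to a tile column and row at zoom level z. The function below only buckets points by tile; real vector tile generation also clips lines and polygons to each tile's bounds, which this sketch does not attempt. `tile_for` and `bucket_by_tile` are hypothetical names.

```python
import math
from collections import defaultdict

MAX_LAT = 85.0511287798  # Web Mercator cuts off near +/-85.05 degrees


def tile_for(lon, lat, zoom):
    """Return the (x, y) XYZ tile containing a WGS84 lon/lat at this zoom."""
    n = 2 ** zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or the latitude limit stays on the grid
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def bucket_by_tile(points, zoom):
    """Group (lon, lat) points by the tile they fall in."""
    tiles = defaultdict(list)
    for lon, lat in points:
        tiles[tile_for(lon, lat, zoom)].append((lon, lat))
    return dict(tiles)


points = [(0.5, 0.5), (1.0, 1.0), (-10.0, -10.0)]
print(bucket_by_tile(points, 1))
# {(1, 0): [(0.5, 0.5), (1.0, 1.0)], (0, 1): [(-10.0, -10.0)]}
```

Each bucket can then be indexed and rendered on its own, which keeps any single index small.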

Simplify data whenever possible!
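One widely used simplification technique (not necessarily what any given tool uses) is the Ramer-Douglas-Peucker algorithm: keep the endpoints, find the point farthest from the line between them, and recurse on both halves only if that distance exceeds a tolerance. A sketch for open lines, assuming (x, y) tuples and a tolerance in the same units as the coordinates:

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / math.dist(a, b)


def douglas_peucker(points, tolerance):
    """Simplify an open line, keeping points that deviate more than `tolerance`."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the line through the endpoints
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        return [points[0], points[-1]]  # everything in between is close enough
    # Keep the farthest point and simplify each side independently
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


noisy = [(0, 0), (1, 0.1), (2, -0.1), (3, 0.05), (4, 0)]
print(douglas_peucker(noisy, 0.5))                         # [(0, 0), (4, 0)]
print(douglas_peucker([(0, 0), (5, 0.1), (10, 10)], 1.0))  # [(0, 0), (5, 0.1), (10, 10)]
```

Applied naively to polygon rings, this kind of simplification can create self-intersections, so it is worth re-checking validity afterward.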

Expect Gotchas!

Don't expect one solution to solve everything for you.

Breaking out detailed data into Vector Tiles might make display easier, but may not solve your Nearest Neighbor problem.

Iterate! You learn more from failure.

Systems grow and problems WILL change.

Always think how you can:

Simplify your code, processes or system.

Test Everything.

Mapnik has over 2,000 visual test cases and over 10,000 assertions in unit tests

AVOID MAKING GIS HARDER

Stop doing this:

image