By: Blake Thompson
About me:
Developer for Open Source Mapping Library Mapnik
There is no magic bullet in GIS.
GIS IS HARD
Common Problems:
Examples:
Using WMS + Quick Moving Map
Generating Image Tiles from TB DB in PostGIS on demand
10 million points in a Polygon for a 256x256 Image Tile
Making good data for a specific application is hard.
Types of GIS Data:
Stored as:
X, Y
Stored as:
Array of Xs, Ys
Array of Points
Lines have a direction.
Stored as:
Array of Xs, Ys
or an:
Array of Arrays of Xs, Ys
Polygons are:
Implicitly Closed:
Explicitly Closed:
Closing Requirement Changes Depending on Data Format and Software
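A minimal sketch of the two closing conventions, assuming a ring is a plain list of (x, y) tuples. The helper names are hypothetical and not taken from any library. GeoJSON, for example, requires explicitly closed rings, so a converter along these lines is often needed when moving data between formats.

```python
def is_explicitly_closed(ring):
    """True when the first and last vertices are the same point."""
    return len(ring) > 1 and ring[0] == ring[-1]


def close_ring(ring):
    """Return an explicitly closed copy: the first vertex is repeated at the end."""
    if not ring:
        return []
    return list(ring) if is_explicitly_closed(ring) else list(ring) + [ring[0]]


def open_ring(ring):
    """Return an implicitly closed copy: the repeated final vertex is dropped."""
    return list(ring[:-1]) if is_explicitly_closed(ring) else list(ring)


square = [(0, 0), (4, 0), (4, 4), (0, 4)]   # implicitly closed
closed = close_ring(square)                 # explicitly closed, 5 vertices
```

Both helpers are idempotent, so they can be applied without first checking which convention the incoming data follows.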
Winding order is the direction the path travels
Is determined by Curve Orientation
Curve Orientation is determined by calculating the signed area of the polygon
The area calculated can be negative
Positive area is Counter Clockwise UNLESS your positive Y-axis is downward, then it is Clockwise
Negative area is Clockwise UNLESS your positive Y-axis is downward, then it is Counter Clockwise.
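The rules above can be sketched with the shoelace formula for signed area. This is an illustration only: the function names and the `y_axis_down` flag are assumptions made for the example, and a ring is assumed to be a list of (x, y) tuples.

```python
def signed_area(ring):
    """Shoelace formula. The sign of the result encodes the curve orientation."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]   # wraps around, so open rings work too
        total += x1 * y2 - x2 * y1
    return total / 2.0


def winding_order(ring, y_axis_down=False):
    """Map the sign of the area to a winding order, flipping it for y-down axes."""
    area = signed_area(ring)
    if area == 0:
        return "degenerate"
    counter_clockwise = area > 0
    if y_axis_down:                  # e.g. image or screen coordinates
        counter_clockwise = not counter_clockwise
    return "counter-clockwise" if counter_clockwise else "clockwise"


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(signed_area(square))                      # 16.0
print(winding_order(square))                    # counter-clockwise
print(winding_order(square, y_axis_down=True))  # clockwise
```

The same ring reports the opposite winding order once the Y-axis points down, which is why data that renders correctly in one coordinate system can appear to have reversed rings in another.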
Polygons can have holes!
Holes are determined by Counting Curve Orientations!
Ways of determining where a polygon fills:
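The list that followed this slide is not preserved here. As an assumption, the sketch below shows two widely used fill rules, non-zero winding and even-odd, to illustrate how counting curve orientations decides whether a ring acts as a hole. All function names are hypothetical, and points exactly on an edge are not handled specially.

```python
def _is_left(a, b, p):
    """Positive when p lies to the left of the directed line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])


def winding_number(point, rings):
    """Sum of signed crossings: counter-clockwise rings add, clockwise subtract."""
    wn = 0
    for ring in rings:
        n = len(ring)
        for i in range(n):
            a, b = ring[i], ring[(i + 1) % n]
            if a[1] <= point[1]:
                if b[1] > point[1] and _is_left(a, b, point) > 0:
                    wn += 1          # upward edge with the point on its left
            elif b[1] <= point[1] and _is_left(a, b, point) < 0:
                wn -= 1              # downward edge with the point on its right
    return wn


def crossing_count(point, rings):
    """Number of edges crossed by a ray from the point towards +x."""
    px, py = point
    count = 0
    for ring in rings:
        n = len(ring)
        for i in range(n):
            (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
            if (y1 > py) != (y2 > py):
                x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                if x_cross > px:
                    count += 1
    return count


def filled_nonzero(point, rings):
    return winding_number(point, rings) != 0


def filled_even_odd(point, rings):
    return crossing_count(point, rings) % 2 == 1


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]    # counter-clockwise
inner_cw = [(3, 3), (3, 7), (7, 7), (7, 3)]     # clockwise, opposite to outer
inner_ccw = list(reversed(inner_cw))            # same direction as outer
```

With `inner_cw` the point (5, 5) is a hole under both rules. With `inner_ccw` it is still a hole under even-odd but is filled under non-zero, which is one way the same data can render differently in different software.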
Invalid Polygons:
Self intersections are typically the worst and most common problem in polygon data.
Most of the problems with rendering are from invalid polygons.
Rendered polygons can make it easy to see problems
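Rendering is one way to spot problems. A brute-force check is another. The sketch below tests every pair of non-adjacent edges of a ring for a proper crossing. It is an illustrative O(n^2) approach with hypothetical function names, not the validity check used by any particular library, and it does not detect edges that merely touch or overlap collinearly.

```python
def _orient(a, b, c):
    """Sign of the turn a -> b -> c: 1 left, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _segments_cross(p1, p2, p3, p4):
    """True when the two segments cross at a single interior point."""
    return (_orient(p1, p2, p3) * _orient(p1, p2, p4) < 0
            and _orient(p3, p4, p1) * _orient(p3, p4, p2) < 0)


def has_self_intersection(ring):
    pts = list(ring)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]               # work on the implicitly closed form
    n = len(pts)
    for i in range(n):
        for j in range(i + 1, n):
            # Edges that share a vertex always touch, so skip them.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(pts[i], pts[(i + 1) % n],
                               pts[j], pts[(j + 1) % n]):
                return True
    return False


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
bowtie = [(0, 0), (4, 4), (4, 0), (0, 4)]   # edges cross at (2, 2)
```

Here `has_self_intersection(bowtie)` is true and `has_self_intersection(square)` is false.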
Collections:
Other problems exist in GIS related to how we find data within our GIS systems
How do we find our data? Spatial Indexes
A fancy way of saying directions in your data
Find your way to a part of your data fast
So how does this work?
Step 1: Find Bounding Box of Data
Step 2: Make a Tree Structure
How? Create the tree by making bounding boxes of your bounding boxes from your data
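The two steps can be sketched as follows. This is a simple sort-and-pack construction written for illustration only. The names `bbox_of`, `union` and `build_tree` and the dictionary node layout are assumptions, and real spatial indexes such as R-trees choose their groupings far more carefully.

```python
def bbox_of(points):
    """Step 1: bounding box (min_x, min_y, max_x, max_y) of one geometry."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def union(boxes):
    """Bounding box of several bounding boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def build_tree(items, max_children=2):
    """Step 2: pack (bbox, payload) leaves into parents until one root remains."""
    if not items or max_children < 2:
        raise ValueError("need at least one item and max_children >= 2")
    nodes = [{"bbox": b, "children": [], "item": payload} for b, payload in items]
    while len(nodes) > 1:
        # Sort by centre x so that neighbouring boxes tend to share a parent.
        nodes.sort(key=lambda node: node["bbox"][0] + node["bbox"][2])
        parents = []
        for i in range(0, len(nodes), max_children):
            group = nodes[i:i + max_children]
            parents.append({"bbox": union([g["bbox"] for g in group]),
                            "children": group, "item": None})
        nodes = parents
    return nodes[0]


geometries = {
    "a": [(0, 0), (1, 1)],
    "b": [(2, 0), (3, 2)],
    "c": [(10, 10), (12, 11)],
    "d": [(13, 9), (14, 12)],
}
root = build_tree([(bbox_of(pts), name) for name, pts in geometries.items()])
print(root["bbox"])                          # (0, 0, 14, 12)
print([c["bbox"] for c in root["children"]])
```

Each parent box is the bounding box of its children's boxes, so a search can discard a whole branch as soon as the query misses the parent.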
Performance depends on how fast you can find data in the tree
Once you narrow down to a section of a tree, you still must iterate through every bounding box in that branch to check for intersection.
Next you must iterate through EVERY point in any line or polygon to make sure that you actually do intersect
This is done as part of checking for intersection (for example, finding all data at a point)
This can be very slow if the tree is not well organized
This also means that checking what intersects with a polygon in your tree can be much slower because you must repeatedly iterate through that polygon
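A minimal sketch of that two-stage lookup, assuming a flat list of precomputed bounding boxes in place of a real tree. The cheap bounding-box test filters candidates, then the exact test walks every vertex of each survivor. All names are hypothetical.

```python
def bbox_of(ring):
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def bbox_contains(bbox, point):
    return bbox[0] <= point[0] <= bbox[2] and bbox[1] <= point[1] <= bbox[3]


def point_in_ring(point, ring):
    """Exact even-odd test. It has to visit every edge of the ring."""
    px, py = point
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            if x1 + (py - y1) * (x2 - x1) / (y2 - y1) > px:
                inside = not inside
    return inside


def query_point(point, index):
    """index: list of (name, bbox, ring). Returns (hits, exact tests performed)."""
    hits, exact_tests = [], 0
    for name, bbox, ring in index:
        if not bbox_contains(bbox, point):
            continue                  # cheap rejection, no vertices touched
        exact_tests += 1              # expensive: iterates the whole ring
        if point_in_ring(point, ring):
            hits.append(name)
    return hits, exact_tests


polygons = {
    "triangle": [(0, 0), (10, 0), (0, 10)],
    "square": [(6, 6), (9, 6), (9, 9), (6, 9)],
    "far": [(20, 20), (21, 20), (21, 21)],
}
index = [(name, bbox_of(ring), ring) for name, ring in polygons.items()]
hits, exact_tests = query_point((8, 8), index)
```

The point (8, 8) falls inside the triangle's bounding box but outside the triangle itself, so the exact test runs twice and only the square is returned. Every such false positive costs a full pass over that polygon's vertices.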
Common Problems in Spatial Indexes:
More Points in Polygon means more time checking for intersections.
This is worse in polygons with Lots of holes!
Lots of overlapping polygons cause big tree branches.
Multipolygons, multilines, or multipoints! Each one has only a single bounding box!
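The multi-geometry problem can be illustrated with a small sketch. The data and helper names are made up for the example: two small squares stored as one multipolygon get a bounding box covering all the empty space between them.

```python
def bbox_of(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def bbox_contains(bbox, point):
    return bbox[0] <= point[0] <= bbox[2] and bbox[1] <= point[1] <= bbox[3]


# One multipolygon made of two small squares that are far apart.
multi = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(100, 100), (101, 100), (101, 101), (100, 101)],
]

whole_bbox = bbox_of([p for part in multi for p in part])
part_bboxes = [bbox_of(part) for part in multi]

probe = (50, 50)   # nowhere near either square
hit_whole = bbox_contains(whole_bbox, probe)                 # True
hit_parts = any(bbox_contains(b, probe) for b in part_bboxes)  # False
```

Indexed as a single geometry, the multipolygon becomes a candidate for almost any query in the region and forces an exact test. Indexed as separate parts, the same query is rejected by the bounding boxes alone.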
Make your indexes work for you!
You might have too much data! Break apart your geometry data first by some metadata.
Breaking out data into different spaces is the magic of Vector Tiles.
Simplify data whenever possible!
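One common way to simplify a line is the Douglas-Peucker algorithm, sketched below. The slides do not name a specific algorithm, so this choice is an assumption. The variant shown measures distance to the segment rather than to the infinite line, and the function names are hypothetical.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))        # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]         # everything in between is close enough
    # Keep the farthest vertex and simplify each side of it separately.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
print(simplify(line, 0.1))   # [(0, 0), (2, 0), (3, 3), (4, 0)]
```

Fewer vertices means less work in every exact intersection test. Simplification can itself introduce self-intersections in polygon rings, so validity is worth re-checking afterwards.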
Expect Gotchas!
Don't expect one solution to solve everything for you.
Breaking out detailed data into Vector Tiles might make display easier, but may not solve your Nearest Neighbor problem.
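A small sketch of that gotcha, using a plain square grid as a stand-in for tiles. It does not follow any vector tile specification, and the names and data are made up for illustration.

```python
import math
from collections import defaultdict


def tile_of(point, tile_size):
    """Column and row of the grid cell that contains the point."""
    return (math.floor(point[0] / tile_size), math.floor(point[1] / tile_size))


def bucket_by_tile(points, tile_size):
    tiles = defaultdict(list)
    for p in points:
        tiles[tile_of(p, tile_size)].append(p)
    return tiles


def nearest(query, candidates):
    return min(candidates, key=lambda p: math.dist(query, p))


points = [(1.0, 1.0), (4.0, 5.0), (10.2, 5.0)]
tiles = bucket_by_tile(points, tile_size=10)

query = (9.5, 5.0)                                   # close to the tile edge
in_tile = nearest(query, tiles[tile_of(query, 10)])  # (4.0, 5.0)
overall = nearest(query, points)                     # (10.2, 5.0)
```

Searching only the query's own tile returns a point 5.5 units away, while the true nearest neighbor sits 0.7 units away in the next tile. A tiled layout therefore needs extra logic, such as also searching neighboring tiles, before it can answer nearest-neighbor questions correctly.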
Iterate! You learn more from failure.
Systems grow and problems WILL change.
Always think how you can:
Simplify your code, processes or system.
Test Everything.
Mapnik has over 2,000 visual test cases and over 10,000 assertions in unit tests
AVOID MAKING GIS HARDER
Stop doing this: