SQL Server: Spatial Data

In the past several years, the need of businesses to incorporate spatial data into their databases and to manage it using a database system has grown significantly. The most important factor leading to this growth is the proliferation of geographical services and devices, such as Microsoft Bing Maps, low-priced GPS devices, and mobile phones.

Generally, the support of spatial data by a database vendor helps users to make better decisions in several scenarios, such as:

  • Real-estate analysis (“Find a suitable property within 500m of an elementary school”)
  • Consumer-based information (“Find the nearest shopping malls to a given ZIP code”)
  • Market analysis (“Define geographic sales regions and ascertain whether there is a necessity for a new branch office”)

As you already know from Chapter 5, you can use the CREATE TYPE statement to create user-defined data types. The implementation of such types is done using the Common Language Runtime (CLR). Developers use the CLR to implement two data types in relation to spatial data: GEOMETRY and GEOGRAPHY. These two data types are discussed after the following brief look at the different models for representing spatial data.

1. Models for Representing Spatial Data

Generally, there are two different groups of models for representing spatial data:

  • Geodetic spatial models
  • Flat spatial models

Planets are complex objects that can be represented using a flattened sphere (called a spheroid). A good approximation for the representation of Earth (and other planets) is a globe, where locations on the surface are described using latitude and longitude. (Latitude gives the location of a place on Earth north or south of the equator, while longitude specifies the location in relation to a chosen meridian.) Models that use these measures are called geodetic models. Because these models provide a good approximation of spheroids, they provide the most accurate way to represent spatial data.

Flat spatial models (or planar models) use two-dimensional maps to represent Earth. In this case, the spheroid is flattened and projected in a plane. The flattening process results in some deformation of shape and size of the projected (geographic) objects. Flat spatial models work best for small surface areas, because the larger the surface area being represented, the more deformation that occurs.

As you will see in the following two sections, the GEOMETRY data type is based on a flat spatial model, while the GEOGRAPHY data type is based on a geodetic spatial model.

2. GEOMETRY Data Type

The Open Geospatial Consortium (OGC) introduced the term “geometry” to represent spatial features, such as point locations and lines. Therefore, “geometry” represents data in  a two-dimensional plane as points, lines, and polygons using one of the existing flat spatial models.

You can think of “geometry” as a data type with several subtypes, as shown in Figure 30-1. The subclasses are divided into two categories: the base geometry subclasses and the homogeneous collection subclasses. The base geometries include, among others,

Point, LineString, and Polygon, while the homogeneous collections include MultiPoint, MultiLineString, and MultiPolygon. As the names imply, the homogeneous collections are collections of base geometries. In addition to sharing base geometry properties, homogeneous collections have their own properties.

The types in Figure 30-1 that appear in italic font are instantiable, which means they have instances. All instantiable types are implemented as user-defined data types in the Database Engine. The following are the instantiable types:

  • Point A point is a zero-dimensional geometry with single X and Y coordinate values. Therefore, it has a NULL boundary. Optionally, a point can have two additional coordinates: evaluation (Z coordinate) and measure (M coordinate). Points are usually used to build complex spatial types.
  • MultiPoint A multipoint is a collection of zero or more points. The points in a multipoint do not have to be distinct.
  • LineString A line string is a one-dimensional geometry object that has a length and is stored as a sequence of points defining a linear path. Therefore, a line string is defined by a set of points, which define the reference points of it. Linear interpolation between the reference points specifies the resulting line string. A line string is called simple if it does not intersect its interior. The endpoints (the boundary) of a closed line string occupy the same point in space. A line is called a ring if it is both closed and simple.
  • MultiLineString A multiline string is a collection of zero or more line strings.
  • Polygon A polygon is a two-dimensional geometry object with surface. It is stored as a sequence of points defining its exterior bounding ring and zero or more interior rings. The exterior and any interior rings specify the boundary of a polygon, and the space enclosed between the rings specifies the polygon’s interior.
  • MultiPolygon A multipolygon is a collection of zero or more polygons.
  • GeometryCollection A geometry collection is a collection of zero or more geometry objects. In other words, this geometry object can contain instances of any subtype of the GEOMETRY data type.
  • CircularString A circular string is a collection of zero or more continuous circular arc segments. A circular arc segment is a curved segment defined by three points in a two-dimensional plane; the first point cannot be the same as the third point. If all three points of a circular arc segment are collinear, the arc segment is treated as a line segment.
  • CurvePolygon A curve polygon is a closed surface defined by an exterior bounding ring and zero or more interior rings

3. GEOGRAPHY Data Type

While the GEOMETRY data type stores data using X and Y coordinates, the GEOGRAPHY data type stores data as GPS latitude and longitude coordinates. (Longitude represents the horizontal angle and ranges from -180 degrees to +180 degrees, while latitude represents the vertical angle and ranges from -90 degrees to +90 degrees.)

The GEOGRAPHY data type, unlike the GEOMETRY data type, requires the specification of a Spatial Reference System. A Spatial Reference System is a system used to identify a particular coordinate system and is specified by an integer. Information on available integer values can be found in the sys.spatial_reference_systems catalog view. (This view will be discussed later in this chapter.)

NOTE All instantiable types (see Figure 30-1) that are implemented for the GEOMETRY data type are implemented for the GEOGRAPHY data type, too.

4. GEOMETRY vs. GEOGRAPHY

As you already know, the GEOMETRY data type is used in flat spatial models, while the GEOGRAPHY data type is used in geodetic models. The main difference between these two groups of models is that with the GEOMETRY data type, distances and areas are given in the same unit of measurement as the coordinates of the instances. (Therefore, the distance between the points (0,0) and (3,4) will always be 5 units.) This is not the case with the GEOGRAPHY data type, which works with ellipsoidal coordinates that are expressed in degrees of latitude and longitude.

There are also some restrictions placed on the GEOGRAPHY data type. For example, each instance of the GEOGRAPHY data type must fit inside a single hemisphere.

4.1. External Data Formats

The Database Engine supports three external data formats that can be used to represent spatial data in an implementation-independent form:

  • Well-known text (WKT) A text markup language for representing spatial reference systems of spatial objects and transformations between spatial reference systems.
  • Well-known binary (WKB) The binary equivalent of WKT.
  • Geography Markup Language (GML) The XML grammar defined by OGC to express geographical features. GML is an open interchange format for geographic transactions on the Internet.

NOTE All examples shown in this chapter reference the WKT format, because this format is the easiest to read.

The following examples show the syntax of WKT for the selected types:

  • POINT(3,4) The values 3 and 4 specify the X coordinate and Y coordinate, respectively.
  • LINESTRING(0 0, 3 4) The first two values represent the X and Y coordinates of the starting point, while the last two values represent the X and Y coordinates of the end point of the line.
  • POLYGON(300 0, 150 0, 150 150, 300 150, 300 0) Each pair of numbers represents a point on the edge of the polygon. (The end point of the specified polygon is the same as the starting point.)
  • CIRCULARSTRING(0 -12.5, 0 0, 0 12.5) You need at least three points to define a circular string. The first point specifies the start, the second specifies the end of the circular string, and the third must be somewhere along the arc.
  • COMPOUNDCURVE(CIRCULARSTRING(0 -23.43778, 0 0, 0 23.43778), CIRCULARSTRING(0 23.43778, -45 23.43778, -90 23.43778), CIRCULARSTRING(-90 23.43778, -90 0, -90 -23.43778), CIRCULARSTRING(-90 -23.43778, -45 -23.43778, 0 -23.43778) ) The compound curve is a composition of either circular strings only or circular and linear strings.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *