Mastering QGIS
上QQ阅读APP看书,第一时间看更新

Fundamental database concepts

A database is a structured collection of data. Databases provide multiple benefits over data stored in a flat file format, such as shapefile or KML. The benefits include complex queries, complex relationships, scalability, security, data integrity, and transactions, to name a few. Using databases to store geospatial data is relatively easy, considering the aforementioned benefits.

Note

There are multiple types of databases; however, the most common type of database, and the type of database that this chapter will cover, is the relational database.

Database tables

A relational database stores data in tables. A table is composed of rows and columns, where each row is a single data record and each column stores a field value associated with each record. A table can have any number of records; however, each field is uniquely named and stores a specific type of data.

A data type restricts the information that can be stored in a field, and it is very important that an appropriate data type, and its associated parameters, be selected for each field in a table. The common data types are as follows:

  • Integer
  • Float/Real/Decimal
  • Text
  • Date

Each of these data types can have additional constraints set, such as setting a default value, restricting the field size, or prohibiting null values.

In addition to the common data types mentioned previously, some databases support the geometry field type, allowing the following geometry types to be stored:

  • Point
  • Multi-point
  • Line
  • Multi-line
  • Polygon
  • Multi-polygon

The multi-point/line/polygon types store multi-part geometries so that one record has multiple geometry parts associated with it.

Tip

ESRI shapefiles store geospatial data in multi- type geometry, so using multi- type geometry is a good practice if you plan on converting between formats.

Table relationships

A table relationship connects records between tables. The benefit of relating tables is reducing data redundancy and increasing data integrity. In order to relate two tables together, each table must contain an indexed key field.

Note

The process of organizing tables to reduce redundancy is called normalization. Normalization typically involves splitting larger tables into smaller, less redundant tables, followed by defining the relationship between the tables.

A field can be defined as an index. A field set as an index must only contain values that are unique for each record, and therefore, it can be used to identify each record in a table uniquely. An index is useful for two reasons. Firstly, it allows records to be quickly found during a query if the indexed field is part of the query. Secondly, an index can be set to be a primary key for a table, allowing for table relationships to be built.

A primary key is one or more fields that uniquely identify a record in its own table. A foreign key is one or more fields that uniquely identify a record in another table. When a relationship is created, a record(s) from one table is linked to a record(s) of another table. With related tables, more complex queries can be executed and redundancy in the database can be reduced.

Structured Query Language

Structured Query Language (SQL) is a language designed to manage databases and the data contained within them. Covering SQL is a large undertaking and is outside the scope of this book, so we will only cover a quick refresher that is relevant to this chapter.

SQL provides functions to select, insert, delete, and update data. Four commonly used SQL data functions are discussed as follows:

  • SELECT: This retrieves a temporary set of data from one or more tables based on an expression. A basic query is SELECT <field(s)> FROM <table> WHERE <field> <operator> <value>; where <field> is the name of the field from which values must be retrieved and <table> is the table on which the query must be executed. The <operator> part checks for equality (such as =, >=, LIKE) and <value> is the value to compare against the field.
  • INSERT: This inserts new records into a table. The INSERT INTO <table> (<field1>, <field2>, <field3>) VALUES (<value1>, <value2>, <value3>); statement inserts three values into their three respective fields, where <value1>, <value2>, and <value3> are stored in <field1>, <field2>, and <field3> of <table>.
  • UPDATE: This modifies an existing record in a table. The UPDATE <table> SET <field> = <value>; statement updates one field's value, where <value> is stored in <field> of <table>.
  • DELETE: This deletes record(s) from a table. The following statements deletes all records matching the WHERE clause: DELETE FROM <table> WHERE <field> <operator> <value>; where <table> is the table to delete records from, <field> is the name of the field, <operator> checks for equality, and <value> is the value to check against the field.

Another SQL function of interest is view. A view is a stored query that is presented as a table but is actually built dynamically when the view is accessed. To create a view, simply preface a SELECT statement with CREATE VIEW <view_name> AS and a view named <view_name> will be created. You can then treat the new view as if it were a table.