At King County GIS, we address data quality at several levels. This article is the first of three introductory posts about how King County GIS is working to improve the quality of the County’s GIS data. The three levels at which we improve and ensure data quality are:
- GIS Maintenance Prioritization and Data Review
- Validation of Spatial Data Warehouse Objects
- Quality Assessment of Metadata Completeness and Content
This first post will discuss GIS Maintenance Prioritization and Data Review.
The King County GIS Spatial Data Warehouse (SDW) contains over 1,500 GIS layers and tables. Many of the layers in the SDW are updated less frequently than their metadata states, and less frequently than is optimal. As in many organizations, fewer resources are available for data maintenance than are required to complete all maintenance on schedule.
A GIS Governance priority initiative, adopted for 2017, is driving development of a system that will: (a) identify framework layer status for SDW layers, (b) monitor the update frequency of these layers as compared to their stated update frequency, and (c) estimate workload requirements for maintenance of each layer. This will lead to a prioritization of the layers so that staffing resources can be allocated most efficiently.
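The post does not describe how this system is built. As a minimal sketch, assuming each layer's metadata records a stated update frequency and the SDW exposes a last-updated date, the comparison in (b) and a simple ranking could look like the following. The `LayerStatus` fields, the frequency-to-days table, and the scoring formula (framework layers weighted double, staleness divided by estimated hours of work) are hypothetical choices for illustration, not the County's method.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical mapping from metadata update-frequency terms to expected intervals (days).
EXPECTED_INTERVAL_DAYS = {"weekly": 7, "monthly": 30, "quarterly": 91, "annually": 365}


@dataclass
class LayerStatus:
    name: str
    is_framework: bool            # (a) framework layer status
    stated_frequency: str         # update frequency stated in the layer's metadata
    last_updated: date            # most recent update observed in the SDW
    est_hours_per_update: float   # (c) estimated maintenance workload (hypothetical unit)


def days_overdue(layer: LayerStatus, today: date) -> int:
    """(b) Days past the stated update interval, or 0 if the layer is current."""
    due = layer.last_updated + timedelta(days=EXPECTED_INTERVAL_DAYS[layer.stated_frequency])
    return max((today - due).days, 0)


def priority_score(layer: LayerStatus, today: date) -> float:
    """Hypothetical score: staleness, doubled for framework layers, per estimated hour."""
    weight = 2.0 if layer.is_framework else 1.0
    return weight * days_overdue(layer, today) / max(layer.est_hours_per_update, 1.0)


def prioritize(layers: list[LayerStatus], today: date) -> list[LayerStatus]:
    """Return layers ordered from highest to lowest priority."""
    return sorted(layers, key=lambda layer: priority_score(layer, today), reverse=True)


if __name__ == "__main__":
    today = date(2017, 3, 1)
    layers = [  # hypothetical sample layers
        LayerStatus("parcel", True, "weekly", date(2017, 1, 1), 8.0),
        LayerStatus("trails", False, "annually", date(2015, 6, 1), 4.0),
        LayerStatus("zipcode", True, "quarterly", date(2017, 2, 1), 2.0),
    ]
    for layer in prioritize(layers, today):
        print(layer.name, days_overdue(layer, today), priority_score(layer, today))
```

With these sample inputs, `trails` ranks first (274 days overdue), then `parcel` (52 days), then `zipcode` (current). Dividing by estimated hours favors layers that are cheap to bring up to date; an organization could reasonably weight workload differently.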
This prioritization goal dovetails with an ongoing team effort that has been colloquially named the Data Wrangling Team. This “wrangling” is structured around a bi-weekly meeting of Enterprise Operations staff and stewards from other GIS work groups. At these meetings the presenting steward is given the floor to review and demo a specific dataset that they are responsible for. A detailed template is used to guide the discussion through topics ranging from responsible parties and dependent map services to desktop presentation and metadata. Topics that generate follow-up are assigned as reminders via a SharePoint task list. If significant data quality issues are identified, a separate work group is spun off to address the issues and define remediation tasks.
One example of this ongoing strategy to continually improve the quality of our data is our review of the points-of-interest dataset. This layer displays points for nearly 40 domains, or classes of features, such as hospitals, parks, schools, and government and commercial buildings. All features in this dataset share a common set of core attributes, and business tables for certain domains support additional business needs.
A comprehensive Python codeset has been developed that analyzes locational and attribute information on a domain-by-domain basis, using ancillary and related databases. For example, the addresses assigned to schools, based on information from school district websites, are compared against independently created address, ZIP Code, and parcel layers to validate how this information was encoded. Other tests report internal inconsistencies, such as duplicate school names, as well as locational inconsistencies. Domain-specific tests are also added where attributes from additional business tables are added to create unique “children” layers.
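The codeset itself is not published here. As a minimal sketch of the attribute checks described above, assuming the reference address layer has been reduced to a lookup from normalized address to its ZIP Code and parcel ID, one domain's review might look like this. The `Feature` fields, the `normalize` rules, and the lookup structure are hypothetical; a real implementation would read these layers from the SDW and would also need spatial tests for locational inconsistencies, which this sketch omits.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Feature:
    feature_id: int
    name: str
    address: str
    zip_code: str
    parcel_id: str


def normalize(text: str) -> str:
    """Uppercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.upper().split())


def check_domain(
    features: List[Feature],
    reference: Dict[str, Tuple[str, str]],  # normalized address -> (ZIP Code, parcel ID)
) -> List[Tuple[int, str]]:
    """Return (feature_id, message) pairs for every failed check in one domain."""
    issues: List[Tuple[int, str]] = []
    for f in features:
        ref = reference.get(normalize(f.address))
        if ref is None:
            issues.append((f.feature_id, "address not found in reference address layer"))
            continue
        ref_zip, ref_parcel = ref
        if f.zip_code != ref_zip:
            issues.append((f.feature_id, f"ZIP Code {f.zip_code} differs from reference {ref_zip}"))
        if f.parcel_id != ref_parcel:
            issues.append((f.feature_id, f"parcel {f.parcel_id} differs from reference {ref_parcel}"))
    # Internal consistency: the same normalized name used more than once in the domain.
    name_counts = Counter(normalize(f.name) for f in features)
    for f in features:
        if name_counts[normalize(f.name)] > 1:
            issues.append((f.feature_id, f"duplicate name: {f.name}"))
    return issues
```

Both members of a duplicate-name pair are flagged, leaving the steward to decide which record is correct.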
Because the core portion of the codeset is agnostic to the domain it reviews and is fully automated for reporting, the tests can be rerun as often as needed to ensure compliance. As the review proceeds through each domain in the points-of-interest layer, it will also identify domains that should be removed because they no longer have a supporting business need or owner, as well as new domains that should be added.
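One way to keep the core checks domain-agnostic, sketched here under assumed names, is to register core checks that run against every domain and keep domain-specific checks in a separate lookup, so supporting a new “children” layer means registering checks rather than changing the runner. `CORE_ATTRIBUTES`, the check functions, and the dictionary-based records are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]  # one feature's attributes, keyed by field name
Check = Callable[[List[Record]], List[str]]

# Hypothetical names for the core attributes shared by every domain.
CORE_ATTRIBUTES = ("name", "address", "zip_code")


def missing_core_attributes(records: List[Record]) -> List[str]:
    """Core check: flag empty or null core attributes in any domain."""
    return [
        f"{r['id']}: missing {attr}"
        for r in records
        for attr in CORE_ATTRIBUTES
        if not r.get(attr)
    ]


def school_has_district(records: List[Record]) -> List[str]:
    """Hypothetical domain-specific check on a business-table attribute."""
    return [f"{r['id']}: missing district" for r in records if not r.get("district")]


CORE_CHECKS: List[Check] = [missing_core_attributes]
DOMAIN_CHECKS: Dict[str, List[Check]] = {"school": [school_has_district]}


def run_review(domains: Dict[str, List[Record]]) -> Dict[str, List[str]]:
    """Run core checks on every domain, plus any checks registered for that domain."""
    report: Dict[str, List[str]] = {}
    for domain, records in domains.items():
        checks = CORE_CHECKS + DOMAIN_CHECKS.get(domain, [])
        report[domain] = [msg for check in checks for msg in check(records)]
    return report
```

Because `run_review` takes no domain-specific arguments, it can be scheduled to rerun unattended, and an empty list for a domain means every registered check passed.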
As for any large city or county, GIS data maintenance at King County is challenging because of the number and complexity of the datasets it maintains. King County GIS continues to improve data quality through a continually evolving maintenance strategy.
The next post on this Data Quality topic will discuss how the KCGIS Center employs several layers of control to ensure a high degree of validation and correspondence across all the related objects in the SDW.
Mike Leathers is the GIS Data Coordinator in the King County GIS Center.