Data styling in web mapping applications with a focus on environmental health

Interpretation of data visualized in maps strongly depends on the experience of the map reader and on the way the data are presented. Styling the data is crucial especially when creating maps for users...


Introduction
Presentation of spatial data using web mapping applications (WMAs) is increasing rapidly. Avraam (2009) defines WMAs as applications that enable visualization of geographically referenced data through a web interface available online. In general, visualization of spatial data (referred to as cartographic visualization, geographic visualization or geovisualization, with only slight differences in meaning (Hallisey 2005)) is not a simple and straightforward process. It requires adequate attention to several factors, such as the purpose of the visualization, the target audience, the type of data and the level of user interaction. The ability to understand map visualizations of data is sometimes referred to as graphicacy, an ability that is becoming more important also due to the increasing availability of open and big data (Smith 2016) and of tools approachable for GIS non-specialists (Zastrow 2015).
The major advantage of online mapping is its interactivity, which brings many opportunities (and challenges, too) for data visualization and map styling. Roth (2013) defines cartographic interaction as "the dialogue between a human and a map mediated through a computing device to emphasize digital interaction", thus accentuating the two-way communication. Currently, the major function of online interactive mapping tools is a mixture of data presentation and exploration, in some cases (mostly in the research fields of geodemographics, health geography and urban analytics) including at least basic analytical capabilities (Smith 2016).
Importantly, communicating sensitive health data using maps requires an especially careful approach: the representation needs to be accurate, clear and not misleading for various types of viewers, and several factors need to be addressed, including the limitations and quality of data, data confidentiality, uncertainty of estimates and the risk of misinterpretation (Bell et al. 2006). Among other means, these issues can be addressed by selecting an appropriate method of map representation and data styling, one of the aspects of building a WMA.
In this paper, we present and analyse various approaches to data styling in interactive WMAs visualizing environmental health data, building on the theory of map representation methods and map syntactic types by Pravda (2003, 2006) and Pravda and Kusendová (2007). Map syntactic types encapsulate the principles of map sign composition and are defined by one or more typifying criteria. With its subtypes, variants and subvariants, this classification of map representation methods is hierarchical and open to new criteria and types.
Following a cartographic visualization analysis of 21 web environmental health mapping solutions with a focus on data styling, we describe our approach to data styling using OGC standards in our WMA. This application gathers environmental and health data from various sources and aims to facilitate understanding of their spatial patterns and interactions. Given the amount of data collected, the main aim of the paper is to present an approach to styling them that is user-friendly and interpretable for both researchers and the general public, while remaining cartographically correct.

Review of WMAs with environmental health focus
There are many WMAs available on the World Wide Web, with different topics, designs and cartographic visualizations. In our analysis, we focused on WMAs communicating themes of environmental health, environmental risks and hazards.
We analysed 21 WMAs (Table 1): 13 of them operate under the auspices of the World Health Organization (WHO) and deal with environmental health and health indicators; three were developed under the auspices of the United Nations Office for Disaster Risk Reduction (UNISDR) and deal with environmental threats and risks; and three are part of Eurostat, which collects various statistical data, e.g. on health or population. The last two applications are Slovak WMAs showing data on environmental indicators and health for the area of Slovakia. The WMAs were selected predominantly by thematic criteria: the search terms included combinations of the keywords health, environment, risk and hazard.
The WMAs were analysed from several viewpoints of data styling, including map syntactic types, data intervals, colour schemes and data series. Overall, we used the following 11 criteria, labelled A to K (Table 2):
- A: type of presented data (qualitative or quantitative);
- B: map syntactic types (Pravda 2002):
  - S F (Q, Top): type of qualitative figural signs with topographical location,
  - S L (Q): type of qualitative linear signs,
  - S AD (Q): type of qualitative discrete areas,
  - S AD (M, Int): type of quantitative discrete areas (choropleth type),
  - S F (M, Diagr): type of quantitative diagrammatic signs,
  - S L (M, Diagr): type of quantitative linear diagrammatic signs,
  - S AC (M, Isogr): type of quantitative continuous (isogradational) areas;
- C: spatial extent and level of detail;
- D: type of data scale (intervals, intervals with limit values, continuous, categories);
- E: possibility of changing the data scale (yes/no);
- F: data series (yes/no);
- G: colour schemes (one colour scheme, sequence of multiple colours, different colours);
- H: possibility of changing the colour scheme (yes/no);
- I: type of change in the colour scheme (individual colours, default schemes, default schemes in Brewer colours (Brewer 2014), no change possible);
- J: quality of the default colour scheme or colours (cartographically correct/wrong);
- K: colour schemes varied by themes (yes/no).
Table 2 shows that the clear majority of the displayed data are quantitative, which affects the choice of map representation method. Qualitative data were displayed in only two cases, and four WMAs displayed both qualitative and quantitative data. We identified seven different map syntactic types in use. The most frequent is S AD (M, Int), followed by S AD (Q) and S F (M, Diagr). The S AD (M, Int) method is used for quantitative data. To represent qualitative data, three methods, S AD (Q), S L (Q) and S F (Q, Top), were used. The WMAs displaying both types of data (nos. 14, 15 and 19) applied four to five different methods.
The number of map representation methods also depends on the diversity of themes.
The WMAs covered different areas, from all states of the world, through states of selected regions, to a single state. In some WMAs, data were further downscaled to administrative units, e.g. NUTS. The last two WMAs display data for the area of Slovakia. The most frequently used type of data scale was a scale with intervals. Two WMAs used two scales, continuous and categorical. In one case, we identified intervals with limit values together with categories. The majority of WMAs used a default colour scheme with an appropriate colour sequence, either shades of one colour or a sequence of two to three colours. Some WMAs also had colour schemes composed of various colours. Changing the data scale (the classification method) as well as the colour scheme was possible in the majority of WMAs operating under the auspices of the WHO, with default colour schemes designed by Brewer (2014). Data series were used in 10 of the analysed WMAs. Changes to the colour schemes of displayed data were possible in half of the analysed WMAs, mostly by switching between Brewer colour schemes (Brewer 2014). In two cases, it was possible to change individual colours in the scheme. The quality of the default colour scheme or colours is a subjective criterion; we evaluated 14 WMAs as cartographically correct, 4 as cartographically wrong, and 3 as mixed, with some schemes correct from the cartographic point of view and others wrong. Half of the analysed WMAs differentiated colour schemes for different thematic areas.

Technical background of styling
The presented WMA, available at https://uvp.geonika.sk/map/ (Department of Cartography… 2016), was created to combine and visualize spatial data related to environmental health in Slovakia from various sources. It was programmed in JavaScript using the OpenLayers library, with Apache as the web server and GeoServer as the map server. Data are stored in the PostgreSQL/PostGIS database system. The overall layout of the application is described elsewhere (Benová, Feciskanin 2014); the technical details of the application will be published once development is finished. Here we focus on the technical and cartographic details of data styling.
Briefly, regarding the technical background of data visualization: the application enables interactive display of published map layers, which the user can switch on and off in the layer tree. Users can pan the map window, zoom in and out, switch the map orientation between geographic north and the orientation of the S-JTSK coordinate system, change the order of layers, set their transparency and get information about objects using the object identification tool. All these functions support and extend the visualization of data by conventional map styling, which is the focus of this paper.
The application uses GeoServer as its map server. GeoServer is an open-source map server with a user-friendly interface that makes publishing maps online convenient. To style and render geospatial data of both vector and raster type, GeoServer uses the Styled Layer Descriptor (SLD) markup language, one of the standards defined by the Open Geospatial Consortium (2015) to enable interoperability of web solutions for geospatial data. SLD is based on an XML schema, so its elements are defined by XML tags. They are used to define the metadata of an SLD style as well as the rules for rendering the data.
Inside the root StyledLayerDescriptor element, the NamedLayer tag forms the base of an SLD style file. It contains tags defining the name of the styled layer and its styles. Each UserStyle tag can contain a name, title and abstract describing the style. Within the FeatureTypeStyle tag, it is possible to define different rules for rendering features. Each rule can contain a name and title and must contain a symbolizer, which defines how the data are rendered. Within a rule, data can be filtered using the ogc:Filter tag and filter expressions (Open Source Geospatial Foundation 2015a), and rendering can be limited to certain zoom levels (expressed in SLD as scale denominators).
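A minimal sketch of this nesting may help. It uses Python's standard xml.etree.ElementTree to assemble an SLD 1.0.0 skeleton; the layer name, titles, white fill and scale limit are hypothetical placeholders rather than values from the WMA, and real GeoServer style files usually also declare schema locations.

```python
import xml.etree.ElementTree as ET

SLD = "http://www.opengis.net/sld"
ET.register_namespace("sld", SLD)


def el(parent, tag, text=None, **attrs):
    """Append an sld:<tag> child to parent, optionally with text content."""
    child = ET.SubElement(parent, f"{{{SLD}}}{tag}", attrs)
    if text is not None:
        child.text = text
    return child


def skeleton(layer_name, style_title, rule_title, max_scale=None):
    """StyledLayerDescriptor > NamedLayer > UserStyle > FeatureTypeStyle > Rule."""
    root = ET.Element(f"{{{SLD}}}StyledLayerDescriptor", {"version": "1.0.0"})
    named = el(root, "NamedLayer")
    el(named, "Name", layer_name)          # name of the styled layer
    style = el(named, "UserStyle")
    el(style, "Title", style_title)        # style metadata
    fts = el(style, "FeatureTypeStyle")
    rule = el(fts, "Rule")
    el(rule, "Title", rule_title)
    if max_scale is not None:
        # SLD restricts a rule to certain zoom levels via scale denominators.
        el(rule, "MaxScaleDenominator", str(max_scale))
    # Every rule needs at least one symbolizer; here a plain polygon fill.
    fill = el(el(rule, "PolygonSymbolizer"), "Fill")
    el(fill, "CssParameter", "#FFFFFF", name="fill")
    return root


if __name__ == "__main__":
    doc = skeleton("municipalities", "Default", "All units", max_scale=500000)
    print(ET.tostring(doc, encoding="unicode"))
```

Building the tree programmatically rather than by hand is only one option; it becomes attractive when hundreds of layers share the same structure.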
The tag name of the symbolizer depends on the type of features to be rendered (PointSymbolizer for figural signs, LineSymbolizer for linear signs, PolygonSymbolizer for areal signs, TextSymbolizer for text labels and RasterSymbolizer for raster data). The subtags of each symbolizer type differ (although some are shared), reflecting the varying requirements for rendering different data types. In all cases, sizes are specified in pixels (although they can be specified in ground units using a GeoServer SLD extension), and RGB colours in hexadecimal format (i.e. #RRGGBB). The full definition of SLD tags can be found in the GeoServer documentation (Open Source Geospatial Foundation 2015b).
In GeoServer, each published WMS layer must be associated with at least one SLD style. Defining more SLD styles is possible, with one set as the default. This possibility was used for publishing time series and for parameters shown separately for men and women in the WMA (see below for details). The SLD styles were used directly to generate the legends, so no additional work was required in this regard.

Data content of the WMA
Data published in the WMA (355 layers overall) are organized into five thematic groups and 30 subgroups (Table 3). As a base layer, users can choose a WMTS layer of ZBGIS data provided by the Geodesy, Cartography and Cadastre Authority of the Slovak Republic or a digital orthophotomap with labels. Moreover, for better orientation, users can add data from the first thematic group, which contains layers of administrative borders and nomenclature. The other four thematic groups contain data on demography, environmental and health indicators, and results of the 2011 Census.
Demographic data are available at the level of regions, districts and municipalities for the years 2007 or 2008 to 2013. Health indicators are aggregated at the same administrative units; however, their time frames vary. Most environmental indicators have their own geometry, except for emissions, which are aggregated at the level of regions and districts, and water, i.e. data on the quality of drinking and bathing water collected at the level of municipalities. The time frames of these data also vary. Results of the 2011 Census are presented at the level of municipalities as well as basic settlement units. S AD (M, Int), also known as the choropleth type, is the most frequently used syntactic type in the presented WMA. This is because many data are numeric values of an attribute collected or aggregated at the level of administrative units and then normalized by population or area (demographic data, emissions, health indicators, census data). The values for the respective units were categorized into intervals (see below). The interval boundaries were used in filters, which in this case combined two conditions with the logical AND operator. The appearance of each category was again defined in the PolygonSymbolizer tag. The stroke colour is identical for all administrative units (#5E5E5E), while the stroke width differs based on the hierarchical level of the units.
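The AND-combined interval filters described above can be sketched as follows. This is a hedged illustration built with Python's ElementTree, not the WMA's actual style files: the attribute name rate, the breaks, the fill colours and the stroke width are hypothetical, and only the shared stroke colour #5E5E5E comes from the text. The top class uses a less-than-or-equal comparison so that the maximum value is not left unstyled.

```python
import xml.etree.ElementTree as ET

SLD = "http://www.opengis.net/sld"
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("sld", SLD)
ET.register_namespace("ogc", OGC)

STROKE = "#5E5E5E"  # stroke colour shared by all administrative units


def q(ns, tag):
    """Qualified {namespace}tag name for ElementTree."""
    return f"{{{ns}}}{tag}"


def interval_rule(attr, lower, upper, fill, stroke_width, last=False):
    """One choropleth class: lower <= attr < upper (or <= upper for the last class)."""
    rule = ET.Element(q(SLD, "Rule"))
    ET.SubElement(rule, q(SLD, "Title")).text = f"{lower} - {upper}"
    both = ET.SubElement(ET.SubElement(rule, q(OGC, "Filter")), q(OGC, "And"))
    upper_op = "PropertyIsLessThanOrEqualTo" if last else "PropertyIsLessThan"
    for op, value in (("PropertyIsGreaterThanOrEqualTo", lower), (upper_op, upper)):
        cond = ET.SubElement(both, q(OGC, op))
        ET.SubElement(cond, q(OGC, "PropertyName")).text = attr
        ET.SubElement(cond, q(OGC, "Literal")).text = str(value)
    sym = ET.SubElement(rule, q(SLD, "PolygonSymbolizer"))
    fill_el = ET.SubElement(sym, q(SLD, "Fill"))
    ET.SubElement(fill_el, q(SLD, "CssParameter"), name="fill").text = fill
    stroke = ET.SubElement(sym, q(SLD, "Stroke"))
    ET.SubElement(stroke, q(SLD, "CssParameter"), name="stroke").text = STROKE
    ET.SubElement(stroke, q(SLD, "CssParameter"), name="stroke-width").text = str(stroke_width)
    return rule


def choropleth_rules(attr, breaks, colours, stroke_width):
    """One rule per interval; len(colours) must equal len(breaks) - 1."""
    pairs = list(zip(breaks, breaks[1:], colours))
    return [interval_rule(attr, lo, hi, c, stroke_width, last=(i == len(pairs) - 1))
            for i, (lo, hi, c) in enumerate(pairs)]


if __name__ == "__main__":
    # Hypothetical: three classes for districts, drawn with a 0.5 px outline.
    rules = choropleth_rules("rate", [0, 10, 20, 40], ["#FEE8C8", "#FDBB84", "#E34A33"], 0.5)
    for r in rules:
        print(ET.tostring(r, encoding="unicode"))
```

Passing a different stroke_width per layer is one way to express the hierarchical outline widths while keeping the stroke colour fixed.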

Map syntactic types in the WMA and styling approaches
It is also possible to visualize the S AD (M, Diagr) type, i.e. a diagram map, using an SLD file. In the WMA, we needed to visualize the relative proportions of selected groups of causes of death (neoplasms, diseases of the circulatory system, diseases of the respiratory system, diseases of the digestive system, external causes of mortality, other causes) in selected years.
Diagram visualization is enabled by an external graphic within the PointSymbolizer tag. It uses a chart extension based on Eastwood Charts, an implementation of the (deprecated) Google Chart API (GeoSolutions 2013). Inside the OnlineResource tag, the chart type (cht), chart data (chd), chart background fill (chf), chart colours (chco) and chart rotation (chp) are set. The p3 chart type is a 3D pie chart (Fig. 1); chd specifies the attributes (in our case, precalculated percentages) to be displayed in the diagram in the required order; chf defines a solid (s) white (FFFFFF), fully transparent (00) background; chco defines the colours of the diagram in the order of the attributes; and chp defines the rotation of the chart in radians, where 4.71 (approximately 3π/2) is equivalent to 270°, so the chart drawing starts at 12:00 (the default is 3:00). The size of the diagram is defined within a separate Size tag. We defined the size to be linearly dependent on the total number of deaths in the administrative units.
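To make the URL construction concrete, here is a sketch that assembles such an OnlineResource for one pie-chart symbolizer with Python's ElementTree. It assumes that per-feature attribute values are inserted with GeoServer's ${...} expression syntax and that the graphic format is application/chart, as we understand the GeoServer chart extension documentation; the attribute names, colours, size coefficients and the comma delimiter in chco are hypothetical and should be checked against that documentation. The linear size is written as an ogc:Add/ogc:Mul expression, i.e. base + factor × total deaths.

```python
import math
import xml.etree.ElementTree as ET

SLD = "http://www.opengis.net/sld"
OGC = "http://www.opengis.net/ogc"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("sld", SLD), ("ogc", OGC), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def chart_href(attrs, colours, rotation=4.71):
    """Google-Chart-style URL; each ${attr} is meant to be filled in per feature."""
    params = [
        "cht=p3",                                            # 3D pie chart
        "chd=t:" + ",".join("${" + a + "}" for a in attrs),  # values in slice order
        "chf=bg,s,FFFFFF00",                                 # solid white, alpha 00 = transparent
        "chco=" + ",".join(colours),                         # colours, same order as attrs
        f"chp={rotation}",                                   # start angle in radians
    ]
    return "http://chart?" + "&".join(params)


def pie_symbolizer(attrs, colours, total_attr, base=8, factor=0.01):
    """PointSymbolizer whose size grows linearly: base + factor * total_attr."""
    sym = ET.Element(f"{{{SLD}}}PointSymbolizer")
    graphic = ET.SubElement(sym, f"{{{SLD}}}Graphic")
    ext = ET.SubElement(graphic, f"{{{SLD}}}ExternalGraphic")
    ET.SubElement(ext, f"{{{SLD}}}OnlineResource", {
        f"{{{XLINK}}}type": "simple",
        f"{{{XLINK}}}href": chart_href(attrs, colours),  # '&' is escaped on output
    })
    ET.SubElement(ext, f"{{{SLD}}}Format").text = "application/chart"
    size = ET.SubElement(graphic, f"{{{SLD}}}Size")
    add = ET.SubElement(size, f"{{{OGC}}}Add")
    ET.SubElement(add, f"{{{OGC}}}Literal").text = str(base)
    mul = ET.SubElement(add, f"{{{OGC}}}Mul")
    ET.SubElement(mul, f"{{{OGC}}}PropertyName").text = total_attr
    ET.SubElement(mul, f"{{{OGC}}}Literal").text = str(factor)
    return sym


if __name__ == "__main__":
    # chp = 4.71 rad is roughly 270 degrees, so the first slice starts at 12:00.
    print(round(math.degrees(4.71)))  # 270
    sym = pie_symbolizer(["neo_pct", "circ_pct", "other_pct"],
                         ["E41A1C", "377EB8", "999999"], "deaths_total")
    print(ET.tostring(sym, encoding="unicode"))
```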
The diagram is placed at the centroid of the respective administrative unit. We did not fill the administrative units, because any other information about the units can be displayed beneath the diagram from the layers available in the layer tree.
Styling of the S AC (M, Isogr) map type was approached similarly to the S AD (Q) and S AD (M, Int) syntactic types. These data (air quality, chemical elements in groundwater and soils) were originally in continuous raster form but were reclassified into several categories (often considering the threshold value given by national legislation) and then vectorised so that they could be more easily stored in a spatial database and visualized in the WMA. Depending on the categorization and the required visualization, either a simple or a combined filter was used. Again, the PolygonSymbolizer tag was used to specify the data appearance. In this case, the stroke colour was identical to the respective polygon fill.
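The reclassification and styling steps can be sketched in Python as follows, assuming the raster values are already available as a nested list and that, after vectorisation, each polygon carries a category code in a hypothetical attribute named cat. The breaks (with 40 standing in for a legislative threshold) and the colour are illustrative only. The sketch shows the simple-filter case; a combined filter would wrap several comparisons in ogc:And or ogc:Or. The vectorisation itself (for example with GDAL) is not shown.

```python
from bisect import bisect_right
import xml.etree.ElementTree as ET

SLD = "http://www.opengis.net/sld"
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("sld", SLD)
ET.register_namespace("ogc", OGC)


def reclassify(grid, breaks):
    """Map continuous cell values to category codes 0..len(breaks).

    A value equal to a break falls into the upper category, so a threshold
    used as a break starts a new class exactly at the limit.
    """
    return [[bisect_right(breaks, v) for v in row] for row in grid]


def category_rule(attr, code, colour):
    """Rule with a simple equality filter; the stroke repeats the fill colour."""
    rule = ET.Element(f"{{{SLD}}}Rule")
    flt = ET.SubElement(rule, f"{{{OGC}}}Filter")
    eq = ET.SubElement(flt, f"{{{OGC}}}PropertyIsEqualTo")
    ET.SubElement(eq, f"{{{OGC}}}PropertyName").text = attr
    ET.SubElement(eq, f"{{{OGC}}}Literal").text = str(code)
    sym = ET.SubElement(rule, f"{{{SLD}}}PolygonSymbolizer")
    fill = ET.SubElement(sym, f"{{{SLD}}}Fill")
    ET.SubElement(fill, f"{{{SLD}}}CssParameter", name="fill").text = colour
    stroke = ET.SubElement(sym, f"{{{SLD}}}Stroke")
    # Same colour as the fill, so no contrasting outlines are drawn around polygons.
    ET.SubElement(stroke, f"{{{SLD}}}CssParameter", name="stroke").text = colour
    return rule


if __name__ == "__main__":
    print(reclassify([[5.0, 12.5], [39.9, 41.0]], [10, 20, 40]))  # [[0, 1], [2, 3]]
    print(ET.tostring(category_rule("cat", 3, "#D7301F"), encoding="unicode"))
```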
For the S(Georelief) type, i.e. georelief (terrain) representation, we applied three subtypes: numeric, S(Georelief, Num); contour, S(Georelief, Isohyps); and hypsometric, S(Georelief, Hyps). The numeric subtype is applied within the nomenclature over the orthophotomap: selected hills are labelled with their name and elevation. The contour and hypsometric visualizations of terrain can be found in the thematic subgroup Terrain within Environmental indicators. Contours are labelled; to achieve correct label orientation, the contours had to be generated with a consistent orientation, and the vendor option forcing left-to-right label orientation had to be disabled. Other vendor options, including label grouping and following the line, were used to ensure that the labels are drawn correctly. For the hypsometric map, we applied the combined principle of the colour scheme (Pravda, Kusendová 2007), starting from darker hues, going through lighter ones and back to darker hues for the highest elevations, with changes every 100 metres.
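For the contour labels, a TextSymbolizer along the following lines could be used. It is a sketch built with Python's ElementTree; the elevation attribute name elev and the font size are hypothetical, while followLine, group and forceLeftToRight are GeoServer vendor options that, to our understanding, correspond to following the line, label grouping and forced left-to-right orientation mentioned above. With forceLeftToRight set to false, the label direction follows the line direction, which is why consistently oriented contours matter.

```python
import xml.etree.ElementTree as ET

SLD = "http://www.opengis.net/sld"
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("sld", SLD)
ET.register_namespace("ogc", OGC)


def contour_labels(elev_attr="elev", font="Fira Sans", size=10):
    """TextSymbolizer labelling contour lines with their elevation."""
    ts = ET.Element(f"{{{SLD}}}TextSymbolizer")
    label = ET.SubElement(ts, f"{{{SLD}}}Label")
    ET.SubElement(label, f"{{{OGC}}}PropertyName").text = elev_attr
    font_el = ET.SubElement(ts, f"{{{SLD}}}Font")
    ET.SubElement(font_el, f"{{{SLD}}}CssParameter", name="font-family").text = font
    ET.SubElement(font_el, f"{{{SLD}}}CssParameter", name="font-size").text = str(size)
    placement = ET.SubElement(ts, f"{{{SLD}}}LabelPlacement")
    ET.SubElement(placement, f"{{{SLD}}}LinePlacement")  # place labels along lines
    vendor = {
        "followLine": "true",         # bend the label along the contour
        "group": "yes",               # treat features with the same label as one
        "forceLeftToRight": "false",  # keep the line's own direction
    }
    for key, value in vendor.items():
        ET.SubElement(ts, f"{{{SLD}}}VendorOption", name=key).text = value
    return ts


if __name__ == "__main__":
    print(ET.tostring(contour_labels(), encoding="unicode"))
```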

Data intervals and colour schemes
Another aspect to consider was the selection of colour schemes and the definition of classes for the relevant data. For qualitative data (e.g. land cover, geomorphological division, tectonic map), we used qualitative colour schemes. To style the Corine Land Cover data, a predefined scheme (EEA 2013) was used. For the geomorphological division of the Slovak Republic, the colour scheme was based on the scheme of its authors (Mazúr, Lukniš). For quantitative data, we approached the selection of colours by exploiting the fact that the data were grouped into thematic groups and subgroups (Table 5). The colour schemes were either inspired by the ColorBrewer website (Brewer 2014) or designed by the authors. The number of intervals varied from 3 (for choropleth maps of the 8 Slovak regions) to 11 (air quality, SO2). For choropleth maps (S AD (M, Int) type), the number of intervals depended on the number of regions, the range and distribution of the data and, where applicable (for time series), spatiotemporal data variability. For example, for a single map of the 8 Slovak regions, 4 intervals would be excessive, so generally three intervals were created; the exception was time series with high temporal variability, such as the distribution of whooping cough or scarlet fever in the regions, for which 5 and 4 intervals, respectively, were created. For units with no data, a separate interval was created and filled with grey (#CCCCCC). If there was a high frequency of zero values (mostly for municipalities, but also in some cases for districts), a separate interval for zero values was created and filled with white. For the S AC (M, Isogr) type, the number of intervals depended mostly on data variability.
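A minimal sketch of the class assignment, including the separate no-data (#CCCCCC) and zero (white) classes, might look as follows. The breaks, colours and the 20 % share used to decide whether a zero class is warranted are hypothetical; the text only states that a zero class was added when zero values were frequent.

```python
NO_DATA = "#CCCCCC"  # grey for units without data
ZERO = "#FFFFFF"     # white for the separate zero class


def needs_zero_class(values, share=0.2):
    """Hypothetical rule: add a zero class if at least `share` of known values are 0."""
    known = [v for v in values if v is not None]
    return bool(known) and sum(v == 0 for v in known) / len(known) >= share


def class_colour(value, breaks, colours, zero_class=False):
    """Fill colour for one unit; intervals are [b0, b1], (b1, b2], ..., (b_n-1, b_n]."""
    assert len(colours) == len(breaks) - 1, "one colour per interval"
    if value is None:
        return NO_DATA
    if zero_class and value == 0:
        return ZERO
    if not breaks[0] <= value <= breaks[-1]:
        raise ValueError(f"{value} lies outside the classified range")
    for upper, colour in zip(breaks[1:], colours):
        if value <= upper:
            return colour


if __name__ == "__main__":
    rates = [0, 0, 3.5, None, 12.0, 0, 7.1, 18.4, 0, 2.2]
    zero = needs_zero_class(rates)  # 4 of 9 known values are 0
    breaks, colours = [0, 5, 10, 20], ["#FEE0D2", "#FC9272", "#DE2D26"]
    print([class_colour(v, breaks, colours, zero) for v in rates])
```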
Intervals were limited by the minimum and maximum values of the data or data series. Their boundaries were determined by various methods depending on the nature of the data, including quantiles, natural breaks (Jenks) and manual settings. For the datasets of elements in groundwater and soils, limit and recommended concentration values were incorporated into the interval boundaries where applicable. To enhance the visual communication of these values, a distinct change in colour was applied at such boundaries.
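Quantile boundaries with an embedded limit value could be computed as in the following sketch, which uses statistics.quantiles from the Python standard library (3.8+). Inserting the limit as an additional boundary is our assumption about how such values can be incorporated; the values and the limit are hypothetical, and natural breaks (Jenks) are not shown.

```python
from statistics import quantiles


def quantile_breaks(values, n_classes, limits=()):
    """Boundaries from min to max with quantile cuts, plus any in-range limit values."""
    lo, hi = min(values), max(values)
    cuts = quantiles(values, n=n_classes, method="inclusive")
    # Limit values (e.g. legislative concentration limits) become extra boundaries,
    # so a distinct colour change can be placed exactly at them.
    extra = [x for x in limits if lo < x < hi]
    return sorted({lo, hi, *cuts, *extra})


if __name__ == "__main__":
    concentrations = list(range(11))  # hypothetical values 0..10
    print(quantile_breaks(concentrations, 5))               # five quantile classes
    print(quantile_breaks(concentrations, 5, limits=(5,)))  # limit 5 adds a boundary
```

Adding a limit increases the number of classes by one, which may need to be balanced against the overall number of intervals chosen for the map.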

Data series
Data for which time series were available were visualized at annual resolution. The way the time series are displayed differs depending on the syntactic map type. The geometry of S AC (M, Isogr) time series data (air quality data) differs between years; therefore, each year is stored as a separate layer, published in GeoServer, associated with an SLD style and made available in the layer tree within the respective thematic subgroup (Fig. 2). The style is the same for all years: only the geometry changes, while the intervals remain the same.
For choropleth maps (S AD (M, Int)), the geometry does not change over time; it is the attribute values that change. Thus, there is no need for separate layers for the annual data: all can be stored in one attribute table under different attribute names. In this way, one layer is published in GeoServer with one default SLD style (for the average values over the available time span). SLD styles for the annual data are then added as additional styles. This way of handling SLD styles enables selection of the year in a dropdown menu (Fig. 3), where the data are ordered from the latest to the oldest, with the average at the top. The ordering is based on the alphabetical order of the codes that GeoServer automatically generates for each SLD file; due to a bug in the software, however, the order had to be adjusted manually. Again, for better comparison of data between years and based on the literature review by Brewer and Pickle (2002), the intervals are the same for all years and their boundaries are rounded numbers. This approach facilitates understanding of absolute differences and changes over time and is suitable for users of the WMA from the general public who lack cartographic education.
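The idea of one set of rounded boundaries shared by all annual styles can be sketched as follows. Equal intervals snapped to multiples of a step are just one way to obtain rounded numbers and are our assumption, as are the style names (a hypothetical layer_avg default followed by layer_year styles, newest first, mirroring the dropdown order).

```python
import math


def shared_breaks(series, n_classes, step):
    """Equal-interval breaks over all years pooled, snapped to multiples of `step`.

    The last break may exceed the rounded maximum, because the class width
    is rounded up to a whole number of steps.
    """
    pooled = [v for values in series.values() for v in values if v is not None]
    lo = math.floor(min(pooled) / step) * step
    hi = math.ceil(max(pooled) / step) * step
    width = math.ceil((hi - lo) / n_classes / step) * step
    return [lo + i * width for i in range(n_classes + 1)]


def style_names(layer, years):
    """Default (average) style first, then annual styles from the latest year."""
    return [f"{layer}_avg"] + [f"{layer}_{y}" for y in sorted(years, reverse=True)]


if __name__ == "__main__":
    # Hypothetical annual rates per region (None = no data).
    series = {"2012": [3, 17, 41], "2013": [8, 22, None]}
    print(shared_breaks(series, 3, 5))  # the same breaks serve every annual style
    print(style_names("pertussis", series.keys()))
```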
The same approach was applied to styling gender data series: selected data are visualized separately by sex (men, women, and both sexes together).

Labels
Text labels are styled using the TextSymbolizer tag. In the presented WMA, several label layers were created: one as part of the orthophotomap, labelling towns and villages, mountain summits (name and elevation), water bodies and geomorphological units (Fig. 4); one as a separate nomenclature layer in the thematic group Topology; and others labelling the geomorphological division and terrain contours. For all labels, we used the Fira Sans font, an open-source sans-serif typeface, because it is free, web-friendly and available in 16 weights, enabling great styling variability.

Conclusions
The presented WMA is a complex solution for visualizing a wide range of spatial data from the field of environmental health in one place, with the aim of facilitating comprehension of the complex relations between human health and the environment. The application is intended for a wide audience of various professionals as well as the general public. Therefore, close attention was paid to data styling, so that the data visualization is as clear and unambiguous as possible. In the paper, we presented and compared styling approaches for several syntactic map types using the Open Geospatial Consortium SLD standard, which is the standard styling technique of GeoServer. The SLD schema with its extensions proved to be a suitable and sufficient approach for styling various syntactic map types. Moreover, application functionality such as zooming, transparency setting and object identification supports the data visualization and makes the data easier and friendlier to use.
However, our data styling in the WMA has several limitations, which concern styling in general rather than SLD styling. One issue, most pronounced for health and demographic data at the municipality level, is rate reliability. Many municipalities in Slovakia have very few inhabitants (while others have many), and in such cases even a single occurrence of an event can make rates artificially high and unreliable. Reliability could be visualized (in a separate layer or even within the same layer); however, given the very high number of layers that needed to be styled (around 300), this has not been manageable so far.
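The small-number problem can be made concrete with a short calculation. Treating the event count n as Poisson-distributed, the relative standard error (RSE) of a rate is √n/n = 1/√n, regardless of population size; a single case among 50 inhabitants gives 2,000 per 100,000 with an RSE of 100 %. The cut-off of 20 events below which a rate is flagged is an assumption (a convention used, for example, by the US National Center for Health Statistics), and the function names are hypothetical.

```python
import math


def rate_per_100k(cases, population):
    """Crude rate per 100,000 inhabitants."""
    return cases * 100_000 / population


def relative_standard_error(cases):
    """RSE of a Poisson count (and thus of the rate): sqrt(n) / n = 1 / sqrt(n)."""
    return math.inf if cases == 0 else 1 / math.sqrt(cases)


def is_reliable(cases, min_cases=20):
    """Hypothetical flag: rates based on fewer than `min_cases` events are unreliable."""
    return cases >= min_cases


if __name__ == "__main__":
    for cases, pop in [(1, 50), (1, 5000), (40, 20000)]:
        r = rate_per_100k(cases, pop)
        print(f"{cases}/{pop}: {r:.0f} per 100k, "
              f"RSE {relative_standard_error(cases):.0%}, reliable={is_reliable(cases)}")
```

Such a flag could drive a hatching or transparency rule in a separate reliability layer, which is one way the visualization mentioned above might be approached.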
Another improvement that would help users understand the data is animation of time series. Currently, time series are visualized in two ways: either by separate layers for the available years (when the geometry of the data changes) or by changing the style of a layer (when the geometry does not change, such as for data at the level of administrative units). This is clearly not the most appropriate way of visualizing time series, because it requires more user interaction than animations would. Nevertheless, time series visualization is not the main objective of the WMA, and therefore animations are not available.
To conclude, we have built a complex web mapping system for visualization and basic spatial analysis of environmental, demographic and health data in one place, available to professionals as well as the general public. Styling the huge amount of data to effectively communicate its content to audiences without cartographic education was challenging and time-consuming. Despite our efforts, challenges remain due to the complexity of the task.

Funding
This publication is the result of the project implementation: Comenius University Science Park - 2. phase ITMS 2014+ 313021D075 supported by the Research & Innovation Operational Programme funded by the European Regional Development Fund.
His research work is concerned with aerial image processing with digital photogrammetry methods into the form of digital orthophotomaps, including automatic procedures of digital aerial image processing: digital aerotriangulation using automatic image matching, discrete altitude point field collection using digital image correlation, and evaluation of the obtained results.
Vladimír PELECH is a PhD candidate at the Department of Cartography, Geoinformatics and Remote Sensing, Faculty of Natural Sciences, Comenius University in Bratislava and a researcher at the Comenius University Science Park in Bratislava, Slovakia. His research focuses on the spatial relationship between selected essential elements in groundwater and cardiovascular diseases. The topic of his dissertation is Methods and tools for modelling geographic information sources for monitoring environmental health.
Tomáš SCHMIDT is a PhD candidate at the Department of Cartography, Geoinformatics and Remote Sensing, Faculty of Natural Sciences, Comenius University in Bratislava and a researcher at the Comenius University Science Park in Bratislava, Slovakia. His areas of interest include database systems, spatial database systems, data analysis, geographical information systems, and web programming.
Hana STANKOVÁ, PhD is a researcher at the Department of Cartography, Geoinformatics and Remote Sensing, Faculty of Natural Sciences, Comenius University in Bratislava and at the Comenius University Science Park in Bratislava, Slovakia. Her main research interests include remote sensing data interpretation, object-based image analysis, land cover mapping/change detection and environmental health.
Juraj VALIŠ, PhD is a researcher at the Department of Cartography, Geoinformatics and Remote Sensing, Faculty of Natural Sciences, Comenius University in Bratislava and at the Comenius University Science Park in Bratislava, Slovakia. His research focuses on geoinformatics, geodesy, real estate cadastre, spatial data harmonization, and implementation of the Directive 2007/2/EC (INSPIRE).
Eva MIČIETOVÁ, Dr Assoc. Prof., PhD is the head of the Department of Cartography, Geoinformatics and Remote Sensing, Faculty of Natural Sciences, Comenius University in Bratislava, Slovakia, and scientific coordinator of activity 2.5 - Enviromedicine for the 21st century - Geographical Information System and Environmental Health of the project University Science Park, ITMS: 26240220086. Her professional specialization includes geoinformatics, geographical databases and spatial modelling of environmental health.