Numerico Road Segmentation

Segmentation is the act or process of splitting something into segments. In image processing the term may refer to splitting an image into semantically coherent units; in business it may refer to dividing a consumer market into subsets of consumers that share some characteristics. In the context of traffic management and incident prediction, our objective is to partition the road network into a finite set of consistent road sections that meet certain criteria, such that we can accurately and reliably predict incident risk.

At Numerico, working with large geospatial datasets, we are interested in subdividing large continuous networks, such as a road transportation network, into meaningful groupings of road sections. This discrete representation of a continuous system is a prerequisite for further analysis and modelling, and the requirements of the segmentation depend on the analysis or modelling to be performed. Discretising the system into segments allows us to derive the same features for every segment, and so to fuse the underlying data streams into a uniform input for the predictive algorithm.

Typically, road networks are divided into sections at regular intervals by markers that provide reference points, for example at every kilometre of road. These references are extremely useful for directing emergency services and maintenance engineers to the specific points along the road where their presence is required. A naive segmentation approach could use these arbitrary intervals to divide the road network. However, this does not account for the complexity of some road sections, for example at junctions, where flows on different roads and on/off ramps are related. Our segmentation sought to overcome these drawbacks by building road segments with logic that uses the traffic flow direction, the density of induction loop sites, and the length of segments.

To enable meaningful incident risk prediction, the segmentation must cover the entire road network. As a foundation we used a segmentation provided by NWB as a shapefile, which divides the road based on road properties. On top of this we apply a second layer of logic to combine these segments.

The process of dividing a complex road network (see above) into meaningful groupings was non-trivial and required combining several data sources to achieve the desired outcome.

The required data came in several forms:

* An XML file with live induction loop locations – these data are updated every 4-5 days as measurement sites go on and offline.

* An ESRI shapefile of the road network – this file contains individual line segments with road features and is updated yearly.

* An ESRI shapefile with directionality of roads encoded – this file gives information on travel time between parts of the network and, importantly, has directionality encoded.

The aim of our segmentation method was to join contiguous road sections with a consistent traffic direction such that each segment contains a minimum of two measurement site locations, whilst minimising the overall segment length. To achieve this, the information from the above sources needed to be merged, and rules needed to be defined for joining road sections.

Geospatial data may be merged with a spatial join or with an attribute-level join on a primary key. The data we were using required spatial joins. Our approach to the segmentation is described in the steps below:

  1. The XML file with induction loops was parsed and the site locations converted to the EPSG:28992 coordinate reference system. Measurement sites were assigned to road sections by creating a spatial buffer of 15 m around each measurement site and finding the road sections that intersect that buffer; the measurement site was then allocated to the closest of these road sections. The road network was then filtered to RWS-managed roads, and multisite locations were analysed.
  2. A key challenge was joining road sections with the same direction of travel, which required joining two non-aligned shapefiles. To do this we took a spatial intersection, and where the overlap of the two areas was greater than 10 % we inherited the directionality into our road network. This gave us directions for most of our road sections. For the remainder we searched neighbouring segments and iteratively assigned directions based on the Cartesian products, repeating until no more segment directions could be assigned.
  3. The crucial step in this segmentation pipeline was to join contiguous segments to ensure that all segments have at least two measurement sites. To do this we used the directional information to create an adjacency matrix that records the upstream neighbours of each segment. Segments without neighbours were removed at this stage. Merging was then performed by joining a segment to the adjacent upstream segment with the minimum number of sites; if two candidate segments had the same number of sites, the shorter of the two was joined. This process was repeated iteratively over the whole network until no more merging could occur.
  4. A minimum of two sites per segment was a requirement because it allows us to compute statistical aggregates over the sites and to calculate differences between them. The final step in our segmentation was to determine the order of the measurement sites with respect to the direction of travel. For this we measured the distance from the start point of each segment to each site and then ordered the sites by distance.
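
The merging rule in step 3 can be illustrated with a small, self-contained Python sketch. It is not our production code: the `Segment` class, the `merge_segments` function, the dictionary-based adjacency (instead of a matrix) and the order in which segments are visited are all assumptions made for illustration. Only the selection rule (fewest sites first, shorter segment on a tie) and the removal of segments without neighbours follow the description above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    n_sites: int      # number of measurement sites on the segment
    length_m: float   # segment length in metres


def merge_segments(segments, upstream, min_sites=2):
    """Greedily merge segments that have too few sites into an upstream neighbour.

    segments: dict mapping segment id -> Segment
    upstream: dict mapping segment id -> set of ids directly upstream of it
              (assumed to reference only ids present in `segments`)
    Returns (merged, members); members maps each surviving id to the
    original ids it now contains, listed upstream first.
    """
    ups = {sid: set(upstream.get(sid, ())) for sid in segments}
    # Drop segments with no neighbour at all (neither upstream nor downstream).
    linked = {sid for sid, u in ups.items() if u}.union(*ups.values())
    segs = {sid: Segment(s.n_sites, s.length_m)
            for sid, s in segments.items() if sid in linked}
    ups = {sid: ups[sid] for sid in segs}
    members = {sid: [sid] for sid in segs}

    while True:
        # Segments that still lack sites and have an upstream neighbour to join.
        # Visiting them in sorted id order is an arbitrary choice for this sketch.
        candidates = sorted(sid for sid, s in segs.items()
                            if s.n_sites < min_sites and ups[sid])
        if not candidates:
            break
        sid = candidates[0]
        # Fewest sites first; on a tie take the shorter segment.
        nb = min(ups[sid], key=lambda u: (segs[u].n_sites, segs[u].length_m, u))
        # Absorb the chosen neighbour into the current segment.
        segs[sid].n_sites += segs[nb].n_sites
        segs[sid].length_m += segs[nb].length_m
        members[sid] = members[nb] + members[sid]
        ups[sid] = (ups[sid] | ups[nb]) - {sid, nb}
        # Any other segment that had the neighbour upstream now sees the merged one.
        for other, u in ups.items():
            if other != sid and nb in u:
                u.discard(nb)
                u.add(sid)
        del segs[nb], ups[nb], members[nb]
    return segs, members
```

For a chain A → B → C with one, zero and one sites respectively, calling `merge_segments` with `upstream = {"B": {"A"}, "C": {"B"}}` merges all three into a single segment with two sites. Each pass removes one segment, so the loop always terminates, and segments that cannot reach two sites because they have no upstream neighbour are left as they are.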

The result of this segmentation approach was checked with a statistical analysis of the number of loops, the segment length, and the number of incidents in each segment. We validated that the distribution of incidents over all the segments is sufficient to use the segmentation for incident risk prediction.
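
A check of this kind can be sketched with the Python standard library. The field names (`n_loops`, `length_m`, `n_incidents`), the function names and the "share of segments without incidents" figure are hypothetical choices for illustration; the statistics and criteria we actually used are not reproduced here.

```python
from statistics import mean, median


def summarise(values):
    """Return basic distribution statistics for one per-segment quantity."""
    values = list(values)
    return {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


def segmentation_report(segments):
    """Summarise loops, length and incidents over all segments.

    segments: iterable of dicts with the (hypothetical) keys
              'n_loops', 'length_m' and 'n_incidents'.
    """
    segments = list(segments)
    report = {
        key: summarise(s[key] for s in segments)
        for key in ("n_loops", "length_m", "n_incidents")
    }
    # Fraction of segments in which no incident was recorded.
    report["share_without_incidents"] = (
        sum(1 for s in segments if s["n_incidents"] == 0) / len(segments)
    )
    return report
```

A report like this makes it easy to spot a segmentation in which most segments never see an incident, which would leave a risk model with very little signal per segment.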

This segmentation pipeline gives a good foundation for further analysis and the development of many more projects built upon a reasoned and consistent network segmentation.


– [limitations of shapefiles](

– [coordinate reference systems](