Predicting Road Accidents

Road accidents carry heavy human and financial costs in every country. Our trials using road traffic data from the Netherlands indicate that a proportion of these accidents can be mitigated by using predictive modeling to determine risk and suggest interventions before accidents occur.

In the Netherlands, there are approximately 20,000 incidents each year on the main highways, and many more on minor roads. With the advent of autonomous vehicle technology, one may hope that incidents will be drastically reduced or perhaps completely eliminated, since continuously improving robotic systems will eclipse human driving ability and virtually eliminate the common errors that cause accidents. However, for an unknown period of time human-operated and self-driving cars will share the same roads.

With either human or robotic drivers at the wheel, the chances of an accident depend largely on the state of the vehicle’s surroundings: the traffic and environmental conditions. To prevent further accidents, we want to understand when and how these external conditions can be manipulated by traffic-management interventions. The first step towards answering this is to study the predictability of accidents and derive an understanding of the factors that increase the probability of a collision. Some of these factors are intuitive, for example the density of cars in a road section, or limited visibility. However, these factors do not always lead to an accident, and we would like to know whether a detailed understanding of traffic and environmental conditions lets us predict an incident better than a prediction based on averages and intuitive factors. Modern predictive algorithms, in particular deep neural networks, may be able to single out patterns that raise the probability of an incident above the average.

In collaboration with the Dutch Transportation Ministry, Rijkswaterstaat, we have the opportunity to test our predictive models on the national highway system, which is continuously monitored by probably the densest system of sensors in the world. Magnetic induction loops measure flow (vehicles per minute), average speed, vehicle size and vehicle type per lane every 0.5-1 km on all national highways. The massive data generated by this system are complemented by a detailed database of incidents, with exact location, duration and other features, compiled from multiple sources. We have begun the incident prediction study by selecting segments of 10 km to 20 km from several highways, in areas with high rates of incidents. Thankfully, collisions and serious incidents are rare in the Netherlands relative to the number of cars and the density of the road network, so the largest challenge is to tune the system to predict rare events.
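To make the rare-event problem concrete, here is a minimal sketch of two standard ways to rebalance a binary dataset: weighting each class inversely to its frequency, and undersampling the majority class. These are generic techniques and not necessarily the ones used in our study; the function names, the `ratio` parameter and the label convention (1 = incident, 0 = no incident) are assumptions made for illustration.

```python
import numpy as np


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Uses the common heuristic n_samples / (n_classes * class_count), so a
    class that makes up 2% of the data gets a much larger weight.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = labels.size / (classes.size * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def undersample_negatives(labels, ratio=1.0, seed=0):
    """Return sorted indices keeping all positives and a random subset of negatives.

    `ratio` is the (hypothetical) number of negatives kept per positive.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    keep = min(neg.size, int(round(ratio * pos.size)))
    chosen = rng.choice(neg, size=keep, replace=False)
    return np.sort(np.concatenate([pos, chosen]))


# Toy example: 2 incident windows among 100.
labels = np.array([0] * 98 + [1] * 2)
weights = balanced_class_weights(labels)
subset = undersample_negatives(labels, ratio=1.0)
```

Class weights keep every example but change how much each one counts in the loss, whereas undersampling throws away negatives and so trades data volume for balance.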

We use feedforward and convolutional neural networks (with convolutions in space along the highway and also in time), and every dataset-balancing trick in the book, to estimate the probability that an incident will occur within the next 30-60 minutes, given current traffic and weather conditions. The network input consists of space-time “images” that are a snapshot of current (e.g. the last 15-30 minutes) traffic dynamics and weather conditions, labeled by impending incidents, and the networks are trained to perform binary image classification.
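As an illustration of how such labeled space-time images could be assembled, the sketch below slides a window over a sensor array and marks each window as positive when an incident falls inside the following prediction horizon. The array layout `(time, sensor, channel)`, the one-minute time step, and the names `make_windows`, `history` and `horizon` are assumptions for this example, not a description of our actual pipeline.

```python
import numpy as np


def make_windows(readings, incident_steps, history=15, horizon=30):
    """Cut a (T, S, C) sensor array into space-time images with binary labels.

    readings       : array of shape (time steps, sensors along the road, channels)
    incident_steps : time indices at which an incident was recorded on the segment
    history        : number of past steps in each image
    horizon        : number of future steps in which an incident counts as "impending"
    """
    readings = np.asarray(readings)
    n_steps = readings.shape[0]
    incident = np.zeros(n_steps, dtype=bool)
    incident[np.asarray(list(incident_steps), dtype=int)] = True

    images, labels = [], []
    for t in range(history, n_steps - horizon + 1):
        # Image: the `history` steps strictly before time t.
        images.append(readings[t - history:t])
        # Label: 1 if any incident occurs in steps t .. t + horizon - 1.
        labels.append(int(incident[t:t + horizon].any()))
    return np.stack(images), np.array(labels)


# Toy example: 100 minutes, 12 sensors, 2 channels (say, flow and speed),
# with a single incident at minute 50.
readings = np.zeros((100, 12, 2))
images, labels = make_windows(readings, incident_steps=[50])
```

Each image here has shape `(history, sensors, channels)`, so a convolutional network can convolve along both the time and space axes. Note that a single incident produces many positive windows, one for every start time whose horizon covers it.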

The results above show that the predictive ability of our model declines as the length of the segment is reduced and the number of incident examples that the network can learn from drops. In effect the network can identify conditions that make an incident likely, but as the segment becomes shorter it is harder to identify individual incidents. We can plot the network output against actual incidents to see this:

To avoid reaching the hard limits of our predictive system when reducing segment length, we tested a different setup, in which we turn from binary to multi-class classification and take long highway segments as the network input. The classes correspond to the shorter segments where the incident is predicted to happen. In this way the balance issue is reduced significantly, because predicting an incident somewhere on the long stretch is an easier task, and the network is left to localize the incident on one of the short segments based on the differences in risk conditions between them. We’ll report on the results of these tests in part II.
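The multi-class labeling can be sketched as a mapping from an incident's position to the index of the short sub-segment that contains it. This is only an illustration: the function name, the kilometre-based arguments, and the use of class 0 for "no incident" are assumptions and may differ from the setup we actually tested.

```python
def localisation_label(incident_km, segment_start_km, segment_length_km, n_subsegments):
    """Map an incident position to a class index for multi-class training.

    Returns 0 when there is no incident (incident_km is None), otherwise the
    1-based index of the equal-length sub-segment containing the incident.
    """
    if incident_km is None:
        return 0
    offset = incident_km - segment_start_km
    if not 0 <= offset <= segment_length_km:
        raise ValueError("incident lies outside the long segment")
    sub_length = segment_length_km / n_subsegments
    # Clamp so an incident exactly at the far end falls in the last sub-segment.
    return 1 + min(int(offset // sub_length), n_subsegments - 1)


# Toy example: a 20 km stretch starting at km 40, split into four 5 km classes.
example_labels = [
    localisation_label(km, segment_start_km=40.0, segment_length_km=20.0, n_subsegments=4)
    for km in (None, 40.0, 47.5, 59.9, 60.0)
]
```

With this labeling, every window from the long stretch contributes to one of the classes, so the positives from all sub-segments are pooled instead of being split across separate binary problems.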