Dr Mark Walters worked with IBM to improve the performance of large databases
4 July 2012
Databases are a fact of modern life. They power all sorts of systems – from vast communications networks to discrete repositories of information. In order to be effective they need to be quick, reliable and able to scale – without needing all the information to be transferred to a central repository.
The big challenge comes in designing a database that possesses all of these qualities. Dr Mark Walters worked with colleagues at IBM to address this challenge.
One answer is the distributed database. As the name suggests, information is not held centrally; instead, fragments are held throughout a network, and only the result of a query is communicated back to the user. Some mobile phone networks rely on distributed databases, as do military units operating on a battlefield.
For example, armoured military vehicles are fitted with sensors that can monitor the weather: wind, rainfall, temperature. This information does not need to be monitored constantly, but it does need to be available when requested.
Over time, members of an established database may change as new people (or, in the example above, sensors) join and others leave. The network underpinning this database must be strong enough to cope with these kinds of changes.
“Networks often work best when they are growing,” explains Mark. “It’s when they become static that problems arise. Older networks become ‘flatter’, less hierarchical. This can mean that it takes a greater number of ‘hops’ for the query to reach the information it is seeking, and this slows everything down.”
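To make the idea of 'hops' concrete, the sketch below counts how many links a query must cross to reach the most distant node in two toy networks: one built around a central hub, and one in which nodes are simply strung together. Both layouts, and the names hops_from and build, are illustrative assumptions and are not drawn from the Gaian database or from Mark's analysis.

```python
from collections import deque


def build(edges):
    """Turn a list of (a, b) links into a neighbour lookup table."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return neighbours


def hops_from(source, neighbours):
    """Breadth-first search: fewest hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Toy network A: node 0 is a hub linked directly to five other nodes.
hub = build([(0, i) for i in range(1, 6)])
# Toy network B: six nodes joined end to end, with no hub at all.
chain = build([(i, i + 1) for i in range(5)])

# Worst-case hop count for a query starting at an outer node of each network.
hub_hops = max(hops_from(1, hub).values())
chain_hops = max(hops_from(0, chain).values())
print("hub network, worst case:", hub_hops)
print("chain network, worst case:", chain_hops)
```

With the same number of nodes, the query in the hub network never travels further than two hops, while the chain needs five. Real networks sit somewhere between these extremes.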
IBM created the Gaian database – a distributed database – in response to the need for a product that scales effectively.
The underlying network is formed using preferential attachment, a model designed to allow for changes in scale. It possesses several desirable properties, such as resilience to failure of nodes and low communication overheads.
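In a preferential attachment model, each newcomer is more likely to link to nodes that already have many links. The sketch below is a minimal textbook-style version of that rule, in which every new node makes a single link. It is an assumption for illustration only: the function names, the single-link rule and the network size are hypothetical and do not describe how the Gaian database actually connects its nodes.

```python
import random
from collections import Counter


def grow_preferential_network(n_nodes, seed=0):
    """Grow a network node by node; each newcomer links to one existing node
    picked with probability proportional to that node's current link count."""
    rng = random.Random(seed)
    edges = [(0, 1)]  # start from two linked nodes
    # Every node appears here once per link it has, so a uniform draw from
    # this list is a draw weighted by link count.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend((new_node, target))
    return edges


def degrees(edges):
    """Count how many links each node has."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


edges = grow_preferential_network(1000)
deg = degrees(edges)
print("links:", len(edges))
print("most-connected node has", max(deg.values()), "links")
print("nodes with a single link:", sum(1 for d in deg.values() if d == 1))
```

Running a sketch like this typically produces a few heavily connected hubs alongside many nodes with just one link, which is the kind of uneven structure that the term 'scale-free' refers to.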
During his time with IBM, Mark questioned whether or not these qualities would be retained as the network grew and evolved. He explored this question from a mathematical viewpoint, drawing on geometric graph theory.
“It was quite hard initially to pin down what was wanted, in terms of prioritising desirable network properties,” explains Mark. “At first it seemed that having a low communications overhead was the most important thing. But of course this has costs in other areas – it may make it less resilient. Defining priorities was an essential first step.”
Mark's research showed that, contrary to expectation, the network that supports the Gaian database does not retain its desirable qualities as it evolves. Instead, it becomes almost uniform in structure. Further research will be needed to find a model that stays scale-free as it evolves.
“Ultimately it’s a question of balancing competing priorities and coming up with as good a solution as possible. It is unrealistic to expect that we will be able to create a network that possesses all desirable qualities in equal proportions,” says Mark.
Mark's research findings will feed into improving the functioning of the network underlying IBM's Gaian database.
Even though the work was initially targeted towards military applications, this sort of distributed database has wide-reaching civilian applications.
For example, Facebook can be viewed as a giant database. Turning this into a distributed database where each user has control over their own information, and the information is only transferred if someone wishes to see it, would reduce both bandwidth and server costs, as well as reducing privacy concerns.
The collaboration established a good working relationship between the two organisations, and Mark aims to work with IBM in the future.