Machine Learning, Artificial Intelligence and the like are all the rage these days. These techniques are being applied in a vast array of disciplines, from manufacturing to healthcare. They are obviously very heavily used in most aspects of Computing and IT. One area that has attracted a lot of interest but not a lot of progress is networking. There are lots of reasons for this, but in my opinion one of the biggest hurdles is the lack of an accurate mathematical model for networks.


There is a wide array of thoughts on why this is, but I do believe such a model is achievable, at least in part. Because networks tend to be diverse and unpredictable, pinning them down into a model that is consistent and comprehensive is a challenge.


There are fundamental differences in how traffic is constructed for transport across IP-based networks. Traffic can be connection-oriented, e.g. TCP, or connectionless, e.g. UDP. How that traffic is created, transported and received is very different. However, both types share the same queues and bandwidth to get from source to destination. Connectionless protocols follow the same methodology used by IP itself, which is an unreliable, connectionless delivery system for data in the form of packets. How they deal with loss, delay, duplication, or out-of-order delivery is left to the applications using this transport. Since there is no standard for how an application must deal with, or not deal with, such issues, this adds layers of complexity to an accurate model of networks. The model for a purely connectionless network is different from that of a purely connection-oriented one. Since almost no network is made purely of one or the other, the most common network is by far a hybrid of the two.
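
To make the point about application-level handling concrete, here is a minimal sketch of one way an application sitting on top of a connectionless transport might cope with duplicated and out-of-order datagrams. Everything in it is an assumption made for illustration: the function name reassemble, the idea that each datagram carries its own sequence number starting at 0, and the choice to simply report gaps rather than request retransmission. A different application could reasonably make every one of these choices differently, which is exactly what makes this layer hard to model.

```python
def reassemble(datagrams):
    """Order payloads by sequence number, drop duplicates, report gaps.

    `datagrams` is an iterable of (seq, payload) pairs in arrival order.
    Sequence numbers are assumed (hypothetically) to start at 0.
    Returns (ordered_payloads, missing_seqs).
    """
    seen = {}
    for seq, payload in datagrams:
        # Keep only the first copy of each sequence number (duplicate suppression).
        if seq not in seen:
            seen[seq] = payload

    if not seen:
        return [], []

    highest = max(seen)
    # Restore sender order regardless of arrival order.
    ordered = [seen[seq] for seq in sorted(seen)]
    # Anything below the highest sequence number that never arrived is a loss.
    missing = [seq for seq in range(highest + 1) if seq not in seen]
    return ordered, missing


# Arrival order is scrambled, packet 2 is duplicated, packet 1 never arrives.
arrivals = [(2, "c"), (0, "a"), (2, "c"), (3, "d")]
ordered, missing = reassemble(arrivals)
print(ordered, missing)
```

With TCP this bookkeeping happens inside the transport itself, so the application never sees it. With UDP it only happens if the application author chose to write something like the above.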


It would be great if you could break a network down into each type, determine mathematically what each one looks like, and then create a linear combination of the two to arrive at a final, accurate model of the environment. But it is not quite so simple. There is no reason to think any network is a linear combination of both types of transport. In fact, there is no guarantee that each type of transport is a linear combination of its own constituents, though I suspect connection-oriented systems such as those using TCP are more readily described this way, since TCP provides many more mechanisms for removing randomness and noise from the process.
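
A small numeric sketch shows why a linear combination is unlikely to work even in a very idealized setting. It assumes a textbook M/M/1 queue (Poisson arrivals, exponential service times), where the mean time a packet spends in the system is 1 / (service rate - arrival rate). Real traffic does not follow this model, and the rates and names below are made up for illustration. The only point is that once two traffic classes share one queue, the delay they see together is not a weighted sum of the delays each would see alone.

```python
def mm1_mean_delay(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("arrival rate must be below service rate (unstable queue)")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical figures, in packets per second.
service_rate = 100.0
rate_a = 30.0  # e.g. the connection-oriented share of the traffic
rate_b = 50.0  # e.g. the connectionless share of the traffic

# Delay each class would see if it had the link to itself.
delay_a_alone = mm1_mean_delay(rate_a, service_rate)
delay_b_alone = mm1_mean_delay(rate_b, service_rate)

# A naive "linear combination": weight each isolated delay by its traffic share.
total_rate = rate_a + rate_b
weighted_alone = (rate_a * delay_a_alone + rate_b * delay_b_alone) / total_rate

# Delay when both classes actually share the same queue.
delay_shared = mm1_mean_delay(total_rate, service_rate)

print(f"weighted sum of isolated delays: {weighted_alone:.4f} s")
print(f"delay on the shared queue:       {delay_shared:.4f} s")
```

The shared-queue delay comes out noticeably larger than the weighted sum, because delay grows non-linearly as the queue approaches saturation. Add TCP's feedback loops and the lack of any feedback in UDP and the interaction only gets harder to express as a simple sum.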


There are at least four layers of complexity that need to be accounted for when building a mathematical model for networks:

  1. Physical
  2. Network (Interface and Internet delivery – could be two separate layers)
  3. Transport (Connection and connectionless)
  4. Application (A lot goes into this layer)


Since so much is still unknown in the definition of a network model at layers 1-3, hardly any consideration is given to how the 4th layer impacts that model. This is where the true strength of machine learning comes into play, allowing computers to more quickly and efficiently identify relationships and build complex models that may change frequently and in very non-intuitive ways.


Since Analytical Modeling is a mathematical representation of the network, systems and applications in the environment, it seems to be the best way to create the overall model. But it is static, looking at past data and making predictions that have varying degrees of accuracy. This means that if it is off the mark you need to go back, rebuild the model based on the new data and start over. Because of this manual process there is a delay between figuring out how the environment should look and being able to actually make it look that way. Discrete-Event Simulation, by contrast, looks in real time at how the environment acts and responds. The problem is that last word, simulation. You are using a tool or set of tools to SIMULATE the environment and not actually using production information. These simulations can also take a long time to create, and any changes introduced to the production environment while the simulation is still running force you to stop and restart the simulation with the new parameters added in.
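
The difference between the two approaches can be sketched on a single queue. The example below is a toy, not how a production tool works: it assumes an M/M/1 queue (Poisson arrivals, exponential service times), uses made-up rates, and the function names are my own. The analytical side is one closed-form formula. The simulation side steps through packets one at a time using the Lindley recursion, which is a compact way to simulate a single FIFO queue event by event.

```python
import random


def analytical_mean_delay(arrival_rate, service_rate):
    """Closed-form M/M/1 mean time in system: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("arrival rate must be below service rate (unstable queue)")
    return 1.0 / (service_rate - arrival_rate)


def simulated_mean_delay(arrival_rate, service_rate, num_packets, seed=0):
    """Estimate mean time in system by simulating packets through a FIFO queue."""
    rng = random.Random(seed)
    wait = 0.0   # time the current packet waits before service starts
    total = 0.0  # accumulated time in system (waiting + service)
    for _ in range(num_packets):
        service = rng.expovariate(service_rate)
        total += wait + service
        # Lindley recursion: the next packet waits for whatever work is
        # still queued when it arrives, or not at all if the queue is idle.
        interarrival = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - interarrival)
    return total / num_packets


# Hypothetical figures, in packets per second.
arrival_rate, service_rate = 50.0, 100.0
model = analytical_mean_delay(arrival_rate, service_rate)
simulated = simulated_mean_delay(arrival_rate, service_rate, num_packets=200_000)
print(f"analytical: {model:.4f} s   simulated: {simulated:.4f} s")
```

The formula answers instantly but is only as good as its assumptions. The simulation can be extended to model behavior no formula covers, but it has to be rerun from scratch whenever arrival_rate or service_rate changes, which is the restart problem described above.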


Another big problem with mathematically modeling a network is reflected in this comparison. Complexity and detail will naturally be lost when trying to generalize an environment with an all-encompassing mathematical representation. There should be no expectation that a single formula, or even a set of formulas, can completely characterize your network.


In Dr. David Meyer’s presentation “Beyond Siri: Applying Machine Learning To Network Automation” from Open Networking Summit 2017, he describes some of the challenges we face when trying to apply machine learning techniques to networking. The issue revolves around the nature of the data and how unfriendly it is to ML.


The statement that Machine Learning for network data requires novel approaches is an accurate one, but in my opinion the problem is not insurmountable. So what are these novel approaches? To get there we need to take a journey through a great many different areas. We will need to understand the current state of machine learning and how it is used in other areas such as image processing and natural language. Since some of the best techniques emulate the natural world, we will need some understanding of animal behavior and neuroscience. And of course we will need to have an in-depth understanding of networking and its relationship to the larger IT ecosystem.


I think it makes sense to start with our background in networking. When I say background I am really talking about the mathematical underpinnings of computer networks. This includes routing and switching algorithms, queuing theory and performance analysis. We will need to find ways to relate this to the other areas of computing as mentioned by Dr. Meyer. Once we have this grounding, we will need to understand how we can gather information from all areas of Computer Networks, such as physical server hardware, operating systems, storage devices, applications as well as networks.


Once we get the networking and related stuff out of the way, I will move on to a background in Artificial Intelligence, Machine Learning and other related topics (neural networks, deep learning, etc.). With this background in place I will discuss how it can be applied in the area of computer networking.


The final discussion will center around more advanced topics describing ways to better apply ML to networks, such as natural evolution strategies, particle swarm optimization and the like. Throughout all of this I will demonstrate ways in which I have been able to successfully apply these techniques and create an environment for not only predictive modelling of performance optimizations and failures, but also prescriptive modelling techniques that allow us to actually find ways to prevent future failure and performance degradation, doing so in near real-time. It is a system that constantly learns and trains itself, changing and evolving as the environment itself does.


I will attempt to explain how this can be accomplished in a series of articles that will cover a broad range of topics. As I build this body of knowledge I will be providing examples from software I have written to achieve these results.