Many undergraduate students think the only way to explore computer science in graduate school is to study artificial intelligence and machine learning. However, there are many interesting areas of computer science that many people are not aware of. I was one of them: when I was choosing a topic for my graduate studies, I did not know about the interesting problems that exist in systems and networking.
In this article I want to point out three interesting areas in systems and networking.
Making Datacenters Faster, Faster, and Faster!
Datacenters and cloud computing are improving at an unprecedented speed. More and more workloads are being migrated to the cloud, and applications running in the cloud require tighter SLAs (Service Level Agreements). This demand has driven research on datacenter performance. Switches and routers in datacenters keep getting faster: new switches can forward packets at 100 Gb/s. Operating systems, on the other hand, were designed in the era of dial-up and KB/s speeds. As a result, the operating system has become the bottleneck for high-speed communication between applications. To improve the end-to-end performance of applications in cloud computing environments, researchers have tried different approaches:
Creating Custom End-To-End Application Level Protocols
If you look at the papers published at popular systems conferences, you can find a paper on this topic every year. The best paper award winner at NSDI ’19 (one of the leading systems conferences) was about designing an RPC protocol for datacenters. If you are interested in reading more about it, see this paper.
Offloading Traffic Processing to FPGAs, GPUs, and DPDK
One way to improve packet-processing performance is to use custom hardware. The first option is FPGAs (field-programmable gate arrays). An FPGA is a chip whose internal logic blocks, and the connections between them, can be reconfigured after manufacturing, so the circuit itself can be defined through configuration rather than fixed at fabrication time.
The other option is GPUs. A GPU provides a large array of small, simple cores that run in parallel. GPUs were originally created for graphics processing, but other workloads turned out to have a similarly parallel structure. Recently, machine learning and packet processing have been offloaded to GPUs to increase performance.
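DPDK (Data Plane Development Kit) takes a different route and enables fast packet processing on ordinary CPUs. It is an open-source set of libraries and drivers, originally created by Intel, that lets applications process packets in user space and bypass the kernel network stack. It supports several CPU architectures.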
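To get a feel for why software becomes the bottleneck at these speeds, it helps to work out how much time a host has per packet if it wants to keep a fast link busy. The sketch below is only back-of-the-envelope arithmetic: the function names are made up for illustration, and it assumes standard Ethernet framing, where each frame carries 20 extra bytes on the wire (preamble, start delimiter, and inter-frame gap).

```python
# Back-of-the-envelope per-packet time budget at a given link speed.
# Assumption: standard Ethernet, where every frame costs 20 extra bytes
# on the wire (7-byte preamble + 1-byte start delimiter + 12-byte gap).
WIRE_OVERHEAD_BYTES = 20


def packet_budget_ns(link_gbps: float, frame_bytes: int) -> float:
    """Nanoseconds available per frame if the link is kept fully busy."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    # bits / (Gbit/s) gives nanoseconds directly.
    return bits_on_wire / link_gbps


def packets_per_second(link_gbps: float, frame_bytes: int) -> float:
    """Frames per second needed to saturate the link."""
    return 1e9 / packet_budget_ns(link_gbps, frame_bytes)


if __name__ == "__main__":
    for size in (64, 1518):  # minimum and maximum standard frame sizes
        budget = packet_budget_ns(100, size)
        rate = packets_per_second(100, size)
        print(f"{size:>5}-byte frames: {budget:7.2f} ns each, {rate / 1e6:6.1f} Mpps")
```

At 100 Gb/s, minimum-size frames leave about 6.7 ns per packet, which leaves very little room for per-packet system calls, interrupts, or memory copies. That is the motivation behind the approaches in this section.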
Systems to Improve the Performance of Machine Learning
A machine learning pipeline generally contains several stages, such as preprocessing, training, and serving, and each stage has its own problems. For example, training large DNNs can be very time-consuming and resource-hungry. In other cases, we may not be able to transfer data from devices to the cloud for training because of privacy concerns (see federated learning).
In the model serving phase, we may want to serve multiple models, cache predictions, and keep latency low. These are examples where systems research can help ML.
These are some of the interesting papers that have been published in this area:
- Scaling Distributed Machine Learning with the Parameter Server
- Clipper: A Low-Latency Online Prediction Serving System
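The caching idea mentioned for model serving can be illustrated with a small sketch. This is a toy example and is not taken from Clipper or any other real serving system: `CachedModelServer` and `toy_model` are hypothetical names, and the cache is a plain LRU cache keyed on the input features.

```python
from collections import OrderedDict


class CachedModelServer:
    """Hypothetical serving wrapper: an LRU prediction cache in front of a model."""

    def __init__(self, model_fn, capacity=1024):
        self.model_fn = model_fn      # any callable mapping features to a prediction
        self.capacity = capacity      # maximum number of cached predictions
        self.cache = OrderedDict()    # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    def predict(self, features):
        key = tuple(features)         # lists are not hashable, tuples are
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        result = self.model_fn(features)  # the expensive call we want to avoid
        self.cache[key] = result
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return result


def toy_model(features):
    """Stand-in for a real model: classifies by the sum of the features."""
    return sum(features) > 1.0


if __name__ == "__main__":
    server = CachedModelServer(toy_model, capacity=2)
    for query in ([0.5, 0.7], [0.5, 0.7], [0.1, 0.2]):
        print(query, server.predict(query))
    print("hits:", server.hits, "misses:", server.misses)
```

Repeated queries are answered from the cache without running the model again. A real serving system also has to handle batching, multiple models, and cache invalidation when a model is updated, which this sketch leaves out.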
Edge Computing
With the advent of IoT devices, a new computing paradigm is becoming more and more popular. The idea of edge computing is to place application processing as close to the user as possible. This approach can enable mission-critical, low-latency applications. It becomes challenging, however, when the resources close to the user have limited capacity.
These are some of the interesting papers in this area:
- Distributing Deep Neural Networks with Containerized Partitions at the Edge
- An Edge-based Framework for Cooperation in Internet of Things Applications
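The trade-off between being close to the user and having limited capacity can be made concrete with a toy placement decision. Everything in this sketch is an assumption made for illustration: the `Site` fields, the latency numbers, and the simple rule of picking the fastest site that still has free capacity and meets the deadline. It is not the method of either paper listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    rtt_ms: float       # network round trip between the user and this site
    compute_ms: float   # time to process one request at this site
    free_slots: int     # remaining capacity (requests it can still accept)


def response_time_ms(site: Site) -> float:
    """End-to-end latency: network round trip plus processing time."""
    return site.rtt_ms + site.compute_ms


def choose_site(sites: List[Site], deadline_ms: float) -> Optional[Site]:
    """Pick the fastest site that has capacity and meets the deadline."""
    candidates = [
        s for s in sites
        if s.free_slots > 0 and response_time_ms(s) <= deadline_ms
    ]
    if not candidates:
        return None  # no site can serve the request in time
    return min(candidates, key=response_time_ms)


if __name__ == "__main__":
    # Made-up numbers: the edge node is close but slow and small,
    # the cloud is far away but fast and large.
    edge = Site("edge", rtt_ms=5, compute_ms=30, free_slots=1)
    cloud = Site("cloud", rtt_ms=80, compute_ms=10, free_slots=100)
    chosen = choose_site([edge, cloud], deadline_ms=50)
    print(chosen.name if chosen else "no site meets the deadline")
```

With these numbers the edge node wins while it has a free slot. Once it is full, a request with a 50 ms deadline cannot be served at all, which is exactly the capacity problem described above.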
These are just some of the areas that I found interesting in systems and networking. There are a lot of other interesting areas that I may discuss in a later post on this topic!