Parallel Linux Utilities for NFS

Introduction

Linux ships with common utilities to list files (ls), remove files (rm), find files (find), change file ownership and permissions (chown, chmod), and copy files (cp). Developers, administrators, and other Linux users rely on these utilities daily, and they are also handy building blocks for shell scripts that automate work. Because they are designed to operate on mounted file systems, they work with both disk-based and network-based file systems. To use them with a network file system such as NFS, you mount the share on a local directory and run the utilities against that local mount point; the file system driver on the system then carries out each operation against the server.

Let’s discuss some advantages and disadvantages of using the mounted approach to run Linux utilities for network file systems.

Advantages:

  • No need to change existing Linux utilities
  • Quick development of tools and scripts around them

Disadvantages:

  • Most of these utilities are single-threaded applications
  • Every operation has to pass through the operating system's entire file system stack
  • They are not efficient for all types of workloads, especially parallel data processing

What is LIBNFS?

LIBNFS is a client library for accessing NFS shares over a network. Check out this project at https://github.com/sahlberg/libnfs. LIBNFS offers three different APIs, for different use cases:

  • RAW: A fully asynchronous, low-level RPC library for NFS protocols
  • NFS ASYNC: A fully asynchronous library for high-level, VFS-like file operations
  • NFS SYNC: A synchronous library for high-level, VFS-like file operations

Any of these Linux utilities (ls, cp, etc.) can be reimplemented on top of the libnfs APIs so that it runs without mounting the NFS share. This also makes it possible to open multiple connections to the NFS server, which lets a utility process operations in parallel without going through the local file system stack.

Parallel Copy (PCP)

Let’s consider a parallel copy utility (PCP), similar to the cp utility in Linux (http://man7.org/linux/man-pages/man1/cp.1.html). The primary job of cp is to copy data from a source to a destination. Linux cp supports many options around the copy operation, but we will not discuss them here. Instead, we focus on how data can be copied in parallel over NFS.

Take the case of copying a massive amount of data (hundreds of TB) from one NFS server to another, for example during a data migration or for replication. Modern NFS servers may be scale-out or parallel NFS (pNFS) systems, and the underlying storage may be flash. Such systems are designed to serve many concurrent connections, so to take advantage of them, the copy utility must also drive multiple connections in parallel.
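The article does not specify how the work is divided between connections. One common approach, assumed here, is to split each large file into fixed-size byte ranges so that different connections can copy different parts of the same file at once. The `plan_chunks` helper below is hypothetical and only computes those ranges:

```python
from typing import List, Tuple

# Hypothetical unit of work: copy `length` bytes starting at `offset`.
Chunk = Tuple[int, int]


def plan_chunks(file_size: int, chunk_size: int) -> List[Chunk]:
    """Split a file of `file_size` bytes into (offset, length) ranges.

    Every range is `chunk_size` bytes long except possibly the last one,
    so the ranges cover the file exactly once with no overlap.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


print(plan_chunks(10, 4))  # [(0, 4), (4, 4), (8, 2)]
```

Each range is independent, so ranges can be handed to any free connection in any order.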

A typical high-level design for a parallel utility is shown in the diagram below:

Connection

Each connection is a communication endpoint between the PCP utility and the NFS server, backed by its own libnfs NFS context. For NFSv3, the server's portmapper (rpcbind) service tells the client which ports the MOUNT and NFS services listen on, so that the connection can be established.
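A minimal Python sketch of a connection object, assuming that one libnfs context should serve only one request at a time (hence the per-connection lock). `Connection`, `pread`, and `pwrite` are hypothetical names, and a shared in-memory dict stands in for the NFS export; the sketch does not call the real libnfs API:

```python
import threading
from typing import Dict


class Connection:
    """Hypothetical wrapper around one NFS context (one connection to a server).

    A real implementation would create and mount its own libnfs context here.
    In this sketch a shared dict (path -> bytearray) stands in for the export.
    """

    def __init__(self, export: Dict[str, bytearray], export_lock: threading.Lock) -> None:
        self.export = export
        self.export_lock = export_lock  # protects only the in-memory stand-in export
        self.requests = 0  # number of requests sent over this connection
        self._lock = threading.Lock()  # assumption: one request at a time per context

    def pread(self, path: str, offset: int, length: int) -> bytes:
        with self._lock, self.export_lock:
            self.requests += 1
            return bytes(self.export[path][offset:offset + length])

    def pwrite(self, path: str, offset: int, data: bytes) -> None:
        with self._lock, self.export_lock:
            self.requests += 1
            buf = self.export.setdefault(path, bytearray())
            end = offset + len(data)
            if len(buf) < end:
                buf.extend(bytes(end - len(buf)))  # zero-fill any gap before the write
            buf[offset:end] = data


export: Dict[str, bytearray] = {}
lock = threading.Lock()
conns = [Connection(export, lock) for _ in range(4)]  # four connections to one server
conns[0].pwrite("/f", 2, b"hi")
print(conns[1].pread("/f", 0, 4))  # b'\x00\x00hi'
```

The shared `export_lock` exists only because the stand-in is a Python dict; a real NFS server handles concurrent requests from different connections itself.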

Sessions

A session is a logical collection of NFS connections to one server. Typically, the source and the destination NFS servers each get one session. PCP reads data from the source session and writes it to the destination session. Internally, each session distributes its read and write calls across its connections to achieve parallelism.
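Round-robin is one simple policy (an assumption; the article does not prescribe one) for a session to spread calls over its connections. In this hypothetical `Session` class, `pick()` returns the next connection in turn, and `pread`/`pwrite` forward to it; any object exposing those two methods could serve as a connection:

```python
import itertools
import threading
from typing import Any, Sequence


class Session:
    """Hypothetical session: a group of connections to one NFS server.

    Calls are spread over the connections in round-robin order, so
    consecutive reads or writes land on different connections.
    """

    def __init__(self, connections: Sequence[Any]) -> None:
        if not connections:
            raise ValueError("a session needs at least one connection")
        self.connections = list(connections)
        self._next = itertools.cycle(self.connections)
        self._lock = threading.Lock()  # guard the shared iterator across threads

    def pick(self) -> Any:
        with self._lock:
            return next(self._next)

    def pread(self, path: str, offset: int, length: int) -> bytes:
        return self.pick().pread(path, offset, length)

    def pwrite(self, path: str, offset: int, data: bytes) -> None:
        self.pick().pwrite(path, offset, data)


# Strings stand in for connection objects; pick() only rotates through them.
session = Session(["conn-0", "conn-1", "conn-2"])
print([session.pick() for _ in range(4)])  # ['conn-0', 'conn-1', 'conn-2', 'conn-0']
```

Round-robin ignores how long each request takes; a session that must cope with uneven request sizes might instead hand out whichever connection is currently idle.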

Multi-threading

With multiple connections and a massive amount of data to process, PCP needs to be multi-threaded. Two possible approaches are a thread pool shared across a session, or one thread per connection. Either way, I/O requests should be distributed across threads and connections so that none of them is starved or overloaded. The exact scheme is up to the implementer.
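The following sketch takes the thread-pool option. It assumes each source connection is paired with one destination connection, and that a worker borrows a free pair for each chunk and returns it afterwards, so no connection is ever used by two threads at once. `MemConnection` and `parallel_copy` are hypothetical, and in-memory dicts stand in for the two NFS exports:

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


class MemConnection:
    """Stand-in for one NFS connection; a dict (path -> bytearray) plays the export."""

    def __init__(self, export: Dict[str, bytearray]) -> None:
        self.export = export
        self.requests = 0

    def pread(self, path: str, offset: int, length: int) -> bytes:
        self.requests += 1
        return bytes(self.export[path][offset:offset + length])

    def pwrite(self, path: str, offset: int, data: bytes) -> None:
        self.requests += 1
        # Assumes the destination file was preallocated to its final size.
        self.export[path][offset:offset + len(data)] = data


def parallel_copy(src: List[MemConnection], dst: List[MemConnection],
                  path: str, size: int, chunk_size: int) -> None:
    """Copy `path` from the source connections to the destination connections."""
    pairs: List[Tuple[MemConnection, MemConnection]] = list(zip(src, dst))
    if not pairs or chunk_size <= 0:
        raise ValueError("need at least one connection pair and a positive chunk_size")

    # Free (source, destination) pairs; a worker holds one pair per chunk.
    free: "queue.Queue[Tuple[MemConnection, MemConnection]]" = queue.Queue()
    for pair in pairs:
        free.put(pair)

    def copy_chunk(offset: int) -> None:
        s, d = free.get()  # wait for an idle pair
        try:
            length = min(chunk_size, size - offset)
            d.pwrite(path, offset, s.pread(path, offset, length))
        finally:
            free.put((s, d))  # hand the pair to the next chunk

    # One worker per pair, so there are never more in-flight chunks than pairs.
    with ThreadPoolExecutor(max_workers=len(pairs)) as pool:
        # list() consumes the results, re-raising any exception from a worker.
        list(pool.map(copy_chunk, range(0, size, chunk_size)))


src_export = {"/data/big.bin": bytearray(range(256)) * 40}  # 10240 bytes
dst_export = {"/data/big.bin": bytearray(10240)}  # preallocated, zero-filled
src = [MemConnection(src_export) for _ in range(4)]
dst = [MemConnection(dst_export) for _ in range(4)]
parallel_copy(src, dst, "/data/big.bin", 10240, 1024)
print(dst_export == src_export)  # True
```

Borrowing whole connection pairs from a queue keeps every connection to one request at a time without per-connection locks, and a pair becomes available to the next chunk as soon as its previous chunk finishes.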

Conclusion

  • Massively parallel data copy that leverages the capabilities of modern NFS storage is possible.
  • PCP can support cp-like options from the cp man page so that existing tools and scripts can also be used with PCP.
  • Utilities such as PLS and PRM (parallel ls and rm) perform only metadata operations, so their parallel implementations would be simpler.
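To illustrate why metadata-only utilities parallelize easily, here is a sketch of a hypothetical PLS that lists a directory tree one level at a time, issuing the listings for each level concurrently. `TREE` and `list_dir` are in-memory stand-ins for the server's namespace and for a directory-read request over one connection:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical namespace: directory path -> names of its entries.
# An entry whose full path is also a key in TREE is a directory.
TREE: Dict[str, List[str]] = {
    "/": ["a", "b", "f1"],
    "/a": ["f2", "c"],
    "/a/c": ["f3"],
    "/b": [],
}


def list_dir(path: str) -> List[str]:
    """Stand-in for one directory-read request issued over some connection."""
    return [path.rstrip("/") + "/" + name for name in TREE[path]]


def parallel_ls(root: str, workers: int = 4) -> List[str]:
    """List everything under `root`, one directory level at a time.

    All directories of the current level are listed concurrently; the
    subdirectories they contain form the next level.
    """
    found: List[str] = []
    level = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while level:
            next_level: List[str] = []
            for entries in pool.map(list_dir, level):
                for entry in entries:
                    found.append(entry)
                    if entry in TREE:
                        next_level.append(entry)
            level = next_level
    return sorted(found)


print(parallel_ls("/"))  # ['/a', '/a/c', '/a/c/f3', '/a/f2', '/b', '/f1']
```

Processing one level at a time keeps the code simple; a work queue that lists each subdirectory as soon as it is discovered would keep connections busier on unbalanced trees.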