Monday, July 26, 2010

Large file systems (raw notes)

I have decided to post some of my 2008/2009 documents containing raw resources about distributed processing. These documents contain just raw material and personal notes; I hope to find the time to make them more presentable. Below is the first of a larger set of posts (about 30) that I will publish soon.


Some time ago I had to find a file system that supports really large files (TB/PB).
Below are the raw notes that served as the starting point for my research.

File systems
A file system (often also written as filesystem) is a method of storing and organizing computer files and their data. Essentially, it organizes these files into a database that the computer's operating system uses to store, organize, manipulate, and retrieve them.

Types of file systems

  • File systems with built in fault tolerance
  • Shared disk file systems
  • Distributed file systems
  • Distributed fault tolerant file systems
  • Distributed parallel file systems
  • Distributed parallel fault tolerant file systems
  • Fault tolerant file systems

Comparison of file systems

SAN - Storage area network

A storage area network (SAN) is an architecture to attach remote computer storage devices (such as disk arrays, tape libraries, and optical jukeboxes) to servers in such a way that the devices appear as locally attached to the operating system. Although the cost and complexity of SANs are dropping, they are still uncommon outside larger enterprises.
Network attached storage (NAS), in contrast to SAN, uses file-based protocols such as NFS or SMB/CIFS where it is clear that the storage is remote, and computers request a portion of an abstract file rather than a disk block.
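
The block-versus-file distinction above can be illustrated with a small toy model. This is only a sketch in Python under stated assumptions: the names BlockDevice, FileServer and BLOCK_SIZE are made up for illustration and do not correspond to any real SAN or NAS API. The point is the shape of the request: a SAN client addresses numbered blocks, while a NAS client names a file and a byte range.

```python
BLOCK_SIZE = 512  # bytes per block; an arbitrary value chosen for this toy model


class BlockDevice:
    """SAN-style view (hypothetical): the client sees numbered, fixed-size blocks."""

    def __init__(self, num_blocks):
        self._data = bytearray(num_blocks * BLOCK_SIZE)

    def read_block(self, n):
        start = n * BLOCK_SIZE
        return bytes(self._data[start:start + BLOCK_SIZE])

    def write_block(self, n, payload):
        # A block device has no notion of files; it only moves whole blocks.
        if len(payload) != BLOCK_SIZE:
            raise ValueError("a block device only accepts whole blocks")
        start = n * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = payload


class FileServer:
    """NAS-style view (hypothetical): the client names a file and a byte range."""

    def __init__(self):
        self._files = {}  # path -> file contents

    def write(self, path, offset, payload):
        buf = self._files.setdefault(path, bytearray())
        if len(buf) < offset:
            # Pad with zero bytes when writing past the current end of file.
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(payload)] = payload

    def read(self, path, offset, length):
        return bytes(self._files[path][offset:offset + length])


if __name__ == "__main__":
    dev = BlockDevice(num_blocks=4)
    dev.write_block(2, b"x" * BLOCK_SIZE)
    print(len(dev.read_block(2)))

    srv = FileServer()
    srv.write("/notes.txt", 0, b"hello world")
    print(srv.read("/notes.txt", 6, 5))
```

In the SAN case the file system itself runs on the client, on top of the block interface; in the NAS case the file system lives on the server and the client only sees files.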

NFS - Network File System (protocol)

Network File System (NFS) is a network file system protocol originally developed by Sun Microsystems in 1984, allowing a user on a client computer to access files over a network in a manner similar to how local storage is accessed. NFS, like many other protocols, builds on the Open Network Computing Remote Procedure Call (ONC RPC) system. The Network File System is an open standard defined in RFCs, allowing anyone to implement the protocol.

NFS is the "Network File System" for Unix and Linux operating systems. It allows files to be shared transparently between servers, desktops, laptops etc. It is a client/server application that allows a user to view, store and update files on a remote computer as though they were on their own computer. Using NFS, the user or a system administrator can mount all or a portion of a file system.

CIFS is the "Common Internet File System" used by Windows operating systems for file sharing. CIFS uses the client/server programming model. A client program makes a request of a server program (usually in another computer) for access to a file or to pass a message to a program that runs in the server computer. The server takes the requested action and returns a response. CIFS is a public or open variation of the Server Message Block Protocol (SMB) developed and used by Microsoft, and it uses the TCP/IP protocol.
NFS and CIFS are the primary file-sharing protocols used in NAS. CIFS tends to be a bit more "chatty" in its communications, that is, it exchanges more messages over the network for comparable operations.


XFS is a high-performance journaling file system created by Silicon Graphics, originally for their IRIX operating system and later ported to the Linux kernel. XFS is particularly proficient at handling large files and at offering smooth data transfers.

The CXFS file system (Clustered XFS) is a proprietary distributed networked file system designed by Silicon Graphics (SGI) specifically to be used in a Storage area network (SAN) environment.

A significant difference between CXFS and other distributed file systems is that data and metadata are managed separately from each other. CXFS gives every host that acts as a client direct access to the data via the SAN. This means that a client reads and writes file data over the fiber connection to the SAN, rather than over an Ethernet network (as is the case in most other distributed file systems, like NFS). File metadata, however, is managed by a metadata broker, and the metadata communication is performed via TCP/IP over Ethernet.

Another difference is that file locks are managed by the metadata broker, rather than the individual host clients. This results in the elimination of a number of problems which typically plague distributed file systems.
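
The split between the data path and the metadata path, with locks held centrally, can be sketched as a toy model. This is a minimal Python illustration under loud assumptions, not CXFS's actual protocol or API: MetadataBroker, Client and the bytearray standing in for SAN-attached storage are all hypothetical names invented here.

```python
class MetadataBroker:
    """Toy metadata broker (hypothetical): tracks file locations and lock holders."""

    def __init__(self):
        self._extents = {}   # file name -> (offset, length) in the shared store
        self._locks = {}     # file name -> name of the client holding the lock
        self._next_free = 0  # next unused offset in the shared store

    def create(self, name, length):
        self._extents[name] = (self._next_free, length)
        self._next_free += length
        return self._extents[name]

    def lookup(self, name):
        return self._extents[name]

    def lock(self, name, client):
        # The broker, not the clients, decides who holds the lock.
        holder = self._locks.setdefault(name, client)
        return holder == client

    def unlock(self, name, client):
        if self._locks.get(name) == client:
            del self._locks[name]


class Client:
    """Toy client: asks the broker for metadata, then touches the data directly."""

    def __init__(self, name, broker, shared_store):
        self.name = name
        self.broker = broker
        self.store = shared_store  # bytearray standing in for SAN-attached disks

    def write_file(self, fname, payload):
        if not self.broker.lock(fname, self.name):
            return False  # another client holds the lock
        try:
            offset, _ = self.broker.create(fname, len(payload))  # metadata path
            self.store[offset:offset + len(payload)] = payload   # data path
            return True
        finally:
            self.broker.unlock(fname, self.name)

    def read_file(self, fname):
        offset, length = self.broker.lookup(fname)        # metadata path
        return bytes(self.store[offset:offset + length])  # data path


if __name__ == "__main__":
    store = bytearray(1024)
    broker = MetadataBroker()
    alice = Client("alice", broker, store)
    bob = Client("bob", broker, store)
    alice.write_file("report.dat", b"payload written by alice")
    print(bob.read_file("report.dat"))
```

Only the small lookup and lock calls go through the broker; the file contents never pass through it, which is the property the notes above attribute to CXFS.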

Although CXFS supports heterogeneous environments (including Solaris, Linux, Mac OS X, AIX and Windows), the host that acts as the metadata broker must run either SGI's IRIX operating system or Linux.


