(2013)

How to configure GlusterFS for NFS-like behaviour, i.e. good-performance access to files stored on a remote PC or server.


Index:

Introduction
GlusterFS - overview
README 1ST!
Server configuration
Client configuration
Links

Introduction

EDIT 28.Jul.2013 - WARNING: Recent versions of GlusterFS suffer from a major performance regression - if possible stick to GlusterFS 3.0 or 3.1.

I needed a way to share the contents of my media directory across multiple Linux PCs (my mediacenter, a VM and other PCs).

I started by trying out NFS, but unfortunately I was never able to achieve a constant transfer rate higher than 50MB/s over my GbE LAN - it always started very well (up to 100MB/s), but after ~30 seconds or so the throughput started decreasing, ending up at around ~20MB/s.
I tried out a lot of different settings, which delayed the onset of the slowdown, but I was never able to completely solve the problem. A very sad story.

The next classical candidate, CIFS/Samba, was in my case just plain disappointing. I don't know if there are any magic settings that unlock its potential, but at least in my case I wasn't able to find them.

Continuing my desperate quest for speed, reliability and flexibility I then tried out POHMELFS, CODA, SSHFS, Ceph, OCFS2 (Oracle), Lustre, iSCSI, EnoBD, MooseFS and perhaps something else, but they all failed for different reasons - they did not support large files, and/or were not flexible enough, and/or did not perform well, and/or were just filesystems and did not provide the server/client infrastructure I needed to be able to mount them remotely.

I kept on trying, knowing that banging my head against the wall for a sufficient amount of time would sooner or later result in a victory.
As it usually does, it worked - that's when I found GlusterFS.


GlusterFS - overview

GlusterFS was originally developed by Gluster, Inc., which (if I understood correctly) was acquired a few months back by Red Hat.
We will see what Red Hat does with it, but in any case I don't have to worry (at least for a while) about Oracle buying it, which would most probably result in the project being fully privatized or completely abandoned.

Why did I choose GlusterFS?
Because it's (in no particular order):

  • fast.
    When transferring files between PCs I can reach the limit of my GbE switch (~100MB/s) and the throughput never decreases.
  • stable.
    I've been using GlusterFS since 2009 and never had any problem.
    I even used remote disks installed in a different PC, mounted on the GlusterFS-server through NBD with EncFS in between de/encrypting all the I/O, and it always worked flawlessly.
  • flexible.
    It doesn't matter if you try to mount a directory while the GlusterFS-server is not running, or if you shut down the GlusterFS-server while at least one PC still has the exported directory mounted.
    None of the PCs will ever hang, and whenever the server comes back online the client will notice it almost immediately (as soon as you try to access the files).

While the software itself is great, the documentation is (in my opinion) chaotic, especially concerning its use as an NFS replacement.
I am not one of the most intelligent persons on the planet, and I had to invest quite a lot of time trying to understand which configuration files had to be used where and which options could be set where.
I think that those GlusterFSlings should take a step back and rethink the approach they want to take with their docs, perhaps dividing them by usage type or from the simplest to the most complicated setup, creating a single options index that states in which versions each option is available, etc.
In any case this is the reason I am writing this article, hoping to create a simple how-to for all the people who have performance problems with NFS and are willing to give GlusterFS a try.

Btw., there is one problem with GlusterFS (referring to the "(almost)" I mentioned above): the available authentication mechanisms only allow you to specify the IP-addresses that are allowed to connect to the server - a private/public certificate exchange or some other sort of trick is not supported at this time, as far as I know :o(


README 1ST!

The setup of GlusterFS can be done in various ways.
From what I understood, the main way of setting it up (at least the one referenced most of the time in the docs) is to talk directly (or more or less directly?) to the storage devices/harddisks, supporting multiple ways of distributing the data across nodes (PCs that are part of the GlusterFS storage pool) - e.g. by completely replacing a local raid array, by mirroring the data over multiple server locations, etc. This is all very interesting but out of scope for me.
All I was interested in was making the contents of some directories available to other hosts without touching the filesystems/directories themselves, and this is what I am describing here - more or less the way NFS works.

Please be aware that, from what I remember, GlusterFS currently officially supports only the ext3 and XFS filesystems (meaning that they'll help you only if you submit problems involving these two filesystems - others might still technically work).
That wasn't a problem for me, as my raid5 was already on XFS and I did not want to change that (I can defragment and grow the FS like with other more recent filesystems, and XFS has recently been getting a lot of updates to keep it young and healthy).

You will have to set up at least 2 configuration files - one for the server and one for the client - which is what I am showing here.
You could probably use multiple configuration files, one for each of the subvolumes that you integrate, but to keep things simple here I will do everything using only one file for the client and one for the server.

I am using Gentoo Linux.
Steps needed for other distributions might differ.

The directories that are exported will have to reside on a filesystem that supports extended attributes.
To check if your mounted XFS-filesystem has extended attributes enabled:
================
xfs_info /mnt/memstick/ | grep -i attr
= sectsz=512 attr=2
================
If the "attr=2" is visible, then you should probably be ok.
I saw some kernel configuration options called "Extended attributes" under "Filesystems" (in the kernel config menu) for other filesystems, but don't ask me how to double-check that on a running system.
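
If you want a more direct test, one way that should work regardless of the filesystem is to simply try to set and read back a "trusted" extended attribute as root (GlusterFS keeps its metadata in trusted.* attributes, as far as I know) - setfattr/getfattr come from the "attr" package, and the attribute name below is just an arbitrary example:
================
# create a scratch file on the filesystem you want to export
touch /mnt/memstick/xattrtest
# try to set and read back a trusted.* extended attribute (run as root)
setfattr -n trusted.testattr -v works /mnt/memstick/xattrtest
getfattr -n trusted.testattr /mnt/memstick/xattrtest
# clean up
rm /mnt/memstick/xattrtest
================
If getfattr prints the attribute back instead of complaining about "Operation not supported", extended attributes should be fine.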

One last thing - you were probably starting to get suspicious at the absence of the famous statement saying that...
I won't be responsible for the data that you might lose by following these guidelines. Please back up your data on a separate medium before applying any change mentioned in this page.
:o)


Server configuration

Install ("emerge" in Gentoo) the package "sys-cluster/glusterfs".
For this how-to I used version 3.2.6.
Versions 3.1.x might hang when the server is not ready, while version 3.0.x seemed alright.
GlusterFS is continuously changing, so if you get into trouble, double-check that the options I mention here are still valid if you're not using version 3.2.6.
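
On Gentoo the install boils down to something like this (the version pinning is just an example - whether a 3.2.6 ebuild is available depends on the state of the portage tree when you read this):
================
# install the version used in this how-to, if it is still in the tree
emerge -av =sys-cluster/glusterfs-3.2.6

# or just take whatever version portage currently offers
emerge -av sys-cluster/glusterfs
================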

Once you have GlusterFS installed you have to make sure that the file "/etc/glusterfs/glusterfsd.vol" contains a valid configuration - that is the main thing you'll have to worry about. The file "/etc/glusterfs/glusterfsd.vol" can also be a symlink to some other file.

In my case I wrote a file called "server.vol", which I symlinked to "/etc/glusterfs/glusterfsd.vol":
===============
vaio # ls -lh /etc/glusterfs/
total 8.0K
lrwxr-xr-x 1 root root   53 Jul 15 19:28 glusterfsd.vol -> /mydir/glusterfsconfig/server.vol
===============
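
If you want to do the same, the symlink can be created like this (adjust the path to wherever your own server.vol lives):
================
# -f replaces the sample glusterfsd.vol shipped by the package, if there is one
ln -sf /mydir/glusterfsconfig/server.vol /etc/glusterfs/glusterfsd.vol
================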

Simple configuration file

The simplest contents of the file "server.vol" (aka "glusterfsd.vol") can look like this:
===============
volume myvolume-directory
        type storage/posix
#       Use the next option only for test purposes if you did not enable extended attributes for your filesystem...
#       ...otherwise you will get the error message "Extended attribute not supported, exiting."!!!
#       option mandate-attribute off
        option directory /mnt/memstick
end-volume

volume server
        type protocol/server
        option transport-type tcp
        option transport.socket.listen-port 45000

        option auth.addr.myvolume-directory.allow 127.0.0.1,10.0.*.*
        subvolumes myvolume-directory
end-volume
===============
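
By the way, before bothering with the init script you can check whether the configuration parses at all by running the daemon once in the foreground in debug mode (if I remember correctly the --debug switch keeps it in the foreground and logs to the console) - stop it with Ctrl-C when you're done:
================
# test-run the server in the foreground with debug output on the console
glusterfsd --debug --volfile=/etc/glusterfs/glusterfsd.vol
================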

You can see three things above:

  • the subvolumes should always be defined above whatever references them ("myvolume-directory" in this case).
    You can choose any name you want.
    I think that the "server"-volume is mandatory and has to be called like that :o|
  • you can allow multiple IP-addresses by listing them separated by commas and/or by using "*" as a wildcard.
  • the directory that I am making available to the other PCs is defined in the top-most section, and it's "/mnt/memstick".


Complex configuration file

A more complicated setup of the file "server.vol" (aka "glusterfsd.vol") could look like this:
===============
volume myvolume-directory
        type storage/posix
#       Use the next option only for test purposes if you did not enable extended attributes for your filesystem!!!
#       Otherwise you will get the error message "Extended attribute not supported, exiting."  
#       option mandate-attribute off
        option directory /mnt/memstick
end-volume

volume myvolume-iothreads
  type performance/io-threads
  option thread-count 8 # default value is 16?
  subvolumes myvolume-directory
end-volume

volume myvolume-writeback
  type performance/write-behind
  option flush-behind on    # default value is 'off'
  option window-size 20MB
#  option aggregate-size 1MB # default value is 0 #Option is not valid anymore?
  option disable-for-first-nbytes 128KB #default is 1
  subvolumes myvolume-iothreads      
end-volume

volume server
        type protocol/server
        option transport-type tcp
        option transport.socket.listen-port 45000

        option auth.addr.myvolume-writeback.allow 127.0.0.1,10.0.*.*
        subvolumes myvolume-writeback
end-volume
===============

You can see that, starting from the "server" volume, all the others are chained as follows:
server => myvolume-writeback => myvolume-iothreads => myvolume-directory

You can specify different options for the different types of volumes - have a look at the GlusterFS documentation for a listing. They refer to "translators" in there, and while I never fully understood what they mean by that, to me they're just volume-type options. It took me a while to absorb this, as I kept interpreting a "volume" as something physical like disks, partitions, etc., but that does not seem to be the case here.

This is it on the server side.
Once you have your configuration in place just run "/etc/init.d/glusterfsd start" (in Gentoo - other distributions might differ) and have a look at the output of "/var/log/glusterfs/glusterfsd.log".
If everything in the log file looks alright, then you're ready to go.
If anything is not ok, "ps -Af | grep -i gluster" won't show any running process, as GlusterFS has most probably aborted and exited immediately.
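
In short, the whole start-and-check sequence on the server looks like this (Gentoo paths - adapt the init script and log location to your distribution):
================
# start the GlusterFS server daemon
/etc/init.d/glusterfsd start

# have a look at the log for error messages
tail -n 50 /var/log/glusterfs/glusterfsd.log

# make sure the daemon is actually still running
ps -Af | grep -i gluster
================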


Client configuration

Not much to do here.

Simple configuration file

If I wanted the client to connect to a server which uses the "simple" configuration mentioned above, I would write a file ("client.vol") containing the following:
================
volume myvolume-directory
        type protocol/client
        option transport-type tcp
        option remote-host vaio
        option remote-port 45000
        option remote-subvolume myvolume-directory
end-volume
================

You can see that I refer here to "myvolume-directory", which is the volume on which I set the security options (the IP-addresses allowed to connect to the server) in the simple server setup at the top.

Complex configuration file

Practically the same, but I have to refer to the volume "myvolume-writeback", as in the complex server setup I set the authentication at that level:
=================
volume myvolume-directory
        type protocol/client
        option transport-type tcp
        option remote-host vaio
        option remote-port 45000
        option remote-subvolume myvolume-writeback
end-volume
=================

Now give it a try: run on the client "glusterfs --volfile=/mydir/glusterfsconfig/client.vol /mnt/myserver/" and you should see in "/mnt/myserver" the files that are on the server in the directory "/mnt/memstick" (or whatever you have set - I used a USB stick formatted with XFS to write this tutorial).
By the way, I called the server that I used to test this article "vaio" (it used to be called "hp", but that one miserably died after I spilled a cup of coke on it), so exchange that with the name of your own server, make sure that you have that entry in the "/etc/hosts" file on both the client and the server, and that your firewall is not blocking the traffic - perhaps use the same PC for the first test.
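
If you don't want to type the glusterfs command by hand every time, the mount can also be done through "mount" or "/etc/fstab" - at least with the 3.x mount helper the path of the client volfile can be used in place of the device (that is how I remember it working, so double-check it against your version):
================
# mount by hand through the glusterfs mount helper
mount -t glusterfs /mydir/glusterfsconfig/client.vol /mnt/myserver/

# or the corresponding /etc/fstab entry to mount it at boot
# /mydir/glusterfsconfig/client.vol  /mnt/myserver  glusterfs  defaults,_netdev  0 0
================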


Links