MogileFS
MogileFS provides static file replication across multiple servers. It is fully open source, simple, easy to setup and easy to use. All in all, it seems like a great solution. Only time will tell if it holds up to the demands of a high traffic website.
MogileFS is a Filesystem…But Different
So what is the purpose behind using MogileFS? And are there other good options? MogileFS is simply a system for storing data. For those of you out there that are Windows users, you probably use the FAT filesystem or the NTFS filesystem to store your documents, photos, etc. Linux users commonly use ext3 or a large variety of other filesystems to store their files. MogileFS is very much like that. It is a filesystem.
However, MogileFS is not intended for typical filesystem usage for a few reasons. For starters, it is not a POSIX compliant filesystem. You can’t format a partition of your hard drive as MogileFS. There is a FUSE library out there that makes MogileFS seem more like a normal filesystem, but it is experimental and it not how MogileFS is typically used.
Another reason MogileFS is not typical is that there is no such thing as a directory structure. Files are stored using keys. You simply choose a key (or identifier) for the file and store it using that. There is a little more structure to it than that, which we will discuss later, but that is the gist of it.
MogileFS Protects Against Data Loss and Data Unavailability
What MogileFS is good for is protecting data from being lost and from being unavailable. A good starting point for comparison here is RAID. If you are not familiar with RAID, it is a way of storing data across multiple hard drives on a system so that if one of the hard drives fails, you don’t lose any data. RAID will therefore protect you from data loss, however if you are using this RAID to server images for a website and the entire server goes down, you won’t lose any data, but your images won’t be available until you get the server back up.
MogileFS protects your data by storing it on multiple disks that can be located on different servers. When a file is retrieved, one server is tried. If that server fails, another server is tried. In this way, you can have a hard disk fail or even a whole server fail and you won’t lose any data or have any down time. Beautiful, isn’t it?
How MogileFS Works
MogileFS accomplishes this in about as simple a manner as I can think of, which I appreciate. Simple is usually better and more dependable. More moving parts provide more opportunities for failure. MogileFS uses three main applications which work together: the storage daemons, the tracker daemons and the clients. I refer to each of these in plural because you can have as many of each of them as you would like, although only having one of each of them is possible as well.
First off, if you don’t know what a daemon is, you’re probably going to get confused in a hurry. In windows a daemon is usually called a service. Basically, it is just something that always runs in the background, as opposed to an application that you open to use and close when you are done. The storage and tracker daemons must be daemons because they must always be running and ready to receive instructions or commands.
The storage daemons are somewhat dumb. They are either given a file to store or asked to retrieve a file. The tracker daemons are the brains of the operation. They track which files exist on which storage daemons and make sure they are replicated the right number of times (we’ll about how to set this next). The client works with the trackers and the storage daemons to save and retrieve files.
File Organization in MogileFS
Files are organized by a little bit more than just keys. There are also domains and classes. Domains might also be referred to as namespaces or perhaps buckets. Inside of a domain, file keys must all be unique. Within domains, classes are defined. The purpose of classes is to set properties for the files that belong to that class. Most notably is how many times the files should be replicated. Let’s say you are storing images and you are also generating thumbnails of those images. If you lose the thumbnails, it is not much of a problem since you can regenerate them using the originals. So you might chose to replicate those files only twice or maybe even just once. However, the originals are very important since they can’t be replced, so you might put those in a separate class and set that class to be replicated several times. Keep in mind that file keys must still be unique across all classes within a domain.
The last important thing to note is that to access MogileFS, you must use the client. There is a command line utility provided which allows administrative-style access to the system. There is also an API written in Perl for accessing the system. As a result, MogileFS is not something you would just browse through like you would a normal filesystem.
More Information
- MogileFS Project
- Best MogileFS Setup Instructions that I’ve found
- When you download the source for MogileFS (which the instructions explain how to do), you will get the storage daemons, the tracker daemons and the API for Perl. There are also clients available for a variety of other languages. You can view a list of them here.