Distributed Systems

Project Information


Project overview

In this sequence of labs, you'll build a multi-server file system called Yet-Another File System (yfs) in the spirit of Frangipani. At the end of all the labs, your file server architecture will look like this:

You'll write a file server process, labeled yfs above, using the FUSE toolkit. Each client host will run a copy of yfs. yfs will appear to local applications on the same machine by registering via FUSE to receive file system events from the operating system. Each yfs server will store all the file system data on an extent server on the network, instead of on a local disk. yfs servers on multiple client hosts can share the file system by sharing a single extent server.

This architecture is appealing because (in principle) it shouldn't slow down very much as you add client hosts. Most of the complexity is in the per-client yfs program, so new clients make use of their own CPUs rather than competing with existing clients for the server's CPU. The extent server is shared, but hopefully it's simple and fast enough to handle a large number of clients. In contrast, a conventional NFS server is pretty complex (it has a complete file system implementation) so it's more likely to be a bottleneck when shared by many NFS clients.

Lab assignments

Lab 1 - Lock Server

Lab 2 - Basic File Server
Lab 3 - File Server: Reading, Writing and Sharing Files
Lab 4 - MKDIR, REMOVE, and Locking
Lab 5 - Caching Lock Server
Lab 6 - Caching Extent Server + Consistency
Lab 7 - Paxos
Lab 8 - Replicated lock server

Collaboration Policy

You must write all the code you hand in for the programming assignments, except for code that we give you as part of the assignment. You are not allowed to look at anyone else's solution. You may discuss the assignments with other students, but you may not look at or copy each other’s code.

Programming Environment

You should be able to do Lab 1 on any Unix-style machine, including your own Linux/FreeBSD desktops, or MacOS laptops.

For Labs 2 and beyond, you'll need to use a computer that has FUSE and its development headers installed. You should be able to set this up on your own machine by installing the relevant packages. Instructions for Ubuntu/Debian and Fedora are given below. Most other Linux installations and many other Unix-like systems should work as well. However, the official environment for the labs is the VirtualBox image provided below.

Note that if you have your own FreeBSD or MacOS machine that you prefer to use for programming, you should be able to use it for the majority of the work. However, there are minor, annoying differences between FUSE on Linux and FUSE on other operating systems that may cause your code to fail our tests even when it seems to pass for you. For example, on Linux FUSE passes file creation operations to the file system using the MKNOD call, while on other systems it uses CREATE. Please ensure that your assignment passes the tests in the official environment; then there shouldn't be any problems when we test it.
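One way to reduce exposure to the MKNOD/CREATE difference is to route both entry points through a single helper, so that the file system behaves the same whichever call the operating system issues. The sketch below only illustrates that idea. Every name in it (file_table, yfs_create_file, on_mknod, on_create) is hypothetical, and the tiny in-memory table stands in for real file system state; none of this is the FUSE API or the lab's code.

```c
#include <string.h>

/* Hypothetical in-memory stand-in for file system state. */
enum { MAX_FILES = 16, MAX_NAME = 32 };

struct file_table {
    char names[MAX_FILES][MAX_NAME];
    int count;
};

/*
 * Shared creation logic (hypothetical helper).
 * Returns the index of the new entry, or -1 if the name is too long,
 * the table is full, or the name already exists.
 */
static int yfs_create_file(struct file_table *t, const char *name)
{
    if (strlen(name) >= MAX_NAME || t->count >= MAX_FILES)
        return -1;
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0)
            return -1;
    }
    strcpy(t->names[t->count], name);
    return t->count++;
}

/* Hypothetical handler for the MKNOD-style request. */
int on_mknod(struct file_table *t, const char *name)
{
    return yfs_create_file(t, name);
}

/* Hypothetical handler for the CREATE-style request. */
int on_create(struct file_table *t, const char *name)
{
    return yfs_create_file(t, name);
}
```

Because both handlers delegate to the same helper, a file created through either path ends up in the same state, which makes behavior on your own machine more likely to match behavior in the official environment.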

Git repository for the class

The git repository is located at https://gitlab.mpi-sws.org/ds-ws16/yfs-lab.git.

Linux VirtualBox image with FUSE

We've created a VirtualBox machine image that contains everything needed for developing and testing the lab code. VirtualBox can be obtained for free from here.

·       OS: 64-bit Fedora 24

·       User “ds”: no password.

·       Root user: password is “ ” (one space). Use “su” to gain root access in a terminal. (“sudo <command>” should also work.)

·       Common package manager commands: dnf install <package>, dnf search <package>


We will try to assist students as much as possible with issues related to the environment, but ultimately students are responsible for fixing them.

 

How can I copy files from the host machine into the VM or vice versa?

We recommend that you set up a shared folder in VirtualBox. See the VirtualBox documentation for details. Inside your VM, you can mount the shared folder using the following command:

sudo mount -t vboxsf -o uid=$UID,gid=$(id -g) share ~/host

“share” is the name of the shared folder, and “~/host” should be an existing empty folder on your VM.

 

Alternatively, you can use scp inside the virtual machine. This requires SSH access to the machine that the files are copied to or from. The host machine can be reached from inside the VM at the IP address 10.0.2.2.

Installing FUSE on your own computer

Note: This step is optional. You can simply use the VirtualBox image we provide.

Install FUSE and its development files like this:

Ubuntu/Debian

sudo apt install libfuse2 libfuse-dev

Fedora

sudo dnf install fuse fuse-libs fuse-devel

Aids for working on labs

There are a number of resources available to help you with the lab portion of this course:

All the labs use the POSIX threads API (pthreads). A comprehensive guide to programming with pthreads can be found here.
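As a quick refresher on the pthreads calls you are likely to need most, here is a minimal sketch in which several threads increment a shared counter under a mutex. The names counter, worker, and run_counter_demo are made up for this illustration and do not come from the lab code.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared state: a counter and the mutex that guards it. */
struct counter {
    pthread_mutex_t lock;
    long value;
    long iters;  /* number of increments each worker performs */
};

/* Thread body: increment the shared counter iters times. */
static void *worker(void *arg)
{
    struct counter *c = arg;
    for (long i = 0; i < c->iters; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;  /* critical section */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/*
 * Start nthreads workers, wait for all of them, and return the final
 * counter value. Returns -1 on bad arguments or if a thread or the
 * mutex could not be created.
 */
long run_counter_demo(int nthreads, long iters)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    struct counter c = { .value = 0, .iters = iters };
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &c) != 0)
            break;
        started++;
    }
    /* Join every thread that was actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

Calling run_counter_demo(4, 10000) from main should return 40000. Without the lock and unlock calls, the increments could interleave and the result could come out lower. On some systems you need to compile and link with the -pthread flag.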

The labs use the FUSE interface to plug the lab file system into the operating system. See the FUSE website for more information.

printf statements are always your friend when debugging any kind of problem in your programs. However, when programming in C/C++, you should also be familiar with gdb, the GNU debugger. You may find this gdb reference useful. Here are a few gdb tips for beginners:

If your program is crashing (segmentation fault), type gdb program core to examine the core file, where program is the name of the binary executable. If you can't find a core file anywhere, type ulimit -c unlimited before starting your program again. Once inside gdb, type bt to examine the stack trace at the point where the segmentation fault happened.

While your program is running, you can attach gdb to it by typing gdb program 1234. Again, program is the name of the binary executable, and 1234 is the process ID of your running program. Of course, you can also run your program under gdb from the beginning. To do so, simply type gdb program, and then type run at the gdb prompt.

While in gdb, you can set breakpoints (use gdb command b) to stop the execution at specific points, examine variable contents (use gdb command p), etc.

To apply a given gdb command to all threads in your program, prepend thread apply all to your command. For example, thread apply all bt dumps the backtrace for all threads.

Check out the GDB manual for full documentation.

W. Richard Stevens' books “UNIX Network Programming” Volumes 1 and 2 are classic references for network programming. If you are struggling with the sockets interface, they could be a helpful purchase. See the suggested books list for other helpful references.


Questions or comments regarding this course? Please use the general course mailing list or the teaching staff mailing list.
