Sharing files with virtual machines using NFS

In a previous post, I sang the virtues of VMWare Fusion’s shared folders feature, and the way it lets one share files from the host OS to the guest OS.

Shared folders are a bit of a pain to set up, though, because they depend on the guest tools, and OS upgrades on the guest seem to break the guest tools installation. The solution is one of the oldest file-sharing technologies there is: NFS. It’s easy to set up, works on basically everything, and is solid as a rock. It also supports symlinks, which means you can sidestep VirtualBox’s issues with shared folders and symlinks.

The best bit is that once it’s set up and working, you can switch to any virtualisation technology you like (for example, I’m now using VirtualBox at work so we can make better use of Vagrant in future, but I use VMWare for Windows). Because NFS works at the OS level, it’ll just keep working.

Configuration time

This is the easy bit. First, we export whichever folder we want to share from the host OS. In my case it’s ~/Projects. Edit /etc/exports with sudo (it doesn’t exist by default on OS X, so you’ll be creating it), and add the following line to it:

/Users/bradleyw/Projects -mapall=501:20 -network 192.168.56.0 -mask 255.255.255.0

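The first part is the directory you want to share. The -network and -mask flags restrict the export to a single subnet: my VM’s address is 192.168.56.101, so with a mask of 255.255.255.0 the network is 192.168.56.0. The -mapall=501:20 flag maps every remote user to UID 501 and GID 20, which on a typical OS X install are the first user account and the staff group, so files the guest writes end up owned by you (run id to check your own values).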
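The -network value is just the guest’s address with its host bits cleared by the mask. If you want to work it out for a different setup, here’s a small bash sketch of that arithmetic; network_of is a hypothetical helper, not part of any NFS tool, and it handles IPv4 only:

```bash
#!/usr/bin/env bash
# Zero the host bits of an IPv4 address: AND each octet of the address
# with the matching octet of the netmask. The result is the value that
# the -network flag in /etc/exports expects.
network_of() {
  local ip=$1 mask=$2 IFS=.
  local -a i m
  i=($ip)      # split into octets on the dots (IFS=.)
  m=($mask)
  # 10# forces base 10, so an octet written as "08" isn't read as octal
  printf '%d.%d.%d.%d\n' \
    $(( 10#${i[0]} & 10#${m[0]} )) $(( 10#${i[1]} & 10#${m[1]} )) \
    $(( 10#${i[2]} & 10#${m[2]} )) $(( 10#${i[3]} & 10#${m[3]} ))
}

# Example:
#   network_of 192.168.56.101 255.255.255.0   # prints 192.168.56.0
```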

Now we need to run a few commands in the OS X terminal to complete this side of the configuration:

sudo nfsd checkexports
showmount -e

If everything went well, checkexports reports no errors and showmount prints your exported directory.

Now on the guest OS, you first need to install the NFS software. On Debian and Ubuntu this is:

sudo apt-get install nfs-common

If you’re on another distribution this is an exercise for the reader.

We can now configure fstab so mount knows what to do. As a super user, edit /etc/fstab, and add the following line at the bottom:

192.168.56.1:/Users/bradleyw/Projects /mnt/nfs nfs soft,intr,rsize=8192,wsize=8192 0 0

The bits to change are 192.168.56.1, which should be the IP address at which your guest can reach your host (that is, the address your virtual machine uses to hit OS X); the path to your export; and /mnt/nfs, the mount point, which can be anything you want. I use /mnt/nfs as it seemed the right thing to do.

Note that you need to create /mnt/nfs before proceeding: sudo mkdir -p /mnt/nfs.

Now we can attempt to mount the shared filesystem: sudo mount /mnt/nfs. If no errors are reported, it’s all good!

You should now be able to read from and write to /mnt/nfs.
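If you script your VM setup, it’s worth making the fstab step idempotent, so re-running it doesn’t add a duplicate entry. A minimal sketch, assuming you run it as root; ensure_fstab_entry is a hypothetical helper, and it only recognises an identical line (same spacing), not an equivalent one:

```bash
#!/usr/bin/env bash
# Append a line to an fstab-style file only if an identical line is not
# already present, so running the setup twice leaves a single entry.
ensure_fstab_entry() {
  local file=$1 entry=$2
  # -x: whole-line match, -F: treat the entry as a fixed string, not a regex
  if ! grep -qxF -- "$entry" "$file"; then
    printf '%s\n' "$entry" >> "$file"
  fi
}

# Example, as root on the guest:
#   mkdir -p /mnt/nfs
#   ensure_fstab_entry /etc/fstab \
#     '192.168.56.1:/Users/bradleyw/Projects /mnt/nfs nfs soft,intr,rsize=8192,wsize=8192 0 0'
#   mount /mnt/nfs
```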

My own workflow

My own workflow is to have a password-protected, case-insensitive disc image that automounts on login (this is exported as /Volumes/Smarkets). The Linux machine then starts in headless mode and, because of NFS, the share is already mounted, so I’m up and running very quickly after logging in. The case-insensitive disc image also gets around all the issues Python has when exporting from OS X to case-sensitive file systems.

Hunter helped me a lot with some of the details here.

Issues mounting shared folders in VMWare Fusion 3

I recently took advantage of the VMWare Fusion 3 upgrade for only $9.99, which has all gone well except for one issue with /etc/fstab: it turns out that the VMWare Fusion 3 tools don’t actually respect your mount settings, which means you lose all the permissions information contained therein.

There’s a solution to the mount problem here, which works perfectly after a reboot. I hope VMWare fixes this soon, though: it’s a pretty nasty bug.

Using Dropbox as a Git repository

So last month I wrote a bit about setting up your own personal Git repositories on a Linux box, and how to use that for sharing code.

I’ve had a slight epiphany since then: what if I just used the awesome Dropbox to share Git repositories between computers? Dropbox seems able to get through most corporate firewalls (my previous employer blocked SSH, for example), and is very unobtrusive in its synchronisation behaviour.

Enough introductions, make with the commands

Okay, here we go. Basically, we’re just going to add a new remote which points at Dropbox (in the same way the origin remote typically points at your primary external repository). Please note these instructions should work on most Unix-like systems, but they’ve only been tested on OS X.

First, create the Git repository in Dropbox (assuming your repository is named myrepo):

cd ~/Dropbox
mkdir -p repos/myrepo.git
cd !$
git --bare init

And that’s the repository created. Basically we made a bare repository in the Dropbox directory.

Now we can add the new remote to our existing repository (again, assuming it lives at ~/Projects/myrepo).

cd ~/Projects/myrepo
git remote add dropbox file://$HOME/Dropbox/repos/myrepo.git
git push dropbox master

And we’re done. We’ve created the repository, linked a Git remote to it, and pushed the master branch to it. This Git repository will now be available on every computer linked to your Dropbox account.

Pulling from the repository

When you get to a computer that shares this Dropbox account but doesn’t have a clone of the repository yet, do as follows:

cd ~/Projects
git clone -o dropbox file://$HOME/Dropbox/repos/myrepo.git

This clones the repository locally and names its remote dropbox rather than the default origin. The local master branch tracks dropbox/master, so a plain git pull merges from it automatically.
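To see the whole cycle before pointing anything at your real Dropbox folder, here’s a sketch that rehearses it in scratch directories. The function name, the README file, and the two directories standing in for two computers are illustrative assumptions; the Git commands are the ones above, plus git symbolic-ref to pin the branch name to master, since newer Git versions may use a different default initial branch:

```bash
#!/usr/bin/env bash
# Rehearse the Dropbox-as-remote workflow in scratch directories.
# $1 stands in for ~/Dropbox; under $2, "a" and "b" play two computers
# that share the same Dropbox account.
dropbox_roundtrip() (
  set -e                                   # stop at the first failing step
  bare="$1/repos/myrepo.git"
  work=$2

  # Bare repository inside the (pretend) Dropbox folder, HEAD on master.
  mkdir -p "$bare"
  git init --bare -q "$bare"
  git --git-dir="$bare" symbolic-ref HEAD refs/heads/master

  # Computer A: an existing project with one commit, pushed to Dropbox.
  mkdir -p "$work/a/myrepo"
  cd "$work/a/myrepo"
  git init -q
  git symbolic-ref HEAD refs/heads/master
  echo hello > README
  git add README
  git -c user.name=Example -c user.email=example@example.com \
      -c commit.gpgsign=false commit -q -m "First commit"
  git remote add dropbox "file://$bare"
  git push -q dropbox master

  # Computer B: a fresh clone whose remote is called dropbox, not origin.
  mkdir -p "$work/b"
  cd "$work/b"
  git clone -q -o dropbox "file://$bare"
)
```

Running dropbox_roundtrip /tmp/fake-dropbox /tmp/fake-work should leave /tmp/fake-work/b/myrepo as a clone whose only remote is dropbox.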

I think this approach could be valuable for things like keeping personal documents or text files in version control (or indeed personal coding projects) without bothering to set up your own Linux box or server. Git really does make these things incredibly easy.

How to set up your own private Git server on Linux

Update 2: as pointed out by Tim Huegdon, several comments on a Hacker News thread pointing here, and the excellent Pro Git book, Gitolite seems to be a better solution for multi-user hosted Git than Gitosis. I particularly like the branch-level permissions aspect, and what that means for business teams. I’ve left the original article intact.

Update: the ever-vigilant Mike West has pointed out that my instructions for permissions and git checkout were slightly askew. These errors have been rectified.

One of the things I’m attempting to achieve this year is simplifying my life somewhat. Given how much of my life revolves around technology, a large part of this will be consolidating the various services I consume (and often pay for). The mention of payment is important, as up until now I’ve been paying the awesome GitHub for their basic plan.

I don’t have many private repositories with them, and all of them are strictly private code (this blog; Amanda’s blog templates and styles; and some other bits) which don’t require collaborators. For this reason, paying money to GitHub (awesome though they may be) seemed wasteful.

So I decided to move all my private repositories to my own server. This is how I did it.

Set up the server

These instructions were tested on a Debian 5 “Lenny” box, and should be the same on Ubuntu. Substitute the package installation commands as required if you’re on an alternative distribution.

First, if you haven’t done so already, add your public key to the server:

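# create ~/.ssh if needed, then append your key rather than overwriting any existing ones
ssh myuser@myserver.com 'mkdir -p ~/.ssh && chmod 700 ~/.ssh'
cat ~/.ssh/id_rsa.pub | ssh myuser@myserver.com 'cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

Now we can SSH into our server and install Git:

ssh myuser@myserver.com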
sudo apt-get update
sudo apt-get install git-core

…and that’s it.

Adding a user

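If you intend to share these repositories with any collaborators, at this point you’ll either create a server account for each collaborator, or create a single Git user that everyone authenticates as with their own SSH key.

We’ll be following the latter option. So, add a Git user: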

sudo adduser git

Now you’ll need to add your public key to the Git user’s authorized_keys:

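sudo mkdir /home/git/.ssh
sudo cp ~/.ssh/authorized_keys /home/git/.ssh/
sudo chown -R git:git /home/git/.ssh
sudo chmod 700 /home/git/.ssh
# name the file explicitly: your own shell can't expand a * glob inside a
# directory that is now mode 700 and owned by git
sudo chmod 600 /home/git/.ssh/authorized_keys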

Now you’ll be able to authenticate as the Git user via SSH. Test it out:

ssh git@myserver.com

Add your repositories

If you aren’t sharing the repositories and just want to access them yourself (like me, since I have no collaborators), do the following as your own user. Otherwise, do it as the Git user we added above.

If using the Git user, switch to them:

sudo -u git -i

Now we can create our repositories:

mkdir myrepo.git
cd !$
git --bare init

These steps create an empty bare repository. We’re assuming you already have a local repository that you just want to push to a remote server.

Repeat them, from the home directory, for each remote Git repository you want.
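If you have several repositories to move, the mkdir and init steps above can be looped. A minimal sketch, assuming you run it as whichever user owns the repositories and pass that user’s home directory as the base, since SCP-style remotes such as user@host:myrepo.git are resolved relative to it; create_bare_repos and the example repository names are hypothetical:

```bash
#!/usr/bin/env bash
# Create one empty bare repository per name under a base directory,
# skipping any that already exist.
create_bare_repos() {
  local base=$1 name
  shift
  for name in "$@"; do
    if [ -d "$base/$name.git" ]; then
      echo "skipping $name.git: already exists" >&2
      continue
    fi
    git init --bare -q "$base/$name.git" || return 1
  done
  return 0
}

# Example, on the server as the git user:
#   create_bare_repos "$HOME" blog templates
```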

Log out of the server as the remaining operations will be completed on your local machine.

Configure your development machine

First, we add the remotes to your local machine. If you’ve already defined a remote named origin (for example, if you followed GitHub’s instructions), you’ll want to delete the remote first:

git remote rm origin

Now we can add our new remote:

git remote add origin git@myserver.com:myrepo.git
git push origin master

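You’ll probably also want to set a default remote and merge branch for master, so that a plain git pull knows what to fetch and merge (on Git 1.7.0 or later, git push -u origin master sets this up as part of the push):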

git config branch.master.remote origin && git config branch.master.merge refs/heads/master

And that’s all. Now you can push to and pull from origin as much as you like, and it’ll all be stored on your own server at myserver.com.

Bonus points: Make SSH more secure

This has been extensively covered by the excellent Slicehost tutorial, but just to recap:

Edit the SSH config:

sudo vi /etc/ssh/sshd_config

And change the following values:

Port 2207
...
PermitRootLogin no
...
AllowUsers myuser git
...
PasswordAuthentication no

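Here 2207 is a port of your choosing. Before you log out, check the file with sudo sshd -t, restart SSH (sudo /etc/init.d/ssh restart on Debian), and confirm you can still log in from a second terminal, so a typo can’t lock you out. Then include the port in your Git remote (use git remote set-url origin with the same URL instead if origin already exists):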

git remote add origin ssh://git@myserver.com:2207/~/myrepo.git
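If you’d rather script those sshd_config edits (say, across several servers), here’s a sketch. set_sshd_option is a hypothetical helper: it rewrites an existing, possibly commented-out, line for a key or appends one, keeps a .bak copy, and does not understand Match blocks:

```bash
#!/usr/bin/env bash
# Set "Key value" in an sshd_config-style file. An existing line for the key,
# commented out or not, is rewritten in place; otherwise the line is appended.
# The previous contents are kept in FILE.bak.
set_sshd_option() {
  local file=$1 key=$2 value=$3
  cp "$file" "$file.bak" || return 1
  if grep -qE "^[#[:space:]]*${key}[[:space:]]" "$file.bak"; then
    sed -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file.bak" > "$file"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}

# Example, from a root shell on the server:
#   set_sshd_option /etc/ssh/sshd_config Port 2207
#   set_sshd_option /etc/ssh/sshd_config PermitRootLogin no
#   set_sshd_option /etc/ssh/sshd_config AllowUsers "myuser git"
#   set_sshd_option /etc/ssh/sshd_config PasswordAuthentication no
#   sshd -t    # syntax-check the result before restarting SSH
```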

Using Nginx as reverse proxy

It’s common knowledge that when you’re serving a web application you shouldn’t use a standard Apache install to serve static assets, as it comes with too much overhead. I won’t go into the details of why here, as it’s been covered by many other people better qualified than I.

What I can do, however, is tell you how I set up Nginx, a super lightweight web server, on my VPS here on Slicehost (who are awesome, by the way).

Quick theory

Just quickly, the theory is that Nginx listens on port 80 and forwards requests for certain URL patterns to Apache, which runs mod_wsgi on a different port and currently serves the meat of my Django site. Static assets (JS, CSS, images) are served directly from Nginx without ever touching Apache.

Assumptions

We assume several things for this article:

  • You’re comfortable with a command line;
  • You’re using Ubuntu or Debian (I use apt-get quite a lot);
  • You have sudo access to a server; and
  • You’re already serving Django or similar on Apache and just want to replace the static/front-end.

First steps

Firstly you’ll need the basic tools to install Nginx:

sudo apt-get install libpcre3 libpcre3-dev libpcrecpp0 \
libssl-dev zlib1g-dev build-essential

What we’re installing here is the minimum needed to build Nginx with URL rewriting (PCRE), GZip compression (zlib) and SSL (OpenSSL), plus a C compiler and make from build-essential.

Get Nginx

At the time of writing, the latest stable version of Nginx was 0.6.32, so let’s get that. Note that we need full source code as the version that ships with Ubuntu and Debian is 0.5.3 or similar, which doesn’t have URL rewriting or GZip compression (both of which I really want).

mkdir ~/src
cd !$
wget http://sysoev.ru/nginx/nginx-0.6.32.tar.gz
tar -zxvf nginx-0.6.32.tar.gz
cd nginx-0.6.32

So we downloaded the source, decompressed it, and went into the directory that was created.

Compile Nginx

We have a few different options to run here, most of which are personal taste. Feel free to modify as required:

./configure --pid-path=/var/run/nginx.pid \
--conf-path=/etc/nginx/nginx.conf --sbin-path=/usr/local/sbin/nginx \
--with-http_ssl_module --user=www-data --group=www-data \
--http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log

The only options I’d insist on keeping are the PID file path and the user/group configuration. The user/group matches the accounts that Apache uses, so everything stays under the same user structure. Note that --sbin-path names the binary itself, not the directory it goes in. If you want to use a different user account, be sure to create this user before running ./configure.

The above command will print a summary of paths for your convenience; it should look similar to the following:

nginx path prefix: "/usr/local/nginx"
nginx binary file: "/usr/local/sbin/nginx"
nginx configuration prefix: "/etc/nginx"
nginx configuration file: "/etc/nginx/nginx.conf"
nginx pid file: "/var/run/nginx.pid"
nginx error log file: "/var/log/nginx/error.log"
nginx http access log file: "/var/log/nginx/access.log"
nginx http client request body temporary files: "/usr/local/nginx/client_body_temp"
nginx http proxy temporary files: "/usr/local/nginx/proxy_temp"
nginx http fastcgi temporary files: "/usr/local/nginx/fastcgi_temp"

You may want to copy them somewhere for posterity.

Then we do the usual make/make install dance.

make
sudo make install

Nginx is now installed, but it can’t serve anything yet: Apache is still bound to port 80, so Nginx wouldn’t be able to listen there.

Swap Apache and Nginx

First we need to stop Apache:

sudo apache2ctl stop

Then we start Nginx:

sudo /usr/local/sbin/nginx

Note that the path to nginx will be different depending on what value (if any) you used in the ./configure stage.

If you now navigate to your IP address, you should see a “Welcome to Nginx!” message. Great!

Make Apache listen on a different port

I chose port 8080, since that seemed sensible and symmetrical.

sudo vi /etc/apache2/ports.conf

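And change the Listen 80 line to Listen 8080 (or whatever port you picked). If ports.conf has a NameVirtualHost *:80 line, or your virtual hosts in /etc/apache2/sites-available are declared as <VirtualHost *:80>, change those to the new port too. Then start Apache again: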

sudo apache2ctl start

Then visit your old site with :8080 appended to the IP address. You should see your old site there. (Note: I’ve added extra information about Apache at the end of this article.)
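If you prefer to make the ports.conf change non-interactively, a sketch like the following rewrites only the plain Listen 80 line and leaves Listen 443 alone. set_apache_listen is a hypothetical helper that keeps a .bak copy; any NameVirtualHost *:80 or <VirtualHost *:80> lines still need the same change by hand. Passing 127.0.0.1:8080 instead of 8080 also binds Apache to loopback only:

```bash
#!/usr/bin/env bash
# Rewrite the plain "Listen 80" line of an Apache ports.conf to a new port
# (e.g. 8080) or address:port (e.g. 127.0.0.1:8080). Other Listen lines,
# such as "Listen 443" for SSL, are untouched. Keeps FILE.bak.
set_apache_listen() {
  local file=$1 listen=$2
  cp "$file" "$file.bak" || return 1
  sed -E "s/^Listen[[:space:]]+80[[:space:]]*\$/Listen ${listen}/" "$file.bak" > "$file"
}

# Example, as root:
#   set_apache_listen /etc/apache2/ports.conf 8080
#   apache2ctl configtest && apache2ctl start
```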

Configure Nginx

Nginx comes with some initial configuration, but here’s what I use:

# smart default nginx (Ubuntu 7.10)

user                www-data www-data;
worker_processes    2;

error_log           /var/log/nginx/error.log warn;
pid                 /var/run/nginx.pid;

events {
    worker_connections  1024;
    use epoll;
}

http {
    # allow long server names
    server_names_hash_bucket_size 64;
    
    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log          /var/log/nginx/access.log;
    
    # spool uploads to disk instead of clobbering downstream servers
    client_body_temp_path /var/spool/nginx-client-body 1 2;
    client_max_body_size 32m;
    client_body_buffer_size    128k;
    
    server_tokens       off;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         off;

    keepalive_timeout   5;
    
    ## Compression
    gzip on;
    gzip_http_version 1.0;
    gzip_comp_level 2;
    gzip_proxied any;
    gzip_min_length  1100;
    gzip_buffers 16 8k;
    # directives run until the semicolon, so no line continuation is needed
    gzip_types text/plain text/html text/css application/x-javascript
        text/xml application/xml application/xml+rss text/javascript;
    # Some versions of IE 6 don't handle compression well on some mime-types,
    # so just disable for them
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";
    # Set a vary header so downstream proxies don't send cached gzipped 
    # content to IE6
    gzip_vary on;
    
    # proxy settings
    proxy_redirect     off;

    proxy_set_header   Host             $host;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
    proxy_max_temp_file_size 0;

    proxy_connect_timeout      90;
    proxy_send_timeout         90;
    proxy_read_timeout         90;

    proxy_buffer_size          4k;
    proxy_buffers              4 32k;
    proxy_busy_buffers_size    64k;
    proxy_temp_file_write_size 64k;

    include             /etc/nginx/sites-enabled/*;

}

This is the primary configuration file; if you followed the installation above verbatim, it lives at /etc/nginx/nginx.conf.

To test that this configuration works, we add a simple localhost configuration file:

sudo mkdir /etc/nginx/sites-enabled
sudo vi /etc/nginx/sites-enabled/localhost.conf

And put the following configuration into it:

server {
    listen       80;
    server_name  localhost;

    location / {
        root   html;
        index  index.html index.htm;
    }
}

Proxy requests to Apache

Now we need to send requests to Apache. This is actually very simple:

sudo vi /etc/nginx/sites-enabled/testproject.conf

We’re pretending that your domain is testproject.com, served here as dev.testproject.com, for the purposes of this exercise.

Enter the following into your domain config:

# primary server - proxypass to Django
server {
    listen       80;
    server_name  dev.testproject.com;

    access_log  off;
    # "error_log off" would log to a file literally named "off"; discard instead
    error_log   /dev/null crit;

    # proxy to Apache 2 and mod_wsgi
    location / {
        proxy_pass         http://127.0.0.1:8080/;
        proxy_redirect     off;

        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_max_temp_file_size 0;

        client_max_body_size       10m;
        client_body_buffer_size    128k;

        proxy_connect_timeout      90;
        proxy_send_timeout         90;
        proxy_read_timeout         90;

        proxy_buffer_size          4k;
        proxy_buffers              4 32k;
        proxy_busy_buffers_size    64k;
        proxy_temp_file_write_size 64k;
    }
}

Again, the IP address and locations of configuration files depend on whether you changed anything during the process so far.
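If you end up proxying several domains, a small helper can write the per-domain file for you. This is a sketch under two assumptions: the proxy_* settings (and the 32m client_max_body_size) come from the http block of nginx.conf above, which Nginx inherits into a location that doesn’t set its own, and Apache is on 127.0.0.1:8080 unless you pass another backend. proxy_site_conf is a hypothetical name:

```bash
#!/usr/bin/env bash
# Print a minimal Nginx server block that proxies one domain to Apache.
# Everything except proxy_pass relies on the defaults in nginx.conf's
# http block, which Nginx inherits into this location.
proxy_site_conf() {
  local domain=$1
  local backend=${2:-127.0.0.1:8080}   # Apache's address:port
  cat <<EOF
# primary server - proxy ${domain} to Apache
server {
    listen       80;
    server_name  ${domain};

    location / {
        proxy_pass   http://${backend}/;
    }
}
EOF
}

# Example:
#   proxy_site_conf dev.testproject.com | sudo tee /etc/nginx/sites-enabled/testproject.conf > /dev/null
```

Keeping the shared proxy settings in one place means a change to, say, proxy_read_timeout only has to be made once.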

That’s it!

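When you next start Nginx, it should send all requests through to Apache on port 8080, and your memory overhead should start coming down. If Nginx is already running, check the configuration with sudo /usr/local/sbin/nginx -t, then reload it by sending the master process a HUP signal: sudo kill -HUP $(cat /var/run/nginx.pid).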

What next?

In the next instalment we’re going to set up Nginx as a static content server, in order to bypass Apache completely for anything non-dynamic.

Enjoy!

Additional reading

This article is based on the hard work of those awesome people over at Slicehost, and my experience on their servers.

Update:

Gareth Rushgrove mentioned to me at work that if you’re not exposing Apache to the world on port 80, you probably shouldn’t let it listen on any interface except loopback (otherwise people can see your dynamic site at http://yourdomain.com:8080). This isn’t an issue for me because I firewall almost every port except 80, but in case you’re interested, here’s how to configure Apache:

sudo vim /etc/apache2/ports.conf

And add 127.0.0.1: before the port number you’re using for your Apache, for example:

Listen 127.0.0.1:8080

Now restart Apache (sudo apache2ctl restart), and you can be sure that only Nginx is receiving HTTP requests from the outside world (or “The Internets”, as we in the industry call it).

To check which addresses and ports anything is listening on, use sudo netstat -pant (without root, the -p column can’t show the owning process of other users’ sockets).
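As a rough automated version of that check, here’s a sketch that reads netstat -pant output on stdin and prints any listening socket on a given port that isn’t bound to loopback. The function name is hypothetical, and the column positions (local address in column 4, state in 6, PID/program in 7) assume the Linux net-tools netstat layout:

```bash
#!/usr/bin/env bash
# Read `netstat -pant` output on stdin and print every listening socket on
# the given port that is NOT bound to a loopback address (127.0.0.1 or ::1).
# Assumes Linux net-tools columns: $4 local address, $6 state, $7 PID/program.
exposed_listeners() {
  awk -v port="$1" '
    $6 == "LISTEN" {
      addr = $4
      p = addr;    sub(/.*:/, "", p)          # text after the last colon
      host = addr; sub(/:[^:]*$/, "", host)   # text before the last colon
      if (p == port && host != "127.0.0.1" && host != "::1") print addr, $7
    }'
}

# Example:
#   sudo netstat -pant | exposed_listeners 8080
# No output means nothing is listening on port 8080 outside loopback.
```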