Below is the 3rd of 6 questions from a reader. I definitely don’t have all the answers, but I’m not short on opinions. ;)
Laurens Duijvesteijn asks:
Q3: I intent (sic) to provide quite a lot of media to my internal network, if I choose for virtualisation, will the VMs be able to access the disk space outside of the container? I do not want to create TB size containers (or should I?). I will probably use the SMB protocol here.
How To Access Extra Storage from Inside a VM
If you think of a VM, a virtual machine, as just another machine on the network, then you’ll understand that you can access storage the same way you would from anywhere else on your network. In part 2, I explained that you want to set up storage services such as NFS, iSCSI, or CIFS. Each of these methods is client/server based: you need both a client and a server for them to work. Linux has free clients and free servers for each of them. This works for entire virtual machine storage or for just accessing a single MP3 file while streaming music that you own around the house. It also means that your media doesn’t need to be part of the media server VM itself.
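As a concrete sketch, a client VM can pick up NFS and CIFS shares at boot from /etc/fstab. The server name, export paths, mount points, and credentials file below are examples, not a required layout:

```
# /etc/fstab on a client VM (server name and paths are examples)
storage.home:/srv/media  /mnt/media  nfs   defaults                      0 0
//storage.home/media     /mnt/smb    cifs  credentials=/etc/smb-cred,ro  0 0
```

You will also need the client packages installed; on Debian-family systems those are nfs-common and cifs-utils.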
The storage method best suited to your systems depends on the virtualization technology deployed. Be certain to check this before you begin. If you aren’t certain of your choices, run a tiny test first.
Segment Your Storage
When you work in enterprises with more than 30,000 physical machines, you learn how to become flexible with your storage. Here’s a crash course.
- Put only your OS inside the VM virtual file system. Keep it small and tight. 5-10GB should be enough even if you go crazy with apt-get install or yum.
- Try to keep your specially built application installations under /opt or /usr/local and external to the VM file system. You can NFS-mount these to all the VMs which need them. This is not for data, just programs that you compile. Mounting them read-only provides an extra layer of security. Consider doing this for your static web server content. If all those web servers that were defaced had mounted their static pages read-only, do you think they would have been defaced? Yes, I know it is more complex than that.
- Put your HOME directories on externally mountable (NFS is nice) storage from the storage server. You can mount the same HOME to many different VMs. You just need to keep the uid numbers the same or use a centralized user management system like NIS+ or LDAP POSIX accounts. For home users, it is easier to make the userid and groups match on all the daily-use VMs. Most VMs won’t be of that sort anyway, so having a specific userid on all of them simply isn’t needed.
- Put your large data files, like video media, on externally mountable storage from the storage server. Keep this stuff separate from your HOME.
- You can share the same storage, such as your video files, with different clients using both CIFS and NFS.
- Put your backups on different volumes, on different physical disks. There are lots and lots of ways to perform backups. Reference other articles here or wait for the backup article later in this series. I don’t recall whether that question was asked.
- To make your life easier later, do not create storage volumes that are too large; 1TB is probably the sweet spot these days. You want them sized to match what you can easily back up, with room for incremental backups on the backup media too.
- Until a few days ago, I was happily using just 10GB of storage for my HOME directory. Don’t oversize storage. You can always add more later through LVM or some other migration technique.
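The uid-matching point above is the classic gotcha with a shared NFS HOME. A tiny script like this, run on each VM, confirms that a username resolves to the uid you expect (the username and uid in the example are placeholders):

```shell
#!/bin/sh
# Verify that a username maps to the expected uid on this host.
# Run on every VM that mounts the shared HOME.
check_uid() {
    user="$1"
    want="$2"
    got=$(id -u "$user" 2>/dev/null) || { echo "no such user: $user"; return 1; }
    if [ "$got" = "$want" ]; then
        echo "ok: $user is uid $want"
    else
        echo "MISMATCH: $user is uid $got, expected $want"
        return 1
    fi
}

# Example (hypothetical account): check_uid laurens 1000
```

If this reports a mismatch on any VM, fix the uid there before sharing the HOME, or files will show up owned by the wrong user.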
If you do these things, you’ll have a fairly flexible storage setup that can handle all sorts of planned and unplanned needs.
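On the storage server side, the read-only /opt idea and the shared HOME and media volumes above can be expressed in a few lines of /etc/exports. The paths and subnet here are examples:

```
# /etc/exports on the storage server (paths and subnet are examples)
/srv/opt    192.168.1.0/24(ro,no_subtree_check)
/srv/home   192.168.1.0/24(rw,no_subtree_check)
/srv/media  192.168.1.0/24(rw,no_subtree_check)
```

After editing, run exportfs -ra to make the NFS server pick up the changes.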
If you’re running a Linux-based XBMC machine elsewhere on your network, then you will probably want to mount 3 partitions from your storage, and you don’t need a media server VM at all. I have TV, Movies, and Music mount points on my XBMC machine. I could merge all of those, but the size limitations impact my storage setup too. As your video library grows, you may need to add more storage and more storage mounts. That is pretty easy, and you can do it either inside the XBMC user interface or at the OS layer and let XBMC access those files as local media. Your choice.
If you need to transcode files on the fly due to other media center client limitations, only then would you need a transcoding server. An Xbox 360 or PS3 has limited support for video codecs. I have 3 media playback devices:
- XBMC Linux-based – plays all SD content, but the hardware cannot play 720p or higher resolutions due to CPU limits.
- WD TV Live HD – silent 1080p playback for almost all video codecs that I use. It is also a CIFS file server.
- MediaGate MG35 – plays my older SD content since it was the first media player on the network. Also has a 300G PATA USB HDD which can be connected to other devices and shared.
Each of these has a different way to connect to the storage server, but most use CIFS. For video streaming, this is fine.
All this networking happens extremely quickly for storage and systems inside the same physical machine, assuming you use virtio drivers for networking and storage. If you spread this across multiple machines, you’ll want to use wired ethernet connections. WiFi should not be used for NFS or iSCSI connections. 100base-tx works, but GigE (1000base-t) is best. You may wish to build a dedicated storage network that all your storage packets traverse. Faster is better, as always.
SMB, Samba, and CIFS are all names for Microsoft’s network file sharing technology. Using SMB is probably a good idea for a few specific applications: sharing storage with MS-Windows machines and sharing files with certain media playback devices. SMB has changed over time, which is why old Windows machines don’t always connect easily with Windows Vista or Windows 7 sharing machines.
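For those media playback devices, a minimal read-only Samba share is only a few lines of smb.conf. The share name and path are examples:

```
# /etc/samba/smb.conf excerpt (share name and path are examples)
[media]
    path = /srv/media
    read only = yes
    # guest access is convenient for media players;
    # drop it if you want the devices to authenticate
    guest ok = yes
```

Restart the smbd service after editing, and the share should appear when the players browse the network.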
For Linux-to-Linux file sharing, you really want to avoid CIFS and use NFS or iSCSI. NFS, the Network File System, has been around since long before CIFS was invented. NFS brings some overhead AND some capabilities. For accessing block data, iSCSI is better, so for database systems or even VMDK storage, iSCSI would be the preferred remote storage method.
For NFS, I recommend using the automounter with spongy mounts: not hard, not soft, but behaving like hard mounts when it can.
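With the Linux automounter (autofs) that looks roughly like the following; the mount point, map file, server name, and paths are examples. (The “spongy” behavior is a Sun automounter feature; stock autofs maps just pass ordinary NFS mount options.)

```
# /etc/auto.master excerpt
/net/storage  /etc/auto.storage  --timeout=300

# /etc/auto.storage (server and paths are examples)
media  -fstype=nfs,rw  storage.home:/srv/media
home   -fstype=nfs,rw  storage.home:/srv/home
```

The shares are mounted on first access under /net/storage and unmounted again after the timeout, so idle VMs don’t hold mounts open.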
iSCSI lets your storage server look like an enterprise-class storage server without the enterprise-class price. That’s pretty cool for free software.
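As a sketch, exporting a block device as an iSCSI target with the Linux tgt daemon is only a few lines of configuration. The IQN, backing device, and initiator address below are examples:

```
# /etc/tgt/targets.conf excerpt (IQN, device, and initiator are examples)
<target iqn.2011-01.home.storage:vmstore>
    # a raw logical volume exported as one LUN
    backing-store /dev/vg_storage/lv_vmstore
    # only this client may connect
    initiator-address 192.168.1.20
</target>
```

The client sees the LUN as a local disk and puts its own filesystem on it, which is exactly what you want for databases or VM image storage.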
Direct VM Access to Storage
It is possible to directly access storage hardware through a VM, but not with normal desktop-class equipment. You need server class systems. I do not have any of this equipment at home. I suspect 99% of you do not either. There may be ways to directly access PCI hardware without using VT-d, but I have never heard of it. Some virtualization servers may have a way to solve this. If you are considering using USB storage and USB passthru, don’t bother. USB is too slow.
Backups are necessary for all data, all VMs, all HOME directories. When you are buying storage, be certain to allocate cheap, slow disks for backups. When you allocate primary storage, think about the backup storage: if you allocate 3TB in a single partition, then you want 3.5TB for the backup storage. Today, nobody makes 3.5TB disk drives, and even if they were made, I wouldn’t trust my data, even backup data, to them. Backup storage needs to be simple, easy to manage, easy to move, and really easy to recover. Avoid RAID-whatever for backup storage, and that includes striping (RAID0).
For HOME directories, I like automatic daily, or even hourly, rdiff-backup runs or Back In Time snapshots. I like to retain 90 days’ worth of these. Obviously, we can’t do this for huge HD media files; for those, a straight mirror is probably the most realistic solution. Regardless, we need a backup. That backup may be the original optical disc.
Backups for virtual machines can be performed in many different ways. Having a full copy of the VMDK/VDI/IMG file is nice, but retaining 30+ days of incremental backups is not realistic, since incremental backups don’t work very well on large files (anything over 2GB in size).
Storage design is the second most important thing here. By following the suggestions above, you will gain huge flexibility for growing storage needs. Resist the desire to have one large shared volume with everything on it.
Network design becomes more important when you grow beyond just a few physical machines. For a home user, the network design is usually pretty easy: go as fast as you can with wired connectivity. GigE is pretty affordable for any home network; high-quality GigE network cards for desktops are $25. Just ensure that your motherboards and laptops all include GigE ports, not 100base-tx.
Thanks to Laurens Duijvesteijn for letting me use these questions.