Below is the 2nd of 6 questions from a reader. I definitely don’t have all the answers, but I’m not short on opinion. ;)
Laurens Duijvesteijn asks:
Q2: I read everywhere about Virtualisation, should I directly install packages to the base system to provide services, or should I virtualise all services? What are the advantages here?
Advantages of Virtualization
The list of advantages is long, but with those advantages come a few disadvantages. I cannot hope to point out all the advantages, so I’ll limit it to just the main ones.
We’ll be talking about OS-level virtualization. There are other kinds which have advantages too. I won’t differentiate between paravirtualization and Whole Machine Virtualization beyond saying that paravirtual VMs share the kernel with the hostOS and the clientOSes need to be modified to play nicely in this way. There are ways to run other paravirtual systems under Xen where the same kernel is not involved. I’ve never done that. All the clientOSes running under paravirtual need to be the same type of OS, so they run the same kernel, but they do not need to be identical OS distributions. Paravirtual VMs are usually much faster and use fewer system resources, but because the kernel is shared, security may also be breached more easily.
Flexibility is the main feature that using virtualization brings. How?
Virtualization Hides Hardware Details
This is huge. When you virtualize a server, you usually create fake hardware for the OS to run under. That means that the actual hardware used, the actual north-bridge and south-bridge and network cards and disk controllers and the disks themselves do not matter to the client OS. This also means that if you need to relocate the VM to different hardware, as long as you use the same virtual hardware, the client OS won’t know the difference. Think about that in an enterprise. You have a mix of desktops and servers, all from different manufacturers purchased over the last 5 years. The hardware is similar, but different. With virtualization, those differences do not matter.
Exact Virtual Machine Copies Are Trivial
With a virtual machine, it is very clear exactly which parts are included in the VM. There’s no wonder whether the boot manager needs to be included in the backup. Virtual machines are commonly stored on disk in 3 ways.
- Disk files that are allowed to grow as needed – I call these sparse files.
- Disk files that are pre-allocated to the final size – I call these full disk files.
- Direct Access to Disk Partitions – Often, this is through LVM partitions which are presented to the client-VM
Regardless, if you use a full disk file, then when you want to make a copy for any reason, you shut down the clientOS running there and simply copy the disk file. It is simple, and you have a perfect copy. These copies can be compressed, transmitted, put on an encrypted HDD and carried anywhere, then brought back up.
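The copy step above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the clientOS is already shut down and the VM lives in a single full disk file; the function names and file paths are made up for the example and are not part of any VM product.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large disk image never has to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_vm_image(source: Path, destination: Path) -> str:
    """Copy the disk file of a shut-down VM and verify the copy.

    Returns the SHA-256 of the image. Raises IOError if the copy differs.
    """
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    if sha256_of(destination) != expected:
        raise IOError(f"copy of {source} does not match the original")
    return expected
```

Something like `copy_vm_image(Path("webserver.img"), Path("/mnt/backup/webserver.img"))` would do the job (both paths are hypothetical). Recording the returned checksum next to the copy makes it easy to confirm later that the image on the shelf is still intact.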
If you have a service or application which took a day or a month to properly configure, having the exact settings is critical. If you are deploying a complex application server and need to upgrade between servers, this can be extremely difficult. With virtualization, just copy the VM image file over (or access it from your storage network) and run it as a newer VM which has faster CPUs and more CPUs than were allocated to the old VM. This can be a logical VM-client change or a physical VM-client change to newer, faster hardware. Regardless, the application settings that were optimized previously are brought over, saving a day or weeks of resetting the configuration.
Testing Upgrades
Imagine you are testing a system upgrade. Being able to run that upgrade against the production system isn’t possible, but running it against an exact copy of the production system over and over is highly advised. Virtualization makes this almost trivial.
Testing Production Data
Imagine you are in a development shop, in the test group. Every day your team gets a new version or multiple versions of software from the development team(s). Keeping the test data in a known configuration is really easy with virtualization. That exact copy of the VM can hold production data in a known state. This means you can test fresh installations with ZERO data or upgrade installations filled with carefully selected test data or grab a full production machine and perform the upgrade against real data with no risk. Test teams are where I first saw virtualization used in the mid-to-late 1990s.
A copied VM will be the exact same machine in the new location in every way. This can be a problem if the copy is on the same subnet and static IPs are used, because now you have 2 VMs with the same IP. Not good.
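If you keep even a simple inventory of which VM uses which static IP, spotting this kind of clash before booting a copy is easy. The sketch below assumes such an inventory exists as a plain mapping of VM name to address; the VM names and addresses are invented for the example.

```python
from collections import defaultdict


def find_ip_conflicts(vm_addresses: dict[str, str]) -> dict[str, list[str]]:
    """Return each IP claimed by more than one VM, with the VM names sorted."""
    owners = defaultdict(list)
    for vm_name, ip in vm_addresses.items():
        owners[ip].append(vm_name)
    return {ip: sorted(names) for ip, names in owners.items() if len(names) > 1}


# Hypothetical inventory: "web-copy" was cloned from "web" and kept its static IP.
inventory = {
    "web": "192.168.1.20",
    "web-copy": "192.168.1.20",
    "db": "192.168.1.21",
}
print(find_ip_conflicts(inventory))  # {'192.168.1.20': ['web', 'web-copy']}
```

Any address that shows up in the result needs to be changed on one of the VMs before both are started on the same subnet.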
Try Different Operating Systems
With virtualization, you can have many different operating systems running without touching the others. That means trying OS/2, DOS, Windows95, Win98, Win2000, WinXP, Win7, multiple BSD and Linux flavors isn’t beyond your reach. You can try any OS that runs on x86 hardware. Try them all – Plan9, RealOS, AmigaOS, etc. Check here for more alternative OS options. Some may not work as you move outside the currently popular operating systems, so you may find special setup will be needed or that additional research must be performed to get them working. MacOS may be extremely popular, but be certain to learn about the restrictions and carefully read the how-to guides to get OSX running under whatever VM technology you deploy. It may not be possible.
Increase Physical Hardware Utilization
In the business world, physical servers average just 13% utilization. So your company spent $5K – $30K for a server and is only using 13% of the CPU capacity? With virtualization, your company gets 3x-10x more use out of the same hardware. I’ve worked in large companies with 20,000 to over 100,000 employees. We had many, many thousands of servers. Most of those servers were barely used. Sure, a few needed more CPU, more RAM, more disk, but the vast majority of them were almost idle. Even the clustered servers or those with multiple load-sharing front-end servers were often underutilized. They were sized for loads planned 2+ years later. Sometimes that increased load never materialized, so the hardware was wasted.
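To see where a number like 3x-10x can come from, here is a rough back-of-the-envelope calculation. It is a sketch under big assumptions: all servers are the same size, CPU is the only constraint, and the 65% target is a figure picked for illustration, not a recommendation from any vendor.

```python
def hosts_needed(server_count: int, avg_cpu_pct: int, target_cpu_pct: int) -> int:
    """Estimate how many equal-sized hosts can carry the combined CPU load.

    Percentages are whole numbers (13 means 13%) so the arithmetic stays exact.
    """
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    if not 0 < avg_cpu_pct <= 100 or not 0 < target_cpu_pct <= 100:
        raise ValueError("percentages must be between 1 and 100")
    total_load = server_count * avg_cpu_pct
    # Ceiling division: a partly used host still has to be bought.
    return -(-total_load // target_cpu_pct)


# 100 servers idling at 13% each, packed onto hosts run at about 65%.
print(hosts_needed(100, 13, 65))  # 20
```

Going from 100 physical servers to 20 hosts is a 5x improvement, which sits inside the 3x-10x range above. Real consolidation also has to account for RAM, disk I/O and peak loads, so treat the result as an upper bound on the savings.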
High Availability Easily Achieved
HA, as it is called, comes in many different flavors. Commonly it is deployed with a primary and a failover server using an Active/Failover server configuration. In this configuration, some virtual machine technology has built-in failover capabilities, or you can use standard clustering techniques that have been proven over the last 25+ years.
HA can also be deployed at the networking layer with multiple front-end systems and a reverse-proxy which sends traffic to each front-end server using either load-based or round-robin selection. Most reverse-proxies recognize when a front-end server doesn’t respond and redirect the traffic to another front-end server in the group automatically. When the failed front-end server comes back up, traffic is again sent to that server. It is a wonderful thing. In this mode, logical failures due to bad data input can be mitigated; however, physical issues with a server can take down all the VMs running on that single machine. If you have only 1 server, it is difficult to achieve true HA capability.
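The round-robin behavior described above can be illustrated with a small generator. This is a toy model of the selection logic only, not how any particular reverse-proxy implements it; the server names and the `is_healthy` callback are assumptions for the example.

```python
from itertools import cycle
from typing import Callable, Iterable, Iterator


def round_robin(backends: Iterable[str],
                is_healthy: Callable[[str], bool]) -> Iterator[str]:
    """Yield front-end servers in rotation, skipping any that fail the check.

    A server that recovers is picked up again on a later pass. If a full
    pass finds no healthy server, RuntimeError is raised.
    """
    pool = list(backends)
    if not pool:
        raise ValueError("at least one backend is required")
    rotation = cycle(pool)
    while True:
        for _ in range(len(pool)):
            candidate = next(rotation)
            if is_healthy(candidate):
                yield candidate
                break
        else:
            raise RuntimeError("no healthy backends available")


down = {"web2"}  # pretend web2 has stopped responding
picker = round_robin(["web1", "web2", "web3"], lambda name: name not in down)
print([next(picker) for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

Once `web2` is removed from the `down` set, the next passes include it again, which mirrors traffic returning to a recovered front-end server. In a real deployment the health check would be a network probe rather than a set lookup.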
Disaster Recovery Possible
If you build your virtual machines to back up either daily or continuously to a remote location, constantly write the database information to both locations, and connect a watcher to the primary system which validates it is handling requests, then you can have a very efficient DR plan. If you simply copy the VM every week to another location where another VM host is available, then after a quick DNS change your customers are able to access the same services again with fairly minimal downtime, regardless of the actual hardware used at the remote location. If you don’t want to buy the equipment for this, you can rent it from many different providers.
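The "watcher" idea can be reduced to a small decision function. The sketch below assumes you supply your own `probe` callable that returns True when the primary answers a request; how the probe works and what happens after the decision (the DNS change, starting the remote VM) are left out because they depend entirely on your setup.

```python
from typing import Callable


def should_fail_over(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Return True only if every one of `attempts` probes of the primary fails.

    Requiring several failures in a row avoids failing over because of a
    single dropped request.
    """
    for _ in range(attempts):
        try:
            if probe():
                return False  # the primary answered, so stay put
        except OSError:
            pass  # a network error counts as a failed probe
    return True
```

A watcher would call this on a schedule and, when it returns True, alert a human or start the switch to the remote copy. Adding a pause between attempts is a sensible extension that is omitted here to keep the example short.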
Compartmentalization of Complex Systems
Some software requires complex, exact, configuration. Often different internal experts or a vendor perform this configuration.
- Often the contract with a vendor requires that the organization running the system not touch any settings, and if they do, then correcting the issue is a time & materials cost. I’ve seen the hourly rate for this support at $300/hour.
- Often the system has specialized versions of commonly used libraries. This means that loading other, unconnected, software on the same machine as this main software will introduce unwanted interactions and may not work at all.
- Often a system is critical to a business. If multiple software services are installed and running on the same OS instance, then when an upgrade is performed for one of the systems, all the other systems may be negatively impacted. That is a huge risk.
With virtualized servers, a company can place each critical service onto different virtual machines to reduce the risk of system-to-system negative interactions.
Feel Like You Have a Data Center At Home
Having lots of virtual machines at home can help us understand the issues that people running a data center have. It also means that you don’t need more hardware to bring up a new desktop or server operating system to try out. For example, even if you are just curious about running a newly released Linux version or even an alpha version of a new OS, through virtualization, you have mitigated the risks to your other computers and operating system installs. Go for it, try Bob’s Kewl Linux Distro without fear.
Build Your Own Cloud
Cloud computing is all the rage these days. How do you gain experience without all the investment or rental costs? Amazon EC2 costs about $2/day per machine in the small configuration. For $6/day, you can have a 3 server cloud with which to test. Or you can run 3 virtual machines on your home server for free. Typical home machines can easily run 3-15 VMs. Yes, 15. In the Linux world, a VM needs between 256MB and 1GB of RAM. This will be the limiting factor on the physical machine. If your host has 4GB of RAM like a typical home machine would, then you can run … 512MB for the host + 256MB for each clientOS – about 12 clientOS VMs (the raw ceiling is 14, so 12 leaves some headroom). Now that’s a cloud. If you are interested in this, check out Proxmox or Eucalyptus or OpenStack solutions. OpenStack is gaining more traction in corporations and enterprises.
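The RAM arithmetic above is easy to rerun for your own machine. This is a simple sketch that treats RAM as the only limit, as the paragraph does; the function name is made up for the example.

```python
def max_guest_vms(host_ram_mb: int, host_reserved_mb: int, per_guest_mb: int) -> int:
    """Return how many guests fit in RAM after reserving memory for the host."""
    if per_guest_mb <= 0:
        raise ValueError("per_guest_mb must be positive")
    return max(0, (host_ram_mb - host_reserved_mb) // per_guest_mb)


# 4GB host, 512MB kept for the hostOS.
print(max_guest_vms(4096, 512, 256))   # 14 small guests at 256MB each
print(max_guest_vms(4096, 512, 1024))  # 3 larger guests at 1GB each
```

The two results bracket the "3-15 VMs" range: 14 is the hard ceiling for 256MB guests, so planning for about 12 leaves a little memory spare for the host.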
Increase Security with Purpose Built Virtual Machines
The more services that an OS exposes, the more attack vectors there will be against that machine. For internet facing servers, we really would like to have only the exact services necessary running. That means a web server is only a web server, not a database server or a proxy or a firewall. It means that a firewall is only a firewall, a document server is only a document server. You get the idea. This is a best practice for enterprise IT deployments.
When you look at the open ports on an internet facing machine, like a web server, you want to see 1 or 2 ports open. 80 and/or 443, nothing else.
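One way to check this yourself is a tiny TCP connect test using Python's standard `socket` module. This is only a sketch: it tests just the ports you list, it is no replacement for a real scanner, and it should only ever be pointed at machines you own. The function names and the default allowed set of 80 and 443 are choices made for the example.

```python
import socket
from typing import Iterable


def open_ports(host: str, ports: Iterable[int], timeout: float = 0.5) -> set[int]:
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    found = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                found.add(port)
    return found


def unexpected_ports(found: Iterable[int],
                     allowed: Iterable[int] = (80, 443)) -> set[int]:
    """Return the open ports that are not on the allowed list."""
    return set(found) - set(allowed)
```

For a web server VM, `unexpected_ports(open_ports("192.168.1.20", range(1, 1025)))` should come back empty (the address is hypothetical). Anything in the result is a service worth turning off or moving to another VM.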
Freeze A VM
When you aren’t actively running a VM, you can just take the storage and hold onto it. That storage doesn’t need to be inside or even connected to a running server. Put it on a DVD or BluRay or old HDD and put it on a shelf. This is a great way to retain an old machine with a specific setup for use later. Imagine you have some old hardware or software that runs under WindowsXP, but will not run under Vista or Windows7. I have some software like that, for example, older versions of tax software or Quicken or games. I keep an old WinXP VM just for this reason. I’ve heard of friends with printers that were not supported by their current OS, but through USB passthru they were able to access and use that printer.
Imagine there’s a game that you enjoy playing. You’d like to play it again at some point, but not for the next year. You’d like all your settings retained. Freeze the VM, store it offline.
Other uses for frozen VMs include development teams who need to support old versions of their software for important clients. Imagine your company has sold 5 years of support for a specific version of your software. You don’t want to dedicate an entire machine to this or stop development for newer versions with newer development tools. In the old days, you’d probably swap out the HDD and store it on a shelf. With the way we swap out desktop computers these days, you’d grab that old HDD with all the old development tools set up and drop it into a new desktop, and it won’t boot due to a license issue. Too much of the hardware has changed. With a VM, the hardware will appear to be the same and you can bring it up easily to patch and verify that your team is still able to build the software. The only cost of this special virtual machine is the cost of storage.
Freezing a VM is pretty cool stuff.
Learning Virtualization Will Help Get/Stay Employed
In the corporate world virtualization is a way of life. In many organizations, all server software is deployed on virtual servers regardless of what the vendor requirements say. For many years, software vendors were unwilling to support their software on virtual machines. Large enterprise customers demanded support for virtualization and vendors responded appropriately. I can’t think of any enterprise software that is run inside a VM that isn’t fully supported by that vendor. That means all of us win, but it also means you and I need to be experienced with different virtualization technologies. Understanding desktop virtualization is important, but so is understanding server-based virtualization. These are very different and each has different employment opportunities.
Negatives of Virtualization
There are some downsides to this.
Virtualization Adds Complexity, Requires Expertise
When you run 1 OS on a single machine, a computer can be hard enough to manage. Having more instances of an OS makes it just a little more difficult. As you add more and different operating systems, the level of effort to keep them all maintained increases. If you run Linux and have a little knowledge, you can automate system patches so running 2 or 200 instances doesn’t necessarily mean more effort. That extra knowledge is expertise.
More Systems to Maintain, Patch, Track
With commercial OSes you need to track licenses. For all operating systems, we need to maintain updates and patches. Doing this can be automatic or manual. I’m leery of allowing systems to patch themselves automatically. I like to know which patches are being applied, which means manually installing them. On Windows, that means pointing and clicking. On Linux it means watching a script and hitting enter every once in a while. Sure, I could completely automate patching.
There’s a new term in virtualization, VM Sprawl. I must admit that I have it too. I’ve frozen far too many VMs over the years. ;)
Commercial Virtualization Is Not Cheap
Commercial virtualization has real costs: software licenses, tools to manage the VMs, and different software to back up the VMs. VMware has recently changed their licensing, increasing the costs. I haven’t had a chance to look over the new terms so I don’t know how we are impacted, if at all. I know of one very large VMware client who is actively migrating to OpenStack over this increase. They have it running in a lab, but deployment of OpenStack to over 100 physical servers is planned. That’s 100 servers which will not be running VMware software or management tools. If it works out, the other 5,000 VMware servers will probably be migrated to OpenStack over the next few years.
If I were VMware, I’d pay attention. They have the lead on stability and 3rd party support, but they are making mistakes by increasing license costs when F/LOSS virtualization is becoming better, faster and is already rock solid. Expertise learned with VMware transfers to other virtualization technologies too. For example, where I work, we run VMware ESXi, Xen, KVM and VirtualBox. ESXi is extremely stable. Xen is equally stable over the last 3 years and KVM has not crashed since we started running it about a year ago. Of course, none of these is perfect for all situations. Having a company to blame is important for large enterprises.
VirtualBox was not stable on Linux machines in our testing last year. It caused the hostOS to lock up, in addition to the client OS locking up. Unacceptable. However, for desktop-class virtualization, VirtualBox has been extremely stable the last 6 months, never crashing.
Every month the F/LOSS virtualization tools get incrementally better. VMware isn’t standing still, but OpenStack has many organizations, government and commercial, putting resources into making it enterprise ready. I don’t believe that VMware will survive in mainstream use beyond 5 years; instead it will become like UNIX, used only in special situations. OpenStack is maturing that quickly.
Different Thought Schools
Assuming you are sold on virtualization, then you need to decide what, exactly, you will be virtualizing. There are a few different options:
- Virtualize everything, absolutely everything
- Virtualize everything, except storage
- Virtualize only those things which are extremely complex and hard to configure
- Virtualize only those things which need to be ready for disaster recovery or where you need an exact copy
You will probably come up with other levels of virtualization based on your specific situation.
Virtualize all services?
This means that you do not run any services directly on the physical machine. This will aid the overall stability of the hostOS which is paramount in virtualization. Anytime we run a service on the hostOS, we risk reducing stability of the hostOS and adding security issues which could impact all the clientOS instances. That must be avoided.
Some virtual hostOSes do not support running any normal software. VMware ESX and ESXi are examples of this, which means you don’t have any choice but to virtualize everything you want to run. Other virtualization technologies sit on top of a common OS, so loading any program that the normal OS supports is possible. Again, doing this impacts overall system security and may harm stability. It is hard to resist loading other tools in the HostOS for this type of system. It is nearly impossible if the hostOS is MS-Windows.
Storage Only Virtual Services
For many, running storage services like NFS or Samba or iSCSI on the hostOS makes lots of sense. Generally speaking, storage services are extremely stable and it is easy to limit access through both login controls and network controls.
This method is a reasonable compromise for home users and it simplifies lots of things, like backups. Backing up a virtual machine from the hostOS can be extremely efficient and quick since it avoids network slowdowns.
Stable, Commonly Used Virtual Services
For home users, having too many virtual machines can become an issue. We often have a few commonly used programs that have proven to be extremely stable. This is very true for Linux systems. On a home network, there are some services that are commonly used, highly stable, but you only need running on 1 machine on the network. It may be desired to set these up directly on the hostOS of the VM server. Some examples are:
- Network Storage and File Sharing – NFS, Samba, iSCSI
- Printing – CUPS, print spoolers, PDF converters, print servers with automatically loading drivers for client OSes as needed. I do not have printing on the hostOS. I place these types of services into a common utilities VM.
- DNS Server – so all client machines automatically know the hostnames and IPs for other internal systems. Common Utilities VM here.
- DHCP Server – reserved IP address for specific machines on your network. Basically creating a static IP without having to setup the static IP on each client. Common Utilities VM here.
The difficulty is in not installing too many programs on the hostOS without careful thought. That isn’t always easy, since dropping just one more tiny program into the hostOS doesn’t appear to be an issue, until it becomes a big issue due to a failure.
Suggested VM Organization
There are lots of ways to organize services for virtual machines. The goal for the proposed solution is to reduce complexity where possible, increase security, while providing a place for commonly needed services that don’t need a new VM built all the time. Your needs will probably be different, so this is just a starting point.
- VM Services – ok to run on a hostOS
  - Virtualization Tools
  - Storage Tools
  - Backup Server
- Common Utility VM – run all these inside a single VM
  - File/Print – Samba and CUPS
  - DNS internal and caching server – not externally available
- Desktop VM – Office productivity apps, web surfing, email, etc.
- Firewall VM – no other purpose due to security concerns.
- External Web Server VM – no other purpose; Security demands single purpose.
- Database VM – it is likely you will have multiple systems which need a database.
- VPN VM – no other purpose; Security demands single purpose.
- PBX VM – no other purpose; too specialized
- Document Management VM – if you deploy a real DMS, keep this pure due to the complexity of the system
- Email/Communications VM – we use Zimbra and it is complex, so it gets a VM to itself
- Monitoring/Log VM – this is where all the logs from every server are pushed. It also captures performance statistics and builds performance graphs for each server. On VMware servers, this is built into the vSphere client.
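If you like to keep a plan like this written down, it can double as a simple check of the single-purpose rule for internet facing VMs. The sketch below is one possible way to record it; the `VmPlan` class, the VM names and the service labels are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class VmPlan:
    name: str
    services: list[str]
    internet_facing: bool = False


def single_purpose_violations(plans: list[VmPlan]) -> list[str]:
    """Return the names of internet facing VMs that run more than one service."""
    return [p.name for p in plans if p.internet_facing and len(p.services) != 1]


plans = [
    VmPlan("common-utility", ["samba", "cups", "dns"]),      # internal, shared
    VmPlan("firewall", ["firewall"], internet_facing=True),
    VmPlan("web", ["web server", "database"], internet_facing=True),
]
print(single_purpose_violations(plans))  # ['web']
```

Here the `web` VM is flagged because the database should move to its own Database VM, while the common utility VM is allowed several services because it is not exposed to the internet.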
Basically, starting with a single hostOS with 2 clientOS VMs is recommended. The host provides backup, VM and storage. A common utility VM handles file/print and internal network services. A desktop VM provides a place to try desktop tools like office programs, software development, and other commonly used applications. This desktop may not be available except through network connections over VNC, RDP, ssh or NX tools. This may be confusing to users with an MS-Windows background. It also means that your desktop is available from a laptop, netbook or any other machine on your network. That is pretty cool when you think about it. It can mean that your desktop is available to you from anywhere in the world where you have sufficient network connectivity too.
As your needs become more complex, you can add the other recommended virtual machines. Generally any service which will face the internet needs to be a special-use VM with a single purpose. Some people may elect to have an all-in-one security VM installed like ClearOS. It will certainly be easier to manage using their web GUI, which means end-user configuration mistakes will be less likely. Only you can determine whether the risks involved in using this solution inside a single VM facing the internet are worth it.
Regardless of how you choose to deploy VMs, please set up security in the same way that you would for different physical machines. Some VM deployment techniques break security between clientOS instances, so just be aware of this when you are deploying. A shared network card with lots of VMs can mean that each of those VMs can have access to any of the traffic going through the network card, even traffic not meant for them.
Deploying virtualization brings many capabilities and great flexibility, but also increases complexity. Since you are the boss of your own systems, only you can decide which programs are stable enough to be on the hostOS. I would suggest that any program that you do not have lots of experience with should not be placed directly on a hostOS at all. I use the rule that if it isn’t directly part of the virtualization or backup or storage, then it goes into a VM. This has worked very well for us and limited the stability concerns.
Thanks to Laurens Duijvesteijn for letting me use these questions.