Over the last few weeks I migrated my VPS from one provider in Orlando to a new provider in Atlanta. Surprisingly, I get much better throughput from my location to Atlanta than I do to Orlando! Things were great for the first month or so, but then it happened. First, let me explain the testing process I used to select a VPS. I spent days trying out various providers, asking for test download files and IP ranges to ping and tracert. I even picked a different provider at first, until I got an offer from my current provider, which had better throughput. Needless to say, I had some cancelling to do with the providers I tried out that didn’t perform to my expectations. So a little over a week ago, I got a PayPal billing statement from a provider with a funny name. I didn’t recognize it and figured it was one of the providers I had been evaluating but didn’t want to keep. I sent a message to the billing department asking them to refund the payment and cancel the account. That evening I was at home and started to get SiteUptime alerts that my sites were down. After some quick checking, sure enough, everything was down. I frantically went through all my e-mail folders trying to find the info for the provider I had decided to stick with, but I could not remember the name. Finally, after a process of elimination, what I found horrified me. I had cancelled the wrong VPS provider, basically asking them to terminate the account where all my sites were hosted.
I had already cancelled all the other providers, and since I relied on the provider to do the backups for me, I had no backups of my own to restore from. I sent a few desperate e-mails to the support address for the provider I wanted to keep. I even called and left a message on the voice line. After a few hours I started to get automated setup e-mails from them, as if they were rebuilding my account. I was also told they had an offsite backup in Dallas from which they should be able to restore my VPS. This reduced my panic level right away, but it would take several hours to copy the huge 15 GB backup file to Atlanta. I went to bed that night still waiting for the restore to complete.
The next morning I was hoping to find an update from the provider in my e-mail, but there was nothing. In fact, I had less e-mail than normal. I sent a few desperate requests for an update but never heard anything back. Finally, that night I sent one last e-mail saying it had been 24 hours since I had heard from the support guy. The next morning I got a call on my cell phone from the provider, saying I must not be getting his replies via e-mail. Sure enough, the VPS that was down also hosts the domains I use for my e-mail, so somewhere in my DNS setup there was a problem and my e-mail was not getting through.
Now that we had established some communication using alternate e-mail addresses, we made some progress on restoring the VPS. For some reason there were issues restoring the rsync backup of my VPS: when starting up, it would complain about all kinds of problems and none of the cPanel services would start (this is a Linux server, if you haven’t guessed already). I downloaded a copy of the backup to try some things on my end while the provider tried things on theirs, and nothing was working. Finally, we got the VPS restored to the point where I could access WHM (Web Host Manager), and cPanel support told me I could force the backup script to run, then grab the account backups to restore from after we reloaded the server. So that’s what I did. I ran into a few issues, like a permissions problem on some of the accounts that made the backup fail, so I had to chown the accounts folder for each user before I could run the backup script again.
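The ownership fix described above can be sketched in a few lines of Python. This is only an illustration, not the exact commands that were run: it assumes the "accounts folder" is each user's home directory under `/home`, and the helper names `chown_tree` and `fix_account_ownership` are made up for this example. It has to run as root to hand files back to another user.

```python
import os
import pwd


def chown_tree(root, uid, gid):
    """Set owner and group on root and everything below it.

    Symlinks are changed themselves rather than followed.
    Returns the number of paths touched.
    """
    os.chown(root, uid, gid, follow_symlinks=False)
    changed = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            os.chown(path, uid, gid, follow_symlinks=False)
            changed += 1
    return changed


def fix_account_ownership(username, home_base="/home"):
    """Give <home_base>/<username> back to that user (assumed layout)."""
    entry = pwd.getpwnam(username)  # raises KeyError for unknown users
    home = os.path.join(home_base, username)
    return chown_tree(home, entry.pw_uid, entry.pw_gid)
```

Calling `fix_account_ownership("alice")` for each affected account (with `alice` standing in for a real username) would reset ownership before the backup script is run again.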
Finally, after getting backups of all my accounts and making sure I had all the information I needed, the provider reloaded my VPS from scratch. I then installed cPanel again and worked on restoring accounts one by one. I ran into a few problems here as well. For example, two of the accounts were restored with their original hosting plan, which was now too small for their actual disk usage, so the restore would stop as soon as it hit the quota for the account. This caused a bit of trouble, but I was able to get around it by upgrading their disk space quota, manually extracting the home directory for each account from the backup, and copying it to the proper account. A little work with chown and we were back in business. However, I’m not out of the woods yet. A few users had a Gallery2 install on their accounts, and now that things are running again I’m seeing weird behavior, with 404 Not Found errors in Gallery and strange thumbnail issues. I have rebuilt the cache and turned on performance optimizations in Gallery, but the issues persist. I am still working on that for a few accounts, but otherwise everything is up and running.
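The manual extraction workaround could look something like the sketch below. It assumes the account backup is a tar archive with the home directory stored under a folder named `homedir` somewhere inside it; that layout, the function name `extract_homedir`, and the paths in the usage note are assumptions for illustration, so check a real archive before relying on it. Only regular files and directories are restored here, and symlinks are skipped.

```python
import os
import shutil
import tarfile


def extract_homedir(archive_path, dest_dir, marker="homedir"):
    """Copy everything below the `marker` folder of a tar archive into dest_dir.

    Returns the list of files written. Entries outside `marker`, symlinks,
    and any path containing '..' are ignored.
    """
    written = []
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            parts = [p for p in member.name.split("/") if p]
            if marker not in parts:
                continue
            rel_parts = parts[parts.index(marker) + 1:]
            if not rel_parts or ".." in rel_parts:
                continue
            target = os.path.join(dest_dir, *rel_parts)
            if member.isdir():
                os.makedirs(target, exist_ok=True)
            elif member.isfile():
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                os.chmod(target, member.mode & 0o777)
                written.append(target)
    return written
```

Something like `extract_homedir("/root/backups/alice.tar.gz", "/home/alice")` would then fill the account's home directory once its quota has been raised, followed by a chown so the files belong to the account again.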
This was completely my fault, but fortunately the provider did offsite backups. I do not even want to think about all the trouble I would have gone through rebuilding and re-creating all my accounts, and starting over on most of them, for lack of a backup. Since this happened, the provider has implemented a vzdump backup procedure to prevent it from happening again. I also plan to set up a way to back up my accounts myself at my location, so that I have immediate access to the account backups should I need to quickly switch providers or servers at any point in the future. Lots of lessons learned here, but honestly it could have been a lot worse! There was a lot more that went on in the background, and other things we tried over 4 days to restore, but this is the overview version.
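One possible shape for that do-it-yourself backup is a small script, run on a schedule from a machine at home, that pulls the account backups down with rsync over SSH. The sketch below is only a starting point under stated assumptions: the host name `vps.example.com`, the remote directory `/backup/accounts`, and the local directory are placeholders, and the function names are invented for this example. It relies on rsync and SSH key access already being set up.

```python
import subprocess


def build_rsync_command(remote_host, remote_dir, local_dir, user="root"):
    """Build an rsync command that mirrors remote_dir into local_dir.

    -a keeps permissions and timestamps, -z compresses in transit, and
    --partial keeps half-transferred files so large backups can resume.
    """
    source = f"{user}@{remote_host}:{remote_dir.rstrip('/')}/"
    return ["rsync", "-az", "--partial", "-e", "ssh", source, local_dir]


def pull_backups(remote_host, remote_dir, local_dir, user="root"):
    """Run the rsync pull; raises CalledProcessError if rsync fails."""
    cmd = build_rsync_command(remote_host, remote_dir, local_dir, user)
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder values: replace with the real host and directories.
    pull_backups("vps.example.com", "/backup/accounts", "/srv/vps-backups")
```

Keeping the command building separate from the call that runs it makes it easy to print and check the command before pointing it at a real server.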