WHS Server Recovery

Bulkhead · Mar 13, 2018

Greetings All -

I have a situation I want to get your guidance on.

I have a Windows Home Server (WHS) which is a headless machine with 4 drive bays that recently hung. This particular machine is a network attached server that does not have a monitor nor keyboard/mouse directly attached. As a result, in this case, when the server became inaccessible via Remote Desktop Connection, I was left with very little information gathering capability. I could get a reply across the network to a "ping" request, but could otherwise not see into the machine at all.

I should note that I have 2 backup copies of the data that was stored on the server, so I am not panicked about data loss. I would nonetheless like to restore the server, as I have put many hours into configuration and refinement over the years. Also, my last backup batch was made in September 2017, so I would lose some files that I would like to keep.

So, with little else to do, I did a hard shutdown, rebooted the server and hoped it would boot up and allow me to remotely connect to the machine. Well I did this twice but ultimately, I was never able to establish an RDC to the server.

I removed the drive (WHS Disk 0) and connected it directly to a bench machine that I have set up Lubuntu on for recovery operations. The failing drive is a 1.5 TB WD Black with 2 partitions: SYS (21 GB NTFS) and DATA (~1.5 TB NTFS). I also directly connected a new 2.0 TB WD Black drive to the bench machine.

I used gddrescue to clone the failing drive to the new 2 TB drive and got "pct rescued: 99.99%" according to the ddrescue result summary screen after 5.5 hours.

The command line I used is below:
sudo ddrescue -d -f /dev/sdb /dev/sda /home/administrator/RecoveryLogs/WHS_SYS.log

This is where I would like your guidance.

Should I run gddrescue again and specify the -r switch to attempt repeated reads of bad blocks? I had not done this on the first pass as I am under the impression that the first pass should be limited to just trying to get what you can with as little stress imparted to the source drive as possible.
If I run it again, what switches should I specify?
Despite reading multiple "guides" and posts, it isn't clear to me whether I should have created an image of my failing drive instead of cloning the drive to a new disk. Should I also make an image of the drive to yet another device? (does having the image give me more data recovery options than having the cloned disk?)

One additional point is that when configuring my bench machine with Linux to attempt to rescue the failing SYS drive, the bench machine found my server SYS drive and started that OS (I had not changed my boot order, so it found the wrong drive to boot from). Inadvertently, I got the only glimpse into the failing SYS drive when I saw a message that read something to the effect that there was an error with the ntoskrnl.exe file and the boot process halted.

I didn't write the message down, but I recognize the file. It also explains why I want to do more than just clone the failing drive: I expect that corruption occurred that I would need to fix before reinstalling the cloned drive in my server and trying to boot it. I don't want to have to do another hard shutdown, as WHS is temperamental by nature and the pooled drives it manages are very susceptible to corruption.

I appreciate any guidance you can offer. Let me know what information I've overlooked.

Cheers,

Jared · Mar 13, 2018

An image has no advantage over a clone for data recovery purposes. Besides, you can always make an image later from the clone if you'd like.

You can attempt to re-read the previously unread sectors by using the -r followed by a number and using the same log file. So you'd do something like:

Code:

sudo ddrescue -d -f -r3 /dev/sdb /dev/sda /home/administrator/RecoveryLogs/WHS_SYS.log

That would give you three retry attempts on each bad sector. However, the odds that it'll actually read them successfully are low. At best it'll probably just clean up a few more of the sectors around the ones that are actually bad.

You also might try running the whole thing in reverse using the -R trigger. Sometimes that'll clean up a few more sectors. Also, you can adjust the cluster read size to a single sector by using the -c1 trigger.

That having been said, it's likely that you've got some corrupted files necessary to boot. It's likely that you even have some in the file tables. The missing exe could actually be caused by a missing entry in the MFT.

You can try just replacing the bad files Windows references, but I strongly suspect that in the end, you'll need to do a re-install. If you've got a full image backup from the past, why not just restore that to get the server working, then sync all the data from your clone drive so you'll have all the new data?

Bulkhead · Mar 13, 2018

Thanks Jared!

Bear with me as I want to be sure of what I do next... I had not specified the "-f" option whereas you have in your example command. In reading the --help file, I see the description of the option as "--force overwrite output device or partition".

My understanding is that by specifying the logfile that was created during the first run of ddrescue, when I run it again, ddrescue will only work on the sectors it had trouble with during the first pass and that the -f option will not overwrite my existing clone, but rather attempt to fill in where ddrescue has success in recovering additional sectors. Is that right? (sorry to perhaps state what is obvious to you, but I don't want to overwrite the clone I have by making a mistake with the next ddrescue command.)

So I am going to execute the following command:
sudo ddrescue -c1 -d -f -r3 -R /dev/sdb /dev/sda /home/administrator/RecoveryLogs/WHS_SYS.log

Jared said:
If you've got a full image backup from the past, why not just restore that to get the server working, then sync all the data from your clone drive so you'll have all the new data?

I should have noted that I don't have a backup image of the SYS partition....

But I will have one once the new server is rebuilt!

(ouch)

Is there by chance a way to determine which files have been corrupted and output that list? Such a list would facilitate doing as you suggest which is to copy missing or damaged OS files to the new SYS partition from say a dummy install on a temporary drive.

Jared · Mar 14, 2018

Bulkhead said:
My understanding is that by specifying the logfile that was created during the first run of ddrescue, when I run it again, ddrescue will only work on the sectors it had trouble with during the first pass

Correct

Bulkhead said:
and that the -f option will not overwrite my existing clone, but rather attempt to fill in where ddrescue has success in recovering additional sectors.

The -f trigger is just needed since you're writing to a drive rather than a file. You must have used the -f trigger or ddrescue would have spit out an error and refused to clone to a physical disk. But no, it won't overwrite what you already recovered as long as you're using the same log file. ddrescue will only ever attempt to re-read and subsequently write the previously unread sectors.

I don't know why they call it a log file. It's essentially a sector map of the read/unread sectors that the program uses to know what's still to be read.
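To illustrate the sector-map idea, here is a minimal shell sketch that totals the bytes in a mapfile by status. It assumes the mapfile layout described in the GNU ddrescue manual: comment lines starting with #, one current-position line, then rows of position, size, and status in hex, where + means finished and - means bad sector. The function name summarize_mapfile is made up for this example, and the script only reads the file.

```shell
# Summarize a ddrescue mapfile: total bytes rescued, bad, and still pending.
# Assumed status characters (GNU ddrescue manual):
#   ? non-tried, * non-trimmed, / non-scraped, - bad sector, + finished
summarize_mapfile() {
    mapfile_path=$1
    rescued=0
    bad=0
    pending=0
    seen_status_line=0
    while read -r pos size status _rest || [ -n "$pos" ]; do
        case $pos in
            ''|'#'*) continue ;;      # skip blank lines and comments
        esac
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1        # first data line is current position/status, not a block
            continue
        fi
        bytes=$(($size))              # hex value such as 0x1000 becomes decimal
        case $status in
            '+') rescued=$((rescued + bytes)) ;;
            '-') bad=$((bad + bytes)) ;;
            *)   pending=$((pending + bytes)) ;;
        esac
    done < "$mapfile_path"
    echo "rescued=$rescued bad=$bad pending=$pending"
}

# Example (path taken from the thread):
# summarize_mapfile /home/administrator/RecoveryLogs/WHS_SYS.log
```

Anything counted as bad or pending is what a later run with the same mapfile would still try to read; areas marked finished are left alone.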

Bulkhead said:
So I am going to execute the following command:
sudo ddrescue -c1 -d -f -r3 -R /dev/sdb /dev/sda /home/administrator/RecoveryLogs/WHS_SYS.log

Your command looks good. I might just hold off on the -R option until you've run it all the way through in forward mode at least once.
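The ordering suggested here (forward retries first, reverse afterwards) can be sketched as a small dry-run helper. It only prints the two commands and does not run ddrescue. The helper name print_retry_plan is hypothetical, and the device names are the ones used in this thread; they should be confirmed (for example with lsblk) before running anything, since swapping source and destination would overwrite the failing drive.

```shell
# Print (do not run) a staged sequence of ddrescue retry passes.
# Arguments: source device, destination device, mapfile path.
print_retry_plan() {
    src=$1
    dst=$2
    map=$3
    # Pass 1: retry bad sectors going forward, one sector per read (-c1).
    echo "sudo ddrescue -d -f -c1 -r3 $src $dst $map"
    # Pass 2: the same retries in reverse (-R), once the forward pass has finished.
    echo "sudo ddrescue -d -f -c1 -r3 -R $src $dst $map"
}

print_retry_plan /dev/sdb /dev/sda /home/administrator/RecoveryLogs/WHS_SYS.log
```

Both passes reuse the same mapfile, so each one only touches areas not yet marked as rescued.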

Bulkhead · Mar 14, 2018

Thanks Jared!

Apologies for the slow response. I've been trying to figure out how to make a copy of the cloned DATA partition from my server Disk 0, but I don't have a single drive large enough for the 1.5 TB partition I created. I looked at the drives I have to see if I could cobble enough of them together with LVM to create a large enough volume, but it doesn't look promising.

I can get close, but not to a full 1.5 TB.

Only 764 GB of the 1.5 TB DATA partition is used. Would it be enough to back up the contents of the partition to a logical volume, or should I really clone the entire 1.5 TB for backup in the event my data recovery efforts on the first recovered DATA partition fail?

Sorry for the poorly worded (confusing) question!

Thanks Jared

Jared · Mar 15, 2018

There's no way I know of to target just the used sectors using ddrescue. There are other programs that can do that, but none that handle bad sectors well. I do know that the developer of hddsuperclone (user maximus on this forum) is working on implementing that into his program which does handle bad sectors, but I'm not sure it's ready yet.

Personally, I think it'd be best to just buy what you need (a 2 TB drive) to get the job done right.

Bulkhead · Mar 15, 2018

Yeah, that's the conclusion I've come to.

Once I've cloned as many sectors as I can from the failing drive to the new one and, of course, made a backup of that rescued clone, I will begin to determine what the problem is and try to correct it.

Can you point me to any resources that describe the evaluation and correction process? Not being an expert in this, I am not familiar with the steps that should be taken to correctly identify the problem that needs to be corrected.

I assume that rather than apply whatever corrective measure I think I should use (which would really just be a semi-informed guess) that there is probably a logical, rational sequence of steps that an experienced recovery person would take (absent of course specialized hardware and software)

I wouldn't expect you to take your time to do that for me, but if there's a resource or a guide online you can point me to, I'd greatly appreciate it. Absent that, I'll likely apply the Unix equivalent of chkdsk, and probably TestDisk as well, based on what I've gleaned so far from the internet.

Thanks for all your help thus far! Hopefully this thread provides some clarity to others in the future as well!

Jared · Mar 15, 2018

Bulkhead said:
a logical, rational sequence of steps that an experienced recovery person would take (absent of course specialized hardware and software)

That's the hard part there. Personally, I'd use PC-3000's Data Extractor software to generate a report of which files are affected by bad sectors, then replace any that are needed for Windows to boot. (I generally don't do the replacement myself; I usually just provide the list and leave that to the client's IT staff.) But without proper pro tools, it'll be quite difficult to correlate the bad sectors with the corrupted files. It might be best to just get yourself a proper Windows CD for the version the server is running and do a startup repair from it.
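As a partial, non-professional starting point for that correlation, the bad areas recorded in the ddrescue mapfile can at least be converted into sector ranges relative to a partition. The sketch below assumes the mapfile layout from the GNU ddrescue manual (hex position and size, with - marking bad sectors). The function name, the partition start sector, and the 512-byte sector size are illustrative assumptions; the real start sector would come from something like fdisk -l on the clone. It does not check the end of the partition.

```shell
# List bad-sector ranges from a ddrescue mapfile as sector numbers
# relative to the start of one partition.
# Arguments: mapfile path, partition start sector, sector size (default 512).
bad_sectors_in_partition() {
    map=$1
    part_start=$2
    sector_size=${3:-512}
    seen_status_line=0
    while read -r pos size status _rest || [ -n "$pos" ]; do
        case $pos in
            ''|'#'*) continue ;;              # skip blank lines and comments
        esac
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1                # skip the current position/status line
            continue
        fi
        [ "$status" = "-" ] || continue       # keep only bad-sector blocks
        first=$(( $pos / $sector_size - $part_start ))
        last=$(( ($pos + $size - 1) / $sector_size - $part_start ))
        [ "$last" -ge 0 ] || continue         # block lies entirely before the partition
        [ "$first" -ge 0 ] || first=0         # clamp a block that straddles the start
        echo "$first-$last"
    done < "$map"
}

# Example with a hypothetical start sector of 63:
# bad_sectors_in_partition /home/administrator/RecoveryLogs/WHS_SYS.log 63
```

Ranges like these could then be given to a tool that maps sectors to files, such as ntfscluster from the ntfs-3g utilities (which, as far as I know, accepts a sector range via its -s option). That step is not shown here and should be run against the clone, not the failing drive.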

Bulkhead · Mar 15, 2018

Thanks Jared,

I'll post an update in the coming days.
