Data loss after version migration - Corrupt or incorrect page (1470) #88
Comments
Hi thememika! Very sorry to hear about these problems. Force rebuild does carry a risk of data loss. Upgrade shouldn't have needed a force rebuild if the VDO was cleanly shut down, so I'm not sure what went wrong there. Unfortunately force-rebuild is destructive so the 50%->4% usage drop probably represents the amount of lost data, sadly.
I'm very disconcerted by the messages about a reference count decrement error during fsck after the force rebuild -- that should not happen.
One thing you could try doing is locating the bcachefs superblock on the device under VDO and then copying it to the right place on the VDO. This might allow bcachefs to mount, though there is likely other data loss with that 50% usage. You could try to locate it with
Maybe one of the actual vdo people has another idea?
-Sweet Tea (who hasn't worked on vdo in two years...)
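The superblock search suggested above can be sketched as a plain byte-signature scan. This is only an illustrative sketch, not the command the comment originally referred to: `find_signature` is a hypothetical helper, and the `signature` bytes are an assumption the caller has to supply from the bcachefs on-disk format (they are deliberately not hard-coded here). The file is opened read-only, so the scan itself does not modify the device.

```python
def find_signature(path, signature, chunk_size=1 << 20):
    """Yield every byte offset in `path` at which `signature` occurs.

    `path` may be a regular file or a block device opened read-only.
    `signature` is a bytes object supplied by the caller (hypothetical
    input: take it from the filesystem's on-disk format definition).
    """
    if not signature:
        raise ValueError("signature must be non-empty")
    overlap = len(signature) - 1  # tail kept to catch matches spanning chunks
    with open(path, "rb") as dev:
        base = 0   # absolute offset of buf[0]
        buf = b""
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            start = 0
            while True:
                idx = buf.find(signature, start)
                if idx < 0:
                    break
                yield base + idx
                start = idx + 1
            # Keep only the last `overlap` bytes; a full match cannot fit
            # inside them, so no offset is reported twice.
            keep = buf[-overlap:] if overlap else b""
            base += len(buf) - len(keep)
            buf = keep
```

Reading a raw block device normally requires root. Any offsets found this way are only candidates: the surrounding structure would still need to be validated before copying anything onto the VDO volume, ideally after imaging the device first.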
First off, I'm very sorry to hear about this situation. From the messages it appears that the forced rebuild ran into some errors and as a result it has likely destroyed all the block mapping data on your device. (Technically much of the data is probably still there, but without the mappings it's unrecoverable, and there is nothing to prevent further use of the device from overwriting it.) I hope you had backups.
The vdoStats output is likely not invalid; instead the device really has no mapped content any more. Forced rebuild is a very involved process, and it can result in the loss of data. If you have a backup or snapshot of the device from before you ran the forced rebuild, we may be able to help you recover from that state.
As for why it happened, a VDO device needs to be shut down cleanly before it can be upgraded to the in-kernel version. Your other devices were likely clean, so they encountered no issues. If you have any remaining VDO devices that you have not upgraded yet, make sure that you have shut them down cleanly before switching to the newer version.
The "corrupt page" error does concern me. I don't currently see how that could happen other than misbehavior from the storage layers below VDO, but I would still like to rule out a bug on our part. The fact that you can reproduce it is worrying. If you can provide the result of vdoDumpBlockMap for this device, it might help us understand what happened in more detail. It would also be helpful to know what version of VDO you were using before the upgrade, and whether you are using lvm to manage the VDO volume or using dmsetup directly.
Also let us know if there's anything else we can do to help.
For anyone encountering this issue in the future, the "recovery journal in old format" message is the indicator that the VDO was not shut down cleanly before the upgrade. In that situation you should be able to switch back to the older version of VDO, do the recovery, and then upgrade safely.
Hello.
I upgraded from out-of-tree KVDO to the in-kernel version on one of my systems.
It went wrong and now KVDO doesn't work as intended.
During the upgrade I had to use force rebuild because the recovery journal was "in the old format". It completed with some errors.
But the device was started, so I didn't notice the errors at first.
I ran my fsck (bcachefs fsck, to be specific) on the DM-VDO device.
After a few seconds, the fsck crashed with an I/O error.
This is what happened
I retried force rebuild, and got the same "Expected page" errors once again during the rebuild. Once again, the device had started, though.
And it's now operating with no I/O errors, but I'm getting zeroes when trying to read the most important parts of my device, such as the superblock:
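For readers who want to run the same kind of check, a minimal sketch in Python follows. It is not the command used in this report: `region_is_zero` is a hypothetical helper, and both the device path and the offset where the filesystem superblock should live are assumptions the caller has to supply.

```python
def region_is_zero(path, offset, length, chunk_size=1 << 20):
    """Return True if `length` bytes of `path` starting at `offset` are all zero.

    `path` is opened read-only; `offset` and `length` are caller-supplied
    (hypothetical values: use the superblock location of your filesystem).
    """
    with open(path, "rb") as dev:
        dev.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = dev.read(min(chunk_size, remaining))
            if not chunk:
                raise EOFError("device ended before the requested region")
            # bytes.count(0) counts zero bytes; any shortfall means real data.
            if chunk.count(0) != len(chunk):
                return False
            remaining -= len(chunk)
    return True
```

A region that reads back as all zeroes through the VDO device while the underlying disk still holds data at the corresponding physical location would be consistent with lost block mappings rather than a media fault.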
How could it go like that, why? And is there any possibility of recovering data?
My disks are fully operational, there are no media errors or damage.
Everything was working fine before the upgrade, so how could this happen? Also, my other machines upgraded without any problems.
Here is some information about the setup, please ask more if needed:
su -c vdo-devel/src/c++/vdo/bin/vdoStats
note: it's invalid, my use% was 50+ % before migration
su -c vdo-devel/src/c++/vdo/user/vdoDumpConfig /dev/sda2
su -c vdo-devel/src/c++/vdo/bin/vdoStats -v
Thanks for any help!