virtio-blk ZC blocks #40

Open
enterJazz opened this issue Jan 2, 2024 · 10 comments

Comments

@enterJazz
Contributor

enterJazz commented Jan 2, 2024

reproduce:

just start-sev-vm-virtio-blk
cryptsetup open /dev/vdb target
fio --filename=/dev/mapper/target --name='bw' blk-bm.fio

err:

# nearing end of `bw write` benchmark (2min20s left)
qemu-system-x86_64: virtio: bogus descriptor or out of resources][eta 22m:05s]
[ 4916.361324] INFO: task kworker/u8:4:172 blocked for more than 122 seconds. 
[ 4916.362784]       Not tainted 6.7.0-rc7-g2924ea4399ec-dirty #119
[ 4916.364013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
# benchmark remains stuck

May be related to removing the BUG_ON (see the gitlab linux repo commit); see #41

@enterJazz enterJazz changed the title from "virtio-blk ZC issues" to "virtio-blk ZC blocks" on Jan 2, 2024
@enterJazz
Contributor Author

enterJazz commented Jan 2, 2024

Possibly related:

Ok, GFP_NOFS -> GFP_KERNEL did the trick.

Now I get:
virtio: bogus descriptor or out of resources

So, still some work ahead on both ends.

Few hacks later (only changes on 9p client side) I got this running stable
now. The reason for the virtio error above was that kvmalloc() returns a
non-logical kernel address for any kvmalloc(>4M), i.e. an address that is
inaccessible from host side, hence that "bogus descriptor" message by
QEMU.

https://listman.redhat.com/archives/virtio-fs/2021-October/004353.html

So it may be possible that we are returning a page in HIGH_MEM / not in logical address space. Will have to check the allocation flags in dm-crypt.

@enterJazz
Contributor Author

enterJazz commented Jan 4, 2024

Currently checking whether I can reproduce this on native; will update this comment.

UPDATE: cannot reproduce on native. Hence, it must be an SEV-specific issue.

@enterJazz
Contributor Author

enterJazz commented Jan 4, 2024

For SEV, this completely blocks the dm-crypt mapper target; e.g. calling echo foo > /dev/mapper/target causes the echo to hang. This issue will probably be hard to debug.

Notably, writes to the underlying disk itself, which go through the same virtual device, do not block.

Hence: there is probably a deadlock inside dm-crypt.
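
A minimal sketch for checking where the hung writer is stuck, assuming a procps-style `ps` in the guest. The helper names `list_blocked` and `show_blocked` are made up for illustration and are not part of the repo. It lists tasks in uninterruptible sleep (state D), which is the state a blocked `echo` on the mapper target would be expected to show.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Keep the header line plus every task whose STAT column starts with "D"
# (uninterruptible sleep), reading `ps`-style output on stdin.
list_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Assumes procps `ps`; the wchan column shows the kernel function
# the task is currently sleeping in.
show_blocked() {
    ps -eo pid,stat,wchan:32,comm | list_blocked
}
```

Run `show_blocked` in the guest while the write hangs. For a listed PID, `cat /proc/<pid>/stack` (as root, and only if the kernel was built with stack trace support) prints its kernel stack, which may help tell a lock wait apart from a lost I/O completion.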

@enterJazz
Contributor Author

enterJazz commented Jan 4, 2024

I reproduced the error again while watching dmesg and got an error similar to the one in #41; see the message below.

[ 2485.640871] ------------[ cut here ]------------
[ 2485.640901] WARNING: CPU: 0 PID: 0 at kernel/smp.c:786 smp_call_function_many_cond+0x471/0x500
[ 2485.640919] Modules linked in: dm_crypt aes_generic cbc encrypted_keys trusted asn1_encoder tee tpm af_packet cfg80211 mousedev rfkill 8021q edac_core intel_rapl_msr bochs intel_rapl_common drm_vram_helper crc32_pclmul polyval_clmulni drm_ttm_helper ppdev polyval_generic ttm sha512_ssse3 parport_pc sha512_generic sha256_ssse3 psmouse sch_fq_codel iTCO_wdt ghash_generic drm_kms_helper gf128mul ghash_clmulni_intel intel_pmc_bxt sha1_ssse3 i2c_i801 input_leds watchdog led_class evdev gcm aesni_intel parport crypto_null intel_agp loop sev_guest mac_hid libaes crypto_simd intel_gtt tun i2c_smbus tsm tiny_power_button agpgart cryptd tap lpc_ich serio_raw button macvlan bridge drm fuse stp backlight llc firmware_class efi_pstore configfs efivarfs dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom ahci libahci libata scsi_mod virtio_net net_failover virtio_blk atkbd failover libps2 virtio_pci vivaldi_fmap i8042 virtio_pci_legacy_dev crct10dif_pclmul crct10dif_common serio
[ 2485.641928]  crc32c_intel rtc_cmos virtio_pci_modern_dev scsi_common dm_mod dax virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring
[ 2485.642048] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc7-gf0eb8d3bdd7d-dirty #7
[ 2485.642059] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown unknown
[ 2485.642069] RIP: 0010:smp_call_function_many_cond+0x471/0x500
[ 2485.642081] Code: 08 48 8b 74 24 38 48 c7 c1 e0 8e 59 a1 e8 d7 b8 f4 ff 65 ff 0d c8 9a a9 5e 0f 85 b7 fe ff ff 0f 1f 44 00 00 e9 ad fe ff ff 90 <0f> 0b 90 e9 cf fb ff ff 8b 7c 24 48 e8 8e 5f f5 ff 84 c0 0f 84 a4
[ 2485.642091] RSP: 0018:ffffc90000003cc0 EFLAGS: 00010206
[ 2485.642110] RAX: 0000000080010003 RBX: 0000000000000000 RCX: 0000000000000003
[ 2485.642120] RDX: ffffc90000003d88 RSI: ffffffffa148ec90 RDI: ffffffffa2e3afe0
[ 2485.642137] RBP: ffffc90000003d88 R08: 0000000000000000 R09: 0000000000001000
[ 2485.642147] R10: ffff88846d800000 R11: 0000000000000000 R12: ffffc90000003d88
[ 2485.642156] R13: ffffc90000003e18 R14: 0000000000000000 R15: 0000000000000000
[ 2485.642165] FS:  0000000000000000(0000) GS:ffff88846fc00000(0000) knlGS:0000000000000000
[ 2485.642181] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2485.642190] CR2: 00007fdf25e42058 CR3: 000800010cec0000 CR4: 00000000003506f0
[ 2485.642200] Call Trace:
[ 2485.642221]  <IRQ>
[ 2485.642230]  ? smp_call_function_many_cond+0x471/0x500
[ 2485.642241]  ? __warn+0x81/0x130
[ 2485.642257]  ? smp_call_function_many_cond+0x471/0x500
[ 2485.642274]  ? report_bug+0x171/0x1a0
[ 2485.642288]  ? handle_bug+0x42/0x70
[ 2485.642300]  ? exc_invalid_op+0x17/0x70
[ 2485.642316]  ? asm_exc_invalid_op+0x1a/0x20
[ 2485.642331]  ? __pfx___cpa_flush_tlb+0x10/0x10
[ 2485.642345]  ? smp_call_function_many_cond+0x471/0x500
[ 2485.642363]  ? __pfx___cpa_flush_tlb+0x10/0x10
[ 2485.642374]  ? srso_return_thunk+0x5/0x5f
[ 2485.642385]  ? _vm_unmap_aliases+0x2a5/0x2f0
[ 2485.642404]  on_each_cpu_cond_mask+0x24/0x40
[ 2485.642416]  cpa_flush+0x18f/0x1b0
[ 2485.642428]  __set_memory_enc_pgtable+0xc3/0x1a0
[ 2485.642443]  crypt_page_free+0x37/0x90 [dm_crypt]
[ 2485.642467]  crypt_free_buffer_pages+0x155/0x170 [dm_crypt]
[ 2485.642482]  crypt_endio+0x4d/0x70 [dm_crypt]
[ 2485.642497]  blk_update_request+0x114/0x480
[ 2485.642516]  blk_mq_end_request+0x1e/0x110
[ 2485.642528]  virtblk_done+0x76/0x100 [virtio_blk]
[ 2485.642544]  vring_interrupt+0x64/0xd0 [virtio_ring]
[ 2485.642566]  __handle_irq_event_percpu+0x4d/0x1a0
[ 2485.642578]  handle_irq_event+0x3e/0x80
[ 2485.642589]  handle_edge_irq+0x9d/0x280
[ 2485.642601]  __common_interrupt+0x42/0xb0
[ 2485.642620]  common_interrupt+0x83/0xa0
[ 2485.642632]  </IRQ>
[ 2485.642641]  <TASK>
[ 2485.642650]  asm_common_interrupt+0x26/0x40
[ 2485.642666] RIP: 0010:pv_native_safe_halt+0xf/0x20
[ 2485.642677] Code: 0f 0b 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 45 11 2f 00 fb f4 <e9> ac 93 01 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
[ 2485.642687] RSP: 0018:ffffffffa2c03e80 EFLAGS: 00000212
[ 2485.642705] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[ 2485.642714] RDX: 4000000000000000 RSI: ffffffffa25baddd RDI: 000000000336e39c
[ 2485.642729] RBP: ffffffffa2c10900 R08: 000000000336e39c R09: 0000000000000001
[ 2485.642738] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 2485.642747] R13: 0000000000000000 R14: ffffffffa2c10900 R15: 000000007cdc8000
[ 2485.642760]  default_idle+0x9/0x20
[ 2485.642772]  default_idle_call+0x2e/0xe0
[ 2485.642788]  do_idle+0x1f3/0x230
[ 2485.642801]  cpu_startup_entry+0x2a/0x30
[ 2485.642812]  rest_init+0xd1/0xe0
[ 2485.642829]  arch_call_rest_init+0xe/0x30
[ 2485.642842]  start_kernel+0x4d9/0x790
[ 2485.642855]  x86_64_start_reservations+0x18/0x30
[ 2485.642872]  x86_64_start_kernel+0x91/0xa0
[ 2485.642884]  secondary_startup_64_no_verify+0x18f/0x19b
[ 2485.642900]  </TASK>
[ 2485.642909] ---[ end trace 0000000000000000 ]---

@enterJazz
Contributor Author

enterJazz commented Jan 5, 2024

As mentioned in #41 (comment), I am currently checking whether running w/ 1 vCPU reproduces the issue. Will update this comment.

NOTE: the error from #40 (comment) has already occurred, even w/ only one vCPU.

CONFIRMED: the error occurs w/ 1 vCPU, so it is not a deadlock.
Will continue by consulting #41 (comment).

@enterJazz
Contributor Author

As the current design is driver-agnostic, will see if the same bug occurs w/ the nvme driver.
I guess the nvme driver should work fine w/ a ramdisk? Will see.

@enterJazz
Contributor Author

Just ran it w/ the nvme driver; it also hangs right away.
So this is not a virtio-blk-specific issue, but an issue in dm-crypt.

@enterJazz
Contributor Author

Tried setting numjobs for the breaking iops job to 1; still breaks.

@enterJazz
Contributor Author

enterJazz commented Jan 29, 2024

stonewall
blocksize=4k
rw=randwrite
iodepth=32
# numjobs=4 # this with writes probably makes things fail
numjobs=1 

This also breaks the dm layer.

Got an interesting error this time (from QEMU):

error: kvm run failed Invalid argument
EAX=00000005 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=00000000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 00000000 00000000
CS =0000 00000000 00000000 00000000
SS =0000 00000000 00000000 00000000
DS =0000 00000000 00000000 00000000
FS =0000 00000000 00000000 00000000
GS =0000 00000000 00000000 00000000
LDT=0000 00000000 00000000 00000000
TR =0000 00000000 00000000 00000000
GDT=     00000000 00000000
IDT=     00000000 00000000
CR0=80050033 CR2=00000000 CR3=00000000 CR4=003506f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000901
Code=<??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

Update: setting iodepth to 1 makes it work. However, setting numjobs back to 4 makes it break again.

@enterJazz
Contributor Author

enterJazz commented Jan 29, 2024

This configuration works again???

[iops randwrite] # makes device-mapper hang
stonewall
blocksize=128k
rw=randwrite
iodepth=32
# iodepth=1
# numjobs=4 # this with writes probably makes things fail
numjobs=1

NOTE: doesn't work with bs=4K.
Setting numjobs to 4 also works fine w/ this, as long as it isn't 4K blocks (???)
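
To map the working and breaking combinations more systematically, a sweep over blocksize, iodepth and numjobs could look roughly like the sketch below. It is only a sketch under assumptions: the `DEV` and `DRY_RUN` variables, the 120 s `timeout`, and the `--ioengine=libaio --direct=1 --runtime=30 --time_based` options are not taken from blk-bm.fio and would need to be matched to its global section. By default it only prints the fio commands.

```shell
#!/bin/sh
# Sweep fio randwrite parameters against the dm-crypt target.
# DEV, DRY_RUN, the timeout and the ioengine/direct/runtime options are
# assumptions for illustration; they are not taken from blk-bm.fio.
DEV="${DEV:-/dev/mapper/target}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run fio

sweep() {
    for bs in 4k 128k; do
        for depth in 1 32; do
            for jobs in 1 4; do
                cmd="fio --name=sweep --filename=$DEV --rw=randwrite --bs=$bs --iodepth=$depth --numjobs=$jobs --ioengine=libaio --direct=1 --runtime=30 --time_based"
                if [ "$DRY_RUN" = "1" ]; then
                    echo "$cmd"
                else
                    echo "== bs=$bs iodepth=$depth numjobs=$jobs"
                    # WARNING: this writes to $DEV and destroys its contents.
                    timeout 120 $cmd || echo "FAILED or timed out: bs=$bs iodepth=$depth numjobs=$jobs"
                fi
            done
        done
    done
}

sweep
```

Running with `DRY_RUN=0` executes the eight combinations in order. If fio ends up in uninterruptible sleep, `timeout` may not be able to kill it, so the sweep would likely stop at the first hanging combination and the VM would need a restart before continuing.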
