VIA/Zhaoxin Padlock: 64-bit montmul, rep xsha512, GMI, partial decode #279
Comments
Good findings :-) Maybe we can add a |
Such a decoder mode makes sense for the two GMI instructions at least. I'm much less sure about whether any of the other items I've found are truly Zhaoxin-specific, though - Christopher Domas's Sandsifter tool ran into the partial decode behavior of |
I've done a bit more testing, and made a few more minor findings:
|
Having gotten hold of a box with a Zhaoxin KX-6580 CPU (Zhaoxin is a Chinese x86 CPU vendor, formed as a joint venture of VIA and Shanghai; its designs are mostly a continuation of the VIA C3/C7/Nano series cores, aimed mainly at the Chinese market but starting to show up elsewhere), I decided to do a whole bunch of testing on its PadLock functionality. In doing so, I've made a number of findings of various undocumented and underdocumented features. The ones most relevant for disassembly tools like, say, Zydis, so far appear to be:
The `rep montmul` instruction takes, much to my surprise, a mandatory `67h` address size prefix in 64-bit mode (!!). This is observed by the sequence `f3 0f a6 c0` consistently producing a #UD exception, while something like `f3 67 0f a6 c0` does not. The issue appears to be that `rep montmul` takes a pointer in rSI to a data structure that contains 5 pointers to various buffers needed by this instruction. This data structure does not appear to have ever been updated to work with 64-bit pointers, so the 67h prefix is needed to force 32-bit addressing for the instruction. This makes the instruction fairly inconvenient to set up, since the structure and all its buffers must reside in the bottom 4GB of virtual address space, but once that is done, the instruction variant with the 67h prefix (but not without) will execute a Montgomery multiply just fine.

The instruction encoding `f3 0f a6 e0` is a seemingly undocumented instruction to accelerate SHA-512 hashing. In my testing, it appears to take the following arguments:

I haven't been able to find this instruction documented anywhere, but OpenSSL clearly knows about it (see https://github.com/openssl/openssl/blob/master/engines/asm/e_padlock-x86.pl , line 597), referring to it as `rep xsha512`. The instruction encoding `f3 0f a6 d8` also appears to be an alias of this instruction.

The instruction encoding `f3 0f a6 e8` is a Zhaoxin-specific "GMI" instruction: `ccs_hash`. This instruction is documented ( https://github.com/ZXOpenSource/OpenSSL-ZX-GMI/blob/master/GMI%20User%20Manual%20V1.0.pdf - in Chinese, but it gets pretty readable after a trip through Google Translate) to provide support for the Chinese SM3 hashing algorithm. In my testing, it also provides undocumented support for SHA-1/256/512 that can be obtained by setting rBX to values in the range 0x10 to 0x15.

The instruction encoding `f3 0f a7 f0` is another Zhaoxin-specific "GMI" instruction: `ccs_encrypt`. This instruction is documented to provide support for the Chinese SM4 encryption algorithm. It also provides undocumented support for AES-128/192/256 that can be obtained by setting rAX to values in the range 0x10 to 0x15.

The instruction encodings `f3 0f a6 f0` and `f3 0f a6 f8` are undocumented, and I haven't been able to figure out what they might do. They produce a #GP exception for every set of arguments I've tried passing them, suggesting that they either expect a really odd input data format or are privileged instructions.

At least on this specific CPU, the `xstore` instruction accepts the `repne` prefix and treats it as a synonym for `rep`: `f2 0f a7 c0` produces the same output as I would expect from `rep xstore` (`f3 0f a7 c0`). None of the other Padlock instructions accept this prefix (#UD). The instruction encoding `f3 0f a7 f8` appears to be an alias of `rep xstore`; however, it doesn't accept `repne`.

From what I can find, all of the instructions in the Padlock space (`0f a6 c0-ff` and `0f a7 c0-ff`) exhibit partial decode, where the bottom 3 bits of the last byte of the instruction are ignored. For example, `f3 0f a7 f7` is accepted as a valid instruction and behaves identically to `f3 0f a7 f0`.
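To make the findings above a bit more concrete, here is a minimal Python sketch of how a decoder might classify these byte sequences. It is not Zydis code and not a complete PadLock decoder: the table `FINDINGS`, the function `classify`, and the placeholder names `unknown_a6_f0` / `unknown_a6_f8` are all made up for illustration, and only the encodings reported in this issue are covered. It also assumes that the 67h requirement for `montmul` only needs checking in 64-bit mode, which is the only mode tested here.

```python
# Lookup for the encodings reported in this issue only.
# Keys are (second opcode byte, last byte with its low 3 bits cleared).
FINDINGS = {
    (0xA6, 0xC0): "montmul",
    (0xA6, 0xD8): "xsha512",        # reported alias of f3 0f a6 e0
    (0xA6, 0xE0): "xsha512",
    (0xA6, 0xE8): "ccs_hash",
    (0xA6, 0xF0): "unknown_a6_f0",  # placeholder name, behavior not understood
    (0xA6, 0xF8): "unknown_a6_f8",  # placeholder name, behavior not understood
    (0xA7, 0xC0): "xstore",
    (0xA7, 0xF0): "ccs_encrypt",
    (0xA7, 0xF8): "xstore",         # reported alias, but rejects repne
}

PREFIXES = {0x67, 0xF2, 0xF3}


def classify(code: bytes, mode64: bool = True):
    """Return a mnemonic for a sequence covered by the findings.

    Returns None when the findings predict #UD, or when the sequence
    is outside what was tested in this issue.
    """
    # Collect leading prefixes in any order.
    prefixes = set()
    pos = 0
    while pos < len(code) and code[pos] in PREFIXES:
        prefixes.add(code[pos])
        pos += 1

    body = code[pos:]
    if len(body) != 3 or body[0] != 0x0F:
        return None

    # Partial decode: the bottom 3 bits of the last byte are ignored.
    key = (body[1], body[2] & 0xF8)
    name = FINDINGS.get(key)
    if name is None:
        return None

    has_rep = 0xF3 in prefixes
    has_repne = 0xF2 in prefixes
    if has_rep == has_repne:
        return None  # neither prefix or both: not covered by the findings

    # Only 0f a7 c0 (xstore) was observed to accept repne as a synonym for rep.
    if has_repne and key != (0xA7, 0xC0):
        return None

    # In 64-bit mode, montmul without the 67h prefix was observed to #UD.
    if name == "montmul" and mode64 and 0x67 not in prefixes:
        return None

    return "rep " + name
```

For example, `classify(bytes.fromhex("f3 67 0f a6 c0"))` yields `"rep montmul"`, while the same bytes without `67` yield `None`. Clearing the low 3 bits before the table lookup keeps the table small, at the cost of hiding which exact byte was used; a real decoder mode would probably want to preserve that.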