Add new record filter tool for public release of traces #6662

edeiana · 2024-02-20T04:33:38Z

We want to create a new tool to filter traces of Google workloads for public release.
The new public Google workload traces will contain more information than the previous version, while still preserving the confidentiality of Google's IP.

Main features of new public traces:

- Instruction categories will be used instead of the original instruction opcodes;
- Operand dependencies will be preserved using virtual registers + register size;
- Syscall numbers will be removed as before, but we'll keep the blocking attribute if a syscall is blocking;
- Branch target addresses will be embedded in the trace instead of being in a separate file;
- Provide a map of all virtual addresses used in the trace and a tool to perform the virtual-to-physical mapping.

A simple record_filter_t::record_filter_func_t filter that modifies a field of every trace_entry_t record in a trace and leverages record_filter to write the modified records to a "filtered" output trace. Issue: #6662
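The per-record filtering pattern described above can be sketched as follows. This is a minimal illustration, not the drmemtrace API: `trace_entry_stub_t`, `record_filter_fn_t`, `set_size_filter()`, and `run_filter()` are hypothetical stand-ins. The real `record_filter_func_t` is a C++ interface whose callbacks also receive per-shard data and filter info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for drmemtrace's trace_entry_t. */
typedef struct {
    uint16_t type;
    uint16_t size;
    uint64_t addr;
} trace_entry_stub_t;

/* Hypothetical per-record filter callback: may modify the record in place and
 * returns true if the record should be written to the filtered output. */
typedef bool (*record_filter_fn_t)(trace_entry_stub_t *entry, void *data);

/* Example filter: overwrites the `size` field of every record and keeps it. */
static bool
set_size_filter(trace_entry_stub_t *entry, void *data)
{
    entry->size = *(const uint16_t *)data;
    return true;
}

/* Hypothetical driver playing the role of record_filter for one shard: it
 * applies the filter to each input record and copies kept records to `out`.
 * Returns the number of records written. */
static size_t
run_filter(record_filter_fn_t filter, void *data, trace_entry_stub_t *in,
           size_t count, trace_entry_stub_t *out)
{
    size_t written = 0;
    for (size_t i = 0; i < count; i++) {
        if (filter(&in[i], data))
            out[written++] = in[i];
    }
    return written;
}
```

Keeping the filter a pure per-record callback lets the driver own all I/O, which is what allows several filters to be chained over the same record stream.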

Containing-register IDs can be >=256, so their values do not fit in the 8 bits allotted per register operand in the regdeps encoding. This was causing memory corruption in instr_convert_to_isa_regdeps(), where src_reg_used and dst_reg_used have only 256 elements each and are laid out next to each other in memory: writing to an index >=256 in one array overwrote the other. Fix: remap containing-register IDs to virtual-register IDs starting from 0 for all architectures. There are at most 198 unique containing registers (the maximum, reached on AARCH64), so remapping them allows them to fit in 8 bits. In the remapping (from DR_REG_ to DR_REG_V) we exclude DR_REG_INVALID to avoid issues with opnd_t operations for registers. We introduce 2 new public APIs: dr_reg_to_virtual() and get_virtual_register_name(). We use dr_reg_to_virtual() in instr_convert_to_isa_regdeps() to avoid the issue mentioned above. We also re-introduce setting the size of register operands in instr_convert_to_isa_regdeps() and decode_isa_regdeps() as instr_t.operation_size, because DR_REG_V registers don't have a predefined size. We added tests to check that DR_REG_ values with IDs >=256 don't cause problems. Issue: #6662

Containing-register IDs can be >=256, so their values do not fit in the 8 bits allotted per register operand in the regdeps encoding. This was causing memory corruption in instr_convert_to_isa_regdeps(), where src_reg_used and dst_reg_used have only 256 elements each and are laid out next to each other in memory: writing to an index >=256 in one array overwrote the other. Fix: remap containing-register IDs to virtual-register IDs starting from 0 for all architectures. There are at most 198 unique containing registers (the maximum, reached on AARCH64), so remapping allows them to fit in 8 bits. In the remapping (from DR_REG_ to DR_REG_V) we exclude DR_REG_INVALID and DR_REG_NULL to avoid issues with opnd_t operations for registers. We introduce a private routine, dr_reg_to_virtual(), to map real ISA registers to virtual registers, and we use it in instr_convert_to_isa_regdeps() to avoid the issue mentioned above. We modified the get_register_name() public API to use the global dcontext and its ISA mode to determine whether to return a real register name or a virtual one. The signature of the API is unchanged, but we document the use of the global dcontext in doxygen. We also re-introduce setting the size of register operands in instr_convert_to_isa_regdeps() and decode_isa_regdeps() as instr_t.operation_size, because not all DR_REG_V registers have a predefined size based on their enum value (e.g., reserved DR_REG_XMM enum values). We added tests to check that DR_REG_ values with IDs >=256 don't cause problems. Issue: #6662
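The remapping idea can be illustrated with a small sketch. It assumes hypothetical reserved IDs in place of DR_REG_NULL and DR_REG_INVALID and a hypothetical upper bound on real register IDs. DynamoRIO itself uses a static table (d_r_reg_id_to_virtual[]) rather than building the map at run time, so `build_virtual_map()` and `mark_used()` are illustrative only. The point is that a 256-entry "used" table indexed by a dense virtual ID can no longer be overrun by a real ID >=256.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reserved IDs standing in for DR_REG_NULL and DR_REG_INVALID;
 * the real enum values are defined by DynamoRIO. */
#define REG_NULL_ID 0
#define REG_INVALID_ID 1
#define MAX_REAL_REG_ID 400  /* hypothetical bound; real IDs can be >= 256 */
#define NUM_VIRTUAL_REGS 256 /* what 8 bits per operand can encode */
#define NO_VIRTUAL (-1)

/* Fills to_virtual[] with dense virtual IDs 0, 1, 2, ... for the listed
 * containing registers, skipping reserved IDs and duplicates. Returns the
 * number of virtual IDs assigned, or -1 if a real ID is out of range or the
 * virtual IDs would not fit in 8 bits. */
static int
build_virtual_map(const int *regs, size_t count, int to_virtual[MAX_REAL_REG_ID + 1])
{
    int next = 0;
    for (int i = 0; i <= MAX_REAL_REG_ID; i++)
        to_virtual[i] = NO_VIRTUAL;
    for (size_t i = 0; i < count; i++) {
        int reg = regs[i];
        if (reg < 0 || reg > MAX_REAL_REG_ID)
            return -1;
        if (reg == REG_NULL_ID || reg == REG_INVALID_ID || to_virtual[reg] != NO_VIRTUAL)
            continue;
        if (next >= NUM_VIRTUAL_REGS)
            return -1;
        to_virtual[reg] = next++;
    }
    return next;
}

/* Marks a register as used in a 256-entry table indexed by virtual ID.
 * Returns false if the register has no virtual ID. */
static bool
mark_used(bool used[NUM_VIRTUAL_REGS], const int to_virtual[MAX_REAL_REG_ID + 1],
          int real_reg)
{
    if (real_reg < 0 || real_reg > MAX_REAL_REG_ID || to_virtual[real_reg] == NO_VIRTUAL)
        return false;
    used[to_virtual[real_reg]] = true;
    return true;
}
```

For example, the real IDs {5, 300, 260} become virtual IDs {0, 1, 2}, so marking register 300 as used writes index 1 rather than index 300.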

AssadHashmi · 2024-04-25T14:10:13Z

We'd like to raise a point of discussion about the use of DR_REG_V as a prefix for virtual registers.
Most Arm programmers' documentation uses V for vector registers, e.g. `ADD Vd.4S, Vn.4S, Vm.4S`.

The DynamoRIO user documentation uses R for reasons of convention and generality, e.g. `INSTR_CREATE_add_vector(dc, Rd, Rm, Rn, width)`, see:
https://dynamorio.org/dr__ir__macros__aarch64_8h.html#ad6fa6d2ab7764783481efd209cf11b76

A new or relatively inexperienced user might intuitively, but wrongly, try something like this first:

```c
INSTR_CREATE_add_vector(dc, opnd_create_reg(DR_REG_V0), opnd_create_reg(DR_REG_V1),
                        opnd_create_reg(DR_REG_V2), OPND_CREATE_SINGLE());
```

Rather than the correct:

```c
INSTR_CREATE_add_vector(dc, opnd_create_reg(DR_REG_Q0), opnd_create_reg(DR_REG_Q1),
                        opnd_create_reg(DR_REG_Q2), OPND_CREATE_SINGLE());
```

or, e.g.:

```c
INSTR_CREATE_add_vector(dc, opnd_create_reg(DR_REG_D20), opnd_create_reg(DR_REG_D10),
                        opnd_create_reg(DR_REG_D14), OPND_CREATE_HALF());
```

This possible stumbling block is probably less of a problem for scalable vectors, e.g. `INSTR_CREATE_add_sve(dc, Zd, Zn, Zm)`, because the documentation implies DR_REG_Z; see:
https://dynamorio.org/dr__ir__macros__aarch64_8h.html#a14d0f0b7fad176b301539c3e9254771b

What do you think? Are we worrying unnecessarily?
Is the level of DR knowledge required to use instruction macros (e.g., in clients) enough for developers to know which register names/IDs are correct?

derekbruening · 2024-04-25T14:25:37Z

s/DR_REG_V0/DR_REG_VIRT0/?

AssadHashmi · 2024-04-25T17:48:58Z

> s/DR_REG_V0/DR_REG_VIRT0/?

That's fine.

edeiana · 2024-04-25T18:15:54Z

Thank you for pointing this out, @AssadHashmi!
Will change from DR_REG_V to DR_REG_VIRT as @derekbruening suggested.

A new record_filter_t::record_filter_func_t filter, which we call "encodings2regdeps", that modifies the encoding of trace_entry_t records from a real ISA to the synthetic regdeps ISA. "encodings2regdeps" can add or remove trace_entry_t records that contain encodings, depending on the size of the regdeps encoding. Note that "encodings2regdeps" only changes the encoding of instructions; it does not adjust their length (nor does it change the instruction PC), so the output trace will have encoding sizes that do not match the instruction length. For this reason we disable the encoding-size vs. instruction-length check in reader_t when the trace has DR_ISA_REGDEPS encodings. This filter is part of the "record_filter" tool and can be invoked with:

```
drrun -t drcachesim -simulator_type record_filter -filter_encodings2regdeps -indir path/to/input/trace -outdir path/to/output/trace
```

Issue: #6662
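Why the filter may add or remove encoding records can be seen by splitting an encoding into fixed-capacity records. This sketch assumes each record holds up to 8 bytes (sizeof(uint64_t)) with a `size` field giving the valid byte count, loosely modeled on drmemtrace encoding records. `encoding_record_t`, `records_needed()`, and `split_encoding()` are hypothetical. Whenever the regdeps encoding needs a different number of records than the original encoding, the filter must emit more or fewer records, while the instruction length and PC stay those of the original instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: each encoding record carries up to sizeof(uint64_t) == 8 bytes,
 * with `size` holding the number of valid bytes. */
#define BYTES_PER_RECORD sizeof(uint64_t)

/* Hypothetical, simplified encoding record. */
typedef struct {
    uint16_t size;
    unsigned char bytes[BYTES_PER_RECORD];
} encoding_record_t;

/* Number of records needed to hold `len` encoding bytes. */
static size_t
records_needed(size_t len)
{
    return (len + BYTES_PER_RECORD - 1) / BYTES_PER_RECORD;
}

/* Splits an encoding into records; `out` must hold records_needed(len)
 * entries. Unused bytes in the last record are zeroed. Returns the number
 * of records written. */
static size_t
split_encoding(const unsigned char *enc, size_t len, encoding_record_t *out)
{
    size_t n = 0;
    for (size_t off = 0; off < len; off += BYTES_PER_RECORD, n++) {
        size_t chunk = len - off < BYTES_PER_RECORD ? len - off : BYTES_PER_RECORD;
        memset(&out[n], 0, sizeof(out[n]));
        out[n].size = (uint16_t)chunk;
        memcpy(out[n].bytes, enc + off, chunk);
    }
    return n;
}
```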

Fixes a size mismatch between the dr_reg_fixer[] and d_r_reg_id_to_virtual[] maps for aarch64. Adds a check in encode_debug_checks() (in core/ir/${ARCH}/encode.c) for all architectures. Issue: #6662, #3544, #1569
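The kind of guard this fix adds can be sketched with two parallel tables indexed by register ID. The tables below are hypothetical and only loosely analogous to dr_reg_fixer[] and d_r_reg_id_to_virtual[]: their contents are invented, and -1 marks IDs with no virtual register. The commit adds a debug-time check in encode_debug_checks(). This sketch shows both a compile-time variant (C11 `_Static_assert`) and run-time asserts, as an illustration rather than DynamoRIO's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parallel tables: reg_fixer[] maps a register ID to its
 * containing register; reg_to_virtual[] maps a containing register to a
 * virtual ID (-1 = not a containing register). Contents are invented. */
static const int reg_fixer[] = { 0, 1, 2, 2, 4, 4 };
static const int reg_to_virtual[] = { -1, 0, 1, -1, 2, -1 };

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Compile-time guard: a size mismatch fails the build instead of causing an
 * out-of-bounds read at run time. */
_Static_assert(ARRAY_LEN(reg_fixer) == ARRAY_LEN(reg_to_virtual),
               "register tables must cover the same IDs");

/* Run-time lookup with debug checks on both table accesses. */
static int
virtual_of_containing(size_t reg)
{
    assert(reg < ARRAY_LEN(reg_fixer));
    size_t containing = (size_t)reg_fixer[reg];
    assert(containing < ARRAY_LEN(reg_to_virtual));
    return reg_to_virtual[containing];
}
```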

Fixes a data race due to multiple dr_standalone_init() calls done in parallel (per shard) by encodings2regdeps_filter_t. The dcontext is now initialized once by record_filter_t and passed to its filters through the record_filter_info_t interface. Issue: #6662

Fixes a data race due to multiple dr_standalone_init() calls done in parallel (per shard) by encodings2regdeps_filter_t. The dcontext is now initialized once by record_filter_t and passed to its filters through the record_filter_info_t interface. Also avoids a data race in opcode_mix, where all threads were setting the dcontext isa_mode to DR_ISA_REGDEPS for regdeps input traces. Issue: #6662, #6812
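The shape of this fix (initialize shared state once, then hand a read-only pointer to every shard) can be sketched as below. `shared_context_t`, `filter_info_t`, `init_context_once()`, and `process_shard()` are hypothetical stand-ins for the dcontext, record_filter_info_t, and the filter's per-shard work. Shards are shown sequentially in the usage for brevity; in the real tool they run on parallel threads, which is why the single up-front initialization matters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DR dcontext; isa_mode mimics the setting that
 * previously was written from every thread. */
typedef struct {
    int isa_mode;
} shared_context_t;

/* Hypothetical counterpart of record_filter_info_t: hands the shared context
 * to each filter instead of letting filters create their own. */
typedef struct {
    const shared_context_t *context;
} filter_info_t;

static int init_calls = 0;

/* Called once by the tool, before any shard is processed, so no two shard
 * threads race on initialization or on setting isa_mode. */
static shared_context_t *
init_context_once(int isa_mode)
{
    shared_context_t *ctx = malloc(sizeof(*ctx));
    if (ctx == NULL)
        return NULL;
    init_calls++;
    ctx->isa_mode = isa_mode;
    return ctx;
}

/* Per-shard filter work: only reads the shared context. */
static int
process_shard(const filter_info_t *info)
{
    return info->context->isa_mode;
}
```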

Adds disassembly for DR_ISA_REGDEPS instructions. Specifically, when we disassemble DR_ISA_REGDEPS instructions, we print the instruction encoding (divided into 4-byte words; it can span one or two lines, similar to x86), we substitute the opcode with categories, we print the operation size (e.g., `[4byte]`), and then the source and destination virtual register names (e.g., `%rv3`). Disassembled instructions look as follows:

`00000812 06260606 load [8byte] %rv4 -> %rv4 %rv36`

In general, they follow this pattern: `[encoding (in 4-byte words)] [categories] [operation_size] [src_regs -> dst_regs]`

Issue: #6662
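The textual layout above can be reproduced with a small formatter. This is a hypothetical sketch, not DynamoRIO's disassembler: `format_regdeps()` and `append()` are invented names, categories are passed as one pre-joined string, and the wrapping rule (at most two encoding words on the first line, the rest on a second line) is an assumption that matches the examples in this thread.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Appends formatted text at buf + *pos; returns 0 on success or -1 if the
 * text does not fit. Keeps *pos < bufsz so the buffer stays terminated. */
static int
append(char *buf, size_t bufsz, size_t *pos, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *pos, bufsz - *pos, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= bufsz - *pos)
        return -1;
    *pos += (size_t)n;
    return 0;
}

/* Formats [encoding words] [categories] [operation_size] [srcs -> dsts].
 * Returns the number of characters written, or -1 on error/truncation. */
static int
format_regdeps(char *buf, size_t bufsz, const uint32_t *words, size_t nwords,
               const char *categories, int op_size, const int *srcs,
               size_t nsrcs, const int *dsts, size_t ndsts)
{
    size_t pos = 0;
    if (bufsz == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < nwords && i < 2; i++) {
        if (append(buf, bufsz, &pos, i == 0 ? "%08x" : " %08x", (unsigned)words[i]))
            return -1;
    }
    if (append(buf, bufsz, &pos, " %s [%dbyte]", categories, op_size))
        return -1;
    for (size_t i = 0; i < nsrcs; i++) {
        if (append(buf, bufsz, &pos, " %%rv%d", srcs[i]))
            return -1;
    }
    if (append(buf, bufsz, &pos, " ->"))
        return -1;
    for (size_t i = 0; i < ndsts; i++) {
        if (append(buf, bufsz, &pos, " %%rv%d", dsts[i]))
            return -1;
    }
    /* Encoding words beyond the second spill onto a new line. */
    for (size_t i = 2; i < nwords; i++) {
        if (append(buf, bufsz, &pos, i == 2 ? "\n%08x" : " %08x", (unsigned)words[i]))
            return -1;
    }
    return (int)pos;
}
```

With the words `0x812, 0x06260606`, category `"load"`, size 8, sources `{4}`, and destinations `{4, 36}`, this sketch produces the example line shown above.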

Modifies the view tool to handle OFFLINE_FILE_TYPE_ARCH_REGDEPS traces, leveraging the disassembly of DR_ISA_REGDEPS instructions. When visualizing DR_ISA_REGDEPS instructions, the view tool still prints the instruction length and PC, which for OFFLINE_FILE_TYPE_ARCH_REGDEPS traces are the same as those in the original trace. Then, after the PC, the instruction encoding, categories, operation size, and registers are printed following the disassembly format of DR_ISA_REGDEPS instructions (xref: #6799). DR_ISA_REGDEPS instructions printed by the view tool look as follows:

```
[...] ifetch 10 byte(s) @ 0x00007f86ef03d107 00001931 04020204 load store [4byte] %rv0 %rv2 %rv36 -> %rv0
[...] 00000026
```

We also fix a formatting bug in DR_ISA_REGDEPS instruction disassembly, where we were missing a newline when the instruction encoding spills into a second line. Issue: #6662
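The split between what comes from the original trace and what comes from the regdeps disassembly can be sketched as follows. `format_view_line()` is a hypothetical helper: it takes already-disassembled regdeps text, and it does not reproduce the real view tool's column alignment or its "[...]" leading columns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the original-trace fields (instruction length
 * and PC, which encodings2regdeps leaves unchanged) followed by the regdeps
 * disassembly text. Returns 0 on success or -1 if the output was truncated. */
static int
format_view_line(char *buf, size_t bufsz, unsigned length, uint64_t pc,
                 const char *regdeps_disasm)
{
    int n = snprintf(buf, bufsz, "ifetch %u byte(s) @ 0x%016llx %s", length,
                     (unsigned long long)pc, regdeps_disasm);
    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

In the example above, the printed length (10 bytes) is the original instruction's, even though the regdeps encoding that follows the PC is three 4-byte words (12 bytes).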

The public version of a trace needs to preserve the TRACE_MARKER_TYPE_FUNC_[ID | ARG | RETVAL] markers related to SYS_futex system calls. We add this feature to encodings2regdeps_filter_t. This filter drops the TRACE_MARKER_TYPE_FUNC_ markers related to functions other than SYS_futex. We still rely on type_filter_t to remove the additional markers that we don't want to preserve in the public trace. Issue: #6662
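The selection logic can be sketched as a small per-shard state machine: a FUNC_ID marker decides whether the ARG/RETVAL markers that follow it are kept. The marker kinds, `FUTEX_FUNC_ID`, and `keep_marker()` are hypothetical. In particular, how the tracer actually encodes syscall function IDs is not reproduced here. Non-FUNC markers pass through untouched, mirroring the division of labor with type_filter_t described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker kinds standing in for TRACE_MARKER_TYPE_FUNC_ID,
 * TRACE_MARKER_TYPE_FUNC_ARG, TRACE_MARKER_TYPE_FUNC_RETVAL, and others. */
typedef enum { MARKER_FUNC_ID, MARKER_FUNC_ARG, MARKER_FUNC_RETVAL, MARKER_OTHER } marker_kind_t;

typedef struct {
    marker_kind_t kind;
    uint64_t value; /* For MARKER_FUNC_ID: the traced function's ID. */
} marker_t;

/* Hypothetical ID assumed to identify SYS_futex in FUNC_ID markers. */
#define FUTEX_FUNC_ID 1000u

/* Per-shard state: whether the most recent FUNC_ID marker was for futex. */
typedef struct {
    bool in_futex;
} futex_filter_state_t;

/* Returns true to keep the marker. FUNC_ID/ARG/RETVAL markers are kept only
 * for futex; other markers pass through (a later type filter may drop them). */
static bool
keep_marker(futex_filter_state_t *state, const marker_t *m)
{
    switch (m->kind) {
    case MARKER_FUNC_ID:
        state->in_futex = (m->value == FUTEX_FUNC_ID);
        return state->in_futex;
    case MARKER_FUNC_ARG:
    case MARKER_FUNC_RETVAL:
        return state->in_futex;
    default:
        return true;
    }
}
```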

edeiana added Type-Feature Component-DrMemtrace labels Feb 20, 2024

edeiana self-assigned this Feb 20, 2024

edeiana mentioned this issue Feb 21, 2024

i#6662 public traces, part 2: encoding_filter #6663

Merged

edeiana mentioned this issue Mar 6, 2024

i#6662 public traces, part 1: synthetic ISA #6691

Merged

edeiana mentioned this issue Apr 18, 2024

i#6662 regdeps ISA: virtual registers #6783

Merged

edeiana mentioned this issue May 8, 2024

i#6662 public traces, part 3: regdeps disasm #6799

Merged

edeiana mentioned this issue May 9, 2024

i#6662 virtual regs: bug fix #6805

Merged

edeiana mentioned this issue May 13, 2024

i#6662 encodings2regdeps: data race bug fix #6809

Merged

edeiana mentioned this issue May 17, 2024

i#6662 public traces, part 4: view tool #6816

Merged

edeiana mentioned this issue May 21, 2024

i#6662 public traces, part 5: func_id_filter_t #6820

Open
