Does DarkRISCV include unit tests or regression tests? #71
-
I noticed that the project repository doesn’t contain any test files. Could you clarify how functional bugs are identified or addressed? Are there any unit tests, regression tests, or other testing methodologies used to ensure the reliability and stability of the project? Thanks a lot ~ :)
Replies: 5 comments
-
Of course! We use a kind of "rocket science" to make DarkRISCV work: we just throw it in the air and wait for it to fly... or explode! It worked very well with the Saturn V, for example... anyway, in case of explosion, we analyze the remains to try to find out why it exploded!

Jokes apart, the real methodology that I use is not far from this! I worked for many years with reverse engineering of ASICs, and there is not much way to test a black box other than comparing its output with the original. So I typically rely on comparing the instruction-by-instruction execution flow between versions: the same binary code compiled for RISC-V must decode and execute exactly the same way, regardless of changes in the logic. For this purpose there are some example codes that I run periodically in order to find divergences, such as the CoreMark test. Although it is not designed to check RISC-V compliance, the fact that such complex code compiled for RISC-V runs on DarkRISCV ensures that the core is working fine.

Apart from the main project, there are other collaborators working with different methodologies, including running the RISC-V compliance tests. However, because there are so many different HW and SW setups, it is very hard to map and test all possible combinations, so all this effort probably covers only a small part of the possibilities! After all, I have some real products running with DarkRISCV and they are really very stable and reliable, but they are working with different versions and different HW/SW setups, so there is no true answer regarding reliability and stability other than testing it yourself with your own code! :)
-
Thank you for detailing the testing approach for DarkRISCV! 👍 I'm very interested in the bugs encountered during DarkRISCV’s development and would love to reproduce some of them to deepen my understanding. I noticed certain commits in your Git history with messages like “fix ...” that seem to address specific bugs. Would running the CoreMark test and comparing results between buggy and corrected versions help reproduce some of these bugs? For instance, in a recent commit related to sign extension, was this bug identified through CoreMark testing or discovered in your real-world applications running DarkRISCV? I’d also appreciate any advice you have on reproducing these historical bugs. Thank you again for your insights! 🙏
-
Well, apart from normal implementation bugs (signed shift, for example), the most persistent bugs are related to the memory interfaces, on both the instruction and data buses. But, contrary to implementation bugs that stay hidden until someone needs to use them, the memory access bugs are quickly identified because the core is unable to boot! The instruction-related problems are quickly identified because the boot code in asm will uncompress and print the following banner:
The boot code is fully in assembly and does not use stack or heap, so it works basically with registers and the same memory where the code is stored. That does not mean load/store are not used: in this segment the loads come from the code memory and the stores go only to the UART, so no real R/W memory is touched. Here, the bugs are related to memory access and the pipeline, in particular the fetch and decode stages.

In the past, I used some $display() messages to dump the fetched words, so it was possible to compare against the previous implementation and find memory and pipeline bugs. Nowadays there is a trace option in the core: the executed instructions are partially decoded and dumped in a way that makes it possible to compare the execution flow with the previous core in great detail, including wait states and pipeline flushes. The debug of the memory interfaces, including the caches, was done almost entirely with such flows, comparing cores without cache vs. cores with cache: the execution flow must match in both cases.

This scheme is basically the same one we use for reverse engineering: we compare the new implementation (on FPGA, for example) with the old implementation (on ASIC, for example) and ensure it works exactly the same way. As long as the observable execution flow is the same, we can assume that the internal pipeline and registers are the same!

Well, after the banner, the code switches from the asm boot to the main() code in C, so it needs to use the stack, which means load/store to the R/W memory, as well as some core- and libc-specific features:
The messages may vary with the version and features, but they basically do quick tests of the hardware. For example, the presence of the "MAC" or "MT" options is the result of some kind of test that ensures the code really worked up to that point. Of course, not a complete test, but a very basic one... anyway, it must work, otherwise I may just drop the changes and try to implement that feature in a different way that does not result in a bug! :O

Some bugs can be easily identified, such as a failure in the libc mul/div functions, because printf() uses them to convert binary to decimal, for example. The signed shift, however, was not quickly identified because, although shifts are heavily used, they are mostly unsigned. So, the signed shift bug was found by a collaborator who was testing a specific function on qemu, and that function did not work well on DarkRISCV. Although I do not use qemu myself, some colleagues do, so I try to match some of its features, such as mapping the SDRAM at the same base address as qemu.

Although traces are the key to detecting execution flow changes, simulation and waveform visualization are the effective tools to understand why the traces are wrong. In the case of the signed shift, for example, I added a small check in the main() test code to check the shifts, with an ebreak to stop the simulation (contrary to a real ebreak, it just runs a $stop() in the Verilog code). Then I checked the waveforms on the registers and confirmed that the inputs were correct but the output was wrong. After some changes in the code, I confirmed it was working again and removed the debug code.

Maybe, in the future, a good improvement would be to effectively test more instructions to ensure that they are working well. Not a complete test, but a quick test at the boot level, so we can lose some seconds at simulation level to ensure everything is okay!
Anyway, as I mentioned before, the most persistent bugs regard the memory interface, partly because it is a Harvard architecture and everything is working at the same time, and partly because it is a complex combination of pipelined (instruction bus) and non-pipelined (data bus) interfaces, so there is no direct way to prevent future changes or features from creating new bugs on those interfaces.

Regarding the CoreMark, it is a good test for real targets but very hard to run at simulation level, so I typically run only a small part, just enough to compare core dumps and ensure that they match. Of course, in case a bug makes the real target unable to run the CoreMark, I will try to simulate it in order to find the problem... but I typically rely on short tests, which means attacking the main bugs first and leaving the hidden bugs for the future! :)
-
Thank you for your detailed response! It really helped me gain a deeper understanding of how developers approach debugging in real-world hardware design scenarios.
-
Just to complement: last night I added a new set of debug features that was requested by a colleague, so that when the ebreak handler is enabled, some core conditions can be caught and handled by software (a GDB stub, for example):