Does DarkRISCV include unit tests or regression tests? #71
-
I noticed that the project repository doesn’t contain any test files. Could you clarify how functional bugs are identified or addressed? Are there any unit tests, regression tests, or other testing methodologies used to ensure the reliability and stability of the project? Thanks a lot ~ :)
Replies: 5 comments
-
Of course! We use a kind of "rocket science" to make DarkRISCV work: we just throw it in the air and wait for it to fly... or explode! It worked very well with the Saturn V, for example... anyway, in case of explosion, we analyze the remains to try to find out why it exploded!

Jokes apart, the real methodology that I use is not far from this! I worked for many years with reverse engineering of ASICs, and there is not much way to test a black box other than comparing its output with the original. So I typically rely on comparing the instruction-by-instruction execution flow between versions: the same binary code compiled for RISC-V must decode and execute exactly the same way, regardless of changes in the logic. For this purpose there are some example codes that I run periodically in order to find divergences, such as the CoreMark test. Although it is not designed to check RISC-V compliance, the fact that such complex code compiled for RISC-V runs on DarkRISCV ensures that the core is working fine.

Apart from the main project, there are other collaborators working with different methodologies, including running the RISC-V compliance tests. However, because there are so many different HW and SW setups, it is very hard to map and test all possible combinations, so all this effort probably covers only a small part of the possibilities! After all, I have some real products running with DarkRISCV and they are really very stable and reliable, but they are working with different versions and different HW/SW setups, so there is no true answer regarding reliability and stability other than testing it yourself with your own code! :)
-
Thank you for detailing the testing approach for DarkRISCV! 👍 I'm very interested in the bugs encountered during DarkRISCV’s development and would love to reproduce some of them to deepen my understanding. I noticed certain commits in your Git history with messages like “fix ...” that seem to address specific bugs. Would running the CoreMark test and comparing results between buggy and corrected versions help reproduce some of these bugs? For instance, in a recent commit related to sign extension, was this bug identified through CoreMark testing or discovered in your real-world applications running DarkRISCV? I’d also appreciate any advice you have on reproducing these historical bugs. Thank you again for your insights! 🙏
-
Well, apart from normal implementation bugs (signed shift, for example), the most persistent bugs are related to the memory interfaces, on both the instruction and data buses. But, contrary to implementation bugs that stay hidden until someone needs to use them, the memory access bugs are quickly identified because the core is unable to boot! The instruction-related problems are quickly identified because the boot code in asm will uncompress and print the following banner:
The boot code is fully in assembly and does not use stack or heap, so it works basically with registers and the same memory where the code is stored. That does not mean load/store are not used: in this segment the loads come from the code memory and the stores go only to the UART, so no real R/W memory is touched. Here, the bugs are related to memory access and the pipeline, in particular the fetch and decode stages.

In the past, I used some $display() messages to dump the fetched words, so it was possible to compare against the previous implementation and find memory and pipeline bugs. Nowadays there is a trace option in the core: the executed instructions are partially decoded and dumped in a way that makes it possible to compare the execution flow with the previous core in great detail, including wait states and pipeline flushes. The debug of the memory interfaces, including the caches, was done almost entirely with such flows, comparing cores without cache vs. cores with cache: the execution flow must match in both cases.

This scheme is basically the same one we use for reverse engineering: we compare the new implementation (on FPGA, for example) with the old implementation (on ASIC, for example) and ensure it works exactly the same way. As long as the observable execution flow is the same, we can assume that the internal pipeline and registers are the same!

Well, after the banner, the code switches from the asm boot to the main() code in C, so it needs to use the stack, which means load/store to the R/W memory, as well as some core- and libc-specific features:
The messages may vary with the version and features, but they basically do quick tests of the hardware. For example, the presence of the "MAC" or "MT" options is the result of some kind of test that ensures the code really worked up to that point. Of course, not a complete test, but a very basic one... anyway, it must work, otherwise I may just drop the changes and try to implement that feature in a different way that does not result in a bug! :O

Some bugs can be easily identified, such as a failure in the libc mul/div functions, because printf() uses them to convert binary to decimal, for example. The signed shift, however, was not quickly identified because, although shifts are heavily used, they are mostly unsigned. So, the signed shift bug was found by a collaborator who was testing a specific function on qemu, and that function did not work well on DarkRISCV. Although I do not use qemu myself, some colleagues do, so I try to match some of its features, such as mapping the SDRAM at the same base address as qemu.

Although traces are the key to detecting execution flow changes, simulation and waveform visualization are the effective tools to understand why the traces are wrong. In the case of the signed shift, for example, I added a small check in the main() test code to check the shifts, with an ebreak to stop the simulation (contrary to a real ebreak, it just runs a $stop() in the Verilog code). Then I checked the waveforms on the registers and confirmed that the inputs were correct but the output was wrong. After some changes in the code, I confirmed it was working again and removed the debug code.

Maybe, in the future, a good improvement would be to effectively test more instructions to ensure that they are working well. Not a complete test, but a quick test at the boot level, so we can lose some seconds at simulation level to ensure everything is okay!
Anyway, as I mentioned before, the most persistent bugs regard the memory interface, partly because it is a Harvard architecture and everything is working at the same time, and partly because it is a complex combination of pipelined (instruction bus) and non-pipelined (data bus) interfaces, so there is no direct way to prevent future changes or features from creating new bugs on those interfaces.

Regarding the CoreMark, it is a good test for real targets but very hard to run at simulation level, so I typically run only a small part, just enough to compare core dumps and ensure that they match. Of course, in case a bug makes the real target unable to run the CoreMark, I will try to simulate it in order to find the problem... but I typically rely on short tests, which means attacking the main bugs first and leaving the hidden bugs for the future! :)
-
Thank you for your detailed response! It really helped me gain a deeper understanding of how developers approach debugging in real-world hardware design scenarios.
-
Just to complement: last night I added a new set of debug features that was requested by a colleague, so that when the ebreak handler is enabled, some core conditions can be caught and handled by software (a GDB stub, for example):