
Coverage collection at the end from .acv files is taking significant time #175

Open
tilak-nanavati-opshub opened this issue Nov 25, 2022 · 11 comments


@tilak-nanavati-opshub

Hi Steve,

AltCover is working great, capturing the coverage of all the instrumented DLLs. Recently, however, we have observed that the .acv files are growing to around 300 MB each (and there are multiple such files), and collecting the coverage from them takes a significant amount of time.

Observation: a single file of ~300 MB takes more than 48 minutes to process.

Is there any way we can reduce this time?

It would be a great help.

Thanks!

@SteveGilham
Owner

Exactly what is written to the .acv visits file(s) will depend on how you are driving AltCover. In most use cases, each visit is written eagerly, with "classic" mode and the dotnet test integration being exceptions.

The output to file is buffered and compressed to reduce the impact of the file I/O bottleneck upon the system being tested, with the size on disk per visit record depending on the compressibility of the visit stream. Multiple such files are a result of concurrency-handling measures; for example, separate AppContexts (or AppDomains) each have their own output stream, again to reduce the impact on the system being recorded. On collection, the visit records are combined into a Dictionary of Dictionaries (slightly worse than linear behaviour in the number of records), so from then on the process depends on the number of points visited rather than the number of individual visits.
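
For illustration, that aggregation step has roughly the following shape (a minimal F# sketch with made-up record and field names, not the actual recorder/runner code):

open System.Collections.Generic

// Hypothetical decoded visit record: one module, one point, one hit count.
type Visit = { Module : string; Point : int; Count : int64 }

// Fold a stream of visit records into a Dictionary of Dictionaries, so that
// everything downstream scales with the number of distinct points visited
// rather than with the number of individual visits.
let aggregate (visits : Visit seq) =
  let table = Dictionary<string, Dictionary<int, int64>>()
  for v in visits do
    let points =
      match table.TryGetValue(v.Module) with
      | true, d -> d
      | _ ->
          let d = Dictionary<int, int64>()
          table.Add(v.Module, d)
          d
    points.[v.Point] <-
      match points.TryGetValue(v.Point) with
      | true, n -> n + v.Count
      | _ -> v.Count
  table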

For a simple visit process (e.g. one pass through most code), the compression makes each {module,line,count} visit around 12-13 bytes; but where there is a tight loop with a great deal of repetition in the output stream, compression ratios down to tens of visits per output byte have been observed. A 300 MB file would thus represent at least ~25 million distinct visits, but potentially several billion.

If you are using the command line, the --defer option (or running in "classic" mode) will accumulate the visits in memory while the tests are running, reducing the overhead of repeat visits (turning n instances of {module,line,1} into a single {module,line,n}), but at the cost of leaving the output to the process exit handler. In plain dotnet there is no timeout limit for this handler; in the .NET Framework there is a grace period of only 2 seconds by default, and the VSTest tool leaves only 100ms (typically insufficient). The AltCover dotnet test integration works around this by injecting a data collector object that executes before the process exits, so it implicitly does --defer.

Alternatively, the --single option can be used to skip collecting information about repeat visits; the effect on the output volume will be similar to that of --defer (turning n instances of {module,line,1} to a single {module,line,1}).
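
As a rough sketch of how those options are passed when instrumenting from the command line (the directory names here are placeholders, and the other required options are elided):

altcover --defer --inputDirectory <built-test-binaries> --outputDirectory <instrumented-output> ...

or, to drop the repeat-visit counts entirely:

altcover --single --inputDirectory <built-test-binaries> --outputDirectory <instrumented-output> ...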

Another approach would be to identify any hot spots and arrange for the tests of those hot spots to be separate from the tests of the rest of the system (run with the hot-spot types/methods excluded from coverage), combining the results in post-processing.
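
For the run where the hot-spot code is excluded, the exclusion can be expressed with the type/method filters, along these lines (again a sketch only; the regex values are made-up examples):

altcover --typeFilter "HotSpotNamespace" --methodFilter "TightInnerLoop" --inputDirectory <built-test-binaries> --outputDirectory <instrumented-output> ...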

Related issues - #57 and #111

@SteveGilham
Owner

Finally got around to doing some performance testing, and with a program dominated by a tight loop -

// a tight loop designed to generate a very large volume of instrumented visits
let n = 1000
for i in 1..n do
  for j in 1..n do
    // a fresh stream and writer per middle-loop iteration, disposed at the end of it
    use s = new System.IO.MemoryStream(1024)
    use b = new System.IO.BinaryWriter(s)
    for k in 1..n do
      System.DateTime.UtcNow.Ticks
      |> b.Write

generated a > 300MB .acv file, which I processed on old, low-powered hardware and got

... C:\Users\steve\Documents\GitHub\altcover\coverage.xml.0.acv (328,069,077b)
36,435,929 visits recorded in 00:01:12.7423790 (500,890 visits/sec)
A total of 36,435,929 visits recorded
Coverage statistics flushing took 0.17 seconds

so it doesn't in fact look like the bottleneck is in the unpacking of the coverage data - indeed with a smaller test set, the throughput was lower

... C:\Users\steve\Documents\GitHub\altcover\coverage.xml.0.acv (36,796,001b)
4,070,503 visits recorded in 00:00:08.6479199 (470,692 visits/sec)

as the fraction of time spent unpacking dropped.

Bearing all that in mind, the likelihood is that the issue reported here is a result of the size of the code being covered, and the consequent scaling of the in-memory representation of the report as it is being filled in - of necessity, a random-access structure. So, rather than just the size of the .acv files, the more interesting measures are the size of the raw coverage report, and how many assemblies are involved across how many kLoC.

I've not done any comparisons at scale yet between the performance of the JSON representation and the XML one, so I cannot advise whether that might be another thing to try; but in any case the points made earlier - reducing the number of data points collected, and partitioning the testing across units before aggregating - will still speed the process up.

@tilak-nanavati-opshub
Author

tilak-nanavati-opshub commented Nov 15, 2023

Hi Steve,

Thank you for sharing the detailed insights about the timing performance metrics from a quick test set.

We appreciate the observations provided in the comment.

We also believe that a huge code repository can make a difference, i.e. introduce significant latency while collecting the code coverage. We might need to look into how we can minimize the number of data points collected (we need to investigate this further).

Maybe we can start partitioning the test modules and combining the coverage collected at the end, all in parallel; that seems a good way forward.

We will post an update on how it goes!

Thanks again!

@tilak-nanavati-opshub
Author

Just to confirm: there's no AltCover-specific setting to speed up the coverage collection process, right? That's our understanding from the documentation.

@bheemvennapureddy

Quite a bit of a difference between the OpenCover output from Coverlet and AltCover - the AltCover coverage reports take a huge amount of time.

@bheemvennapureddy

[Two screenshots of pipeline run timings, taken 2024-04-05]

OpenCover takes 2m 27s; with AltCover it takes 14 mins.

@bheemvennapureddy

@SteveGilham Any insights on what could be done here?

@bheemvennapureddy

I see the instrumentation takes quite a long time compared to OpenCover with Coverlet.

@SteveGilham
Owner

Up front, I don't think I can offer any quick wins with this, except that if you're getting many (hundreds or more) visits per location, there is some speed-up to be had from using the /p:AltCoverSingle="true" option.
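
With the dotnet test integration that looks something like the following (a sketch; /p:AltCover=true is the usual switch for the integration, and AltCoverSingle is the property named above):

dotnet test /p:AltCover=true /p:AltCoverSingle="true"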

At the moment I don't have any clear indications of hot-spots; and any ground-up rewrite would be at best speculative.

Can you indicate some metrics for the size of the work you're doing - how many kLoC, how many total visits, that sort of thing?

@SteveGilham
Owner

Something throwing an exception should be filed as a separate issue. Could you please post the log file with the exception trace in it?

@bheemvennapureddy

@tilak-nanavati-opshub removed
