-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster builds: Decoupled mondo
repo artefacts
#619
Comments
Please assign this to me, this should not be happening |
@matentzn I marked it as a low priority; it's just an idea I came up with and the details kinda poured out of my head. We can discuss some other time. I am curious to what you mean by "this should not be happening". There is no bug here. This is an optimization. Unless perhaps you mean that it shouldn't take 45 minutes for these things to be built. |
I thought we just spend a lot of resources fixing this: https://github.com/monarch-initiative/mondo-ingest/pull/594/files |
Ah, this idea improves upon that further. What we have done already is making so that whenever you run commands that use mondo.owl etc, it will always check and rebuild them if they are out of date. What this issue proposes is a decoupled process that uses a GitHub action to build these regularly and then upload them to get up releases. Then, when you run a goal which needs mondo.owl, it will download them if the local files are out of date. |
I guess if the process consumes that much time this is a possibility, but given the fact that you may need to rerun from the repo from time to time anyways (for example if you need a change that was just pushed) it introduces a bit of complexity in the code. The PR I linked (please ensure it is seen thru as I dont have eyes on it) should make sure that the code is only run once for any state of the Mondo |
True. If you are running a goal that needs
Oh, Refactor git refresh system (small refactor) #594 didn't have an assignee. It looks like you got stuck on an issue during your build. I've assigned this to myself now, but it is a low priority.
No worries; that has already been completed in:
I think you mean releases? I haven't had any such problems. We have ICD11 and OMIM pipelines set up using GitHub releases, and there's been no issue yet. |
Yes but you have specific repos for that. Are you suggesting to create a mondo.owl nightly built repo so you can create github releases? As if it was a "source"? I thought you meant to create a github action in the mondo-ingest repo that caches the built files in the GHA cache. In any case, maybe your idea of a separate repo isnt a bad one; Just make sure its raised it a call with Trish to discuss pros and cons.
Dropped the ball on this and I wont be able to pick it up again. Maybe just try to review the diff carefully until you understand the point of all the changes, and if you dont understand something / or want to question a modelling decision, just let me know? |
I'll talk about #594 w/ Trish and prioritize. For
I'd rather not do (b) and create a separate repo, but I think it is the easiest solution. It would run more often than nightly. In TermHub, we run things every 20 minutes. It will check if there are any updates to |
I am neither opposed nor strongly in favour, I think the idea is generally worth considering as it could shave off 30 min of every build we do.. |
Overview
I've recently started to refresh my "
mondo
repo artefacts" once a week. This has led me to see that building these artefacts takes up a significant amount of build time, usually around 45 minutes (about ~20% of the current avg ~3.5hr build time).These files are actually dependencies of the
mondo-ingest
pipeline. There are no changes anywhere inmondo-ingest
that would have any effect on how these artefacts are built. It makes sense to decouple them from both a logical and efficiency standpoint.The artefacts
These are the artefacts and their dependency structure. Nested items depend on the items 1 level above.
tmp/mondo/
- this isn't a release artefact; it's a cloned Mondo directory. It isn't used anywhere except to build the following artefacts.mondos.ssom.tsv
mondo.owl
mondo.db
Possible implementation details
Where to store releases
mondo
repo releasesmondo-ingest
releases.(a) is a bad idea, because this just makes a complicated release pipeline even more complex. (b) This is my favorite solution. (c) This is not a bad solution, but it would have similar problems to 'a'. We want to eventually move to doing
mondo-ingest
releases, and so now we would have to deal with two kinds of releases inmondo-ingest
: these files, and the actualmondo-ingest
build outputs.When / how to trigger a new release of these artefacts
a. Whenever
mondo
has a new commit onmain
b. Whenever any file that "the artefacts" depend on changes
md5
hashes of each of these files, and whenevermain
changes, do a check to see if any hashes changed. It may be thatmondo-edit.obo
is the only one.A GitHub action would run regularly, check on changes, and run the release.
Changes that would need to be made to
mondo-ingest
pipelineFor these goals,
mondo.owl
, etc, they would check the GitHub releases and see if a new release came out that is newer than what is cached locally (if anything). If so, then it wouldwget
these artefacts.Possible challenges
Memory requirements for GH action: I don't think that this is an issue yet, but I'm not 100% sure. I have to monitor these builds and see what the peak memory usage is and make sure that it is less than the maximum available for an (ideally free) GH action runner (though we might be able to switch to MacOS instead of ubuntu runners for more memory if needed). Currently,
mondo-ingest
builds are peaking for me at around 32GB of memory, whereas these artefacts peak at around 5.5GB, which so far is I think workable.The text was updated successfully, but these errors were encountered: