
dependency types #378

Merged: merged 17 commits into spack:develop from mathstuf:dep-types on Jul 15, 2016

Conversation

mathstuf (Contributor)


So here are my current notes on where things need to change to support different types of dependencies. They are by no means exhaustive, since this is a breadth-first search from depends_on out one or two levels for what needs modification or inspection. The API seems to make the lines easy to draw (pkg.dependencies will need to become a method, but there aren't too many uses of it).

Looking through, it looks as though the better storage is Dict[name, (spec, reason)] rather than Dict[reason, Dict[name, spec]] (which is what I started with in the one commit where I played around with the depends_on code).
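
For concreteness, here is a minimal sketch of the two layouts (the spec values are stand-in strings for real Spec objects; all names here are illustrative, not Spack code):

    # Proposed: Dict[name, (spec, reason)]
    deps_by_name = {
        "hdf5": ("hdf5@1.8.16", "link"),
    }

    # Earlier experiment: Dict[reason, Dict[name, spec]]
    deps_by_reason = {
        "link": {"hdf5": "hdf5@1.8.16"},
    }

    # The common question -- "does this package depend on hdf5, and
    # why?" -- is a single lookup in the first layout:
    spec, reason = deps_by_name["hdf5"]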

Are there any other dep types that would matter other than build-time and run-time (run-time for a library being a dep from netcdf onto hdf5 due to a link dependency)? The only bits I saw were "we need it to build" and "we need it to use it" dependencies.

@tgamblin

@alalazo (Member) commented Jan 20, 2016

I'll just leave my two cents on the analysis part of this feature:

  • when building a package, the run-time DAG for that package is a subgraph of the build-time DAG
  • when building a package, build dependencies matter only for the root of the run-time DAG

Then: should build dependencies be part of the computed hash? On the one hand, I think we should permit upgrading build tools seamlessly, i.e. without spack complaining that installed packages depend on the older version (and for that I would say 'no'). On the other hand, it would be nice to keep track, in a weaker way, of the tools used in the build for the sake of reproducibility. Are there already some thoughts on how to handle this?

@eschnett (Contributor)

The build dependencies should be logged in the installed package's .spack/spec.yaml.

@mathstuf (Contributor, Author)

when building a package, the run-time DAG for that package is a subgraph of the build-time DAG

Not necessarily. NumPy might be a runtime-used bit, but not needed to build and install.

when building a package, build dependencies matter only for the root of the run-time DAG

Right. Package A needing Python to build does not mean that B (which depends on A) needs Python available as well.

should build dependencies be part of the computed hash?

This, I think, is tied to the SDK/platform question raised elsewhere. If those build dependencies can alter the ABI, yes, otherwise, no. The only situations I can think of where this would happen are contrived, so I'd lean towards "no" unless someone can come up with a use case where the ABI would be affected (other than toolchain builddeps).

@alalazo (Member) commented Jan 20, 2016

when building a package, the run-time DAG for that package is a subgraph of the build-time DAG

Not necessarily. NumPy might be a runtime-used bit, but not needed to build and install.

@mathstuf: Hmm, I am not sure I got your point. Let's say I package py-foo, a pure Python package that depends on numpy at run-time. Then I'll say that py-foo:

extends('python')
depends_on('numpy')

From my perspective numpy is also needed to build and install py-foo (at least within spack) as it takes part in computing the hash for the unique installation prefix. Am I missing something?

@mathstuf (Contributor, Author)

Ah, true, it would need to be there for hash computation. I was thinking of ParaView, which calls NumPy from C++ code and does not need it during the build, but it does need to advertise which NumPy is expected (though it only cares that it is compatible, not about a specific version).

@tgamblin (Member)

@mathstuf: I'll add some notes to the spots you marked, but I wanted to get an answer out here before the conversation drops out of my inbox (sigh).

Are there any other dep types that would matter other than build-time and run-time (run-time for a library being a dep from netcdf onto hdf5 due to a link dependency)?

It's worth considering this from the perspective of what Spack would do differently for different dependency types. Spack could do these things for deps:

  1. Add -L flags and RPATHs in the compiler wrapper, and add to LD_LIBRARY_PATH at run time.
  2. Add -I flags in the compiler wrapper.
  3. Add to PATH at build time.
  4. Add to PATH at run time (or, more generally, load the module for the package - this might include other paths and variables that are needed, like QTDIR)
  5. Include in spec DAG passed to build (and in spec.yaml)
  6. Include in version hash.

Currently Spack does all of those for all deps. As a first cut I might break that down like so:

  • Build dependencies: 3, 5
  • Link dependencies: 1, 2, 3, 5, 6
  • Run dependencies: 4, 5, 6
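
Sketching that mapping as data (the action numbers refer to the list above; this is purely illustrative, not Spack code):

    DEPTYPE_ACTIONS = {
        "build": {3, 5},
        "link":  {1, 2, 3, 5, 6},
        "run":   {4, 5, 6},
    }

    def actions_for(deptypes):
        """Union of actions for a dep carrying several types at once."""
        return set().union(*(DEPTYPE_ACTIONS[t] for t in deptypes))

    print(actions_for(("build", "run")))  # {3, 4, 5, 6}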

We could debate whether you also want a "header-only" dependency for things like boost, which would only do 2 and 3. I think it's not worth the extra complexity.

We could also debate about hashing build and run deps -- I talk a bit more about that below.

should build dependencies be part of the computed hash?
This, I think, is tied to the SDK/platform question raised elsewhere. If those build dependencies can alter the ABI, yes, otherwise, no.

I think we're on the same page here, and this is exactly why compilers are treated specially in Spack. They're build deps that directly affect the ABI. Spack doesn't (yet) do any ABI checking when it builds, but it does try to keep the compilers consistent by looking up in the DAG. We also allow different compilers for different packages in the same build and we've thought about better ABI checking.

Build deps are similar in that they can be different on different nodes in the DAG. Some packages might need Python but only to run a build script (PETSc). Another package might link with PETSc, but it shouldn't have to use an entirely different version of Python just because PETSc ran its build system using another.

This will have interesting ramifications for concretization. Currently when you load up a dependency spec from a spec.yaml, you get ALL of its dependencies. You probably want to remove the build deps from the dependency's DAG and only force transitive link and run deps to be included in the new root's DAG, along with the root's build deps. That way you can avoid the weird PETSc issue above. If you include build deps in the version hash, I think you have to force consistency of build deps across the DAG. That, or you have to start including multiple specs with the same name in a single spec DAG, but that complicates things like spec['python'] --- which Python should it return? I'd rather not get into that because it seems like needless extra complexity.

So, my take is that the build deps for package A should be documented in A/.spack/spec.yaml, but they shouldn't be included in dag_hash() or included automatically as transitive dependencies.
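
A minimal sketch of that filtering, under the assumption of a simple node type standing in for real specs (none of this is Spack's actual API):

    from collections import namedtuple

    # Stand-in node type; dependencies maps name -> (node, deptypes).
    Node = namedtuple("Node", ["name", "dependencies"])

    def transitive_deps(node, keep=("link", "run")):
        """Yield dependencies reachable through the kept dep types only."""
        for dep, deptypes in node.dependencies.values():
            if any(t in keep for t in deptypes):
                yield dep
                yield from transitive_deps(dep, keep)

    # app -> petsc (link) -> python (build): loading app's DAG keeps
    # petsc but never pulls in petsc's build-only python.
    python = Node("python", {})
    petsc = Node("petsc", {"python": (python, ("build",))})
    app = Node("app", {"petsc": (petsc, ("link",))})
    print([d.name for d in transitive_deps(app)])  # ['petsc']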

I haven't fully thought through run dependencies, but an example would be that curl is a run dependency of git, and decent versions of tar would be a run dependency of spack. We could talk about whether run deps should be hashed or not... but if you don't hash them then we would need a better mechanism to load the right run dependencies in modules. Spack doesn't currently have a way to determine compatibility in those cases.

FWIW, Maven has a decent taxonomy of dependency types that is more extensive than the one above. It calls them dependency scope. Note that I don't advocate adopting Maven's XML syntax, but the model is decent and I like the term "scope" for this.

Maven's provided scope is an interesting one, as it corresponds to what people are doing when they explicitly leave a dep out of package.py to pull in the system version of it. Personally, I think we don't need this, and it can be taken care of by the externals support in #120 and #309. In other words, I think provided is a property of the dependency and not of the relationship between the dependent and the dependency.

That said, to make externals more usable, it would be nice to generalize detection of external build and run dependencies from the current way compilers are detected. #309 adds support for multiple compiler detection strategies -- perhaps we could add system package detection strategies in a similar way.

Thoughts?

@mathstuf (Contributor, Author)

On Wed, Jan 20, 2016 at 12:08:11 -0800, Todd Gamblin wrote:

@mathstuf: I'll add some notes to the spots you marked, but I wanted
to get an answer out here before the conversation drops out of my
inbox (sigh).

I know that story. :)

Are there any other dep types that would matter other than
build-time and run-time (run-time for a library being a dep from
netcdf onto hdf5 due to a link dependency)?

It's worth considering this from the perspective of what Spack would
do differently for different dependency types. Spack could do these
things for deps:

  1. Add -L flags and RPATHs in the compiler wrapper, and add to LD_LIBRARY_PATH at run time.

Any reason for both rpath and LD_LIBRARY_PATH?

  2. Add -I flags in the compiler wrapper.
  3. Add to PATH at build time.
  4. Add to PATH at run time (or, more generally, load the module for
    the package - this might include other paths and variables that are
    needed, like QTDIR)

It only does this for directories that exist, right? For example, don't
add libjpeg to PATH if $jpeg_root/bin doesn't exist? A separate branch
in any case.

  5. Include in spec DAG passed to build (and in spec.yaml)
  6. Include in version hash.

Currently Spack does all of those for all deps. As a first cut I
might break that down like so:

  • Build dependencies: 3, 5

Build deps should probably lose #5 (they'll pick up that step if they're
also link deps).

  • Link dependencies: 1, 2, 3, 5, 6
  • Run dependencies: 4, 5, 6

The only use case that differentiates link and run dependencies is
"plugins", which is already handled by the extends directive, so it
does seem that build and run are sufficient (a build dep for linking
usually turns into a link dependency and is the common case, so that is
the default…where I've now added a comment for the rationale).

should build dependencies be part of the computed hash?
This, I think, is tied to the SDK/platform question raised
elsewhere. If those build dependencies can alter the ABI, yes,
otherwise, no.

I think we're on the same page here, and this is exactly why compilers
are treated specially in Spack. They're build deps that directly
affect the ABI. Spack doesn't (yet) do any ABI checking when it
builds, but it does try to keep the compilers consistent by looking up
in the DAG. We also allow different compilers for different packages
in the same build and we've thought about better ABI checking.

The only tools I can think of which do this:

  • compilers;
  • standard libraries (usually bundled with the compiler, but things
    like STLSoft exist);
  • specialized compilers (e.g., CUDA); and
  • wrapping tools (e.g., swig, shiboken).

Everything else is, if it's only a build dependency, irrelevant to the
ABI.

Build deps are similar in that they can be different on different
nodes in the DAG. Some packages might need Python but only to run a
build script (PETSc). Another package might link with PETSc, but it
shouldn't have to use an entirely different version of Python just
because PETSc ran its build system using another.

Correct. My thought was to only look up runtime deps when generating the
DAG, so PETSc would never introduce Python to a DAG other than its own
build.

This will have interesting ramifications for concretization.
Currently when you load up a dependency spec from a spec.yaml, you
get ALL of its dependencies.

The yaml file will have to specify the kind of dependency, so this is
not that different. Do these files already have a version field?

You probably want to remove the build
deps from the dependency's DAG and only force transitive link and run
deps to be included in the new root's DAG, along with the root's build
deps. That way you can avoid the weird PETSc issue above. If you
include build deps in the version hash, I think you have to force
consistency of build deps across the DAG. That, or you have to start
including multiple specs with the same name in a single spec DAG, but
that complicates things like spec['python'] --- which Python should
it return? I'd rather not get into that because it seems like
needless extra complexity.

So, my take is that the build deps for package A should be documented
in A/.spack/spec.yaml, but they shouldn't be included in
dag_hash() or included automatically as transitive dependencies.

I think we're on the same page here too: build-only deps are invisible
except when building the package with that dependency.

I haven't fully thought through run dependencies, but an example would
be that curl is a run dependency of git, and decent versions of
tar would be a run dependency of spack. We could talk about
whether run deps should be hashed or not... but if you don't hash them
then we would need a better mechanism to load the right run
dependencies in modules. Spack doesn't currently have a way to
determine compatibility in those cases.

These tools tend to be "just have one available" rather than any kind of
specifics. The problem is usually "it's a BSD coreutils set" where all
kinds of GNU flags and conveniences (shakes fist at OS X's find not
defaulting to searching ".") aren't available (though portable programs
shouldn't be using them blindly anyways).

…and I like the term "scope" for this.

Yeah, that sounds better.

Maven's provided scope is an interesting one, as it corresponds to
what people are doing when they explicitly leave a dep out of
package.py to pull in the system version of it. Personally, I think
we don't need this, and it can be taken care of by the externals
support in #120 and #309. In other words, I think provided is a
property of the dependency and not of the relationship between the
dependent and the dependency.

+1

That said, to make externals more usable, it would be nice to
generalize detection of external build and run dependencies from the
current way compilers are detected. #309 adds support for multiple
compiler detection strategies -- perhaps we could add system package
detection strategies in a similar way.

Packages could have methods which detect if a version of the package is
available already. I will say that, from experience, these things are a
PITA to maintain across all kinds of disparate environments.
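
As a purely speculative sketch of such a detection method (the function name and the parsing are made up for illustration):

    import subprocess

    def detect_system_cmake():
        """Return a spec string for a cmake already on this system, or None."""
        try:
            out = subprocess.check_output(['cmake', '--version'])
        except (OSError, subprocess.CalledProcessError):
            return None
        # First line looks like "cmake version 3.5.2".
        return 'cmake@' + out.split()[2].decode()

    print(detect_system_cmake())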

@mathstuf (Contributor, Author)

Down to 9 test failures…

@mathstuf (Contributor, Author)

Down to 8 errors now. It seems to be something about concretizing the specs, where the concrete version of the spec is a duplicate dependency when added. I can't seem to find where in the diff anything related to this changed; specs are copied where they used to be copied, deleted where they used to be deleted, and so on. Where else are such bits manipulated?

@mathstuf (Contributor, Author)

Tests all pass locally. Any comments on implementation before I go back and edit commit messages? Any TODO items which can be removed?

@tgamblin (Member) commented Mar 2, 2016

@mathstuf: I'm going to try to get around to reviewing this before the telcon tomorrow, but if I don't then I'll definitely get to it before Friday.

@mathstuf (Contributor, Author) commented Mar 4, 2016

Successfully built qhull after changing its dependency type to DEP_BUILD.

@mathstuf (Contributor, Author)

Updated with some bug fixes (now putting them into separate commits, but I can squash them back if wanted). I've also gone through and updated dependencies to be more accurate (Python module dependencies are build and run deps, but not install deps; autotools and cmake are build deps; Python modules used from C++ programs are run deps (ParaView)).

@mathstuf (Contributor, Author)

@tgamblin Ping? :)

@tgamblin (Member)

Reviewing it this afternoon! 😄

@tgamblin (Member)

ok so by this afternoon I meant late tonight, but here goes.

m.cmake = which("cmake")
def _cmake(*args, **kwargs):
    which('cmake')(*args, **kwargs)
m.cmake = _cmake
(Member)

Could also just do m.cmake = Executable('cmake') here.

@tgamblin (Member)

@mathstuf: also I can make the change if you want.

if deptypes is None:
    # The default deptype is build and link because the common case is to
    # build against a library which then turns into a runtime dependency
    # due to the linker.
(Member)

Looks like an outdated comment. From the description of run and link, I got the impression that run is primarily for extensions of Python and the like.

I suppose mpi would be build+link+run?

(Member)

Probably not, actually. The package itself does not call mpirun at runtime. The user calls that before the app runs. So, MPI is just a build and link dependency of the application. A run dependency would be something like curl for git.

(Member)

curl and git are probably build dependencies?

For a distant observer like me, it is not clear what run dependencies are in the context of a package manager like Spack. You don't normally use/run an application through Spack, do you? Maybe this could be clarified in the description of run at the top.

(Member)

An easier way to think about this is whether, when A -> B, B needs to be in the PATH when you module load A in order for A to work.

(Member)

If I am not mistaken, run is thereby used exclusively in spack load and should not affect the environment of spack install at all? I think the example you mentioned would help package developers understand the difference, so I would mention it in the description of run.

(Member)

Yep.

To understand this you have to distinguish runtime from build time. run dependencies are for when a user runs the installed package. build dependencies are for when spack builds the package. Those are two different environments. This is the same reason why things like Package.setup_environment() distinguish between spack_env and run_env.
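
For illustration, a sketch of how a package might use that hook (the variable names follow the setup_environment() signature mentioned above; the package and its settings are hypothetical):

    from spack import *

    class Foo(Package):
        """Hypothetical package; versions, deps, and install() elided."""

        def setup_environment(self, spack_env, run_env):
            # spack_env: modifications applied while Spack builds this
            # package.
            spack_env.set('FOO_BUILD_FLAG', '1')
            # run_env: modifications a user gets when loading the
            # installed package's module (e.g. via `spack load foo`).
            run_env.prepend_path('PATH', self.prefix.bin)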

@alalazo (Member) commented Jul 14, 2016

@tgamblin

I want to merge this with one change -- can we change deptypes= to type= in the depends_on directive?

That way you'll mask the builtin type...

@tgamblin (Member) commented Jul 14, 2016

I want to merge this with one change -- can we change deptypes= to type= in the depends_on directive?

That way you'll mask the builtin type...

Only within the depends_on directive, where it's in scope, and if we have to we can use **kwargs to avoid that. It doesn't affect packagers.

In a package, kwargs don't pollute the surrounding scope.
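
A small sketch of the scoping point (plain Python, not Spack's actual directive code):

    def depends_on(spec, type=None):
        # `type` shadows the builtin, but only inside this function body.
        print(spec, type)

    def depends_on_kw(spec, **kwargs):
        # With **kwargs, the builtin type() stays usable in the body.
        deptypes = kwargs.get("type", ("build", "link"))
        print(spec, type(deptypes))  # builtin still resolves here

    # The packager's call site looks the same either way:
    depends_on("py-numpy", type=("build", "run"))
    depends_on_kw("py-numpy", type=("build", "run"))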

When var/spack/stage is a symlink, the tests fail since realpath is used
on the resulting path, but not the original path, so the string compare
fails. Smarter path testing might be a better option.

deptypes: allow for different queries

For things like Python modules, they are required for the build and
runtime, but rather than adding a new parameter for what to query across
the dependency DAG, just expose the recursive query parameter.

We want the run dependencies of all build and run dependencies of the
current package. Unsure about this, but this is probably true.

Sundials doesn't use CMake directly, but it is referenced in the patch
step. I suspect its build system calls CMake somewhere else.

@mathstuf (Contributor, Author)

@tgamblin Last commit fixes the "Can only install concrete packages" error.

@mathstuf (Contributor, Author)

(Still building Qt.)

@mathstuf (Contributor, Author)

Qt built successfully.

@tgamblin merged commit 077848f into spack:develop Jul 15, 2016
@mathstuf deleted the dep-types branch July 15, 2016 15:40
@paulhopkins mentioned this pull request Jul 18, 2016
climbfuji added a commit to climbfuji/spack that referenced this pull request Dec 11, 2023