Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename src directory to project_name #140

Closed
oczkoisse opened this issue Aug 30, 2018 · 12 comments
Closed

Rename src directory to project_name #140

oczkoisse opened this issue Aug 30, 2018 · 12 comments
Milestone

Comments

@oczkoisse
Copy link

oczkoisse commented Aug 30, 2018

First off, thanks for the wonderful template! I think this should work very well for my future projects. However, I could appreciate your thoughts on some aspects of the template:

  • Filenames like make_features.py and make_dataset.py in the src package suggest that those are scripts, not modules. Wouldn't it be better separation to let the reusable code live in modules inside src package while the scripts reside in a scripts directory in project root? This way both the scripts and the notebooks could use the modules inside the src package. Could you please elaborate the rationale behind the current template layout, which seems to have just scripts in the src package?

  • If I make my own project-specific packages, do they go under src? If I have a custom package that models, say, the dataset structure, does it make more sense to put it under src/data instead?

One minor nitpick: the name src could have been renamed to the project name itself. It would be clearer, I think, and also integrate well with a popular project structure suggested here, although I do understand that it may be a matter of personal preference.

@TobiasSkovgaardJepsen
Copy link

TobiasSkovgaardJepsen commented Aug 31, 2018

As someone who recently discovered this template:

I had the same initial thought, but I think (and someone please correct me if I am wrong) that one should view the src subfolders (data, models, features) as separate apps. In that case, it makes sense that the make_dataset.py and make_features.py are stored in the folder specific to the app as a "main" file. This structure has high functional cohesion and is similar to how the projects in the web framework Django is organised: https://simpleisbetterthancomplex.com/tutorial/2018/08/27/how-to-create-custom-django-management-commands.html

One minor nitpick: the name src could have been renamed to the project name itself. It would be clearer, I think, and also integrate well with a popular project structure suggested here, although I do understand that it may be a matter of personal preference.

As you suggest, this is probably a matter of personal preference. For instance, I would usually call such a folder python because I usually also have a significant amount of SQL and bash scripts in sql and bash folders, respectively.

@pjbull
Copy link
Member

pjbull commented Aug 31, 2018

(1) Agreed that src -> {{project name}} makes sense. We'd accept that PR.
(2) At times it can be useful to distinguish scripts from packages. In our experience, we don't find that we need this distinction in most of the ML/DS projects we do. Often our code that ends up in a package also has a CLI that you can enter like a script. If projects get built up to the point where they have scripts and packages, users can create a new folder. Just not sure there is a strong case for this to be built in as boilerplate for the default. I think the simplicity of one folder for source code is probably best for now.

@oczkoisse
Copy link
Author

@TobiasSkovgaardJepsen Thanks for replying! I agree that considering the src subfolders as separate apps provides better separation and cohesion. Maybe I missed something in the documentation, but having no background in Django, this wasn't immediately clear to me. The fact that src is a package made me immediately think of the subfolders as subpackages, and everything in them as modules. My bad :) I do think, however, that this rationale could be briefly mentioned in the docs for better understanding for newcomers.

@pjbull I'd be happy to create a PR at some point, given that I'll probably need to get up to speed on cookiecutter. Along with @TobiasSkovgaardJepsen 's reply above, your clarification about the use of modules like CLI apps makes a lot of sense too. Thanks!

@pjbull
Copy link
Member

pjbull commented Oct 24, 2018

Renaming this issue to represent (1) above so that we can be in line with Python best practices, which in turn should help with any confusion about where packages are installed from as has been brought up in #124

@pjbull pjbull changed the title Shouldn't scripts be kept separately outside src directory? Rename src directory to project_name Oct 24, 2018
@hackalog
Copy link

I've been running with a variant of this idea for a couple of months. I added a {{ cookiecutter.module_name }} as there have been cases where the project name is a little unwieldy for use as the module name. I also set the default to src, so the existing behavior is unchanged unless you choose to use it.

In practice, I tend to use src until there's a reason to rename the module. (often, that's never).

I'll put together a diff for this

@hackalog
Copy link

On the make_features.py and so on, I think these should be invoked as modules. e.g. My makefile does things like:

fetch:
	$(PYTHON_INTERPRETER) -m {{ cookiecutter.module_name }}.data.make_dataset fetch

that -m is important, as the make_dataset.py can then do relative imports from within the src module itself

@hackalog
Copy link

I've made these changes (both adding module_name and using -m when invoking things like make_dataset.py in PR #148

@BadToast
Copy link

BadToast commented Oct 25, 2018

I was about to open a new issue for this and then found this. So I'm adding my thoughts here. I think this thread is moving in the right direction.

I do think that the cookiecutter should be using module_name as the default (@hackalog ). Or at least as one of the questions during the cookiecutter process you should be asking whether you want to use module_name or src. Here's my reasoning and I hope it comes across as simple and good-intentioned.

First, every python package on the planet uses its own name and not src. We'd be just following convention/best practices (which seems to be in line with what @pjbull is saying). Second, there's a very immediate problem if you use src for every package/egg you build using this pipeline. Specifically, you start having conflicts as soon as you have more than one package. So you if you have cat_herding and predictive_cats both wind up installing themselves to src. It shouldn't be src.models.predict for both. It should be cat_herding.models.predict and predictive_cats.models.predict.

My guess is that the original intent of this project was to act as a good template for just playing around. However, at some point you might want to build an actual package out of it. All the architecture is already there, it's just currently breaking convention.

Keep up the great work.

@leblancfg
Copy link

@pjbull, any news on this front, regarding @hackalog's PR? Thanks!

@asteppke
Copy link

asteppke commented Sep 3, 2019

Changing the package name away from src has many benefits as pointed out above.

What can be considered for the directory structure though would not be a simple rename of the src directory to {{project name}}. The disadvantage of this is then that we have project name/project name where one would immediately ask why there is a subfolder with exactly the same name as the parent. A way out of that would be to adopt the directory structure similar to what is suggested by the pyscaffold people:

./project name
    /docs
    /[...]
    /src
        /project name
            __init__.py
            skeleton.py
            /models
                predict.py
                [...]

    /[...]

Then we keep the nice high level directory description of the underlying files and still a descriptive name for the module.

@nraw
Copy link

nraw commented Nov 13, 2019

The disadvantage of this is then that we have project name/project name where one would immediately ask why there is a subfolder with exactly the same name as the parent.

Sorry, but that really does not sound like a strong argument in any sense? Similarly, the first question people would ask in the structure that you proposed is "Why is there a subfolder in src with the same name as the parent of src?"

@jayqi
Copy link
Member

jayqi commented Aug 30, 2023

Closing—implemented for V2.

@jayqi jayqi closed this as completed Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants