How do you feel about the Kedro project template? #208
Comments
I am a big fan of keeping the data subdirectories because I think the data hierarchy that is being used makes sense. It's also good to keep intermediate data artifacts cached on disk for debugging and partial reruns, and doing so guides new users into thinking about how to manage such artifacts in their projects. I extended ProjectContext to save all outputs to the results folder in timestamped directories, but there's no such functionality out of the box (I point the data catalog to
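A minimal sketch of the timestamped-results idea described in this comment, assuming plain Python rather than any Kedro API. It is not the commenter's ProjectContext extension; the function name `timestamped_results_dir` and the timestamp format are hypothetical choices made for illustration.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_results_dir(base: Path, now: Optional[datetime] = None) -> Path:
    """Create and return a per-run directory named after the run's timestamp."""
    now = now or datetime.now()
    # Hypothetical format; dots instead of colons keep the name valid on Windows.
    run_dir = base / now.strftime("%Y-%m-%dT%H.%M.%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


# Usage: each run writes into its own folder under a results directory.
demo_base = Path(tempfile.mkdtemp()) / "results"
run_dir = timestamped_results_dir(demo_base, datetime(2020, 1, 2, 3, 4, 5))
```

Passing `now` explicitly keeps the helper easy to test; a real project would normally rely on the default.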
Perhaps we should include a short readme within the |
Thank you! This is great insight! I have a further question, is anyone using the global |
@yetudada Yes, I put the node logic there and in each module only try to expose functions which are used as nodes (helper functions are set to private). And then for most of the actual heavy duty code like models, evaluation, metrics I put them into submodules in another folder like |
Agreed with most of what @ThomasLittrell1 wrote. Suggestion: Opinions: I used results once where I dumped some charts and I used notebooks once where I created an analysis of what was done. I think it's okay to have them as then people know where to look for them, but would keep them
I really like gatsby's idea of starters. You can start a new project with any starter from the command line with A similar effect can be achieved with options in cookiecutter, but I think the community aspect with the gallery makes it really nice. Links: gatsby starters; how users contribute starters
We are currently using a custom version of node_global from kedro's predecessor. All of our nodes live in We do not use For normal use I would not recommend What happened to the .ipython directory? One of my biggest complaints with the template was the magic behind %run, and how it changes your current working directory. Our template makes it so that you can |
I actually like kedro because it has such an opinionated, detailed project template. If anything, more documentation (e.g. READMEs in each directory) would be helpful. I want to minimize the number of decisions I have to make about where data, results, code, docs, etc. have to go for both myself and my team. Just my two cents. |
@ThomasLittrell1 What else do you think would help with making Kedro less intimidating for new users? And how do you usually explain data and pipeline abstraction to them? And how can we make this easier for you? @khdlim What type of outputs do you have and are they listed in the Data Catalog? It sounds like you could use @nraw We do know about the library components of Kedro like the @WaylonWalker Would you want a way to create a directory structure your way? And, we'll look into the changes to the .ipython directory! Thanks for raising this. @benjaminjack This is super feedback! Thank you guys so much for this! |
@yetudada I think a curated list of starters is a very powerful tool and can help provide different recommended ways to start a project. I also see where certain teams will have their own internal things they will need to add that do not make sense, or may not be possible, to open source. For these cases should the recommendation be to fork the template repo you want to build from and have your users use a Not sure if this answers the question you were asking.
@yetudada Usually I'm just showing them a function, then showing the function in the context of the pipeline and, hey, that thing in To explain the abstractions, I try to make it as non-abstract as possible. Most of the people I work with come from notebooks, so I like to show how kedro builds on those. Ideas like: if you're used to having a project broken into different cells, each of those cells is now a node and executing a notebook or section of a notebook is like executing a pipeline. Then I point to how they read in data and maybe save a model after training to talk about a catalog and data abstraction. I'm honestly not sure how to improve on that process because the kedro docs/tutorial are already great. Maybe two quick ideas:
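The notebook analogy in the comment above (cells become nodes, running the notebook becomes running a pipeline, loaded and saved data become catalog entries) can be sketched in plain Python. This is only an illustration of the mental model and deliberately does not use Kedro's own classes; `run_pipeline` and the node tuples are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A "node" here is just (function, input names, output name).
Node = Tuple[Callable, List[str], str]


def drop_missing(raw):
    """What used to be a notebook cell that cleans the data."""
    return [x for x in raw if x is not None]


def compute_mean(clean):
    """What used to be a notebook cell that computes a summary."""
    return sum(clean) / len(clean)


def run_pipeline(nodes: List[Node], catalog: Dict[str, object]) -> Dict[str, object]:
    """Run nodes in order, reading inputs from and writing outputs to the catalog."""
    data = dict(catalog)
    for func, inputs, output in nodes:
        data[output] = func(*(data[name] for name in inputs))
    return data


pipeline = [
    (drop_missing, ["raw"], "clean"),
    (compute_mean, ["clean"], "mean"),
]
result = run_pipeline(pipeline, {"raw": [1, None, 3]})
```

The dictionary stands in for the data catalog: nodes never touch files directly, they only name the datasets they read and write.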
Good initiative @yetudada, thank you for sharing! I really like the suggested changes, although I would definitely keep Data Engineering Convention subdirectories in Another suggestion would be to provide a more opinionated template or framework for |
Hey everyone! Here's an action on the proposed changes to the project template:
Eventually, resulting in a
What we haven't figured out yet:
What else we're going to work on:
The way that cookiecutter is typically used is to run I think that it would be really cool to have various projects like the spaceflights tutorial already completed as examples that could easily be accessed with
There are quite a number of other cookie cutter templates out there. If memory serves me right many of them use the |
I completely agree with @WaylonWalker here regarding starters. To make the separation of Kedro's library and framework more explicit, I think we should remove One cautionary note about having too many starters though: A huge value Kedro brings to an organisation is the shared mental model of an analytics pipeline among different functions (DE, DS, SWE, etc.). This shared mental model is enforced by the convention laid out in |
As discussed in #397 and the Zoom meeting with Lais and @921kiyo yesterday, I prepared a simplified and enhanced Kedro project template suitable for both beginners and experts at: https://github.com/Minyus/kedro_template The major change is that
I would suggest this Kedro project template because users can:
This is great @Minyus! I think you might just be a candidate for the Kedro starters project that is due out in the next release of Kedro. @limdauto took inspiration from @WaylonWalker's idea and we have a way to use your version of the Kedro template to jumpstart your project. |
Hi everyone, we have just released 0.16.3. In this version, we add the ability for you to specify your own template when creating a new Kedro project by calling
Documentation for this feature can be found here: https://kedro.readthedocs.io/en/latest/02_get_started/06_starters.html. Please take a look and let me know if you have any questions, comments, or feedback. To add a bit more information, the mental model that we are trying to reinforce, which is also in line with what everyone is discussing here, is: modular pipelines are for business logic reusability and starters are for convention reusability. If you have any ideas on how to improve the synergy of these two features to make reusability in Kedro even more robust, we would love to learn more.
You can see a walk-through of Kedro Starters by checking out @dataengineerone. |
Thank you everyone for weighing in. In light of the starters being released, I believe this issue can now be closed. |
Introduction
The joy of open-sourcing Kedro is that we've been exposed to diverse opinions and use cases that we didn't think Kedro covered. We're going to proactively ask for feedback and you will see a lot more "How do you feel about ..." GitHub issues raised by the Kedro maintainers as we try to capture your thoughts on specific issues.
First in the series was a question around introducing telemetry into Kedro and this one is about the project template.
Context
The Kedro project template is based on a template derived from CookieCutter Data Science. Some of our open-source users immediately picked up this relationship and have said that Kedro is a version of CookieCutter Data Science that thought about a pipeline framework, data abstraction, and versioning.
The project template is core to us being able to help you create reusable analytics code, according to the CookieCutter Data Science philosophy, but we've had feedback that the template is considered overwhelming for new users because they're not sure why we create so many directories. We've also observed users not using all of the template, or even removing generated folders in their templates.
Examples of directory removal are present here:
Possible Implementation
There's thought around removing non-essential folders and creating directories when certain actions are taken.
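To make "creating directories when certain actions are taken" concrete, here is a minimal sketch in plain Python. It is not how Kedro implements anything; the `ACTION_DIRS` mapping and the `ensure_dir_for_action` helper are hypothetical, and the pairing of docs with `kedro build-docs` and logs with `kedro run` is an assumption read off the directory listing in this issue.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: docs is tied to `kedro build-docs` and logs to `kedro run`,
# as the directory listing in this issue suggests.
ACTION_DIRS = {"build-docs": "docs", "run": "logs"}


def ensure_dir_for_action(project_root: Path, action: str) -> Optional[Path]:
    """Create the directory tied to an action only when that action runs."""
    dirname = ACTION_DIRS.get(action)
    if dirname is None:
        return None  # this action has no on-demand directory
    target = project_root / dirname
    target.mkdir(parents=True, exist_ok=True)
    return target


# Usage: a fresh project has neither docs nor logs until an action needs them.
root = Path(tempfile.mkdtemp())
docs_dir = ensure_dir_for_action(root, "build-docs")
```

With this approach a new project starts with fewer folders, and each optional folder appears at the moment its purpose becomes obvious to the user.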
We're proposing the following categorization:
conf: the directory where all your project configuration is located. Using conf encourages a clear and strict separation between project code and configuration.
data
docs: where your auto-generated project documentation is saved (associated action: kedro build-docs is run)
logs (associated action: kedro run is run)
notebooks: the folder where you can store your Jupyter Notebooks
references
results
src
This would create the following template when you run kedro new:
Questions for you
We need your help in answering the following: