Decide on the data storage management directory name #85

Open
stefaniuk opened this issue Jul 4, 2023 · 5 comments

Comments

@stefaniuk
Contributor

stefaniuk commented Jul 4, 2023

The repository template does not endorse any particular data storage technology; there are several options, each with trade-offs.

Some common directory names for managing data storage creation, upgrades and migrations are:

  • database: Simple and straightforward name that clearly indicates that the directory contains database-related code.
  • db: This is a shorter version of the above. Feels lazy.
  • migrations: If the directory is specifically for managing database upgrades and migrations, this name could be more appropriate. It's commonly used in frameworks like Django and Rails, which have built-in database migration functionality.
  • sql: If the database code is primarily written in SQL, you might choose this name to indicate the nature of the code, but in most cases this would be too narrow.
  • schema: This could be a good name if the directory contains just the schema definition for your database, which is not the case for larger projects.
  • data: This name might be appropriate if the directory contains not only code but also data files, such as seed data for your database.
@regularfry
Contributor

Can I suggest that this is devolved to technology-specific overlays? Just thinking about Rails and Django, those two alone do quite different things and moving the defaults would be a world of pain and sadness.

If it's a question of having a common interface into these things, might we want to think about having standard make tasks that wrap anything underlying provided by the framework, maybe?

@stefaniuk
Contributor Author

The "technology-specific overlays" approach is certainly an option. It would be good to talk through how we would see it working, bearing in mind that increasing our range of options may also increase our maintenance burden.

One potential way to manage this could be automated repository synchronisation via GitHub Actions. Alternatively, or in addition, we could use a tool that merges these layers/overlays together to create the desired template that is applied to a new repository, similar to https://start.spring.io/ (this is just an example, as it has a slightly different purpose).

@regularfry
Contributor

The way I've done it in the past is to provide a CLI tool that Does The Right Thing(™️). That way the user-facing interface is in your control, not a third party's. It can call out to whatever repositories it needs to build the initial structure. The basic structure was a repository full of cookiecutter templates that got merged into a single output directory. That tool was also the entry-point into local docker setup and so on. An example session might look like this:

$ dev-tool new django my-new-project
# dev-tool forks the repo template, clones it locally, applies the django overlay, calls a `make bootstrap` task
$ cd my-new-project
$ dev-tool run # spin up all the things for a local instance to work
# now you do whatever goes into that initial commit and do the `git push`
$ dev-tool build # triggers the CI pipeline (possibly optional, depending on branching strategy)
$ dev-tool deploy --env=dev # this branch, running at a templated hostname
# beyond this point depends on how opinionated you want to be

Again, the benefit of doing it this way is that whether someone prefers GitHub + AWS or the Azure stack is abstracted away entirely, apart from a flag to dev-tool new.
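
As a rough illustration of the overlay merge described above, a minimal Python sketch might look like the following. It uses plain directory copies rather than cookiecutter, and the function name apply_overlays and the directory names in the usage note are hypothetical, not part of any existing tool.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def apply_overlays(base: Path, overlays: list[Path], output: Path) -> Path:
    """Copy the base template, then layer each overlay on top in order.

    When two layers provide the same file, the later layer wins.
    """
    # Start from the technology-neutral template.
    shutil.copytree(base, output, dirs_exist_ok=True)
    # Apply each technology-specific overlay over the result.
    for overlay in overlays:
        shutil.copytree(overlay, output, dirs_exist_ok=True)
    return output
```

A call such as apply_overlays(Path("templates/base"), [Path("templates/django")], Path("my-new-project")) would then produce a single output directory in which the overlay's files take precedence over the base template's.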

Pulling the initial templates into the project is relatively painless: you start with an existing project, rip out anything that isn't infrastructure, apply naming conventions, and add make tasks as the interface the CLI tool can call. That helps with mindshare, too: if you start from somewhere people already know, the rest has some familiarity for free.

An interface like start.spring.io is also a good idea, though, because what I learned from doing all that is that you always end up wanting a server-side component that can set up the right connections between GitHub, CI, test tools, and so on, without the user needing admin-level credentials. But having access to the cookiecutter templates locally makes developing the tool much easier, and once you've got the CLI tool, assuming it's been developed sanely, adding a web interface isn't that hard.

Maintenance burden is definitely a thing, but it's also unavoidable. There's no one-size-fits-all for issues like this one, so we either accept there's going to be a per-tech cost and standardise at the make task or shell script level, or we write off being able to standardise at all. Similarly, if we want to provide common log formats and tracing middleware, or similar, that's per-library config that can't be done without per-tech, per-framework code. The best you can hope to do is get contributors from each tech community to help with the load.

@timrickwood
Contributor

Side-stepping the technology-specific overlays conversation above (which feels like it could run for a while) and returning to the naming... This doesn't feel entirely straightforward. I might have preferred datastore over database. Is there any reason why some of the others couldn't be optional directories nested under that?

@regularfry
Contributor

The reason you need the overlays is that it's not something you can sidestep. Again, Rails vs Django as an example:

Proposal   | Rails location                                              | Django location
database   | ./db/                                                       | N/A
migrations | ./db/migrate/                                               | ./<app_name>/migrations/
sql        | ./db/structure.sql                                          | it's complicated
schema     | ./db/schema.rb or ./db/migrate/, depending on what you mean | ./<app_name>/models.py
data       | ./db/seeds.rb                                               | ./<app_name>/migrations [1]

The difficulty is that because frameworks differ in how much control they assume over your database schema, they also differ in how coupled they are to a specific directory layout. In Django's case, because it generates migrations from models directly, you can't realistically detach the database definition from where the domain logic is defined. It's all under that top-level ./<app_name> bit. It's historically been easier (or, at least, doable) to reorganise Rails a bit, but in the long run it's a bad plan to deviate from the framework defaults.

I just don't think standardising the directory structure can fly as an up-front specification for this specific case. I think it's more tractable to ask which operations we want to perform on the database and provide those operations as top-level commands, leaving the specific implementation of those commands to the frameworks.

Stealing wholesale from Rails (and deleting some of the more boring ones), a starting point would be:

  db:create
  db:drop
  db:fixtures:load
  db:migrate
  db:migrate:down
  db:migrate:redo
  db:migrate:status
  db:migrate:up
  db:prepare
  db:reset
  db:rollback
  db:schema:dump
  db:schema:load
  db:seed
  db:seed:replant
  db:setup
  db:structure:dump
  db:structure:load
  db:version
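
One way to picture this is a thin dispatcher that translates the shared operation names into whatever the framework already provides. The sketch below is an assumption about how that could look, not an existing tool: the COMMANDS table and the resolve function are hypothetical, and only a few operations with well-known Rails and Django equivalents are included.

```python
from __future__ import annotations

# Hypothetical mapping from shared operation names to each framework's own
# command line. Operations without a direct equivalent are left out on purpose.
COMMANDS: dict[str, dict[str, list[str]]] = {
    "rails": {
        "db:create": ["bin/rails", "db:create"],
        "db:migrate": ["bin/rails", "db:migrate"],
        "db:migrate:status": ["bin/rails", "db:migrate:status"],
        "db:rollback": ["bin/rails", "db:rollback"],
        "db:seed": ["bin/rails", "db:seed"],
    },
    "django": {
        "db:migrate": ["python", "manage.py", "migrate"],
        "db:migrate:status": ["python", "manage.py", "showmigrations"],
    },
}


def resolve(framework: str, operation: str) -> list[str]:
    """Return the argv list that implements `operation` for `framework`."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(COMMANDS[framework][operation])
    except KeyError:
        raise NotImplementedError(
            f"{operation} is not defined for {framework}"
        ) from None
```

A make task or CLI entry point could pass the result to subprocess.run. Operations a framework has no counterpart for fail loudly instead of being silently skipped, which keeps the per-framework gaps visible.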

For projects that aren't on a framework which does those things for them, it's not beyond the bounds of possibility that we could knock up some sort of wrapper for alembic that gets them off the ground (other tools are also available), and that wrapper could use the directory structure above as its own living standard. That's where database vs datastore would be an interesting conversation. But like I say, if we're living in a world where frameworks that manage data for you are a thing - and we should! they're good! - we can't be prescriptive about this particular aspect of directory layout.
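
For the no-framework case, a wrapper around the alembic command line could be sketched along these lines. The database/alembic.ini location, the ALEMBIC_ARGS table and the alembic_command function are all assumptions made for illustration; only the alembic subcommands themselves (upgrade, downgrade, current) and the -c option are existing CLI features.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: the wrapper keeps its Alembic configuration under a top-level
# `database/` directory. The directory name is exactly what is under discussion.
DEFAULT_CONFIG = Path("database") / "alembic.ini"

# Hypothetical translation of a few shared operation names to alembic arguments.
ALEMBIC_ARGS: dict[str, list[str]] = {
    "db:migrate": ["upgrade", "head"],    # apply all pending revisions
    "db:rollback": ["downgrade", "-1"],   # step back one revision
    "db:version": ["current"],            # show the current revision
}


def alembic_command(operation: str, config: Path = DEFAULT_CONFIG) -> list[str]:
    """Build the alembic argv for a shared operation name."""
    if operation not in ALEMBIC_ARGS:
        raise NotImplementedError(f"{operation} has no alembic mapping here")
    return ["alembic", "-c", config.as_posix(), *ALEMBIC_ARGS[operation]]
```

Because the configuration path is a parameter, renaming the directory from database to datastore would be a one-line change in such a wrapper.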

Footnotes

  1. there are other ways to do this, but that's the easiest


3 participants