Assignment Submission Guidelines

Chinta Geetha Charan Reddy edited this page Oct 8, 2021 · 1 revision

Get Started:

  • Fork the repository to use the template.
  • Create a branch named {your_name}-{data_product_name}.
  • Rename the data product's root folder after the data source.
  • Add the script for each step to the appropriate folder.
  • Add any additional utility scripts to the utils folder.
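
To make the branch naming convention concrete, here is a small sketch that builds a name of the form {your_name}-{data_product_name}. The helper `branch_name` and its normalization (lowercasing, replacing spaces and punctuation with underscores) are assumptions made for illustration; the guidelines only fix the overall pattern.

```python
import re


def branch_name(your_name: str, data_product_name: str) -> str:
    """Build a branch name of the form {your_name}-{data_product_name}."""

    def slug(text: str) -> str:
        # Assumption: lowercase and collapse non-alphanumeric runs into "_"
        # so the single "-" stays unambiguous as the separator.
        return re.sub(r"[^a-z0-9]+", "_", text.strip().lower()).strip("_")

    return f"{slug(your_name)}-{slug(data_product_name)}"
```

For example, `branch_name("Jane Doe", "Rainfall Data")` gives `jane_doe-rainfall_data`, which can then be passed to `git checkout -b`.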

Writing Scripts for individual blocks:

  • Follow this general template when writing the abstraction class for each step:
# import your dependencies here


class MyClass:
    def __init__(self, **kwargs):
        self.config = kwargs.get("config")

    def do_something(self):
        """Do extraction or processing of data here"""
        return None

    def load_data(self):
        """Function to load data"""
        return None

    def save_data(self):
        """Function to save data"""
        return None

    def run(self):
        """Load data, do_something and finally save the data"""
        self.load_data()
        self.do_something()
        self.save_data()


if __name__ == "__main__":
    config = {}
    obj = MyClass(config=config)
    obj.run()

Notes:

The class may include additional helper methods if required.
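
As an illustration of the template, here is a minimal sketch of one filled-in step. The class name `CsvToJsonStep`, the config keys `input_path` and `output_path`, and the whitespace-stripping logic are hypothetical, chosen only for the example; they are not part of the assignment.

```python
import csv
import json
from pathlib import Path


class CsvToJsonStep:
    """Hypothetical step: read a CSV, strip whitespace, write JSON."""

    def __init__(self, **kwargs):
        self.config = kwargs.get("config") or {}
        self.rows = []

    def load_data(self):
        """Read the input CSV into a list of dicts."""
        with open(self.config["input_path"], newline="", encoding="utf-8") as f:
            self.rows = list(csv.DictReader(f))
        return self.rows

    def do_something(self):
        """Strip surrounding whitespace from every header and value."""
        # Assumes well-formed rows (no extra columns beyond the header).
        self.rows = [
            {k.strip(): (v or "").strip() for k, v in row.items()}
            for row in self.rows
        ]
        return self.rows

    def save_data(self):
        """Write the processed rows as JSON, creating the folder if needed."""
        out_path = Path(self.config["output_path"])
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(self.rows, f, indent=2)
        return out_path

    def run(self):
        """Load data, process it and finally save it."""
        self.load_data()
        self.do_something()
        return self.save_data()
```

A step like this would be run as `CsvToJsonStep(config={"input_path": ..., "output_path": ...}).run()`. Keeping the three stages as separate methods lets each one be tested on its own.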

Script formatting and linting:

  • Use the black formatter.
  • Use the flake8 linter.
  • A settings.json is provided in the .vscode directory.

Submitting the generated data:

  • Submit the data in one of two formats: CSV or JSON.
  • Submit the intermediate and final standardized data as follows:
    • Create a data directory outside the data product's root folder and push the data there.
    • Use Git LFS (Git Large File Storage) to commit those files.
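
The submission formats above can be sketched as a small helper that writes a list of records to the data directory as either CSV or JSON. The function name `save_records`, its parameters, and the file naming are assumptions for illustration; use whatever layout your data product needs.

```python
import csv
import json
from pathlib import Path


def save_records(records, data_dir, name, fmt="csv"):
    """Write a list of dicts to <data_dir>/<name>.<fmt> as CSV or JSON."""
    if fmt not in ("csv", "json"):
        raise ValueError("fmt must be 'csv' or 'json'")

    out_dir = Path(data_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{name}.{fmt}"

    if fmt == "csv":
        # Column order follows the keys of the first record.
        fieldnames = list(records[0].keys()) if records else []
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)
    else:
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(records, f, indent=2)

    return out_path
```

Note that CSV stores every value as text, so numeric fields come back as strings when the file is read again; JSON preserves basic types.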

Submitting all the deliverables:

  • Push the relevant scripts to your branch.
  • When you finish working on the data product, open a pull request from your branch to the original (upstream) repository.

Evaluation

  • The assignment will be evaluated based upon the following criteria