jagan-mathematics/Tensorflow-Extended-tutorial

Tensorflow-Extended-tutorial

Model Centric

   In the model-centric approach, data scientists strive to make the data fit their model through feature engineering. They start with a base model, and if the existing model fails they develop a new one that adequately addresses the problem. In this approach the data is kept fixed after standard preprocessing, and the model is iteratively improved to deal with the noise in the data.

Data Centric

   In the data-centric approach, data scientists examine the data with the right analysis techniques. They invest much of their time in ensuring data quality, and data consistency is key. They build complex visualizations to understand the data. In this approach the code and algorithms are held fixed, and the data quality is iteratively improved.

In my view, a good AI solution needs a balance between the model and the data quality, though I lean toward the data side. Andrew Ng and his team demonstrated that data quality is key through an experiment on real-world data.

The common practice among researchers is to hold the data fixed while trying to improve the code. But when the dataset is of modest size (<10,000 examples), Andrew Ng suggests that ML teams will make faster progress, provided the dataset is good.

The table below is a result of the experiment, which shows why the data-centric approach can work better than the model-centric one. If your model is already at its best, improving it to reach 90% accuracy sounds almost impossible.

For the model-centric approach, the improvements were based on network architecture search and on using state-of-the-art architectures, whereas for the data-centric approach, the work was to identify inconsistencies and clean noisy labels. The results show what the data-centric approach achieves.
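One simple way to "identify inconsistencies" is to look for inputs that appear more than once with different labels. The sketch below is only an illustration of that idea, not the procedure used in the experiment: the function names, the normalization rule, and the toy data are all assumptions.

```python
from collections import Counter, defaultdict


def normalize(text):
    # Lowercase and collapse whitespace so near-identical inputs compare equal.
    return " ".join(text.lower().split())


def find_label_conflicts(examples):
    """Return {normalized_text: {label: count}} for inputs that carry more than one label."""
    labels_by_text = defaultdict(Counter)
    for text, label in examples:
        labels_by_text[normalize(text)][label] += 1
    return {
        text: dict(counts)
        for text, counts in labels_by_text.items()
        if len(counts) > 1
    }


# Hypothetical toy data: the same sentence labeled two different ways.
examples = [
    ("Great product", "positive"),
    ("great  product", "negative"),
    ("Arrived late", "negative"),
    ("Arrived late", "negative"),
]

conflicts = find_label_conflicts(examples)
print(conflicts)
```

The conflicting groups are candidates for manual review or relabeling. Exact-duplicate matching only catches the most obvious inconsistencies, so real label cleaning usually needs more than this.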



Andrew Ng mentioned how everyone jokes that ML is 80% data preparation, but no one seems to care. A quick look at arXiv gives an idea of the direction ML research is going. There is unprecedented competition around beating the benchmarks: if Google has BERT, then OpenAI has GPT-3. But these fancy models make up only 20% of a business problem.

Model Centric -> Data Centric
   Data is the main fuel for every type of machine learning model, and the stakes are high in AI development, so achieving high-quality data is the core concern. Meaningful data is not only scarce and noisy but also very expensive to obtain. To follow a data-centric approach, we need to feed our model complete, relevant, consistent, and sufficient data. In many real-world problems, not much data is available, and the more data we have, the more noise is present. We can counter this with the right hyperparameters and model choice to achieve generalizable results. But the better the quality of the data, the higher the probability that several models will do well.

In fact, MLOps is essential to connect the dots and take these steps to the next level while ensuring consistency, completeness, and relevance. The most important objective of MLOps is to ensure a high-quality and consistent flow of data throughout all stages of a project.

How does MLOps help us attain a data-centric approach?
   If a model in production is to give good results and get better over time, it needs to be trained on high-quality data, and it has to be built and tuned continuously to ensure consistent performance. MLOps ensures model consistency through repeated training on the most relevant and recent data. It also helps to counter training-serving skew.
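Training-serving skew can be made concrete by comparing how often each value of a categorical feature appears in the training data versus the serving data. The sketch below uses the L-infinity distance (the largest per-value frequency gap) as one possible measure. The function names, the toy values, and the 0.1 threshold are assumptions for illustration, and this is plain Python rather than any TFX API.

```python
from collections import Counter


def value_frequencies(values):
    """Map each distinct value to its relative frequency."""
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def linf_distance(training_values, serving_values):
    """Largest absolute difference in relative frequency over all values."""
    train = value_frequencies(training_values)
    serve = value_frequencies(serving_values)
    keys = set(train) | set(serve)
    return max(abs(train.get(k, 0.0) - serve.get(k, 0.0)) for k in keys)


# Hypothetical feature "payment_type" seen at training time and at serving time.
training = ["card", "card", "cash", "cash"]
serving = ["card", "card", "card", "cash"]

SKEW_THRESHOLD = 0.1  # assumed tolerance; tune per feature
distance = linf_distance(training, serving)
skewed = distance > SKEW_THRESHOLD
print(distance, skewed)
```

A check like this, run whenever fresh serving data arrives, is one signal that a model should be retrained on more recent data.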

Enterprises want to achieve a number of goals by successfully implementing ML through MLOps systems, including:

  • Deployment and automation
  • Reproducibility of models and predictions
  • Governance and regulatory compliance
  • Scalability
  • Monitoring and management
TensorFlow Extended
   TFX is a TensorFlow-based platform for hosting end-to-end machine learning pipelines. The TFX framework is used to build pipelines that clean data and train and serve production-ready machine learning systems. TFX provides a modular, flexible, collaborative, accessible, and easy-to-use MLOps platform. Each TFX component allows proper storage, configuration, and orchestration of ML models.
Orchestrators in TFX automate task execution and monitor TFX components. One of the orchestrators TFX supports is Apache Beam. Apache Beam is a unified batch and stream processing API that acts as an abstraction layer on top of distributed processing frameworks. This lets you run on diverse backends such as Apache Spark, a local runner, Dataflow, etc.
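To make the orchestrator's job concrete, here is a toy sketch of running pipeline components in dependency order. It is not the TFX API: the component names are borrowed from TFX's standard components, while the dependency map and the `run` helper are assumptions for illustration. It relies only on the standard library (`graphlib`, Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key lists the components whose outputs it consumes (illustrative wiring).
PIPELINE = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
}


def execution_order(pipeline):
    """Return component names so that every dependency comes before its consumer."""
    return list(TopologicalSorter(pipeline).static_order())


def run(pipeline, execute):
    """Call execute(name) for each component in dependency order; return the order used."""
    finished = []
    for name in execution_order(pipeline):
        execute(name)
        finished.append(name)
    return finished


order = run(PIPELINE, execute=lambda name: print(f"running {name}"))
```

A real orchestrator does this scheduling and also tracks component state, retries, and artifacts. The sketch covers only the ordering.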


This repo gives a wide range of ideas on how to use each TFX component, both standalone and as part of an MLOps pipeline. All notebooks in this repo depend on each other: each notebook expects the previous one to have been executed. Each notebook explains the standalone execution of a component and orchestrates it using the interactive context from TFX. We use the metadata store heavily to establish links between notebooks. Follow the sequence mentioned below:

A few steps are needed for smooth learning:

step 1:
      Clone the repo and create an environment in the path [root_dir]/Tensorflow-Extended-tutorial
            python -m venv env
      Activate the environment using the command below.
      If you are using Windows:
            env\Scripts\activate
      If you are using a Linux-based system:
            source env/bin/activate

step 2:
      Install all required packages:
            pip install -r requirements.txt

step 3:
      For the model training pipeline, you need to download some pretrained model weights from here and extract them in the path
            [root_dir]/Tensorflow-Extended-tutorial/models
                        (or)
      you can download them on the fly by changing the value of the parameter in the config.py file
            FILE PATH: [root_dir]/Tensorflow-Extended-tutorial/utils/configurations
      change line 15 to => UNIVERSAL_EMBEDDING_MODEL = "https://tfhub.dev/google/universal-sentence-encoder/4"
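The choice in step 3 (local weights versus the TF Hub URL) could also be made automatically. The sketch below is an assumption about how such a setting might be written and is not the contents of the repo's config.py. The `resolve_embedding_model` helper and the relative `models` directory are hypothetical, and only the URL comes from the step above.

```python
import os

# URL taken from step 3; used when no local weights are found.
HUB_URL = "https://tfhub.dev/google/universal-sentence-encoder/4"


def resolve_embedding_model(local_dir, hub_url=HUB_URL):
    """Prefer a non-empty local directory of extracted weights, else fall back to the hub URL."""
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        return local_dir
    return hub_url


# Hypothetical usage: "models" stands in for the directory the weights were extracted to.
UNIVERSAL_EMBEDDING_MODEL = resolve_embedding_model("models")
print(UNIVERSAL_EMBEDDING_MODEL)
```

With the hub URL, the model is downloaded on first use, so network access is needed at that point.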

Everything is done!! Let's go!!


The sequence to follow for a better understanding of TFX is given below. The notebooks are designed so that each one depends on the previous one.
