This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Commit

Staging to master to add the latest fixes (#503)
* update mlflow version to match the other azureml versions

* Update generate_conda_file.py

* added temporary

* doc: update github url references

* docs: update nlp recipes references

* Minor bug fix for the multi-language text classification notebook

* remove bert and xlnet notebooks

* remove obsolete tests and links

* Add missing tmp directories.

* fix import error and max_nodes for the cluster

* Minor edits.

* Attempt to fix test device error.

* Temporarily pin transformers version

* Remove gpu tags temporarily

* Test whether device error also occurs for SequenceClassifier.

* Revert temporary changes.

* Revert temporary changes.
miguelgfierro authored and saidbleik committed Nov 30, 2019
1 parent 967abcd commit ed04438
Showing 22 changed files with 125 additions and 3,199 deletions.

README.md: 2 additions & 0 deletions

@@ -85,6 +85,8 @@ The following is a list of related repositories that we like and think are useful
|[AzureML-BERT](https://github.com/Microsoft/AzureML-BERT)|End-to-end recipes for pre-training and fine-tuning BERT using Azure Machine Learning service.|
|[MASS](https://github.com/microsoft/MASS)|MASS: Masked Sequence to Sequence Pre-training for Language Generation.|
|[MT-DNN](https://github.com/namisan/mt-dnn)|Multi-Task Deep Neural Networks for Natural Language Understanding.|
+|[UniLM](https://github.com/microsoft/unilm)|Unified Language Model Pre-training.|



## Build Status
SETUP.md: 5 additions & 5 deletions

@@ -47,9 +47,9 @@ You can learn how to create a Notebook VM [here](https://docs.microsoft.com/en-u
We provide a script, [generate_conda_file.py](tools/generate_conda_file.py), that generates a conda environment YAML file you can use to create the target environment with Python 3.6 and all the correct dependencies.

-Assuming the repo is cloned as `nlp` in the system, to install **a default (Python CPU) environment**:
+Assuming the repo is cloned as `nlp-recipes` in the system, to install **a default (Python CPU) environment**:

-cd nlp
+cd nlp-recipes
python tools/generate_conda_file.py
conda env create -f nlp_cpu.yaml
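As a usage note beyond the diff itself, the newly created environment would then typically be activated with `conda activate nlp_cpu` (or `conda activate nlp_gpu` for the GPU variant below) before running any notebooks.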

@@ -62,7 +62,7 @@ Click on the following menus to see how to install the Python GPU environment:

Assuming that you have a GPU machine, to install the Python GPU environment, which by default installs the CPU environment:

-cd nlp
+cd nlp-recipes
python tools/generate_conda_file.py --gpu
conda env create -n nlp_gpu -f nlp_gpu.yaml

@@ -79,7 +79,7 @@ Assuming that you have an Azure GPU DSVM machine, here are the steps to setup th

2. Install the GPU environment.

-cd nlp
+cd nlp-recipes
python tools/generate_conda_file.py --gpu
conda env create -n nlp_gpu -f nlp_gpu.yaml

@@ -110,7 +110,7 @@ Running the command tells pip to install the `utils_nlp` package from source in

> It is also possible to install directly from GitHub, which is the best way to utilize the `utils_nlp` package in external projects (while still reflecting updates to the source, as it is installed as an editable `-e` package).
-> `pip install -e git+git@github.com:microsoft/nlp.git@master#egg=utils_nlp`
+> `pip install -e git+git@github.com:microsoft/nlp-recipes.git@master#egg=utils_nlp`
Either command above makes `utils_nlp` available in your conda virtual environment. You can verify it was properly installed by running:

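The verification snippet itself is truncated in this view. As a minimal sketch of such a check (assuming only that the editable install succeeded; the original command is not shown here), run from a Python prompt:

    import utils_nlp                  # should import without error
    print(utils_nlp.__file__)         # for an editable install, this resolves inside the cloned repo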
docs/source/conf.py: 1 addition & 1 deletion
@@ -34,7 +34,7 @@
# The full version, including alpha/beta/rc tags
release = VERSION

prefix = "NLP"
prefix = "NLPRecipes"

# -- General configuration ---------------------------------------------------

docs/source/index.rst: 2 additions & 2 deletions
@@ -2,9 +2,9 @@
NLP Utilities
===================================================

-The `NLP repository <https://github.com/Microsoft/NLP>`_ provides examples and best practices for building NLP systems, provided as Jupyter notebooks.
+The `NLP repository <https://github.com/microsoft/nlp-recipes>`_ provides examples and best practices for building NLP systems, provided as Jupyter notebooks.

-The module `utils_nlp <https://github.com/microsoft/nlp/tree/master/utils_nlp>`_ contains functions to simplify common tasks used when developing and
+The module `utils_nlp <https://github.com/microsoft/nlp-recipes/tree/master/utils_nlp>`_ contains functions to simplify common tasks used when developing and
evaluating NLP systems.

.. toctree::
examples/entailment/entailment_xnli_bert_azureml.ipynb: 4 additions & 4 deletions

@@ -45,7 +45,7 @@
"from azureml.core.runconfig import MpiConfiguration\n",
"from azureml.core import Experiment\n",
"from azureml.widgets import RunDetails\n",
"from azureml.core.compute import ComputeTarget\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.exceptions import ComputeTargetException\n",
"from utils_nlp.azureml.azureml_utils import get_or_create_workspace, get_output_files"
]
@@ -169,7 +169,7 @@
"except ComputeTargetException:\n",
" print(\"Creating new compute target: {}\".format(cluster_name))\n",
" compute_config = AmlCompute.provisioning_configuration(\n",
" vm_size=\"STANDARD_NC6\", max_nodes=1\n",
" vm_size=\"STANDARD_NC6\", max_nodes=NODE_COUNT\n",
" )\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
" compute_target.wait_for_completion(show_output=True)\n",
@@ -524,9 +524,9 @@
"metadata": {
"celltoolbar": "Tags",
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python (nlp_gpu_transformer_bug_bash)",
"language": "python",
"name": "python3"
"name": "nlp_gpu_transformer_bug_bash"
},
"language_info": {
"codemirror_mode": {
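As background (not part of the diff), the kernelspec's `name` field refers to a Jupyter kernel registered on the machine; such a kernel is typically created from an activated conda environment with `python -m ipykernel install --user --name nlp_gpu_transformer_bug_bash --display-name "Python (nlp_gpu_transformer_bug_bash)"`.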
@@ -175,7 +175,7 @@
"metadata": {},
"source": [
"This step downloads the pre-trained [AllenNLP](https://allennlp.org/models) pretrained model and registers the model in our Workspace. The pre-trained AllenNLP model we use is called Bidirectional Attention Flow for Machine Comprehension ([BiDAF](https://www.semanticscholar.org/paper/Bidirectional-Attention-Flow-for-Machine-Seo-Kembhavi/007ab5528b3bd310a80d553cccad4b78dc496b02\n",
")) It achieved state-of-the-art performance on the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) dataset in 2017 and is a well-respected, performant baseline for QA. AllenNLP's pre-trained BIDAF model is trained on the SQuAD training set and achieves an EM score of 68.3 on the SQuAD development set. See the [BIDAF deep dive notebook](https://github.com/microsoft/nlp/examples/question_answering/bidaf_deep_dive.ipynb\n",
")) It achieved state-of-the-art performance on the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) dataset in 2017 and is a well-respected, performant baseline for QA. AllenNLP's pre-trained BIDAF model is trained on the SQuAD training set and achieves an EM score of 68.3 on the SQuAD development set. See the [BIDAF deep dive notebook](https://github.com/microsoft/nlp-recipes/examples/question_answering/bidaf_deep_dive.ipynb\n",
") for more information on this algorithm and AllenNLP implementation."
]
},
examples/text_classification/README.md: 0 additions & 3 deletions

@@ -19,8 +19,5 @@ The following summarizes each notebook for Text Classification. Each notebook pr
|Notebook|Environment|Description|Dataset|
|---|---|---|---|
|[BERT for text classification on AzureML](tc_bert_azureml.ipynb) |Azure ML|A notebook which walks through fine-tuning and evaluating pre-trained BERT model on a distributed setup with AzureML. |[MultiNLI](https://www.nyu.edu/projects/bowman/multinli/)|
-|[XLNet for text classification with MNLI](tc_mnli_xlnet.ipynb)|Local| A notebook which walks through fine-tuning and evaluating a pre-trained XLNet model on a subset of the MultiNLI dataset|[MultiNLI](https://www.nyu.edu/projects/bowman/multinli/)|
-|[BERT for text classification of Hindi BBC News](tc_bbc_bert_hi.ipynb)|Local| A notebook which walks through fine-tuning and evaluating a pre-trained BERT model on Hindi BBC news data|[BBC Hindi News](https://github.com/NirantK/hindi2vec/releases/tag/bbc-hindi-v0.1)|
-|[BERT for text classification of Arabic News](tc_dac_bert_ar.ipynb)|Local| A notebook which walks through fine-tuning and evaluating a pre-trained BERT model on Arabic news articles|[DAC](https://data.mendeley.com/datasets/v524p5dhpj/2)|
|[Text Classification of MultiNLI Sentences using Multiple Transformer Models](tc_mnli_transformers.ipynb)|Local| A notebook which walks through fine-tuning and evaluating a number of pre-trained transformer models|[MultiNLI](https://www.nyu.edu/projects/bowman/multinli/)|
|[Text Classification of Multi Language Datasets using Transformer Model](tc_multi_languages_transformers.ipynb)|Local|A notebook which walks through fine-tuning and evaluating a pre-trained transformer model for multiple datasets in different languages|[MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) <br> [BBC Hindi News](https://github.com/NirantK/hindi2vec/releases/tag/bbc-hindi-v0.1) <br> [DAC](https://data.mendeley.com/datasets/v524p5dhpj/2)
