Basics of NLP in Python Using the spaCy Library
- Natural Language Processing (NLP) is the capability of a computer to understand human language or speech, interpret it, and carry out the required task.
- NLP is a subset of AI. It enables computer systems to understand and process natural human language by developing programs that analyze large amounts of natural-language data.
- It is mainly concerned with gaining useful knowledge and insights from raw textual data.
- NLU : Natural Language Understanding
- NLG : Natural Language Generation
Natural Language Understanding (NLU)
- Focuses on understanding the meaning of the given user input and classifying it into the proper intents and entities.
- Intent : The verb of the user's request. It is used to capture what the user wants done, such as a request or an action.
- Entity : The noun, or content, on which that action (intent) is performed.
- Example : Play (intent) Kishore Kumar Songs (entity).
- It is the reading aspect of NLP.
- It maps the given input into a useful representation.
- It analyzes different aspects of the language based on grammar.
- Sentiment Analysis : Scoring a text as positive or negative.
- Topic Classification : Detecting the type of a tweet, message, or e-mail.
- Entity Detection : Detecting nouns in a text such as locations, games, or names.
Examples :
- Alexa
- Siri
- Google Assistant
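The intent and entity split described above can be illustrated with a deliberately tiny rule-based sketch. It is not how spaCy or a voice assistant does it: the `KNOWN_INTENTS` set and the `parse_command` helper are hypothetical names invented for this illustration, and the only rule is "the first word is the verb".

```python
# Hypothetical intent vocabulary; a real NLU system would learn intents from data.
KNOWN_INTENTS = {"play", "call", "open", "search"}


def parse_command(text):
    """Split a command into (intent, entity) using a first-word rule."""
    words = text.strip().split()
    if not words:
        return None, ""
    first = words[0].lower()
    if first in KNOWN_INTENTS:
        # The leading verb is the intent; the remaining words are the entity.
        return first, " ".join(words[1:])
    # No known intent found: treat the whole input as the entity.
    return None, " ".join(words)


intent, entity = parse_command("Play Kishore Kumar Songs")
print(intent, "->", entity)
```

A production system would replace the first-word rule with a trained classifier and an entity recognizer, but the input and output have the same shape.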
Natural Language Generation (NLG)
- Acts as a translator that converts structured computer data into the corresponding natural-language representation.
- Involves content (text) planning, lexicalization, sentence planning, aggregation, and text realization.
- Produces natural-language output from non-linguistic input. It is the writing aspect of NLP.
- Generating analytical reports in a natural language such as English or Spanish.
- Enabling chatbots to interact more efficiently.
- Automated content writing, such as articles and stories.
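As a rough illustration of NLG turning structured data into text, the sketch below fills a fixed sentence template from a small record. The `realize_report` function and the record fields (`metric`, `previous`, `current`, `period`) are made up for this example; real NLG systems use far richer planning than a single template.

```python
def realize_report(record):
    """Turn a structured record into one English sentence."""
    change = record["current"] - record["previous"]
    # Lexicalization: choose a word for the direction of the change.
    if change > 0:
        verb = "rose"
    elif change < 0:
        verb = "fell"
    else:
        verb = "stayed flat"
    # Text realization: fill a fixed sentence template.
    if change == 0:
        return f"{record['metric']} {verb} at {record['current']} in {record['period']}."
    return (
        f"{record['metric']} {verb} by {abs(change)} "
        f"to {record['current']} in {record['period']}."
    )


record = {"metric": "Sales", "previous": 120, "current": 150, "period": "March"}
print(realize_report(record))  # Sales rose by 30 to 150 in March.
```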
Applications of NLP :
- Automated Question Answering
- Chatbots
- Sentiment Analysis
- Spam Message Detection
- Machine Translation (e.g. Google Translate)
- Spelling and Grammar Correction
- Speech-To-Text Conversion & Speech Recognition
- Information Retrieval & Web Searching
Python libraries for NLP :
- spaCy
- scikit-learn
- Natural Language Toolkit (NLTK)
- TextBlob
- Quepy
- A token is each word, punctuation mark, or symbol in a sentence.
- Example : "Hello! Welcome to my Github Profile." Here every word and every punctuation mark is a token.
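The example sentence can be tokenized with spaCy as a quick check. This sketch assumes spaCy is installed and uses `spacy.blank("en")`, which builds an English pipeline containing only the tokenizer, so no model download is required.

```python
import spacy

# A blank English pipeline has only the tokenizer; no trained model is needed.
nlp = spacy.blank("en")
doc = nlp("Hello! Welcome to my Github Profile.")

# Each element of the Doc is a Token; .text gives the original string.
tokens = [token.text for token in doc]
print(tokens)
```

The punctuation marks come out as separate tokens rather than staying attached to the neighbouring words.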
Token Attributes
Attribute | Description |
---|---|
.text | The original word text |
.lemma_ | The base form of the word |
.pos_ | The simple (coarse-grained) part-of-speech tag |
.shape_ | The word shape: capitalization, punctuation, digits |
.dep_ | The syntactic dependency, i.e. the relation between tokens |
.is_alpha | Does the token consist of alphabetic characters? |
.is_stop | Is the token part of the stop list, i.e. one of the most common words of the language? |
.tag_ | The detailed (fine-grained) part-of-speech tag |
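A few of these attributes can be inspected without a trained model. The sketch below again assumes a blank English pipeline, so it only prints the lexical attributes (`.text`, `.shape_`, `.is_alpha`, `.is_stop`); `.lemma_`, `.pos_`, `.tag_`, and `.dep_` are only filled in by a trained pipeline such as the English models installed later in this README.

```python
import spacy

# Tokenizer-only pipeline: lexical attributes work, linguistic annotations do not.
nlp = spacy.blank("en")
doc = nlp("Hello! Welcome to my Github Profile.")

# Collect (text, shape, is_alpha, is_stop) for every token.
rows = [(t.text, t.shape_, t.is_alpha, t.is_stop) for t in doc]
for row in rows:
    print(row)
```

To see the remaining attributes, load a trained pipeline with `spacy.load` in place of `spacy.blank("en")` and read the same attributes from its tokens.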
- Anaconda Navigator : Download Anaconda Individual Edition and choose the Python 3.7 installer.
- Install spaCy by following the steps below:
- Run Anaconda Command Prompt as Administrator.
- Type
conda install -c conda-forge spacy
or
pip install -U spacy
- A list of packages will be displayed, followed by a prompt asking for approval to proceed. Type
y
and press Enter.
- Now download the English model you want.
For the default English model (50 MB), type the command
python -m spacy download en
For the large model, type the command
python -m spacy download en_core_web_lg
Then type the command
python -m spacy link en_core_web_lg en1
to link the language model with the name "en1".
- You are good to go once you see the message
You can now load the model via spacy.load('en1')
in the Anaconda Command Prompt.
Now we can load the large model using "en1".
Note: the shortcut name en and the spacy link command exist only in spaCy v2. In spaCy v3 and later, download a model by its full package name (for example en_core_web_sm) and pass that same name to spacy.load.
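Once a model is installed, loading it from Python might look like the sketch below. The `load_english` helper is a hypothetical convenience written for this README, not part of spaCy: it tries the "en1" link and the `en_core_web_lg` package from the steps above, then `en_core_web_sm` (an assumption, since the steps do not install it by that name), and finally falls back to a tokenizer-only pipeline so the snippet still runs when no model is present.

```python
import spacy


def load_english(candidates=("en1", "en_core_web_lg", "en_core_web_sm")):
    """Return the first loadable English pipeline, or a blank one."""
    for name in candidates:
        try:
            return spacy.load(name)
        except OSError:
            # spaCy raises OSError when a model name or link cannot be found.
            continue
    # Fallback: tokenizer only, no tagger, parser, or entity recognizer.
    return spacy.blank("en")


nlp = load_english()
doc = nlp("Hello world")
print([token.text for token in doc])
```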
- Library Used : spaCy
Name | File |
---|---|
NLP Basic | NLP_spaCy_Basics.ipynb |
**Will be updating it further. Keep checking!**