Using spaCy
spaCy is an open-source NLP library, similar to gensim. It includes useful modules such as displaCy, its built-in visualizer. spaCy is useful for NER because it uses a different set of entity types than nltk and can therefore label the same data differently. It also covers informal-language corpora, which is useful for chat messages and Tweets. spaCy is built for speed and is designed for production work rather than research.
Import modules
There are two ways to load spaCy models:
Using spacy.load
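import spacy

# Load an installed pipeline by its package name (the old "en" shortcut was removed in spaCy v3)
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
print([(w.text, w.pos_) for w in doc])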
Importing as a module
!python -m spacy download en_core_web_sm
import en_core_web_sm
nlp = en_core_web_sm.load()
doc = nlp(u"This is a sentence.")
print([(w.text, w.pos_) for w in doc])
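The two approaches can be combined into one helper that tries spacy.load first and falls back to importing the package and calling its load() function. This is a minimal sketch: load_model is a hypothetical name, and it assumes the pipeline package (e.g. en_core_web_sm) is already installed.

```python
import importlib


def load_model(package_name):
    """Hypothetical helper: load an installed spaCy pipeline by package name.

    Tries spacy.load() first; if spaCy is unavailable or cannot resolve the
    name, imports the package directly and calls its load() function.
    """
    try:
        import spacy
    except ImportError:
        spacy = None

    if spacy is not None:
        try:
            return spacy.load(package_name)
        except OSError:
            # spacy.load raises OSError when it cannot find the name
            # (e.g. a missing shortcut link); fall back to a plain import.
            pass

    # Same as "import en_core_web_sm; en_core_web_sm.load()"
    return importlib.import_module(package_name).load()


# Usage (assumes the package is installed):
# nlp = load_model("en_core_web_sm")
```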
spaCy also provides models for other languages, such as German, Spanish, Portuguese, and French, plus a multi-language model. The main difference is the language prefix of the package name: en becomes de, es, pt, fr, etc. (for example, en_core_web_sm becomes fr_core_news_sm).
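Since only the package name changes between languages, a small lookup table can build the download command. This is a sketch: the table lists the small pipeline packages named in spaCy's model directory for the languages above (check the directory for your spaCy version), and model_package and download_command are hypothetical helpers. The multi-language package only provides NER.

```python
# Small pipeline packages per language code (verify against spaCy's model directory)
SMALL_MODELS = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
    "es": "es_core_news_sm",
    "pt": "pt_core_news_sm",
    "fr": "fr_core_news_sm",
    "xx": "xx_ent_wiki_sm",  # multi-language, NER only
}


def model_package(lang):
    """Return the small pipeline package name for a language code."""
    try:
        return SMALL_MODELS[lang.lower()]  # package names are lowercase
    except KeyError:
        raise ValueError(f"no small model listed for language code {lang!r}") from None


def download_command(lang):
    """Build the shell command that installs the package (prefix with '!' in a notebook)."""
    return f"python -m spacy download {model_package(lang)}"


print(download_command("fr"))  # python -m spacy download fr_core_news_sm
```

Lowercasing the code matters because package names are case-sensitive: "XX" must become xx_ent_wiki_sm.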
Importing the French module
!python -m spacy download fr_core_news_sm
import fr_core_news_sm
nlp = fr_core_news_sm.load()
doc = nlp(u"C'est une phrase.")
print([(w.text, w.pos_) for w in doc])
Importing the multi-language module
!python -m spacy download xx_ent_wiki_sm
import xx_ent_wiki_sm
nlp = xx_ent_wiki_sm.load()
doc = nlp(u"This is a line about Python")
# label_ is the readable label string; label is its integer hash ID
print([(ent.text, ent.label_) for ent in doc.ents])
# The '!' runs a shell command from the notebook. On Windows, creating the "en" shortcut link
# that spacy.load('en') relied on (spaCy v2) can fail without administrator rights,
# so the model is imported as a module instead.
!python -m spacy download en_core_web_sm
import spacy
import os
import en_core_web_sm
Read file
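currDir = os.getcwd()
inputDir = 'inputs'
fileName = 'aeon.txt'
# os.path.join builds the path portably instead of hard-coding Windows backslashes
readFile = os.path.join(currDir, inputDir, fileName)
# 'with' closes the file automatically; assumes the file is UTF-8 encoded
with open(readFile, 'r', encoding='utf-8') as f:
    doc_input = f.read()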
# Load the small English pipeline; its pretrained NER component labels entities automatically
nlp = en_core_web_sm.load()
doc = nlp(doc_input)
# Print the label and text of the first ten entities
for ent in doc.ents[:10]:
    print(ent.label_, ent.text)
Creating a list of (entity, label) tuples
[(ent.text, ent.label_) for ent in doc.ents[:10]]
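Once entities are in (text, label) form, plain Python can summarize them, for example counting how often each entity type occurs and listing the distinct mentions per type. This is a sketch: summarize_entities is a hypothetical helper, and the pairs below are made-up stand-ins for [(ent.text, ent.label_) for ent in doc.ents], not output from the aeon.txt document.

```python
from collections import Counter, defaultdict


def summarize_entities(pairs):
    """Count labels and group distinct entity texts by label from (text, label) pairs."""
    counts = Counter(label for _, label in pairs)
    by_label = defaultdict(list)
    for text, label in pairs:
        if text not in by_label[label]:  # keep only the first occurrence of each mention
            by_label[label].append(text)
    return counts, dict(by_label)


# Hypothetical pairs standing in for real spaCy output
pairs = [("Charles Darwin", "PERSON"), ("London", "GPE"), ("1859", "DATE"), ("London", "GPE")]
counts, by_label = summarize_entities(pairs)
print(counts.most_common(1))  # [('GPE', 2)]
print(by_label["GPE"])        # ['London']
```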
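from spacy import displacy

sent = nlp("This is a sentence.")
# displacy.serve starts a local web server (port 5000 by default) showing the dependency parse;
# in Jupyter, displacy.render(sent, style='dep', jupyter=True) draws it inline instead.
displacy.serve(sent, style='dep')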
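displaCy can also draw a parse you supply yourself when called with manual=True. For style='dep' it expects a dict with a "words" list and an "arcs" list, where each arc runs from the lower to the higher token index and "dir" says which end the arrow points to. The sketch below converts (text, tag, head index, dependency label) tuples into that shape. to_displacy_dep is a hypothetical helper, and the parse is hand-written for illustration, not actual model output.

```python
def to_displacy_dep(tokens):
    """Convert (text, tag, head_index, dep_label) tuples into displaCy's manual 'dep' format."""
    words = [{"text": text, "tag": tag} for text, tag, _, _ in tokens]
    arcs = []
    for i, (_, _, head, dep) in enumerate(tokens):
        if i < head:
            # Dependent is left of its head: arrow points left, back to the dependent.
            arcs.append({"start": i, "end": head, "label": dep, "dir": "left"})
        elif i > head:
            # Dependent is right of its head: arrow points right.
            arcs.append({"start": head, "end": i, "label": dep, "dir": "right"})
        # i == head marks the root, which gets no arc.
    return {"words": words, "arcs": arcs}


# Hand-written, illustrative parse of "This is a sentence."
parse = [
    ("This", "DET", 1, "nsubj"),
    ("is", "AUX", 1, "ROOT"),
    ("a", "DET", 3, "det"),
    ("sentence", "NOUN", 1, "attr"),
    (".", "PUNCT", 1, "punct"),
]
data = to_displacy_dep(parse)
print(data["arcs"])
```

With spaCy installed, displacy.render(data, style='dep', manual=True) should draw this structure without running a model.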