Introduction to NLTK
Text Analytics
Giuseppe Attardi
Università di Pisa
Installing NLTK
Download and Install
http://nltk.org/install.html
Download NLTK data
>>> import nltk
>>> nltk.download()
Jupyter Notebook
Register with your UniPi credentials to activate your free G Suite account at this page.
Start your Jupyter Notebook here:
https://attardi-4.di.unipi.it:8000/
NLTK
Suite of classes for several NLP tasks
Parsing, POS tagging, classifiers…
Several text processing utilities, corpora
Brown, Penn Treebank corpus…
Your data was divided into sentences using ‘punkt’
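The sentence splitting mentioned above can be sketched as follows. This is only an illustration: it assumes an untrained PunktSentenceTokenizer so that no data download is needed, and the sample text is made up. The pretrained model installed through nltk.download() additionally knows common abbreviations and is the better choice for real data.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Untrained tokenizer: relies on Punkt's default heuristics only.
# Assumption: that is good enough for this toy text.
tokenizer = PunktSentenceTokenizer()

text = "NLTK is a toolkit. It includes many corpora."  # made-up sample
sentences = tokenizer.tokenize(text)

# Print one sentence per line.
for sentence in sentences:
    print(sentence)
```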
NLTK
Text material
Raw text
Annotated Text
Tools
Part of speech taggers
Semantic analysis
Resources
WordNet, Treebanks
Linguistic Tasks
Part of Speech Tagging
Parsing
WordNet
Named Entity Recognition
Information Retrieval
Sentiment Analysis
Document Clustering
Topic Segmentation
Classification
Authoring
Machine Translation
Summarization
Information Extraction
Spoken Dialog Systems
Natural Language Generation
Word Sense Disambiguation
‘import nltk’
You will need to import the necessary modules to create objects and call member functions
import ~ include objects from pre-built packages
FreqDist, ConditionalFreqDist are in nltk.probability
PlaintextCorpusReader is in nltk.corpus
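A minimal sketch of how the three classes listed above might be imported and used together. The two-file corpus, its file names, and its contents are made up for illustration and written to a temporary directory; nothing here requires downloaded NLTK data.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader
from nltk.probability import ConditionalFreqDist, FreqDist

# Hypothetical two-file corpus written to a temporary directory.
root = tempfile.mkdtemp()
samples = {"a.txt": "the cat sat on the mat", "b.txt": "the dog ran"}
for name, content in samples.items():
    with open(os.path.join(root, name), "w", encoding="utf8") as f:
        f.write(content)

# Read every .txt file under root; words() uses the reader's default
# word tokenizer.
corpus = PlaintextCorpusReader(root, r".*\.txt")
words = list(corpus.words())

# Frequency of each word over the whole corpus.
fdist = FreqDist(words)
print(fdist.most_common(1))

# Frequency of each word per file, built from (condition, sample) pairs.
cfd = ConditionalFreqDist(
    (fileid, word)
    for fileid in corpus.fileids()
    for word in corpus.words(fileid)
)
print(cfd["b.txt"]["dog"])
```

FreqDist counts samples over one collection, while ConditionalFreqDist keeps a separate count for each condition (here, each file).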
Basic NLTK usage
Load the notebook ‘Intro to NLTK’ using:
File > Open > Text Analytics > Intro to NLTK
Explore the examples by advancing through them with the button ►
Exercise 1.
Run examples from Chapter 1 of NLTK book:
http://nltk.googlecode.com/svn/trunk/doc/book/ch01.html
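As a warm-up for this exercise, here is a small sketch in the spirit of that chapter: counting tokens and distinct words in a text. The token list is made up and stands in for the book's sample texts, which need a data download; the ratio of distinct words to total tokens is used here as one simple measure of vocabulary richness.

```python
from nltk.text import Text

# Made-up token list standing in for one of the book's texts.
tokens = "the quick fox jumps over the lazy dog and the cat".split()
text = Text(tokens)

total = len(text)                    # number of tokens
vocabulary = set(text)               # distinct words
diversity = len(vocabulary) / total  # distinct words per token

print(total, len(vocabulary), text.count("the"))
print(round(diversity, 2))
```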
Exercise 2.
Run examples from Chapter 3 of NLTK book:
http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html
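A short sketch of the kind of raw-text processing that chapter deals with: tokenizing with a regular expression and stemming the result. The sample sentence and the pattern are assumptions chosen for illustration; neither call needs downloaded data.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import regexp_tokenize

raw = "The runners were running quickly."  # made-up sample

# Keep runs of word characters; punctuation is dropped.
tokens = regexp_tokenize(raw.lower(), r"\w+")

# Reduce each token to its stem with the Porter algorithm.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print(tokens)
print(stems)
```

Stems are not always dictionary words, so the output is best inspected rather than assumed.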