Details
Title | Community experience distilled. — Natural language processing with Java: techniques for building machine learning and neural network models for NLP. — Second edition. |
---|---|
Creators | Reese, Richard Martin |
Other creators | Bhatia, AshishSingh |
Collection | E-books from foreign publishers ; General collection |
Subjects | Natural language processing (Computer science) ; Java (Computer program language) ; Machine learning. ; Neural networks (Computer science) ; COMPUTERS / General. ; EBSCO eBooks |
Document type | Other |
File type | |
Language | English |
Rights | Password-protected access from the Internet (reading, printing, copying) |
Record key | on1050170091 |
Record create date | 8/29/2018 |
- Cover
- Title Page
- Copyright and Credits
- Dedication
- Packt Upsell
- Contributors
- Table of Contents
- Preface
- Chapter 1: Introduction to NLP
- What is NLP?
- Why use NLP?
- Why is NLP so hard?
- Survey of NLP tools
- Apache OpenNLP
- Stanford NLP
- LingPipe
- GATE
- UIMA
- Apache Lucene Core
- Deep learning for Java
- Overview of text-processing tasks
- Finding parts of text
- Finding sentences
- Feature-engineering
- Finding people and things
- Detecting parts of speech
- Classifying text and documents
- Extracting relationships
- Using combined approaches
- Understanding NLP models
- Identifying the task
- Selecting a model
- Building and training the model
- Verifying the model
- Using the model
- Preparing data
- Summary
- Chapter 2: Finding Parts of Text
- Understanding the parts of text
- What is tokenization?
- Uses of tokenizers
- Simple Java tokenizers
- Using the Scanner class
- Specifying the delimiter
- Using the split method
- Using the BreakIterator class
- Using the StreamTokenizer class
- Using the StringTokenizer class
- Performance considerations with Java core tokenization
- NLP tokenizer APIs
- Using the OpenNLPTokenizer class
- Using the SimpleTokenizer class
- Using the WhitespaceTokenizer class
- Using the TokenizerME class
- Using the Stanford tokenizer
- Using the PTBTokenizer class
- Using the DocumentPreprocessor class
- Using a pipeline
- Using LingPipe tokenizers
- Training a tokenizer to find parts of text
- Comparing tokenizers
- Understanding normalization
- Converting to lowercase
- Removing stopwords
- Creating a StopWords class
- Using LingPipe to remove stopwords
- Using stemming
- Using the Porter Stemmer
- Stemming with LingPipe
- Using lemmatization
- Using the StanfordLemmatizer class
- Using lemmatization in OpenNLP
- Normalizing using a pipeline
- Summary
- Chapter 3: Finding Sentences
- The SBD process
- What makes SBD difficult?
- Understanding the SBD rules of LingPipe's HeuristicSentenceModel class
- Simple Java SBDs
- Using regular expressions
- Using the BreakIterator class
- Using NLP APIs
- Using OpenNLP
- Using the SentenceDetectorME class
- Using the sentPosDetect method
- Using the Stanford API
- Using the PTBTokenizer class
- Using the DocumentPreprocessor class
- Using the StanfordCoreNLP class
- Using LingPipe
- Using the IndoEuropeanSentenceModel class
- Using the SentenceChunker class
- Using the MedlineSentenceModel class
- Training a sentence-detector model
- Using the trained model
- Evaluating the model using the SentenceDetectorEvaluator class
- Summary
- Chapter 4: Finding People and Things
- Why is NER difficult?
- Techniques for name recognition
- Lists and regular expressions
- Statistical classifiers
- Using regular expressions for NER
- Using Java's regular expressions to find entities
- Using the RegExChunker class of LingPipe
- Using NLP APIs
- Using OpenNLP for NER
- Determining the accuracy of the entity
- Using other entity types
- Processing multiple entity types
- Using the Stanford API for NER
- Using LingPipe for NER
- Using LingPipe's named entity models
- Using the ExactDictionaryChunker class
- Building a new dataset with the NER annotation tool
- Training a model
- Evaluating a model
- Summary
- Chapter 5: Detecting Part of Speech
- The tagging process
- The importance of POS taggers
- What makes POS difficult?
- Using the NLP APIs
- Using OpenNLP POS taggers
- Using the OpenNLP POSTaggerME class for POS taggers
- Using OpenNLP chunking
- Using the POSDictionary class
- Obtaining the tag dictionary for a tagger
- Determining a word's tags
- Changing a word's tags
- Adding a new tag dictionary
- Creating a dictionary from a file
- Using Stanford POS taggers
- Using Stanford MaxentTagger
- Using the MaxentTagger class to tag textese
- Using the Stanford pipeline to perform tagging
- Using LingPipe POS taggers
- Using the HmmDecoder class with Best_First tags
- Using the HmmDecoder class with NBest tags
- Determining tag confidence with the HmmDecoder class
- Training the OpenNLP POSModel
- Summary
- Chapter 6: Representing Text with Features
- N-grams
- Word embedding
- GloVe
- Word2vec
- Dimensionality reduction
- Principal component analysis
- t-distributed stochastic neighbor embedding
- Summary
- Chapter 7: Information Retrieval
- Boolean retrieval
- Dictionaries and tolerant retrieval
- Wildcard queries
- Spelling correction
- Soundex
- Vector space model
- Scoring and term weighting
- Inverse document frequency
- TF-IDF weighting
- Evaluation of information retrieval systems
- Summary
- Chapter 8: Classifying Texts and Documents
- How classification is used
- Understanding sentiment analysis
- Text-classifying techniques
- Using APIs to classify text
- Using OpenNLP
- Training an OpenNLP classification model
- Using DocumentCategorizerME to classify text
- Using the Stanford API
- Using the ColumnDataClassifier class for classification
- Using the Stanford pipeline to perform sentiment analysis
- Using LingPipe to classify text
- Training text using the Classified class
- Using other training categories
- Classifying text using LingPipe
- Sentiment analysis using LingPipe
- Language identification using LingPipe
- Summary
- Chapter 9: Topic Modeling
- What is topic modeling?
- The basics of LDA
- Topic modeling with MALLET
- Training
- Evaluation
- Summary
- Chapter 10: Using Parsers to Extract Relationships
- Relationship types
- Understanding parse trees
- Using extracted relationships
- Extracting relationships
- Using NLP APIs
- Using OpenNLP
- Using the Stanford API
- Using the LexicalizedParser class
- Using the TreePrint class
- Finding word dependencies using the GrammaticalStructure class
- Finding coreference resolution entities
- Extracting relationships for a question-answer system
- Finding the word dependencies
- Determining the question type
- Searching for the answer
- Summary
- Chapter 11: Combined Pipeline
- Preparing data
- Using boilerpipe to extract text from HTML
- Using POI to extract text from Word documents
- Using PDFBox to extract text from PDF documents
- Using Apache Tika for content analysis and extraction
- Pipelines
- Using the Stanford pipeline
- Using multiple cores with the Stanford pipeline
- Creating a pipeline to search text
- Summary
- Chapter 12: Creating a Chatbot
- Chatbot architecture
- Artificial Linguistic Internet Computer Entity
- Understanding AIML
- Developing a chatbot using ALICE and AIML
- Summary
- Other Books You May Enjoy
- Index