Details
Field | Value
---|---
Title | Python machine learning by example: easy-to-follow examples that get you up and running with machine learning. — Second edition.
Creators | Liu, Yuxi (Hayden)
Imprint | Birmingham, UK: Packt Publishing, 2019
Collection | E-books from foreign publishers; General collection
Subjects | Python (Computer program language); Machine learning; COMPUTERS / Programming Languages / Python; COMPUTERS / Data Processing; COMPUTERS / Databases / Data Mining; EBSCO eBooks
Document type | Other
File type | –
Language | English
Rights | Password-based access via the Internet (reading, printing, copying)
Record key | on1089574604
Record created | 3/14/2019
Allowed Actions
File | Available actions
---|---
pdf/2037541.pdf | Read, Download (available after logging in or accessing the site from another network)
epub/2037541.epub | Download (available after logging in or accessing the site from another network)
Current session: group Anonymous, network Internet.

Network | User group | Action
---|---|---
ILC SPbPU Local Network | All |
Internet | Authorized users SPbPU |
Internet | Anonymous |

Contents
- Cover
- Title Page
- Copyright and Credits
- About Packt
- Dedication
- Foreword
- Contributors
- Table of Contents
- Preface
- Section 1: Fundamentals of Machine Learning
- Getting Started with Machine Learning and Python
- A very high-level overview of machine learning technology
- Types of machine learning tasks
- A brief history of the development of machine learning algorithms
- Core of machine learning – generalizing with data
- Overfitting, underfitting, and the bias-variance trade-off
- Avoiding overfitting with cross-validation
- Avoiding overfitting with regularization
- Avoiding overfitting with feature selection and dimensionality reduction
- Preprocessing, exploration, and feature engineering
- Missing values
- Label encoding
- One-hot encoding
- Scaling
- Polynomial features
- Power transform
- Binning
- Combining models
- Voting and averaging
- Bagging
- Boosting
- Stacking
- Installing software and setting up
- Setting up Python and environments
- Installing the various packages
- NumPy
- SciPy
- Pandas
- Scikit-learn
- TensorFlow
- Summary
- Exercises
- Section 2: Practical Python Machine Learning By Example
- Exploring the 20 Newsgroups Dataset with Text Analysis Techniques
- How computers understand language – NLP
- Picking up NLP basics while touring popular NLP libraries
- Corpus
- Tokenization
- PoS tagging
- Named-entity recognition
- Stemming and lemmatization
- Semantics and topic modeling
- Getting the newsgroups data
- Exploring the newsgroups data
- Thinking about features for text data
- Counting the occurrence of each word token
- Text preprocessing
- Dropping stop words
- Stemming and lemmatizing words
- Visualizing the newsgroups data with t-SNE
- What is dimensionality reduction?
- t-SNE for dimensionality reduction
- Summary
- Exercises
- Mining the 20 Newsgroups Dataset with Clustering and Topic Modeling Algorithms
- Learning without guidance – unsupervised learning
- Clustering newsgroups data using k-means
- How does k-means clustering work?
- Implementing k-means from scratch
- Implementing k-means with scikit-learn
- Choosing the value of k
- Discovering underlying topics in newsgroups
- Topic modeling using NMF
- Topic modeling using LDA
- Summary
- Exercises
- Detecting Spam Email with Naive Bayes
- Getting started with classification
- Types of classification
- Applications of text classification
- Exploring Naïve Bayes
- Learning Bayes' theorem through examples
- The mechanics of Naïve Bayes
- Implementing Naïve Bayes from scratch
- Implementing Naïve Bayes with scikit-learn
- Classification performance evaluation
- Model tuning and cross-validation
- Summary
- Exercise
- Classifying Newsgroup Topics with Support Vector Machines
- Finding a separating boundary with support vector machines
- Understanding how SVM works through different use cases
- Case 1 – identifying a separating hyperplane
- Case 2 – determining the optimal hyperplane
- Case 3 – handling outliers
- Implementing SVM
- Case 4 – dealing with more than two classes
- The kernels of SVM
- Case 5 – solving linearly non-separable problems
- Choosing between linear and RBF kernels
- Classifying newsgroup topics with SVMs
- Another example – fetal state classification on cardiotocography
- A further example – breast cancer classification using SVM with TensorFlow
- Summary
- Exercise
- Predicting Online Ad Click-Through with Tree-Based Algorithms
- Brief overview of advertising click-through prediction
- Getting started with two types of data – numerical and categorical
- Exploring a decision tree from root to leaves
- Constructing a decision tree
- The metrics for measuring a split
- Implementing a decision tree from scratch
- Predicting ad click-through with a decision tree
- Ensembling decision trees – random forest
- Implementing random forest using TensorFlow
- Summary
- Exercise
- Predicting Online Ad Click-Through with Logistic Regression
- Converting categorical features to numerical – one-hot encoding and ordinal encoding
- Classifying data with logistic regression
- Getting started with the logistic function
- Jumping from the logistic function to logistic regression
- Training a logistic regression model
- Training a logistic regression model using gradient descent
- Predicting ad click-through with logistic regression using gradient descent
- Training a logistic regression model using stochastic gradient descent
- Training a logistic regression model with regularization
- Training on large datasets with online learning
- Handling multiclass classification
- Implementing logistic regression using TensorFlow
- Feature selection using random forest
- Summary
- Exercises
- Scaling Up Prediction to Terabyte Click Logs
- Learning the essentials of Apache Spark
- Breaking down Spark
- Installing Spark
- Launching and deploying Spark programs
- Programming in PySpark
- Learning on massive click logs with Spark
- Loading click logs
- Splitting and caching the data
- One-hot encoding categorical features
- Training and testing a logistic regression model
- Feature engineering on categorical variables with Spark
- Hashing categorical features
- Combining multiple variables – feature interaction
- Summary
- Exercises
- Stock Price Prediction with Regression Algorithms
- Brief overview of the stock market and stock prices
- What is regression?
- Mining stock price data
- Getting started with feature engineering
- Acquiring data and generating features
- Estimating with linear regression
- How does linear regression work?
- Implementing linear regression
- Estimating with decision tree regression
- Transitioning from classification trees to regression trees
- Implementing decision tree regression
- Implementing regression forest
- Estimating with support vector regression
- Implementing SVR
- Estimating with neural networks
- Demystifying neural networks
- Implementing neural networks
- Evaluating regression performance
- Predicting stock price with four regression algorithms
- Summary
- Exercise
- Section 3: Python Machine Learning Best Practices
- Machine Learning Best Practices
- Machine learning solution workflow
- Best practices in the data preparation stage
- Best practice 1 – completely understanding the project goal
- Best practice 2 – collecting all fields that are relevant
- Best practice 3 – maintaining the consistency of field values
- Best practice 4 – dealing with missing data
- Best practice 5 – storing large-scale data
- Best practices in the training sets generation stage
- Best practice 6 – identifying categorical features with numerical values
- Best practice 7 – deciding whether to encode categorical features
- Best practice 8 – deciding whether to select features, and if so, how
- Best practice 9 – deciding whether to reduce dimensionality, and if so, how
- Best practice 10 – deciding whether to rescale features
- Best practice 11 – performing feature engineering with domain expertise
- Best practice 12 – performing feature engineering without domain expertise
- Best practice 13 – documenting how each feature is generated
- Best practice 14 – extracting features from text data
- Best practices in the model training, evaluation, and selection stage
- Best practice 15 – choosing the right algorithm(s) to start with
- Naïve Bayes
- Logistic regression
- SVM
- Random forest (or decision tree)
- Neural networks
- Best practice 16 – reducing overfitting
- Best practice 17 – diagnosing overfitting and underfitting
- Best practice 18 – modeling on large-scale datasets
- Best practices in the deployment and monitoring stage
- Best practice 19 – saving, loading, and reusing models
- Best practice 20 – monitoring model performance
- Best practice 21 – updating models regularly
- Summary
- Exercises
- Other Books You May Enjoy
- Index