Title: Python machine learning by example: easy-to-follow examples that get you up and running with machine learning. — Second edition.
Authors: Liu Yuxi (Hayden)
Imprint: Birmingham, UK: Packt Publishing, 2019
Collection: E-books from foreign publishers; General collection
Subjects: Python (Computer program language); Machine learning; COMPUTERS / Programming Languages / Python; COMPUTERS / Data Processing; COMPUTERS / Databases / Data Mining; EBSCO eBooks
Document type: Other
File type: PDF
Language: English
Access rights: Password-protected access over the Internet (reading, printing, copying)
  • Cover
  • Title Page
  • Copyright and Credits
  • About Packt
  • Dedication
  • Foreword
  • Contributors
  • Table of Contents
  • Preface
  • Section 1: Fundamentals of Machine Learning
  • Getting Started with Machine Learning and Python
    • A very high-level overview of machine learning technology
      • Types of machine learning tasks
      • A brief history of the development of machine learning algorithms
    • Core of machine learning – generalizing with data
      • Overfitting, underfitting, and the bias-variance trade-off
      • Avoiding overfitting with cross-validation
      • Avoiding overfitting with regularization
      • Avoiding overfitting with feature selection and dimensionality reduction
    • Preprocessing, exploration, and feature engineering
      • Missing values
      • Label encoding
      • One hot encoding
      • Scaling
      • Polynomial features
      • Power transform
      • Binning
    • Combining models
      • Voting and averaging
      • Bagging
      • Boosting
      • Stacking
    • Installing software and setting up
      • Setting up Python and environments
      • Installing the various packages
        • NumPy
        • SciPy
        • Pandas
        • Scikit-learn
        • TensorFlow
    • Summary
    • Exercises
  • Section 2: Practical Python Machine Learning By Example
  • Exploring the 20 Newsgroups Dataset with Text Analysis Techniques
    • How computers understand language - NLP
    • Picking up NLP basics while touring popular NLP libraries
      • Corpus
      • Tokenization
      • PoS tagging
      • Named-entity recognition
      • Stemming and lemmatization
      • Semantics and topic modeling
    • Getting the newsgroups data
    • Exploring the newsgroups data
    • Thinking about features for text data
      • Counting the occurrence of each word token
      • Text preprocessing
      • Dropping stop words
      • Stemming and lemmatizing words
    • Visualizing the newsgroups data with t-SNE
      • What is dimensionality reduction?
      • t-SNE for dimensionality reduction
    • Summary
    • Exercises
  • Mining the 20 Newsgroups Dataset with Clustering and Topic Modeling Algorithms
    • Learning without guidance – unsupervised learning
    • Clustering newsgroups data using k-means
      • How does k-means clustering work?
      • Implementing k-means from scratch
      • Implementing k-means with scikit-learn
      • Choosing the value of k
      • Clustering newsgroups data using k-means
    • Discovering underlying topics in newsgroups
    • Topic modeling using NMF
    • Topic modeling using LDA
    • Summary
    • Exercises
  • Detecting Spam Email with Naive Bayes
    • Getting started with classification
      • Types of classification
      • Applications of text classification
    • Exploring Naïve Bayes
      • Learning Bayes' theorem by examples
      • The mechanics of Naïve Bayes
      • Implementing Naïve Bayes from scratch
      • Implementing Naïve Bayes with scikit-learn
    • Classification performance evaluation
    • Model tuning and cross-validation
    • Summary
    • Exercise
  • Classifying Newsgroup Topics with Support Vector Machines
    • Finding separating boundary with support vector machines
      • Understanding how SVM works through different use cases
        • Case 1 – identifying a separating hyperplane
        • Case 2 – determining the optimal hyperplane
        • Case 3 – handling outliers
      • Implementing SVM
        • Case 4 – dealing with more than two classes
      • The kernels of SVM
        • Case 5 – solving linearly non-separable problems
      • Choosing between linear and RBF kernels
    • Classifying newsgroup topics with SVMs
    • More example – fetal state classification on cardiotocography
    • A further example – breast cancer classification using SVM with TensorFlow
    • Summary
    • Exercise
  • Predicting Online Ad Click-Through with Tree-Based Algorithms
    • Brief overview of advertising click-through prediction
    • Getting started with two types of data – numerical and categorical
    • Exploring decision tree from root to leaves
      • Constructing a decision tree
      • The metrics for measuring a split
    • Implementing a decision tree from scratch
    • Predicting ad click-through with decision tree
    • Ensembling decision trees – random forest
      • Implementing random forest using TensorFlow
    • Summary
    • Exercise
  • Predicting Online Ad Click-Through with Logistic Regression
    • Converting categorical features to numerical – one-hot encoding and ordinal encoding
    • Classifying data with logistic regression
      • Getting started with the logistic function
      • Jumping from the logistic function to logistic regression
    • Training a logistic regression model
      • Training a logistic regression model using gradient descent
      • Predicting ad click-through with logistic regression using gradient descent
      • Training a logistic regression model using stochastic gradient descent
      • Training a logistic regression model with regularization
    • Training on large datasets with online learning
    • Handling multiclass classification
    • Implementing logistic regression using TensorFlow
    • Feature selection using random forest
    • Summary
    • Exercises
  • Scaling Up Prediction to Terabyte Click Logs
    • Learning the essentials of Apache Spark
      • Breaking down Spark
      • Installing Spark
      • Launching and deploying Spark programs
    • Programming in PySpark
    • Learning on massive click logs with Spark
      • Loading click logs
      • Splitting and caching the data
      • One-hot encoding categorical features
      • Training and testing a logistic regression model
    • Feature engineering on categorical variables with Spark
      • Hashing categorical features
      • Combining multiple variables – feature interaction
    • Summary
    • Exercises
  • Stock Price Prediction with Regression Algorithms
    • Brief overview of the stock market and stock prices
    • What is regression?
    • Mining stock price data
      • Getting started with feature engineering
      • Acquiring data and generating features
    • Estimating with linear regression
      • How does linear regression work?
      • Implementing linear regression
    • Estimating with decision tree regression
      • Transitioning from classification trees to regression trees
      • Implementing decision tree regression
      • Implementing regression forest
    • Estimating with support vector regression
      • Implementing SVR
    • Estimating with neural networks
      • Demystifying neural networks
      • Implementing neural networks
    • Evaluating regression performance
    • Predicting stock price with four regression algorithms
    • Summary
    • Exercise
  • Section 3: Python Machine Learning Best Practices
  • Machine Learning Best Practices
    • Machine learning solution workflow
    • Best practices in the data preparation stage
      • Best practice 1 – completely understanding the project goal
      • Best practice 2 – collecting all fields that are relevant
      • Best practice 3 – maintaining the consistency of field values
      • Best practice 4 – dealing with missing data
      • Best practice 5 – storing large-scale data
    • Best practices in the training sets generation stage
      • Best practice 6 – identifying categorical features with numerical values
      • Best practice 7 – deciding on whether or not to encode categorical features
      • Best practice 8 – deciding on whether or not to select features, and if so, how to do so
      • Best practice 9 – deciding on whether or not to reduce dimensionality, and if so, how to do so
      • Best practice 10 – deciding on whether or not to rescale features
      • Best practice 11 – performing feature engineering with domain expertise
      • Best practice 12 – performing feature engineering without domain expertise
      • Best practice 13 – documenting how each feature is generated
      • Best practice 14 – extracting features from text data
    • Best practices in the model training, evaluation, and selection stage
      • Best practice 15 – choosing the right algorithm(s) to start with
        • Naïve Bayes
        • Logistic regression
        • SVM
        • Random forest (or decision tree)
        • Neural networks
      • Best practice 16 – reducing overfitting
      • Best practice 17 – diagnosing overfitting and underfitting
      • Best practice 18 – modeling on large-scale datasets
    • Best practices in the deployment and monitoring stage
      • Best practice 19 – saving, loading, and reusing models
      • Best practice 20 – monitoring model performance
      • Best practice 21 – updating models regularly
    • Summary
    • Exercises
  • Other Books You May Enjoy
  • Index

