Hands-On Data Analysis with Pandas: A Python Data Science Handbook for Data Collection, Wrangling, Analysis, and Visualization

Molin, Stefanie.

Details

Title	Hands-On Data Analysis with Pandas: A Python Data Science Handbook for Data Collection, Wrangling, Analysis, and Visualization. — Second edition.
Creators	Molin, Stefanie.
Collection	E-books from foreign publishers ; General collection
Subjects	Python (Computer program language) ; Data mining ; EBSCO eBooks
Document type	Other
File type	PDF
Language	English
Rights	Password-protected access from the Internet (reading, printing, copying)
Record key	on1249629222
Record create date	5/5/2021

Cover
Title Page
Copyright and Credits
Dedication
Foreword to the Second Edition
Foreword to the First Edition
Contributors
Table of Contents
Preface
Section 1: Getting Started with Pandas
Chapter 1: Introduction to Data Analysis
- Chapter materials
- The fundamentals of data analysis
  - Data collection
  - Data wrangling
  - Exploratory data analysis
  - Drawing conclusions
- Statistical foundations
  - Sampling
  - Descriptive statistics
  - Prediction and forecasting
  - Inferential statistics
- Setting up a virtual environment
  - Virtual environments
  - Installing the required Python packages
  - Why pandas?
  - Jupyter Notebooks
- Summary
- Exercises
- Further reading
Chapter 2: Working with Pandas DataFrames
- Chapter materials
- Pandas data structures
  - Series
  - Index
  - DataFrame
- Creating a pandas DataFrame
  - From a Python object
  - From a file
  - From a database
  - From an API
- Inspecting a DataFrame object
  - Examining the data
  - Describing and summarizing the data
- Grabbing subsets of the data
  - Selecting columns
  - Slicing
  - Indexing
  - Filtering
- Adding and removing data
  - Creating new data
  - Deleting unwanted data
- Summary
- Exercises
- Further reading
Section 2: Using Pandas for Data Analysis
Chapter 3: Data Wrangling with Pandas
- Chapter materials
- Understanding data wrangling
  - Data cleaning
  - Data transformation
  - Data enrichment
- Exploring an API to find and collect temperature data
- Cleaning data
  - Renaming columns
  - Type conversion
  - Reordering, reindexing, and sorting data
- Reshaping data
  - Transposing DataFrames
  - Pivoting DataFrames
  - Melting DataFrames
- Handling duplicate, missing, or invalid data
  - Finding the problematic data
  - Mitigating the issues
- Summary
- Exercises
- Further reading
Chapter 4: Aggregating Pandas DataFrames
- Chapter materials
- Performing database-style operations on DataFrames
  - Querying DataFrames
  - Merging DataFrames
- Using DataFrame operations to enrich data
  - Arithmetic and statistics
  - Binning
  - Applying functions
  - Window calculations
  - Pipes
- Aggregating data
  - Summarizing DataFrames
  - Aggregating by group
  - Pivot tables and crosstabs
- Working with time series data
  - Time-based selection and filtering
  - Shifting for lagged data
  - Differenced data
  - Resampling
  - Merging time series
- Summary
- Exercises
- Further reading
Chapter 5: Visualizing Data with Pandas and Matplotlib
- Chapter materials
- An introduction to matplotlib
  - The basics
  - Plot components
  - Additional options
- Plotting with pandas
  - Evolution over time
  - Relationships between variables
  - Distributions
  - Counts and frequencies
- The pandas.plotting module
  - Scatter matrices
  - Lag plots
  - Autocorrelation plots
  - Bootstrap plots
- Summary
- Exercises
- Further reading
Chapter 6: Plotting with Seaborn and Customization Techniques
- Chapter materials
- Utilizing seaborn for advanced plotting
  - Categorical data
  - Correlations and heatmaps
  - Regression plots
  - Faceting
- Formatting plots with matplotlib
  - Titles and labels
  - Legends
  - Formatting axes
- Customizing visualizations
  - Adding reference lines
  - Shading regions
  - Annotations
  - Colors
  - Textures
- Summary
- Exercises
- Further reading
Section 3: Applications – Real-World Analyses Using Pandas
Chapter 7: Financial Analysis – Bitcoin and the Stock Market
- Chapter materials
- Building a Python package
  - Package structure
  - Overview of the stock_analysis package
  - UML diagrams
- Collecting financial data
  - The StockReader class
  - Collecting historical data from Yahoo! Finance
- Exploratory data analysis
  - The Visualizer class family
  - Visualizing a stock
  - Visualizing multiple assets
- Technical analysis of financial instruments
  - The StockAnalyzer class
  - The AssetGroupAnalyzer class
  - Comparing assets
- Modeling performance using historical data
  - The StockModeler class
  - Time series decomposition
  - ARIMA
  - Linear regression with statsmodels
  - Comparing models
- Summary
- Exercises
- Further reading
Chapter 8: Rule-Based Anomaly Detection
- Chapter materials
- Simulating login attempts
  - Assumptions
  - The login_attempt_simulator package
  - Simulating from the command line
- Exploratory data analysis
- Implementing rule-based anomaly detection
  - Percent difference
  - Tukey fence
  - Z-score
  - Evaluating performance
- Summary
- Exercises
- Further reading
Section 4: Introduction to Machine Learning with Scikit-Learn
Chapter 9: Getting Started with Machine Learning in Python
- Chapter materials
- Overview of the machine learning landscape
  - Types of machine learning
  - Common tasks
  - Machine learning in Python
- Exploratory data analysis
  - Red wine quality data
  - White and red wine chemical properties data
  - Planets and exoplanets data
- Preprocessing data
  - Training and testing sets
  - Scaling and centering data
  - Encoding data
  - Imputing
  - Additional transformers
  - Building data pipelines
- Clustering
  - k-means
  - Evaluating clustering results
- Regression
  - Linear regression
  - Evaluating regression results
- Classification
  - Logistic regression
  - Evaluating classification results
- Summary
- Exercises
- Further reading
Chapter 10: Making Better Predictions – Optimizing Models
- Chapter materials
- Hyperparameter tuning with grid search
- Feature engineering
  - Interaction terms and polynomial features
  - Dimensionality reduction
  - Feature unions
  - Feature importances
- Ensemble methods
  - Random forest
  - Gradient boosting
  - Voting
- Inspecting classification prediction confidence
- Addressing class imbalance
  - Under-sampling
  - Over-sampling
- Regularization
- Summary
- Exercises
- Further reading
Chapter 11: Machine Learning Anomaly Detection
- Chapter materials
- Exploring the simulated login attempts data
- Utilizing unsupervised methods of anomaly detection
  - Isolation forest
  - Local outlier factor
  - Comparing models
- Implementing supervised anomaly detection
  - Baselining
  - Logistic regression
- Incorporating a feedback loop with online learning
  - Creating the PartialFitPipeline subclass
  - Stochastic gradient descent classifier
- Summary
- Exercises
- Further reading
Section 5: Additional Resources
Chapter 12: The Road Ahead
- Data resources
  - Python packages
  - Searching for data
  - APIs
  - Websites
- Practicing working with data
- Python practice
- Summary
- Exercises
- Further reading
Solutions
Appendix
About Packt
Other Books You May Enjoy
Index
