Details
Title | Hands-On Data Analysis with Pandas: A Python Data Science Handbook for Data Collection, Wrangling, Analysis, and Visualization. — Second edition. |
---|---|
Creators | Molin, Stefanie. |
Collection | E-books from foreign publishers ; General collection |
Subjects | Python (Computer program language) ; Data mining. ; EBSCO eBooks |
Document type | Other |
File type | |
Language | English |
Rights | Password-protected access from the Internet (reading, printing, copying) |
Record key | on1249629222 |
Record create date | 5/5/2021 |
- Cover
- Title Page
- Copyright and Credits
- Dedicated
- Foreword to the Second Edition
- Foreword to the First Edition
- Contributors
- Table of Contents
- Preface
- Section 1: Getting Started with Pandas
- Chapter 1: Introduction to Data Analysis
- Chapter materials
- The fundamentals of data analysis
- Data collection
- Data wrangling
- Exploratory data analysis
- Drawing conclusions
- Statistical foundations
- Sampling
- Descriptive statistics
- Prediction and forecasting
- Inferential statistics
- Setting up a virtual environment
- Virtual environments
- Installing the required Python packages
- Why pandas?
- Jupyter Notebooks
- Summary
- Exercises
- Further reading
- Chapter 2: Working with Pandas DataFrames
- Chapter materials
- Pandas data structures
- Series
- Index
- DataFrame
- Creating a pandas DataFrame
- From a Python object
- From a file
- From a database
- From an API
- Inspecting a DataFrame object
- Examining the data
- Describing and summarizing the data
- Grabbing subsets of the data
- Selecting columns
- Slicing
- Indexing
- Filtering
- Adding and removing data
- Creating new data
- Deleting unwanted data
- Summary
- Exercises
- Further reading
- Section 2: Using Pandas for Data Analysis
- Chapter 3: Data Wrangling with Pandas
- Chapter materials
- Understanding data wrangling
- Data cleaning
- Data transformation
- Data enrichment
- Exploring an API to find and collect temperature data
- Cleaning data
- Renaming columns
- Type conversion
- Reordering, reindexing, and sorting data
- Reshaping data
- Transposing DataFrames
- Pivoting DataFrames
- Melting DataFrames
- Handling duplicate, missing, or invalid data
- Finding the problematic data
- Mitigating the issues
- Summary
- Exercises
- Further reading
- Chapter 4: Aggregating Pandas DataFrames
- Chapter materials
- Performing database-style operations on DataFrames
- Querying DataFrames
- Merging DataFrames
- Using DataFrame operations to enrich data
- Arithmetic and statistics
- Binning
- Applying functions
- Window calculations
- Pipes
- Aggregating data
- Summarizing DataFrames
- Aggregating by group
- Pivot tables and crosstabs
- Working with time series data
- Time-based selection and filtering
- Shifting for lagged data
- Differenced data
- Resampling
- Merging time series
- Summary
- Exercises
- Further reading
- Chapter 5: Visualizing Data with Pandas and Matplotlib
- Chapter materials
- An introduction to matplotlib
- The basics
- Plot components
- Additional options
- Plotting with pandas
- Evolution over time
- Relationships between variables
- Distributions
- Counts and frequencies
- The pandas.plotting module
- Scatter matrices
- Lag plots
- Autocorrelation plots
- Bootstrap plots
- Summary
- Exercises
- Further reading
- Chapter 6: Plotting with Seaborn and Customization Techniques
- Chapter materials
- Utilizing seaborn for advanced plotting
- Categorical data
- Correlations and heatmaps
- Regression plots
- Faceting
- Formatting plots with matplotlib
- Titles and labels
- Legends
- Formatting axes
- Customizing visualizations
- Adding reference lines
- Shading regions
- Annotations
- Colors
- Textures
- Summary
- Exercises
- Further reading
- Section 3: Applications – Real-World Analyses Using Pandas
- Chapter 7: Financial Analysis – Bitcoin and the Stock Market
- Chapter materials
- Building a Python package
- Package structure
- Overview of the stock_analysis package
- UML diagrams
- Collecting financial data
- The StockReader class
- Collecting historical data from Yahoo! Finance
- Exploratory data analysis
- The Visualizer class family
- Visualizing a stock
- Visualizing multiple assets
- Technical analysis of financial instruments
- The StockAnalyzer class
- The AssetGroupAnalyzer class
- Comparing assets
- Modeling performance using historical data
- The StockModeler class
- Time series decomposition
- ARIMA
- Linear regression with statsmodels
- Comparing models
- Summary
- Exercises
- Further reading
- Chapter 8: Rule-Based Anomaly Detection
- Chapter materials
- Simulating login attempts
- Assumptions
- The login_attempt_simulator package
- Simulating from the command line
- Exploratory data analysis
- Implementing rule-based anomaly detection
- Percent difference
- Tukey fence
- Z-score
- Evaluating performance
- Summary
- Exercises
- Further reading
- Section 4: Introduction to Machine Learning with Scikit-Learn
- Chapter 9: Getting Started with Machine Learning in Python
- Chapter materials
- Overview of the machine learning landscape
- Types of machine learning
- Common tasks
- Machine learning in Python
- Exploratory data analysis
- Red wine quality data
- White and red wine chemical properties data
- Planets and exoplanets data
- Preprocessing data
- Training and testing sets
- Scaling and centering data
- Encoding data
- Imputing
- Additional transformers
- Building data pipelines
- Clustering
- k-means
- Evaluating clustering results
- Regression
- Linear regression
- Evaluating regression results
- Classification
- Logistic regression
- Evaluating classification results
- Summary
- Exercises
- Further reading
- Chapter 10: Making Better Predictions – Optimizing Models
- Chapter materials
- Hyperparameter tuning with grid search
- Feature engineering
- Interaction terms and polynomial features
- Dimensionality reduction
- Feature unions
- Feature importances
- Ensemble methods
- Random forest
- Gradient boosting
- Voting
- Inspecting classification prediction confidence
- Addressing class imbalance
- Under-sampling
- Over-sampling
- Regularization
- Summary
- Exercises
- Further reading
- Chapter 11: Machine Learning Anomaly Detection
- Chapter materials
- Exploring the simulated login attempts data
- Utilizing unsupervised methods of anomaly detection
- Isolation forest
- Local outlier factor
- Comparing models
- Implementing supervised anomaly detection
- Baselining
- Logistic regression
- Incorporating a feedback loop with online learning
- Creating the PartialFitPipeline subclass
- Stochastic gradient descent classifier
- Summary
- Exercises
- Further reading
- Section 5: Additional Resources
- Chapter 12: The Road Ahead
- Data resources
- Python packages
- Searching for data
- APIs
- Websites
- Practicing working with data
- Python practice
- Summary
- Exercises
- Further reading
- Data resources
- Solutions
- Appendix
- About Packt
- Other Books You May Enjoy
- Index