Data Science
Notes from a journey through Data Science, Python, and R
Books
Companion Material for Books
- An Introduction to Statistics with Python (Thomas Haslwanter)
- Numerical Python (J Robert Johansson)
- Python for Data Analysis, 2nd Edition (Wes McKinney)
- Introduction to Python for Econometrics, Statistics and Numerical Analysis: Third Edition (Kevin Sheppard)
- Hands-On Machine Learning with Scikit-Learn and TensorFlow (Aurélien Géron)
MOOCS
- Intro to Python for Data Science (Datacamp - Filip Schouwenaars)
- Foundations of Data Analysis - Part 1 (edX - UTAustinX - Dr. Michael Mahometa)
- Statistical Learning (Stanford - Trevor Hastie, Robert Tibshirani)
- Statistics and Probability (Khan Academy)
- Statistics and Probability Courses (CK-12)
- Machine Learning (Coursera - Stanford - Andrew Ng)
- Single Variable Calculus (MIT Open CourseWare)
- Multi Variable Calculus (MIT Open CourseWare)
- Introduction to Probability and Statistics (MIT Open CourseWare)
- Linear Algebra (MIT Open CourseWare)
- Introduction to Computer Science and Programming Using Python (edX - MITx - John Guttag, Eric Grimson, Ana Bell)
- Introduction to Computational Thinking and Data Science (edX - MITx - John Guttag, Eric Grimson, Ana Bell)
- Introduction to R (Datacamp)
- Try R (Code School)
- Introduction to Data Science (Harvard School of Public Health)
- An Introduction to Interactive Programming in Python (Part 1) (Coursera - Rice University)
Competitions
Portals/Blogs
- Analytics Vidhya
- Data School
- KDNuggets
- Machine Learning Mastery
- NumFOCUS (sponsor of several Python-related projects)
- PyData
Influencers
- Guido van Rossum (Python)
- Eric Holscher (ReadTheDocs, WriteTheDocs)
- Raymond Hettinger (Python Core Developer)
- Travis Oliphant (NumPy, SciPy, Continuum)
- Tom Caswell (matplotlib)
- Michael Waskom (seaborn)
- Wes McKinney (pandas)
- Jake VanderPlas
- Chris Parmer (Plotly, Dash)
- Leland Wilkinson (The Grammar of Graphics)
- Hadley Wickham (ggplot2, RStudio)
- Dr. Yifan Hu (Yahoo)
- Mike Bostock (D3.js)
- Chris Albon
- Bruce Sherwood (VPython, GlowScript)
- Bret Victor
- Peter Norvig (Google Research)
- Kevin Sheppard
Software
- Python
- R (The Comprehensive R Archive Network)
- RStudio and Shiny
- Python- and R-based Data Science toolkit (Anaconda)
- Tableau
- SAS
- MATLAB (MathWorks)
- Mathematica (Wolfram)
- Octave (GNU)
- Gephi
- ParaView (Kitware)
- Dash (Plotly)
- RAWGraphs
- GitHub Desktop
- Rodeo (Python IDE, Yhat)
Important Modules/Packages/Libraries/Frameworks
Python
- NumPy, SciPy, Matplotlib, pandas, IPython
- scikit-image
- scikit-learn
- Bottle
- Flask, Jinja2, Pygments, Sphinx, Werkzeug
- Django
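As a small illustration of why NumPy heads the Python list, the sketch below computes a sum of squares twice: once with a pure-Python loop and once with a single vectorized NumPy call, where the per-element loop runs in compiled code. It assumes NumPy is installed; the function names are hypothetical and chosen only for this example.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: the interpreter handles every element one at a time."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(arr):
    """Vectorized version: np.dot runs the multiply-and-add loop in compiled code."""
    return float(np.dot(arr, arr))


# Illustrative data: the integers 0..999 stored as float64.
data = np.arange(1_000, dtype=np.float64)

loop_result = sum_of_squares_loop(data.tolist())
vec_result = sum_of_squares_vectorized(data)
print(loop_result, vec_result)
```

Both functions return the same value; the vectorized form is typically much faster on large arrays because it avoids Python-level iteration.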
Datasets
- Stanford Large Network Dataset Collection
- UCI Network Data Repository
- UCI Machine Learning Repository
- Matrix Market
- Enigma
Pretrained Networks
Documentation
Tutorials
- Markdown Tutorial
- Python
- R
Document Generation
Cheatsheets
- Markdown Reference
- reStructuredText (rst) Cheatsheet
- Git Cheatsheet (GitHub)
- Git Tutorials (Atlassian)
Important Python PEPs
Important Python References
Resources
- The History of Python (Guido van Rossum)
- PEP 8 - Style Guide for Python Code
- SciPy Lecture Notes
- Roadmap: How to Learn Machine Learning in 6 Months
- {swirl} Learn R in R
- r-statistics - Tutorials on Advanced Stats and Machine Learning with R
- scikit-learn Algorithm Cheatsheet
- An R Introduction to Statistics
- Datacamp Tutorials
- A Gallery of Large Graphs (Dr. Yifan Hu)
- Mike Bostock’s Blocks
- Other Blocks
- VPython
- GlowScript
- Curated Awesome Lists
- Subtleties of Color (Robert Simmon - NASA)
- DensityDesign Research Lab
- Calibro
- Effectively Using Matplotlib (Chris Moffitt)
- Python Crash Course - Python Cheatsheets (Eric Matthes)
- pythontutor.com
- 28 Jupyter Notebook tips, tricks, and shortcuts (Dataquest)
- Making Publication Ready Python Notebooks (Julius Schulz)
- Statistics and Machine Learning Toolbox (MATLAB)
- A Concrete Introduction to Probability (using Python) (Peter Norvig)
- TOC of “Modern Python: Big Ideas and Little Code in Python” Live Lesson (Raymond Hettinger)
- Python Recipes
- Gitflow (Vincent Driessen)
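Peter Norvig's "A Concrete Introduction to Probability" builds on the classical definition: when all outcomes are equally likely, the probability of an event is the number of favorable outcomes divided by the size of the sample space. A minimal sketch in that spirit, written here from the definition rather than copied from the notebook (the helper `P` and the event functions are hypothetical names for this example), using exact fractions over the 36 ordered rolls of two fair dice:

```python
from fractions import Fraction
from itertools import product


def P(event, space):
    """Probability of `event` as the fraction of equally likely outcomes
    in `space` for which the predicate `event(outcome)` is true."""
    favorable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favorable), len(space))


# Sample space: every ordered result of rolling two fair six-sided dice.
two_dice = list(product(range(1, 7), repeat=2))


def sums_to_seven(roll):
    return sum(roll) == 7


def at_least_one_six(roll):
    return 6 in roll


print(P(sums_to_seven, two_dice))     # 1/6
print(P(at_least_one_six, two_dice))  # 11/36
```

Using `Fraction` keeps the answers exact, which makes it easy to check them against a hand count (6 of 36 rolls sum to seven; 36 - 25 = 11 rolls contain a six).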
Videos
- Python Related Videos (pyvideo.org)
- Python 3000 (Guido van Rossum)
- Machine Learning Lectures by Andrew Ng at Stanford
- Transforming Code into Beautiful, Idiomatic Python (Raymond Hettinger - Core Developer of Python)
- Inside NumPy (Nathaniel Smith)
- Losing your Loops - Fast Numerical Computing with NumPy (Jake VanderPlas)
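The looping idioms below are of the kind discussed in Raymond Hettinger's "Transforming Code into Beautiful, Idiomatic Python". This is a short sketch assembled from general Python knowledge rather than a transcript of the talk; the data is illustrative only.

```python
from collections import defaultdict

colors = ["red", "green", "red", "blue", "green", "red"]
names = ["raymond", "rachel", "matthew"]
favorites = ["red", "green", "blue"]

# Index and item together: enumerate instead of range(len(colors)).
indexed = [(i, color) for i, color in enumerate(colors)]

# Two sequences in parallel: zip instead of indexing both lists.
favorite_by_name = dict(zip(names, favorites))

# Counting: dict.get with a default instead of an explicit membership test.
counts = {}
for color in colors:
    counts[color] = counts.get(color, 0) + 1

# Grouping: defaultdict(list) creates the empty list on first access.
by_length = defaultdict(list)
for name in names:
    by_length[len(name)].append(name)

print(counts)           # {'red': 3, 'green': 2, 'blue': 1}
print(dict(by_length))  # {7: ['raymond', 'matthew'], 6: ['rachel']}
```

Each idiom replaces index bookkeeping or conditional setup code with a built-in that states the intent directly.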