Computational Data Analytics Assignments
Code for this project is available on GitHub.
Description:
The course offers a hands-on introduction to programming techniques relevant to data analysis and machine learning. Most of the programming exercises use Python and SQL.
The notebooks build the basic components of a data analysis pipeline “from scratch”: collection, preprocessing, storage, analysis, and visualization. They include several examples of high-level data analysis questions, concepts and techniques for formalizing those questions into mathematical or computational tasks, and methods for translating those tasks into code. Beyond programming and best practices, the notebooks cover elementary data processing algorithms, notions of program correctness and efficiency, and numerical methods for linear algebra and mathematical optimization.
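To make the pipeline stages named above concrete, here is a minimal sketch that runs collection, preprocessing, storage, analysis, and a text-only visualization in one pass, using only the standard library. It is not taken from the notebooks: the sample data, the `purchases` table, and all function names are hypothetical and chosen purely for illustration.

```python
import re
import sqlite3

# Collection: hypothetical raw text standing in for scraped or downloaded data.
RAW_LINES = [
    "  Alice bought apples, bananas ",
    "Bob bought BANANAS",
    "",
    "Carol bought apples; cherries",
]


def preprocess(lines):
    """Normalize case, tokenize with a regex, and emit (buyer, item) pairs."""
    records = []
    for line in lines:
        tokens = re.findall(r"[a-z]+", line.lower())
        # Keep only lines shaped like "<buyer> bought <item> [<item> ...]".
        if len(tokens) >= 3 and tokens[1] == "bought":
            records.extend((tokens[0], item) for item in tokens[2:])
    return records


def store(records):
    """Storage: load the cleaned records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (buyer TEXT, item TEXT)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", records)
    conn.commit()
    return conn


def analyze(conn):
    """Analysis: count purchases per item with a SQL aggregate."""
    query = (
        "SELECT item, COUNT(*) AS n FROM purchases "
        "GROUP BY item ORDER BY n DESC, item"
    )
    return conn.execute(query).fetchall()


def visualize(rows):
    """Visualization: render the counts as a plain-text bar chart."""
    return [f"{item:<10}{'#' * n}" for item, n in rows]


if __name__ == "__main__":
    counts = analyze(store(preprocess(RAW_LINES)))
    print("\n".join(visualize(counts)))
```

Each stage is a separate function so that any one of them could be swapped out (for example, replacing the text chart with a plotting library) without touching the others.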
Notebooks:
- Notebook 1: Python Essentials
- Notebook 2: Pairwise Association Mining
- Notebook 3: Math Review (Not present)
- Notebook 4: Representing Numbers
- Notebook 5: Preprocessing Unstructured Text (Regex)
- Notebook 6: Mining the Web (BeautifulSoup, APIs, JSON)
- Notebook 7: Tidying Data (Tibbles, Melting, and Casting)
- Notebook 8: Visualizing Data and Results (Bokeh, Seaborn)
- Notebook 9: Relational Data (SQL, SQLite3)
- Notebook 10: Numerical Computing with NumPy/SciPy (Sparse Matrix, COO, CSR)
- Notebook 11: Ranking Relational Objects (Markov Chain Analysis)
- Notebook 12: Linear Regression
- Notebook 13: Classification (Logistic Regression)
- Notebook 14: Clustering via k-means
- Notebook 15: Compression via PCA
- Notebook 16: Eigenfaces
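
As one illustration of the "from scratch" approach, the sketch below converts a sparse matrix from coordinate (COO) form to compressed sparse row (CSR) form and multiplies it by a vector, the two formats named under Notebook 10. It uses plain Python lists rather than SciPy, and it is not the notebook's own code: the function names `coo_to_csr` and `csr_matvec` and the sample matrix are assumptions made for this example.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triples (rows, cols, vals) into CSR arrays.

    Returns (row_ptr, col_idx, data), where the nonzeros of row i live in
    positions row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx and data.
    """
    # Count the nonzeros in each row, then prefix-sum into row offsets.
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]

    # Scatter each entry into the next free slot of its row.
    col_idx = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = list(row_ptr[:-1])
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        data[k] = v
        next_slot[r] += 1
    return row_ptr, col_idx, data


def csr_matvec(row_ptr, col_idx, data, x):
    """Compute y = A x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += data[k] * x[col_idx[k]]
    return y


# Hypothetical 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]],
# given as COO triples in no particular order.
ROWS = [2, 0, 1, 0, 2]
COLS = [1, 0, 2, 2, 0]
VALS = [5.0, 1.0, 3.0, 2.0, 4.0]

if __name__ == "__main__":
    row_ptr, col_idx, data = coo_to_csr(3, ROWS, COLS, VALS)
    print(row_ptr)                                   # [0, 2, 3, 5]
    print(csr_matvec(row_ptr, col_idx, data, [1.0, 2.0, 3.0]))  # [7.0, 9.0, 14.0]
```

COO is convenient for building a matrix entry by entry, while CSR groups each row's nonzeros contiguously, which is what makes the row-wise loop in `csr_matvec` straightforward.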
Languages:
Python 3.6, SQL
Libraries:
Pandas, NumPy, SciPy, re, matplotlib, seaborn, bokeh, collections, itertools
Relational Database Management System:
SQLite3
Environments:
Jupyter Notebooks