Computational Data Analytics Assignments

Code for this project is available on GitHub.

Description:

The course is a hands-on introduction to programming techniques relevant to data analysis and machine learning. Most of the programming exercises use Python and SQL.

The notebooks build the basic components of a data analysis pipeline “from scratch”: collection, preprocessing, storage, analysis, and visualization. They work through several examples of high-level data analysis questions, concepts and techniques for formalizing those questions into mathematical or computational tasks, and methods for translating those tasks into code. Beyond programming and best practices, the notebooks cover elementary data processing algorithms, notions of program correctness and efficiency, and numerical methods for linear algebra and mathematical optimization.

Notebooks:

  1. Notebook 1: Python Essentials
  2. Notebook 2: Pairwise Association Mining
  3. Notebook 3: Math Review (Not present)
  4. Notebook 4: Representing Numbers
  5. Notebook 5: Preprocessing Unstructured Text (Regex)
  6. Notebook 6: Mining the Web (BeautifulSoup, APIs, JSON)
  7. Notebook 7: Tidying Data (Tibbles, Melting, and Casting)
  8. Notebook 8: Visualizing Data and Results (Bokeh, Seaborn)
  9. Notebook 9: Relational Data (SQL, SQLite3)
  10. Notebook 10: Numerical Computing with NumPy/SciPy (Sparse Matrix, COO, CSR)
  11. Notebook 11: Ranking Relational Objects (Markov Chain Analysis)
  12. Notebook 12: Linear Regression
  13. Notebook 13: Classification (Logistic Regression)
  14. Notebook 14: Clustering via k-means
  15. Notebook 15: Compression via PCA
  16. Notebook 16: Eigenfaces
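
To give a flavor of the exercises, here is a minimal sketch of pairwise association mining in the spirit of Notebook 2, built only on `collections` and `itertools`. It is not taken from the notebooks themselves: the function name `pair_confidence` and the sample baskets are hypothetical, and the confidence measure used, conf(a => b) = count(a, b) / count(a), is an assumption about what such an exercise would compute.

```python
from collections import Counter
from itertools import combinations


def pair_confidence(baskets):
    """Estimate conf(a => b) = count(a, b) / count(a) for every ordered pair.

    `baskets` is an iterable of item collections; duplicate items inside
    a single basket are ignored.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        # Count each co-occurring pair once per basket, in both directions.
        for a, b in combinations(items, 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {
        (a, b): n / item_counts[a]
        for (a, b), n in pair_counts.items()
    }


# Hypothetical sample data, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = pair_confidence(baskets)
```

With this sample data, every basket containing `butter` also contains `bread`, so `rules[("butter", "bread")]` is 1.0, while `rules[("bread", "milk")]` is 2/3 because only two of the three `bread` baskets include `milk`.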

Languages:
Python 3.6, SQL

Libraries:
Pandas, NumPy, SciPy, re, matplotlib, seaborn, bokeh, collections, itertools

Relational Database Management System:
SQLite3

Environments:
Jupyter Notebooks