A set of industry-oriented examples showcasing the tools that I have mastered.
My full resume can be found here.
The topic of my portfolio comes from the final capstone project of my Google Data Analytics Professional Certificate. The project had specific goals, but I went beyond them and used the project to demonstrate my analytical skills and the tools that I am proficient in.
Bellabeat is a high-tech manufacturer of health-focused products for women, with the potential to become a more prominent player in the global smart device market. The Chief Creative Officer of Bellabeat believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. The goal is to focus on one of Bellabeat’s products and analyze smart device data to understand how consumers use their smart devices. These insights will then help guide the company's marketing strategy.
The data analytics process that I followed, as suggested by Google, is described step-by-step in this set of log files: Ask, Prepare, Process, Analyze, Share, Act.
In short, I use three different datasets, which I call Fitabase (Fitbit information from 30 individuals)[1], AppleWatchFitbit (information from two smart devices from 23 men and 26 women)[2], and FitbitGrades (Fitbit information from 400 college students, including grades)[3]. The following table can guide you through the different topics, the notebooks, and the collection of tools used.
Tools used | Goals | Code/Notebooks/Links |
---|---|---|
BigQuery, SQL, Python, Pandas, Matplotlib, Seaborn | Study the overall behavior of Fitbit consumers using the Fitabase dataset. | |
R, Tidyverse, ggplot2 | Study the behavior of Apple Watch consumers using the AppleWatchFitbit dataset. Study differences between women and men consumers. | |
Spreadsheets, Pivot tables | Study the behavior of women and men Fitbit customers in relation to their intellectual skills. | |
Tableau, dashboards | Summarize and emphasize the previous findings using BI tools. | |
Finally, the entire project is stored in this GitHub repository.
The outcomes of this study are:
A set of slides highlighting the results and recommendations can be found here.
After completing the capstone project and taking a deeper look at the datasets, I came up with some ideas that I wanted to explore, using other tools that I am proficient in. In this part of my portfolio, I apply machine learning techniques to answer some questions that the datasets raised:
Tools used | Goals | Code/Notebooks/Links |
---|---|---|
statsmodels regressions, scikit-learn, XGBoost, Feature importance, ML Optimizations | Can I infer the number of calories burned from other variables collected by the Apple Watch? Answering this question lets me show different regression techniques. | Link to the notebook |
sklearn, classification, LogisticRegression, RandomForest, XGBoost, Exploratory Data Analysis, matplotlib, seaborn | The famous Titanic competition. Here I test many different ML algorithms. | Link to the notebook |
PySpark, binary classification, SQL, Feature engineering | I use PySpark in one of the Kaggle Monthly Challenges. | Link to the notebook |
TensorFlow, Keras, sklearn, multilabel classification, deep neural network | Classification problem using one of the CERN LHC datasets. | Link to the notebook |
More coming soon.
[1] Furberg, R., Brinton, J., Keating, M., & Ortiz, A. (2016). Crowd-sourced Fitbit datasets 03.12.2016-05.12.2016 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.53894
[2] Fuller, D. (2020). Replication Data for: Using machine learning methods to predict physical activity types with Apple Watch and Fitbit data using indirect calorimetry as the criterion (Version V1) [Data set]. Harvard Dataverse. https://doi.org/10.7910/DVN/ZS2Z2J
[3] Broaddus, A., Jaquis, B., Jones, C., Jost, S., Lang, A., Li, A., et al. (2018). Dataset: Fitbits, field-tests, and grades. The effects of a healthy and physically active lifestyle on the academic performance of first year college students [Data set]. figshare. https://doi.org/10.6084/m9.figshare.7218497.v1