DATA SCIENCE/ANALYTICS PORTFOLIO
PYTHON
📔 COLAB NOTEBOOK
DayCare Management System
The provided Python code defines a DayCare management system that allows users to interactively manage child profiles for a daycare center. The system is built using ipywidgets for a user-friendly interface within a Jupyter notebook environment.
Classes and Their Functions
- Child
Represents a child enrolled in the daycare.
Stores details such as name, age, guardian’s contact, preferences, likes, dislikes, and special notes.
Provides methods to display the child’s information and convert the child’s data to a dictionary format.
- DayCare
Represents the daycare center.
Maintains a list of children enrolled in the daycare.
Provides methods to enroll and remove children, retrieve a specific child’s details, display all enrolled children, save and load children’s data to and from a file, remove duplicate entries, and delete a child’s profile (see the sketch after this list).
- DayCareGUI
Provides a graphical user interface for the daycare management system.
Allows users to add, display, update, and delete child profiles.
Includes interactive widgets such as buttons and text fields for user input and actions.
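A minimal sketch of how the Child and DayCare classes can be laid out, based on the descriptions above (attribute and method names here are illustrative, not necessarily the notebook’s exact API):

```python
# Illustrative sketch: attribute and method names follow the description above,
# not necessarily the notebook's exact implementation.

class Child:
    """A child enrolled in the daycare."""

    def __init__(self, name, age, guardian_contact,
                 preferences="", likes="", dislikes="", notes=""):
        self.name = name
        self.age = age
        self.guardian_contact = guardian_contact
        self.preferences = preferences
        self.likes = likes
        self.dislikes = dislikes
        self.notes = notes

    def to_dict(self):
        # Convert the child's data to a plain dictionary (used for JSON saving).
        return vars(self).copy()

    def display_info(self):
        # Return a readable, line-per-field summary of the profile.
        return "\n".join(f"{k}: {v}" for k, v in self.to_dict().items())


class DayCare:
    """The daycare center: maintains the list of enrolled children."""

    def __init__(self):
        self.children = []

    def enroll(self, child):
        self.children.append(child)

    def remove(self, name):
        self.children = [c for c in self.children if c.name != name]

    def get_child(self, name):
        # Return the first child whose name matches, or None.
        return next((c for c in self.children if c.name == name), None)

    def remove_duplicates(self):
        # Keep only the first profile seen for each name.
        seen, unique = set(), []
        for child in self.children:
            if child.name not in seen:
                seen.add(child.name)
                unique.append(child)
        self.children = unique
```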
User Interface Components
- Add Child Button: Opens input fields for adding a new child’s profile.
- Children List: Displays a list of enrolled children for selection.
- Show Info Button: Displays the selected child’s detailed information.
- Update Profile Button: Opens input fields for updating the selected child’s profile.
- Delete Child Button: Deletes the selected child’s profile from the system.
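The GUI wiring follows the usual ipywidgets pattern of buttons, a selection widget, and an Output area for results. A hedged sketch is below; the button labels come from the list above, while the handler and variable names are illustrative:

```python
import ipywidgets as widgets
from IPython.display import display

# Hedged sketch of the GUI wiring; the actual DayCareGUI class may organize this differently.
children_list = widgets.Dropdown(options=[], description="Children:")
add_button = widgets.Button(description="Add Child")
show_button = widgets.Button(description="Show Info")
update_button = widgets.Button(description="Update Profile")
delete_button = widgets.Button(description="Delete Child")
output = widgets.Output()

def on_show_clicked(_button):
    # Display the selected child's details in the Output area.
    with output:
        output.clear_output()
        print(f"Selected child: {children_list.value}")

show_button.on_click(on_show_clicked)

display(widgets.HBox([add_button, show_button, update_button, delete_button]),
        children_list,
        output)
```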
Example Usage
The code includes an example usage section that initializes the daycare center, loads existing data, removes duplicates, creates the GUI, and displays it for user interaction.
File Operations
The system can save the current state of enrolled children to a JSON file and load it back, ensuring data persistence across sessions.
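Persistence of this kind is typically a thin wrapper around the standard json module; a minimal sketch, assuming the to_dict method and class names sketched earlier and a children.json file name (both illustrative):

```python
import json

def save_children(daycare, path="children.json"):
    # Serialize every enrolled child's profile to a list of dictionaries.
    with open(path, "w") as f:
        json.dump([child.to_dict() for child in daycare.children], f, indent=2)

def load_children(daycare, path="children.json"):
    # Rebuild Child objects from the saved dictionaries, if the file exists.
    try:
        with open(path) as f:
            for record in json.load(f):
                daycare.enroll(Child(**record))
    except FileNotFoundError:
        pass  # First run: nothing saved yet.
```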
Interactive Features
The system is designed to be interactive, with clear outputs and updates reflecting the user’s actions in real time.
Dashboard Screenshot - Upper Management (Tableau)
Dashboard Screenshot - Operational Teams (Google Data Studio)
Overview
In this project, I will analyze a dataset from a Car Insurance company. Utilizing a combination of data analysis and visualization tools, I will develop insightful dashboards designed for insurance analysis. The project caters to two distinct audiences: upper management, who will use Tableau, and operational teams, who will rely on Google Data Studio.
- Google BigQuery: Used for data storage and querying.
- Tableau: Employed for creating dashboards for upper management.
- Google Data Studio: Utilized for crafting operational dashboards.
- GitHub: The primary platform for version control and sharing the project’s codebase.
Process and Methodology
- Data Preparation: Begin by importing the car insurance dataset into Google BigQuery, ensuring the data is clean and structured for analysis (a query sketch follows this list).
- Data Analysis: Conduct an initial exploration to understand the data’s characteristics and identify any patterns.
- Dashboard Creation: Create dashboards in Tableau for upper management and an operational dashboard in Google Data Studio.
- Presentation: Present the findings to the relevant stakeholders.
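The data-preparation step can be scripted with the official BigQuery Python client. The sketch below is illustrative only; the project, dataset, table, and column names are placeholders rather than the ones used in the actual project:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project id

query = """
    SELECT policy_state,                            -- hypothetical column names
           COUNT(*)                 AS n_policies,
           AVG(total_claim_amount)  AS avg_claim
    FROM `my-project.insurance.car_policies`        -- placeholder table
    GROUP BY policy_state
    ORDER BY avg_claim DESC
"""

# Run the query and pull the result into a DataFrame for a quick sanity check
# before building the Tableau and Data Studio dashboards on top of it.
df = client.query(query).to_dataframe()
print(df.head())
```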
Streamlit App Screenshot - POSITIVE REVIEW
Streamlit App Screenshot - NEGATIVE REVIEW
Overview
In this project, I conducted sentiment analysis on a collection of product reviews from an e-commerce platform. Utilizing a combination of text reviews and associated ratings, I developed a model capable of classifying the sentiment of each review as positive, negative, or neutral. The project leveraged natural language processing techniques and machine learning to analyze and categorize sentiments, providing valuable insights into customer feedback.
- Python Libraries: Utilized for data manipulation and analysis (Pandas, NLTK).
- Machine Learning Libraries: Employed for building and evaluating the sentiment analysis model (Scikit-learn, spaCy).
- Jupyter Notebook: The primary platform for documenting the analysis process.
- GitHub: Used for version control and sharing the project’s codebase.
- Streamlit: Enabled the creation of an interactive web application to showcase the analysis results.
Process and Methodology
- Data Preparation: I began by importing the dataset into an SQL environment, ensuring the data was clean and structured for analysis.
- Exploratory Data Analysis (EDA): I conducted an initial exploration to understand the data’s characteristics and identify any patterns.
- Data Preprocessing: The text data was cleaned and preprocessed to ensure it was in the optimal format for modeling.
- Sentiment Labeling: Each review was labeled according to sentiment derived from the text content and associated ratings.
- Text Vectorization: I transformed the text data into a numerical format that machine learning models could interpret (see the sketch after this list).
- Model Building: A machine learning model was constructed to classify the reviews’ sentiments accurately.
- Model Evaluation: The model’s performance was rigorously assessed using appropriate evaluation metrics.
- Sentiment Analysis Dashboard: An interactive dashboard was created using Streamlit to visualize and interact with the sentiment analysis results.
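A condensed sketch of the vectorization and model-building steps, assuming a DataFrame with review_text and sentiment columns; the column names and the TF-IDF plus logistic-regression combination are assumptions for illustration, not necessarily the exact setup used in the notebook:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Assumed columns: 'review_text' (cleaned text) and 'sentiment' (positive/negative/neutral).
df = pd.read_csv("reviews.csv")

X_train, X_test, y_train, y_test = train_test_split(
    df["review_text"], df["sentiment"],
    test_size=0.2, random_state=42, stratify=df["sentiment"])

# TF-IDF turns each review into a sparse numeric vector; the classifier learns from those.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=5, stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```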
Objective
The goal of this project is to work with a hotel reservation dataset that contains information about reservations at two types of hotels: Resort Hotels (H1) and City Hotels (H2). I used SQL for data manipulation and Tableau to create impactful visualizations and surface insights.
- SQL (MySQL, PostgreSQL, BigQuery, Athena)
- Tableau
- GitHub
Steps
- Data Import:
  - Import the hotel reservation dataset into a SQL environment.
- Data Exploration:
  - Explore the dataset to understand the variables and the relationships between them.
- SQL Analysis:
  - Use SQL for data manipulation and analysis (a sample query follows this list).
- Tableau Visualization:
  - Create visualizations in Tableau based on my findings from the SQL analysis.
- GitHub Repository:
  - Maintain a GitHub repository with all my scripts, code, and visualizations.
- Tableau Dashboard Publishing:
  - Publish my Tableau dashboards for easy access and sharing.
- Presentation:
  - Prepare a presentation or report summarizing my findings and recommendations.
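A representative query from the SQL-analysis step, run through pandas here so the result is easy to inspect. The bookings table and its hotel, is_canceled, adr, and arrival_date_month columns are assumptions about the schema, not a copy of the project’s actual queries:

```python
import sqlite3
import pandas as pd

# Assumed schema: bookings(hotel, is_canceled, adr, arrival_date_month, ...).
conn = sqlite3.connect("hotel_reservations.db")

query = """
    SELECT hotel,
           arrival_date_month,
           COUNT(*)                  AS bookings,
           AVG(is_canceled) * 100.0  AS cancellation_rate_pct,
           ROUND(AVG(adr), 2)        AS avg_daily_rate
    FROM bookings
    GROUP BY hotel, arrival_date_month
    ORDER BY hotel, bookings DESC;
"""

summary = pd.read_sql_query(query, conn)
print(summary.head())
```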
Objective
In this project, I conducted a market basket analysis for retail and e-commerce. I extracted insights from transaction data to understand customer purchasing behavior and used this knowledge for business optimization.
- Data Analysis Tool: Python (Pandas and related libraries)
- Data Visualization Tools: Matplotlib, Seaborn
- Scikit-Learn
- Jupyter Notebook
Steps
- Data Preparation:
  - Load the dataset using Pandas.
  - Clean the data by handling missing values and removing duplicates.
  - Preprocess the data where necessary, such as converting data types and encoding categorical variables.
- Exploratory Data Analysis (EDA):
  - Analyze the dataset to understand the distribution of and relationships between the variables.
  - Use visualization tools such as Matplotlib and Seaborn to create plots for better understanding.
- Market Basket Analysis:
  - Use the Apriori algorithm or another association rule mining algorithm for the market basket analysis (see the sketch after this list).
  - Generate frequent itemsets and strong association rules.
- Visualization:
  - Visualize the results of the market basket analysis.
  - Create plots showing the most frequent itemsets, the items most commonly bought together, and so on.
- Interpretation and Insights:
  - Interpret the results of the market basket analysis.
  - Extract insights about customer purchasing behavior.
- Recommendations:
  - Based on the insights, make recommendations for business optimization.
  - Suggestions could include changes in product placement, pricing strategies, cross-selling tactics, etc.
- Presentation:
  - Document all findings, code, and visualizations in a Jupyter Notebook.
  - Prepare a presentation or report for my mentorship group.
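The market-basket step maps naturally onto mlxtend’s Apriori implementation. A minimal sketch is below, assuming the transactions have already been one-hot encoded into a boolean transaction-by-item DataFrame (the file name and threshold values are illustrative):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Assumed input: one row per transaction, one boolean column per item.
basket = pd.read_csv("basket_encoded.csv", index_col=0).astype(bool)

# Frequent itemsets that appear in at least 1% of transactions.
frequent_itemsets = apriori(basket, min_support=0.01, use_colnames=True)

# Keep rules with lift above 1, i.e. items bought together more often than chance.
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

top_rules = rules.sort_values("lift", ascending=False)
print(top_rules[["antecedents", "consequents", "support", "confidence", "lift"]].head(10))
```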
In this project, I utilized a linear regression model to predict car prices and explored methods for interpreting and evaluating the model’s results. The project involved the following steps (a condensed code sketch follows the list):
- Data Assessment: The initial step involved a thorough assessment of the data to understand its structure and content.
- Building Models: I built a basic linear regression model using selected features and ran the model summary. I also constructed a full linear regression model using a comprehensive range of features available.
- Feature Engineering: I plotted a correlation matrix and reduced the number of independent variables through feature engineering. I also performed one-hot encoding on categorical variables.
- Model Training and Prediction: I split the data into train and test sets, fitted the model to the training data, and performed predictions on the test set.
- Model Refinement: I printed the summary output of the model, selected variables that were statistically significant (p-value < 0.05), and retrained the model.
- Scaling Variables: I scaled the independent variables and fitted the model with the standardized data.
- Model Evaluation: Finally, I evaluated the performance of the Linear Regression model.
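A compressed sketch of the summary-and-refinement loop described above, using statsmodels so that p-values are available; the file name, the price column, and the feature handling are assumptions about the dataset rather than its exact schema:

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# Assumptions: 'car_prices.csv' with a 'price' target and mixed numeric/categorical features.
df = pd.read_csv("car_prices.csv")
X = pd.get_dummies(df.drop(columns="price"), drop_first=True).astype(float)  # one-hot encoding
X = X.loc[:, X.std() > 0]   # drop constant columns so standardization is well defined
y = df["price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the independent variables, then fit OLS with an intercept.
X_train_std = (X_train - X_train.mean()) / X_train.std()
model = sm.OLS(y_train, sm.add_constant(X_train_std)).fit()
print(model.summary())

# Keep only the statistically significant variables (p-value < 0.05) and refit.
pvals = model.pvalues.drop("const", errors="ignore")
keep = pvals[pvals < 0.05].index
refit = sm.OLS(y_train, sm.add_constant(X_train_std[keep])).fit()
print(refit.summary())
```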
In this project, I developed an automatic credit card approval predictor using machine learning classification techniques. The project involved the following steps (a short sketch follows the list):
- Data Loading: The initial step involved loading and viewing the dataset.
- Data Preprocessing: The dataset contained a mixture of numerical and non-numerical features, values from different ranges, and several missing entries. I preprocessed the dataset to ensure the machine learning model could make good predictions.
- Exploratory Data Analysis: With the data in good shape, I performed some exploratory data analysis to build intuition about the dataset.
- Model Building: Finally, I built a machine learning classification model that predicts if an individual’s application for a credit card will be accepted.
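A hedged sketch of the preprocessing-plus-classification approach. The file name, the assumption that the last column holds the approval label with '?' marking missing entries, and the choice of logistic regression are all illustrative rather than a copy of the notebook:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumption: the last column is the approval label; '?' marks missing entries.
df = pd.read_csv("credit_approvals.csv", na_values="?")
X, y = df.iloc[:, :-1], df.iloc[:, -1]

numeric_cols = X.select_dtypes(include="number").columns
categorical_cols = X.columns.difference(numeric_cols)

# Impute and scale numeric features; impute and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

clf = Pipeline([("prep", preprocess), ("model", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
clf.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```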
MICROSOFT EXCEL
The charts above are derived from an Excel 3-statement financial model. They represent the Income Statement and Cash Flow Statement of a company from 2016A (actual) to 2023E (estimated).
The 3-statement financial model is a type of financial model that uses three financial statements of a company: the Income Statement, Balance Sheet, and Cash Flow Statement. It is a highly interconnected model where changes in one statement flow through to the others, providing a comprehensive view of a company’s financial health. The model helps in financial analysis, decision-making, and valuation of a company.
Income Statement Chart:
- Revenue (Blue Bars): The revenue shows an increasing trend from 2016A to 2023E, indicating the company’s business growth.
- EBITDA Margin (Orange Line): The EBITDA margin is relatively stable with slight fluctuations. It represents the company’s operational profitability.
- Net Income Margin (Grey Line): This margin is decreasing over time, suggesting that net profitability is declining relative to revenue, possibly due to increased costs or expenses.
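For reference, the two margins plotted in this chart are computed with the standard definitions (shown here for clarity, not taken from the model’s own formulas):

```latex
\text{EBITDA Margin} = \frac{\text{EBITDA}}{\text{Revenue}}
\qquad
\text{Net Income Margin} = \frac{\text{Net Income}}{\text{Revenue}}
```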
Cash Flow Statement Chart:
- Operations (Blue Bars): Positive values suggest that the company’s core business operations are generating cash inflows.
- Investing (Orange Bars): Negative values indicate cash outflows related to investments. There’s an increase in investment activities over time.
- Financing (Yellow Bars): It fluctuates over the years, representing changes in debt, equity, or dividend payments.
- Change in Cash (Line Graph): The line graph indicates variations in the company’s cash position over time.
Building the Dashboard:
- Visualization Tools: Utilized Excel to create visual representations displaying key metrics.
- Dashboard Elements:
  - Top Left Section: Contains a bar graph showing the revenue generated by 3 different business units from 2014 to 2018.
  - Top Right Section: Displays a line and bar graph combination depicting profit margins from 2014 to 2023, with a distinction between historical and forecasted data.
  - Bottom Section: Shows a waterfall chart illustrating the cumulative revenue of the 3 business units in 2018; an area chart displaying expenses related to Materials and Bandwidth, Depreciation and Amortization, Rents/Overhead, and Others from 2014 to 2023; and tables summarizing five-year performance, actual vs. planned income statement figures for FY (Fiscal Year) 2018, detailed revenue, COGS (Cost of Goods Sold), and expense information, and a balance sheet summary for 2018.
Interpretation of Findings:
- Business Unit Revenue: The bar graph provides a year-by-year breakdown of the revenue generated by 3 different business units. It helps to understand which business unit is generating the most revenue.
- Profit Margin: The line and bar graph combination depicts profit margins from 2014 to 2023, with a distinction between historical and forecasted data. It helps to understand the profitability of the company over time.
- 2018 Cumulative Revenue: The waterfall chart illustrates the cumulative revenue of 3 business units in 2018. It helps to understand the contribution of each business unit to the total revenue.
- Expenses: The area chart displays expenses related to Materials and Bandwidth, Depreciation and amortization, Rents/Overhead, and Others from 2014 to 2023. It helps to understand the major expense areas and their trends over time.
- Five-Year Performance Summary: The table summarizes five-year performance including revenue, COGS (Cost Of Goods Sold), expenses, and operating profit margin with their respective averages and trends. It provides a quick snapshot of the company’s financial performance over the past five years.
- Income Statement FY 2018: The table presents actual vs planned income statement figures for FY (Fiscal Year) 2018 along with variances in percentages. It helps to understand how the actual figures deviated from the planned figures.
- P&L Summary 2018: The table provides detailed information on revenues, COGS (Cost Of Goods Sold), expenses broken down into salaries & benefits; rent and overhead; depreciation & amortization; interest; total expenses; net operating profit for the year 2018. It provides a detailed view of the company’s profit and loss statement for the year 2018.
- Balance Sheet Summary 2018: This table outlines assets including current assets/non-current assets/total assets/liabilities including current liabilities/long-term liabilities/shareholders’ equity/total liabilities & shareholders’ equity. It provides a snapshot of the company’s financial position at the end of the year 2018.
Building the Dashboard:
- Visualization Tools: Utilized Excel to create visual representations displaying key metrics.
- Dashboard Elements:
  - Top Section: Contains four semi-circular gauges indicating Website Traffic, number of Page Views, Conversion Rate, and New Customers.
  - Middle Section: Displays a bar graph titled “# of Orders”, with bars representing the number of orders each month and orange triangles indicating the target number of orders.
  - Bottom Section: Shows an area graph titled “Revenue”, with the shaded area representing actual revenue and a line representing target revenue, and a line graph titled “EBITDA Margin” with two lines showing the actual and target EBITDA margins.
Interpretation of Findings:
- Website Traffic: The gauge indicates that the website traffic is at 75%.
- # of Page Views: The number of page views is at 40%.
- Conversion Rate: The conversion rate is at 32%.
- New Customers: The number of new customers is at 65%.
- # of Orders: This graph provides a month-by-month breakdown of the number of orders placed. The blue bars represent the actual number of orders received each month, while the orange triangles indicate the target number of orders. By comparing the height of the bars with the position of the triangles, you can see how well the company is meeting its targets. For instance, if the blue bar exceeds the orange triangle in a given month, it means the company has surpassed its order target for that month.
- Revenue: This area graph shows the company’s revenue performance over time: the shaded area represents actual revenue and the line represents target revenue. The gap between the two indicates how far the company is from its target. If the area rises above the target line, actual revenue has exceeded the target; if it stays below, the company has fallen short of its revenue target.
- EBITDA Margin: EBITDA (Earnings Before Interest, Taxes, Depreciation, and Amortization) Margin is a measure of a company’s operating profitability as a percentage of its total revenue. The graph shows the actual EBITDA margin (solid line) and the target EBITDA margin (dotted line). If the solid line is above the dotted line, it means the company’s actual EBITDA margin is higher than its target, indicating better-than-expected profitability. If the solid line is below the dotted line, it means the company’s profitability is lower than its target.
This dashboard serves as a powerful tool for decision-making, helping to identify trends, monitor performance, and guide strategic planning for the website.
Building the Dashboards:
- Data Collection & Preparation: Used raw data from the company’s internal database. The data was cleaned, transformed, and organized to ensure accuracy and consistency.
- Visualization Tools: Utilized Excel (Power Pivot) to create visual representations displaying key metrics.
Dashboard Elements:
- Top Section: Provides store size, category, brand, and year filters.
- Middle Section: Displays detailed tables of sales metrics by city on the left side and by category & brand on the right side.
- Bottom Section: Presents a bar graph showing monthly sales trends for 2018 & 2019.
Interpretation of Findings:
- Austin: Showed an impressive 8.7% YoY growth in sales with a 30% margin, indicating strong market penetration.
- Lux Bed: Holds the highest share at 30.9% and has seen a YoY Margin of 1.4%, suggesting a need for strategies to improve profitability.
- Monthly Sales: Peaked from October to December and dropped from May to September, indicating seasonality trends that can be leveraged for future marketing campaigns.
This dashboard serves as a powerful tool for decision-making, helping to identify trends, monitor performance, and guide strategic planning for the Head Rest Bed Company.
Dashboard Elements:
- Top Section: Provides filters for date and location.
- Middle Section: Displays detailed tables of sales metrics by day of the week and by product type.
- Right Section: Presents a table showing individual employee performance metrics.
Interpretation of Findings:
- Sales Share: Sunday has the highest sales share at 30%, indicating that it’s the busiest day of the week for the company.
- Product Type: The Mattress category has the highest sales at $17,882,010 with a YoY sales increase of 6.7%.
- Employee Performance: Letisha from Detroit has the highest sales share among employees at 20.7% with a sales per day figure of $23,857.
This dashboard serves as a powerful tool for decision-making, helping to identify trends, monitor performance, and guide strategic planning for the Head Rest Bed Company.
SQL
In this project, I analyzed international debt data collected by The World Bank. The dataset contains information about the amount of debt (in USD) owed by developing countries across several categories. The project aimed to answer the following questions:
- Total Debt: What is the total debt owed by the countries listed in the dataset?
- Maximum Debt: Which country owes the maximum amount of debt, and what is that amount?
- Average Debt: What is the average amount of debt owed by countries across different debt indicators?
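These three questions translate directly into aggregate SQL. The sketch below runs them through pandas for illustration, assuming an international_debt table with country_name, indicator_code, and debt columns (the table layout is an assumption, not a copy of the project’s schema):

```python
import sqlite3
import pandas as pd

# Assumed schema: international_debt(country_name, indicator_code, debt).
conn = sqlite3.connect("world_bank_debt.db")

queries = {
    "total_debt": """
        SELECT ROUND(SUM(debt) / 1e6, 2) AS total_debt_millions
        FROM international_debt;
    """,
    "maximum_debt": """
        SELECT country_name, SUM(debt) AS total_debt
        FROM international_debt
        GROUP BY country_name
        ORDER BY total_debt DESC
        LIMIT 1;
    """,
    "average_debt_by_indicator": """
        SELECT indicator_code, AVG(debt) AS average_debt
        FROM international_debt
        GROUP BY indicator_code
        ORDER BY average_debt DESC;
    """,
}

for name, sql in queries.items():
    print(name)
    print(pd.read_sql_query(sql, conn), "\n")
```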
In this project, I conducted a comprehensive analysis of SAT score data across public schools in New York City. The project involved the following steps:
- Data Analysis: I performed an exploratory data analysis on the SAT dataset, looking for patterns, trends, and insights.
- Insights Extraction: I extracted insights from the dataset, focusing on the distribution of scores across different schools, districts, and over time.
- Data Visualization: I visualized the data to better understand the patterns and trends observed.
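A minimal sketch of the kind of aggregation behind the analysis; the schools.csv file and its borough and average_math/average_reading/average_writing columns are assumptions about the dataset layout:

```python
import pandas as pd

# Assumed columns: borough, average_math, average_reading, average_writing.
schools = pd.read_csv("schools.csv")

# Combine the three section scores into a total SAT score per school.
schools["total_SAT"] = (schools["average_math"]
                        + schools["average_reading"]
                        + schools["average_writing"])

# Distribution of scores by borough: how many schools, and how spread out are the scores?
borough_stats = (schools.groupby("borough")["total_SAT"]
                        .agg(num_schools="count", average_score="mean", std_dev="std")
                        .round(2)
                        .sort_values("average_score", ascending=False))
print(borough_stats)
```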
💡Embracing the Future: My Journey with Applied Artificial Intelligence
🚀 As we journey through the data universe, let’s take a detour into the captivating world of AI! 🤖 Don’t miss out on these AI projects, each a testament to the power of machine learning. Click away! 👇
CORE COMPETENCIES
- Methodologies: Data Management, Statistics, Data Visualization, Data Presentation and Communication, Machine Learning, Problem-solving, Research, Collaboration, Financial Analysis, Modeling and Valuation, Business Intelligence, Exploratory Data Analysis, Feature Engineering, Deep Learning, Time Series Analysis, Natural Language Processing.
- Languages: Python (Pandas, Numpy, Scikit-Learn, Scipy, TensorFlow, Keras, Seaborn, Plotly, Matplotlib), R, SQL.
- Tools: Excel, Power BI, Tableau, BigQuery, Google Data Studio, Power Pivot, Power Query, VBA, Macabacus, Azure.
CERTIFICATES