A strong portfolio is like a passport in the world of data science: it opens the door to exciting career opportunities and gives you a chance to show off your skills as a data scientist.

 

Whether you’re just starting out or you’re an experienced professional looking to get your foot in the door of data science, Microsoft Excel is a great place to start. It’s easy to use and versatile, which makes it a practical tool for building a data science portfolio.

 

Excel, often overlooked as a data science instrument, can be a powerful tool for demonstrating your proficiency in data manipulation, analysis, and visualization. The projects below give you tangible proof of those skills.

 

So, let’s dive in and find out how you can create a powerful data science portfolio with Excel.

 

Here are five Excel project ideas that both novice and experienced data science professionals can use to build a portfolio.

 

#1 Data Cleaning and Validation

 

Project Description: Cleaning and validating data are important steps in the data preparation process. In this project, you will work with a dataset containing customer contact information, which is likely to have errors and inconsistencies. Your goal is to use Microsoft Excel to clean and validate this data, resulting in a clean datasheet ready for further analysis.

 

Project Steps:

-       Data Import: Begin by importing the dataset into Excel. You can do this by opening the Excel file or using the “Get Data” feature (Power Query) if the data is stored in an external file or database.

-       Data Assessment: Look for common problems like missing values, duplicates, and inconsistent formatting.

-       Handling Missing Values: Identify columns with missing values (e.g., empty cells or placeholders like "N/A"). Decide how to handle missing data: delete rows, fill in missing values, or leave them as-is based on the context.

-       Removing Duplicates: Identify and remove duplicate rows if they exist in the dataset. Excel provides a “Remove Duplicates” feature on the Data tab for this purpose.

-       Data Standardization: Standardize the format of data where necessary. (For example, ensure that all phone numbers follow the same format, postal codes are in a consistent format, and dates are in a uniform style.)

-       Data Validation: Excel allows you to create custom data validation rules for specific columns. (For instance, you can set rules for valid email addresses, phone numbers, or ZIP codes.)

-       Error Handling: Create a new column to flag or record errors and issues found during the cleaning process. This column can be used to document what changes were made to the data.

-       Documentation: Document the cleaning process thoroughly. This includes recording the changes made, reasons for those changes, and any data quality issues encountered.

 

Example Project: Cleaning and Validating Customer Contact Information

 

Suppose you have a dataset containing customer contact information like this:

 

 

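| ID No. | First Name | Last Name | Email | Phone | Zip Code |
|---|---|---|---|---|---|
| 1 | John | Smith | john@eg.com | 123-456789 | 1234 |
| 2 | Jane | Doe | jane@eg.com | N/A | 54321 |
| 3 | Bob | Johnson | bobj@eg.com | 987654798 | 5432-6 |
| 4 | Alice | Brown | alice@eg.com |  | 98765-4 |
| 5 |  |  |  |  |  |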

In this example, you would perform the following cleaning and validation tasks:

 

-       Replace “N/A” in the Phone column with a true blank cell so it is treated as a missing value.

-       Remove duplicate rows if necessary.

-       Standardize phone numbers to a consistent format.

-       Apply data validation rules to ensure valid email addresses and ZIP codes.

-       Create an error column to flag rows with missing values.

-       Document all changes made during the cleaning process.
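
As a rough sketch of how these tasks might look as worksheet formulas, assume the table above sits in A1:F6 with headers in row 1, so Email is column D, Phone is column E, and Zip Code is column F. Columns G through K are hypothetical helper columns, and the five-digit ZIP rule is an assumption for illustration. The text before each colon names the cell the formula goes in; fill each formula down to row 6.

```excel
G2: =IF(OR(TRIM(E2)="",UPPER(TRIM(E2))="N/A"),"",SUBSTITUTE(SUBSTITUTE(TRIM(E2),"-","")," ",""))
H2: =IF(AND(D2<>"",COUNTIF($D$2:$D$6,D2)>1),"Duplicate","")
I2: =IFERROR(FIND(".",D2,FIND("@",D2)+2)>0,FALSE)
J2: =AND(LEN(F2)=5,ISNUMBER(--F2))
K2: =IF(OR(D2="",G2="",F2=""),"Missing value",IF(AND(I2,J2),"OK","Invalid format"))
```

Column G strips dashes and spaces from the phone number and turns “N/A” into a blank. Column H flags repeated email addresses. Columns I and J are deliberately loose checks: I only looks for an “@” followed later by a dot, and J only accepts five-digit numeric ZIP codes. Column K is the error column. With the sample data, John's row would be flagged “Invalid format” because his ZIP code has four digits, and Jane's row would be flagged “Missing value” because her phone number is “N/A”. The formulas in I and J could also serve as the basis for a custom Data Validation rule if you prefer to block bad entries as they are typed.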

 

Once you've completed these steps, you will have a clean and validated dataset ready for further analysis, ensuring the accuracy and reliability of your data for any data science project.

 

#2 Inventory Management System

 

Project Description: An inventory management system (IMS) helps businesses manage inventory, track inventory levels, and automate the ordering process. For this project, we’ll build an Excel-based inventory management system for a retail store.

 

Project Steps:

 

-       Product Database: Maintain a list of products with details such as product name, SKU, category, cost price, selling price, and current stock level.

-       Stock Tracking: Automatically update stock levels when new stock is added or sales are made.

-       Purchase Orders: Create purchase orders for restocking products when stock levels are low.

-       Sales Records: Record sales transactions, including date, product sold, quantity, and customer details.

-       Reporting: Generate reports on current stock levels, sales history, and purchase orders.
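
One possible way to wire up the stock tracking and reordering steps is sketched below. It assumes a hypothetical layout: a “Products” sheet with the SKU in column A and the opening stock in column F, plus “Purchases” and “Sales” sheets that each log the SKU in column A and the quantity in column C. The sheet names, column positions, and the reorder threshold of 10 units are all assumptions, so adjust them to your own workbook.

```excel
G2: =F2+SUMIFS(Purchases!$C:$C,Purchases!$A:$A,$A2)-SUMIFS(Sales!$C:$C,Sales!$A:$A,$A2)
H2: =IF(G2<=10,"Reorder","")
```

Entered on the Products sheet and filled down, G2 gives the current stock as opening stock plus everything purchased minus everything sold for that SKU, and H2 marks the products that need a purchase order. Because the totals are recalculated from the logs, adding a row to either log updates the stock level without any manual editing.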

 

Example Project:

 

- Start by opening a new Excel spreadsheet and creating a table such as this one:

 

 

| Item ID | Item Name | Description | Qty. in Stock | Price per unit | Total value |
|---|---|---|---|---|---|
| 1001 | Laptop | Dell XPS 13 | 10 | $1000 | =$D2*E2 |
| 1002 | Smartphone | iPhone 12 | 20 | $1200 | =$D3*E3 |
| 1003 | Monitor | LG 27-inch | 15 | $300 | =$D4*E4 |
| 1004 | Keyboard | Logitech K7 | 30 | $50 | =$D5*E5 |
| 1005 | Mouse | Logitech MX | 25 | $70 | =$D6*E6 |

 

*In the “Total value” column, enter the formula ‘=$D2*E2’ in the first data row, then drag it down to apply it to all rows.

 

- Create a simple dashboard to display key information from your inventory, such as the total value of your inventory and the number of items in stock. Add a new sheet called “Dashboard” and use Excel functions like SUM and COUNT to calculate these totals from the data in your inventory sheet.

 

- Test your inventory management system by adding, editing, and deleting items. Ensure that the formulas and data validation rules work as expected. As your inventory changes, remember to update your Excel sheet accordingly.
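
The dashboard totals described above could be computed with formulas along these lines. This sketch assumes the inventory table lives in A1:F6 of a sheet named “Inventory” (a hypothetical name) and that the prices are stored as numbers with currency formatting rather than as text. The cell addresses on the Dashboard sheet are placeholders.

```excel
B2: =SUM(Inventory!F2:F6)
B3: =SUM(Inventory!D2:D6)
B4: =COUNT(Inventory!A2:A6)
```

With the five sample rows, B2 (total inventory value) works out to 41,750, B3 (total units in stock) to 100, and B4 (number of distinct items) to 5. COUNT works here because the item IDs are numeric; if your IDs contain letters, COUNTA is the safer choice.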

 

This is a basic example to get you started with an Inventory Management System in Excel. Depending on your needs, you can add more features, such as automated alerts for low stock levels or integration with barcode scanners for easier data input.

 

 

#3 Data Visualization Dashboard

 

Creating a basic data visualization dashboard in Microsoft Excel can be accomplished in a few simple steps.

 

Here is a short example of how to create a dashboard to visualize monthly sales using Excel:

 

- Organize your data in a structured format, with one column for the month and one for the sales figures:

| Month | Sales |
|---|---|
| January | 1000 |
| February | 1200 |
| March | 1500 |
| ... | ... |

 

- Create a PivotTable for your dashboard:

1.    Select your data range (including headers).

2.    Choose “PivotTable” on the “Insert” tab.

3.    In the PivotTable dialog box, ensure your data range is correctly selected and choose where to place the PivotTable (e.g., a new worksheet).

4.    In the PivotTable Field List on the right, drag "Month" to the Rows area and "Sales" to the Values area.

5.    Ensure that the "Values" field is set to summarize as "Sum."

 

You now have a PivotTable showing monthly sales totals.

 

- Create charts from the PivotTable: choose the chart type you prefer on the “Insert” tab, then customize it with titles, labels, and other formatting to make it visually appealing.

 

- Add a slicer for the Month field from the “Insert” tab, then arrange your charts and slicers in a structured layout so the dashboard is clear, concise, and easy to understand.

 

By following these steps, you have created a basic Excel data visualization dashboard. Users can use slicers to filter the sales data by month. You can add more charts, try different chart types, or add advanced features as your project requires.
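
If you want to sanity-check the PivotTable totals or add a growth figure to the dashboard, a couple of plain formulas are enough. The sketch below assumes the raw data sits on a sheet named “Data” (a hypothetical name) with months in A2:A13 and sales in B2:B13; the first formula can go in any free cell, and the second goes in C3 on the Data sheet and is filled down.

```excel
=SUMIFS(Data!$B$2:$B$13,Data!$A$2:$A$13,"January")
C3: =(B3-B2)/B2
```

The SUMIFS result should match the January row of the PivotTable. The growth formula gives the month-over-month change: with the sample figures it returns 0.2 (20%) for February and 0.25 (25%) for March once the cells are formatted as percentages.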

 

 

#4 Advanced Statistical Analysis

 

Dive deeper into Excel’s statistical capabilities. Explore advanced techniques such as regression analysis, ANOVA, or chi-square tests.

 

Example Project: Regression Analysis of Housing Prices

 

- Gather and organize a dataset of housing information, such as square footage, number of bedrooms, number of bathrooms, location, and sale price.

- Use the Regression tool in Excel’s Analysis ToolPak add-in (the “Data Analysis” button on the Data tab) to run a multiple linear regression. Specify the dependent variable (sale price) and the independent variables (e.g., bedrooms, bathrooms, square footage).

 

- Examine the regression output generated by Excel: interpret the coefficients and check the p-values to determine whether each variable’s contribution is statistically significant.

 

- Create scatter plots to visualize the relationship between each independent variable and sale price, add a regression line to each plot, and use Excel’s charting tools to create additional visualizations (e.g., residual plots, plots of observed versus predicted values).

 

- Draw conclusions and assess the factors that affect housing prices based on the analysis and visualizations, then present your findings clearly and concisely in a report or presentation.
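
The Regression tool produces a static report, so it can help to reproduce the key numbers with worksheet functions that recalculate when the data changes. The sketch below assumes a hypothetical layout with square footage in column A, bedrooms in column B, bathrooms in column C, and sale price in column D, for 100 houses in rows 2 to 101. The range F2:H2 is assumed to hold the three feature values of a new house you want to price.

```excel
=LINEST(D2:D101,A2:C101,TRUE,TRUE)
=INDEX(LINEST(D2:D101,A2:C101,TRUE,TRUE),3,1)
=TREND(D2:D101,A2:C101,F2:H2)
```

LINEST returns a block of statistics whose first row holds the coefficients in reverse column order (bathrooms, bedrooms, square footage, then the intercept) and whose second row holds their standard errors. The INDEX formula pulls out R-squared from the third row, and TREND returns the predicted sale price for the new house. In older versions of Excel, LINEST has to be entered as an array formula with Ctrl+Shift+Enter; in current versions it spills automatically. Location is left out here because it is categorical and would first need to be encoded as numeric columns.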

 

This type of analysis can be used in a variety of industries, such as real estate, business, and social science, to gain insight and forecast results based on multiple factors.

 

 

#5 Text Analysis and Sentiment Analysis

 

In this project, you perform text analysis on a collection of documents or social media data. Use Excel’s text functions to extract insights and perform sentiment analysis.

 

Example Project: Twitter Sentiment Analysis of a Movie Release

 

- Collect tweets related to a recent movie release.

- Analyze sentiment using Excel functions and VBA macros.

- Visualize sentiment trends over time.
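
A simple word-list approach to the sentiment step can be sketched with formulas alone. Assume a “Tweets” sheet with the date in column A and the tweet text in column B, and a “Lexicon” sheet with ten positive words in A2:A11 and ten negative words in B2:B11. The sheet names, ranges, and word lists are all hypothetical, and both lexicon ranges must contain no blank cells, because SEARCH treats an empty search term as a match.

```excel
C2: =SUMPRODUCT(--ISNUMBER(SEARCH(Lexicon!$A$2:$A$11,B2)))
D2: =SUMPRODUCT(--ISNUMBER(SEARCH(Lexicon!$B$2:$B$11,B2)))
E2: =C2-D2
F2: =IF(E2>0,"Positive",IF(E2<0,"Negative","Neutral"))
I2: =AVERAGEIFS($E$2:$E$500,$A$2:$A$500,H2)
```

Columns C and D count how many positive and negative lexicon words appear in each tweet, E is the net score, and F turns it into a label. The last formula averages the scores for the date typed in H2, which gives you a series to chart for the trend over time; it assumes column A holds dates without a time component. This is a crude method: SEARCH is case-insensitive and matches substrings, so “good” would also match “goodbye”, and it does not handle negation or sarcasm.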

 

 

In conclusion, Excel is an invaluable resource for a data scientist. These project ideas can help you build a comprehensive portfolio that demonstrates your proficiency in data manipulation, analysis, and visualization. As you work through these projects, document your progress, provide clear explanations, and present your findings effectively. A well-structured portfolio helps convey your data science expertise to prospective employers and colleagues.