Find Patterns in Data
In this module, we will learn several important tools in data analytics: identifying correlations, fitting lines and curves through data, and forecasting outcomes through regression.
Please use the comment section below to post your questions and thoughts!
Demand modeling with curve fitting
A company has a new product and would like to use analytics to estimate its demand curve for better demand forecasting. To do that, the company selected 25 small geographical areas and experimented with different prices, some higher and others lower. The resulting demand in all markets is in the Excel file below. Estimate the demand curve for this product, and forecast its demand at a price point of $70.
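For readers who want to check their spreadsheet work outside Excel, here is a minimal Python sketch of one possible curve fit. It assumes a constant-elasticity (power) demand curve, demand = c * price^b, which is only one of several reasonable shapes, and fits it by least squares on the log-log scale. The function names and the sample numbers are placeholders invented for illustration; they are not the course data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_power_curve(prices, demands):
    """Fit demand = c * price**b by a straight-line fit of log(demand) on log(price)."""
    log_p = [math.log(p) for p in prices]
    log_d = [math.log(d) for d in demands]
    log_c, b = fit_line(log_p, log_d)
    return math.exp(log_c), b

def forecast_demand(price, c, b):
    """Evaluate the fitted power curve at a given price."""
    return c * price ** b

# Made-up placeholder observations, NOT the data in the course Excel file.
prices = [50, 55, 60, 65, 75, 80, 90]
demands = [410, 370, 330, 300, 250, 235, 200]

c, b = fit_power_curve(prices, demands)
print(f"fitted curve: demand = {c:.1f} * price^{b:.3f}")
print(f"forecast at a price of 70: {forecast_demand(70, c, b):.1f}")
```

With the real data, it is worth comparing this shape against a linear or exponential fit (for example by R-squared) before trusting the forecast.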
Demand modeling: Regression with LINEST
Fit a linear regression model to the same price-demand data used in the previous example. Use the model to forecast demand at a price of $70.
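As a cross-check on the spreadsheet, the sketch below computes the same quantities that Excel's LINEST reports for a single predictor: the slope, the intercept, and R-squared. The helper name linest_simple and the sample numbers are hypothetical, chosen only for illustration; substitute the 25 observations from the course file.

```python
def linest_simple(xs, ys):
    """Return (slope, intercept, r_squared) for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1.0 - ss_res / ss_tot

# Made-up placeholder observations, NOT the data in the course Excel file.
prices = [50, 55, 60, 65, 75, 80, 90]
demands = [410, 370, 330, 300, 250, 235, 200]

slope, intercept, r2 = linest_simple(prices, demands)
print(f"demand = {intercept:.1f} + ({slope:.3f}) * price, R^2 = {r2:.3f}")
print(f"forecast at a price of 70: {intercept + slope * 70:.1f}")
```

The order of the returned values mirrors LINEST, which lists the slope before the intercept.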
Forecasting with exponential smoothing
An online business has generated two years of weekly demand data for one product. Each week it employs a clerk to manually generate sales forecasts based on experience. Use exponential smoothing to generate another set of forecasts for year 2, and compare them with the clerk's forecasts in terms of forecast error.
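The comparison can be sketched in a few lines of Python. The sketch assumes mean absolute error (MAE) as the error measure, which is one common choice among several, and it uses a short made-up series in place of the real weekly data. The function names, the smoothing factor of 0.3, and the starting forecast are all illustrative assumptions.

```python
def exponential_smoothing(actuals, a, s0):
    """Return forecasts [s0, s1, ..., sn], where s(t+1) = a * x(t) + (1 - a) * s(t).

    forecasts[t] is the forecast for period t; the last entry is the
    forecast for the period after the final observation.
    """
    forecasts = [s0]
    for x in actuals:
        forecasts.append(a * x + (1 - a) * forecasts[-1])
    return forecasts

def mean_absolute_error(actuals, forecasts):
    """Average absolute gap between actuals and the forecasts for the same periods."""
    return sum(abs(x - f) for x, f in zip(actuals, forecasts)) / len(actuals)

# Made-up placeholder numbers, NOT the course data.
weekly_demand = [120, 135, 128, 150, 142, 160, 155, 170]
clerk_forecast = [125, 125, 140, 130, 150, 145, 165, 160]

smoothed = exponential_smoothing(weekly_demand, a=0.3, s0=weekly_demand[0])
print("MAE, exponential smoothing:", round(mean_absolute_error(weekly_demand, smoothed), 2))
print("MAE, clerk:", round(mean_absolute_error(weekly_demand, clerk_forecast), 2))
```

For the actual exercise, one reasonable setup is to run the smoothing through year 1 to obtain a starting forecast, then compute errors on year 2 only, and to try several values of a.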
A Short Note on Exponential Smoothing
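Suppose we record monthly sales data for a product starting in January of this year. Let’s call the first data point x0, the next x1, and so on. We also want to generate a forecast for each month from then on. Writing s0 for an initial forecast that we choose ourselves, the later forecasts s1, s2, s3, ... come from the formula: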
s1 = a·x0 + (1 - a)·s0,
s2 = a·x1 + (1 - a)·s1,
s3 = a·x2 + (1 - a)·s2,
and so on.
The parameter a in the equations is a weight between 0 and 1, which we are free to choose. It is also called the smoothing factor. The idea is that each forecast is a weighted average of (1) last period’s actual value and (2) last period’s forecast. As a gets larger, the forecast puts more weight on the most recent actual value and less on older history.
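To see why the method is called "exponential", it helps to substitute the first two equations into the third. The short derivation below uses only the symbols defined above.

```latex
\begin{aligned}
s_3 &= a\,x_2 + (1-a)\,s_2 \\
    &= a\,x_2 + (1-a)\bigl[a\,x_1 + (1-a)\,s_1\bigr] \\
    &= a\,x_2 + a(1-a)\,x_1 + (1-a)^2\bigl[a\,x_0 + (1-a)\,s_0\bigr] \\
    &= a\,x_2 + a(1-a)\,x_1 + a(1-a)^2\,x_0 + (1-a)^3\,s_0 .
\end{aligned}
```

The weights a, a(1-a), a(1-a)^2 shrink geometrically as the data get older, and together with the (1-a)^3 weight on the initial forecast they add up to 1.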
A Case Study: Sex Discrimination
Some years ago, women faculty members in junior colleges in Massachusetts sued state junior colleges for gender discrimination in salaries. The case ended up lasting many years.
Before we hear the verdict, it will be useful to look at some of the data on which this case is based. Here is a subset of the type of data used to analyze the validity of claims of gender discrimination.
The data set has 5 variables for 81 junior college faculty at one college: rank (instructor, assistant professor, associate professor, full professor), gender, age, number of years at the college, and salary.
Use analytics tools to analyze the extent to which this data set suggests sex discrimination in salaries.
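One possible approach, sketched here in Python, is a multiple regression of salary on an indicator for gender plus control variables, so that the gender coefficient estimates the salary gap between faculty who are otherwise similar. The sketch controls only for years at the college, solves the least-squares normal equations directly, and runs on made-up records. The function names and all numbers are hypothetical placeholders, not the case data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(predictor_rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in predictor_rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up placeholder records: (years at college, 1 if female else 0, salary).
records = [
    (2, 0, 31500), (3, 1, 30800), (5, 0, 35200), (6, 1, 33900),
    (8, 0, 38100), (9, 1, 36500), (12, 0, 42300), (11, 1, 38700),
]
predictors = [(years, female) for years, female, _ in records]
salaries = [salary for _, _, salary in records]

intercept, per_year, female_gap = ols(predictors, salaries)
print(f"salary = {intercept:.0f} + {per_year:.0f} * years + ({female_gap:.0f}) * female")
```

A negative coefficient on the female indicator would be suggestive rather than conclusive: with the real data, rank and age should be added as controls, and the size of the coefficient should be weighed against its standard error.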