
Airfare Prediction Model

Essay by   •  July 20, 2016  •  Coursework  •  1,407 Words (6 Pages)  •  2,095 Views

  1. Marketing to Frequent Fliers.

The file EastWestAirlinesCluster.xls (available on the textbook website http://dataminingbook.com/) contains information on 4000 passengers who belong to an airline’s frequent flier program. For each passenger the data include information on their mileage history and on different ways they accrued or spent miles in the last year. The goal is to try to identify clusters of passengers that have similar characteristics for the purpose of targeting different segments for different types of mileage offers.

a) Apply hierarchical clustering with Euclidean distance and Ward’s method. Make sure to standardize the data first. How many clusters appear?

b) What would happen if the data were not standardized?

c) Compare the cluster centroids to characterize the different clusters and try to give each cluster a label.

d) To check the stability of the clusters, remove a random 5% of the data (by taking a random sample of 95% of the records), and repeat the analysis. Does the same picture emerge?

e) Use k-means clustering with the number of clusters that you found above in Part (a). Does the same picture emerge? If not, how does it contrast or validate the finding in Part (c) above?

f) Which cluster(s) would you target for offers, and what type of offers would you target to customers in that cluster? Include proper reasoning in support of your choice of cluster(s) and the corresponding offer(s).
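Part (e) asks for k-means with the number of clusters found in Part (a). The following is a minimal sketch of the standard Lloyd iteration on standardized data, not the procedure used in this essay. The data are synthetic, the function name `kmeans` is hypothetical, and the farthest-point initialization is a choice made here only so the result is reproducible.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means on z-scored columns with farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize first

    # Farthest-point initialization: start from the first record, then
    # repeatedly add the record farthest from all centers chosen so far.
    centers = [Z[0]]
    for _ in range(k - 1):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: each center moves to the mean of its members.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for the passenger data: three well-separated groups.
rng = np.random.default_rng(0)
blobs = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in blobs])

labels, centers = kmeans(X, k=3)
print(np.bincount(labels, minlength=3))  # cluster sizes
```

Comparing these labels with the hierarchical labels from Part (a) requires care, because the cluster numbers assigned by the two methods are arbitrary.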

2. Wine Data:

Step 1: Download the Wine data from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Wine)

Step 2: Do a Principal Components Analysis (PCA) on the data. Please include (copy-paste) the relevant software outputs in your submission while answering the following questions.

a. Enumerate the insights you gathered during your PCA exercise. (Please do not clutter your report with too many insignificant insights, as they will dilute the value of your other significant findings.)

b. What are the social and business values of those insights, and how can the value of those insights be harnessed?

Step 3: Do a cluster analysis using (i) all chemical measurements and (ii) the two most significant PC scores. Please include (copy-paste) the relevant software outputs in your submission while answering the following questions.

c. Did you come across any more insights during the clustering exercise?

d. Are there clearly separable clusters of wines? How many clusters did you go with? How are the clusters obtained in part (i) different from or similar to the clusters obtained in part (ii), qualitatively?

e. Could you suggest a subset of the chemical measurements that can separate wines more distinctly? How did you go about choosing that subset? How do the rest of the measurements that were not included while clustering vary across those clusters?
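Step 2 can be sketched as follows. This is a minimal illustration of PCA on standardized columns through the singular value decomposition, assuming NumPy and synthetic data in place of the Wine file; the function name `pca` and the generated columns are hypothetical.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA on z-scored columns: returns scores, loadings, explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = Z @ Vt[:n_components].T
    return scores, Vt[:n_components], explained

# Synthetic data: three columns driven by one latent factor, one unrelated column.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
driven = [latent * w + 0.1 * rng.normal(size=(200, 1)) for w in (1.0, 2.0, -3.0)]
X = np.hstack(driven + [rng.normal(size=(200, 1))])

scores, loadings, explained = pca(X)
print(np.round(explained, 3))  # share of total variance per component
```

The first two columns of `scores` are what Step 3 (ii) would cluster on, and the rows of `loadings` show which measurements drive each component.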


Question 1.

a) Apply hierarchical clustering with Euclidean distance and Ward’s method. Make sure to standardize the data first. How many clusters appear?

Solution.

Number of clusters: 3

Cluster I: 13, 16, 2, 17, 10, 14, 15, 18, 5, 20, 19

Cluster II: 3, 12, 21, 1, 8, 9, 4, 16, 22

Cluster III: 1, 23, 6, 11, 24, 25, 30, 27, 29, 28

[pic 1]

The dendrogram is the same whether the number of clusters is set to 3 or left unrestricted.
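The essay does not say which software produced this result. As a minimal sketch under that caveat, the same kind of analysis can be run with SciPy: standardize the columns, build a Ward tree on Euclidean distances, and cut it into a fixed number of clusters. The data below are synthetic and the function name `ward_clusters` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ward_clusters(X, n_clusters):
    """Z-score the columns, then cut a Ward/Euclidean tree into n_clusters groups."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    tree = linkage(Z, method="ward", metric="euclidean")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")  # labels start at 1
    return labels, tree

# Synthetic stand-in for the passenger records: three well-separated groups.
rng = np.random.default_rng(0)
blobs = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 5.0], [0.0, 10.0, -5.0]])
X = np.vstack([c + rng.normal(size=(50, 3)) for c in blobs])

labels, tree = ward_clusters(X, n_clusters=3)
print(np.bincount(labels)[1:])  # records per cluster
```

The returned `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the dendrogram and judge how many clusters appear.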

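b) What would happen if the data were not standardized?

Solution: Without standardization, Balance, Bonus miles and Days since enrolled, which are measured on much larger scales than the other variables, would carry the most weight in the Euclidean distance, so the result would be skewed towards those variables.

For example, the predicted clusters would be as follows: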

[pic 2]
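The effect of scale can be made concrete. The average squared Euclidean distance between two records equals twice the sum of the column variances, so each column's share of the distance is its share of the total variance. The sketch below uses made-up columns with made-up scales (not values from the actual file) to show how one large-scale column takes over until the data are standardized.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
# Hypothetical columns on very different scales (illustrative values only).
data = {
    "Balance": rng.normal(70_000, 100_000, n),
    "Bonus_miles": rng.normal(17_000, 24_000, n),
    "Days_since_enroll": rng.normal(4_000, 2_000, n),
    "Flight_trans_12": rng.normal(1.5, 3.0, n),
}
names = list(data)
X = np.column_stack([data[k] for k in names])

def distance_share(X):
    """Share of the average squared Euclidean distance contributed by each column."""
    var = X.var(axis=0, ddof=1)
    return var / var.sum()

raw_share = distance_share(X)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std_share = distance_share(Z)

for name, r, s in zip(names, raw_share, std_share):
    print(f"{name:18s} raw={r:.6f} standardized={s:.6f}")
```

After standardization every column has unit variance, so each contributes equally to the distance.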


c) Compare the cluster centroids to characterize the different clusters and try to give each cluster a label.

Solution.

Cluster 1: Less frequent fliers

Cluster 2: Frequent fliers

Cluster 3: Intermittent fliers. This group falls between clusters 1 and 2 and is therefore the customer group to target for promotions.

[pic 3]
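A cluster centroid is simply the per-variable mean of the records assigned to that cluster. A minimal sketch, assuming NumPy, a hypothetical helper `cluster_centroids`, and a tiny made-up data set:

```python
import numpy as np

def cluster_centroids(X, labels):
    """Return {cluster label: mean of each column over that cluster's records}."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {int(k): X[labels == k].mean(axis=0) for k in np.unique(labels)}

# Made-up records with two variables and an assumed cluster assignment.
X = np.array([[1.0, 10.0], [3.0, 14.0], [10.0, 100.0], [12.0, 120.0], [20.0, 50.0]])
labels = np.array([1, 1, 2, 2, 3])

centroids = cluster_centroids(X, labels)
for k, c in centroids.items():
    print(k, c)
```

Computing the centroids on the original (unstandardized) values usually makes the labels easier to interpret, since the means are then in the variables' natural units.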

d) To check the stability of the clusters, remove a random 5% of the data (by taking a random sample of 95% of the records), and repeat the analysis. Does the same picture emerge?

Solution.

Part A

The structure of the dendrogram remained the same, but the composition of the clusters changed.

[pic 4]
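The stability check can be sketched as follows: cluster the full data, cluster a random 95% sample, and cross-tabulate the two label sets for the records that appear in both. This is an illustration on synthetic data with SciPy, not the essay's own workflow; `ward_labels` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ward_labels(X, k):
    """Standardize, build a Ward tree, and cut it into k clusters (labels 1..k)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return fcluster(linkage(Z, method="ward"), t=k, criterion="maxclust")

rng = np.random.default_rng(42)
blobs = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in blobs])

full = ward_labels(X, 3)

# Keep a random 95% of the records and repeat the analysis on them.
n_keep = int(round(0.95 * len(X)))
keep = np.sort(rng.choice(len(X), size=n_keep, replace=False))
sub = ward_labels(X[keep], 3)

# Rows: cluster in the full run; columns: cluster in the 95% run.
table = np.zeros((3, 3), dtype=int)
np.add.at(table, (full[keep] - 1, sub - 1), 1)
print(table)
```

If the clusters are stable, each row of the table has one dominant cell, even when the cluster numbers differ between the two runs.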

Part B

Total: 3999

Less 5%: 3790

Removed: 209

Count where Cluster 1 = Cluster 2    Count where Sub cluster 1 = Sub cluster 2
1368                                 274
3999                                 0
34%
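The 34% figure is consistent with 1368 / 3999. One caveat when counting records whose cluster number is unchanged: cluster numbers are arbitrary, so two runs can find the same groups under different numbers. The sketch below, using hypothetical helper names and made-up label lists, computes both the raw agreement and the agreement after trying every relabeling of the second run.

```python
from itertools import permutations

def raw_agreement(a, b):
    """Fraction of records whose cluster number is identical in both runs."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def best_agreement(a, b):
    """Highest agreement over all renumberings of the clusters in b."""
    names = sorted(set(b))
    best = 0.0
    for perm in permutations(names):
        mapping = dict(zip(names, perm))
        best = max(best, raw_agreement(a, [mapping[y] for y in b]))
    return best

print(round(1368 / 3999, 3))  # share of matching records reported above

# Made-up example: identical grouping, different cluster numbers.
first = [1, 1, 1, 2, 2, 3]
second = [2, 2, 2, 3, 3, 1]
print(raw_agreement(first, second), best_agreement(first, second))
```

A low raw match rate therefore does not by itself show that the clusters are unstable; the relabeled agreement is the fairer comparison.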

IDs that remained in the same cluster after removing 5% of the data

...

...
