MBA Assignment: 2013

Friday, 29 March 2013

3DPlots

Assignment 1:

Create 3 vectors x, y and z with random values, ensuring they are of equal length.
Create a 3-dimensional plot of them.

Solution:

Step 1:
Create a dataset of 50 items with mean = 22 and standard deviation = 5:

> data<-rnorm(50,mean=22,sd=5)

Step 2:

Draw three random samples of equal length from the dataset to create the 3 vectors x, y, z:

> x<-sample(data,15)
> y<-sample(data,15)
> z<-sample(data,15)

Step3: Bind the 3 vectors together.

> bindedData<-cbind(x,y,z)
> bindedData
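The three steps can also be put together in one repeatable sketch. The call to set.seed() is an addition that is not part of the original solution (it only makes the random draws repeatable), and the length check is there because cbind() gives a clean 15 x 3 matrix only when the vectors have equal length.

```r
# Sketch only: set.seed() is an assumption added for repeatability
set.seed(1)
data <- rnorm(50, mean = 22, sd = 5)   # 50 values, mean 22, sd 5

# Three samples of 15 values each, drawn without replacement from the same pool
x <- sample(data, 15)
y <- sample(data, 15)
z <- sample(data, 15)

# Guard against unequal lengths before binding
stopifnot(length(x) == length(y), length(y) == length(z))

bindedData <- cbind(x, y, z)
dim(bindedData)        # rows and columns of the bound matrix
colnames(bindedData)   # column names are taken from the vector names
```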

3D Plots

a) Normal 3D Plot (plot3d comes from the rgl package, so load it first):

library(rgl)
plot3d(bindedData[,1:3])

b) With labelled Axis

plot3d(bindedData[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000))

c) With Spheres

plot3d(bindedData[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type="s")

d) With Lines

plot3d(bindedData[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type="l")

Assignment 2:

Create 2 random variables
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y)
3. Color code and draw the graph
4. Smooth and best fit line for the curve

Solution :

Step 1:
Create 2 random variables x, y using rnorm.
Add a 3rd variable z by sampling 5 letters, resampling them 1000 times and converting the result to a factor, as shown:

> x <- rnorm(1000, mean= 40 , sd=10)
> y <- rnorm(1000, mean= 30, sd=10)
> z1 <- sample(letters, 5)
> z2 <- sample(z1, 1000, replace=TRUE)
> z <- as.factor(z2)
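Before plotting, it can help to check that z really has 5 categories and to see how x and y behave within each one. The sketch below is only an illustration: the data frame name df and the seed are assumptions, not part of the original solution.

```r
set.seed(2)  # assumption: fixed seed so the draws can be repeated
x <- rnorm(1000, mean = 40, sd = 10)
y <- rnorm(1000, mean = 30, sd = 10)
z <- as.factor(sample(sample(letters, 5), 1000, replace = TRUE))

df <- data.frame(x, y, z)   # hypothetical name for the combined data

nlevels(df$z)               # number of categories in z
table(df$z)                 # observations per category
groupMeans <- aggregate(cbind(x, y) ~ z, data = df, FUN = mean)
groupMeans                  # mean of x and y within each category
```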

Plots (qplot is from the ggplot2 package, so load it first with library(ggplot2)):

a) x & y

qplot(x,y)

b) x & z

qplot(x,z)

c) Semi-transparent qplot between x and z using alpha

qplot(x,z , alpha=I(4/10))

d) Colored Plot

qplot(x,y , color=z)

e) Logarithmic Colored Plot

qplot(log(x),log(y) , color=z)

f) Smooth curve and best fit line using geom

qplot(x,y,geom=c("path","smooth"))

qplot(x,y,geom=c("point","smooth"))

g) Boxplot with jitter

qplot(x,y,geom=c("boxplot","jitter"))
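The numbers behind a best fit line and a smooth curve can also be inspected without drawing anything. The sketch below uses base R: lm() for the straight line and lowess() for a locally weighted smooth. Note that lowess() is a stand-in chosen for illustration and is not the smoother that qplot uses internally.

```r
set.seed(3)  # assumption: fixed seed for repeatability
x <- rnorm(1000, mean = 40, sd = 10)
y <- rnorm(1000, mean = 30, sd = 10)

fit <- lm(y ~ x)     # straight best fit line
coef(fit)            # intercept and slope; x and y are independent, so the slope is near 0

sm <- lowess(x, y)   # locally weighted smooth: sorted x values and fitted y values
str(sm)
```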

Saturday, 23 March 2013

Data Visualization

Facebook Page Statistics Tracking Tools : Infographics

The main goal of creating a Facebook page, or any other social networking profile, is to develop a community through interaction with the public. Measuring what happens on a Facebook page is therefore very important for an individual or a business, especially for marketing. Facebook itself provides the page admin with some details, such as the number of likes and how many people are talking about the page, but it is often important to get deeper insights, especially from a statistical point of view. Various free tools are available for this. A few of them are:

1. Visual.ly

Features: This tool fetches data from the last month of activity on the page. It gives a categorical distribution of the users posting on or liking the page, for example by demographics or by country of residence. It also reports how many impressions and deeper impressions the page has. It gives very clear statistics for the last month.

Limitation: It gives statistics for the last month only.

2. Quintly:

Features: It gives statistics for each post and classifies the types of post. It also gives a statistical view of the fan growth rate and the interaction rate. It puts more stress on interactions and gives a statistical picture of every aspect.

Limitations: The free version shows analysis of data for only one month. The Advanced and Advanced Plus versions show analysis of data for the last 3 months and for an unlimited period respectively.

3. Wildfire App:

Features: It gives an option to compare different pages on the basis of the number of likes and the number of check-ins.

It helps in tracking competitors' pages and helps a business shape its marketing strategy accordingly. It also works for Twitter profiles.

Limitations: The information given by this tool is restricted and is mainly useful for comparison.

Comparison on basis of Likes

Comparison on basis of Checkins

Friday, 15 March 2013

Session 8 : Panel Data Analysis

Assignment 1:

Do the panel data analysis of "Produc" data in package "plm"

Solution:

Produc Data:

state : the state
year : the year
pcap : public capital stock
hwy : highway and streets
water: water and sewer facilities
util : other public buildings and structures
pc : private capital stock
gsp : gross state product
emp : labor input measured by the employment in non-agricultural payrolls
unemp : state unemployment rate

Commands:

> library(plm)
> data("Produc",package="plm")
> head(Produc)

Pooled Effect Model:

> pool <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("pooling"), index = c("state","year"))
> summary(pool)

Fixed Effects Model:

> fixed <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("within"), index = c("state","year"))
> summary(fixed)

Random Effects Model:

> random <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("random"), index = c("state","year"))
> summary(random)

To determine which model is best:

Test 1 : Pooled vs Fixed:

Ho: Null Hypothesis: the individual (state-specific) effects are all zero, so the Pooled Model is adequate.

H1: Alternate Hypothesis: at least one of the individual effects is non-zero.

> pFtest(fixed,pool)

F test for individual effects

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16

alternative hypothesis: significant effects.

As the p-value is far below 0.05, the null hypothesis is rejected. Therefore the Fixed Effects Model is better than the Pooled Model.
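The idea behind this F test can be sketched in base R by comparing a regression with one common intercept against one with a separate intercept per unit. Everything below is simulated: the panel size, the variable names and the coefficients are assumptions for illustration, not the Produc data. In the output above, df1 = 47 is consistent with 48 states minus one, and df2 = 761 with 816 observations minus 48 state intercepts minus 7 slopes.

```r
set.seed(4)                                  # assumption: simulated panel
n_id <- 10; n_t <- 8                         # 10 units observed over 8 periods
id    <- factor(rep(seq_len(n_id), each = n_t))
alpha <- rnorm(n_id, sd = 2)[as.integer(id)] # unit-specific intercepts
x     <- rnorm(n_id * n_t)
y     <- 1 + 0.5 * x + alpha + rnorm(n_id * n_t, sd = 0.5)

pooled <- lm(y ~ x)         # one common intercept (pooled model)
lsdv   <- lm(y ~ x + id)    # one intercept per unit (dummy variables)

ftest <- anova(pooled, lsdv)  # F test of the extra unit intercepts
ftest
```

A small p-value in the second row of the table says the unit intercepts are jointly non-zero, which is the same conclusion pFtest draws for the states.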

Test 2 : Pooled vs Random:

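Ho: Null Hypothesis: the variance of the individual effects is zero, so the Pooled Model is adequate.

H1: Alternate Hypothesis: the variance of the individual effects is non-zero.

Command:

> plmtest(pool)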

Lagrange Multiplier Test - (Honda)

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

normal = 57.1686, p-value < 2.2e-16

alternative hypothesis: significant effects

As the p-value is far below 0.05, the null hypothesis is rejected. Therefore the Random Effects Model is better than the Pooled Model.

Test 3 : Fixed vs Random:

Ho: Null Hypothesis: Individual effects are not correlated with any regressor. : Random Effect Model

H1: Alternate Hypothesis: Individual effects are correlated. : Fixed Effect Model

Command:

>phtest(random,fixed)

Hausman Test

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

chisq = 93.546, df = 7, p-value < 2.2e-16

alternative hypothesis: one model is inconsistent

As the p-value is very low, the null hypothesis is rejected. Therefore, the Fixed Effects Model is better than the Random Effects Model.

So after making all the comparisons we come to the conclusion that the Fixed Effects Model is best suited to do the panel data analysis for the "Produc" data set.

Hence, we conclude that the state-specific effects are significant and correlated with the regressors, so the coefficients are best estimated from the variation over time within each "state".
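What the "within" model does can be illustrated with a small base R sketch: subtract each unit's own mean from every variable and regress the demeaned values. The slope matches the one from a regression with a dummy variable per unit. The data are simulated and all names are hypothetical. This is not the Produc analysis itself.

```r
set.seed(5)                          # assumption: simulated panel
n_id <- 6; n_t <- 10                 # 6 units observed over 10 periods
id <- rep(seq_len(n_id), each = n_t)
x  <- rnorm(n_id * n_t) + id         # regressor correlated with the unit effect
y  <- 2 * id + 0.7 * x + rnorm(n_id * n_t, sd = 0.3)

# Within transformation: subtract each unit's own mean
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)

b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]     # regression on demeaned data
b_lsdv   <- coef(lm(y ~ x + factor(id)))[["x"]]  # dummy-variable regression
b_pooled <- coef(lm(y ~ x))[["x"]]               # ignores the unit effects

c(within = b_within, lsdv = b_lsdv, pooled = b_pooled)
```

The first two estimates coincide, while the pooled slope is pulled away from 0.7 because it mixes the unit effect into the coefficient.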

Wednesday, 13 February 2013

Session 6

Assignment 1 -

Download data for the NIFTY index from 1st Jan, 2012 to 31st Jan, 2013.

Calculate the log returns of the data and find the historical volatility.

Commands Used:

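data<-read.csv(file.choose() , header=T)
closePrice<-data$Close
closePrice.ts<-ts(closePrice , frequency=252)
varLag<- lag(closePrice.ts , k=-1)
logClosePrice<- log(closePrice.ts , base=exp(1)) - log(varLag , base=exp(1))
# logClosePrice is already the log return ln(P_t) - ln(P_t-1); the next line rescales it by ln(P_t-1), and the volatility reported below is computed on that rescaled series
LogReturns<-logClosePrice/log(varLag , base=exp(1))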

To Calculate Historical Volatility:

> annFactor<-252^0.5

> historicalVol<-sd(LogReturns)*annFactor

> historicalVol

[1] 0.01719952
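For reference, the usual definition takes the log return as the difference of consecutive log prices and annualises the daily standard deviation with the square root of 252 trading days. The sketch below applies that definition to a simulated price path. The prices are hypothetical and are not the NIFTY data, so the numbers will not match the output above.

```r
set.seed(6)  # assumption: simulated prices standing in for the closing prices
price <- 5000 * exp(cumsum(rnorm(270, mean = 0, sd = 0.01)))

logRet  <- diff(log(price))       # ln(P_t) - ln(P_t-1)
dailySd <- sd(logRet)             # daily volatility
annVol  <- dailySd * sqrt(252)    # scaled to a 252-day trading year

c(daily = dailySd, annual = annVol)
```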

Assignment 2 :

Create an acf plot for the log returns data calculated previously. Also do an ADF test and interpret the result.

acf(LogReturns)

Graphical Interpretation
- All the autocorrelation bars (vertical lines) at non-zero lags lie inside the 95% confidence band (the default) drawn as two blue dotted lines, so the returns show no significant autocorrelation and we can interpret the returns data as "Stationary" in nature. This is the visual inspection method for determining stationarity.
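The same visual check can be done numerically. The sketch below asks acf() for the autocorrelations without drawing them and counts how many fall outside the 95% band that the plot shows as dotted lines. The series is simulated white noise and the variable names are assumptions. It stands in for the LogReturns series.

```r
set.seed(7)                    # assumption: simulated returns
ret <- rnorm(250, sd = 0.01)

a     <- acf(ret, plot = FALSE)             # same estimates the plot would show
rho   <- a$acf[-1]                          # drop lag 0, which is always 1
bound <- qnorm(0.975) / sqrt(length(ret))   # half-width of the 95% band

outside <- sum(abs(rho) > bound)            # number of lags outside the band
outside
```

Even for pure noise, about 1 lag in 20 is expected to cross the band by chance, so one or two small excursions are not by themselves a sign of trouble.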

ADF Test

Command Used (adf.test comes from the tseries package):
library(tseries)
adf.test(LogReturns)

Output
Augmented Dickey-Fuller Test

data: LogReturns
Dickey-Fuller = -5.6217, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(LogReturns) : p-value smaller than printed p-value

Interpretation of the ADF test
Null Hypothesis: the returns data is not stationary (it has a unit root)
Alternative Hypothesis: the returns data is stationary

The test reports p-value = 0.01 (the warning indicates the true p-value is even smaller), which is below the 0.05 significance level.
Hence the Null Hypothesis is rejected.

Result: the given data is stationary in nature

Thursday, 7 February 2013

Session5

ASSIGNMENT 1:

Find the returns of NSE data covering more than 6 months, selecting the 10th data point as the start and the 95th data point as the end, and plot the returns.

> z<-read.csv(file.choose(),header=T)
> close<-z$Close[10:95]
> close.ts<-ts(close,deltat=1/252)
> close.ts
> summary(close.ts)
> z.diff<-diff(close.ts)
> z.diff
> returns<-cbind(close.ts,z.diff,lag(close.ts,k=-1))
> returns
> returns<-z.diff/lag(close.ts,k=-1)
> returns
> plot(returns)
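The return computed above is the simple return, (P_t - P_t-1) / P_t-1. A minimal sketch with five made-up closing prices shows that the ts-based expression gives the same numbers as a plain vector calculation. The prices are hypothetical, and the series uses the default ts() settings rather than deltat=1/252.

```r
close    <- c(100, 102, 101, 105, 104)   # hypothetical closing prices
close.ts <- ts(close)

# ts version: diff() and lag() are aligned on time before dividing
ret.ts <- diff(close.ts) / lag(close.ts, k = -1)

# plain vector version of the same formula
ret.vec <- diff(close) / head(close, -1)

isTRUE(all.equal(as.numeric(ret.ts), ret.vec))   # both approaches agree
ret.vec
```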

ASSIGNMENT 2:

Data is available for observations 1-700. Predict the outcome for observations 701-850 with GLM estimation using LOGIT analysis.

Commands :

> z<-read.csv(file.choose(),header=T)
> z.data<-z[1:700,1:9]
> z.data$ed<-factor(z.data$ed)
> logit.est<-glm(default~age+employ+address+income+debtinc+creddebt+othdebt,data=z.data,family="binomial")
> summary(logit.est)
> confint.default(logit.est)
> logit.eg2<-with(z[701:850,1:8],data.frame(age=age,employ=employ,address=address,income=income,debtinc=debtinc,creddebt=creddebt,othdebt=othdebt,ed=factor(1:3)))
> logit.eg2$prob<-predict(logit.est,newdata=logit.eg2,type="response")
> head(logit.eg2)
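The fit-then-predict pattern can be tried without the original file. The sketch below simulates a small loan data set, fits the logit on the first 700 rows and predicts default probabilities for rows 701-850. The predictors, coefficients and object names are all assumptions made up for illustration.

```r
set.seed(8)                                   # assumption: simulated loan data
n       <- 850
income  <- rnorm(n, mean = 45, sd = 15)
debtinc <- rnorm(n, mean = 10, sd = 4)
p       <- plogis(-3 + 0.25 * debtinc - 0.01 * income)  # true default probability
default <- rbinom(n, size = 1, prob = p)
loans   <- data.frame(income, debtinc, default)

train   <- loans[1:700, ]                             # rows with known outcome
newData <- loans[701:850, c("income", "debtinc")]     # rows to be predicted

fit  <- glm(default ~ income + debtinc, data = train, family = binomial)
prob <- predict(fit, newdata = newData, type = "response")  # probabilities, not log-odds

summary(prob)
```

With type="response" the predictions are probabilities between 0 and 1. Without it, predict() returns values on the log-odds scale.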

Wednesday, 23 January 2013

Session 3

Assignment 1(a):

Given is a data set with Mileage & Groove, where Groove impacts Mileage. Fit a Linear Model and comment on the applicability of the Linear Model.

> data<-read.csv(file.choose(),header=T)

> data

  mileage groove
1       0 394.33
2       4 329.50
3       8 291.00
4      12 255.17
5      16 229.33
6      20 204.83
7      24 179.00
8      28 163.83
9      32 150.33

> reg1<-lm(data$mileage~data$groove)

> reg1

Call:
lm(formula = data$mileage ~ data$groove)

Coefficients:
(Intercept)  data$groove
    47.9446      -0.1308

> res<-resid(reg1)

> res

         1          2          3          4          5          6          7          8          9
 3.6502499 -0.8322206 -1.8696280 -2.5576878 -1.9386386 -1.1442614 -0.5239038  1.4912269  3.7248633

> plot(data$groove,res)

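The residual plot is parabolic (the residuals are positive at both ends and negative in the middle), hence a simple linear regression is not an appropriate model for this data.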
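The curved pattern can be confirmed from the numbers alone. The sketch below retypes the nine data points listed above, refits the same linear model and then, as an assumption that goes beyond the original assignment, adds a squared groove term to see how much the fit improves. The data frame name d is hypothetical.

```r
# Data retyped from the listing above
d <- data.frame(
  mileage = seq(0, 32, by = 4),
  groove  = c(394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33)
)

lin  <- lm(mileage ~ groove, data = d)                 # the linear model fitted above
quad <- lm(mileage ~ groove + I(groove^2), data = d)   # assumption: add a squared term

coef(lin)           # intercept and slope of the linear fit
sign(resid(lin))    # runs of same-signed residuals reveal the curvature
c(linear = summary(lin)$r.squared, quadratic = summary(quad)$r.squared)
```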

Assignment 1(b):

Given is a data set with Alpha & Pluto. Fit a Linear Model and comment on the applicability of the Linear Model.

> ab<-read.csv(file.choose(),header=T)

> ab

   alpha pluto
1  0.150    20
2  0.004     0
3  0.069    10
4  0.030     5
5  0.011     0
6  0.004     0
7  0.041     5
8  0.109    20
9  0.068    10
10 0.009     0
11 0.009     0
12 0.048    10
13 0.006     0
14 0.083    20
15 0.037     5
16 0.039     5
17 0.132    20
18 0.004     0
19 0.006     0
20 0.059    10
21 0.051    10
22 0.002     0
23 0.049     5

> y<-ab$pluto

> x<-ab$alpha

> x

[1] 0.150 0.004 0.069 0.030 0.011 0.004 0.041 0.109 0.068 0.009 0.009 0.048 0.006 0.083 0.037 0.039 0.132 0.004 0.006 0.059 0.051 0.002 0.049

> y

[1] 20 0 10 5 0 0 5 20 10 0 0 10 0 20 5 5 20 0 0 10 10 0 5

> reg<-lm(y~x)

> res<-resid(reg)

> res

         1          2          3          4          5          6
-4.2173758 -0.0643108 -0.8173877  0.6344584 -1.2223345 -0.0643108
         7          8          9         10         11         12
-1.1852930  2.5653342 -0.6519557 -0.8914706 -0.8914706  2.6566833
        13         14         15         16         17         18
-0.3951747  6.8665650 -0.5235652 -0.8544291 -1.2396007 -0.0643108
        19         20         21         22         23
-0.3951747  0.8369318  2.1603874  0.2665531 -2.5087486

> plot(x,res)

> qqnorm(res)

> qqline(res)
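The normal Q-Q plot compares the sorted residuals with the quantiles expected under a normal distribution. The sketch below retypes the 23 observations printed above, refits the model and pulls out the Q-Q coordinates without drawing them. The correlation between the two sets of quantiles is used here as a rough one-number summary of how straight the Q-Q plot is. That summary is an addition for illustration and is not part of the original solution.

```r
# Data retyped from the vectors printed above
x <- c(0.150, 0.004, 0.069, 0.030, 0.011, 0.004, 0.041, 0.109, 0.068, 0.009,
       0.009, 0.048, 0.006, 0.083, 0.037, 0.039, 0.132, 0.004, 0.006, 0.059,
       0.051, 0.002, 0.049)
y <- c(20, 0, 10, 5, 0, 0, 5, 20, 10, 0, 0, 10, 0, 20, 5, 5, 20, 0, 0, 10, 10, 0, 5)

reg <- lm(y ~ x)
res <- resid(reg)

qq    <- qqnorm(res, plot.it = FALSE)   # qq$x: theoretical quantiles, qq$y: residuals
qqCor <- cor(qq$x, qq$y)                # closer to 1 means a straighter Q-Q plot
qqCor

which.max(abs(res))                     # observation with the largest residual
```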

Assignment 3: Justify Null Hypothesis Using ANOVA

As the p-value is 0.687, which is greater than 0.05 (5%), we cannot reject the Null Hypothesis.
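The data for this assignment is not shown, so the sketch below uses simulated values to show how such a p-value is obtained and read. Three hypothetical groups are drawn from the same distribution, a one-way ANOVA is fitted with aov(), and the p-value is compared with 5%. All names and numbers are assumptions.

```r
set.seed(9)                                         # assumption: simulated data
group <- factor(rep(c("A", "B", "C"), each = 10))   # three hypothetical groups
value <- rnorm(30, mean = 50, sd = 5)               # same mean in every group

fit  <- aov(value ~ group)                  # one-way ANOVA
tab  <- summary(fit)[[1]]                   # ANOVA table as a data frame
pval <- tab[["Pr(>F)"]][1]                  # p-value for the group effect
pval

# Decision rule at the 5% level
if (pval > 0.05) "cannot reject the Null Hypothesis" else "reject the Null Hypothesis"
```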