Friday, 29 March 2013

3D Plots

Assignment 1: 

Create 3 vectors x, y and z with random values, ensuring that they are of equal length.
Create a 3-dimensional plot of them.


Solution:

Step 1:
Create a dataset of 50 items with mean = 22 and standard deviation = 5:

data <- rnorm(50, mean = 22, sd = 5)

Step 2:

Draw three random samples of equal length (15 values each) from the data to create the vectors x, y and z:


> x<-sample(data,15)
> y<-sample(data,15)
> z<-sample(data,15)

Step 3: Bind the 3 vectors together as the columns of a matrix.


> bindedData<-cbind(x,y,z)
> bindedData




3D Plots

a) Normal 3D plot:

library(rgl)   # plot3d() is provided by the rgl package
plot3d(bindedData[,1:3])



b) With labelled axes

plot3d(bindedData[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000))


c) With Spheres

plot3d(bindedData[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type="s")




d) Points joined by lines (type = "l")

plot3d(bindedData[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type="l")
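plot3d() needs the rgl package and an OpenGL-capable device. Where that is not available, a static 3D scatter of the same bindedData matrix can be sketched with base graphics. persp() draws the box and axes and returns the viewing transformation, and trans3d() projects each point onto the page. The seed, the flat helper surface and the viewing angles (theta, phi) are arbitrary choices for this sketch:

```r
# Regenerate the data as in Steps 1-3 (seed added for reproducibility)
set.seed(2013)
data <- rnorm(50, mean = 22, sd = 5)
x <- sample(data, 15)
y <- sample(data, 15)
z <- sample(data, 15)
bindedData <- cbind(x, y, z)

# persp() needs a surface: a flat 2 x 2 plane at the lowest z value is enough
# to set up the 3D box and axes. It returns the 4 x 4 viewing transformation.
pmat <- persp(x = range(x), y = range(y),
              z = matrix(min(z), nrow = 2, ncol = 2),
              zlim = range(z),
              xlab = "X Axis", ylab = "Y Axis", zlab = "Z Axis",
              theta = 30, phi = 20, ticktype = "detailed")

# Project the 3D points onto the 2D device and draw them, one colour per point
pts <- trans3d(bindedData[, "x"], bindedData[, "y"], bindedData[, "z"], pmat)
points(pts, pch = 16, col = rainbow(nrow(bindedData)))
```

Unlike plot3d(), this gives a fixed view. To look from another angle, change theta and phi and redraw.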







Assignment 2:

Create 2 random variables and draw the following plots:
1. X-Y
2. X-Y|Z (introduce a third variable z with 5 different categories and bind it to x and y)
3. A color-coded graph
4. A smooth curve and best-fit line

Solution:

Step 1:
Create 2 random variables x and y using rnorm().
Add a third variable z by sampling 5 random letters, drawing 1000 labels from them and converting the result to a factor, as shown:



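> library(ggplot2)                        # qplot() is provided by ggplot2
> x <- rnorm(1000, mean = 40, sd = 10)
> y <- rnorm(1000, mean = 30, sd = 10)
> z1 <- sample(letters, 5)                # 5 random category labels
> z2 <- sample(z1, 1000, replace = TRUE)  # one label per observation
> z <- as.factor(z2)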




Plots:


a) x & y

qplot(x, y)



b) qplot(x,z)


c) Semi-transparent qplot of x against z, using alpha


qplot(x,z , alpha=I(4/10))


d) Colored Plot

qplot(x,y , color=z)
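The "X-Y|Z" plot in the task statement is what base R calls a conditioning plot. coplot() draws y against x in a separate panel for each level of the factor z. This sketch uses the same simulated variables, with a seed added, and relies on base graphics rather than ggplot2:

```r
set.seed(2013)
x <- rnorm(1000, mean = 40, sd = 10)
y <- rnorm(1000, mean = 30, sd = 10)
z1 <- sample(letters, 5)                       # 5 random category labels
z <- factor(sample(z1, 1000, replace = TRUE))  # one label per observation

# One panel of y vs x for each of the 5 levels of z
coplot(y ~ x | z)

# Number of points in each category, i.e. in each panel
counts <- table(z)
```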


e) Logarithmic Colored Plot

qplot(log(x),log(y) , color=z)



f) Smooth curve and best-fit line using geom

 qplot(x,y,geom=c("path","smooth"))



qplot(x,y,geom=c("point","smooth"))
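geom = "smooth" adds a smoothed curve to the scatter plot. A rough base-R counterpart is shown below. It uses a different smoother, so the curve will not match qplot's exactly: lowess() gives a locally weighted smooth, and lm() gives the least-squares best-fit straight line. The seed and colours are arbitrary:

```r
set.seed(2013)
x <- rnorm(1000, mean = 40, sd = 10)
y <- rnorm(1000, mean = 30, sd = 10)

plot(x, y, col = "grey50")
lw <- lowess(x, y)                  # locally weighted scatterplot smoothing
lines(lw, col = "blue", lwd = 2)    # smooth curve
fit <- lm(y ~ x)                    # ordinary least-squares fit
abline(fit, col = "red", lwd = 2)   # best-fit straight line
```

Because x and y are generated independently here, both lines should be close to flat.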


g) qplot(x,y,geom=c("boxplot","jitter"))



Saturday, 23 March 2013

Data Visualization

Facebook Page Statistics Tracking Tools: Infographics

The main goal of creating a Facebook page, or any other social networking profile, is to build a community through interaction with the public. Measuring what happens on a Facebook page is therefore very important for an individual or a business, especially for marketing. Facebook itself gives the page admin some details, such as the number of likes and how many people are talking about the page, but deeper insights are sometimes needed, especially from a statistical point of view. Various free tools provide deeper insights into Facebook statistics. A few of them are:


1. Visual.ly


Features: This tool fetches data from the last month of activity on the page. It gives the categorical distribution of users who post on or like the page, for example by demographics and by country of residence. It also shows how many impressions and deeper impressions the page has, and it gives very clear statistics for the last month.

Limitation: It gives statistics for only one month of data.

2. Quintly:


Features: It gives statistics for each post and the types of posts, as well as a statistical view of the fan growth rate and the interaction rate. It puts more emphasis on interactions and gives a statistical picture of every aspect.

Limitations: The free version analyses only one month of data. The Advanced and Advanced Plus versions analyse data for the last 3 months and for an unlimited period, respectively.



3. Wildfire App:

Features: It lets you compare different pages by the number of likes and the number of check-ins.
It helps in tracking competitors' pages and helps a business plan its marketing strategy accordingly. It also works for Twitter profiles.

Limitations: The information given by this tool is limited and is mainly useful for comparison.

                                                        
(Figure: Comparison on the basis of likes)

(Figure: Comparison on the basis of check-ins)

Friday, 15 March 2013

Session 8: Panel Data Analysis

Assignment 1:

Perform a panel data analysis of the "Produc" data set in the package "plm".

Solution:

Produc Data:

state : the state
year  : the year
pcap  : public capital stock
hwy   : highway and streets
water : water and sewer facilities
util  : other public buildings and structures
pc    : private capital stock
gsp   : gross state product
emp   : labor input measured by employment in non-agricultural payrolls
unemp : state unemployment rate

Commands:

> library(plm)
> data("Produc", package = "plm")
> head(Produc)

Pooled Effect Model:

> pool <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("pooling"), index = c("state","year"))
> summary(pool)


Fixed Effects Model:

> fixed <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("within"), index = c("state","year"))
> summary(fixed)



Random Effects Model:

> random <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("random"), index = c("state","year"))
> summary(random)



To determine which model is best:



Test 1: Pooled vs Fixed

H0 (null hypothesis): all individual (state) effects are zero, so the pooled model is adequate.

H1 (alternative hypothesis): at least one individual effect is non-zero.

> pFtest(fixed, pool)

        F test for individual effects

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp) 
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects.

As the p-value is very small (< 2.2e-16), the null hypothesis is rejected. Therefore the fixed effects model is better than the pooled model.
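The pooled vs fixed F test can be reproduced by hand from residual sums of squares:

F = [(RSS_pooled - RSS_within)/(N - 1)] / [RSS_within/(NT - N - K)]

Here N is the number of individuals, T the number of periods and K the number of slope regressors. For Produc (N = 48 states, T = 17 years, K = 7), this gives df1 = 47 and df2 = 816 - 48 - 7 = 761, as in the output above. The sketch below uses simulated data, not Produc. It fits the within model as a least-squares dummy-variable regression with lm(), which gives the same slope estimates as plm's within estimator:

```r
set.seed(1)
n_id <- 10; n_t <- 8; K <- 1                    # individuals, periods, slope regressors
id <- factor(rep(seq_len(n_id), each = n_t))
alpha <- rep(rnorm(n_id, sd = 2), each = n_t)    # hypothetical individual effects
x <- rnorm(n_id * n_t)
y <- 1 + 0.5 * x + alpha + rnorm(n_id * n_t)

pooled <- lm(y ~ x)        # one common intercept
lsdv <- lm(y ~ x + id)     # one intercept per individual (within estimator)

rss_p <- sum(resid(pooled)^2)
rss_w <- sum(resid(lsdv)^2)
df1 <- n_id - 1
df2 <- n_id * n_t - n_id - K
F_stat <- ((rss_p - rss_w) / df1) / (rss_w / df2)
p_val <- pf(F_stat, df1, df2, lower.tail = FALSE)

# anova() on the two nested fits reports the same F statistic
F_anova <- anova(pooled, lsdv)$F[2]
```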



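Test 2: Pooled vs Random

H0 (null hypothesis): the variance of the individual effects is zero, so the pooled model is adequate.

H1 (alternative hypothesis): the variance of the individual effects is positive (random effects are present).

Command:

> plmtest(pool)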

  Lagrange Multiplier Test - (Honda)

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp) 
normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects 

As the p-value is very small (< 2.2e-16), the null hypothesis is rejected. Therefore the random effects model is better than the pooled model.
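By default, plmtest() computes Honda's version of the Breusch-Pagan Lagrange multiplier test for individual effects, using residuals from the pooled OLS model. The statistic can be sketched in base R on simulated data (not Produc):

Honda = sqrt(NT / (2(T - 1))) * [ sum_i (sum_t e_it)^2 / sum_it e_it^2 - 1 ]

Under H0 this statistic is approximately standard normal, and the test is one-sided.

```r
set.seed(1)
n_id <- 10; n_t <- 8
id <- rep(seq_len(n_id), each = n_t)
alpha <- rep(rnorm(n_id, sd = 2), each = n_t)   # hypothetical individual effects
x <- rnorm(n_id * n_t)
y <- 1 + 0.5 * x + alpha + rnorm(n_id * n_t)

e <- resid(lm(y ~ x))                 # pooled OLS residuals
sum_by_id <- tapply(e, id, sum)       # residual sum within each individual
A <- sum(sum_by_id^2) / sum(e^2) - 1
honda <- sqrt(n_id * n_t / (2 * (n_t - 1))) * A
p_val <- pnorm(honda, lower.tail = FALSE)       # one-sided p-value
bp <- honda^2                                   # Breusch-Pagan LM, chi-squared(1)
```

If there are no individual effects, residuals within an individual do not reinforce each other. The ratio inside the brackets is then close to 1, and the statistic is close to 0.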



Test 3: Fixed vs Random


H0 (null hypothesis): the individual effects are not correlated with any regressor, so the random effects model is consistent and efficient.

H1 (alternative hypothesis): the individual effects are correlated with the regressors, so only the fixed effects model is consistent.

Command:

> phtest(random, fixed)

  Hausman Test

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp) 
chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent 

As the p-value is very small (< 2.2e-16), the null hypothesis is rejected. Therefore the fixed effects model is better than the random effects model.
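For reference, the Hausman statistic reported by phtest() compares the fixed and random effects slope estimates. Under H0 both estimators are consistent, so their difference should be small:

```latex
H = (\hat\beta_{FE} - \hat\beta_{RE})^{\top}
    \left[ \widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE}) \right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
    \;\sim\; \chi^2_{K} \quad \text{under } H_0
```

Here K is the number of slope coefficients compared. With the 7 regressors used above, K = 7, which matches df = 7 in the output.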



So, after making all the comparisons, we conclude that the fixed effects model is best suited for the panel data analysis of the "Produc" data set.

Hence, we conclude that each state has its own time-invariant effect, which does not vary within the same id (i.e. within the same "state"), and that these state effects are correlated with the regressors.




Wednesday, 13 February 2013

Session 6


Assignment 1:

Download data for the NIFTY index from 1 January 2012 to 31 January 2013.
Calculate the log returns and find the historical volatility.

Commands Used:

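data <- read.csv(file.choose(), header = TRUE)
closePrice <- data$Close
closePrice.ts <- ts(closePrice, frequency = 252)     # about 252 trading days a year
varLag <- lag(closePrice.ts, k = -1)                  # P(t-1), aligned with P(t)
logClosePrice <- log(closePrice.ts) - log(varLag)     # log return: ln P(t) - ln P(t-1)
# note: the next line also divides each log return by ln P(t-1); this extra scaling
# is not part of the usual log-return definition, which is logClosePrice itself
LogReturns <- logClosePrice / log(varLag)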


To Calculate Historical Volatility:

> annFactor <- 252^0.5        # sqrt(252): scales daily volatility to annual
> historicalVol <- sd(LogReturns) * annFactor
> historicalVol
[1] 0.01719952
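The usual textbook definitions are r(t) = ln P(t) - ln P(t-1) for the daily log return, and sd(r) * sqrt(252) for the annualised historical volatility (assuming about 252 trading days a year). A self-contained sketch on a short, hypothetical price series:

```r
prices <- c(5000, 5050, 5020, 5100, 5080, 5150)   # hypothetical daily closing prices

r <- diff(log(prices))               # log returns: ln P(t) - ln P(t-1)
daily_vol <- sd(r)                   # daily historical volatility
annual_vol <- daily_vol * sqrt(252)  # annualised, assuming ~252 trading days a year
```

diff(log(prices)) is equivalent to log(P(t) / P(t-1)), so no explicit lag object is needed when working with a plain vector.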

Assignment 2 :


Create an ACF plot of the log returns calculated previously. Also run an ADF test and interpret the result.

acf(LogReturns)

Graphical Interpretation
- Apart from lag 0, whose autocorrelation is always 1, all the autocorrelation bars (vertical lines) lie inside the confidence band (95% by default) marked by the two blue dashed lines. There is therefore no significant autocorrelation, and we interpret the returns data as "stationary" in nature. This is a visual inspection method for judging stationarity.
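For the default 95% level, the blue dashed lines drawn by acf() sit at plus and minus qnorm(0.975)/sqrt(n), assuming the default white-noise confidence type. The same check can be done numerically. The series below is simulated, not the NIFTY returns, and lag 0 is excluded because its autocorrelation is always 1:

```r
set.seed(2013)
r <- rnorm(250, sd = 0.01)                         # hypothetical daily log returns
a <- acf(r, plot = FALSE)                          # sample autocorrelations
bound <- qnorm((1 + 0.95) / 2) / sqrt(a$n.used)    # height of the blue dashed lines
outside <- which(abs(a$acf[-1]) > bound)           # lags (from 1) outside the band
```

With 95% bands, about 1 lag in 20 is expected to fall outside by chance even for pure noise.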




ADF Test


Command Used:
library(tseries)   # adf.test() is provided by the tseries package
adf.test(LogReturns)

Output
     Augmented Dickey-Fuller Test

data:  LogReturns
Dickey-Fuller = -5.6217, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(LogReturns) : p-value smaller than printed p-value



Interpretation from ADF test
Null hypothesis: the returns data is not stationary (it has a unit root).
Alternative hypothesis: the returns data is stationary.

From the test results, p-value = 0.01 (the warning says the actual p-value is even smaller), which is less than 0.05, the significance level for 95% confidence.
Hence the null hypothesis is rejected.

Result: the returns data is stationary in nature.
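adf.test() in tseries regresses the differenced series on a constant, a linear trend, the lagged level and k lagged differences, with k = trunc((n - 1)^(1/3)) by default. The Dickey-Fuller statistic is the t-value of the lagged level. Below is a base-R sketch of that regression on a simulated white-noise series (not the NIFTY data). It does not reproduce the p-value, which adf.test() interpolates from Dickey-Fuller critical-value tables and reports as 0.01 with a warning when the statistic lies beyond the table:

```r
set.seed(42)
y <- rnorm(270)                         # hypothetical stationary series (white noise)
n <- length(y)
k <- trunc((n - 1)^(1/3))               # default lag order in tseries::adf.test()

dy <- diff(y)                           # first differences, length n - 1
X <- embed(dy, k + 1)                   # col 1: dy[t]; cols 2..k+1: dy[t-1], ..., dy[t-k]
resp <- X[, 1]
lagged_dy <- X[, -1, drop = FALSE]
y_lag1 <- y[(k + 1):(n - 1)]            # level y[t-1] aligned with each dy[t]
trend <- (k + 1):(n - 1)                # linear time trend

# Dickey-Fuller regression with constant, trend and k lagged differences
fit <- lm(resp ~ y_lag1 + trend + lagged_dy)
df_stat <- coef(summary(fit))["y_lag1", "t value"]   # Dickey-Fuller statistic
```

A strongly negative statistic, as in the output above, is evidence against a unit root.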


Thursday, 7 February 2013

Session 5




ASSIGNMENT 1:
Find the returns of NSE data covering more than 6 months, taking the 10th data point as the start and the 95th data point as the end, and plot the returns.


> z<-read.csv(file.choose(),header=T)
> close<-z$Close[10:95]
> close.ts<-ts(close,deltat=1/252)
> close.ts
> summary(close.ts)
> z.diff<-diff(close.ts)
> z.diff
> returns<-cbind(close.ts,z.diff,lag(close.ts,k=-1))
> returns
> returns<-z.diff/lag(close.ts,k=-1)
> returns
> plot(returns)
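diff(close.ts) / lag(close.ts, k = -1) works because arithmetic on two ts objects is aligned by time. The result is the simple return R(t) = (P(t) - P(t-1)) / P(t-1). A small check on hypothetical prices that the time-aligned version equals plain vector indexing:

```r
prices <- ts(c(100, 102, 101, 105, 104), deltat = 1/252)   # hypothetical closing prices

# diff() and lag(k = -1) are matched by time, giving (P(t) - P(t-1)) / P(t-1)
ret_ts <- diff(prices) / lag(prices, k = -1)

# The same returns computed with plain vector indexing
p <- as.numeric(prices)
ret_vec <- p[-1] / p[-length(p)] - 1
```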




ASSIGNMENT 2:
Data for rows 1-700 is available. Predict the outcome for rows 701-850, using GLM estimation with logit analysis.


Commands : 

> z<-read.csv(file.choose(),header=T)
> z.data<-z[1:700,1:9]
> z.data$ed<-factor(z.data$ed)
> logit.est <- glm(default ~ age + employ + address + income + debtinc + creddebt + othdebt, data = z.data, family = "binomial")
> summary(logit.est)
> confint.default(logit.est)
> logit.eg2 <- with(z[701:850, 1:8], data.frame(age = age, employ = employ, address = address, income = income, debtinc = debtinc, creddebt = creddebt, othdebt = othdebt, ed = factor(1:3)))
> logit.eg2$prob<-predict(logit.est,newdata=logit.eg2,type="response")
> head(logit.eg2)
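Below is a self-contained sketch of the same train-then-predict pattern on simulated data. The variable names age and debtinc echo the original. The data-generating model and the 0.5 classification cut-off are hypothetical. With type = "response", predict() returns probabilities, which are the inverse logit of the linear predictor:

```r
set.seed(2013)
n <- 850
age <- round(runif(n, 20, 60))
debtinc <- runif(n, 0, 30)
# hypothetical default process: log-odds rise with debt-to-income, fall with age
p <- plogis(-2 + 0.1 * debtinc - 0.03 * (age - 40))
default <- rbinom(n, 1, p)
sim <- data.frame(age, debtinc, default)

train <- sim[1:700, ]        # rows used for estimation
holdout <- sim[701:850, ]    # rows to predict

fit <- glm(default ~ age + debtinc, data = train, family = binomial)
prob <- predict(fit, newdata = holdout, type = "response")   # P(default = 1)
link <- predict(fit, newdata = holdout, type = "link")       # log-odds
pred_class <- ifelse(prob > 0.5, 1, 0)                       # assumed 0.5 cut-off
```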





Wednesday, 23 January 2013

Session 3


Assignment 1(a):

Given the data set Mileage & Groove, where groove impacts mileage, fit a linear model and comment on whether a linear model is applicable.


> data<-read.csv(file.choose(),header=T)
> data
  mileage groove
1       0 394.33
2       4 329.50
3       8 291.00
4      12 255.17
5      16 229.33
6      20 204.83
7      24 179.00
8      28 163.83
9      32 150.33
> reg1<-lm(data$mileage~data$groove)
> reg1

Call:
lm(formula = data$mileage ~ data$groove)

Coefficients:
(Intercept)  data$groove 
    47.9446      -0.1308 

> res<-resid(reg1)
> res
         1          2          3          4          5          6          7          8          9
 3.6502499 -0.8322206 -1.8696280 -2.5576878 -1.9386386 -1.1442614 -0.5239038  1.4912269  3.7248633
> plot(data$groove,res)
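The residual plot is parabolic: the residuals are positive at both ends and negative in the middle. The relationship is therefore curved, and a linear model is not appropriate for this data.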
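One way to act on the parabolic residual pattern, which was not part of the original assignment, is to add a quadratic term and compare the two fits. The data below are the nine rows printed above:

```r
mileage <- c(0, 4, 8, 12, 16, 20, 24, 28, 32)
groove  <- c(394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33)

lin  <- lm(mileage ~ groove)                 # the linear fit from above
quad <- lm(mileage ~ groove + I(groove^2))   # adds a curvature term

rss <- c(linear = sum(resid(lin)^2), quadratic = sum(resid(quad)^2))
cmp <- anova(lin, quad)                      # F test for the extra quadratic term
```

A positive coefficient on I(groove^2) corresponds to the U-shaped residuals seen in the plot.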






Assignment 1(b):

Given the data set Alpha & Pluto, fit a linear model and comment on whether a linear model is applicable.

> ab<-read.csv(file.choose(),header=T)
> ab
   alpha pluto
1  0.150    20
2  0.004     0
3  0.069    10
4  0.030     5
5  0.011     0
6  0.004     0
7  0.041     5
8  0.109    20
9  0.068    10
10 0.009     0
11 0.009     0
12 0.048    10
13 0.006     0
14 0.083    20
15 0.037     5
16 0.039     5
17 0.132    20
18 0.004     0
19 0.006     0
20 0.059    10
21 0.051    10
22 0.002     0
23 0.049     5
> y<-ab$pluto
> x<-ab$alpha
> x
 [1] 0.150 0.004 0.069 0.030 0.011 0.004 0.041 0.109 0.068 0.009 0.009 0.048 0.006 0.083 0.037 0.039 0.132 0.004 0.006 0.059 0.051 0.002 0.049
> y
 [1] 20  0 10  5  0  0  5 20 10  0  0 10  0 20  5  5 20  0  0 10 10  0  5
> reg<-lm(y~x)
> res<-resid(reg)
> res
         1          2          3          4          5          6          7          8          9         10         11         12         13         14 
-4.2173758 -0.0643108 -0.8173877  0.6344584 -1.2223345 -0.0643108 -1.1852930  2.5653342 -0.6519557 -0.8914706 -0.8914706  2.6566833 -0.3951747  6.8665650 
        15         16         17         18         19         20         21         22         23 
-0.5235652 -0.8544291 -1.2396007 -0.0643108 -0.3951747  0.8369318  2.1603874  0.2665531 -2.5087486 

> plot(x,res)

> qqnorm(res)

> qqline(res)
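The Q-Q plot is a visual check of whether the residuals are normal. One formal complement, an addition not used in the original, is the Shapiro-Wilk test from base stats. Its null hypothesis is that the residuals come from a normal distribution. The data are the 23 rows printed above:

```r
alpha <- c(0.150, 0.004, 0.069, 0.030, 0.011, 0.004, 0.041, 0.109, 0.068, 0.009,
           0.009, 0.048, 0.006, 0.083, 0.037, 0.039, 0.132, 0.004, 0.006, 0.059,
           0.051, 0.002, 0.049)
pluto <- c(20, 0, 10, 5, 0, 0, 5, 20, 10, 0, 0, 10, 0, 20, 5, 5, 20, 0, 0, 10, 10, 0, 5)

reg <- lm(pluto ~ alpha)
res <- resid(reg)
sw <- shapiro.test(res)    # H0: the residuals are normally distributed
```

A small p-value (e.g. below 0.05) would indicate non-normal residuals.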



Assignment 3: Justify the null hypothesis using ANOVA

 

As p = 0.687 > 0.05, we cannot reject the null hypothesis.
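The sketch below shows where the ANOVA p-value comes from and applies the same 5% decision rule. It is a minimal one-way ANOVA on simulated data, and the group labels and values are hypothetical, not the assignment's data:

```r
set.seed(2013)
# hypothetical data: three groups drawn from the same distribution, so H0 is true
score <- rnorm(30, mean = 50, sd = 5)
group <- factor(rep(c("A", "B", "C"), each = 10))

fit <- aov(score ~ group)
tab <- summary(fit)[[1]]          # the ANOVA table as a data frame
p <- tab[["Pr(>F)"]][1]           # p-value for the group effect
decision <- if (p > 0.05) "do not reject H0" else "reject H0"
```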