Friday, 29 March 2013

3D Plots

Assignment 1: 

Create 3 vectors x, y and z with random values, ensuring they are of equal length.
Create a 3-dimensional plot of them.


Solution:

Step 1:
Create a dataset of 50 items with mean = 22 and standard deviation = 5.

> data<-rnorm(50,mean=22,sd=5)

Step 2:

Draw three random samples of equal length from the dataset to create the 3 vectors x, y, z.


> x<-sample(data,15)
> y<-sample(data,15)
> z<-sample(data,15)

Step 3: Bind the 3 vectors together into a matrix.


> bindedData<-cbind(x,y,z)
> bindedData
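The three steps above can be put together in one reproducible sketch. The seed value is an assumption added here only so that repeated runs give the same numbers; it is not part of the original solution.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

# Step 1: 50 draws from a normal distribution with mean 22 and sd 5
data <- rnorm(50, mean = 22, sd = 5)

# Step 2: three samples of 15 values each, drawn without replacement
x <- sample(data, 15)
y <- sample(data, 15)
z <- sample(data, 15)

# Step 3: bind the vectors column-wise into a 15 x 3 matrix
bindedData <- cbind(x, y, z)

dim(bindedData)        # number of rows and columns
colnames(bindedData)   # column names are taken from the vector names
```

Because all three vectors are sampled from the same 50 values, some values will usually appear in more than one column.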




3D Plots

a) Normal 3D Plot (plot3d comes from the rgl package):

library(rgl)
plot3d(bindedData[,1:3])



b) With Labelled Axes

plot3d(bindedData[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000))


c) With Spheres

plot3d(bindedData[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type="s")




d) With Lines

plot3d(bindedData[,1:3],xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type="l")
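One detail worth a sketch: rainbow(5000) produces 5000 colours for only 15 points, and the first 15 of those are almost the same shade of red, so the points appear to end up nearly one colour. A minimal variation, assuming one colour per point is what is wanted, is shown below. The matrix pts and the vector point_cols are hypothetical stand-ins introduced here, and the plot is only drawn if the rgl package is installed.

```r
# Hypothetical stand-in for the 15 x 3 matrix built in Assignment 1
set.seed(1)
pts <- cbind(x = rnorm(15, 22, 5), y = rnorm(15, 22, 5), z = rnorm(15, 22, 5))

# One colour per point, so neighbouring points get visibly different colours
point_cols <- rainbow(nrow(pts))

# Draw the sphere plot only when rgl is available
if (requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(pts, xlab = "X Axis", ylab = "Y Axis", zlab = "Z Axis",
              col = point_cols, type = "s")
}
```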







Assignment 2:

Create 2 random variables.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and combine it with x and y)
3. Colour-coded graph
4. Smooth curve and best fit line

Solution :

Step 1:
Create 2 random variables x and y using rnorm.
Add a 3rd variable z by picking 5 letters as categories, sampling from them 1000 times with replacement, and converting the result to a factor, as shown:



> library(ggplot2)
> x <- rnorm(1000, mean=40, sd=10)
> y <- rnorm(1000, mean=30, sd=10)
> z1 <- sample(letters, 5)
> z2 <- sample(z1, 1000, replace=TRUE)
> z <- as.factor(z2)
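A quick way to check that z really has 5 categories spread over 1000 observations is sketched below. The seed is an assumption added for reproducibility.

```r
set.seed(7)  # assumed seed, only for reproducibility

z1 <- sample(letters, 5)                  # 5 distinct category labels
z2 <- sample(z1, 1000, replace = TRUE)    # 1000 labels drawn from those 5
z  <- as.factor(z2)

nlevels(z)   # number of categories
table(z)     # number of observations in each category
```

Since the labels are drawn with equal probability, each category should hold roughly 200 observations.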




Plots:


a) qplot(x,y)



b) qplot(x,z)


c) Semi-transparent qplot between x and z using alpha


qplot(x,z , alpha=I(4/10))


d) Colored Plot

qplot(x,y , color=z)


e) Logarithmic Colored Plot

qplot(log(x),log(y) , color=z)



f) Smooth curve and best fit line using geom

qplot(x,y,geom=c("path","smooth"))



qplot(x,y,geom=c("point","smooth"))
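qplot() picks the smoothing method automatically, and it is deprecated as of ggplot2 3.4.0. The sketch below is one possible equivalent with ggplot(), making the two requested pieces explicit: a loess smooth curve and a straight best fit line from lm(). The data frame df and the object names are assumptions introduced here, and the plot is only built if ggplot2 is installed.

```r
set.seed(3)  # assumed seed, only for reproducibility
x <- rnorm(1000, mean = 40, sd = 10)
y <- rnorm(1000, mean = 30, sd = 10)
df <- data.frame(x = x, y = y)

# Best fit line fitted explicitly, so its coefficients can be inspected
fit <- lm(y ~ x, data = df)
coef(fit)   # intercept and slope

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x, y)) +
    geom_point(alpha = 0.4) +
    # smooth curve with its confidence band
    geom_smooth(method = "loess", formula = y ~ x, se = TRUE) +
    # straight best fit line
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "red")
  print(p)
}
```

Because x and y are generated independently, the fitted slope should be close to zero and both curves should look nearly flat.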


g) qplot(x,y,geom=c("boxplot","jitter"))



Saturday, 23 March 2013

Data Visualization

Facebook Page Statistics Tracking Tools : Infographics

The main goal of creating a Facebook page, or any other social networking profile, is to develop a community through interaction with the public. Measuring what happens on a Facebook page is therefore very important for an individual or a business, especially for marketing. Facebook itself gives page admins some details, such as the number of likes and how many people are talking about the page, but it is often important to get deeper insights, especially from a statistical point of view. Various free tools are available for getting deeper insights into Facebook statistics. A few of them are:


1. Visual.ly


Features: This tool fetches data on the last month of activity on the page. It gives a categorical distribution of the users posting on or liking the page, for example by demographics and by the country they live in. It also reports how many impressions and deeper impressions the page has. It gives very clear statistics for the last month.

Limitation: It gives statistics for the last month only.

2. Quintly


Features: It gives statistics for each post and also the types of posts. It also gives a statistical view of the fan growth rate and the interaction rate. It puts more stress on interactions and gives a statistical picture of every aspect.

Limitations: The free version shows analysis of data for only one month. The Advanced and Advanced Plus versions show analysis of data for the last 3 months and for an unlimited period, respectively.



3. Wildfire App

Features: It gives an option to compare different pages on the basis of the number of likes and the number of check-ins. It helps in tracking competitors' pages, so that a business can shape its marketing strategy accordingly. It also works for Twitter profiles.

Limitations: The information given by this tool is restricted, and it is mainly used for comparison.

                                                        
[Figure: Comparison on basis of Likes]

[Figure: Comparison on basis of Checkins]

Friday, 15 March 2013

Session 8 : Panel Data Analysis

Assignment 1:

Do the panel data analysis of "Produc" data in package "plm"

Solution:

Produc Data:


state  :  the state
year   :  the year
pcap   :  public capital stock
hwy    :  highways and streets
water  :  water and sewer facilities
util   :  other public buildings and structures
pc     :  private capital stock
gsp    :  gross state product
emp    :  labor input measured by employment in non-agricultural payrolls
unemp  :  state unemployment rate

Commands:

> library(plm)
> data("Produc",package="plm")
> head(Produc)

Pooled Effect Model:

> pool <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("pooling"), index = c("state","year"))
> summary(pool)


Fixed Effects Model:

> fixed <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("within"), index = c("state","year"))
> summary(fixed)
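With model "within", the slopes are estimated after subtracting each state's own mean from every variable, which is equivalent to giving every state its own intercept. The sketch below illustrates this on small synthetic data in base R. All names (state, alpha, x, y and the b_* results) are illustrative assumptions and have nothing to do with the Produc columns.

```r
set.seed(11)  # assumed seed, only for reproducibility
n_states <- 6
n_years  <- 10
state <- rep(paste0("S", 1:n_states), each = n_years)

# State-specific effect, constant over time within each state
alpha <- rep(rnorm(n_states, sd = 3), each = n_years)

# Regressor correlated with the state effect; the true slope is 2
x <- rnorm(n_states * n_years) + alpha
y <- 2 * x + alpha + rnorm(n_states * n_years, sd = 0.5)

# Within transformation: subtract each state's mean from y and x
y_dm <- y - ave(y, state)
x_dm <- x - ave(x, state)
b_within <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]

# Equivalent estimate: least squares with one dummy variable per state
b_lsdv <- coef(lm(y ~ x + factor(state)))[["x"]]

# Pooled OLS ignores the state effect entirely
b_pooled <- coef(lm(y ~ x))[["x"]]

c(within = b_within, lsdv = b_lsdv, pooled = b_pooled)
```

The within and dummy-variable slopes coincide, while the pooled slope is pulled away from 2 because the ignored state effect is correlated with x.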



Random Effects Model:

> random <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("random"), index = c("state","year"))
> summary(random)
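The three plm() calls above differ only in the model argument, so the formula can be written once and reused. This is just a sketch of one way to organise the calls. The names f, model_types and fits are introduced here for illustration, and the models are only fitted if the plm package is installed.

```r
# Same specification as in the three calls above, written once
f <- log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) +
  log(gsp) + log(emp) + log(unemp)

# Names on the left are labels; values are the plm model types
model_types <- c(pool = "pooling", fixed = "within", random = "random")

if (requireNamespace("plm", quietly = TRUE)) {
  library(plm)
  data("Produc", package = "plm")

  # Fit the three models by varying only the `model` argument
  fits <- lapply(model_types, function(m) {
    plm(f, data = Produc, model = m, index = c("state", "year"))
  })

  # Coefficients of each model (the within model has no intercept)
  print(lapply(fits, coef))
}
```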



To determine which model is best:



Test 1 : Pooled vs Fixed:

Ho: Null Hypothesis: there are no individual (state-specific) effects, so the pooled model is adequate.

H1: Alternate Hypothesis: at least one individual (state-specific) effect is non-zero.

pFtest(fixed,pool)

        F test for individual effects

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp) 
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects.

As the p-value is very small (< 2.2e-16), the null hypothesis is rejected. Therefore, the Fixed Effects Model is better than the Pooled Model.



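Test 2 : Pooled vs Random:

Ho: Null Hypothesis: the variance of the individual effects is zero, so there are no random effects.

H1: Alternate Hypothesis: the variance of the individual effects is greater than zero.

Command:

>plmtest(pool)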

  Lagrange Multiplier Test - (Honda)

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp) 
normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects 

As the p-value is very small (< 2.2e-16), the null hypothesis is rejected. Therefore, the Random Effects Model is better than the Pooled Model.



Test 3 : Fixed vs Random:


Ho: Null Hypothesis: Individual effects are not correlated with any regressor, so the Random Effects Model is preferred.

H1: Alternate Hypothesis: Individual effects are correlated with at least one regressor, so the Fixed Effects Model is preferred.

Command:

>phtest(random,fixed)

  Hausman Test

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp) 
chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent 

As the p-value is very low (< 2.2e-16), the null hypothesis is rejected. Therefore, the Fixed Effects Model is better than the Random Effects Model.
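The three comparisons above amount to a simple decision rule on the p-values. The helper below is a hypothetical, simplified sketch of that rule: the function name choose_panel_model, its argument names and the 0.05 threshold are assumptions made here, not part of plm.

```r
# Hypothetical helper: pick a model type from the three test p-values
choose_panel_model <- function(p_ftest, p_lmtest, p_hausman, alpha = 0.05) {
  fixed_beats_pooled  <- p_ftest  < alpha   # Test 1: F test
  random_beats_pooled <- p_lmtest < alpha   # Test 2: Lagrange multiplier test

  if (!fixed_beats_pooled && !random_beats_pooled) return("pooling")
  if (fixed_beats_pooled && !random_beats_pooled) return("within")
  if (!fixed_beats_pooled && random_beats_pooled) return("random")

  # Both beat the pooled model: Test 3 (Hausman) decides between them
  if (p_hausman < alpha) "within" else "random"
}

# All three reported p-values are below 2.2e-16, used here as an upper bound
choose_panel_model(2.2e-16, 2.2e-16, 2.2e-16)   # returns "within"
```

In practice the p-values can be read from the test objects themselves, for example pFtest(fixed, pool)$p.value.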



So after making all the comparisons, we come to the conclusion that the Fixed Effects Model is best suited to the panel data analysis of the "Produc" data set.

Hence, we conclude that there are significant state-specific effects that are correlated with the regressors, so the coefficients should be estimated from the variation within each "state" over time.