# Statistical Functions in Python

Statistical functions help you analyze data and draw meaningful conclusions from it. This tutorial covers some useful statistical methods that can be applied to pandas Series and DataFrame objects.

The following statistical methods are covered in this tutorial:

• pct_change()
• cov()
• corr()
• corrwith()

# pct_change()

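The `pct_change()` method can be applied to a pandas Series or DataFrame to calculate the change between each element and the element a given number of periods earlier. Despite the name, the result is a fraction rather than a percentage, so 0.5 means a 50% increase. By default, `periods` is 1.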

## Calculating pct_change() without specifying the number of periods

Code:

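```python
import pandas as pd
import numpy as np

# 10 random values from a standard normal distribution; your numbers will differ
series = pd.Series(np.random.randn(10))

# Change of each value relative to the previous one (periods=1 by default)
series.pct_change()
```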

Output:

```
0         NaN
1   -0.881470
2   -5.025007
3    0.728078
4   -0.577371
5    1.173420
6   -1.578389
7   -3.520208
8   -1.927874
9   -1.600583
dtype: float64
```
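Because the random input above makes the result hard to check by eye, here is a minimal sketch with hand-picked values. The `prices` Series is hypothetical and not part of the original example; it only illustrates that each result is the current value divided by the previous value, minus 1.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen so the arithmetic is easy to follow
prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# pct_change() with the default periods=1
changes = prices.pct_change()

# The same calculation written out by hand: current / previous - 1
manual = prices / prices.shift(1) - 1

print(changes)
# Compare only the rows that have a previous value (row 0 is NaN)
print(np.allclose(changes.iloc[1:], manual.iloc[1:]))
```

The first entry is `NaN` because there is no earlier value to compare against. The remaining entries correspond to a rise of about 10%, a fall of about 10%, and no change.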

## Calculating pct_change() by specifying the number of periods

Code:

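```python
# 10 rows and 2 columns of random values
df = pd.DataFrame(np.random.randn(10, 2))

# Change of each value relative to the value two rows earlier
df.pct_change(periods=2)
```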

Output (first five rows):

|   | 0 | 1 |
|---|---|---|
| 0 | NaN | NaN |
| 1 | NaN | NaN |
| 2 | -0.095052 | -1.399525 |
| 3 | 0.073909 | -7.491512 |
| 4 | -0.882174 | -1.150202 |
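To make the effect of `periods` concrete, the sketch below uses a small hypothetical DataFrame (the column names `x` and `y` and the values are assumptions for illustration). It compares `pct_change(periods=2)` with the equivalent manual expression built from `shift(2)`.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with values that are easy to check by hand
df = pd.DataFrame({"x": [1.0, 2.0, 4.0, 8.0], "y": [10.0, 10.0, 5.0, 20.0]})

two_period = df.pct_change(periods=2)

# Equivalent manual form: divide by the value two rows earlier, then subtract 1
manual = df / df.shift(2) - 1

print(two_period)
# The first two rows are NaN in both, so compare from row 2 onward
print(np.allclose(two_period.iloc[2:], manual.iloc[2:]))
```

The first two rows are `NaN` in every column because no value exists two rows earlier, which matches the pattern in the output above.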

# Covariance: cov()

The `cov()` method calculates covariance. On a Series, `Series.cov()` returns the covariance between that Series and another one. On a DataFrame, `DataFrame.cov()` returns the pairwise covariance between all of its columns.

In both cases, missing values are excluded from the calculation.

## Calculating covariance between two series

Code:

```python
series1 = pd.Series(np.random.randn(200))
series2 = pd.Series(np.random.randn(200))

series1.cov(series2)
```

Output:

`-0.14817157321848334`
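The statement that missing values are excluded can be checked with a short sketch. The two Series below are hypothetical; the manual comparison keeps only the positions where both values are present and uses NumPy's sample covariance, which is an assumption about how best to illustrate the behavior rather than a description of pandas internals.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each Series
s1 = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])
s2 = pd.Series([2.0, 1.0, 3.0, np.nan, 6.0])

# Series.cov() uses only the positions where both values are present
cov_pandas = s1.cov(s2)

# Manual check: keep the complete pairs, then take the sample covariance
both = s1.notna() & s2.notna()
cov_numpy = np.cov(s1[both].to_numpy(), s2[both].to_numpy())[0, 1]

print(cov_pandas, cov_numpy)
```

Only three complete pairs remain here, (1, 2), (2, 1) and (5, 6), and both calculations agree on their covariance.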

## Calculating covariance of a DataFrame

Code:

```python
df = pd.DataFrame(np.random.randn(4, 5), columns=["a", "b", "c", "d", "e"])
df.cov()
```

Output:

|   | a | b | c | d | e |
|---|---|---|---|---|---|
| a | 2.095402 | 0.191502 | 0.049185 | 0.090229 | -1.052856 |
| b | 0.191502 | 0.628889 | 0.377184 | -0.507893 | 0.404180 |
| c | 0.049185 | 0.377184 | 0.336220 | -0.077814 | 0.571139 |
| d | 0.090229 | -0.507893 | -0.077814 | 0.950198 | 0.164894 |
| e | -1.052856 | 0.404180 | 0.571139 | 0.164894 | 1.722546 |
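A covariance matrix like the one above has a recognizable structure: it is symmetric, its diagonal holds each column's variance, and every other entry equals the covariance of the corresponding pair of columns. The following sketch checks these properties on a small hypothetical DataFrame (the values are chosen for illustration only).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: b is twice a, and c decreases as a increases
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
})

cov = df.cov()

# The diagonal holds each column's sample variance
diag_matches_var = np.allclose(np.diag(cov), df.var())

# The matrix equals its own transpose
is_symmetric = np.allclose(cov, cov.T)

# Each off-diagonal entry equals the Series-level covariance of that pair
pair_matches = np.isclose(cov.loc["a", "b"], df["a"].cov(df["b"]))

print(cov)
print(diag_matches_var, is_symmetric, pair_matches)
```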

# Correlation: corr()

Correlation is computed with the `corr()` method. Its `method` parameter accepts the following options:

1. `"pearson"` (default): the standard correlation coefficient
2. `"kendall"`: the Kendall Tau correlation coefficient
3. `"spearman"`: the Spearman rank correlation coefficient

## Calculating the correlation between two columns of a DataFrame using the default Pearson method

Code:

```python
df = pd.DataFrame(np.random.randn(200, 4), columns=["a", "b", "c", "d"])
df["a"].corr(df["b"])
```

Output:

`0.08425780768544051`
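The Pearson coefficient ties back to the covariance section: it is the covariance of the two Series divided by the product of their standard deviations. A minimal sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
b = pd.Series([2.0, 1.0, 4.0, 3.0, 6.0])

# Built-in Pearson correlation (the default method)
pearson = a.corr(b)

# Manual form: covariance scaled by both standard deviations
manual = a.cov(b) / (a.std() * b.std())

print(pearson, manual)
```

This scaling is why a correlation always falls between -1 and 1, while a covariance depends on the units of the data.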

## Calculating the correlation between two columns of a DataFrame using the Spearman method

Code:

`df["a"].corr(df["b"], method="spearman")`

Output:

`0.053819845496137414`
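One way to see what the Spearman method measures is to note that it is the Pearson correlation applied to the ranks of the values rather than to the values themselves. The sketch below uses a hypothetical relationship (`y` is `x` cubed) that always increases but is not linear.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y always rises with x, but not along a straight line
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
df = pd.DataFrame({"x": x, "y": x ** 3})

# Pearson measures linear association, so it falls short of 1 here
pearson = df["x"].corr(df["y"])

# Spearman by hand: Pearson correlation of the ranks
spearman_manual = df["x"].rank().corr(df["y"].rank())

# Built-in Spearman, taken from the pairwise correlation matrix
spearman_pandas = df.corr(method="spearman").loc["x", "y"]

print(pearson, spearman_manual, spearman_pandas)
```

Because the ranks of `x` and `y` are identical, both Spearman values come out at 1, while the Pearson value is lower.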

## Calculating the pairwise correlation between DataFrame columns

Code:

`df.corr()`

Output:

|   | a | b | c | d |
|---|---|---|---|---|
| a | 1.000000 | 0.084258 | -0.074284 | 0.054453 |
| b | 0.084258 | 1.000000 | 0.022995 | 0.029727 |
| c | -0.074284 | 0.022995 | 1.000000 | -0.028279 |
| d | 0.054453 | 0.029727 | -0.028279 | 1.000000 |

# corrwith()

The `corrwith()` method is applied to a DataFrame to calculate the correlation between same-labeled Series (columns by default, or rows with `axis=1`) in two different DataFrame objects.

Code:

```python
index = ["a", "b", "c", "d", "e"]
columns = ["one", "two", "three", "four"]

# df2 has only the first four of df1's five rows
df1 = pd.DataFrame(np.random.randn(5, 4), index=index, columns=columns)
df2 = pd.DataFrame(np.random.randn(4, 4), index=index[:4], columns=columns)

df1.corrwith(df2)
```

Output:

```
one      0.277569
two     -0.052151
three   -0.754392
four     0.526614
dtype: float64
```

Code:

`df2.corrwith(df1, axis=1)`

Output:

```
a    0.346955
b   -0.707590
c    0.711081
d    0.753457
e         NaN
dtype: float64
```
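The `NaN` for row `e` above comes from label alignment: `corrwith()` only correlates labels present in both objects, and labels found in just one of them are reported as `NaN`. The sketch below reproduces this with two small hypothetical DataFrames (`left` and `right` are illustrative names) and also shows the `drop=True` option, which omits the unmatched labels from the result.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: right lacks row "d" and has an extra column "three"
left = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]},
    index=["a", "b", "c", "d"],
)
right = pd.DataFrame(
    {"one": [2.0, 4.0, 6.0], "two": [1.0, 2.0, 3.0], "three": [0.0, 1.0, 0.0]},
    index=["a", "b", "c"],
)

# Column-wise correlation over the shared rows a, b and c
result = left.corrwith(right)

# drop=True leaves out labels that exist in only one of the frames
dropped = left.corrwith(right, drop=True)

print(result)
print(dropped)
```

Column `one` rises together in both frames and column `two` moves in opposite directions, so their correlations are 1 and -1 (up to floating-point rounding). Column `three` exists only in `right`, so it shows up as `NaN` unless `drop=True` is passed.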

Priya Sengar (Medium, Github) is a Data Scientist with Old Dominion University. Priya is passionate about solving problems in data and converting them into solutions.