Computational Tools


Find The Correlation Between Columns

Suppose you have a DataFrame of numerical values, for example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 3), columns=['a', 'b', 'c'])

Then

>>> df.corr()
          a         b         c
a  1.000000  0.018602  0.038098
b  0.018602  1.000000 -0.014245
c  0.038098 -0.014245  1.000000

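computes the pairwise Pearson correlation between the columns. The diagonal is 1 because each column is perfectly correlated with itself, and the matrix is symmetric because the correlation of a with b equals that of b with a. The data are random, so the off-diagonal values will differ from run to run.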
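The properties of the correlation matrix can be checked directly. The following is a minimal sketch, not part of the original example: the fixed seed and the variable names (rng, corr, diag_is_one and so on) are assumptions chosen for illustration. It also compares the result against np.corrcoef, which should compute the same Pearson coefficients when there are no missing values.

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption for reproducibility; the original uses unseeded data)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 3)), columns=['a', 'b', 'c'])

corr = df.corr()  # method defaults to 'pearson'

# Each column is perfectly correlated with itself
diag_is_one = np.allclose(np.diag(corr.to_numpy()), 1.0)

# corr(a, b) == corr(b, a), so the matrix equals its transpose
is_symmetric = np.allclose(corr.to_numpy(), corr.to_numpy().T)

# np.corrcoef treats rows as variables by default, hence rowvar=False
matches_numpy = np.allclose(corr.to_numpy(),
                            np.corrcoef(df.to_numpy(), rowvar=False))

print(diag_is_one, is_symmetric, matches_numpy)
```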

pd.DataFrame.corr takes an optional method parameter that specifies which correlation coefficient to compute. The default is 'pearson'; 'kendall' and 'spearman' are also accepted. To use Spearman correlation, for example, use

>>> df.corr(method='spearman')
          a         b         c
a  1.000000  0.007744  0.037209
b  0.007744  1.000000 -0.011823
c  0.037209 -0.011823  1.000000
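Spearman correlation is the Pearson correlation of the ranked data, which is one way to see what method='spearman' does. The sketch below illustrates that relationship under assumed names (rng, spearman, pearson_on_ranks) and a fixed seed; it is a check on the definition, not a description of how pandas implements the method internally.

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption for reproducibility)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 3)), columns=['a', 'b', 'c'])

spearman = df.corr(method='spearman')

# Rank each column, then take the ordinary Pearson correlation of the ranks
pearson_on_ranks = df.rank().corr(method='pearson')

same = np.allclose(spearman.to_numpy(), pearson_on_ranks.to_numpy())
print(same)
```

Because it depends only on ranks, Spearman correlation is less sensitive to outliers and captures monotonic relationships that are not linear.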
