How To Handle NaN In NumPy
In this article, I will show you how to handle NaN values in NumPy.
First, let's import the necessary packages.
import pandas as pd
import numpy as np
Let's create some dummy data for this example.
a=np.array([1,np.nan,np.nan,np.nan,3,4,5,6,7,8,9])
a
type(a)
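Before computing anything, it can help to confirm where the NaN values sit. The snippet below is a minimal sketch using np.isnan; the names nan_mask and nan_positions are illustrative and not part of the original example.

```python
import numpy as np

a = np.array([1, np.nan, np.nan, np.nan, 3, 4, 5, 6, 7, 8, 9])

# np.nan is a float, so the whole array is stored as float64
print(a.dtype)

# NaN never compares equal to itself, so use np.isnan rather than ==
nan_mask = np.isnan(a)
nan_positions = np.flatnonzero(nan_mask)

print(nan_positions)        # indices of the NaN entries
print(int(nan_mask.sum()))  # number of NaN entries
```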
Calculate the mean with NaN values in NumPy
Let's check the mean first.
a.mean()
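We got nan, which is not what we want: any arithmetic involving NaN yields NaN, so a single NaN makes the whole mean NaN. We need to exclude the NaN values before calculating the mean. NumPy has nanmean, which computes the mean over the non-NaN values only.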
a.nanmean()
You would run into the following error.
AttributeError: 'numpy.ndarray' object has no attribute 'nanmean'
nanmean is a NumPy function, not an array method, so the correct way is to pass the array to np.nanmean.
np.nanmean(a)
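As a quick cross-check, the following sketch compares the plain mean, np.nanmean, and a manual filter with np.isnan. The variable names are illustrative.

```python
import numpy as np

a = np.array([1, np.nan, np.nan, np.nan, 3, 4, 5, 6, 7, 8, 9])

plain_mean = a.mean()                  # NaN propagates through the sum
nan_aware_mean = np.nanmean(a)         # mean over the 8 non-NaN values
manual_mean = a[~np.isnan(a)].mean()   # same idea with a boolean filter

print(plain_mean, nan_aware_mean, manual_mean)
```

The last two values agree, since np.nanmean ignores the NaN entries in the same way the boolean filter removes them.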
How to handle the product of two vectors with NaN values in NumPy
Let's create another NumPy vector with the same dimensions as a.
b=np.array([11,np.nan,np.nan,np.nan,12,13,14,15,16,17,18])
Let's compute the outer product of the two vectors a and b.
c = np.outer(a,b)
c.shape
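To see how far the NaN values spread in the outer product, the sketch below counts them. Every entry c[i, j] equals a[i] * b[j], so it is NaN whenever either factor is NaN. The names nan_count and valid_count are illustrative.

```python
import numpy as np

a = np.array([1, np.nan, np.nan, np.nan, 3, 4, 5, 6, 7, 8, 9])
b = np.array([11, np.nan, np.nan, np.nan, 12, 13, 14, 15, 16, 17, 18])

c = np.outer(a, b)                    # c[i, j] = a[i] * b[j]

nan_count = int(np.isnan(c).sum())    # entries where a[i] or b[j] is NaN
valid_count = c.size - nan_count      # entries built from two valid values

print(c.shape, nan_count, valid_count)
```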
Covariance between two vectors with NaN values in NumPy
Let's see what the covariance between arrays a and b is.
np.cov([a,b])
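To make the problem concrete, here is a minimal sketch of what np.cov returns for these inputs. The name raw_cov is illustrative.

```python
import numpy as np

a = np.array([1, np.nan, np.nan, np.nan, 3, 4, 5, 6, 7, 8, 9])
b = np.array([11, np.nan, np.nan, np.nan, 12, 13, 14, 15, 16, 17, 18])

# Each row's mean is NaN, so every entry of the covariance matrix is NaN
raw_cov = np.cov([a, b])

print(raw_cov)
print(np.isnan(raw_cov).all())
```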
The result contains only nan. To resolve this, we can use NumPy masked arrays. A mask marks the values that should be left out of a computation.
Let's first import the numpy.ma module.
import numpy.ma as ma
To mask NaN values, we can use ma.masked_invalid. Let's apply this method to arrays a and b.
ma.masked_invalid(a)
ma.masked_invalid(b)
As we can see, all NaN values are masked: they are displayed as -- and their entries in the mask are True.
OK, we are good to go now. To calculate the covariance, numpy.ma has a cov function, as shown below.
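The sketch below inspects the masked array more closely. In numpy.ma, a mask value of True means the entry is excluded from computations. The name masked_a is illustrative.

```python
import numpy as np
import numpy.ma as ma

a = np.array([1, np.nan, np.nan, np.nan, 3, 4, 5, 6, 7, 8, 9])

masked_a = ma.masked_invalid(a)

print(masked_a)           # masked entries are displayed as --
print(masked_a.mask)      # True at the NaN positions, False elsewhere
print(masked_a.count())   # number of unmasked (valid) entries
print(masked_a.mean())    # mean over the unmasked entries only
```

The masked mean matches what np.nanmean(a) returns, because both skip the NaN entries.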
ma.cov(ma.masked_invalid(np.outer(a,b)),rowvar=False)
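If the goal is the covariance between a and b themselves, rather than between the columns of their outer product, one option is to pass the two masked vectors to ma.cov directly. This is a minimal sketch, assuming that positions where either vector is NaN should be dropped as pairs; the names masked_a, masked_b, valid, and clean_cov are illustrative.

```python
import numpy as np
import numpy.ma as ma

a = np.array([1, np.nan, np.nan, np.nan, 3, 4, 5, 6, 7, 8, 9])
b = np.array([11, np.nan, np.nan, np.nan, 12, 13, 14, 15, 16, 17, 18])

masked_a = ma.masked_invalid(a)
masked_b = ma.masked_invalid(b)

# 2x2 covariance matrix computed over the unmasked pairs only
masked_cov = ma.cov(masked_a, masked_b)

# Cross-check: drop the NaN pairs by hand and use plain np.cov
valid = ~np.isnan(a) & ~np.isnan(b)
clean_cov = np.cov(a[valid], b[valid])

print(masked_cov)
print(np.allclose(ma.getdata(masked_cov), clean_cov))
```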
Wrap Up!
That's it for now. I will add more examples to this post in the next few days.