NumPy is a fundamental Python library for numerical and array operations. It is often used alongside pandas for exploratory data analysis (EDA), but it can also be applied directly to specific EDA tasks. Below are common NumPy functions and syntax patterns that are useful for EDA:

Basic NumPy Operations:

Import NumPy:
```python
import numpy as np
```

Creating NumPy Arrays:

Create a NumPy array from a Python list:
```python
numpy_array = np.array([1, 2, 3, 4, 5])
```

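
Besides converting a list, NumPy offers constructors for common patterns. The following is a minimal sketch; the variable names are illustrative and not part of the original examples:

```python
import numpy as np

zeros = np.zeros((2, 3))                   # 2x3 array filled with 0.0
evens = np.arange(0, 10, 2)                # [0 2 4 6 8]
grid = np.linspace(0.0, 1.0, 5)            # 5 evenly spaced values, endpoints included
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # nested lists become a 2D array

print(zeros.shape, evens, grid, matrix.shape)
```
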
Array Shape and Dimensions:

Get the shape and number of dimensions of a NumPy array:
```python
numpy_array.shape
numpy_array.ndim
```

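
To make the difference between the two attributes concrete, here is a small sketch using a hypothetical 2D array (`table` is an illustrative name):

```python
import numpy as np

table = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows, 3 columns

print(table.shape)  # (2, 3)
print(table.ndim)   # 2
print(table.size)   # 6 (total number of elements)
```

For tabular data, `shape[0]` is typically the number of observations and `shape[1]` the number of variables.
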
Array Indexing and Slicing:

Access elements and slices of a NumPy array:
```python
numpy_array[2]    # Access element at index 2
numpy_array[1:4]  # Slice from index 1 to 3 (the stop index 4 is excluded)
```

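
Indexing extends naturally to negative indices, step slices, and 2D arrays. A short sketch with made-up values:

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])
last = values[-1]          # 50
every_other = values[::2]  # [10 30 50]

table = np.array([[1, 2, 3], [4, 5, 6]])
cell = table[1, 2]         # row 1, column 2 -> 6
first_col = table[:, 0]    # all rows, column 0 -> [1 4]

print(last, every_other, cell, first_col)
```
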
Array Operations:

Perform element-wise operations on arrays:
```python
numpy_array + 2  # Add 2 to each element
numpy_array * 3  # Multiply each element by 3
```

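
Element-wise operations also work between arrays, and broadcasting lets a 1D array of column statistics combine with a 2D table. The sketch below centers each column on its mean, a common EDA preprocessing step; the data is invented for illustration:

```python
import numpy as np

heights = np.array([150.0, 160.0, 170.0])
centered = heights - heights.mean()  # [-10.  0. 10.]

table = np.array([[1.0, 10.0],
                  [3.0, 30.0]])
col_means = table.mean(axis=0)       # one mean per column: [ 2. 20.]
centered_table = table - col_means   # col_means is broadcast across every row

print(centered)
print(centered_table)
```
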
Array Aggregation:

Calculate statistics on arrays:
```python
np.mean(numpy_array)    # Mean
np.median(numpy_array)  # Median
np.std(numpy_array)     # Standard deviation (population by default; pass ddof=1 for the sample estimate)
```

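
On 2D data, the `axis` argument controls whether a statistic is computed per column or per row. A minimal sketch with illustrative values:

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

col_means = scores.mean(axis=0)  # per column: [2.5 3.5 4.5]
row_sums = scores.sum(axis=1)    # per row: [ 6. 15.]

# Quartiles over all values (the array is flattened when no axis is given)
q1, q2, q3 = np.percentile(scores, [25, 50, 75])

# Sample standard deviation (divides by n - 1 instead of n)
sample_std = np.std(scores, ddof=1)

print(col_means, row_sums, q1, q2, q3, sample_std)
```
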
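Array Concatenation and Stacking:

Combine multiple arrays (here `array1` and `array2` stand for existing arrays with compatible shapes):
```python
np.concatenate([array1, array2])  # Join along the first axis
np.vstack([array1, array2])       # Vertically stack arrays
np.hstack([array1, array2])       # Horizontally stack arrays
```
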
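
The resulting shapes are the easiest way to tell these functions apart. A small sketch with two concrete 1D arrays (the values are illustrative):

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

joined = np.concatenate([array1, array2])     # shape (6,)
as_rows = np.vstack([array1, array2])         # shape (2, 3): each array becomes a row
flat = np.hstack([array1, array2])            # shape (6,): same as concatenate for 1D input
as_cols = np.column_stack([array1, array2])   # shape (3, 2): each array becomes a column

print(joined.shape, as_rows.shape, flat.shape, as_cols.shape)
```
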
Array Filtering:

Filter elements based on a condition:
```python
numpy_array[numpy_array > 3]
```

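
Conditions can be combined with `&` and `|` (each condition in parentheses), counted, or turned into labels. A sketch with a hypothetical `ages` array:

```python
import numpy as np

ages = np.array([22, 38, 26, 35, 54, 2])

adults_under_40 = ages[(ages >= 18) & (ages < 40)]  # [22 38 26 35]
n_children = np.count_nonzero(ages < 18)            # 1
labels = np.where(ages < 18, "child", "adult")      # label each element by condition

print(adults_under_40, n_children, labels)
```
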
Random Number Generation:

Generate random numbers or arrays:
```python
np.random.rand(3, 3)  # Generate a 3x3 array of uniform random values in [0, 1)
```

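
For reproducible results, NumPy's `Generator` interface with a fixed seed is a common alternative to `np.random.rand`. The sketch below assumes an arbitrary seed of 42 and illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator: same numbers on every run

uniform = rng.random((3, 3))                            # uniform values in [0, 1)
normal = rng.normal(loc=0.0, scale=1.0, size=1000)      # draws from a standard normal
subset = rng.choice(np.arange(100), size=10, replace=False)  # sample 10 distinct values

print(uniform.shape, normal.mean(), subset)
```

Sampling without replacement, as in `subset`, is handy for inspecting a random subset of rows in a large dataset.
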
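Reshaping Arrays:

Change the shape of an array (the new shape must hold exactly the same number of elements):
```python
np.arange(6).reshape((2, 3))  # 6 elements fit a 2x3 shape; the 5-element numpy_array would raise a ValueError
```
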
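
When only one dimension is known, `-1` asks NumPy to infer the other from the element count. A brief sketch with illustrative names:

```python
import numpy as np

flat = np.arange(12)                # 12 elements: 0..11

grid = flat.reshape((3, 4))         # explicit 3x4
auto = flat.reshape((-1, 6))        # -1 is inferred as 2, giving shape (2, 6)
column = flat[:3].reshape(-1, 1)    # column vector with shape (3, 1)
back = grid.ravel()                 # flatten back to 1D

print(grid.shape, auto.shape, column.shape, back.shape)
```
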
Missing Data Handling:

Handle missing values (NaN) in floating-point arrays:
```python
numpy_array[np.isnan(numpy_array)]   # Select the missing values
numpy_array[~np.isnan(numpy_array)]  # Remove missing values
```

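
In practice, missing values are often counted, skipped during aggregation, or imputed. The sketch below uses a hypothetical `ages` array and mean imputation purely as an illustration; other imputation strategies may suit a given dataset better:

```python
import numpy as np

ages = np.array([22.0, np.nan, 26.0, 35.0, np.nan])

n_missing = np.isnan(ages).sum()                    # count NaN entries: 2
mean_age = np.nanmean(ages)                         # mean that ignores NaN
filled = np.where(np.isnan(ages), mean_age, ages)   # replace NaN with the mean

print(n_missing, mean_age, filled)
```

Note that plain `np.mean(ages)` would return `nan` here, which is why the NaN-aware variants (`np.nanmean`, `np.nanmedian`, `np.nanstd`) exist.
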
Statistical Tests:

Conduct statistical tests on arrays for hypothesis testing (these come from SciPy, which operates on NumPy arrays):
```python
from scipy import stats

t_stat, p_value = stats.ttest_ind(array1, array2)
```

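
To show what the t statistic measures, here is a NumPy-only sketch of the pooled-variance two-sample formula, which is the equal-variance form that `ttest_ind` uses by default. `pooled_t_statistic` is a hypothetical helper written for this illustration, the data is made up, and the p-value is not computed because it requires the t distribution from SciPy:

```python
import numpy as np

def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size
    var_a = a.var(ddof=1)  # sample variance of each group
    var_b = b.var(ddof=1)
    # Weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (a.mean() - b.mean()) / standard_error

array1 = np.array([1.0, 2.0, 3.0])
array2 = np.array([2.0, 3.0, 4.0])
t_stat = pooled_t_statistic(array1, array2)
print(t_stat)
```

A larger absolute value means the difference in means is large relative to the variability within the groups.
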
```python
import numpy as np
import urllib.request
import csv

# URL of the Titanic dataset
url = 'https://raw.githubusercontent.com/drshahizan/dataset/main/titanic/train.csv'

# Function to load the dataset
def load_titanic_data(url):
    response = urllib.request.urlopen(url)
    lines = [l.decode('utf-8') for l in response.readlines()]
    data = np.array(list(csv.reader(lines)))
    return data

# Load the Titanic dataset
titanic_data = load_titanic_data(url)

# Display the first 5 rows
print(titanic_data[:5])
```
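
Because `csv.reader` yields text, an array loaded this way holds strings, including the header row. The sketch below shows one way to pull out numeric columns. It uses a small made-up stand-in for `titanic_data` so that it runs offline, and it assumes the real file has `Age` and `Survived` columns, which should be checked against the actual header:

```python
import numpy as np

# Illustrative stand-in for titanic_data: a header row plus string fields (values are made up)
sample = np.array([
    ["PassengerId", "Survived", "Age"],
    ["1", "0", "22"],
    ["2", "1", ""],      # empty string marks a missing age
    ["3", "1", "26"],
])

header, rows = sample[0], sample[1:]
columns = header.tolist()

# Convert the Age column to float, mapping empty fields to NaN
age_text = rows[:, columns.index("Age")]
ages = np.array([float(v) if v else np.nan for v in age_text])
mean_age = np.nanmean(ages)  # 24.0

# Survived is fully populated here, so a direct integer cast works
survived = rows[:, columns.index("Survived")].astype(int)
survival_rate = survived.mean()

print(mean_age, survival_rate)
```

For mixed-type tables like this one, pandas is usually more convenient; NumPy works best once the numeric columns have been isolated.
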
NumPy is a powerful library for numerical operations and array manipulation, making it a valuable tool for various tasks in EDA, especially when dealing with numerical data or conducting statistical tests. It is often used alongside pandas for data analysis and manipulation in EDA workflows.
Please create an Issue for any improvements, suggestions or errors in the content.
You can also contact me on LinkedIn with any other queries or feedback.