🌟 Hit star button to save this repo in your profile

NumPy

NumPy is a fundamental Python library for numerical computing and array operations. Although it is often used together with pandas for exploratory data analysis (EDA), it can also be applied directly to specific EDA tasks. Here are some common NumPy functions and idioms suited to EDA:

Basic NumPy Operations:
- Import NumPy:
```
import numpy as np
```
Creating NumPy Arrays:
- Create a NumPy array from a Python list:
```
numpy_array = np.array([1, 2, 3, 4, 5])
```
Array Shape and Dimensions:
- Get the shape and number of dimensions of a NumPy array:
```
numpy_array.shape  # (5,): one axis holding 5 elements
numpy_array.ndim   # 1: number of axes
```
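The array above has a single axis. As a small sketch with arbitrary example values, a two-dimensional array built from a nested list reports one `shape` entry per axis, and `size` gives the total number of elements:

```python
import numpy as np

# A 2-D array: 2 rows, 3 columns (arbitrary example values)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)  # (2, 3)
print(matrix.ndim)   # 2
print(matrix.size)   # 6: total number of elements
```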

Array Indexing and Slicing:
- Access elements and slices of a NumPy array:
```
numpy_array[2]  # Access element at index 2
numpy_array[1:4]  # Slice from index 1 to 3
```
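A few further indexing patterns are worth knowing during EDA. This sketch (arbitrary values) shows negative indices, step slicing, row/column indexing on a 2-D array, and the fact that basic slices are views that share memory with the original array:

```python
import numpy as np

numpy_array = np.array([1, 2, 3, 4, 5])

print(numpy_array[-1])   # 5: negative indices count from the end
print(numpy_array[::2])  # [1 3 5]: every second element

# 2-D arrays take one index per axis: [row, column]
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix[1, 2])      # 6
print(matrix[:, 0])      # [1 4]: the first column

# Basic slices are views, so writing through a slice changes the original
view = numpy_array[1:4]
view[0] = 99
print(numpy_array)       # [ 1 99  3  4  5]
# Call .copy() on the slice when an independent array is needed
```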

Array Operations:
- Perform element-wise operations on arrays:
```
numpy_array + 2  # Add 2 to each element
numpy_array * 3  # Multiply each element by 3
```
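Element-wise arithmetic also works between two arrays. When the shapes differ but are compatible, NumPy applies its broadcasting rules. A short sketch with arbitrary values:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)  # [11 22 33]: element-wise, equal shapes
print(a * b)  # [10 40 90]

# Broadcasting: a (2, 3) array combined with a (3,) array;
# the 1-D array is applied to every row
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix - a)  # [[0 0 0]
                   #  [3 3 3]]
```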

Array Aggregation:
- Calculate statistics on arrays:
```
np.mean(numpy_array)  # Mean
np.median(numpy_array)  # Median
np.std(numpy_array)  # Standard deviation
```
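On tabular data, the `axis` argument controls whether a statistic is taken over everything, per column, or per row. Note also that `np.std` defaults to `ddof=0` (population standard deviation), whereas pandas' `Series.std` defaults to `ddof=1` (sample standard deviation), so the two can disagree. A sketch with arbitrary values:

```python
import numpy as np

# Rows as observations, columns as variables (arbitrary values)
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 9.0]])

print(np.mean(data))          # 4.0: mean of all six values
print(np.mean(data, axis=0))  # [3. 5.]: per-column means
print(np.mean(data, axis=1))  # [1.5 3.5 7. ]: per-row means

col = data[:, 0]              # [1. 3. 5.]
print(np.std(col))            # population std (ddof=0), about 1.633
print(np.std(col, ddof=1))    # 2.0: sample std, matching pandas' default
```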

Array Concatenation and Stacking:
- Combine multiple arrays:
```
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
np.concatenate([array1, array2])  # Join along an existing axis
np.vstack([array1, array2])  # Vertically stack arrays
np.hstack([array1, array2])  # Horizontally stack arrays
```
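The three functions produce different shapes for 1-D inputs, which is easy to overlook. This sketch prints the resulting shapes and adds `np.column_stack`, which places 1-D arrays side by side as columns:

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

print(np.concatenate([array1, array2]).shape)   # (6,)
print(np.vstack([array1, array2]).shape)        # (2, 3): each 1-D array becomes a row
print(np.hstack([array1, array2]).shape)        # (6,): same as concatenate for 1-D input
print(np.column_stack([array1, array2]).shape)  # (3, 2): each 1-D array becomes a column
```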

Array Filtering:
- Filter elements based on a condition:
```
numpy_array[numpy_array > 3]
```
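Conditions can be combined with `&` (and), `|` (or), and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`. A sketch that also counts matches and labels values with `np.where`:

```python
import numpy as np

numpy_array = np.array([1, 2, 3, 4, 5])

print(numpy_array[(numpy_array > 1) & (numpy_array < 5)])  # [2 3 4]
print(numpy_array[(numpy_array < 2) | (numpy_array > 4)])  # [1 5]

# Count matching elements: each True counts as 1
print(np.sum(numpy_array > 3))  # 2

# np.where(condition, x, y) takes x where True and y where False
print(np.where(numpy_array > 3, "high", "low"))  # ['low' 'low' 'low' 'high' 'high']
```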

Random Number Generation:
- Generate random numbers or arrays:
```
np.random.rand(3, 3)  # 3x3 array of uniform random values in [0, 1)
```
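`np.random.rand` draws from NumPy's legacy global random state. For new code, NumPy's documentation recommends the `Generator` API created by `np.random.default_rng`, and passing a seed makes results reproducible. A sketch using an arbitrary seed of 42:

```python
import numpy as np

# A seeded Generator gives the same "random" values on every run
rng = np.random.default_rng(seed=42)
uniform = rng.random((3, 3))                     # uniform values in [0, 1)
normal = rng.normal(loc=0.0, scale=1.0, size=5)  # standard normal samples
dice = rng.integers(1, 7, size=10)               # integers 1..6 (upper bound exclusive)

# Re-creating the generator with the same seed repeats the sequence
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((3, 3))))  # True
```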

Reshaping Arrays:
- Change the shape of an array (the total number of elements must stay the same):
```
np.arange(6).reshape((2, 3))  # 6 elements -> 2 rows x 3 columns
```
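Passing `-1` for one dimension lets NumPy infer it from the element count, and `ravel` flattens an array back to 1-D. A sketch:

```python
import numpy as np

values = np.arange(12)            # [0, 1, ..., 11]

grid = values.reshape((3, -1))    # -1 is inferred: 12 / 3 = 4
print(grid.shape)                 # (3, 4)

column = values.reshape((-1, 1))  # a single column of 12 rows
print(column.shape)               # (12, 1)

print(grid.ravel().shape)         # (12,): flattened back to 1-D

# values.reshape((5, 3)) would raise ValueError: 15 slots but 12 elements
```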

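Missing Data Handling:
- Handle missing values (NaN) in arrays. NaN is a floating-point value, so the array needs a float dtype:
```
data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
np.isnan(data)  # Boolean mask marking missing values
data[~np.isnan(data)]  # Remove missing values
```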
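NaN propagates through ordinary reductions such as `np.mean`, so EDA code usually counts missing values first and then uses the NaN-aware variants (`np.nanmean`, `np.nanmax`, and so on). The sketch below also fills gaps with the mean of the observed values; that is just one simple imputation choice, used here for illustration:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

print(np.isnan(data).sum())  # 2 missing values
print(np.mean(data))         # nan: NaN propagates
print(np.nanmean(data))      # 3.0: mean of the non-missing values
print(np.nanmax(data))       # 5.0

# Replace each NaN with the mean of the observed values
filled = np.where(np.isnan(data), np.nanmean(data), data)
print(filled)                # [1. 3. 3. 3. 5.]
```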

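Statistical Tests:
- Conduct hypothesis tests on arrays with SciPy, a separate library that builds on NumPy:
```
from scipy import stats
array1 = np.array([2.1, 2.5, 2.8, 3.0])  # Illustrative sample A
array2 = np.array([3.4, 3.9, 4.1, 4.6])  # Illustrative sample B
t_stat, p_value = stats.ttest_ind(array1, array2)  # Independent two-sample t-test
```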
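With its default `equal_var=True`, `stats.ttest_ind` computes Student's pooled-variance t statistic. To show what that number means, here is a NumPy-only sketch of the statistic. The helper `pooled_t_statistic` is hypothetical and the samples are made-up values; the p-value still requires the t distribution from SciPy:

```python
import numpy as np

def pooled_t_statistic(a, b):
    """Hypothetical helper: Student's two-sample t statistic (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    var1, var2 = np.var(a, ddof=1), np.var(b, ddof=1)  # sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (np.mean(a) - np.mean(b)) / std_err

# Illustrative samples (made-up values)
group_a = np.array([2.1, 2.5, 2.8, 3.0, 2.6])
group_b = np.array([3.4, 3.9, 4.1, 4.6, 3.8])

t = pooled_t_statistic(group_a, group_b)
print(t)  # about -5.48; negative because group_a has the lower mean
```

If SciPy is installed, `stats.ttest_ind(group_a, group_b).statistic` should give the same value.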

Example

```python
import numpy as np
import urllib.request
import csv

# URL of the Titanic dataset
url = 'https://raw.githubusercontent.com/drshahizan/dataset/main/titanic/train.csv'

# Load the CSV into a 2-D array of strings (the first row is the header)
def load_titanic_data(url):
    with urllib.request.urlopen(url) as response:
        lines = [line.decode('utf-8') for line in response.readlines()]
    data = np.array(list(csv.reader(lines)))
    return data

# Load the Titanic dataset
titanic_data = load_titanic_data(url)

# Display the first 5 rows (header row included)
print(titanic_data[:5])
```
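The loaded array holds only strings, with the header in row 0, so numeric columns must be converted before computing statistics. The following is a minimal sketch: the helper `column_as_float` is hypothetical (not part of NumPy), and it is demonstrated on a tiny made-up array with the same header-first layout. It treats empty cells as missing (NaN). Assuming the header contains an `Age` column, as in the usual Titanic `train.csv`, `column_as_float(titanic_data, 'Age')` would apply it to the real data.

```python
import numpy as np

def column_as_float(data, column_name):
    """Hypothetical helper: return one column of a header-first string array as floats.

    Empty strings (missing CSV entries) become NaN.
    """
    header = list(data[0])
    idx = header.index(column_name)
    raw = data[1:, idx]
    return np.array([float(v) if v != "" else np.nan for v in raw])

# Tiny stand-in with the same layout as titanic_data (made-up values)
sample = np.array([["PassengerId", "Age", "Fare"],
                   ["1", "30", "10.0"],
                   ["2", "", "20.0"],
                   ["3", "40", "30.0"]])

ages = column_as_float(sample, "Age")
print(ages)                  # [30. nan 40.]
print(np.isnan(ages).sum())  # 1 missing value
print(np.nanmean(ages))      # 35.0
```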

NumPy is a powerful library for numerical operations and array manipulation, making it a valuable tool for various tasks in EDA, especially when dealing with numerical data or conducting statistical tests. It is often used alongside pandas for data analysis and manipulation in EDA workflows.

Contribution 🛠️

Please create an Issue for any improvements, suggestions or errors in the content.

You can also contact me on LinkedIn with any other queries or feedback.
