
🌟 Hit star button to save this repo in your profile

NumPy

NumPy is a fundamental Python library for numerical and array operations. In exploratory data analysis (EDA) it is often used alongside pandas, but it can also be applied directly to specific tasks. Here are some common NumPy operations and functions useful for EDA:

  1. Basic NumPy Operations:

    • Import NumPy:

      import numpy as np
  2. Creating NumPy Arrays:

    • Create a NumPy array from a Python list:

      numpy_array = np.array([1, 2, 3, 4, 5])
  3. Array Shape and Dimensions:

    • Get the shape and dimensions of a NumPy array:

      numpy_array.shape
      numpy_array.ndim
  4. Array Indexing and Slicing:

    • Access elements and slices of a NumPy array:

      numpy_array[2]  # Access element at index 2
      numpy_array[1:4]  # Slice from index 1 to 3 (the end index 4 is excluded)
  5. Array Operations:

    • Perform element-wise operations on arrays:

      numpy_array + 2  # Add 2 to each element
      numpy_array * 3  # Multiply each element by 3
  6. Array Aggregation:

    • Calculate statistics on arrays:

      np.mean(numpy_array)  # Mean
      np.median(numpy_array)  # Median
      np.std(numpy_array)  # Standard deviation (population by default; pass ddof=1 for the sample estimate)
  7. Array Concatenation and Stacking:

    • Combine multiple arrays (array1 and array2 stand for any two arrays with compatible shapes):

      np.concatenate([array1, array2])
      np.vstack([array1, array2])  # Vertically stack arrays
      np.hstack([array1, array2])  # Horizontally stack arrays
  8. Array Filtering:

    • Filter elements based on a condition:

      numpy_array[numpy_array > 3]
  9. Random Number Generation:

    • Generate random numbers or arrays:

      np.random.rand(3, 3)  # Generate a 3x3 array of uniform random values in [0, 1)
  10. Reshaping Arrays:

    • Change the shape of an array:

      np.arange(6).reshape((2, 3))  # Element count must match the new shape: 6 values -> 2 rows x 3 columns
  11. Missing Data Handling:

    • Handle missing values in arrays:

      numpy_array[np.isnan(numpy_array)]  # Select the missing (NaN) values
      numpy_array[~np.isnan(numpy_array)]  # Keep only the non-missing values
  12. Statistical Tests:

    • Run hypothesis tests on NumPy arrays with SciPy (NumPy itself does not provide them), for example an independent two-sample t-test:

      from scipy import stats
      t_stat, p_value = stats.ttest_ind(array1, array2)
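
Several snippets in the list above use `array1` and `array2` without defining them, and the reshape and missing-value items only work on inputs of a suitable size and dtype. The following is a minimal, self-contained sketch that ties those items together. The values and the names `joined`, `grid`, and `with_gaps` are made up for illustration and are not part of the original examples:

```python
import numpy as np

# Illustrative inputs; the values are made up for this sketch.
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

joined = np.concatenate([array1, array2])   # 1-D result, shape (6,)
stacked_v = np.vstack([array1, array2])     # 2-D result, shape (2, 3)
stacked_h = np.hstack([array1, array2])     # 1-D result for 1-D inputs, shape (6,)

# reshape requires the element count to match: 6 values -> 2 rows x 3 columns
grid = joined.reshape((2, 3))

# NaN can only be stored in floating-point arrays
with_gaps = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
missing_mask = np.isnan(with_gaps)          # True where a value is missing
clean = with_gaps[~missing_mask]            # keeps 1.0, 3.0, 5.0

# NaN-aware aggregation skips missing values instead of returning nan
mean_ignoring_nan = np.nanmean(with_gaps)
```

Note that `np.mean(with_gaps)` would return `nan` here, which is why the NaN-aware variants (`np.nanmean`, `np.nanmedian`, `np.nanstd`) tend to be the safer choice when a column may contain gaps.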

Example

   ```python
  import numpy as np
  import urllib.request
  import csv

  # URL of the Titanic dataset
  url = 'https://raw.githubusercontent.com/drshahizan/dataset/main/titanic/train.csv'

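  # Function to load the dataset
  def load_titanic_data(url):
      # Download the CSV and decode each line as UTF-8 text
      with urllib.request.urlopen(url) as response:
          lines = [l.decode('utf-8') for l in response.readlines()]
      # csv.reader handles quoted fields; every value stays a string
      data = np.array(list(csv.reader(lines)))
      return data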

  # Load the Titanic dataset
  titanic_data = load_titanic_data(url)

  # Display the first 5 rows (the header row comes first)
  print(titanic_data[:5])
  ```
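
The loader returns every field as a string, header row included, so a column has to be converted before any numerical summary is possible. As a rough sketch, empty fields can be mapped to NaN and then summarized with NaN-aware functions. The helper name `column_as_float` and the three sample rows are hypothetical, made up for illustration and not taken from the dataset:

```python
import numpy as np

def column_as_float(data, column_name):
    """Return one column of a 2-D string array (header in row 0) as floats.

    Empty strings become np.nan. Hypothetical helper for illustration.
    """
    header = data[0].tolist()
    col_index = header.index(column_name)
    raw = data[1:, col_index]
    return np.array([float(v) if v != '' else np.nan for v in raw])

# Made-up sample in the same layout the loader produces (all strings)
sample = np.array([
    ['PassengerId', 'Age', 'Fare'],
    ['1', '20', '7.5'],
    ['2', '', '70.0'],
    ['3', '30', '8.0'],
])

ages = column_as_float(sample, 'Age')
print(np.isnan(ages).sum())   # number of missing ages
print(np.nanmean(ages))       # mean age, ignoring missing values
```

With the real data this would be called as `column_as_float(titanic_data, 'Age')`, assuming the downloaded file has a column named `Age` in its header row.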

NumPy's fast array operations make it a valuable tool for EDA tasks that involve numerical data, and its arrays can be passed directly to SciPy for statistical tests. In practice it is usually used alongside pandas for data analysis and manipulation.

Contribution 🛠️

Please create an Issue for any improvements, suggestions or errors in the content.

You can also contact me on LinkedIn with any other queries or feedback.
