This Jupyter Notebook analyzes sugarcane production data using Python and popular data analysis libraries such as Pandas, Seaborn, and Matplotlib. The dataset used in this analysis is named "List of Countries by Sugarcane Production.csv".
- Python Libraries:
  - Pandas
  - Seaborn
  - Matplotlib
The dataset contains information about sugarcane production in various countries. It includes the following columns:
- Country: The name of the country.
- Continent: The continent where the country is located.
- Production(Tons): Total sugarcane production in tons.
- Production_per_person(Kg): Sugarcane production per person in kilograms.
- Acreage(Hectare): Total acreage of land used for sugarcane cultivation in hectares.
- Yield(Kg/Hectare): Yield of sugarcane in kilograms per hectare.
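The numeric columns listed above may arrive as text containing separators, so they have to be parsed before any statistics can be computed. The sketch below assumes a European-style number format ("." as the thousands separator, "," as the decimal mark); that format, the sample rows, and the `to_number` helper are hypothetical and invented for illustration, not taken from the actual file.

```python
import io

import pandas as pd

# Hypothetical sample in the column layout described above. The values are
# invented, and the "." thousands / "," decimal format is an assumption.
RAW_CSV = """Country,Continent,Production(Tons),Production_per_person(Kg),Acreage(Hectare),Yield(Kg/Hectare)
Country A,South America,"1.234.567","2.345,6","12.345","100.006,2"
Country B,Africa,"98.765","45,1",,
Country C,Asia,"5.000.000","1.000,0","50.000","100.000,0"
"""

NUMERIC_COLUMNS = [
    "Production(Tons)",
    "Production_per_person(Kg)",
    "Acreage(Hectare)",
    "Yield(Kg/Hectare)",
]


def to_number(series: pd.Series) -> pd.Series:
    """Hypothetical helper: strip separators and cast text to float."""
    cleaned = (
        series.str.replace(".", "", regex=False)  # drop "." thousands separators
        .str.replace(",", ".", regex=False)  # turn the "," decimal mark into "."
    )
    # Unparseable or missing entries become NaN; cast so every column is float64.
    return pd.to_numeric(cleaned, errors="coerce").astype("float64")


# Read everything as text first so no separator is misinterpreted on load.
df = pd.read_csv(io.StringIO(RAW_CSV), dtype=str)
for col in NUMERIC_COLUMNS:
    df[col] = to_number(df[col])

print(df.dtypes)
```

Reading with `dtype=str` keeps the conversion explicit and in one place. If the real file uses exactly this format, `pd.read_csv(..., thousands=".", decimal=",")` is a shorter alternative.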
- Data Cleaning: Removing unwanted characters (e.g., commas and dots) from the numeric columns, dropping irrelevant columns, and converting the columns to appropriate data types.
- Univariate Analysis: Examining each column separately, identifying outliers with box plots, and inspecting the data distribution.
- Bivariate Analysis: Exploring the relationship between pairs of columns. Scatter plots and bar plots are used to visualize these relationships.
- Correlation Analysis: Investigating the relationships between numerical variables. A heatmap of the correlation matrix is used to identify strong correlations.
- Analysis by Continent: Aggregating the data to understand it at the continent level.
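The analysis steps above map onto a handful of Pandas operations. The sketch below runs them on a small hypothetical DataFrame (invented values, placeholder country names) and computes the tables that the plots would visualize. The `iqr_outliers` helper is a hypothetical name, and its 1.5 × IQR cutoff is the conventional box-plot whisker rule, not something confirmed from the notebook.

```python
import pandas as pd

# Hypothetical, already-cleaned sample; all values are invented.
df = pd.DataFrame(
    {
        "Country": ["Country A", "Country B", "Country C", "Country D", "Country E"],
        "Continent": ["South America", "South America", "Africa", "Africa", "Asia"],
        "Production(Tons)": [500_000.0, 40_000.0, 30_000.0, 20_000.0, 60_000.0],
        "Production_per_person(Kg)": [2_400.0, 900.0, 150.0, 100.0, 50.0],
        "Acreage(Hectare)": [7_000.0, 500.0, 600.0, 400.0, 900.0],
        "Yield(Kg/Hectare)": [71_000.0, 80_000.0, 50_000.0, 50_000.0, 66_000.0],
    }
)
numeric = df.select_dtypes("number")

# Univariate: count, mean, std, quartiles, etc. for every numeric column.
summary = numeric.describe()


def iqr_outliers(s: pd.Series) -> pd.Series:
    """Flag values beyond 1.5 * IQR from the quartiles (box-plot whisker rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)


outliers = df.loc[iqr_outliers(df["Production(Tons)"]), "Country"].tolist()

# Bivariate / correlation: pairwise Pearson correlation of the numeric columns.
corr = numeric.corr()

# Continent level: total production per continent, largest first.
by_continent = (
    df.groupby("Continent")["Production(Tons)"].sum().sort_values(ascending=False)
)

print(summary, outliers, corr, by_continent, sep="\n\n")
```

These frames can be passed straight to Seaborn, for example `sns.boxplot(data=df, x="Production(Tons)")`, `sns.scatterplot(data=df, x="Acreage(Hectare)", y="Production(Tons)")`, or `sns.heatmap(corr, annot=True)`. The exact plot calls in the notebook may differ.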
- Africa produces the most sugarcane in tons.
- South America is the continent with the highest total sugarcane production.
- Brazil produces the most sugarcane of all countries.
- Brazil has the largest acreage (hectares) under sugarcane cultivation.
- Guatemala has the highest yield (kg/hectare).
- Production per person is highest in Paraguay.
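Each finding above is an "argmax" over one column, either per country or after summing per continent. The sketch below shows that pattern on a hypothetical three-row sample; the values and country names are invented, so its winners do not reproduce the findings listed above, and the `leader` helper is a hypothetical name.

```python
import pandas as pd

# Hypothetical, already-cleaned sample; values are invented and are not the
# dataset's figures.
df = pd.DataFrame(
    {
        "Country": ["Country A", "Country B", "Country C"],
        "Continent": ["South America", "Africa", "Asia"],
        "Production(Tons)": [900.0, 300.0, 400.0],
        "Production_per_person(Kg)": [50.0, 80.0, 10.0],
        "Acreage(Hectare)": [12.0, 6.0, 5.0],
        "Yield(Kg/Hectare)": [75_000.0, 50_000.0, 80_000.0],
    }
)


def leader(frame: pd.DataFrame, column: str) -> str:
    """Return the country with the largest value in `column` (first one on ties)."""
    return frame.loc[frame[column].idxmax(), "Country"]


# Continent with the highest total production.
top_continent = df.groupby("Continent")["Production(Tons)"].sum().idxmax()

# Country leading each metric.
leaders = {
    col: leader(df, col)
    for col in [
        "Production(Tons)",
        "Production_per_person(Kg)",
        "Acreage(Hectare)",
        "Yield(Kg/Hectare)",
    ]
}

print(top_continent)
print(leaders)
```

Run on the cleaned full dataset, the same calls would give the continent- and country-level leaders behind each bullet.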