According to research by Brazil’s National Institute for Space Research (INPE), satellite data revealed an 84 percent increase in Amazon forest fires in 2019 compared to 2018. Most of the fires are caused by regional deforestation, whose rate spiked in 2019, driving the devastating fire outbreaks in August that destroyed part of one of the most important carbon storehouses left on the planet. Knowing where deforestation and human encroachment on forests are occurring is key to responding quickly and curbing further damage to the ecosystem.
Previously, tracking efforts relied largely on coarse-resolution imagery from Landsat (30-meter pixels). However, advances in satellite imagery and machine learning (ML) have brought us closer to detecting small-scale deforestation and distinguishing human from natural causes of degradation. These advances make it possible to track changes in the Amazon rainforest more accurately and to focus government efforts on the most vulnerable areas. They also make it possible to keep a log of the condition of a particular geographic location and measure the effects of conservation or encroachment.
Planet, designer and builder of the world’s largest constellation of Earth-imaging satellites, has a labelled dataset of land surfaces at 3-5 meter resolution, and we aim to leverage modern deep learning techniques to identify the activities visible in the images. We treat this as a multi-label classification problem: each satellite image chip is labelled with one or more of 17 labels that indicate atmospheric conditions, land cover, and land use. We used TensorFlow, Keras and scikit-learn to develop the CNN models and ran them on Google Cloud Platform. There are 8 Jupyter notebooks in this repository, each corresponding to a specific model or function that we developed for this competition.
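To make the multi-label framing concrete, here is a minimal sketch of how a chip's labels can be turned into a multi-hot target vector and how per-label probabilities can be turned back into labels. The label names below are an illustrative subset, and the space-separated tag format, the helper names `encode_tags` and `decode_probs`, and the 0.5 threshold are assumptions for this example rather than details taken from the notebooks.

```python
# Illustrative subset of labels; the real task uses 17 (names here are assumed).
LABELS = ["agriculture", "clear", "cloudy", "haze", "primary", "road", "water"]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}


def encode_tags(tags):
    """Turn a space-separated tag string into a multi-hot vector."""
    vector = [0] * len(LABELS)
    for tag in tags.split():
        vector[LABEL_INDEX[tag]] = 1
    return vector


def decode_probs(probs, threshold=0.5):
    """Return the labels whose predicted probability meets the threshold."""
    return [name for name, p in zip(LABELS, probs) if p >= threshold]


# A chip can carry several labels at once.
target = encode_tags("clear primary water")
predicted = decode_probs([0.9, 0.1, 0.0, 0.6, 0.8, 0.2, 0.4])
```

Because each label is predicted independently, a network for this kind of target typically ends in one sigmoid unit per label rather than a single softmax over all labels.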
For the complete walkthrough of the project, you can read the blog post that I co-authored with my teammates (Aishwarya Pawar, Ananya Garg, Sachin Balakrishnan and Kachi Ugo).
Source: Kaggle Competition
- The data for this competition consists of 40,479 training samples and 61,192 test samples from satellite imagery.
- Each image is of size (256, 256, 3), with the channels representing R, G, B. Each pixel corresponds to 3.7 meters on the ground.
- The data was also provided in 4-channel TIF format, with the fourth channel being infrared.
We implemented multiple deep neural networks along with domain-specific pre-processing (haze removal) to achieve an F2 score of 0.9257, landing us in the top 20% of the Kaggle leaderboard.
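For readers unfamiliar with the metric, the F2 score is the F-beta score with beta = 2, which weights recall more heavily than precision. The sketch below computes it for multi-hot label vectors and averages it over images; the per-image averaging and the function names `fbeta` and `mean_f2` are assumptions for illustration, not a copy of our evaluation code.

```python
def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score for one image, given multi-hot true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        # No correct positive predictions: score is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_f2(true_rows, pred_rows):
    """Average the per-image F2 score over a batch of images."""
    scores = [fbeta(t, p, beta=2.0) for t, p in zip(true_rows, pred_rows)]
    return sum(scores) / len(scores)


# Over-predicting (one false positive) costs less than under-predicting
# (one false negative), because beta = 2 favors recall.
over = fbeta([1, 1, 0], [1, 1, 1])
under = fbeta([1, 1, 0], [1, 0, 0])
```

In this example `over` is larger than `under`, which is why lowering the decision threshold for some labels can raise F2 even though it adds false positives.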
Since the dataset was highly imbalanced, we added further pre-processing and developed an additional model trained exclusively on the rare labels, which we stacked on top of the base model. This approach considerably improved our model’s ability to detect and predict the rare labels, as shown by the lift in precision and F1-scores for those labels.
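One simple way to picture this two-model setup is sketched below: find the labels that occur in only a small share of images, then let the specialist model's probabilities replace the base model's for those labels. This is a simplified illustration under stated assumptions; the 5% cutoff, the override rule, and the helper names `find_rare_labels` and `combine_predictions` are hypothetical and may differ from what the notebooks actually do.

```python
from collections import Counter


def find_rare_labels(tag_strings, min_share=0.05):
    """Return labels that appear in fewer than min_share of the images."""
    counts = Counter(tag for tags in tag_strings for tag in tags.split())
    total = len(tag_strings)
    return sorted(label for label, n in counts.items() if n / total < min_share)


def combine_predictions(base_probs, rare_probs, rare_labels):
    """Use the specialist's probability for rare labels, the base model's otherwise."""
    combined = dict(base_probs)
    for label in rare_labels:
        if label in rare_probs:
            combined[label] = rare_probs[label]
    return combined


# Toy training tags (assumed format: space-separated labels per image).
tags = ["clear primary"] * 98 + ["clear primary slash_burn", "haze primary blooming"]
rare = find_rare_labels(tags)

# Hypothetical per-label probabilities for one image from each model.
base = {"primary": 0.97, "slash_burn": 0.10}
specialist = {"slash_burn": 0.80}
final = combine_predictions(base, specialist, rare)
```

Other combination rules, such as averaging the two probabilities or feeding both into a second-stage classifier, fit the same structure.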