Capture Matching Function - Possible AI supported outputs #157

Open
Davidezrajay opened this issue Sep 14, 2023 · 3 comments
Davidezrajay commented Sep 14, 2023


Challenge Brief:

Captures (images of trees, along with other data) are collected with smartphone apps and used to verify environmental work. Users often return to the same plants/trees over time, resulting in "layers of images" of the same tree at the same location. These images must be linked together.

Current 'capture matching' is done with a front-end React web app backed by a RESTful API. This 'capture matching administration panel' is how users match captures manually: the interface is supplied with images from an API and displays them based on GPS coordinates/distances and other filters that the user sets, including time range and organization. If several matches are found, the capture matching system displays the GPS-related images in order of distance, and an operator then matches the captures. The process is slow, has room for improvement in accuracy, and requires a level of automation at scale.
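
To make the current behavior concrete, below is a minimal sketch of the distance-plus-filters ranking the panel performs. The field names (`lat`, `lon`, `organization_id`, `captured_at`), the 50 m cut-off, and the function names are illustrative assumptions, not the actual Greenstand API schema.

```python
# Minimal sketch of distance-based candidate ranking, as the admin panel does.
# Field names and the distance cut-off are illustrative assumptions only; they
# do not necessarily match the actual Greenstand API schema.
from dataclasses import dataclass
from datetime import datetime
from math import radians, sin, cos, asin, sqrt


@dataclass
class Capture:
    id: str
    lat: float
    lon: float
    organization_id: str
    captured_at: datetime


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS fixes."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))


def rank_candidates(new, existing, max_distance_m=50, org=None, start=None, end=None):
    """Return prior captures near `new`, closest first, after user-set filters."""
    candidates = []
    for c in existing:
        if org is not None and c.organization_id != org:
            continue
        if start is not None and c.captured_at < start:
            continue
        if end is not None and c.captured_at > end:
            continue
        d = haversine_m(new.lat, new.lon, c.lat, c.lon)
        if d <= max_distance_m:
            candidates.append((d, c))
    return [c for d, c in sorted(candidates, key=lambda dc: dc[0])]
```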

Besides being prone to error, the current operation doesn't account for data related to identified species, leaf morphology, trees that have already been capture-matched, track files, tracking seasons, and other attributes.

The goal of this ticket is to identify and test methods and solutions that augment, verify, or replace the current capture matching process.

How to contribute

Just go for it and solve it.

If you get stuck, you can ask questions on this ticket, via Greenstand's Slack, or by emailing the ticket contact below. (Join Slack and introduce yourself in the community_intro channel; from there you will be invited to the project-specific channels.)

Note: If you believe there is insufficient information or infrastructure provided to solve this critical issue, please reach out.

Deliverable:

  1. Any integration improvement to the Greenstand stack, submitted as a pull request to the appropriate Greenstand repository, delivered as a script/Airflow function that feeds the user interfaces.
  2. Any integration improvement to the Greenstand stack, delivered as a service based on current API functions that "pre-tags" captures.
  3. Open and lead an ADR with recommendations to change or improve a process, data collection, etc. Note: viable data collection recommendations cannot increase the workload for users or apps.

Full Challenge Narrative:

The underlying value proposition of the Greenstand Token Model is the ability for individuals to earn and trade tokens linked to work surrounding ecological restoration, which is often based on the growth of plants or trees over time. Identifying repeated captures/visits to individual trees is critical to the success of many projects using the model, which encourages the re-tracking of trees to document maintenance, tree health, and growth over time as a means of employment, poverty alleviation, and ground verification for successfully implemented carbon and reforestation projects.

Solving this challenge will:

  • Drive more community-based engagement in carbon offset projects
  • Support the identification of successful and unsuccessful restoration methods
  • Increase tree survival rates (most tree planting is a plant-and-forget model)
  • Identify duplicated images and scammers
  • Add value to the Greenstand Token Model

Each capture contains a geotagged image collected from a mobile app. It enters the Greenstand system and is tagged with various attributes (such as species) by a number of different microservices and manual operators. The first capture of a tree is unique to its location and context. However, a re-tracked tree creates data points that are similar to the initial capture.

Users tend to double-track trees, intentionally or unintentionally, within single tracking sessions or at later dates, and multiple users overlap their tracking at different times, especially in larger operations (hundreds or thousands of trees).

GPS inaccuracy is an issue. Most user phones are cheap models with limited ability to pinpoint locations, and many trees are collected within the "area of GPS error." GPS data alone is not accurate enough to match the images: trees are often planted a meter or less apart, while GPS accuracy is often 10 meters or more.
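
A back-of-envelope calculation using only the numbers above (roughly 10 m GPS error, trees about a meter apart) shows the scale of the ambiguity:

```python
# Illustration of why GPS alone cannot disambiguate matches, using the figures
# from the paragraph above (10 m error radius, ~1 m tree spacing).
from math import pi

gps_error_radius_m = 10.0   # typical error on low-cost phones (per the text)
tree_spacing_m = 1.0        # trees are often planted a meter or less apart

error_area_m2 = pi * gps_error_radius_m ** 2   # ~314 m^2 of positional uncertainty
trees_per_m2 = 1.0 / tree_spacing_m ** 2       # ~1 tree per m^2 at 1 m spacing
candidate_trees = error_area_m2 * trees_per_m2

print(f"~{candidate_trees:.0f} trees could plausibly fall inside one GPS error circle")
# With hundreds of plausible candidates per fix, GPS must be combined with other signals.
```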

Related operational issues

  1. Trees die and are often replaced in the same geolocation.
  2. Users and tree growers are incentivized to take duplicate images of trees, and some have tried to scam the system by taking multiple images of the same plant from various directions.
  3. User- and phone-specific data is not considered, even though the same user or different users return to the same cluster of trees at undefined times.
  4. The GPS accuracy radius overlaps multiple possible trees.
  5. Physical tree tags and RFID tags have been tried and ruled out as not scalable for our users.

Solutions:

A single solution is not expected to solve this challenge 100%; rather, a solution is expected to be built from many incremental improvements and tools added to the process from different sources.

Possible solutions:

GPS coordinate accuracy enhancement using filtering algorithms; object recognition coupled with GPS to link trees across the maintenance period; ML image verification.

  1. "pre match" as many captures as possible before showing them to users.
    and put in place a machine learning process that will
  2. Create algorithms that automatically match the captures.
  3. Utilize other layers of data - track files, species data.
  4. Scrub data priory to evaluation (adjusting inaccurate GPS data)
  5. Statistically match based on total number of possibilities.
  6. Use images based attributes to match. (such as background rocks and unique environmental attributes)
  7. Redesign the UX of the capture matching tool in the admin panel
  8. Enhancements to GPS accuracy (see issue)
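
As one illustration of item 1, the sketch below "pre-tags" a capture by taking GPS-filtered candidates (for example, from the ranking sketch earlier) and re-ranking them by image similarity, suggesting a match only when one candidate clearly stands out. The embedding store, the `pre_tag` name, and the 0.85 threshold are hypothetical placeholders, not existing Greenstand services; any pretrained feature extractor could supply the embeddings.

```python
# Sketch of a "pre-tag" pass: narrow candidates by GPS first, then re-rank by
# image similarity. The embedding source and threshold are illustrative only.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def pre_tag(new_capture, new_embedding, gps_candidates, embeddings, sim_threshold=0.85):
    """Attach a suggested match to a capture if one candidate clearly stands out.

    gps_candidates: captures already filtered and ordered by distance (see earlier sketch)
    embeddings:     dict mapping capture id -> precomputed image embedding (hypothetical store)
    """
    scored = []
    for cand in gps_candidates:
        emb = embeddings.get(cand.id)
        if emb is None:
            continue
        scored.append((cosine_similarity(new_embedding, emb), cand))
    scored.sort(key=lambda sc: sc[0], reverse=True)

    if scored and scored[0][0] >= sim_threshold:
        best_score, best = scored[0]
        # A "pre-tag" only suggests a match; an operator still confirms it.
        return {"capture_id": new_capture.id,
                "suggested_match_id": best.id,
                "similarity": best_score}
    return None
```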

Supporting ideas include:

  • Identifying and matching image backgrounds.
  • Using species diversification to limit the options shown to the admin operator.
  • Using leaf morphology to limit the options shown to the admin operator.
  • Users tend to travel in relatively predictable paths (see the sketch after this list).
  • Each tree is unique.
  • Many projects have multiple updates of individual trees.
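
The "predictable paths" idea could be folded into candidate scoring. The sketch below is one hedged way to do so: it penalizes candidate trees that lie farther from the user's previous fix than a walker could plausibly have traveled in the elapsed time. The walking-speed constant and the penalty weight are assumptions for illustration only.

```python
# Sketch of track-aware scoring: prefer candidates consistent with the user's
# movement along the track. Constants are illustrative assumptions.
def track_aware_score(distance_m, seconds_since_previous_fix, walking_speed_mps=1.4):
    """Lower is better.

    distance_m: GPS distance between the candidate tree and the user's previous fix
    walking_speed_mps: assumed average walking speed (~5 km/h)
    """
    plausible_radius = walking_speed_mps * max(seconds_since_previous_fix, 1.0)
    # Candidates beyond the plausible travel radius receive a growing penalty.
    overshoot = max(0.0, distance_m - plausible_radius)
    return distance_m + 2.0 * overshoot
```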

Barriers to completion:

  • User privacy issues limit access to some data.
  • Lack of curated or accurately matched data sets.
  • Testing solutions may require setting up a number of Greenstand's microservices.
  • Limited feedback on solutions due to limited organizational capacity.
  • Limited organizational capacity to quickly review and integrate new solutions into the full stack.
  • It can be challenging to visualize results (Greenstand admin tools can be quite helpful for this and can be set up independently; however, production services are not set up for open access with real data).

Resources:

Links

Data Resources

Suggested data sets

There are data sets of trees with repeat captures, marked with sticks painted with colored stripes (in Haiti), which can provide an extra layer of support for creating curated sets.

The Freetown City data has been mostly manually matched (although there has not been much quality control on that data set).

Greenstand respects our users' privacy. For further data needs, please contact the issue lead, articulate why you need the data, and be prepared to provide a government-issued ID and sign a legally binding data privacy policy.

Related Projects/tools:

Related Issues:
Greenstand/treetracker-admin-client#568
Greenstand/treetracker-admin-client#1029
Greenstand/treetracker-admin-client#949
Greenstand/treetracker-admin-client#781
https://github.com/Greenstand/Greenstand-Overview/issues/54
https://github.com/Greenstand/Greenstand-Overview/issues/52
https://github.com/Greenstand/treetracker-android#197
#75

Contacts on this issue

Primary: Xinyi Hu [email protected]

Secondary: Info (at) Greenstand.org

To do:

  • Add data download file
  • Add track files
@ahs0katan0

A couple of questions from the ML team.

  • Matching on the background requires the images to be taken from roughly the same angle each time to capture the same background, which is a requirement that is hard to enforce. As noted in the ticket, backgrounds can be noisy, making image identification harder.

One approach @shubhomb pointed out: if humans match the captures at large scale (to create training data), we may be able to use that dataset to train an automated algorithm. In the absence of that data, the algorithm can only generate a defined list of recommended matches.

  • I see the other issues mentioned in the description are closed; I'm interested in this one in particular: #54. Does it show the probability of duplicate images based on timestamp and location sequence? If that exists, it holds more promise, as those images will be grouped by tree >> user.

@ahs0katan0

Update from Ezra: the duplicate issue has never been solved, only the release of the admin panel (duplicate detection and capture matching are the same thing in his mind).
The trained-set idea is viable, and we have some data for it from Freetown and from Haiti (although our experience last time was that a trained professional is required).

@ahs0katan0

Further discussion with @shubhomb indicated that more data is needed to validate the system that will be created. Shubhom examined the linked CSV and is exploring creating a simulator to approximate planter movement. However, his assessment is that the data may not be adequate.
