Crowd-sourcing and Open Street Maps #41
-
The main benefit of crowd-sourced data is the ability to collect A LOT of data from around the world very easily (and likely cost-effectively). However, these datasets will likely include biases and human-sourced errors that could complicate open science efforts and call into question the validity of some aspects of the data. The paper/wiki page also mentioned that crowd-sourced data usually attracts people who are very interested in the research topics and have the means to contribute to data collection, but often leaves gaps in large parts of the world that may not have the technological capability or the interest to contribute. This means that some areas of the world will be very much underrepresented in the dataset. As a data scientist, if I'm pulling from crowd-sourced data, I would look into what validation techniques were used on each entry, and also take into account the areas/populations that are not well represented, so as not to over-interpret or over-generalize any results across those underrepresented areas and/or populations.
-
Crowdsourced data can be very valuable because it covers a variety of locations. For example, during the solar eclipse this past spring, people along the path of totality could record their observations in real time across all of North America. One issue that arises, however, is the way the data is collected. If everyone is providing data on temperature, it would be helpful to know what device, units, etc. they are using so that those measurements can be compared directly. For data scientists, I would imagine the metadata about what is being collected would be very important. In addition, some types of measurements, especially water quality, depend on the protocol used. Ideally, training would be provided to the crowd collecting the data; otherwise, comparisons might be difficult.
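To make the point about units and metadata concrete, here is a minimal Python sketch of normalizing temperature reports before comparing them. The `Reading` record, its fields, the unit codes, and the sample values are all hypothetical and do not come from any real crowdsourcing platform.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One crowd-submitted temperature report (hypothetical schema)."""
    value: float
    unit: str    # assumed unit codes: "C", "F", or "K"
    device: str  # free-text device description kept as metadata


def to_celsius(reading: Reading) -> float:
    """Convert a reading to degrees Celsius based on its declared unit."""
    unit = reading.unit.strip().upper()
    if unit == "C":
        return reading.value
    if unit == "F":
        return (reading.value - 32.0) * 5.0 / 9.0
    if unit == "K":
        return reading.value - 273.15
    # Refuse to guess: a reading without usable unit metadata is not comparable.
    raise ValueError(f"unknown unit: {reading.unit!r}")


# Made-up reports of the same temperature from three different contributors.
readings = [
    Reading(68.0, "F", "phone app"),
    Reading(20.0, "C", "backyard station"),
    Reading(293.15, "K", "lab thermometer"),
]
celsius = [round(to_celsius(r), 2) for r in readings]
```

Without the `unit` field, the three reports above would look wildly different even though they describe the same temperature, which is why the metadata matters as much as the measurement.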
-
There seem to be a lot of benefits to crowd-sourced data. The data could be more up to date, and crowd-sourcing could increase the amount of data received from groups that have historically been excluded from data collection. Crowd-sourced data could also highlight things researchers hadn't thought about before. However, I could see drawbacks relating to privacy, ethics, and education. Maybe people don't know they're sharing their data or don't know what their data will be used for. How are the people gathering crowd-sourced data compensated or credited? Additionally, if the people gathering the data haven't been trained in data collection, there could be questions about whether the data is of good quality. When using crowd-sourced data, I think it would be important to consider:
-
Crowd-sourced data can benefit academic or government research, provided its potential contributions and limitations are recognized. Across organizations there are various approaches to data management, and one prominent approach comprises a set of principles known as FAIR (Findable, Accessible, Interoperable, Reusable) data. These stewardship principles are designed to support cross-organization collaboration and data longevity. With regard to the use of crowd-sourced data, data interoperability is essential because it requires shared expectations for the meaning of the data collected. Citizen scientists who participate in crowd-sourcing efforts should be educated on the purpose of the data collection and best practices for data capture relevant to the domain of interest. Data obtained without a similar framework should be further evaluated to understand its validity and scope of application. A notable platform for crowdsourcing geospatial data is ISeeChange, which connects citizens with government agencies, organizations, and researchers to track local climate change impacts. Individuals can report qualitative and quantitative observations and participate in community partner initiatives.
-
It is super exciting that crowd-sourced data is as widespread as it is. It certainly has the potential to be highly impactful and could greatly increase the amount of data collected from all over the world. As the article noted, one drawback is that some sectors, including public health and remote sensing, still lack crowd-sourced data. It seems like there are still some barriers in place that prevent crowd-sourcing from being a data source for all areas of science. Additionally, the seven specific challenges identified by the researchers in “Crowdsourcing Geospatial Data for Earth and Human Observations: A Review” included ensuring data quality and accuracy; protecting data privacy; training and educating non-experts; sustaining data collection; navigating legal and ethical issues; and interpreting data.
-
There are many challenges in using crowd-sourced data for analysis. A researcher might not be able to fully trust that the population sampled will provide honest information (especially in response to culturally sensitive topics, such as election exit polls). Additionally, if data are sourced from members of the public with varying degrees of background knowledge in the related field of research, it becomes difficult to compare these data to each other. There is also the potential for data bias to influence trends: those with an interest in the research topic tend to be more likely to contribute to a project than individuals with little to no interest in it. On the other hand, crowd-sourcing data is a cost-effective way for researchers to build databases. Social media has helped to generate a wealth of crowd-sourced data available to researchers for little to no cost, and it can be applied to various scenarios. This, of course, assumes those data are reliable and accurately reflect the communities being researched. For data scientists, especially in natural resource work, quality control and assurance measures are critical procedural steps. If crowd-sourced data come from the public (citizen scientists, for example), it is important to establish a baseline standard for topic expertise. Were all citizen scientists vetted, or did they go through identical training to ensure certain baseline standards are met? If so, the collected data are at least somewhat comparable to one another.
-
I started a different discussion thread because I did not see how to reply in this one. In summary, crowdsourced data can be OK when conflated to known locations based on uncorrected GPS. lauren-alexandra, thanks for the iseedata link. A similar example of local government crowdsourcing is the City of Boulder's Inquire, which uses GovernmentOutreach.com (example: https://user.govoutreach.com/boulder/faq.php?cmd=shell&goparms=classificationId%3D23682)
-
Crowdsourced data is very useful for getting a large volume of data at relatively little cost, but the data are not always of the highest quality, because they are not standardized and can be contributed by anyone. However, with a large volume of data, outliers and faulty data can be filtered out. The SciTechDaily article talked about using machine learning to convert crowdsourced data into usable products. Machine learning is going to prove to be an invaluable resource for managing big data, as it allows anyone who has access to data and an ML model to use open source crowd data to create products in a GIS. In my opinion, this will be a great thing for open science, as long as the ML models are used responsibly and do not generate false data. Crowdsourcing and open data also have their drawbacks, though. Because ANYONE can make a product from freely available data, there will inevitably be products that do not draw any meaningful conclusions or are of low quality (see garbage-in-garbage-out).
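As a rough illustration of filtering outliers from a large batch of submissions, here is a minimal Python sketch using the median absolute deviation (MAD). This is one common heuristic that I am swapping in for illustration, not the method from the article; the function name, the 3.5 cutoff, and the sample values are assumptions.

```python
from statistics import median


def filter_outliers(values, threshold=3.5):
    """Keep values whose modified z-score is within the threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by a few wild submissions than the mean and standard
    deviation would be. The 3.5 cutoff is a conventional rule of thumb.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to judge against, so keep everything.
        return list(values)
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]


# Made-up temperature submissions with one obviously faulty entry.
submitted = [21.0, 20.5, 21.3, 20.8, 95.0, 21.1]
cleaned = filter_outliers(submitted)
```

A filter like this only catches values that disagree with the crowd. It does nothing for errors that most contributors share, so it complements standardization rather than replacing it.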
-
Crowd-sourced data has drastically changed the data landscape in many beneficial ways; it provides real-time, community-driven data that ordinary people can contribute to (which is not the case with authoritative data sources). While this democratization of data and the more diverse types of data available have led to richer, multifaceted insights and shifts across industries, this comes with drawbacks. As a data scientist, it is important to account for the potential drawbacks when using crowd-sourced data. Data quality and accuracy, data privacy, potential lack of sustainable data, legal and ethical issues, data biases, and data interpretation are all things to consider. There are possible ways to account for these considerations, such as assessing and mitigating data biases, executing quality assessments, and having stringent agreements with third parties and clear, explicit consent from volunteers. It is essential to know your source in order to properly account for the corresponding drawbacks. Crowd-sourced data, like ‘open science’, aims to be open for everyone to contribute to or access regardless of expertise, demographics, etc. Crowd-sourced data can add new insights to open science through the expanded data sources; however, it would likely be difficult to reproduce or replicate crowdsourced data, which would complicate open science efforts toward replication and reproducibility. Although its challenges make crowd-sourced data more complex to use, its potential should not be underestimated.
-
Hello! I read the article and took a look at the wiki. I also glanced at the review article, which looks fascinating!
In terms of crowdsourcing, there are a number of clear pros and cons. On the pro side, this greatly increases the quantity of data that can be gathered. In the past, most data would have been gathered by professionals going out into the field and actively recording observations, but now we have millions of people gathering data of their own free will or carrying around machines that automatically collect it. This is a vast new resource of available data. On the con side, a lot of this data is being gathered by non-professionals. This has the potential to introduce all kinds of bias, inaccuracy, and subjectivity.
As data scientists, we should at the very least be capable of making the distinction between data gathered by professionals and non-professionals, or data that's "official" versus "unofficial". Of course, professionals can make mistakes too, but at least in that case there can be some expectation of standards in terms of accuracy and removal of bias. That's also not to say that some "amateurs" might not have very good data; it's a question of being able to evaluate the standards, accuracy, and bias to some degree.
I'll admit, I hadn't heard the term "open science" before reading it in Elsa's assignment post. I looked it up on Wikipedia (shame on me... but you've gotta start somewhere!) and from what I read there, it refers to the movement to make science more accessible to the public, both in terms of making publications more accessible and in terms of citizens and members of the public undertaking their own scientific endeavors. I see a lot of the same pros and cons here: a lot more science potentially being done, but being done by non-professionals. In terms of the relation to crowd-sourced data, it seems like the kind of thing where public organizations can be both gathering their own data and conducting their own research. That amps up the power of the science, and also potentially puts a powerful tool in the hands of ... whatever public organization wants to use it. I could see both great things coming of this, and also not so great things.
-
The main draw of crowd-sourced data is the ability to generate a large volume of data for little cost. Researchers don't have to account for the cost of the devices used to collect data or wages for the data collectors. In addition, having more people available to collect data can give the data a wide range of coverage in time and space. On the flip side, so much data is being generated that much of it is never analyzed. When the public collects data casually, the dataset becomes biased toward data that is easy to collect. An example of this is iNaturalist, which crowd-sources biodiversity data around the globe. The most commonly observed species in the database is the Mallard Duck, which is not the most common species in the world but is easily accessible in the public's community retention ponds. A data scientist needs to be mindful of the limitations of the data source, such as how the data is verified. Is it reviewed by experts before fully joining the database, or are the reviewers other public contributors? They should also seek to understand how the data is collected, who is collecting it, and how it might need to be processed before it can be analyzed. Crowd-sourced data can contribute to open science by fostering the public's interest in science. By providing ways for the public to engage in the scientific process with low barriers to entry, individuals may find that they wish to learn more about the subject and participate at a higher level. I believe it can provide a good place for exploratory analysis: finding areas or trends that appear strange and inspiring more formal data collection.
-
I largely agree with what has been discussed earlier in this thread. I think there are two additional benefits of crowd-sourced data that have yet to be discussed. First, because data collection is often expensive, data is often collected only once or on an irregular basis. Crowd-sourced data, especially if there are no sustainability problems, makes it easier to do longitudinal studies. This is very helpful when looking at the impacts of climate change, as well as when examining the progress of certain location-based solutions. Having access to data over time and at regular intervals will let researchers and land managers more quickly understand what is happening and whether certain mitigation measures are working. Second, crowd-sourcing opens up unlimited collaborative opportunities. For example, if I wanted to get data about the interior of a remote National Park, I could recruit the rangers who patrol the area to gather the data. This limited 'crowd' could provide high-quality data that I could, in turn, share with them once it has been processed. Collaborative science means there are virtually unlimited opportunities for individuals and communities to work together.
-
Speaking in terms of Tribal interests and data sovereignty, there are many great advances here, but first come the issues. Both articles bring up only red flags that are rather alarming. First, Indian Country needs all of this. Second, they don't trust you! With the processes mentioned in the articles, it looks like mega exploitation of tribal lands by non-tribal people. You would think it would be beneficial, but it's really the opposite. I believe more tribal lands have been lost to maps than to war in the past. Here is an example: if you create a great public road network using such methods, you also provide pathways for exploiters. What will they exploit? Access to a tribal member's favorite berry bush, medicinal plants and herbs, fossils, off-road 4-wheeling, water, and the list goes on. But they need this crowdsourcing and OpenStreetMap technology! It is just not necessarily for the public, and they have too few options to move this forward themselves. And don't forget, they don't trust you! ;) More later in the class discussion. Great topic!
-
Crowd-sourced data has many benefits, including greater spatial resolution and a larger number of observations. Further, it is typically more recent or has greater time resolution. The main drawback that concerns me is the systematic quality of the data. Random error or random biases can be averaged out by the greater collection frequency. However, systematic biases (whether in the device or the collector) are much more difficult to understand and correct. As a data scientist using crowd-sourced data, I would be most concerned with the systematic biases in the dataset, which require sophisticated cleaning. For example, if I wanted to know whether there is a "safe" place to run in Boulder on a high smoke day, I might look at the PurpleAir network. However, in doing so one realizes that while most sensors report a large particulate matter value, some report zero. It is more likely that a sensor will report false "clean" air than false "very dirty" air, because the sensor is left indoors or the inlet is blocked. I would need a granular view of the data to spot this issue. In the end, the dataset will likely need to be cleaned, which likely requires specialist knowledge.
Crowd-sourced data, overall, improves open science in a few ways. It generates data that is usually more accessible to the public in useful volumes. It also shifts ownership of the data from academia or businesses to people. One complication that worries me is the interpretation phase, that is, non-specialists interpreting specialized and complex datasets. Considering that misinformation can spread rapidly and easily, abundant datasets combined with poor or malicious interpretation can be used to push narratives. Once something becomes a "study", it is not necessarily scrutinized by policy-makers. One argument for institutional datasets is that they are often public and require collaboration with the expert. For example, NASA, NOAA, EPA, etc. generate earth science datasets that are high-quality. When the data is specialized, they require the user to discuss the use with the data collector before it is used in a public-facing way. This helps prevent misinterpretation or malicious use.
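The smoke-day example can be sketched in Python as a simple screening step: when the network as a whole reads smoky, flag sensors that report near-zero values. This is a sketch under stated assumptions, not PurpleAir's API or an established cleaning procedure; the function name, both thresholds, and the sensor readings are hypothetical.

```python
from statistics import median


def flag_suspect_sensors(pm25_by_sensor, min_fraction=0.1, smoky_level=35.0):
    """Return ids of sensors reading implausibly low on a smoky day.

    pm25_by_sensor maps a sensor id to its particulate reading.
    smoky_level and min_fraction are illustrative thresholds only.
    """
    network_median = median(pm25_by_sensor.values())
    if network_median < smoky_level:
        # The air is not broadly smoky, so low readings are plausible.
        return []
    cutoff = min_fraction * network_median
    return sorted(sid for sid, v in pm25_by_sensor.items() if v < cutoff)


# Made-up readings: two sensors look like they are indoors or blocked.
smoky_day = {"A": 150.0, "B": 142.0, "C": 0.0, "D": 160.0, "E": 3.0}
suspects = flag_suspect_sensors(smoky_day)
```

Flagged sensors still need a human look, since a low reading could be real (for example, a sensor upwind of the smoke), which is where the specialist knowledge comes in.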
-
There are many benefits and drawbacks of crowd-sourced data. Typically, crowd-sourced data can be collected continuously over a widespread geography. It may also be cheaper to acquire, making it more accessible. There are some drawbacks as well, though. This kind of data can be collected incorrectly due to inexperience and may have 'holes' where less popular but still important information is missing. When using crowd-sourced data, one thing to consider is human error, like the typos we saw on the open map website. Other factors to consider are bias and inexperience, which depend on how much training people go through before they start recording data. It is easier to make data free to use if it was relatively cheap to obtain, which allows more people to contribute and to manipulate the data for their own research. Open science and open data can provide opportunities for people who might not have the funding of private industry or major universities to complete research. The people who collect the data may be more interested and therefore more likely to contribute and get involved in the science. However, they may not be trained in best practices and may misinterpret the data or present it in a misleading way.
-
Crowd-sourced data is very compelling! Anyone can collect it with any sort of smart device, including phones, watches, cars, or anything else that computers are put into nowadays. It is also relatively cheap, so data aggregators don't have to pay third-party sources for the data. It also emphasizes the collaborative promise of the internet, bringing together people from all around the world with different backgrounds to make something for the benefit of everyone. There are a few dark sides I see, however. One is that nefarious actors could manipulate crowd-sourced data by putting in false or misleading information to further their aims. Corporations could also use crowdsourced data for financial gain without due compensation to those who collected it. I feel conflicted about this last point, as I believe good things come out of people using crowdsourced data to make something that turns a profit, like pointing out traffic obstacles on Google Maps, but there can be a line that is crossed. Human error is a big one to consider when using crowd-sourced data. People may input the data in different formats, not fill out all the data correctly, or the data may just be plain wrong. To correct for this, data scientists should be prepared to standardize crowdsourced data, complete data that is incomplete, and throw out data that is very incomplete or unusable. Crowd-sourced data is perhaps the biggest way that average people can contribute to open science, for the reason mentioned above about smart devices. Since pretty much everyone has one, everyone can be a contributor! And because everyone can be a contributor, you have all the promise of large, diverse datasets, but also all the problems of human nature along with it. In my opinion, strong guidelines should be in place for those collecting the data, those synthesizing it, and those using it later on, in order to get the most out of it.
-
Crowd-sourced data opens up a lot of possibilities for community engagement in science as well as with important environmental and public policy issues. In that sense, it is exciting to think about how we might conduct new and ongoing research using these tools. However, having started out designing and conducting community-based field data collection early in my career, I know there are many likely pitfalls, particularly around data quality, continuity, and consistency. That said, platforms like Open Street Maps that carefully guide users to make the kinds of contributions that are desired and most useful can be great examples for other efforts. As was mentioned in previous comments, implementing crowd-sourced data efforts in educational or organizational contexts might be the best bet for maximizing both engagement and data quality.
-
Benefits include having more data available, including consistent data from several sources. This helps when you are using only government-based data over a timeline and there are periods where data is not available. However, you should consider whether the data you obtain is valid and can be used without permission. Another issue is whether the data is accurate. When it comes to open science, the data used in scientific studies needs to be accurate, so the source needs to be verified. Open source data is more like freelance data that anyone can create and edit. Therefore, with crowd-sourced data, the user has to take greater steps to research the data they are using.
-
There are multiple sources of crowd-sourced data in our daily lives. The map apps on our phones use crowd-sourced data to give us real-time directions and warnings. The iNaturalist and Merlin apps use crowd-sourced data on nature and birds to help people identify the world around them, and the data can be useful for researchers. Another interesting example is PurpleAir, a crowd-sourced air monitoring network, although its sensors are not recognized as high enough quality for formal air quality monitoring. Note the issues of who owns the data and which areas are missing, partially due to cost and cultural differences. Another issue is weather monitoring stations, which might be made available to communities at reduced cost, but then who owns the data? For instance, see https://www.weather.gov/iln/cwop.
-
I think the others have said quite a lot about crowd-sourced data, so I want to summarize all the comments. These are the ideas I have gathered from the discussion. Relation to Open Science Complications for Open Science Overall, while crowd-sourced data presents exciting opportunities for advancing open science, careful consideration is required to address its inherent challenges.
-
Before class on Week 2 (September 4/5), read the following short articles, respond to them in this discussion thread, and prepare to discuss them in class:
As a guide for your response, you can consider: