Removing WikiHow dataset at WikiHow request #3034

echo0x22 · 2023-05-04T12:48:12Z

Sadly, the WikiHow team contacted me and said that their data cannot be used in this way...

echo0x22 · 2023-05-04T13:03:05Z

I have contacted LAION and some other devs about the legality of the WikiHow ban and about potential solutions, but we probably won't be able to use my dataset in its current form.

z11h · 2023-05-05T05:01:00Z

What legal basis or ability does WikiHow even have to take action against an open-source project like this? Is there any precedent for lawsuits over scraping websites and training a model on them?

Training algorithms on copyrighted data is not illegal, according to the United States 2nd Circuit Court btw.

olliestanley · 2023-05-05T08:16:27Z

> What legal basis or ability does WikiHow even have to take action against an open-source project like this? Is there any precedent for lawsuits over scraping websites and training a model on them?
>
> Training algorithms on copyrighted data is not illegal, according to the United States 2nd Circuit Court btw.

I think you're right that there's no legal problem here, and we should not talk as if there is. However, I think that as a project, OA (Open Assistant) is inclined to respect the wishes of website operators if they ask us not to use their data.

koosoli · 2023-05-05T15:18:42Z

Regarding the removal of the WikiHow dataset, I share your disappointment and believe that the decision made by the WikiHow team was unjustified considering the nature of both projects. The fact that wikiHow content is licensed under a Creative Commons Unported license indicates that reuse and distribution are permitted, provided that attribution is given (if they want credit, let's just give it to them). Therefore, I don't see any clear reason why the wikiHow dataset couldn't be incorporated into the Open Assistant project.

It's important to note that open-source projects rely heavily on contributions from many individuals and other projects. It wouldn't make sense to exclude valuable, relevant sources simply because someone claims ownership over them. This is not how open source works, and I imagine that many contributors to the wikiHow platform, who spent countless hours sharing their knowledge under a Creative Commons license, would agree! Openness, sharing, and collaboration between projects should be encouraged to ensure the continued advancement of machine learning technology.

Finally, I would like to add that many closed-source LLMs like ChatGPT also rely on open-source projects to train their models, and I don't think they offer an opt-out option. In contrast, Open Assistant is an open-source project that gives anyone the opportunity to contribute to and improve the dataset. By removing WikiHow articles, we are not only limiting the knowledge pool but also unfairly handicapping the project's potential. What is the next exclusion, the Wikimedia Foundation?

Removing WikiHow dataset, Legal Issues

f5c9280

Sadly, WikiHow Team contacted me and said that their data cannot be used in this way...

echo0x22 requested review from Vechtomov, bitplane, huu4ontocord, olliestanley and sedthh as code owners May 4, 2023 12:48

andreaskoepf approved these changes May 4, 2023


olliestanley changed the title from "Removing WikiHow dataset, Legal Issues" to "Removing WikiHow dataset at WikiHow request" May 5, 2023

olliestanley enabled auto-merge (squash) May 5, 2023 08:16

olliestanley approved these changes May 5, 2023


olliestanley merged commit 149f9ac into LAION-AI:main May 5, 2023

echo0x22 deleted the patch-1 branch May 5, 2023 08:50

koosoli mentioned this pull request May 6, 2023

The Removal of WikiHow #3066 (Closed)

layterz pushed a commit to layterz/Open-Assistant that referenced this pull request May 11, 2023

Removing WikiHow dataset at WikiHow request (LAION-AI#3034)

2f06376

