Erroneous datasets

Unfortunately, many datasets cannot be included in the LOD Cloud because they do not follow standards. Datasets that are currently excluded because of errors are described in this directory.

Certificate error

The following datasets cannot be accessed because of an incorrect certificate:

Who Am I!

Does not exist (404 reply)

The following datasets cannot be accessed because their online location does not exist:

Server error (503)

GDPRtEXT

Intentional server error (503) for DDoS mitigation

Some servers use CloudFlare DDoS mitigation. The intention is to allow human users who access the data through a web browser with a JavaScript engine, and to disallow machine users who access the data from scripts (typically without a JavaScript engine).

The CloudFlare process works as follows:

1. When a human user first visits a URL in a web browser with a JavaScript engine, the browser does not know that the URL will be serviced from a CloudFlare server. This is therefore a regular HTTP request.
2. The CloudFlare server does not serve regular HTTP requests and sends back a 503 server error, together with an HTML page body containing JavaScript code.
3. The web browser loads the HTML page with the intention of showing a human-readable message to the user. However, the HTML page also includes JavaScript code from CloudFlare. This code tries to determine whether the user is a legitimate user. The requirements for being a legitimate user are unclear, but they at least include the requirement under step (1): accessing the URL through a web browser with a JavaScript engine.
4. If the CloudFlare code determines that the user who issued the HTTP request under step (1) is a valid user, the JavaScript code automatically issues another HTTP request for the URL originally requested in step (1). The JavaScript code ensures that the new request contains certain tokens that tell the server that the request was initiated through CloudFlare JavaScript code.
5. Observing the tokens in the new HTTP request, the server verifies whether the tokens make the request eligible for a reply. If the request is considered eligible, the reply will provide access to the resource requested in step (1). The reply will also contain a Set-Cookie header.
6. The web browser retains the cookie. If the same resource is requested in the future, the web browser will include the cookie in the request, and will directly obtain access to the resource.

Since this process requires a JavaScript engine and Cookie store, most machine users will not be able to access datasets disseminated through CloudFlare.
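A script can at least recognize when it has received the reply described in the second step of the process, rather than treating it as an ordinary outage. The following is a minimal sketch, not a definitive detector. The function name `looks_like_challenge` is hypothetical, and the check for a `Server` header containing `cloudflare` is an assumption about how such servers identify themselves.

```python
def looks_like_challenge(status: int, headers: dict[str, str], body: str) -> bool:
    """Heuristically flag a 503 reply that carries an HTML page with JavaScript."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: the server announces itself through the Server header.
    served_by_cloudflare = "cloudflare" in lowered.get("server", "").lower()
    is_html = lowered.get("content-type", "").lower().startswith("text/html")
    has_script = "<script" in body.lower()
    return status == 503 and served_by_cloudflare and is_html and has_script
```

A reply flagged by this heuristic is better reported as an access restriction than retried as a transient server error.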

The following datasets cannot be accessed because they use this CloudFlare approach:

AGROVOC VoID

Flaky server

The following dataset regularly cannot be downloaded because of an unstable server:

Semantic Finlex

Erroneous `Content-Type` header

The following datasets are serviced with an incorrect Content-Type header:

AIFB emits binary/octet-stream instead of text/n3.
BabelNet emits text/rdf+n3;charset=utf-8 instead of text/turtle.
BIBO emits application/xml instead of application/rdf+xml.
Bibsonomy emits application/xml instead of application/rdf+xml.
Function Ontology emits application/octet-stream instead of text/turtle.
Infection Transmission Ontology emits application/octect-stream instead of text/turtle.
OGC GeoSPARQL emits text/xml instead of application/rdf+xml.
Linked Art emits application/xml instead of application/rdf+xml.
Provenance emits application/rdf\+xml instead of application/rdf+xml.
Public Contracts Ontology emits text/plain instead of application/rdf+xml.
SDMX Attribute emits text/plain; charset=utf-8 instead of text/turtle.
SDMX Code emits text/plain; charset=utf-8 instead of text/turtle.
SDMX Concept emits text/plain; charset=utf-8 instead of text/turtle.
SDMX Dimension emits text/plain; charset=utf-8 instead of text/turtle.
SDMX Measure emits text/plain; charset=utf-8 instead of text/turtle.
W3C R2RML emits text/html instead of application/rdf+xml.
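Mismatches like the ones listed above can be detected by comparing the bare media type of a reply against a set of RDF media types. The sketch below is illustrative only. The names `RDF_MEDIA_TYPES`, `media_type` and `is_rdf_media_type` are hypothetical, and the set of accepted types is an assumption that should be adjusted to the serializations a given pipeline supports.

```python
# Assumed set of RDF media types; extend or shrink it as needed.
RDF_MEDIA_TYPES = {
    "application/ld+json",
    "application/n-quads",
    "application/n-triples",
    "application/rdf+xml",
    "application/trig",
    "text/n3",
    "text/turtle",
}


def media_type(content_type: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


def is_rdf_media_type(content_type: str) -> bool:
    """Report whether a Content-Type header value names an RDF serialization."""
    return media_type(content_type) in RDF_MEDIA_TYPES
```

Because the comparison is exact, near misses such as `application/rdf\+xml` or `application/xml` are rejected rather than silently accepted.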

No `Content-Type` header

The following datasets emit no Content-Type header at all:

Erroneous handling of `Accept` header

The following Accept header value is used when accessing RDF documents online:

application/trig,
application/n-quads,
application/n-triples;q=0.9,
text/turtle;q=0.9,
application/x-turtle;q=0.9,
text/rdf+n3;q=0.9,
application/rdf+xml;q=0.8,
text/plain;q=0.8,
*/*;q=0.7

The following datasets are serviced from servers that cannot process the above Accept header:

Use the following cURL command to test these URLs:

curl -vL -H 'Accept: application/trig, application/n-quads, application/n-triples;q=0.9, text/turtle;q=0.9, application/x-turtle;q=0.9, text/rdf+n3;q=0.9, application/ld+json;q=0.85, application/rdf+xml;q=0.8, text/plain;q=0.8, */*;q=0.7' '{url}' | head
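For scripted checks, the same request can be prepared in Python. This is a minimal sketch using only the standard library. The names `PREFERENCES`, `build_accept` and `make_request` are hypothetical, and the media types and q-values are copied from the cURL command above.

```python
import urllib.request

# (media type, q-value) pairs, in the order used by the cURL command.
PREFERENCES = [
    ("application/trig", 1.0),
    ("application/n-quads", 1.0),
    ("application/n-triples", 0.9),
    ("text/turtle", 0.9),
    ("application/x-turtle", 0.9),
    ("text/rdf+n3", 0.9),
    ("application/ld+json", 0.85),
    ("application/rdf+xml", 0.8),
    ("text/plain", 0.8),
    ("*/*", 0.7),
]


def build_accept(preferences):
    """Serialize the pairs into an Accept header value; q=1 is left implicit."""
    parts = []
    for media_type, q in preferences:
        parts.append(media_type if q == 1.0 else f"{media_type};q={q:g}")
    return ", ".join(parts)


def make_request(url):
    """Prepare (but do not send) a request that carries the Accept header."""
    return urllib.request.Request(url, headers={"Accept": build_accept(PREFERENCES)})
```

Passing the prepared request to `urllib.request.urlopen` sends it. Redirects are followed by default, which roughly corresponds to the `-L` flag of the cURL command.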

No RDF

Requests for the following datasets result in a valid server reply, but the reply does not contain RDF data:

SWRL emits text/html; charset=iso-8859-1.

Erroneous IRIs

Incorrect escaping

Character escapes in IRIs must use %hh-notation.

`\u`-notation

The following datasets use \u-escaping:

Library of Congress Names line 66,292,711: <http://viaf.org/processed/NLI\u007C001461487>
VIAF line 841,558: <http://dbpedia.org/resource/National_Theatre_"To\u0161a_Jovanovi\u0107">
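As an illustration of the expected notation, the sketch below rewrites `\u` and `\U` escapes as percent-encoded UTF-8 octets. It is one possible repair under the stated rule, not a procedure the affected datasets are known to use, and the function name `percent_encode_u_escapes` is hypothetical.

```python
import re
from urllib.parse import quote

# Matches \uXXXX (4 hex digits) and \UXXXXXXXX (8 hex digits).
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def percent_encode_u_escapes(iri: str) -> str:
    """Replace each \\u or \\U escape with the %hh form of its UTF-8 bytes."""

    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        # quote() encodes the character as UTF-8 and emits uppercase %hh.
        return quote(chr(int(hex_digits, 16)), safe="")

    return _U_ESCAPE.sub(replace, iri)
```

Applied to the Library of Congress example, `NLI\u007C001461487` becomes `NLI%7C001461487`. The function leaves all other characters untouched, so the double quotes in the VIAF example still need separate treatment.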

Absent escaping

Some characters are not allowed to appear unescaped in IRIs.

Unescaped backslash characters

VIAF line 841,558: <http://dbpedia.org/resource/National_Theatre_"To\u0161a_Jovanovi\u0107">

Unescaped caret characters

Pleiades line 60,882: <http://www.persee.fr/web/revues/home/prescript/article/racf_0220-6617_1991_num_30_1_2657?luceneQuery=%28%2B%28content%3AAQUAE+title%3AAQUAE^2.0+fullContent%3AAQUAE^100.0+fullTitle%3AAQUAE^140.0+summary%3AAQUAE+authors%3AAQUAE^5.0+illustrations%3AAQUAE^4.0>; reported at isawnyu/pleiades-rdf#7.

Unescaped double quote

Linked Movie Database (2009-05-18) line 46,397: http://dbpedia.org/resource/Wkw/tk/1996%407%2755"hk.net.

Unescaped space characters

ISO 19115-1@2014 file https://raw.githubusercontent.com/ISO-TC211/GOM/master/isotc211_GOM_harmonizedOntology/iso19115/-1/2014/ExampleOfExtendedMatadata.rdf contains a space in IRI http://def.isotc211.org/iso19115/-1/2014/ExampleOfExtendedMatadata/code/KeywordTypeCode -BioCollection.
ISO 19115-1@2018 file https://raw.githubusercontent.com/ISO-TC211/GOM/master/isotc211_GOM_harmonizedOntology/iso19115/-1/2018/ExampleOfExtendedMatadata.rdf contains a space in IRI http://def.isotc211.org/iso19115/-1/2014/ExampleOfExtendedMatadata/code/KeywordTypeCode -BioCollection.
LingHub line 11: <http://logd.tw.rpi.edu/source/congress-gov/file/biographical-directory-of-the-united-states-congress/version/2012-Jan-04/conversion/congress-gov-biographical-directory-of-the-united-state s-congress-2012-Jan-04.ttl.tgz>
Linked Movie Database (2012-02-10) line 35,710: <http://data.linkedmdb.org/resource/country/iso alpha2>.
Rijksmuseum Actors line 106,332: <skos:exactMatch rdf:resource=" https://rkd.nl/explore/artists/420649"/>
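The characters covered in these subsections are among those that the IRIREF production of N-Triples and Turtle excludes in raw form: U+0000 through U+0020 and the characters `<`, `>`, `"`, `{`, `}`, `|`, `^`, a backtick and a backslash. The sketch below lists which of them occur in a candidate IRI. It is a simplification that flags every backslash, in line with the escaping rule stated earlier, and the name `forbidden_characters` is hypothetical.

```python
import re

# Characters excluded by the IRIREF production:
# U+0000 through U+0020 and < > " { } | ^ ` \
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def forbidden_characters(iri: str) -> list[str]:
    """Return the distinct forbidden characters found in the IRI, sorted."""
    return sorted(set(_FORBIDDEN.findall(iri)))
```

An empty result means that none of these characters occur. It does not mean that the IRI is otherwise well-formed.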

Scheme grammar violations

The following datasets cannot be parsed because they contain a forward slash character in their scheme component:

European Nature Information System (EUNIS) file http://eunis.eea.europa.eu/rdf/countrybiogeo.rdf.gz, line 17: countrybiogeo/AD:AL.
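RFC 3986 defines a scheme as a letter followed by letters, digits, `+`, `-` or `.`. A parser that treats everything before the first colon as the scheme would read `countrybiogeo/AD:AL` as having the scheme `countrybiogeo/AD`, which violates that grammar. The sketch below reproduces this reading. It assumes that this is how the failing parser behaves, and the function names are hypothetical.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def apparent_scheme(reference: str):
    """Return the text before the first colon, or None if there is no colon."""
    head, separator, _ = reference.partition(":")
    return head if separator else None


def has_valid_scheme(reference: str) -> bool:
    """Check the apparent scheme against the RFC 3986 scheme grammar."""
    scheme = apparent_scheme(reference)
    return scheme is not None and _SCHEME.fullmatch(scheme) is not None
```

For `countrybiogeo/AD:AL` the apparent scheme is `countrybiogeo/AD`, so `has_valid_scheme` returns `False`.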

Compression errors

GNU zip errors

The following dataset uses GNU zip compression, but seems to contain strange characters when decompressed:

ConceptNet starts with /a/[/r/.
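One way to inspect such a file without unpacking it fully is to read only the first few decompressed bytes. The sketch below uses the standard `gzip` module. The helper name `peek_decompressed` and the default of 16 bytes are hypothetical choices.

```python
import gzip
import io


def peek_decompressed(raw: bytes, size: int = 16) -> bytes:
    """Return at most `size` bytes from the start of a gzip-compressed payload."""
    with gzip.GzipFile(fileobj=io.BytesIO(raw)) as stream:
        return stream.read(size)
```

If the returned bytes do not look like the start of an RDF serialization, the file is likely to need manual inspection before parsing.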

Syntax errors

The following datasets contain syntax errors:

AtomOwl Vocabulary Specification Extension contains two predicate terms on line 73: ax:updated rdfs:range a xsd:dateTime.
DBpedia URL https://downloads.dbpedia.org/repo/lts/text/short-abstracts/2016.10.01/short-abstracts_lang=yue.ttl.bz2 contains an unescaped control character (0x11, device control 1) on line 2,016.
GeoNames is a concatenation of RDF/XML files with IRIs interspersed. This was communicated to the GeoNames forum.
Getty Art & Architecture Thesaurus (AAT) file AATOut_WikidataCoref.nt does not use end-of-triple indicators (i.e., trailing dots).
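Two of the problems above, unescaped control characters and missing end-of-triple dots, can be spotted with a line-based scan of N-Triples-like input before a full parse. The sketch below is a rough heuristic rather than a parser. The function name `check_ntriples_lines` and the problem labels are hypothetical, and a line whose dot is followed by a trailing comment would be flagged incorrectly.

```python
import re

# C0 control characters other than tab, line feed and carriage return.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def check_ntriples_lines(lines):
    """Return (line number, problem) pairs for two simple line-level checks."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        stripped = text.strip()
        # Blank lines and comment lines carry no triple.
        if not stripped or stripped.startswith("#"):
            continue
        if _CONTROL.search(text):
            problems.append((number, "control character"))
        if not stripped.endswith("."):
            problems.append((number, "missing trailing dot"))
    return problems
```

The function accepts any iterable of strings, so an open text file can be passed directly.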

Alias overloading

RDA value vocabularies: RDA Polarity has the same alias as RDA element sets: Place object properties.
