Unfortunately, many datasets cannot be included in the LOD Cloud because they do not follow standards. Datasets that are currently not included because of errors are described in this directory.
The following datasets cannot be accessed because of an incorrect certificate:
The following datasets cannot be accessed because their online location does not exist:
- A DAML Ontology of Time
- An Entry Sub-Ontology of Time in OWL
- DatCatInfo
- Dublin Core Elements
- Geometry Ontology
- Ontology of Rhetorical Blocks (ORB)
- PRISM
- SALT Rhetorical Ontology (SRO)
- SIOC
- Spatial Relations Ontology (SRO)
- The administrative geography and civil voting area ontology
- WAIVER
Some servers use CloudFlare DDoS mitigation. The intention is to allow human users, who access the data through a web browser with a JavaScript engine, and to disallow machine users, who access the data from scripts (typically without a JavaScript engine).
The CloudFlare process works as follows:
1. When a human user first visits a URL in their web browser with a JavaScript engine, the browser does not know that the URL will be serviced from a CloudFlare server. This is therefore a regular HTTP request.
2. The CloudFlare server does not support regular HTTP requests and sends back a 503 server error, together with an HTML page body containing JavaScript code.
3. The web browser loads the HTML page with the intention of showing a human-readable message to the user. However, the HTML page also includes JavaScript code from CloudFlare. This code tries to determine whether the user is a legitimate user. The requirements for being a legitimate user are unclear, but they at least include the requirement under step (1): accessing the URL through a web browser with a JavaScript engine.
4. If the CloudFlare code determines that the user who issued the HTTP request under step (1) is a valid user, the JavaScript code automatically issues another HTTP request for the URL originally requested in step (1). The JavaScript code ensures that the new request contains certain tokens to communicate to the server that the request is now initiated through CloudFlare JavaScript code.
5. Observing the tokens in the new HTTP request, the server verifies whether the tokens make the request eligible for a reply. If the request is considered eligible, the reply will provide access to the resource requested in step (1). The reply will also contain a `Set-Cookie` header.
6. The web browser retains the cookie. If the same resource is requested in the future, the web browser will include the cookie in the request, and will directly obtain access to the resource.
Since this process requires a JavaScript engine and a cookie store, most machine users will not be able to access datasets disseminated through CloudFlare.
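A script cannot complete this process, but it can at least recognise the challenge reply and report it as such instead of treating the dataset as broken. The following is a minimal sketch, not CloudFlare's documented behaviour: the function name `looks_like_js_challenge` is hypothetical, and the heuristic only uses the features described above (a 503 status and an HTML body that contains JavaScript). Real challenge pages may differ.

```python
def looks_like_js_challenge(status: int, headers: dict[str, str], body: str) -> bool:
    """Heuristic: a 503 reply whose HTML body contains a script element."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Ignore parameters such as "; charset=UTF-8".
    media_type = lowered.get("content-type", "").split(";", 1)[0].strip().lower()
    return (
        status == 503
        and media_type == "text/html"
        and "<script" in body.lower()
    )
```

A crawler could call this on every failed download and log a separate error category for replies that match.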
The following datasets cannot be accessed because they use this CloudFlare approach:
The following dataset regularly cannot be downloaded because of an unstable server:
- Semantic Finlex
The following datasets are serviced with an incorrect `Content-Type` header:
- AIFB emits `binary/octet-stream` instead of `text/n3`.
- BabelNet emits `text/rdf+n3;charset=utf-8` instead of `text/turtle`.
- BIBO emits `application/xml` instead of `application/rdf+xml`.
- Bibsonomy emits `application/xml` instead of `application/rdf+xml`.
- Function Ontology emits `application/octet-stream` instead of `text/turtle`.
- Infection Transmission Ontology emits `application/octect-stream` instead of `text/turtle`.
- OGC GeoSPARQL emits `text/xml` instead of `application/rdf+xml`.
- Linked Art emits `application/xml` instead of `application/rdf+xml`.
- Provenance emits `application/rdf\+xml` instead of `application/rdf+xml`.
- Public Contracts Ontology emits `text/plain` instead of `application/rdf+xml`.
- SDMX Attribute emits `text/plain; charset=utf-8` instead of `text/turtle`.
- SDMX Code emits `text/plain; charset=utf-8` instead of `text/turtle`.
- SDMX Concept emits `text/plain; charset=utf-8` instead of `text/turtle`.
- SDMX Dimension emits `text/plain; charset=utf-8` instead of `text/turtle`.
- SDMX Measure emits `text/plain; charset=utf-8` instead of `text/turtle`.
- W3C R2RML emits `text/html` instead of `application/rdf+xml`.
The following datasets emit no `Content-Type` header at all:
- Agrontology
- Datatype Schema
- DOAP
- lexinfo 2.0
- lexinfo 3.0
- Lexvo Ontology
- SP-statements vocabulary
- SPIN: Modeling Vocabulary
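Both kinds of `Content-Type` problem listed above (a wrong value and a missing header) can be detected with a small comparison. The sketch below is illustrative only: `check_content_type` is a hypothetical helper, and the expected media type has to be supplied by the caller, for example from the file extension or from a successful parse.

```python
from typing import Optional


def check_content_type(header: Optional[str], expected: str) -> str:
    """Return 'missing', 'ok', or 'mismatch' for a Content-Type header value."""
    if header is None or not header.strip():
        return "missing"
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = header.split(";", 1)[0].strip().lower()
    return "ok" if media_type == expected.lower() else "mismatch"
```

For example, `check_content_type("text/plain; charset=utf-8", "text/turtle")` reports a mismatch, while a charset parameter alone does not.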
The following `Accept` header value is used when accessing RDF documents online:
application/trig,
application/n-quads,
application/n-triples;q=0.9,
text/turtle;q=0.9,
application/x-turtle;q=0.9,
text/rdf+n3;q=0.9,
application/rdf+xml;q=0.8,
text/plain;q=0.8,
*/*;q=0.7
The following datasets are serviced from servers that cannot process the above `Accept` header:
- DBpedia DataID
- Getty VoID
- ISBD Elements
- W3C Data Cube
- W3C Metadata Vocabulary for Tabular Data
- W3C RDFa
- W3C SPARQL Service Description
- W3C SPARQL Terms
Use the following cURL command to test these URLs:
curl -vL -H 'Accept: application/trig, application/n-quads, application/n-triples;q=0.9, text/turtle;q=0.9, application/x-turtle;q=0.9, text/rdf+n3;q=0.9, application/ld+json;q=0.85, application/rdf+xml;q=0.8, text/plain;q=0.8, */*;q=0.7' '{url}' | head
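The same request can be prepared from a script. This is a minimal sketch using only the Python standard library; `build_request` is a hypothetical helper name, and the `Accept` value mirrors the one in the cURL command above. The function only constructs the request and sends nothing over the network.

```python
import urllib.request

# Media types and weights copied from the cURL command.
ACCEPT = ", ".join([
    "application/trig",
    "application/n-quads",
    "application/n-triples;q=0.9",
    "text/turtle;q=0.9",
    "application/x-turtle;q=0.9",
    "text/rdf+n3;q=0.9",
    "application/ld+json;q=0.85",
    "application/rdf+xml;q=0.8",
    "text/plain;q=0.8",
    "*/*;q=0.7",
])


def build_request(url: str) -> urllib.request.Request:
    """Prepare a GET request that carries the Accept header."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT})
```

Passing the result to `urllib.request.urlopen` sends the request and follows redirects, comparable to the `-L` flag of cURL.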
Requesting the following datasets results in a valid server reply that does not contain RDF data:
- SWRL emits `text/html; charset=iso-8859-1`.
Character escapes in IRIs must use `%hh`-notation.
The following datasets use `\u`-escaping:
- Library of Congress Names, line 66,292,711: `<http://viaf.org/processed/NLI\u007C001461487>`
- VIAF, line 841,558: `<http://dbpedia.org/resource/National_Theatre_"To\u0161a_Jovanovi\u0107">`
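One possible repair is to rewrite each `\u` escape as the percent-encoded UTF-8 bytes of the character it denotes. The sketch below assumes that this is the intended `%hh` form; `uchar_to_percent` is a hypothetical helper, it does not handle lone surrogate code points, and it leaves every other character in the IRI untouched.

```python
import re
from urllib.parse import quote

# Matches \uXXXX and \UXXXXXXXX escapes.
_UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def uchar_to_percent(iri: str) -> str:
    """Rewrite \\u and \\U escapes as percent-encoded UTF-8 bytes."""
    def replace(match: re.Match) -> str:
        code_point = int(match.group(1) or match.group(2), 16)
        # safe="" forces percent-encoding of every reserved character.
        return quote(chr(code_point), safe="")
    return _UCHAR.sub(replace, iri)
```

Applied to the Library of Congress example, the escape `\u007C` becomes `%7C`.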
Some characters are not allowed to appear unescaped in IRIs.
- VIAF, line 841,558: `<http://dbpedia.org/resource/National_Theatre_"To\u0161a_Jovanovi\u0107">`
- Pleiades, line 60,882: `<http://www.persee.fr/web/revues/home/prescript/article/racf_0220-6617_1991_num_30_1_2657?luceneQuery=%28%2B%28content%3AAQUAE+title%3AAQUAE^2.0+fullContent%3AAQUAE^100.0+fullTitle%3AAQUAE^140.0+summary%3AAQUAE+authors%3AAQUAE^5.0+illustrations%3AAQUAE^4.0>`; reported at isawnyu/pleiades-rdf#7.
- Linked Movie Database (2009-05-18), line 46,397: `http://dbpedia.org/resource/Wkw/tk/1996%407%2755"hk.net`.
- ISO 19115-1@2014, file `https://raw.githubusercontent.com/ISO-TC211/GOM/master/isotc211_GOM_harmonizedOntology/iso19115/-1/2014/ExampleOfExtendedMatadata.rdf`, contains a space in IRI `http://def.isotc211.org/iso19115/-1/2014/ExampleOfExtendedMatadata/code/KeywordTypeCode -BioCollection`.
- ISO 19115-1@2018, file `https://raw.githubusercontent.com/ISO-TC211/GOM/master/isotc211_GOM_harmonizedOntology/iso19115/-1/2018/ExampleOfExtendedMatadata.rdf`, contains a space in IRI `http://def.isotc211.org/iso19115/-1/2014/ExampleOfExtendedMatadata/code/KeywordTypeCode -BioCollection`.
- LingHub, line 11: `<http://logd.tw.rpi.edu/source/congress-gov/file/biographical-directory-of-the-united-states-congress/version/2012-Jan-04/conversion/congress-gov-biographical-directory-of-the-united-state s-congress-2012-Jan-04.ttl.tgz>`
- Linked Movie Database (2012-02-10), line 35,710: `<http://data.linkedmdb.org/resource/country/iso alpha2>`.
- Rijksmuseum Actors, line 106,332: `<skos:exactMatch rdf:resource=" https://rkd.nl/explore/artists/420649"/>`
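Characters like the ones in these examples (spaces, double quotes, carets) can be located mechanically. The sketch below uses the character set excluded by the `IRIREF` production of N-Triples and Turtle: control characters, space, and a handful of delimiters. The function name `disallowed_characters` is hypothetical, and the input is expected to be the IRI without its surrounding angle brackets.

```python
import re

# Excluded by IRIREF: U+0000 to U+0020 and the characters < > " { } | ^ ` \
_DISALLOWED = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def disallowed_characters(iri: str) -> list[tuple[int, str]]:
    """Return (offset, character) pairs for characters that need escaping."""
    return [(m.start(), m.group()) for m in _DISALLOWED.finditer(iri)]
```

An empty result means that the IRI passes this particular check; it does not prove that the IRI is valid in every other respect.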
The following datasets cannot be parsed because they contain a forward slash character in their scheme component:
- European Nature Information System (EUNIS), file `http://eunis.eea.europa.eu/rdf/countrybiogeo.rdf.gz`, line 17: `countrybiogeo/AD:AL`.
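The EUNIS case can be reproduced by treating everything before the first colon as the scheme and checking it against the scheme syntax of RFC 3986 (a letter followed by letters, digits, `+`, `-`, or `.`). This is a sketch of that one check, not of a full IRI parser; `has_valid_scheme` is a hypothetical name.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def has_valid_scheme(iri: str) -> bool:
    """Check whether the text before the first colon is a well-formed scheme."""
    scheme, colon, _ = iri.partition(":")
    return bool(colon) and _SCHEME.fullmatch(scheme) is not None
```

For `countrybiogeo/AD:AL` the candidate scheme is `countrybiogeo/AD`, which contains a forward slash and is therefore rejected.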
The following dataset uses GNU zip compression, but seems to contain strange characters when decompressed:
- ConceptNet starts with `/a/[/r/`.
The following datasets contain syntax errors:
- AtomOwl Vocabulary Specification Extension contains two predicate terms on line 73: `ax:updated rdfs:range a xsd:dateTime.`
- DBpedia, URL `https://downloads.dbpedia.org/repo/lts/text/short-abstracts/2016.10.01/short-abstracts_lang=yue.ttl.bz2`, contains an unescaped control character (0x11, device control 1) on line 2,016.
- GeoNames is a concatenation of RDF/XML files with IRIs interspersed. This was communicated to the GeoNames forum.
- Getty Art & Architecture Thesaurus (AAT), file `AATOut_WikidataCoref.nt`, does not use end-of-triple indicators (i.e., trailing dots).
- RDA value vocabularies: RDA Polarity has the same alias as RDA element sets: Place object properties.
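Two of the problems above, the unescaped control character and the missing end-of-triple dots, can be found with a line-by-line scan before a full parser is involved. The sketch below is a rough pre-check for line-based files such as N-Triples, not a syntax validator; `line_problems` is a hypothetical helper, and a line that carries a comment after its final dot would be reported incorrectly.

```python
import re

# C0 control characters other than tab, line feed, and carriage return.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def line_problems(line: str) -> list[str]:
    """Report simple line-level problems in an N-Triples-style line."""
    problems = []
    if _CONTROL.search(line):
        problems.append("control character")
    stripped = line.strip()
    # Blank lines and comment lines need no terminating dot.
    if stripped and not stripped.startswith("#") and not stripped.endswith("."):
        problems.append("missing trailing dot")
    return problems
```

Running this over a file with `enumerate(file, start=1)` gives the line numbers to include in an error report.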