Graph Store HTTP Protocol (GET, POST) back end #1668

Open
wants to merge 21 commits into
base: master
2 changes: 1 addition & 1 deletion src/engine/CMakeLists.txt
@@ -14,5 +14,5 @@ add_library(engine
CartesianProductJoin.cpp TextIndexScanForWord.cpp TextIndexScanForEntity.cpp
TextLimit.cpp LazyGroupBy.cpp GroupByHashMapOptimization.cpp SpatialJoin.cpp
CountConnectedSubgraphs.cpp SpatialJoinAlgorithms.cpp PathSearch.cpp ExecuteUpdate.cpp
Describe.cpp)
Describe.cpp GraphStoreProtocol.cpp)
qlever_target_link_libraries(engine util index parser sparqlExpressions http SortPerformanceEstimator Boost::iostreams s2)
32 changes: 32 additions & 0 deletions src/engine/GraphStoreProtocol.cpp
@@ -0,0 +1,32 @@
// Copyright 2024, University of Freiburg
// Chair of Algorithms and Data Structures
// Authors: Julian Mundhahs <[email protected]>

#include "engine/GraphStoreProtocol.h"

#include <boost/beast.hpp>

// ____________________________________________________________________________
GraphOrDefault GraphStoreProtocol::extractTargetGraph(
const ad_utility::url_parser::ParamValueMap& params) {
// Extract the graph to be acted upon using `Indirect Graph
// Identification`.
Comment on lines +12 to +13
Member

Redundant to comment in header file.

const std::optional<std::string> graphIri =
ad_utility::url_parser::checkParameter(params, "graph", std::nullopt);
const bool isDefault =
ad_utility::url_parser::checkParameter(params, "default", "").has_value();
if (!(graphIri.has_value() || isDefault)) {
throw std::runtime_error("No graph IRI specified in the request.");
Member

Can elaborate on how to specify graphs, to make life easier for the user.
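One possible shape for a more helpful message, as a self-contained sketch. `ParamMap`, `TargetGraph`, and `extractTargetGraphSketch` are hypothetical stand-ins for `ParamValueMap`, `GraphOrDefault`, and `extractTargetGraph`; unlike the code in the diff, the sketch does not check that `default` has an empty value, and the exact wording of the messages is only a suggestion.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for `ParamValueMap` and `GraphOrDefault`.
using ParamMap = std::map<std::string, std::string>;
// `std::nullopt` stands for the default graph, a value for a named graph.
using TargetGraph = std::optional<std::string>;

inline TargetGraph extractTargetGraphSketch(const ParamMap& params) {
  const auto graphIt = params.find("graph");
  const bool hasGraph = graphIt != params.end();
  const bool isDefault = params.count("default") > 0;
  if (!hasGraph && !isDefault) {
    // Tell the user how a graph can be specified, not only that it is missing.
    throw std::runtime_error(
        "No graph specified in the request. Add either the query parameter "
        "`default` (for the default graph) or `graph=<IRI>` (for a named "
        "graph).");
  }
  if (hasGraph && isDefault) {
    throw std::runtime_error(
        "Both `default` and `graph` were specified, but exactly one of them "
        "must be used to identify the graph.");
  }
  if (hasGraph) {
    return graphIt->second;
  }
  return std::nullopt;
}
```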

}
if (graphIri.has_value() && isDefault) {
throw std::runtime_error(
"Only one of `default` and `graph` may be used for graph "
"identification.");
}
if (graphIri.has_value()) {
return GraphRef::fromIrirefWithoutBrackets(graphIri.value());
} else {
AD_CORRECTNESS_CHECK(isDefault);
return DEFAULT{};
}
}
134 changes: 134 additions & 0 deletions src/engine/GraphStoreProtocol.h
@@ -0,0 +1,134 @@
// Copyright 2024, University of Freiburg
// Chair of Algorithms and Data Structures
// Authors: Julian Mundhahs <[email protected]>

#pragma once

#include <gtest/gtest_prod.h>

#include "parser/ParsedQuery.h"
#include "parser/RdfParser.h"
#include "util/http/HttpUtils.h"
#include "util/http/UrlParser.h"

class GraphStoreProtocol {
private:
static ParsedQuery transformPost(
Member

Document the functions and everything.

const ad_utility::httpUtils::HttpRequest auto& rawRequest,
const GraphOrDefault& graph) {
using namespace boost::beast::http;
using Re2Parser = RdfStringParser<TurtleParser<Tokenizer>>;
std::string contentTypeString;
if (rawRequest.find(field::content_type) != rawRequest.end()) {
contentTypeString = rawRequest.at(field::content_type);
}
if (contentTypeString.empty()) {
// ContentType not set or empty; we don't try to guess -> 400 Bad Request
}
const auto contentType =
ad_utility::getMediaTypeFromAcceptHeader(contentTypeString);
Comment on lines +22 to +29
Member

Extract this as a function.
And there is an empty `if` (probably a missing TODO); do you want this to throw?
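A minimal sketch of what the extracted function could look like if the empty branch is meant to throw. `Headers` and `extractContentTypeSketch` are hypothetical names, the header lookup is simplified to an exact-case match in a `std::map` (real HTTP header names are case-insensitive), and mapping the exception to a `400 Bad Request` is left to the caller.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the header fields of an HTTP request.
using Headers = std::map<std::string, std::string>;

// Return the value of the `Content-Type` header. Throw if the header is
// missing or empty, because the format of the body is then unknown and the
// code in the diff explicitly does not want to guess it.
inline std::string extractContentTypeSketch(const Headers& headers) {
  const auto it = headers.find("Content-Type");
  if (it == headers.end() || it->second.empty()) {
    throw std::runtime_error(
        "The request has no `Content-Type` header or the header is empty, "
        "so the format of the request body cannot be determined.");
  }
  return it->second;
}
```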

std::vector<TurtleTriple> triples;
switch (contentType.value()) {
case ad_utility::MediaType::turtle:
case ad_utility::MediaType::ntriples: {
auto parser = Re2Parser();
parser.setInputStream(rawRequest.body());
triples = parser.parseAndReturnAllTriples();
break;
}
default: {
// Unsupported media type -> 415 Unsupported Media Type
throw std::runtime_error(absl::StrCat(
"Mediatype \"", ad_utility::toString(contentType.value()),
"\" is not supported for SPARQL Graph Store HTTP "
"Protocol in QLever."));
}
}
Comment on lines +30 to +46
Member

Can also be a small function (which can be defined in the .cpp file).
Also report in the error message which types ARE supported, so that the user can fix this.
And for the handling of 415 you probably need a dedicated exception type.
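A sketch of how a dedicated exception type and a message that lists the supported types might look. `UnsupportedMediaTypeError`, `kSupportedMediaTypes`, and `checkMediaTypeSupported` are hypothetical names; the two media type strings are the ones registered for `turtle` and `ntriples` in `MediaTypes.cpp` in this PR.

```cpp
#include <algorithm>
#include <array>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical exception type that the server could catch separately and
// answer with `415 Unsupported Media Type`.
class UnsupportedMediaTypeError : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

// The media types that the `switch` in the diff accepts.
inline constexpr std::array<std::string_view, 2> kSupportedMediaTypes{
    "text/turtle", "application/n-triples"};

// Do nothing if `mediaType` is supported, otherwise throw an error whose
// message names the supported alternatives.
inline void checkMediaTypeSupported(std::string_view mediaType) {
  if (std::find(kSupportedMediaTypes.begin(), kSupportedMediaTypes.end(),
                mediaType) != kSupportedMediaTypes.end()) {
    return;
  }
  std::string supported;
  for (std::string_view type : kSupportedMediaTypes) {
    if (!supported.empty()) {
      supported += ", ";
    }
    supported += std::string{type};
  }
  throw UnsupportedMediaTypeError(
      "Mediatype \"" + std::string{mediaType} +
      "\" is not supported for SPARQL Graph Store HTTP Protocol in QLever. "
      "Supported media types: " + supported + ".");
}
```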

ParsedQuery res;
auto transformTurtleTriple = [&graph](const TurtleTriple& triple) {
AD_CORRECTNESS_CHECK(triple.graphIri_.isId() &&
triple.graphIri_.getId() ==
qlever::specialIds().at(DEFAULT_GRAPH_IRI));
SparqlTripleSimpleWithGraph::Graph g{std::monostate{}};
if (std::holds_alternative<GraphRef>(graph)) {
g = Iri(std::get<GraphRef>(graph).toStringRepresentation());
}
Comment on lines +53 to +55
Member

This IRI can be computed in advance. And it is somewhat wasteful to repeat the Graph IRI over and over (but maybe that is not super relevant, as the subject, predicate, and object are also strings here).
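To illustrate the first point, a self-contained sketch in which the graph is resolved once before the loop instead of once per triple. All types and names (`DefaultGraphTag`, `GraphChoice`, `SimpleTriple`, `Quad`, `attachGraph`) are hypothetical simplifications of the QLever types; the graph string is still copied into every result element, which matches the caveat above.

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-ins for `DEFAULT`, `GraphOrDefault`,
// `TurtleTriple`, and `SparqlTripleSimpleWithGraph`.
struct DefaultGraphTag {};
using GraphChoice = std::variant<DefaultGraphTag, std::string>;
struct SimpleTriple {
  std::string subject, predicate, object;
};
struct Quad {
  std::string subject, predicate, object;
  std::optional<std::string> graph;  // `std::nullopt`: no explicit graph
};

inline std::vector<Quad> attachGraph(const std::vector<SimpleTriple>& triples,
                                     const GraphChoice& graph) {
  // Inspect the variant once, outside of the per-triple code.
  std::optional<std::string> resolvedGraph;
  if (const auto* iri = std::get_if<std::string>(&graph)) {
    resolvedGraph = *iri;
  }
  std::vector<Quad> result;
  result.reserve(triples.size());
  for (const auto& t : triples) {
    result.push_back(Quad{t.subject, t.predicate, t.object, resolvedGraph});
  }
  return result;
}
```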

return SparqlTripleSimpleWithGraph(triple.subject_, triple.predicate_,
Member

You should be able to std::move here (I hope that ad_utility::transform supports this, otherwise do something else or let's think about something).
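Whether `ad_utility::transform` forwards rvalues is not visible in this diff, so the following sketch uses a plain loop as the fallback mentioned above. `PlainTriple`, `GraphTriple`, and `moveIntoGraph` are hypothetical names; the point is only that the strings of the input triples are moved into the result instead of being copied.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the input and output triple types.
struct PlainTriple {
  std::string subject, predicate, object;
};
struct GraphTriple {
  std::string subject, predicate, object, graph;
};

// Take the input by value so that callers can hand over ownership with
// `std::move`; the strings of each triple are then moved, not copied.
inline std::vector<GraphTriple> moveIntoGraph(std::vector<PlainTriple> triples,
                                              const std::string& graph) {
  std::vector<GraphTriple> result;
  result.reserve(triples.size());
  for (auto& t : triples) {
    result.push_back(GraphTriple{std::move(t.subject), std::move(t.predicate),
                                 std::move(t.object), graph});
  }
  return result;
}
```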

triple.object_, g);
};
updateClause::GraphUpdate up{
ad_utility::transform(triples, transformTurtleTriple), {}};
res._clause = parsedQuery::UpdateClause{up};
return res;
Comment on lines +57 to +62
Member

The actual triple parsing can also be a separate function. Everything that doesn't directly depend on the templated request can and should be extracted to the .cpp file.

}
FRIEND_TEST(GraphStoreProtocolTest, transformPost);

static ParsedQuery transformGet(const GraphOrDefault& graph) {
Member

Not a template, can be defined in .cpp, and in the header it can have a comment.

ParsedQuery res;
res._clause = parsedQuery::ConstructClause(
{{Variable("?s"), Variable("?p"), Variable("?o")}});
res._rootGraphPattern = {};
parsedQuery::GraphPattern selectSPO;
selectSPO._graphPatterns.emplace_back(parsedQuery::BasicGraphPattern{
{SparqlTriple(Variable("?s"), "?p", Variable("?o"))}});
if (std::holds_alternative<ad_utility::triple_component::Iri>(graph)) {
parsedQuery::GroupGraphPattern selectSPOWithGraph{
std::move(selectSPO),
std::get<ad_utility::triple_component::Iri>(graph)};
res._rootGraphPattern._graphPatterns.emplace_back(
std::move(selectSPOWithGraph));
} else {
AD_CORRECTNESS_CHECK(std::holds_alternative<DEFAULT>(graph));
res._rootGraphPattern = std::move(selectSPO);
}
Comment on lines +78 to +83
Member

Why can't we store the rootGraphPattern directly in the first case?

return res;
}
FRIEND_TEST(GraphStoreProtocolTest, transformGet);

public:
// Every Graph Store Protocol request has an equivalent SPARQL Query or
// Update. Transform the Graph Store Protocol request into its equivalent
// Query or Update.
static ParsedQuery transformGraphStoreProtocol(
const ad_utility::httpUtils::HttpRequest auto& rawRequest) {
ad_utility::url_parser::ParsedUrl parsedUrl =
ad_utility::url_parser::parseRequestTarget(rawRequest.target());
GraphOrDefault graph = extractTargetGraph(parsedUrl.parameters_);

using enum boost::beast::http::verb;
auto method = rawRequest.method();
if (method == get) {
return transformGet(graph);
} else if (method == put) {
throw std::runtime_error(
"PUT in the SPARQL Graph Store HTTP Protocol is not yet implemented "
"in QLever.");
Comment on lines +103 to +105
Member

Can get rid of the code duplication (everything except for the method name is redundant; implement a lambda that does the throwing).
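A self-contained sketch of the suggested lambda. `Verb` is a hypothetical stand-in for `boost::beast::http::verb`, and `dispatchSketch` returns the name of the handler instead of calling `transformGet` or `transformPost`, so that the example compiles without QLever.

```cpp
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for `boost::beast::http::verb`.
enum class Verb { get, put, delete_, post, head, patch, other };

inline std::string dispatchSketch(Verb method) {
  // The only part that differs between the unimplemented methods is the name.
  auto throwNotYetImplemented = [](std::string_view methodName) {
    throw std::runtime_error(
        std::string{methodName} +
        " in the SPARQL Graph Store HTTP Protocol is not yet implemented in "
        "QLever.");
  };
  switch (method) {
    case Verb::get:
      return "transformGet";
    case Verb::post:
      return "transformPost";
    case Verb::put:
      throwNotYetImplemented("PUT");
      break;
    case Verb::delete_:
      throwNotYetImplemented("DELETE");
      break;
    case Verb::head:
      throwNotYetImplemented("HEAD");
      break;
    case Verb::patch:
      throwNotYetImplemented("PATCH");
      break;
    case Verb::other:
      break;
  }
  throw std::runtime_error(
      "Unsupported HTTP method for the SPARQL Graph Store HTTP Protocol.");
}
```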

} else if (method == delete_) {
throw std::runtime_error(
"DELETE in the SPARQL Graph Store HTTP Protocol is not yet "
"implemented in QLever.");
} else if (method == post) {
return transformPost(rawRequest, graph);
} else if (method == head) {
throw std::runtime_error(
"HEAD in the SPARQL Graph Store HTTP Protocol is not yet implemented "
"in QLever.");
} else if (method == patch) {
throw std::runtime_error(
"PATCH in the SPARQL Graph Store HTTP Protocol is not yet "
"implemented in QLever.");
} else {
throw std::runtime_error(
absl::StrCat("Unsupported HTTP method \"",
std::string_view{rawRequest.method_string()},
"\" for the SPARQL Graph Store HTTP Protocol."));
}
}

private:
// Extract the graph to be acted upon using `Indirect Graph
// Identification`.
Member

Please elaborate on what Indirect Graph Identification is, or where this term is defined.
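For context: the term comes from the SPARQL 1.1 Graph Store HTTP Protocol, where the target graph is not identified by the request URL itself but by the query string, either `?default` for the default graph or `?graph=<percent-encoded IRI>` for a named graph. The sketch below builds such request targets; `percentEncode` and `indirectTarget` are hypothetical helpers and not part of QLever, and the encoder conservatively escapes everything except RFC 3986 unreserved characters.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Percent-encode all bytes except the RFC 3986 "unreserved" characters.
inline std::string percentEncode(std::string_view input) {
  constexpr std::string_view hexDigits = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : input) {
    const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                            (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                            c == '_' || c == '~';
    if (unreserved) {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += hexDigits[c >> 4];
      out += hexDigits[c & 0xF];
    }
  }
  return out;
}

// Build a request target that identifies the graph indirectly: `?default`
// for the default graph, `?graph=<encoded IRI>` for a named graph.
inline std::string indirectTarget(std::string_view path,
                                  const std::optional<std::string>& graphIri) {
  std::string target{path};
  if (graphIri.has_value()) {
    target += "?graph=" + percentEncode(graphIri.value());
  } else {
    target += "?default";
  }
  return target;
}
```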

static GraphOrDefault extractTargetGraph(
const ad_utility::url_parser::ParamValueMap& params);
FRIEND_TEST(GraphStoreProtocolTest, extractTargetGraph);
};
32 changes: 7 additions & 25 deletions src/engine/Server.cpp
@@ -347,8 +347,8 @@

// We always want to call `Server::checkParameter` with the same first
// parameter.
auto checkParameter =
std::bind_front(&Server::checkParameter, std::cref(parameters));
auto checkParameter = std::bind_front(&ad_utility::url_parser::checkParameter,
std::cref(parameters));

Check warning on line 351 in src/engine/Server.cpp

Codecov / codecov/patch

src/engine/Server.cpp#L350-L351

Added lines #L350 - L351 were not covered by tests

// Check the access token. If an access token is provided and the check fails,
// throw an exception and do not process any part of the query (even if the
@@ -537,9 +537,11 @@
std::pair<bool, bool> Server::determineResultPinning(
const ad_utility::url_parser::ParamValueMap& params) {
const bool pinSubtrees =
checkParameter(params, "pinsubtrees", "true").has_value();
ad_utility::url_parser::checkParameter(params, "pinsubtrees", "true")
.has_value();
const bool pinResult =
checkParameter(params, "pinresult", "true").has_value();
ad_utility::url_parser::checkParameter(params, "pinresult", "true")
.has_value();
return {pinSubtrees, pinResult};
}

@@ -740,6 +742,7 @@
MediaType Server::determineMediaType(
const ad_utility::url_parser::ParamValueMap& params,
const ad_utility::httpUtils::HttpRequest auto& request) {
using namespace ad_utility::url_parser;
// The following code block determines the media type to be used for the
// result. The media type is either determined by the "Accept:" header of
// the request or by the URL parameter "action=..." (for TSV and CSV export,
@@ -1118,24 +1121,3 @@
return true;
}
}

// _____________________________________________________________________________
std::optional<std::string> Server::checkParameter(
const ad_utility::url_parser::ParamValueMap& parameters,
std::string_view key, std::optional<std::string> value) {
auto param =
ad_utility::url_parser::getParameterCheckAtMostOnce(parameters, key);
if (!param.has_value()) {
return std::nullopt;
}
std::string parameterValue = param.value();

// If value is given, but not equal to param value, return std::nullopt. If
// no value is given, set it to param value.
if (value == std::nullopt) {
value = parameterValue;
} else if (value != parameterValue) {
return std::nullopt;
}
return value;
}
12 changes: 0 additions & 12 deletions src/engine/Server.h
@@ -256,18 +256,6 @@ class Server {
/// HTTP error response.
bool checkAccessToken(std::optional<std::string_view> accessToken) const;

/// Checks if a URL parameter exists in the request, and it matches the
/// expected `value`. If yes, return the value, otherwise return
/// `std::nullopt`. If `value` is `std::nullopt`, only check if the key
/// exists. We need this because we have parameters like "cmd=stats", where a
/// fixed combination of the key and value determines the kind of action, as
/// well as parameters like "index-decription=...", where the key determines
/// the kind of action. If the key is not found, always return `std::nullopt`.
static std::optional<std::string> checkParameter(
const ad_utility::url_parser::ParamValueMap& parameters,
std::string_view key, std::optional<std::string> value);
FRIEND_TEST(ServerTest, checkParameter);

/// Check if user-provided timeout is authorized with a valid access-token or
/// lower than the server default. Return an empty optional and send a 403
/// Forbidden HTTP response if the change is not allowed. Return the new
4 changes: 4 additions & 0 deletions src/parser/TripleComponent.h
@@ -175,6 +175,10 @@ class TripleComponent {
}
[[nodiscard]] Variable& getVariable() { return std::get<Variable>(_variant); }

bool isId() const { return std::holds_alternative<Id>(_variant); }
[[nodiscard]] const Id& getId() const { return std::get<Id>(_variant); }
[[nodiscard]] Id& getId() { return std::get<Id>(_variant); }
Comment on lines +179 to +180
Member

We don't use nodiscard by default.
Only when it is really semantically subtle (e.g. RAII types like Lock-Guard).
That clang-tidy rule of "every const method should be nodiscard" doesn't really add much value.


/// Convert to an RDF literal. `std::strings` will be emitted directly,
/// `int64_t` is converted to a `xsd:integer` literal, and a `double` is
/// converted to a `xsd:double`.
3 changes: 2 additions & 1 deletion src/util/http/MediaTypes.cpp
@@ -18,7 +18,7 @@ using enum MediaType;
// specified in the request. It's "application/sparql-results+json", as
// required by the SPARQL standard.
constexpr std::array SUPPORTED_MEDIA_TYPES{
sparqlJson, sparqlXml, qleverJson, tsv, csv, turtle, octetStream};
sparqlJson, sparqlXml, qleverJson, tsv, csv, turtle, ntriples, octetStream};

// _____________________________________________________________
const ad_utility::HashMap<MediaType, MediaTypeImpl>& getAllMediaTypes() {
@@ -40,6 +40,7 @@ const ad_utility::HashMap<MediaType, MediaTypeImpl>& getAllMediaTypes() {
add(sparqlXml, "application", "sparql-results+xml", {});
add(qleverJson, "application", "qlever-results+json", {});
add(turtle, "text", "turtle", {".ttl"});
add(ntriples, "application", "n-triples", {".nt"});
add(octetStream, "application", "octet-stream", {});
return t;
}();
1 change: 1 addition & 0 deletions src/util/http/MediaTypes.h
@@ -28,6 +28,7 @@ enum class MediaType {
tsv,
csv,
turtle,
ntriples,
octetStream
};

22 changes: 22 additions & 0 deletions src/util/http/UrlParser.cpp
@@ -8,6 +8,7 @@

using namespace ad_utility::url_parser;

// _____________________________________________________________________________
std::optional<std::string> ad_utility::url_parser::getParameterCheckAtMostOnce(
const ParamValueMap& map, string_view key) {
if (!map.contains(key)) {
@@ -21,6 +22,27 @@ std::optional<std::string> ad_utility::url_parser::getParameterCheckAtMostOnce(
}
return value.front();
}

// _____________________________________________________________________________
std::optional<std::string> ad_utility::url_parser::checkParameter(
const ParamValueMap& parameters, std::string_view key,
std::optional<std::string> value) {
const auto param = getParameterCheckAtMostOnce(parameters, key);
if (!param.has_value()) {
return std::nullopt;
}
std::string parameterValue = param.value();

// If no value is given, return the parameter's value. If value is given, but
// not equal to the parameter's value, return `std::nullopt`.
if (value == std::nullopt) {
value = parameterValue;
} else if (value != parameterValue) {
return std::nullopt;
}
return value;
}

// _____________________________________________________________________________
ParsedUrl ad_utility::url_parser::parseRequestTarget(std::string_view target) {
auto urlResult = boost::urls::parse_origin_form(target);
7 changes: 7 additions & 0 deletions src/util/http/UrlParser.h
@@ -29,6 +29,13 @@ using ParamValueMap = ad_utility::HashMap<string, std::vector<string>>;
std::optional<std::string> getParameterCheckAtMostOnce(const ParamValueMap& map,
string_view key);

// Checks if a parameter exists, and it matches the
// expected `value`. If yes, return the value, otherwise return
// `std::nullopt`.
std::optional<std::string> checkParameter(const ParamValueMap& parameters,
std::string_view key,
std::optional<std::string> value);

// A parsed URL.
// - `path_` is the URL path
// - `parameters_` is a map of the HTTP Query parameters
2 changes: 2 additions & 0 deletions test/CMakeLists.txt
@@ -437,3 +437,5 @@ addLinkAndDiscoverTest(UrlParserTest)
addLinkAndDiscoverTest(ServerTest engine)

addLinkAndDiscoverTest(ExecuteUpdateTest engine)

addLinkAndDiscoverTest(GraphStoreProtocolTest engine)