Review/rt 2 #3

Open: wants to merge 9 commits into base: main
17 changes: 9 additions & 8 deletions section/abstract.tex
@@ -2,20 +2,21 @@
% Context
Centralization of web information can raise legal and ethical problems, especially in the context of social applications.
% Need
Decentralizing this information offers a potential solution, but maintaining query performance remains a challenge.
Link Traversal Query Processing (LTQP) enables querying in large-scale networks of decentralized data but suffers from long execution times and high data usage,
largely due to the extensive HTTP requests required for network exploration.
Decentralizing this information offers a potential solution, but achieving acceptable query performance remains a challenge.
Link Traversal Query Processing (LTQP) enables querying in large-scale networks of decentralized data but suffers from long execution times and high data transfer,
largely due to the extensive number of HTTP requests required for network exploration.
% Task
This paper introduces a shape-based pruning approach to minimize the search space of traversal queries.
To solve this problem, we introduce a shape-based pruning approach to minimize the search space of traversal queries.
The approach utilizes \emph{shape indexes} provided by data providers in networks of decentralized knowledge graphs to reduce the search space using a \emph{query-shape containment} algorithm.
% Object
This work introduces link pruning in LTQP by formalizing the shape index and query-shape containment approach, and evaluates its impact on the performance of traversal queries.
In this article, we formalize this shape index and query-shape containment approach as a link pruning mechanism for LTQP,
and evaluate its impact on the performance of queries in a social media context.
% Findings
Our findings show that shape-based data summarization can reduce the query execution time and network usage of selective traversal queries by up to 7 times in our benchmark.
Our findings show that shape-based link pruning can reduce the query execution time and network usage of selective queries by up to 7 times.\rt{Can you also mention if there is an increase in server cost? If not, also mention that.}
% Conclusion
This performance gain, achieved without delegating queries to endpoints, makes our approach a strong candidate for handling selective queries in large networks of structured, decentralized knowledge graphs.
Our work shows the benefits of exposing shape-based metadata for handling selective LTQP queries in large networks of structured, decentralized knowledge graphs.

\keywords{Linked data,
\keywords{Linked Data,
Link Traversal Query Processing,
RDF data shapes,
Decentralization,
6 changes: 4 additions & 2 deletions section/conclusion.tex
@@ -2,8 +2,10 @@ \section{Conclusion}

In this article, we minimized the search space for link traversal queries by leveraging a shape index and addressing the query-shape containment problem.
Additionally, we introduced pruning in LTQP by extending the concept of reachability criteria.
Using the Solidbench benchmark, we demonstrated that our approach can improve query execution times by up to 7 times.
Our adaptive method effectively handles scenarios with reduced information in shape indexes or their partial absence in a network.
Using the Solidbench benchmark, we demonstrated that our approach can significantly reduce the number of HTTP requests, which leads to query execution time reductions by up to 7 times.
Our adaptive method effectively handles scenarios with reduced information in shape indexes or their partial absence in a network.\rt{This sentence can go, as it does not say a lot here.}
This study highlights that shape-based pruning can be highly effective for LTQP in decentralized environments with structural properties, especially for selective queries.
These findings are particularly relevant for decentralization initiatives that aim to enable users or third-party clients to perform efficient queries over large, diverse networks.
Future work could explore further advancements in this area, such as enhancing query planning in LTQP~\cite{taelman2024towards} with RDF data shapes.

\rt{I still have some open questions after reading the full article: What is the impact on server load (CPU usage)? Does adding shapes lead to an increase of that? Because if not, you could conclude here by saying that adding shapes adds significant benefits to the client, with no overhead to the server, except for a (possibly offline) process for shape creation/derivation. Also, does adding shapes influence query result arrival times negatively or positively?}
24 changes: 12 additions & 12 deletions section/experiment.tex
@@ -41,24 +41,24 @@ \section{Experimental Setup}
The implementation of our shape index approach is open source~\sepfootnote{sf:implementationComunica}, as well as our query-shape containment solver~\sepfootnote{sf:implementationQueryShapeContainment}.
We use SolidBench~\cite{Taelman2023}, based on the LDBC social network benchmark~\cite{Angles2020}, to evaluate our contribution.
To facilitate this, we created an open-source module~\sepfootnote{sf:shapeIndexGenerator} to generate shape indexes in SolidBench, based on user-provided mappings between ShEx shapes and data model objects.
The shape annotated portion of the data model includes posts, comments on posts, user profiles, cities, and likes.
The datasets are Solid Pods~\cite{Taelman2023}
The shape-annotated portion of the data model includes posts, comments on posts, user profiles, cities, and likes.
The datasets are Solid Pods~\cite{Taelman2023}.\rt{There's probably a better citation for Solid}
Alongside the data, each Solid Pod contains a shape index and separate files for each shape definition.
Some shapes are nested within others.
For example, profiles are associated with cities, and comment are associated with posts.
For example, \rt{user profiles?} profiles are associated with cities, and comments are associated with posts.
Depending on the pod instance, certain data model objects are materialized in a single file, while others are distributed across multiple files.
The entire data model and query templates are available online~\sepfootnote{sf:solidbench}.
The entire data model and query templates are available online~\sepfootnote{sf:solidbench}.\rt{I would say that queries simulate typical (read) actions in social media use cases, perhaps with an example.}

To evaluate our approach, we conducted the following experiments, we first measured the execution time and results of our query-shape containment algorithm using the shapes from the study.
Then, We compared the Solid Pod network optimal traversal algorithm, which uses the LDP specification and type-index~\cite{Taelman2023}, with the LDP traversal algorithm~\cite{Taelman2023} and our shape index approach in a network where each Solid Pods (datasets) provides a complete shape index with the most descriptive shapes.
Additionally, we evaluated the adaptivity of our approach by reducing the shape index information across the network:
To evaluate our approach, we conducted the following experiments. We first measured the execution time and results of our query-shape containment algorithm using the shapes from the study.
Then, we compared the state-of-the-art Solid Pod network traversal algorithm, which uses the LDP specification and type-index~\cite{Taelman2023}, with the LDP traversal algorithm~\cite{Taelman2023} and our shape index approach in a network where each Solid Pod provides a complete shape index with the most descriptive \rt{which shapes are that?} shapes.
Additionally, we evaluated the adaptivity \rt{Not sure adaptivity is the right term here. Something like \emph{shape index variability} may be better.} of our approach by reducing the shape index information across the network:
\begin{itemize}
\item We compared the impact of query execution time in a network where 0\%, 20\%, 50\%, and 80\% of Solid Pods expose a shape index.
\item We compared the impact of using shape indexes with 20\%, 50\%, and 80\% of entries using closed shapes.
\item We compared the impact of using shapes that incorporate only data from the Solid Pods, and shapes providing a minimal dataset description where the object constraints are always an IRI or a literal.
\item Query execution time in a network where 0\%, 20\%, 50\%, and 80\% of Solid Pods expose a shape index. \rt{Why mention a metric here, but not in the next two?}
\item Impact of shape indexes with 20\%, 50\%, and 80\% of entries using closed shapes.
\item Impact of shapes that incorporate only data from the Solid Pods, and shapes providing a minimal dataset description where the object constraints are always an IRI or a literal.\rt{Unclear why this is needed, and what it would look like.}
\end{itemize}
We used query templates from SolidBench, each with five instances varying the starting pod.
Experiments were repeated 50 times with a 2 minute timeout (120,000 ms).
We used query templates from SolidBench, each replicated five times with varying starting pods.
Experiments were repeated 50 times with a 2-minute timeout.
They were conducted on an Ubuntu 20.04.6 LTS machine with a 2x Hexacore Intel E5645 CPU and 24~GB RAM.
All experiments are reproducible, with raw data and complementary materials available online~\sepfootnote{sf:complementaryMaterial}.
