
Commit

Merge pull request #7 from AKSW/master
Rebasing on AKSW/NSpM.
mommi84 authored May 15, 2019
2 parents 4119a44 + 42364d8 commit bdd2f70
Showing 41 changed files with 110 additions and 25 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -100,6 +100,9 @@ ENV/
# mypy
.mypy_cache/

# macOS
.DS_Store

.idea/

data/*/*
51 changes: 32 additions & 19 deletions README.md
@@ -1,9 +1,9 @@
# 🤖 Neural SPARQL Machines
An LSTM-based Machine Translation Approach for Question Answering.

![alt text](http://www.liberai.org/img/flag-uk-160px.png "English")
![alt text](http://www.liberai.org/img/seq2seq-webexport-160px.png "seq2seq")
![alt text](http://www.liberai.org/img/flag-sparql-160px.png "SPARQL")
![British flag.](http://www.liberai.org/img/flag-uk-160px.png "English")
![Seq2Seq neural network.](http://www.liberai.org/img/seq2seq-webexport-160px.png "seq2seq")
![Semantic triple flag.](http://www.liberai.org/img/flag-sparql-160px.png "SPARQL")

## Code

@@ -23,7 +23,9 @@ Install TensorFlow (e.g., `pip install tensorflow`).

The template used in the paper can be found in a file such as `annotations_monument.tsv`. To generate the training data, launch the following command.

<!-- Create the monument_300 directory first, as it is not present under data/ by default -->
```bash
mkdir data/monument_300
python generator.py --templates data/annotations_monument.csv --output data/monument_300
```
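If the run succeeds, the output directory now holds the parallel corpus. A quick sanity check is sketched below; the `data_300.en` file name is an assumption, mirroring the `data_300.sparql` file used in the next steps.

```bash
# Inspect the generated corpus (data_300.en is assumed to mirror data_300.sparql)
ls data/monument_300/
head -n 2 data/monument_300/data_300.en data/monument_300/data_300.sparql
```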

@@ -35,16 +37,19 @@ python build_vocab.py data/monument_300/data_300.sparql > data/monument_300/voca
```
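The seq2seq model needs a vocabulary for each side of the corpus. Assuming the generator also emitted a `data_300.en` file, the English-side vocabulary can be built the same way, following the usage note in `build_vocab.py`:

```bash
# English-side vocabulary (assumes data_300.en exists alongside data_300.sparql)
python build_vocab.py data/monument_300/data_300.en > data/monument_300/vocab.en
```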

Count lines in `data_.*`
<!-- Fixed the bash error in the NUMLINES assignment (no space allowed after `=`) -->
```bash
NUMLINES= $(echo awk '{ print $1}' | cat data/monument_300/data_300.sparql | wc -l)
NUMLINES=$(echo awk '{ print $1}' | cat data/monument_300/data_300.sparql | wc -l)
echo $NUMLINES
# 7097
```
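As an aside, the same count can be obtained with a single command; this is an equivalent sketch, not part of the original instructions:

```bash
# Simpler equivalent of the line count above
NUMLINES=$(wc -l < data/monument_300/data_300.sparql)
echo $NUMLINES
# 7097
```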

Split the `data_.*` files into `train_.*`, `dev_.*`, and `test_.*` (usually 80-10-10%).

<!-- Changed data.sparql to data_300.sparql for consistency with the previous steps -->
```bash
cd data/monument_300/
python ../../split_in_train_dev_test.py --lines $NUMLINES --dataset data.sparql
python ../../split_in_train_dev_test.py --lines $NUMLINES --dataset data_300.sparql
```
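For orientation, an 80-10-10 split of the 7097 lines counted above comes to roughly 5677/709/709 pairs; the check below is illustrative only and assumes `$NUMLINES` is still set from the previous step.

```bash
# Back-of-the-envelope sizes for an 80-10-10 split (illustrative only)
echo $((NUMLINES * 80 / 100))   # ~5677 training pairs
echo $((NUMLINES * 10 / 100))   # ~709 dev pairs
echo $((NUMLINES * 10 / 100))   # ~709 test pairs
```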

#### Pre-generated data
@@ -53,7 +58,8 @@ Alternatively, you can extract pre-generated data from `data/monument_300.zip` a

### Training

Launch `train.sh` to train the model. The first parameter is the prefix of the data directory. The second parameter is the number of training epochs.
<!-- Note: return to the repository root before training. -->
Now go back to the initial directory and launch `train.sh` to train the model. The first parameter is the prefix of the data directory and the second parameter is the number of training epochs.

```bash
sh train.sh data/monument_300 120000
@@ -69,13 +75,15 @@ Predict the SPARQL sentence for a given question with a given model.
sh ask.sh data/monument_300 "where is edward vii monument located in?"
```

## Paper
## Papers

### Soru and Marx et al., 2017

* Permanent URI: http://w3id.org/neural-sparql-machines/soru-marx-semantics2017.html
* arXiv: https://arxiv.org/abs/1708.07624

```
@proceedings{soru-marx-2017,
@inproceedings{soru-marx-2017,
author = "Tommaso Soru and Edgard Marx and Diego Moussallem and Gustavo Publio and Andr\'e Valdestilhas and Diego Esteves and Ciro Baron Neto",
title = "{SPARQL} as a Foreign Language",
year = "2017",
@@ -84,18 +92,23 @@ sh ask.sh data/monument_300 "where is edward vii monument located in?"
}
```

## Contact

* Neural SPARQL Machines [mailing list](https://groups.google.com/forum/#!forum/neural-sparql-machines).
* Follow the [project on ResearchGate](https://www.researchgate.net/project/Neural-SPARQL-Machines).

### Soru et al., 2018

* NAMPI Website: https://uclmr.github.io/nampi/
* arXiv: https://arxiv.org/abs/1806.10478

## Aman Mehta - [GSoC]

Hi, this is a first commit test on gsoc-aman branch.
Please find my blog [here](https://amanmehta-maniac.github.io) - here you will find details about what this project had to offer.
1. To be able to generate the dataset automatically, there is a five step pipeline which you would have to follow, guided at 'PIPELINE' file.
2. Otherwise you can directly use the data generated under data/place_v2.zip and data/Compositions_v2.zip
```
@inproceedings{soru-marx-nampi2018,
author = "Tommaso Soru and Edgard Marx and Andr\'e Valdestilhas and Diego Esteves and Diego Moussallem and Gustavo Publio",
title = "Neural Machine Translation for Query Construction and Composition",
year = "2018",
journal = "ICML Workshop on Neural Abstract Machines \& Program Induction (NAMPI v2)",
url = "https://arxiv.org/abs/1806.10478",
}
```

## Contact

* Primary contacts: [Tommaso Soru](http://tommaso-soru.it) and [Edgard Marx](http://emarx.org).
* Neural SPARQL Machines [mailing list](https://groups.google.com/forum/#!forum/neural-sparql-machines).
* Follow the [project on ResearchGate](https://www.researchgate.net/project/Neural-SPARQL-Machines).
12 changes: 12 additions & 0 deletions analyse.py
100644 → 100755
@@ -1,3 +1,15 @@
#!/usr/bin/env python
"""
Neural SPARQL Machines - Analysis and validation of translated questions into queries.
'SPARQL as a Foreign Language' by Tommaso Soru and Edgard Marx et al., SEMANTiCS 2017
https://w3id.org/neural-sparql-machines/soru-marx-semantics2017.html
https://arxiv.org/abs/1708.07624
Version 0.1.0-akaha
"""
import argparse
import collections
import json
9 changes: 9 additions & 0 deletions build_vocab.py
@@ -1,5 +1,14 @@
#!/usr/bin/env python
"""
Neural SPARQL Machines - Build the vocabulary.
'SPARQL as a Foreign Language' by Tommaso Soru and Edgard Marx et al., SEMANTiCS 2017
https://w3id.org/neural-sparql-machines/soru-marx-semantics2017.html
https://arxiv.org/abs/1708.07624
Version 0.0.4
Usage: python build_vocab.py data.en > vocab.en
"""
import numpy as np
Binary file added data/movies_300.zip
Binary file not shown.
12 changes: 12 additions & 0 deletions filter_dataset.py
100644 → 100755
@@ -1,3 +1,15 @@
#!/usr/bin/env python
"""
Neural SPARQL Machines - Filter dataset by a given criterion.
'SPARQL as a Foreign Language' by Tommaso Soru and Edgard Marx et al., SEMANTiCS 2017
https://w3id.org/neural-sparql-machines/soru-marx-semantics2017.html
https://arxiv.org/abs/1708.07624
Version 0.1.0-akaha
"""
import argparse
import collections
import json
12 changes: 12 additions & 0 deletions generator_test.py
100644 → 100755
@@ -1,3 +1,15 @@
#!/usr/bin/env python
"""
Neural SPARQL Machines - Generator test unit.
'SPARQL as a Foreign Language' by Tommaso Soru and Edgard Marx et al., SEMANTiCS 2017
https://w3id.org/neural-sparql-machines/soru-marx-semantics2017.html
https://arxiv.org/abs/1708.07624
Version 0.1.0-akaha
"""
import generator
import generator_utils
import operator
18 changes: 12 additions & 6 deletions generator_utils.py
100644 → 100755
@@ -1,3 +1,15 @@
#!/usr/bin/env python
"""
Neural SPARQL Machines - Generator utils.
'SPARQL as a Foreign Language' by Tommaso Soru and Edgard Marx et al., SEMANTiCS 2017
https://w3id.org/neural-sparql-machines/soru-marx-semantics2017.html
https://arxiv.org/abs/1708.07624
Version 0.0.4
"""
import collections
import httplib
import json
@@ -33,12 +45,6 @@ def save_cache ( file, cache ):
with open(file, 'w') as outfile:
json.dump(ordered, outfile)

# proxies = {'http': 'http://proxy.iiit.ac.in:8080/', 'https': 'http://proxy.iiit.ac.in:8080/'}
# proxy_handler = urllib2.ProxyHandler(proxies)
# opener = urllib2.build_opener(proxy_handler)
# urllib2.install_opener(opener)


def query_dbpedia( query ):
param = dict()
param["default-graph-uri"] = GRAPH
Binary file added gsoc/aman/.DS_Store
Binary file not shown.
File renamed without changes.
File renamed without changes.
6 changes: 6 additions & 0 deletions gsoc/aman/README.md
@@ -0,0 +1,6 @@
## Aman: Work done during DBpedia's Google Summer of Code 2018

Hi, please find my blog here: https://amanmehta-maniac.github.io. There you will find details about my contribution to this project, which is based on https://github.com/AKSW/NSpM.

1. To generate the dataset automatically, follow the five-step pipeline described in the 'PIPELINE' file.
2. Otherwise, you can directly use the pre-generated data under `./data/place_v2.zip` and `./data/Compositions_v2.zip` (see the sketch below).
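A minimal unpacking sketch for option 2, assuming the archives expand directly into the `data/` folder (the exact layout inside the zips is an assumption):

```bash
# Unpack the pre-generated GSoC datasets (target layout assumed)
unzip ./data/place_v2.zip -d ./data/
unzip ./data/Compositions_v2.zip -d ./data/
```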
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
12 changes: 12 additions & 0 deletions split_in_train_dev_test.py
100644 → 100755
@@ -1,3 +1,15 @@
#!/usr/bin/env python
"""
Neural SPARQL Machines - Split into train, dev, and test sets.
'SPARQL as a Foreign Language' by Tommaso Soru and Edgard Marx et al., SEMANTiCS 2017
https://w3id.org/neural-sparql-machines/soru-marx-semantics2017.html
https://arxiv.org/abs/1708.07624
Version 0.0.4
"""
import argparse
import random
import os
