Merge branch 'release/0.1.0'
lukavdplas committed Mar 14, 2024
2 parents 008749c + 69889ea commit 68be9fc
Showing 28 changed files with 1,595 additions and 245 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -29,4 +29,5 @@ venv.bak/

*.egg-info/

dist/
build/
26 changes: 25 additions & 1 deletion CONTRIBUTING.md
@@ -14,4 +14,28 @@ Run unit tests with

```sh
pytest
```

## Writing documentation

Documentation is based on [mkdocs](https://www.mkdocs.org).

### Commands

Start the live-reloading docs server:

```sh
mkdocs serve
```

Build the documentation site:

```sh
mkdocs build
```

Print help message and exit:

```sh
mkdocs -h
```
9 changes: 6 additions & 3 deletions README.md
@@ -1,5 +1,8 @@
# I-analyzer Readers

[![Python package](https://github.com/UUDigitalHumanitieslab/ianalyzer-readers/actions/workflows/python-package.yml/badge.svg)](https://github.com/UUDigitalHumanitieslab/ianalyzer-readers/actions/workflows/python-package.yml)
[![Documentation Status](https://readthedocs.org/projects/ianalyzer-readers/badge/?version=latest)](https://ianalyzer-readers.readthedocs.io/en/latest/?badge=latest)

`ianalyzer-readers` is a python module to extract data from XML, HTML, CSV or XLSX files.

This module was originally created for [I-analyzer](https://github.com/UUDigitalHumanitieslab/I-analyzer), a web application that extracts data from a variety of datasets, indexes them and presents a search interface. To do this, we wanted a way to extract data from source files without having to write a new script "from scratch" for each dataset, and an API that would work the same regardless of the source file type.
@@ -28,11 +31,11 @@ What we find especially useful is that all subclasses of `Reader` have the same

## Usage

*Usage documentation is not yet complete.*
Typical usage of this package would be to make a custom Python class for a dataset from which you want to extract a list of documents. We call this a `Reader`. This package provides the base classes to structure readers, and provides extraction utilities for several file types.

Typical use is that, for each dataset you want to extract, you create a subclass of `Reader` and define required properties. See the [CSV test corpus](./tests/mock_csv_corpus.py) for an example.
For detailed usage documentation and examples, visit [ianalyzer-readers.readthedocs.io](https://ianalyzer-readers.readthedocs.io/en/latest/).

After defining the class for your dataset, you can call the `documents()` method to get a generator of document dictionaries.
If this site is unavailable, you can also generate the documentation site locally; see the [contributing guide](./CONTRIBUTING.md) for instructions.
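
The pattern described in this section (one class per dataset, declared fields, and a `documents()` generator that yields dictionaries) can be illustrated with a minimal, standard-library-only sketch. The `Field` and `SketchCSVReader` classes below are hypothetical stand-ins written for this example; they are not the `ianalyzer-readers` API, whose actual classes and signatures are described in the linked documentation.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List


class Field:
    """Hypothetical stand-in: a field name plus a function that extracts its value from a row."""

    def __init__(self, name: str, extractor: Callable[[Dict[str, str]], object]):
        self.name = name
        self.extractor = extractor


class SketchCSVReader:
    """Hypothetical base class for illustration; NOT the real ianalyzer_readers Reader."""

    fields: List[Field] = []

    def sources(self) -> Iterator[io.StringIO]:
        """Yield open file-like objects; each subclass decides where its data lives."""
        raise NotImplementedError

    def documents(self) -> Iterator[Dict[str, object]]:
        """Yield one dictionary per CSV row, keyed by field name."""
        for source in self.sources():
            for row in csv.DictReader(source):
                yield {field.name: field.extractor(row) for field in self.fields}


# In-memory sample data so the sketch runs without any files on disk.
SAMPLE = "character,line\nHAMLET,Whither wilt thou lead me?\nGHOST,Mark me.\n"


class PlayReader(SketchCSVReader):
    """A dataset-specific reader: it only declares its fields and its sources."""

    fields = [
        Field("character", lambda row: row["character"].title()),
        Field("line", lambda row: row["line"]),
    ]

    def sources(self) -> Iterator[io.StringIO]:
        yield io.StringIO(SAMPLE)


docs = list(PlayReader().documents())
for doc in docs:
    print(doc)
```

The point of the design is that the dataset-specific class stays declarative: it lists what to extract and where the sources are, while the shared base class handles iteration, so the calling code looks the same for every dataset.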

## Licence

37 changes: 37 additions & 0 deletions docs/api.md
@@ -0,0 +1,37 @@
# API documentation

## Core classes

__Module:__ `ianalyzer_readers.readers.core`

::: ianalyzer_readers.readers.core

## CSV reader

__Module:__ `ianalyzer_readers.readers.csv`

::: ianalyzer_readers.readers.csv

## XLSX reader

__Module:__ `ianalyzer_readers.readers.xlsx`

::: ianalyzer_readers.readers.xlsx

## XML reader

__Module:__ `ianalyzer_readers.readers.xml`

::: ianalyzer_readers.readers.xml

## HTML reader

__Module:__ `ianalyzer_readers.readers.html`

::: ianalyzer_readers.readers.html

## Extractors

__Module:__ `ianalyzer_readers.extract`

::: ianalyzer_readers.extract