Compendium is a processor for generating, unifying and converting different input sources such as AsciiDoc files, Markdown files, Confluence pages and HTML websites into different output formats. You can select all of the content, or only parts of it, from the input files and generate AsciiDoc, Markdown, HTML or PDF as output. The name reflects the goal: a compendium is a composition of concise but detailed information about a particular subject, obtained from different sources, with the aim of being published in a single document, book or other publication.
Compendium uses a JSON config file with two parts, sources and documents, that define where and how to get the desired input data.
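As a sketch, the overall shape of the file is as follows (both parts are described in detail below):

```json
{
  "sources": [],
  "documents": []
}
```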
Compendium currently accepts AsciiDoc, Markdown, Confluence and HTML as input formats.
In this part of the configuration file you define the sources of the input files and their types, and assign a reference ID or name to each of them.
- reference: ID of the source.
- source_type: type of the source (e.g. asciidoc, markdown, url-html, confluence).
- source: URL or path where the information is located (e.g. https://adcenter.pl.s2-eu.capgemini.com/confluence/).
"sources": [
{
"reference": "project1",
"source_type": "asciidoc",
"source": "./test-data/input/input-data1"
},
{
"reference": "project2",
"source_type": "asciidoc",
"source": "./test-data/input/input-data2"
},
{
"reference": "confluence2",
"source_type": "confluence",
"source": "https://adcenter.pl.s2-eu.capgemini.com/confluence/",
"space": "JQ",
"context": "capgemini"
}
]
To read from the Confluence internal network, add these arguments to the source part:

- context: capgemini
- space: space key of the project; all the URLs of the project contain these letters, e.g. for https://adcenter.pl.s2-eu.capgemini.com/confluence/display/HD/2.+Objectives the space is HD.
To read from a private Confluence account, add these arguments to the source part:

- context: external
- space: depends on the account; all the URLs contain a two- or three-letter space key after /<context>/.
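For example, a private-account source entry might look like this (the URL and space key below are illustrative values, not real ones):

```json
{
  "reference": "confluence-private",
  "source_type": "confluence",
  "source": "https://example.atlassian.net/wiki/",
  "space": "ABC",
  "context": "external"
}
```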
- reference: refers to the source reference defined above; it must match that source ID.
- document: file name, or the name/ID of the project page inside the referenced source path (e.g. 6.+Entity+relationship+diagram).
- sections: the section or sections that you want to extract. To extract all of the content of the document, leave this argument blank; to extract specific sections, list them in an array (e.g. sections: [h1, h3]).
"documents": [
{
"reference": "project1",
"document": "manual"
},
{
"reference": "project2",
"document": "brownfox2"
},
{
"reference": "project2",
"document": "paragraph1"
},
{
"reference": "confluence2",
"document": "Jump+the+queue+Home",
"sections": ["Epic 2. Consult the queue"]
}
]
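Putting the two parts together, a minimal complete config file, using one source and one document from the examples above, looks like this:

```json
{
  "sources": [
    {
      "reference": "project1",
      "source_type": "asciidoc",
      "source": "./test-data/input/input-data1"
    }
  ],
  "documents": [
    {
      "reference": "project1",
      "document": "manual"
    }
  ]
}
```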
- AsciiDoc documents:
  - source_type: asciidoc (reads directly from local .adoc documents)
  - source: local path.
- Markdown documents:
  - source_type: markdown (reads directly from local .md documents)
  - source: local path.
- Confluence pages:
  - source_type: confluence
  - source: base URL of the Confluence account
  - context: capgemini (internal network) or external (private Confluence account)
  - space: JQ (project space key)
- HTML pages directly from a website:
  - source_type: url-html
  - source: URL
In the url-html type, the document part has an optional attribute: when the document is an index, Compendium extracts all the links from it and includes the linked pages in the output file, so that all the pages of a site are downloaded. The document has to be unique and consider the following:

- document: index URL
- is_index: true or false (to indicate whether an index has to be read)
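For example, a url-html source with an index document could be configured like this (the URLs below are illustrative):

```json
"sources": [
  {
    "reference": "website1",
    "source_type": "url-html",
    "source": "https://example.com/docs/"
  }
],
"documents": [
  {
    "reference": "website1",
    "document": "https://example.com/docs/index.html",
    "is_index": true
  }
]
```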
Compendium can generate the following output formats:

- HTML
- AsciiDoc
- Markdown
- PDF
Compendium works like a merger and compiler. It gets pieces of information from different sources and formats, merges them into a single file and generates an output file with the desired output format (PDF, HTML, AsciiDoc or Markdown).
In this section the main compilation and merging process is described:
- Lexical Analysis:
  - The sequences of characters from the input files are tokenized in the scanner or tokenization process.
  - TextIn objects transform the input source code into HTML tokenized code, using Asciidoctor.js for the AsciiDoc files or Showdown.js for the Markdown files. Confluence data is recovered via JSON and can be transcoded directly, and HTML files are already tokenized.
- Syntax Analysis:
  - The HTML tokenized code is parsed into a parse tree in the parsing or hierarchical analysis process.
  - TextIn objects parse the XML (HTML) code using html-parser, generating a tree data structure that represents the content. TextIn objects then go through all of the branches of the parsed tree, returning `Array<TextSegment>` elements that will be used to generate the Transcript objects.
- Semantic Analysis:
  - The `Array<TextSegment>` is iterated through filtering functions that remove the unwanted information.
  - At this point all the input information has been parsed and filtered: we have an `Array<TextSegment>` with all the pieces of information from the sources.
- Intermediate Code Generation:
  - The `Array<TextSegment>` generated in the previous steps is now used to create Transcript objects that contain all the TextSegment information.
  - The Transcript elements are an intermediate representation of the input source data that makes it easier to work with the data when generating the output file.
- Code Optimization:
  - In this step, all the Transcript elements are merged by the Merger into a single Transcript. The output is then almost ready to generate a file in the selected output format.
- Output Code Generation:
  - In this final step, the Transcript elements are the input for the TextOut object, which generates the output code in the desired format.
  - The Transcript elements are used to generate AsciiDoc, which is transformed to HTML using Asciidoctor and reused as intermediate code for the other formats: Markdown is generated from the HTML using the Turndown service, PDF from the HTML using htmlto, and AsciiDoc directly from the code generated from the Transcript elements.
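A rough sketch of this pipeline in TypeScript is shown below. The type and method names are illustrative only, not Compendium's actual API; they just mirror the TextIn → TextSegment → Transcript → Merger → TextOut flow described above.

```typescript
// Illustrative types only; Compendium's real interfaces may differ.
interface TextSegment {
  kind: "heading" | "paragraph" | "list" | "table";
  content: string;
}

interface Transcript {
  segments: TextSegment[];
}

// Lexical + syntax + semantic analysis: read one document from a source
// and parse it into a Transcript of filtered TextSegments.
interface TextIn {
  getTranscript(document: string, sections?: string[]): Promise<Transcript>;
}

// Output code generation: turn the merged Transcript into the desired format.
interface TextOut {
  generate(transcripts: Transcript[]): Promise<void>;
}

// Code optimization step: merge all Transcripts into a single one.
function merge(transcripts: Transcript[]): Transcript {
  return { segments: transcripts.flatMap((t) => t.segments) };
}

// Driving the whole pipeline for a list of configured documents,
// assuming inputs[i] is the source referenced by docs[i].
async function compend(inputs: TextIn[], docs: string[], out: TextOut): Promise<void> {
  const transcripts = await Promise.all(
    docs.map((doc, i) => inputs[i].getTranscript(doc))
  );
  await out.generate([merge(transcripts)]);
}
```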