RESTful service receiving json to construct a PDF document to various conformance levels
Standard maven build.
- to package the jar file: mvn clean package
- to run the application execute: java -jar /path/to/jar/app.jar server /path/to/config.yml
  - eg. java -jar target/ms-html-to-pdfa-1.0-SNAPSHOT.jar server src/main/properties/dev.yml
- from the IDE run the uk.gov.dwp.pdfa.application.HtmlToPdfApplication class with program arguments: server path/to/properties.yml
  - eg. server src/main/properties/dev.yml
NOTE: this application accepts environment variables that are picked up at runtime through the configuration file, which is bundled into the container. If https configuration is needed, a modified config.yml must be mounted into the container with the appropriate keystore/truststore locations (see the dropwizard documentation).
server:
  applicationContextPath: ${SERVER_CONTEXT_PATH:-/}
  applicationConnectors:
    - type: ${SERVER_APP_CONNECTOR:-http}
      port: ${SERVER_APP_PORT:-6677}
  adminConnectors:
    - type: ${SERVER_ADMIN_CONNECTOR:-http}
      port: ${SERVER_ADMIN_PORT:-0}
  requestLog:
    type: ${SERVER_REQUEST_LOG_TYPE:-external}
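To illustrate how the `${NAME:-default}` placeholders in the configuration resolve, here is a minimal, stdlib-only sketch. It is a simplified illustration of the lookup rule (use the environment variable when it is set, otherwise the value after `:-`), not the mechanism the service itself uses, which is presumably Dropwizard's own environment variable substitution. The class name `EnvDefaultsSketch` is hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: mimics ${NAME:-default} resolution.
final class EnvDefaultsSketch {
    // Group 1 is the variable name, group 2 is the fallback after ":-".
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\\}");

    static String resolve(String line, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(line);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Prefer the environment value; fall back to the inline default.
            String value = env.getOrDefault(m.group(1), m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Resolves against the real process environment.
        System.out.println(resolve("port: ${SERVER_APP_PORT:-6677}", System.getenv()));
    }
}
```

With no `SERVER_APP_PORT` set, the port line resolves to the default 6677; exporting `SERVER_APP_PORT=8080` before starting the service would be expected to override it.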
A k6 script is included to satisfy a basic load test. By default, this will target the application running on localhost, via the docker hostname host.docker.internal. This can be altered by passing an optional TARGET_HOST environment variable.
Ensure you have the service running, and execute the test as follows:
# Default target: host.docker.internal
docker run --rm -i --name loadtest \
-v $PWD:/k6 \
loadimpact/k6 run - < ./load-test/test.js
# Custom target (must be accessible from within the k6 container)
docker run --rm -i --name loadtest \
-e TARGET_HOST=some-target:8080 \
-v $PWD:/k6 \
loadimpact/k6 run - < ./load-test/test.js
# Change no. virtual users and duration
docker run --rm -i --name loadtest \
-v $PWD:/k6 \
loadimpact/k6 run --vus 20 --duration 5m - < ./load-test/test.js
Default configuration and criteria for satisfying performance thresholds are bundled in the test scripts themselves.
For configuring the tests in the CI pipeline, refer to the official GitLab documentation or underlying template source.
POST /generatePdf: endpoint receiving the information to build the pdf file
{
  "colour_profile": "base64-encoded-file",
  "font_map": {
    "tahoma": "base64-encoded-file",
    "arial": "base64-encoded-file"
  },
  "page_html": "base64-encoded-html",
  "conformance_level": "PDFA_1_A"
}
- colour_profile (optional): the base64 encoded colour profile file contents to be embedded in the pdf. If this value is omitted or null the default colour profile will be applied (src/main/resources/colours/sRBG.icm).
- font_map (optional): a list of fonts to be embedded into the pdf. If the font_map is missing or null then 2 default fonts will be embedded into the document: arial to cover basic fonts and courier to cover monospace requirements. The format for each key/value item is:-
  - key: the name of the font (eg. arial); this must be specified in the html style header using the same format
  - value: the base64 encoded version of the .ttf file contents to be embedded with the file
- page_html (mandatory): the base64 encoded html document.
- conformance_level (optional): the conformance level for the resulting pdf. If this parameter is missing (or null) it will default to PDF_UA, the tightest of all the conformance levels.
The acceptable pdf conformance level values for this service are:-
- PDF_UA (https://en.wikipedia.org/wiki/PDF/UA)
- PDFA_1_A
- PDFA_1_B
- PDFA_2_A
- PDFA_2_B
- PDFA_3_A
- PDFA_3_B
- PDFA_3_U
- NONE
The only mandatory parameter is the base64 encoded html. If only the html is passed, a standard colour profile will be used, arial (standard) and courier (monospace) will be embedded in the pdf, and the conformance level for the pdf will be PDF/UA.
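For callers that do supply their own fonts, the font_map object can be assembled from raw .ttf bytes. The sketch below is illustrative only: the class name `FontMapSketch` is hypothetical, and it assumes font names are simple identifiers (such as arial) that need no JSON escaping.

```java
import java.util.Base64;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class FontMapSketch {
    // Builds the JSON object for "font_map": each key is a font name,
    // each value is the base64 encoded contents of the matching .ttf file.
    static String fontMapJson(Map<String, byte[]> fonts) {
        return fonts.entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\":\""
                + Base64.getEncoder().encodeToString(e.getValue()) + "\"")
            .collect(Collectors.joining(",", "{", "}"));
    }
}
```

In practice the byte arrays would come from something like `Files.readAllBytes(...)` on the font files, and the same names must then be used in the `font-family` rules of the html.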
Returns:-
- 200 :: Success. Returns base64 encoded pdf in the response body
- 400 :: Bad or Malformed json document or json elements. Returns a brief error message as the response body (full error is logged)
- 500 :: Internal error occurred, bad html or conformance levels, font/colour profile embedding. Returns a brief error message as the response body (full error is logged)
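The request and response handling described above could be sketched in Java as follows. This is a minimal client under stated assumptions, not code from this repository: the class name `GeneratePdfClientSketch` is hypothetical, the `Content-Type` header is an assumption (the README does not say whether the service requires it), and any non-200 status is simply surfaced as an exception carrying the brief error message from the response body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical client, for illustration only.
final class GeneratePdfClientSketch {
    // Builds the smallest useful request: the mandatory html plus a conformance level.
    static String buildPayload(String xhtml, String conformanceLevel) {
        String encodedHtml =
            Base64.getEncoder().encodeToString(xhtml.getBytes(StandardCharsets.UTF_8));
        // Base64 text and the enum-style level names need no JSON escaping.
        return "{\"page_html\":\"" + encodedHtml
            + "\",\"conformance_level\":\"" + conformanceLevel + "\"}";
    }

    // Posts the payload and returns the decoded pdf bytes.
    static byte[] generatePdf(URI endpoint, String xhtml, String conformanceLevel)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
            .header("Content-Type", "application/json") // assumption, not confirmed by the README
            .POST(HttpRequest.BodyPublishers.ofString(buildPayload(xhtml, conformanceLevel)))
            .build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // 400 and 500 responses carry a brief error message in the body.
            throw new IOException("HTTP " + response.statusCode() + ": " + response.body());
        }
        // A 200 response body is the base64 encoded pdf.
        return Base64.getDecoder().decode(response.body().strip());
    }
}
```

A call such as `generatePdf(URI.create("http://localhost:6677/generatePdf"), html, "PDFA_1_B")` would then return bytes that can be written straight to a .pdf file.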
For the incoming html there are 3 things to consider.
- The pdf generator requires XHTML, which requires careful closing of tags (https://www.w3schools.com/html/html_xhtml.asp)
- In order to satisfy the font requirements of a PDFA_1_A document, all elements need to reference the font that will be embedded. This is best achieved by adding a <STYLE> element to the <HEAD> of the html and applying it to all items (eg. body). The important point is to make sure that all fonts are explicitly specified in the html document.
- If using images it is best to encode the images directly into the html, eg. <img src="data:image/png;base64,<the-base64-encoded-string-of-the-image>"/>
eg.
<html>
  <head>
    <style>
      pre, code, var {
        font-family: 'courier', serif;
      }
      body {
        font-family: 'arial', serif;
      }
    </style>
  </head>
  <body>
    <h1>hello world</h1>
    <img
      width="250px" height="250px"
      src=""
      alt="base64 encoded embedded image"
    />
  </body>
</html>
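Two of the points above lend themselves to a quick client-side check before calling the service. The sketch below uses only the JDK: `isWellFormed` runs a generic XML well-formedness parse (it is not the validation the service performs, and it does not resolve DTD-defined entities such as `&nbsp;`), and `toDataUri` builds the `data:` value for an inline image. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Base64;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class HtmlPrepSketch {
    // Returns true when the markup parses as well-formed XML (all tags closed and nested).
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, eg. an unclosed <br>
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds the value for <img src="..."/> from raw image bytes.
    static String toDataUri(String mimeType, byte[] imageBytes) {
        return "data:" + mimeType + ";base64,"
            + Base64.getEncoder().encodeToString(imageBytes);
    }
}
```

Running `isWellFormed` over the html before base64 encoding it catches the unclosed-tag case early, rather than through an error response from the service.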
- fonts not embedded correctly :: will result in an error reporting Index: 0, Size: 0 or Index 0 out-of-bounds for length 0 which, whilst not very clear, is because the required font is not present in the embedded list array. All html tags should have an attached font (both normal and monospaced).
- links not fully qualified :: any references to css or images that have relative paths will fail. A full, resolvable URL is required.
- closing tags :: XHTML requires all tags to be terminated; this is easily missed.
GET /version-info: endpoint to return a standard JSON document with build information.
- name: the project.artifactId
- version: the project.version
- build: the jenkins build-number
- build_time: the maven.build.timestamp
Example output is:-
{
  "app": {
    "name": "ms-html-to-pdfa",
    "version": "1.6.0",
    "build": "133",
    "build_time": "2019-09-09T09:58:17Z"
  }
}
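A caller that only needs one value from this document, for example in a smoke test, could extract it as sketched below. This is a deliberately naive, stdlib-only approach that works for the flat, string-valued output shown above; a real client would normally use a JSON library. The class name `VersionInfoSketch` is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class VersionInfoSketch {
    // Finds "name": "value" and returns the value, if the field is present.
    static Optional<String> field(String json, String name) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "{\"app\":{\"name\":\"ms-html-to-pdfa\",\"version\":\"1.6.0\","
            + "\"build\":\"133\",\"build_time\":\"2019-09-09T09:58:17Z\"}}";
        System.out.println(field(sample, "version").orElse("unknown"));
    }
}
```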
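The following will base64 encode the html file contents, call the service, decode the response and write it to a file on *nix based operating systems. The decode flag shown (base64 -D) is the macOS form; GNU coreutils uses base64 -d, and GNU base64 wraps its encoded output at 76 characters unless -w 0 is given.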
curl -m 10 -X POST --data '{"page_html":"'$(cat src/test/resources/successfulHtml.html | base64)'"}' http://localhost:6677/generatePdf | base64 -D > test.pdf
This example will return the current build information
curl http://localhost:6677/version-info
For general information about the CI pipeline on this repository please see documentation at: https://confluence.service.dwpcloud.uk/x/_65dCg
Pipeline Invocation
This CI Pipeline now replaces the Jenkins Build CI Process for the ms-html-to-pdfa.
Gitlab CI will automatically invoke a pipeline run when pushing to a feature branch (this can be prevented using [skip ci] in your commit message if not required).
When a feature branch is merged into develop it will automatically start a develop pipeline and build the required artifacts.
For production releases please see the release process documented at: https://confluence.service.dwpcloud.uk/pages/viewpage.action?spaceKey=DHWA&title=SRE
A production release requires a manual pipeline (to be invoked by an SRE); this is only a release function. Production credentials are required.
localdev Usage
There is no change to the usage of localdev. The gitlab CI Build process creates artifacts using the same naming convention as the old (no longer utilised) Jenkins CI Build process.
Therefore please continue to use branch-develop or branch-f-* (depending on branch name) for proving any feature changes.
Access
While this repository is open internally for read, no one has write access to this repository by default. To obtain access to this repository please contact #ask-health-platform within slack and a member will grant the appropriate level of access.