Home
A user can submit information to NAV, e.g. to apply for a benefit (send in an application), to submit documents missing from a previously submitted application, or just to send a list of documents. A submission always contains metadata such as userId, title and tema, and one or more documents. Submissions are sent to NAV and archived before automatic or manual case management. This text describes the services responsible for taking users' submissions and putting them in NAV's archive (Joark).
There are three services that perform the task of archiving the documents in Joark: Soknadsmottaker, Soknadsarkiverer and Innsending-api (the green components in the diagram below). Innsending-api receives submission metadata and documents from the frontend services FyllUt and Sendinn-frontend (marked yellow in the diagram below). Note that, in addition to what is shown in the diagram, Innsending-api also has an external API that is currently used by Sykepengesoknad-backend. Innsending-api stores metadata and documents related to user submissions in its database (Db in the diagram).
User documents are uploaded and stored in Innsending-api's database. When the user sends in the submission, metadata (with user identification, tema, title and list of document ids) is sent by Innsending-api to Soknadsmottaker, which in turn publishes the submission's metadata on the Kafka topic called submission in the diagram. If the metadata is successfully published, Innsending-api sends a done message to Soknadsmottaker in order to cancel the submission's user notification. Soknadsmottaker publishes the done message to the topics called Utkast and Oppgave in the diagram. Note that the Utkast and Oppgave topics are not maintained by the fyllut-sendinn team and are marked in grey.
Soknadsmottaker serializes the metadata into Avro format and puts it on the submission Kafka topic. It uses the schema definitions in the repo soknadarkiv-schema for this. The rationale for going through a Kafka topic is that it decouples the Joark archive from the systems that the user interacts with (FyllUt and Sendinn-frontend). The archiving operation might, depending on the size and number of documents, take more than a minute. With the topic in between, Joark can be down without preventing the user from sending in their documents, and for the user the send-in operation normally takes less than a second. The documents that the user sends in can potentially be quite large, which Kafka is not suited for handling. Therefore only the metadata is put on the Kafka topic, while the document files themselves are temporarily stored in Innsending-api's database.
Soknadsarkiverer subscribes to the submission topic. It first checks whether the submission is already archived, using a REST call to SAF. If it is not archived, Soknadsarkiverer fetches the documents from Innsending-api using GET requests, based on the document ids in the submission's metadata. Note that each document in Innsending-api might consist of multiple PDF files. These files are merged into one PDF file and returned to Soknadsarkiverer. When all documents are fetched, a POST request with the metadata and documents is sent to Joark. If this succeeds, a success message is published on the message topic. Should Soknadsarkiverer fail to perform any of its tasks, it retries multiple times with increasing delay. If archiving still fails after the retry limit is reached, an archiving failed message is published on the message topic. Submissions that have failed to be archived require manual handling.
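The check-then-archive flow with retries described above can be sketched roughly as follows. This is a minimal illustration, not the actual Soknadsarkiverer code: the function name, the callables standing in for the SAF lookup and the Joark call, the returned status strings and the delay values are all assumptions made for the example.

```python
import time
from typing import Callable, Sequence


def archive_with_retry(
    already_archived: Callable[[], bool],  # hypothetical stand-in for the SAF lookup
    archive: Callable[[], None],           # hypothetical stand-in for fetch + POST to Joark
    delays_seconds: Sequence[float] = (1, 5, 30, 120),  # assumed values, increasing
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try to archive once per attempt, waiting longer between attempts.

    Returns "ALREADY_ARCHIVED", "ARCHIVED" or "FAILED".
    """
    attempts = len(delays_seconds) + 1
    for attempt in range(attempts):
        try:
            # Every attempt starts with the lookup, so a retry after a
            # timeout does not archive the same submission twice.
            if already_archived():
                return "ALREADY_ARCHIVED"
            archive()
            return "ARCHIVED"
        except Exception:
            if attempt == attempts - 1:
                # Retry limit reached: the caller publishes a failed message.
                return "FAILED"
            sleep(delays_seconds[attempt])
    return "FAILED"
```

Running the lookup at the start of every attempt is what makes a retry safe when an earlier attempt timed out but the archive actually completed the operation.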
Innsending-api subscribes to the message topic and updates the submission's database entry with the archiving status. If archiving has failed, an alert is raised and sent to the Slack channel team-sendinn-fyllut-alert. If Innsending-api receives no response from Soknadsarkiverer for a sent-in submission, it raises another alert on the same Slack channel.
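The two alert conditions can be illustrated with a small sketch. It is only an assumption about how such a check could look: the Submission class, its field names, the status strings and the one-hour waiting period are hypothetical and not taken from Innsending-api.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Submission:
    innsending_id: str
    sent_at: datetime
    archiving_status: Optional[str] = None  # None until a message has arrived


def submissions_needing_alert(
    submissions: List[Submission],
    now: datetime,
    max_wait: timedelta = timedelta(hours=1),  # assumed threshold
) -> List[Tuple[str, str]]:
    """Return (innsending_id, reason) for each submission that should alert."""
    alerts = []
    for s in submissions:
        if s.archiving_status == "FAILED":
            alerts.append((s.innsending_id, "archiving failed"))
        elif s.archiving_status is None and now - s.sent_at > max_wait:
            alerts.append((s.innsending_id, "no response from Soknadsarkiverer"))
    return alerts
```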
Reasons for failing can be timeouts either when fetching documents from Innsending-api or when posting to Joark. Timeouts can be caused by Innsending-api taking a long time to merge files into one PDF document because of corrupted PDF files, or by slow responses from Joark. We have previously experienced that Joark took a long time to respond because it checks whether PDFs are in PDF/A-1b format. When this occurs, Joark might finish the archiving operation after the timeout in Soknadsarkiverer. For this reason, each retry in Soknadsarkiverer starts by calling SAF to check whether the submission is already archived, and skips the submission if it is. We also set quite high timeouts to avoid retries. To avoid too large submissions, each document is limited to 50 MB and the total submission size is limited to 150 MB. If one of the files uploaded by the user is corrupt and causes archiving errors, it might be necessary to fix and replace this file in Innsending-api before initiating a retry. The Innsending-admin application (not shown in the diagram) can be used to check whether a submission is archived and to initiate a retry.
In addition to the three services (Soknadsmottaker, Soknadsarkiverer and Innsending-api, marked in green in the diagram) that take care of archiving an application to Joark, there are end-to-end and load tests that run in the development environment. These are used to check that the archiving service is not broken when changes are made to any of the services.
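As an illustration of the size limits mentioned above, a check could look like the sketch below. The function name and messages are hypothetical, and interpreting 1 MB as 1024 * 1024 bytes is an assumption; the text only states the limits as 50 MB and 150 MB.

```python
from typing import Dict, List

MB = 1024 * 1024  # assumption: the text does not say whether MB means 10^6 or 2^20 bytes
MAX_DOCUMENT_BYTES = 50 * MB
MAX_SUBMISSION_BYTES = 150 * MB


def size_violations(document_sizes: Dict[str, int]) -> List[str]:
    """Return a message for each limit the submission breaks.

    document_sizes maps a document id to its size in bytes.
    """
    violations = []
    for doc_id, size in document_sizes.items():
        if size > MAX_DOCUMENT_BYTES:
            violations.append(f"document {doc_id} exceeds the per-document limit")
    if sum(document_sizes.values()) > MAX_SUBMISSION_BYTES:
        violations.append("submission exceeds the total size limit")
    return violations
```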
There are four Kafka topics in use by the system when archiving submissions (marked in light green in the diagram). When Soknadsmottaker is called, it uses the innsendingId field as a unique key, which connects the Kafka events for the same submission across the different topics.
- Submission: Whenever Soknadsmottaker receives a POST request, it puts the data as an event on this topic. Soknadsarkiverer is the consumer. This is the communication channel between the two applications.
- Processing Event Log: Soknadsarkiverer is internally driven by a Processing Event Log. When Soknadsarkiverer receives an event on the input topic, it creates a RECEIVED event on the Processing Event Log. This triggers its internal engine to schedule a task that will begin to process the event. Only a few tasks can be processed at the same time, which creates a buffer so that the systems are not flooded with requests. When a task begins, a STARTED event is created on the Processing Event Log. Upon any errors in the processing, Soknadsarkiverer reattempts several times, waiting longer and longer between the tries. If Soknadsarkiverer succeeds in fetching data from Innsending-api and sending it to the archive, it creates a FINISHED event on the Processing Event Log, and the processing is considered finished. (An ARCHIVED event was created at this point earlier, but it is no longer used.) If Soknadsarkiverer is unable to send to the archive after several attempts, it gives up and creates a FAILED event on the Processing Event Log.
- Metrics topic: Both Soknadsarkiverer and Soknadsmottaker publish metrics to this topic, with data on how long operations take. It is used by the load tests, but it is not required for the archiving operation.
- Message topic: Soknadsarkiverer publishes events to this topic during processing, ending by publishing either an archiving success or an archiving failed status. Innsending-api subscribes to this topic and updates the submission's database entry with the archiving status.
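To make the Processing Event Log easier to picture, the sketch below folds a list of (key, event) pairs into the latest state per innsendingId. The event names come from the description above; the function, the tuple layout and the treatment of FINISHED and FAILED as final are assumptions for illustration, not the real Soknadsarkiverer implementation.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class EventType(Enum):
    RECEIVED = "RECEIVED"
    STARTED = "STARTED"
    ARCHIVED = "ARCHIVED"  # mentioned in the text as used earlier
    FINISHED = "FINISHED"
    FAILED = "FAILED"


# Assumption: once one of these is logged, later events for the key are ignored.
TERMINAL = {EventType.FINISHED, EventType.FAILED}


def state_by_key(log: Iterable[Tuple[str, EventType]]) -> Dict[str, EventType]:
    """Fold the event log into the current state of each innsendingId."""
    states: Dict[str, EventType] = {}
    for key, event in log:
        if states.get(key) in TERMINAL:
            continue
        states[key] = event
    return states
```

A key that has several STARTED events in a row corresponds to a submission that has been retried.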
The archival systems are spread over three services and an additional three repos.
Services:
- Innsending-api - stores the metadata of users' submissions and their documents as files in a database.
- Soknadsmottaker - puts events on submission Kafka topic.
- Soknadsarkiverer - reads events from the submission Kafka topic, retrieves the documents from Innsending-api and sends it all to Joark.
Other repos:
- soknadarkiv-schema - Avro schema definitions for the Kafka messages.
- archiving-infrastructure - scripts and Docker configuration to start all services and their dependencies locally in Docker. Also contains end-to-end and load tests of the whole system, as well as a mock service that acts as a stand in for Joark, for use during testing.
- arkiv-mock - A mock of the archive (Joark), that is used by the end-to-end and load tests.
The entire system can be run locally. However, as of summer 2024 Innsending-api is not part of the test suite; a test version of the deprecated soknadsfillager is used instead as the source from which Soknadsarkiverer fetches documents. The easiest way to run the system is to use Docker. Check out all the above-mentioned repositories in the same directory and follow the instructions in each repository for how to build it. The entire system can then be started from the scripts in archiving-infrastructure. See its documentation for more details.
To define a new Kafka topic or change an existing one, run the command:
- kubectl apply -f <topic-configuration-and-acl.json>
PS. Note that it is NOT possible to change the number of partitions on an existing topic. Currently the topics are set up with 12 partitions. Each of the archiving services normally runs on 2, 3 or 4 pods, and since 12 is divisible by each of these numbers, the load is balanced evenly between the pods.
See https://github.com/navikt/soknadarkiv-schema/tree/main/topicconfig for examples of *_v2.json configuration files that can be used.
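The reasoning behind 12 partitions can be checked with a few lines of arithmetic. The sketch assumes that the consumer group spreads partitions over the pods as evenly as possible; the function name is hypothetical.

```python
from typing import List


def partitions_per_pod(partitions: int, pods: int) -> List[int]:
    """Number of partitions each pod gets when spread as evenly as possible."""
    base, extra = divmod(partitions, pods)
    # The first `extra` pods get one partition more than the rest.
    return [base + 1 if i < extra else base for i in range(pods)]


for pods in (2, 3, 4, 5):
    print(pods, partitions_per_pod(12, pods))
```

With 2, 3 or 4 pods every pod gets the same number of partitions, while 5 pods would give some pods 3 partitions and others 2.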
In order to list configuration of an existing topic, e.g. privat-soknadinnsending-v2-dev:
- kubectl describe topic privat-soknadinnsending-v2-dev -n team-soknad
If it is necessary to change the topics used by soknadsmottaker/soknadsarkiverer, follow these steps:
1. Configure the new topics using the commands listed in the previous chapter.
2. Update the relevant topics in soknadsmottaker.
3. Observe using the logs that new applications are published to the new topic, and that soknadsarkiverer no longer receives applications on the old topic.
4. Re-deploy soknadsarkiverer (still using the old topics) in order to check that all published applications are received and archived.
5. Update the soknadsarkiverer configuration with the new topics. Remember to update groupId to reflect the update.
6. Deploy soknadsarkiverer and observe using the logs that applications published by soknadsmottaker on the new topic are received and archived.