-
Notifications
You must be signed in to change notification settings - Fork 81
[EN] 6. Package with library
Sometimes submitted programs should not have access to full data in the task. This information must be obtained gradually, through various types of queries. Then, of course, it must not be possible to load such data immediately from the input. So we need a way to interact with the program during its runtime.
Such a possibility is provided by libraries. A library is an external code, written by the author of the task, linked with the participant's solution. The program can then call functions from the library, or vice versa, which gives us the opportunity to interact. This solves our problem, as the library can control participant access to the information given in the test.
Let's take one of the basic problems using Binary Search. Very often the learning process of the above starts with the following problem: "We play the game. I think of the number from 1 to 100, and you can ask, whether your chosen number is smaller, greater or equal. Guess in as few attempts as possible the number I'm thinking about".
Often students write a program that prints questions to the standard output, and the answers are typed into the standard input. This way the program plays the game with its author. Unfortunately, while many platforms can simulate something like this, SIO2 does not have the possibility to interact with the participant's program via standard input/output. However, we still have the possibility, in a slightly modified form, to give participants such a task. This is what the library will be used for!
The participants' task will be to write a program playing the above game in interaction with our library. In order to better assess the correctness of the solutions, we will increase the range of numbers to 10^6. The program will only have a possibility to ask questions "is the number you are thinking of greater than x", and the number of questions will be limited.
Examples of correct solutions are in files gue.cpp and gue2.py.
The library is a separate part of a computer program. There is no main function, but only the functions needed to answer the participant's program questions.
It is also possible for the participant to write the "library" part of the code, while the program with the main function will be provided by the organizers. However, we will not discuss this in this tutorial. It's easy to achieve such an effect using the library type we discussed. All you have to do is define 2 functions in the library: giveQuery and respondToQuery. In this way you can simulate the second type of task with the library, without going into in the technical details associated with this.
The library must be able to do the following things:
- Initialize itself. Initialization includes:
- Loading the input data (only in this way it can have different data for different tests)
- Initialize any variables it will use
- Be able to respond to each of the available queries (there is only one available for our task, but nothing stands in the way for other tasks to have more).
- Ability to quickly return an answer to a question (if the library calculates the answer in 1000 operations, and you can ask 1000 queries, it can get a "Time limit exceeded" verdict just for using the library as intended, which is bad behavior!).
- Checking that the query is correct (for example, whether the number given is not outside the permissible range), if it has not been called too many times, if the initialization function has already been called, but also whether the program shouldn't end now (for example, no reply has been given before), and any potential other requirements from the task statement.
- The library should be implemented in a way, that its operating time is stable and the same for tests of similar size. It's good if all of the queries (apart from initialization/response) worked equally fast, regardless of the arguments given. This allows us, as authors, to better estimate the duration of the participant's program and make better time limits. Also, make sure that the library always uses the same amount of memory!
- Have a function to return the answer.
- It must write the result to the standard output and inform other functions, that they shouldn't be called anymore.
- It can also check if the answer is correct and write it out instead of the result itself. There will be more about the approaches to writing the result later on in this guide.
The statement of our task initially does not differ from other task statements. However, in the problem description it should be noted that the task is interactive.
Then the following sections should appear:
- Communication: In this section the functionality of the library should be described in detail. All the functions should be listed, together with their exact signature and requirements for their calls. It is also worth writing that the user should execute appropriate commands at the beginning of the program to be able to use the library functions. It is also worth reminding that the program cannot read and write from the standard input/output.
- Sample execution of a program: Of course a sample test is needed, but in this format, it has to be given in a completely different way. All function calls made by the sample program must be specified, along with descriptions of what they cause and what they return.
- Grading: This section is not mandatory, but is worth adding, especially when the scoring conditions are not obvious. All additional scoring conditions, specific for this particular task, should be listed here.
- Experiments: We are just learning how to compile and test such a program. If this is not obvious to us, then surely it is not for the participants. That's why this section must explain everything so they can run the program locally.
Final task statement is available in guezad.pdf.
Now it is time to implement all the functionality we need. I'm going to explain the implementation for C++, I'll explain why later. You should focus on the following aspects:
- Efficient and correct responses to queries (we don't want the participant to get less points than he should because of our library).
- Ensuring that the participant correctly calls up functions. It should be taken into account that they can give as a parameter of the function any value and the library has to be ready for it. Also, if you require the init function to be called first and only once, then it must be enforced. If the program does not follow a rule, you can print out an error message to the standard output. Then further behavior of the program does not matter. That is why we usually do not enforce program finish, so any runtime errors are only due to the participant's mistakes, and he gets a wrong answer for incorrect library usage only after a flawless completion.
We don't have to worry that the participant won't call a function for too long, or he'll abuse his memory limits, because he's still going to get the "Time limit exceeded" or "Runtime error" verdict in such cases.
If we want to increase the chances of eliminating wrong solutions, we can write an adaptive library. This technique is most often used when we don't want to award points to randomized solutions.
This technique consists in changing the solution during the runtime of the program in a way, that it continues to be consistent with previous queries, but is punishing new suboptimal queries. Let's use an example where a number from 1 to 100 is guessed. The selected number is 100. The first query asks about 50, response "greater" is returned. The second query is 99. Unluckily, the answer "greater" will immediately reveal the result, and yet the query is not good, because it gives very little information for any chosen number other than 100. Therefore, the hidden number is changed to 51 to "punish" the program for this query. However, it cannot be changed to 1, because that would contradict the answer to the first query.
This way of manipulating the solution is allowed in algorithmic competitions, unless it is specified in the statement, that the solution is determined before the interaction begins and does not change during the runtime of the program. In this task this is not the case, so we will use an adaptive library. We want to eliminate solutions querying with a random element. Knowing how binary search works, we can assume, that on average the program will use a small number of queries. Sometimes it will be unlucky (the solution will be in the bigger half of the range), but sometimes lucky, so there's a risk, that the program will fit within the limit of 20 queries. Our adaptive library will always set the solution in the larger of the two number ranges created by the query. Thanks to this we guarantee that the randomized solution will work non-optimally.
In the example package, a randomized solution gueb.cpp is added. You can see that while it is doing quite well in regular tests, it uses clearly more queries in the adaptive one.
We've got the C++ library ready. What about Python? Theoretically, you have to write a second library in that language. But it's tiring, and the correct implementation of each library requires good programming language skills.
But there is a way to avoid writing the whole library again in Python. We will use the SWIG tool to enable Python programs to use the C++ library. This way we will write one code, and it will solve all our problems. In a nutshell, we will create a guelib.i interface file, and then using the tool, we will generate a guelib.py program that will redirect queries to the correct guelib.cpp library.
This is not a necessary solution, but the easiest and fastest to use. More information about this tool and how to use it can be found here.
Another important question is how to compile the program with the library.
GCC handles such tasks very well,
so we only have to give him a file to link as an argument.
This way, the command:
g++ gue.cpp guelib.cpp -o gue
is enough to compile our program.
However, it is worth configuring our package, so that the solution is automatically compiled with the library. First, makefile.in should be modified. The CXXFLAGS key tells the package which flags it should use to compile local solutions. It's not limited to compilation flags, if we set the value of this key to guelib.cpp, we will be able to correctly use the make run command. It is worth knowing that you can add more flags after a space, however please note that these flags are only for local compilation.
The next configuration applies to the functionality of the package after it is uploaded to the portal. To do this, let's modify the config.yml file. The following two lines should be added:
extra_compilation_args: {'cpp': 'guelib.cpp'}
The first one tells us what files we will need to compile the solution, the second one, what additional arguments are needed for compiling individual languages.
Finally, it is worth remembering that the memory used by the library is included in the memory used by the participant's program. Therefore, it is worth increasing the permitted memory limit in the config by an appropriate value, so that the user can use as much memory as is allowed in the task statement.
At this point, the package should be configured correctly.
Another challenge is safety. There is a chance that the participant will guess the input and output format. Then he can read the input by himself, write out the correct result, and the library won't even be called once. Without any security, we have no way of knowing about such an incident.
In order to eliminate this risk, it is worth to encrypt the input. Even if the participant guesses the format, the numbers read will not tell him anything. And let's remember that when the participant reads the input, the library can't, so the error will be guaranteed. Of course it would be best to use professional encryption, but often this is unnecessary, usually a small change is enough. An example of a good modification is applying the xor operation to all the numbers with a constant known only to us. In the example library N (i.e. the range of numbers) will be xored by 143, and the hidden number by N. (If the library is to work on an adaptive basis, the hidden number will be given as -1 at the input, we don't need to encrypt this information). The decryption of this information is as follows:
cin >> N >> X;
N = N ^ 143;
if (X != -1)
X = X ^ N;
You can also add/subtract a constant to/from the numbers, multiply by two, or apply some other simple reversible function. There is no point in over-efforting, after all, if someone necessarily wants to break our encryption, his large number of submissions should get our attention. I will not talk about professional encryption in this tutorial.
There is no need for encryption when writing to the output, but it's worth showing in some way, that it was our library that wrote the output, not the participant's program. For this purpose, it is worth writing out a certain original exit phrase, for example "GoodJobYouDidEndTheGame!" It can be any phrase, even a random sequence of characters, but it must not contain spaces or other whitespaces. The participant has no chance to guess such phrase, so you will know, that it was the library that wrote the output. It's worth using a checker program, otherwise there is a risk that the participant will see a comment "'ERROR' was outputted and 'OurLongPhrase' was expected". The checker also allows you to adjust the comments that the participant will see in the report in his submission (you can, for example, include information in the comment about the number of queries or an error made by the program). Also often our library can calculate how many points the program should receive and may output that number to standard output.
The last aspect is data storage. If the data is declared globally, the user program will have access to it (and then there's no problem in reading the value of the result variable and answer it). Also, even not intentionally, the user can overwrite such variable, and that could cause big troubles with the library runtime. This is due to the way the link between the participant's program and the library works. The way the compilation works is roughly like this, that it pastes the library code to the beginning of the participant's program. Anyone aware of this can easily use it. Therefore, all the helper functions and variables are defined in an anonymous namespace (i.e. namespace without name). This namespace is only visible inside the file in which it is declared, therefore, the participant's program will not have access to it (despite being linked). Only public functions can be declared outside of it. This protects us from having our variables read or overwritten. A ready library can be found in guelib.cpp. Also, the folder in contains encrypted tests (a calculator is sufficient for encryption and decryption). Note that out tests do not contain any important information, therefore they are not added to the package (they will be generated on the SIO2 platform). However, they are needed for local testing, so before running the program, "make ingen" command should be called.
Summarising:
- We encrypt the input, depending from the task importance, with a xor, simple encryption, or even large encryption algorithms.
- In the first line of output we print out a secret phrase, and in the next ones - potential information from the library.
- We write a simple checker, which checks whether the information are correct and personalizes the result/comment.
- We keep all the data in an anonymous namespace, so that the participant's program does not have access to it.
Sometimes we will want to reduce points linearly depending on how the program works (program result, number of queries used, etc.). This is an original (and not very common) technique of punishing slightly incorrect programs in a mild way. If we want to implement such rule depending on the result/activity of the program, we'll need a checker. In our case, we'll punish the programs, which exceed 20 queries, but not 30. From 30 queries onwards, we will award only half the points. If they use more than 2000 queries, they will get zero points.
Implementation of such functionality is not difficult, requires the consideration of a few cases and one formula.
int points = 100;
if (queries > 30) {
points = 50;
} else if (queries > 20) {
points = (30 - queries) * 5 + 50;
}
It is worth remembering that the number printed in the third line is the percentage of points that the participant gets for a given test.
You can find a ready checker in guechk.cpp.
We have already mentioned that it is good for the participant to compile and run the program on his computer. He'll need a library for that. So it's a good idea to give it to him. We can't give him the original one, because he'll know all the encryption methods, adaptation and other potential secrets. For this purpose, we create a simple version of the library, which only reads the test from the input, simulates the operation (without adaptations), and writes the result to the output (possibly checking its correctness). We do not bother about any encryption, namespace, time optimization or any other functions that are necessary in a model library, but are not needed for local testing. It is also important to make the header file available with the sample library, and a sample program. The sample program must work correctly for the sample test, but (for known reasons) it must be wrong. A good example is a program:
Load the range of numbers. The range is [1, 5] Is it bigger than 4? If so, answer 5. Is it bigger than 3? If not, answer 1. If yes, answer 4. Accidentally, the answer in the sample test coincides with the program answer (even though it is wrong).
It is even worth mentioning in the statement/comments of the program that it is incorrect, but an example-based solution. These three files are packaged in zip (usually called dlazaw) and made available to participants. The folder dlazaw (internationally known as public) can be found here