[Draft] first version of DynamicAlgorithm class + demo algorithm to demonstrate #395

rpuwa · 2024-04-21T17:47:20Z

No description provided.

BUYT-1

This isn't what I meant. The Algorithm class is designed so that its LoadData method is called once and Execute is then called multiple times. For dynamic algorithms, I therefore suggest running the algorithm for the first time in LoadData and processing updates in Execute. Because the method names would then no longer make sense, I also suggested renaming them: LoadData -> Initialize, Execute -> Process, along with their Python bindings (.def("execute", ...) -> .def("process", ...)).

The workflow with the Python constructor doing the initialization is for the future. That's what Algorithm's bindings should be doing. It would be nice for it to be implemented here, but if not, then that's fine.

So, again, here is what I think should be done for now: execute the dynamic algorithm in LoadData, process updates in Execute. Rename the methods and their bindings as above. Leave everything else as is.
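To illustrate the call pattern, here is a minimal sketch. RowCounterSketch is a hypothetical stand-in, not the project's Algorithm class, and its "result" is just a row count; the only point is that LoadData does the first full run once and Execute handles each later batch.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dynamic algorithm; not the project's Algorithm class.
class RowCounterSketch {
public:
    // Called once: take the initial table and run the first full computation.
    void LoadData(std::vector<std::string> initial_rows) {
        rows_ = std::move(initial_rows);
        result_ = rows_.size();
    }

    // Called many times: apply one batch of inserted rows and refresh the result.
    std::size_t Execute(std::vector<std::string> const& inserted_rows) {
        rows_.insert(rows_.end(), inserted_rows.begin(), inserted_rows.end());
        result_ = rows_.size();
        return result_;
    }

    std::size_t GetResult() const { return result_; }

private:
    std::vector<std::string> rows_;  // kept in memory only to keep the sketch short
    std::size_t result_ = 0;
};
```

After the rename, LoadData would become Initialize and Execute would become Process, with the same roles.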

BUYT-1 · 2024-05-03T11:57:08Z

With the changes above, the DynamicAlgorithm class will not be needed. The methods for getting results should be moved to a class analogous to FdAlgorithm (DynamicFdAlgorithm?) and named accordingly. Other dynamic FD mining algorithms can then inherit from that class. Bindings should then look similar to the FD algorithms' bindings, where DynamicFdAlgorithm is bound by itself, with its result methods, and concrete algorithms are bound as its inheritors.
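A rough sketch of that hierarchy, under assumptions: every name below (DynamicFdAlgorithmSketch, GetFds, ProcessUpdate, DemoDynamicFdSketch) is hypothetical, FDs are represented as plain strings, and the update logic is a placeholder. It only shows the result accessor living in the shared base while concrete algorithms inherit from it.

```cpp
#include <string>
#include <vector>

// Hypothetical FD representation: a plain string such as "A -> B".
using FdSketch = std::string;

// Hypothetical base for dynamic FD miners: owns the result accessor.
class DynamicFdAlgorithmSketch {
public:
    virtual ~DynamicFdAlgorithmSketch() = default;

    // Result method shared by every dynamic FD miner.
    std::vector<FdSketch> const& GetFds() const { return fds_; }

    // Each concrete algorithm supplies its own update logic.
    virtual void ProcessUpdate(std::vector<std::string> const& inserted_row) = 0;

protected:
    std::vector<FdSketch> fds_;
};

// Hypothetical concrete algorithm inheriting the result methods.
class DemoDynamicFdSketch final : public DynamicFdAlgorithmSketch {
public:
    void ProcessUpdate(std::vector<std::string> const& inserted_row) override {
        (void)inserted_row;  // the placeholder logic ignores the row contents
        // Record one dummy entry per processed row.
        fds_.push_back("row" + std::to_string(fds_.size()) + " -> processed");
    }
};
```

The bindings would mirror this: the base is bound once with GetFds, and each concrete class is bound as a subclass of it.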

BUYT-1 · 2024-05-18T15:57:26Z

src/core/algorithms/dynamic/dynamic_algorithm.cpp

+    insert_statements_.Clear();
+    delete_statements_.clear();


This can be achieved if we just provide empty values as defaults for these options. The user has to set all options before executing the algorithm, so they would inevitably become empty.

These fields will not be empty, because they are not options.

Oh, okay. They store the inserted and deleted lines, right? Isn't it wasteful to store them entirely in memory, though? I think these things should be managed by the algorithm.

BUYT-1 · 2024-05-18T15:58:49Z

src/core/algorithms/dynamic/dynamic_algorithm.cpp

+    if (data->GetNumberOfColumns() != input_table_->GetNumberOfColumns()) {
+        throw config::ConfigurationError(
+            "Invalid data received: the number of columns in the \
+            modification statements is different from the table.");
+    }


The Option class supports validation and normalization of values, default values, and conditional options that are only shown when some condition holds (the condition is expected to depend on other option values). The basic checks can be done with those facilities. MetricVerifier uses all of Option's methods, so you can look there for an example. But option management shouldn't be in the base class anyway: some algorithms may not support deletion, yet the option would still be visible here.

BUYT-1 · 2024-05-18T16:08:41Z

src/core/model/table/table_row.h

+    static int CreateId() {
+        static int id = 1;
+        return id++;
+    }


This makes row IDs unpredictable for the user. They could have several algorithms running, and a shared counter would make row numbers inconsistent across them.
Don't make row IDs static. Row IDs should be indices of rows in the table tracked by a particular algorithm object (which may not actually store the whole table in memory). So I doubt the row itself needs to remember its own ID.
Also, using IDs like that will make some checks easier. For example, an algorithm will only need to store the total number of rows to check whether a row can be deleted. It will also be possible to record which rows have been deleted in a bitset.
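A minimal sketch of that bookkeeping, assuming a hypothetical per-algorithm helper (RowTrackerSketch and its method names are made up for illustration). Row IDs are plain indices, the row count bounds the valid IDs, and std::vector<bool> plays the role of a growable bitset.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: tracks which row indices of one algorithm's table are live.
class RowTrackerSketch {
public:
    explicit RowTrackerSketch(std::size_t initial_rows) : deleted_(initial_rows, false) {}

    // A new row gets the next index; returns that index.
    std::size_t Insert() {
        deleted_.push_back(false);
        return deleted_.size() - 1;
    }

    // A row can be deleted only if its index is in range and it is still live.
    bool CanDelete(std::size_t row_id) const {
        return row_id < deleted_.size() && !deleted_[row_id];
    }

    // Marks the row as deleted; returns false if the ID is invalid or already deleted.
    bool Delete(std::size_t row_id) {
        if (!CanDelete(row_id)) return false;
        deleted_[row_id] = true;
        return true;
    }

private:
    // Used as a growable bitset: one flag per row ever seen by this object.
    std::vector<bool> deleted_;
};
```

Because the state belongs to the object, two algorithm instances number their rows independently, which is what a static counter cannot provide.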

BUYT-1 · 2024-05-18T16:36:41Z

src/core/algorithms/dynamic/dynamic_algorithm.cpp

+    // configure update statements
+    ValidateDeleteStatements(update_old_batch_);
+    ValidateInsertStatements(update_new_batch_);
+    if (insert_statements_.Size() != delete_statements_.size()) {


It would make more sense to have a single update option instead: either an object that yields (id, new_data) pairs, in a manner similar to IDatasetStream, or just a vector of pairs. This option is one of the cases where fully checking its value up front (for example, checking that all the rows have the correct length) might not be feasible.
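Something along these lines, as a sketch only: IUpdateStreamSketch, VectorUpdateStreamSketch and ConsumeUpdates are hypothetical names, and the interface is not IDatasetStream's actual API. It shows a pull-style source of (id, new_data) pairs whose rows are checked one at a time as they are consumed, rather than all at once when the option is set. It needs C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using RowData = std::vector<std::string>;
using RowUpdate = std::pair<std::size_t, RowData>;  // (row id, new values)

// Hypothetical pull-style interface; not the project's IDatasetStream.
class IUpdateStreamSketch {
public:
    virtual ~IUpdateStreamSketch() = default;
    // Returns the next update, or std::nullopt when the stream is exhausted.
    virtual std::optional<RowUpdate> Next() = 0;
};

// Simplest implementation: backed by an in-memory vector of pairs.
class VectorUpdateStreamSketch final : public IUpdateStreamSketch {
public:
    explicit VectorUpdateStreamSketch(std::vector<RowUpdate> updates)
        : updates_(std::move(updates)) {}

    std::optional<RowUpdate> Next() override {
        if (pos_ == updates_.size()) return std::nullopt;
        return std::move(updates_[pos_++]);
    }

private:
    std::vector<RowUpdate> updates_;
    std::size_t pos_ = 0;
};

// Pulls updates one by one and checks each row's width as it arrives.
// Returns the number of updates seen, or std::nullopt if a row has the wrong width.
std::optional<std::size_t> ConsumeUpdates(IUpdateStreamSketch& stream,
                                          std::size_t num_columns) {
    std::size_t applied = 0;
    while (auto update = stream.Next()) {
        if (update->second.size() != num_columns) return std::nullopt;
        ++applied;  // a real algorithm would apply the update here
    }
    return applied;
}
```

An empty stream is a valid value here, so "no updates" needs no special case.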

BUYT-1 · 2024-05-18T16:47:01Z

src/core/algorithms/dynamic/dynamic_algorithm.cpp

+bool algos::DynamicAlgorithm::HasBatch() {
+    bool result = false;
+    for (const std::string_view& option_name : kCrudOptions) {
+        result |= IsOptionSet(option_name);
+    }
+    return result;
+}


This is already taken care of by the configuration system. Default values can be set by Option to be empty, and we'd just have no rows/IDs/pairs in the relevant field.

BUYT-1 · 2024-05-18T17:20:52Z

src/core/algorithms/dynamic/dynamic_algorithm.cpp

+}
+
+void algos::DynamicAlgorithm::MakeExecuteOptsAvailable() {
+    if (is_initialized_) {


This check does nothing: the existing configuration only calls this method after LoadData has been called.

BUYT-1 · 2024-05-18T17:23:24Z

src/core/algorithms/dynamic/dynamic_algorithm.cpp

+    for (const std::string_view& option_name : kCrudOptions) {
+        previous_options.erase(option_name);
+    }


This makes these options invisible in a GetNeededOptions call, but they should be visible.

BUYT-1 · 2024-05-18T17:46:19Z

src/python_bindings/dynamic/bind_dynamic_algorithms.cpp

+                for (const std::string_view& option_name : kCrudOptions) {
+                    SetOptionByName(algo, option_name, kwargs);
+                }


This is just bypassing the existing configuration infrastructure.

BUYT-1 · 2024-05-18T18:10:18Z

src/core/algorithms/dynamic/dynamic_algorithm.cpp

+unsigned long long algos::DynamicAlgorithm::ExecuteInternal() {
+    if (HasBatch()) {
+        ConfigureBatch();
+        unsigned long long time_ms = ProcessBatch();


ProcessBatch should just be ExecuteInternal for every algorithm (unless there are a lot of common things happening, but that's for the future).

BUYT-1 · 2024-05-18T18:14:12Z

src/python_bindings/dynamic/bind_dynamic_algorithms.cpp

+        .def(py::init([](py::kwargs const& kwargs) {
+                auto algo = std::make_unique<DynamicAlgorithmDemo>();
+                ConfigureAlgo(*algo, kwargs);
+                algo->LoadData();
+                return algo;
+        }))


Leave the constructor as is for now. Use RegisterAlgorithm.

rpuwa added 5 commits April 21, 2024 20:45

dynamic algorithms: kwargs in python bindings not working

e28e475

process binding fix + getting opt value by iterator

c720c65

crud options registration + algo changes

d8eda19

correct initialization of algorithm

cd44d47

process batch fixes, example works now

2c80297

BUYT-1 reviewed May 3, 2024

rpuwa and others added 3 commits May 17, 2024 23:07

deletion by index of row + Process->Execute and Initialize->LoadData …

6550752

…logic transfer

Merge branch 'main' into dynamic_algorithm_draft

3b9e7e3

build fixes

d8b8264

rpuwa force-pushed the dynamic_algorithm_draft branch from aec1696 to d8b8264 May 17, 2024 21:01

BUYT-1 requested changes May 18, 2024
