Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load data to atomspace #20

Merged
merged 8 commits into from
Jul 17, 2018
Merged

Load data to atomspace #20

merged 8 commits into from
Jul 17, 2018

Conversation

Yidnekachew
Copy link
Collaborator

Implemented based on #12

* @param row_num row number
* @return
*/
Handle mk_row_number_cell(const int& row_num)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to pass int by const ref as int is as big as its pointer or smaller.

* @param contin contin value of a cell
* @return
*/
Handle mk_real_cell(const contin_t& contin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to pass contin by ref (it's the same size as a pointer).

* @return
*/
Handle mk_real_cell_exec(const contin_t& contin,
const int& row_num, const string& label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark about passing by ref (string though be passed by ref).

* @param b builtin value
* @return
*/
Handle mk_boolean_cell(const builtin& b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark about builtin ref passing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builtin is an enum. Shouldn't we pass it by ref?

const type_node& type = oTable.get_type();
const string& label = oTable.get_label();
HandleSeq rows;
int row_num = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better if it starts by 0 (contrary to what I've indicated on the issue). It makes the subsequent code slightly simpler and it's a more frequent start index value.

mk_real_cell(boost::get<contin_t>(v)));
rows.push_back(h);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively you can put the conditional in the createLink and select either mk_real_cell or mk_boolean_cell depending on type. It slightly less efficient (though probably imperceptibly) but it reduces the number of lines of code.

Another even better option is to create a mk_cell visitor! that way you just call that mk_cell function on v and you're good. There are plenty of examples of visitors throughout MOSES' code (in table.h for instance). See also https://www.boost.org/doc/libs/1_67_0/doc/html/variant.html for its documentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It slightly less efficient (though probably imperceptibly) but it reduces the number of lines of code.

Will it be imperceptible as we're going to do the checking for every single cell in the data?

Copy link
Collaborator Author

@Yidnekachew Yidnekachew Jul 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, if we're changing it for only the output table, it wouldn't affect much. But, what if we're doing the same for other parts such as this line (https://github.com/opencog/as-moses/pull/20/files#diff-85f2b83d9bc9a6c48aeb22bb8535b034R325)?

* all features have similar type with the output feature.
* @return
*/
Handle mk_input_table(const ITable& iTable, const type_node& type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to pass type_node by ref.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_node is an enum too.

HandleSeq cells;

cells.push_back(mk_boolean_cell(
boost::get<builtin>(table.otable[row_num - 1])));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you start row_num from 0 you wouldn't need to subtract 1.

@ngeiswei
Copy link
Member

@Yidnekachew it is OK if you make you next commits on top the existing one (i.e. not overwrite the git history) I mean the branch is clean enough at this point.

@ngeiswei
Copy link
Member

ngeiswei commented Jul 10, 2018

@Yidnekachew said

Will it be imperceptible as we're going to do the checking for every single cell in the data?

Yes because the branch prediction of the CPU will choose the right function after the first few cycles.

HandleSeq cells;

for (contin_t cell_value : m.get_seq<contin_t>())
cells.push_back(mk_real_cell(cell_value));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you could simplify here as well by having the whole push_back loop wrapped in a visitor, but maybe it's just too much worry. I mean this code is already quite clean as it is.

cells.push_back(mk_boolean_cell(cell_value));

rows.push_back(
similarity ? mk_numbered_row(cells, row_num + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop + 1 in row_num + 1 (in that and the other instances).

for (builtin cell_value : m.get_seq<builtin>())
cells.push_back(
mk_boolean_cell_exec(cell_value, row_num + 1,
labels[cell_number++]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes, I feel that if you wrap that code in a visitor you may simply a lot your code. Maybe like having a visitor for turning a multi_type_seq into a cells vector.

HandleSeq operator()(const T& t) const
{}

int cell_number = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit dangerous to have this here because if you reapply the visitor it will crash. I would suggest to move it in the operators.


for (const multi_type_seq& m : table.itable)
{
HandleSeq cells = boost::apply_visitor(mk_cell_seq_visitor(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, it terms of performance, it might be better to create the visitor out of the loop. But frankly it's not even sure as both the compiler and system (CPU branch prediction) optimize that sort of things. Just keep that in mind for the day you want to speed up that function (although it's unlikely it'll be problematic).

* @param table
* @return
*/
Handle mk_unfolded_boolean_table_eval(const Table& table)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

{
mk_exec_cell_seq_visitor cseqv;
cseqv.row_num = row_num;
cseqv.labels = labels;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think C++11 allows you to write directly

mk_exec_cell_seq_visitor cseqv{row_num, labels};

{}

int row_num;
string label;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better be a const string&

{}

int cell_number = 0;
HandleSeq cellSeq;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better move this in the operators as well.

int cell_number = 0;
HandleSeq cellSeq;
int row_num;
vector<string> labels;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better be a const ref.

@@ -0,0 +1,474 @@
/** atomese_representation.cc ---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need name update. Your IDE likely offers a way to find and replace a string across multiple files, if not you can go old school with sed (though interactivity is usually better handled via an IDE).

mk_exec_cell_seq_visitor(const int row_num,
const vector<string>& labels)
: row_num(row_num), labels(labels)
{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need to define that constructor, C++11 initializer will work regardless.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unable to do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, they need to be public for it to work. You may either use struct instead of class or declare them public.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a struct now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, so now you should be able to remove mk_exec_cell_visitor and mk_exec_cell_seq_visitor constructors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was already struct and doesn't work if constructors removed.

Copy link
Member

@ngeiswei ngeiswei Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, C++11 aggregate initialization is very stringent, see https://en.cppreference.com/w/cpp/language/aggregate_initialization for more info.

It actually works with -std=c++17 but then this produces other problems with the cogutil tree class. And C++17 is too recent to force users to use a compiler compatible with it.

So you're gonna have to let it here, but you may add a comment that such a constructor isn't needed for C++17.

using namespace opencog;
using namespace std;

class atomese_reprUTest : public CxxTest::TestSuite {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update that unit test name (and the filename it's in) to follow the functions it is testing), load_tableUTest would do.

@@ -0,0 +1,238 @@
/** atomese_representation.h ---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update

*/

#ifndef _OPENCOG_ATOMESE_REPRESENTATION_H
#define _OPENCOG_ATOMESE_REPRESENTATION_H
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update

@ngeiswei
Copy link
Member

Last thing @Yidnekachew please fix the warnings (by having the default visitor operators return something, like Handle::UNDEFINED or {}).

@ngeiswei ngeiswei merged commit c038445 into opencog:master Jul 17, 2018
@ngeiswei
Copy link
Member

I've merged, it seems alright, including the git history.

Thanks for your patience and for putting up with my stringent requirements.

@Yidnekachew
Copy link
Collaborator Author

@ngeiswei It's an honor to get your comments!

@Yidnekachew Yidnekachew deleted the load-data-to-atomspace branch July 17, 2018 19:40
ngeiswei added a commit to ngeiswei/asmoses that referenced this pull request Nov 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants