[feature] views::char_strictly_to #2898
This pull request is being automatically deployed with Vercel. 🔍 Inspect: https://vercel.com/seqan/seqan3/3vgWuKpASAkbALqQj1iZXPwNpifK
Codecov Report
@@            Coverage Diff             @@
##           master    #2898      +/-   ##
==========================================
- Coverage   98.28%   98.28%   -0.01%
==========================================
  Files         266      267       +1
  Lines       11455    11459       +4
==========================================
+ Hits        11259    11262       +3
- Misses        196      197       +1
I don't know why Codecov claims that coverage is reduced.
LGTM, just some spelling and style things.
Force-pushed from 01b434a to 4174626.
I don't know if this was discussed, but is this REALLY necessary? What's the use case, and what are the semantics? This view mixes input validation with transformation, and it should be a niche use case for most users. Most users will get their sequences by reading in a file with our IO. Of course our IO should provide some level of diagnostics if the characters read in are wrong. But should this API really be public in the alphabet module? If you want to use it in IO, please just make it detail for now. I would even argue that the current functionality is kind of lacking for a user. It just throws an exception, but the user won't be able to recover from that. They don't know where the offending character is, they don't know what the context around the error looks like, they just know that there was an invalid character somewhere. From a user POV the same can also be achieved by iterating with […] I'm in favour of some diagnostic "library" functionality that gives a user a good reason why a char sequence is invalid, like gcc or clang do when parsing a file, but I don't think a VIEW is the right place for that.
Yes, from my point of view it is. I don't want to have page-long arguments on GitHub for adding effectively 6 lines of code to the library. It has got one approval already, and another team member has been automatically assigned for a second review. If you feel that this change is so fundamental that Enrico is not qualified to approve it and that it needs to be discussed by the entire core team, please bring it up at the next meeting of the core team.
My first question when I saw the PR was […] I wanted to discuss this PR in last week's core meeting, but only two members were present, so we cancelled it. I think it is very valid to question the usefulness ("Oh wow, there is a wrong character somewhere in my 20 GiB file"), which also bears on whether this should be API or detail. What are the use cases, even? Apparently you have some, but decided not to share them. Since from my current understanding, both […] In my opinion, this view should either be […]
Why didn't you ask me this question?
That is not how I am using it.
My IO code returns records with views into the stream buffer. The IO does not verify the characters and instead returns a view to the user that converts to the requested alphabet type. There needs to be a user-friendly mechanism for notifying the user when there is an illegal character in the data (user-friendly means not telling the user to run some extra algorithms on the data). Since this is a type that end users see, it should not be detail, because then they don't know what is going on. Also, this is not some "hidden implementation"; it is a very simple view with a 5-line implementation that works exactly like all the other views.
I was not aware that you removed these requirements from the concept. I thought it was rather self-explanatory that one might want to check which characters "are in" the alphabet.
I write external library and application code, so things in detail, NOAPI or experimental are not very helpful to me. In that case, it is easier to just have things in my own code.
I wanted to ask in the core meeting, and afterwards it slipped my mind :)
That was my main problem with the PR. There was no description/reasoning, so I was left to wonder what that view might be used for. And as I suspected, you have a use case where this view is useful.
The "use" of the view itself is self-explanatory. What I was missing was a use case that highlights how the view solves some problem. Without any use case, we will fall back to implementing "everything that might seem useful", and this did not work out too well.
The only thing I was missing was some context, which you provided. In general:
In this case, […]
Basically just documentation 💅
Can you add a snippet, please?
Something like this (untested)?
test/snippet/alphabet/views/char_strictly_to.cpp
#include <string>

#include <seqan3/alphabet/exception.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/alphabet/views/char_strictly_to.hpp>
#include <seqan3/core/debug_stream.hpp>

int main()
{
    std::string str{"ACTTTGATAN"};
    try
    {
        // Prints ACTTTGATA, then throws when the view reaches 'N'.
        seqan3::debug_stream << (str | seqan3::views::char_strictly_to<seqan3::dna4>);
    }
    catch (seqan3::invalid_char_assignment const &)
    {
        seqan3::debug_stream << "\n[ERROR] Invalid char!\n";
    }
}
test/snippet/alphabet/views/char_strictly_to.err
ACTTTGATA
[ERROR] Invalid char!
// -----------------------------------------------------------------------------------------------------
// Copyright (c) 2006-2020, Knut Reinert & Freie Universität Berlin
// Copyright (c) 2016-2020, Knut Reinert & MPI für molekulare Genetik
// This file may be used, modified and/or redistributed under the terms of the 3-clause BSD-License
// shipped with this file and also available at: https://github.com/seqan/seqan3/blob/master/LICENSE.md
// -----------------------------------------------------------------------------------------------------
Suggested change:
- // Copyright (c) 2006-2020, Knut Reinert & Freie Universität Berlin
- // Copyright (c) 2016-2020, Knut Reinert & MPI für molekulare Genetik
+ // Copyright (c) 2006-2021, Knut Reinert & Freie Universität Berlin
+ // Copyright (c) 2016-2021, Knut Reinert & MPI für molekulare Genetik
/*!\name Alphabet related views
 * \{
 */
Suggested change:
- /*!\name Alphabet related views
-  * \{
-  */
 * \tparam alphabet_t The alphabet to convert to; must satisfy seqan3::alphabet.
 * \param[in] urange The range being processed. [parameter is omitted in pipe notation]
 * \returns A range of converted elements. See below for the properties of the returned range.
 * \ingroup views
Suggested change:
-  * \ingroup views
+  * \ingroup alphabet_views
Force-pushed from 4174626 to a4b431b.
Yes, I had thought that assign_char_strictly was still in the concept, so I assumed the missing view was an oversight. I hope the use case is now clear. If you have any questions, please let me know.
@h-2 would it also be an alternative to have a view that just checks whether a character is part of the "domain", since we already have a view that does the conversion? Something like this: auto assign_char_strictly = views::char_is_valid | views::assign_char; the name would be up for discussion (…
Force-pushed from a4b431b to 35000bb.
Let me think about it!
I wouldn't mind splitting the implementation into two separate views, but I would still like to have the combined view (defined as you proposed above) for usability. Would that be agreeable?
I like both proposals, i.e. having a view that checks for validity, and then having the combined view. I just noticed that we probably need to restrict the view to not just take […]
I think a static_assert would be enough.
seqan3/include/seqan3/alphabet/concept.hpp, lines 729 to 750 (at 67873e4)
Unless someone changed it, […]
You're right, I missed the default implementation in the CPO.
Force-pushed from 35000bb to 732d106.
Force-pushed from 732d106 to 915dedb.
One filename is wrong.
Only snippets are failing.
GitHub seems to have trouble serving the CMake download today; that's why CI sometimes fails.
@@ -0,0 +1,2 @@
ACTTTGATA
Can you rename this file to validate_char_for.err?
Force-pushed from 915dedb to 9263bf5.
No description provided.