Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "Date Published" sorting option and table column #1770

Merged
merged 14 commits into from
Aug 26, 2024

Conversation

Eitot
Copy link
Contributor

@Eitot Eitot commented Jul 19, 2024

As suggested in #1749 the createddate SQL field is repurposed to show a "Date Published" option in the Sort By menu and add a new column for the horizontal layout. The existing "Date" sort option and column is renamed to "Last Update" to reflect its current purpose better. createddate was previously set to a current date when Vienna added the entry to the database. To keep it simple, I have not added the "Date Published" field to the vertical layout, since that would require more elaborate changes to the table-view cell.

While working on this I also stumbled on two bugs that caused the sort indicator in the column header not to show on app launch and when the user changes the sorting via the main menu. Both are fixed. The former bug is fixed with a band aid however, since the underlying cause is that ArticleListView is initialised before ArticleController, the latter which holds the property that determines the sorting order.

@Eitot Eitot added the changes localisations 💬 This pull request adds, changes or removes localisation keys. label Jul 19, 2024
Copy link
Contributor

@TAKeanice TAKeanice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have "date published" and "date added" which is really confusing in my opinion. Isn´t "publishing" the action of first releasing the original text of an article? I would rather call the two cases "Publication date" and "Last update".

@Eitot
Copy link
Contributor Author

Eitot commented Jul 20, 2024

Now we have "date published" and "date added" which is really confusing in my opinion. Isn´t "publishing" the action of first releasing the original text of an article? I would rather call the two cases "Publication date" and "Last update".

Last update would be wrong though, because "createddate" in the database is created once and never changed.

The problem with the "date" field is that it means something different depending on the feed type and whether a date is provided by the feed. For RSS it means "date published", for Atom it means either "date created", "date modified" or "date updated" – whichever is latest – and for JSONFeed it currently means "date modified". If no date is given, Vienna will fall back to a date at the feed level (which also differs per feed type) or – if no date at all is provided – the date when the feed was fetched by Vienna.

I think another database column would give more flexibility, to distinguish publication date from modification date (both of which could still be filled in by Vienna). Also, the "createddate" column really ought to be updated too when an existing entry is modified.

@TAKeanice
Copy link
Contributor

TAKeanice commented Jul 20, 2024

Last update would be wrong though, because "createddate" in the database is created once and never changed.

I meant createddate to be "publication date", which misses the point just a little (it is the date when Vienna first loaded the article), but offers the highest level of distinction to the other date value

The problem with the "date" field is that (...) For RSS it means "date published", for Atom it means either "date created", "date modified" or "date updated" – whichever is latest – and for JSONFeed it currently means "date modified"

Yes, for RSS there is no update date offering a distinction from the creation date. However, it rather reflects the last update than its publication date:

if ((isRSSElement && [channelItemTag isEqualToString:@"lastBuildDate"]) ||
(isRSSElement && [channelItemTag isEqualToString:@"pubDate"]) ||
([element.prefix isEqualToString:self.dcPrefix] && [channelItemTag isEqualToString:@"date"])) {
NSString *dateString = element.stringValue;
self.modifiedDate = [self dateWithXMLString:dateString];

and

NSDate * articleDate = newsItem.modifiedDate;
// (...)             
article.date = articleDate;

in

NSDate * articleDate = newsItem.modifiedDate;
NSString * articleGuid = newsItem.guid;
// This routine attempts to synthesize a GUID from an incomplete item that lacks an
// ID field. Generally we'll have three things to work from: a link, a title and a
// description. The link alone is not sufficiently unique and I've seen feeds where
// the description is also not unique. The title field generally does vary but we need
// to be careful since separate articles with different descriptions may have the same
// title. The solution is to use the link and title and build a GUID from those.
// We add the folderId at the beginning to ensure that items in different feeds do not share a guid.
if ([articleGuid isEqualToString:@""]) {
articleGuid = [NSString stringWithFormat:@"%ld-%@-%@", (long)folderId, newsItem.url, newsItem.title];
}
// This is a horrible hack for horrible feeds that contain more than one item with the same guid.
// Bad feeds! I'm talking to you, kerbalstuff.com
NSUInteger articleIndex = [articleGuidArray indexOfObject:articleGuid];
if (articleIndex != NSNotFound) {
// We rebuild complex guids which should eliminate most duplicates
Article * firstFoundArticle = articleArray[articleIndex];
if (articleDate == nil) {
// first, hack the initial article (which is probably the first loaded / most recent one)
NSString * firstFoundArticleNewGuid =
[NSString stringWithFormat:@"%ld-%@-%@", (long)folderId, firstFoundArticle.link, firstFoundArticle.title];
firstFoundArticle.guid = firstFoundArticleNewGuid;
articleGuidArray[articleIndex] = firstFoundArticleNewGuid;
// then hack the guid for the item being processed
articleGuid = [NSString stringWithFormat:@"%ld-%@-%@", (long)folderId, newsItem.url, newsItem.title];
} else {
// first, hack the initial article (which is probably the first loaded / most recent one)
NSString * firstFoundArticleNewGuid =
[NSString stringWithFormat:@"%ld-%@-%@-%@", (long)folderId,
[NSString stringWithFormat:@"%1.3f", firstFoundArticle.date.timeIntervalSince1970], firstFoundArticle.link,
firstFoundArticle.title];
firstFoundArticle.guid = firstFoundArticleNewGuid;
articleGuidArray[articleIndex] = firstFoundArticleNewGuid;
// then hack the guid for the item being processed
articleGuid =
[NSString stringWithFormat:@"%ld-%@-%@-%@", (long)folderId,
[NSString stringWithFormat:@"%1.3f", articleDate.timeIntervalSince1970], newsItem.url, newsItem.title];
}
}
[articleGuidArray addObject:articleGuid];
// set the article date if it is missing. We'll use the
// last modified date of the feed and set each article to be 1 second older than the
// previous one. So the array is effectively newest first.
if (articleDate == nil) {
articleDate = itemAlternativeDate;
itemAlternativeDate = [itemAlternativeDate dateByAddingTimeInterval:-1.0];
}
Article * article = [[Article alloc] initWithGuid:articleGuid];
article.folderId = folderId;
article.author = newsItem.authors;
article.body = newsItem.content;
if (!newsItem.title || newsItem.title.vna_isBlank) {
NSString *newTitle = newsItem.content.vna_titleTextFromHTML.vna_stringByUnescapingExtendedCharacters;
if (newTitle.vna_isBlank) {
article.title = NSLocalizedString(@"(No title)", @"Fallback for feed items without a title");
} else {
article.title = newTitle;
}
} else {
article.title = newsItem.title;
}
NSString * articleLink = newsItem.url;
if (![articleLink hasPrefix:@"http:"] && ![articleLink hasPrefix:@"https:"]) {
articleLink = [NSURL URLWithString:articleLink relativeToURL:url].absoluteString;
}
if (articleLink == nil) {
articleLink = feedLink;
}
article.link = articleLink;
article.date = articleDate;
NSString * enclosureLink = newsItem.enclosure;
if ([enclosureLink isNotEqualTo:@""] && ![enclosureLink hasPrefix:@"http:"] && ![enclosureLink hasPrefix:@"https:"]) {
enclosureLink = [NSURL URLWithString:enclosureLink relativeToURL:url].absoluteString;
}
article.enclosure = enclosureLink;
if ([enclosureLink isNotEqualTo:@""]) {
[article setHasEnclosure:YES];
}
[articleArray addObject:article];

In conclusion, I am not 100% convinced of the naming either, but I would opt for the most distinguishable instead of the most accurate terms. The wording will be of no importance to the users who keep their settings as before, for them the change will be "Date" -> "Last update". The users who have problematic feeds that are updated in retrospective, they will have the option to change to "Publication date" which might not be totally accurate but have the desired effect, to keep their sorting in the order they originally received the articles.

One refinement may be to not set the article "createddate" to the date where the article was received, if the article itself contains a date, but to the first value we encounter for the date of the article, and only use the date of receiving the article as fallback. Alternatively, we could introduce an additional field in the feed item that we set to the actual publication date in the feed types that provide it, and to the current date for the feed types that don´t, subsequently setting article.publicationDate to that value, taking that date as createddate on the first insert into the database.
That suggestion would result in code like this in -[Database addArticle: toFolder:]:

    // We set the publication date ourselves if it is not contained in the feed
    article.publicationDate = article.publicationDate ?: [NSDate date];
    NSTimeInterval publicationIntervalSince1970 = article.publicationDate.timeIntervalSince1970;

instead of the current way of setting the addedDate ourselves calculating the createdInterval from it.

@TAKeanice
Copy link
Contributor

Please see https://github.com/TAKeanice/vienna-rss/tree/feature/date-added-field for a draft to implement my suggestion, for some reason I don´t seem to be able to open a PR to your repository @Eitot

@Eitot
Copy link
Contributor Author

Eitot commented Jul 20, 2024

Yes, for RSS there is no update date offering a distinction from the creation date. However, it rather reflects the last update than its publication date:

That is for the date of the feed (channel). The feed item's date is set by pubDate or dc:date:

// Parse item date
if ((isRSSElement && [articleItemTag isEqualToString:@"pubDate"]) || ([itemChildElement.prefix isEqualToString:self.dcPrefix] && [articleItemTag isEqualToString:@"date"])) {
NSString *dateString = itemChildElement.stringValue;
newFeedItem.modifiedDate = [self dateWithXMLString:dateString];
continue;
}

In conclusion, I am not 100% convinced of the naming either, but I would opt for the most distinguishable instead of the most accurate terms. The wording will be of no importance to the users who keep their settings as before, for them the change will be "Date" -> "Last update". The users who have problematic feeds that are updated in retrospective, they will have the option to change to "Publication date" which might not be totally accurate but have the desired effect, to keep their sorting in the order they originally received the articles.

One refinement may be to not set the article "createddate" to the date where the article was received, if the article itself contains a date, but to the first value we encounter for the date of the article, and only use the date of receiving the article as fallback. Alternatively, we could introduce an additional field in the feed item that we set to the actual publication date in the feed types that provide it, and to the current date for the feed types that don´t, subsequently setting article.publicationDate to that value, taking that date as createddate on the first insert into the database. That suggestion would result in code like this in -[Database addArticle: toFolder:]

What about RSS feeds though? Would "last update" and "created date" not be the same, if in the case of RSS they are likely both based on pubDate?

@TAKeanice
Copy link
Contributor

TAKeanice commented Jul 21, 2024

I think if there is no distinction like for RSS, either name for the date is fine. For all cases where there is a distinction, the "date" in article is the last updated date, so we should name it to reflect that. Please check the code in my implementation draft for reference

@TAKeanice
Copy link
Contributor

TAKeanice commented Jul 21, 2024

I cleaned up some confusion that I had yesterday, now the code does what I intended it to do

@Eitot Eitot force-pushed the feature/date-added-field branch 4 times, most recently from 4360dd3 to 26eba00 Compare July 21, 2024 18:28
@Eitot Eitot changed the title Add "Date Added" sorting option and table column Add "Date Published" sorting option and table column Jul 21, 2024
@@ -1014,19 +1021,19 @@ - (BOOL)filterArticle:(Article *)article usingMode:(NSInteger)filterMode {
case VNAFilterUnread:
return !article.read;
case VNAFilterLastRefresh: {
NSDate *date = article.createdDate;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be the lastUpdate date.

Copy link
Contributor Author

@Eitot Eitot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TAKeanice I have incorporated your changes, with some changes:

  • the UI string "Last update" I upper-cased to "Last Update" (per HIG guidelines)
  • the UI string "Publication date" I renamed to "Date Published" (this phrasing is more common in English localisations and also shown in the HIG guidelines)
  • "modifiedDate" properties I renamed to "modificationDate" (for consistency with publicationDate)

However, one thing remaining is the "last refresh" filter. This is probably broken now, since Vienna might add entries that have modification date (lastUpdate) and publication date (publicationDate) before the "last refresh".

@barijaona
Copy link
Member

I am OK with "last update date".
I find "publication date" quite weird, as in my mind, a publication date is decided by the author / publisher of the article, and not set by us.
How about "delivery date" ?

@TAKeanice
Copy link
Contributor

I am OK with "last update date".

I find "publication date" quite weird, as in my mind, a publication date is decided by the author / publisher of the article, and not set by us.

Agree with you there, but we get into a distinction problem again: for JSON Feeds the publication date may actually be set by the author, and for atom feeds we at least have a chance to catch the publication date. Only RSS Feeds don't have it. We may set the first encounter of the article date in an RSS feed to the publication date, though. @Eitot feel free to change that at the comment line I added in the RSS feed class.

How about "delivery date" ?

Too technical in my eyes. "Received date" or "Reception" maybe ("Empfangsdatum" in German) if you don't like "Publication date" at all

@TAKeanice
Copy link
Contributor

@TAKeanice I have incorporated your changes, with some changes

Thank you, I appreciate that you added the thoroughness my original changes lacked

However, one thing remaining is the "last refresh" filter. This is probably broken now

Another thing I suspect to be broken now is the intelligent folder feature - I don't remember how much it relies on the names of fields and which constants it uses for that. The persisted filters might fail if they contain one of the changed fields. Additionally, I think an option for filtering the "Publication / Added / ... " date should be included.

@TAKeanice
Copy link
Contributor

Another thing I suspect to be broken now is the intelligent folder feature - I don't remember how much it relies on the names of fields and which constants it uses for that.

The predicate editor seems fine at the first glance.

I found at least one place where there's now inconsistency though, in SmartFolder.m in some predicate templates the hardcoded String @"Date" is used, and in other places the constant for the Date field which is now renamed to MA_LastUpdatedField. We should probably use the constant for all predicate templates.

@barijaona
Copy link
Member

“Reception date” sounds good to me.

@TAKeanice
Copy link
Contributor

I added "Date published" to the intelligent folder predicates on my branch: https://github.com/TAKeanice/vienna-rss/tree/feature/date-added-field
#1770 (comment) is not incorporated yet to keep it in sync with the current state of this PR.

@Eitot
Copy link
Contributor Author

Eitot commented Jul 29, 2024

I am not sure about "reception date" (or "date received") as a substitute for "publication date" (or "date published"). Date received is the date when Vienna received the article from the source. That is not what this field will represent after this PR: it changes the implementation so that the value of that field will be the earliest date provided by the feed, if one is available (created, updated, modified for Atom , whichever is earliest, and date_published for JSONFeed). Also, date received implies a "last update" date too, which is not what this field is (even though the "last refresh" filter makes it seem that way).

I still see it as a challenge that we are attempting to reconcile different meanings of "date", that are inconsistent among the three feed types, in two fields. Recall that what gave rise to the PR is the problem that Vienna changes the visible date of an article even when the user knows that the article is older (#1749). To accommodate this, we sacrifice the "last refresh" filter.

Comment on lines 571 to 572
NSDate *articleDate = [NSDate dateWithTimeIntervalSince1970:[newsItem[@"published"] doubleValue]];
NSDate *publicationDate = [NSDate dateWithTimeIntervalSince1970:[newsItem[@"published"] doubleValue]];
NSDate *lastUpdate = [NSDate dateWithTimeIntervalSince1970:[newsItem[@"updated"] doubleValue]];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test on newsItem[@"updated"] being nil or not is needed here: I ended up with articles having "last update" set to year 1970…

@@ -611,7 +612,8 @@ -(void)feedRequestDone:(NSMutableURLRequest *)request response:(NSURLResponse *)
article.link = refreshedFolder.feedURL;
}

article.date = articleDate;
article.publicationDate = publicationDate == nil ? lastUpdate : publicationDate;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if neither newsItem[@"published"] nor newsItem[@"updated"] is provided, we should use [NSDate data]

Comment on lines 1511 to 1512
NSTimeInterval publicationIntervalSince1970 =
(article.publicationDate == nil ? [NSDate date] : article.publicationDate).timeIntervalSince1970;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to test on nil here, as article.publicationDate has already been set a few lines ago.

Vienna/Sources/Parsing/RSSFeed.m Show resolved Hide resolved
@TAKeanice
Copy link
Contributor

TAKeanice commented Aug 18, 2024

@barijaona I cleaned up the places you mentioned and made the logic as robust as I could. @Eitot i don´t know why I can´t push to your branch, this used to be possible when ticking "allow edits and access to secrets by maintainers" during the creation of a PR, but maybe I am doing something wrong. Instead, I opened a PR to your branch: Eitot#7

@barijaona
Copy link
Member

… I don´t know why I can´t push to your branch, this used to be possible when ticking "allow edits and access to secrets by maintainers" during the creation of a PR

@TAKeanice , would you try again pushing to this branch ? For some reason, you were visible in the @ViennaRSS/maintainers group, without having the maintainer role…

@TAKeanice
Copy link
Contributor

Thanks for pushing my commits @Eitot . I need to check and test the criteria implementation, if I add any additional commits for that I will try pushing directly again.

This partially reverts dbae950. With the ArticleFieldIDCreatedDate case restored, the explicit value of 415 for ArticleFieldIDEnclosure is no longer needed.
@barijaona
Copy link
Member

I have :

  • rebased the branch on latest master
  • modified @TAKeanice's latest commit (to c928266)
  • added a few other commits related to these changes

@barijaona barijaona requested review from TAKeanice and a team August 25, 2024 00:56
Comment on lines 1618 to 1622
// we never change the publication date, unless the date provided in the feed is prior to it and makes sense
NSDate * publicationDate = existingArticle.publicationDate;
if (article.publicationDate && [article.publicationDate isLessThan:existingArticle.publicationDate]) {
publicationDate = article.publicationDate;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the "and makes sense" part in this? What would happen if it doesn´t "make sense"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this comment is outdated.
At a time, I encountered dates equivalent to 1970 here, and had to add other tests, but I have since solved the issue elsewhere and could remove the extra precautions.

Copy link
Contributor

@TAKeanice TAKeanice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don´t see any logic errors in the code that @barijaona added. Please consider the two comments I made though, to make sure we have not overlooked anything. I also fixed the smart folder to properly deal with the publication date and allow for predicates that contain it, so from my perspective (having seen the code and tested the predicate editor) the PR is good to go live.

TAKeanice and others added 5 commits August 26, 2024 03:58
Only when there is a provided publication date on creation, use that one
without questioning it
Fix publication date not being transmitted by RefreshManager.m

Most dates interpretations/manipulations are now in Database.m
During fetching from feeds, we just retrieve the infos and store them in
relevant Article fields.
This makes the logic more apprehensible and easier to maintain.

Solves issue ViennaRSS#1749

Co-authored-by: Barijaona Ramaholimihaso <[email protected]>
Therefore, regarding dates, adopt in RSSFeed the same logic than in
AtomFeed.
Adapt RefreshManager to this unified logic.
As the meaning of `createddate` has changed, we have to adopt another
approach which provides similar (but not identical) results.
this allows the predicate editor to correctly re-load "date published" predicates.
Also, make the SmartFolder code better by extracting the repeated array of Date expressions and remove superfluous space in predicate translation strings
Copy link
Member

@barijaona barijaona left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated a comment, as suggested by @Eitot, and force pushed.

@barijaona barijaona merged commit c2a8992 into ViennaRSS:master Aug 26, 2024
2 checks passed
@Eitot Eitot deleted the feature/date-added-field branch August 31, 2024 09:58
barijaona added a commit to barijaona/vienna-rss that referenced this pull request Sep 12, 2024
This makes the "Last Refresh" filter to behave more like it did before
pull request ViennaRSS#1770 (cf. commit bb06760)
Significant difference though: "Last Refresh" refers to the last time
a particular feed was refreshed and got new articles, while previously
it referred to the last time "Refresh All Subscriptions" was triggered.
barijaona added a commit that referenced this pull request Sep 13, 2024
Improve management of folder's caches of articles

After PR 1770 <#1770> was merged,
 I noticed that the "Last Refresh" filter did not always have the expected behavior.

Investigation led to various discoveries regarding the use of NSCache and parts
of the code where article's status was lost.
This PR solves these issues and gives back a functional "Last Refresh" filter,
with a difference though: "Last Refresh" now refers to the last time each feed
was refreshed and got new articles, while previously it referred to the last time
"Refresh All Subscriptions" was triggered.ent of folder's caches of articles
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changes localisations 💬 This pull request adds, changes or removes localisation keys.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants