HtmlWeb Load() method returns empty HtmlDocument when using cache #15

Closed
AntonKedrov opened this issue Jun 10, 2017 · 13 comments

@AntonKedrov

Hi,

I'm using HtmlAgilityPack 1.5.0-beta6 in a .NETCoreApp 1.1 console application.
The following code outputs an empty string. Once I set UsingCache = false, it outputs the page content, as expected.

var web = new HtmlAgilityPack.HtmlWeb()
{
	CachePath = @"C:\Users\Anton\Documents\Cache",
	UsingCache = true
};

var url = "https://github.com/zzzprojects/html-agility-pack";
var doc = web.Load(url);
Console.WriteLine(doc.DocumentNode.InnerText);

@JonathanMagnan JonathanMagnan self-assigned this Jun 13, 2017
@JonathanMagnan
Member

Hello @AntonKedrov ,

Thank you for reporting. We will investigate this issue over the weekend and provide you with more information very soon.

Best Regards,

Jonathan

@JonathanMagnan
Member

Hello @AntonKedrov ,

v1.5.0-beta8 has been released:
https://www.nuget.org/packages/HtmlAgilityPack/

The content is now stored in the CachePath and loaded in the HtmlDocument as expected.

Let us know if this issue is fixed on your side.

Best Regards,

Jonathan

@AntonKedrov
Author

Unfortunately, it still doesn't work. web.Load(url) still returns an uninitialized document.

BTW, I think there is a mistake in the unit test: it should use Assert.AreEqual instead of Console.WriteLine. :)

[Test]
public void TestLoadWithCache()
{
	...

	var docLoad = new HtmlAgilityPack.HtmlWeb().Load(url);
	Console.WriteLine(docLoad.DocumentNode.OuterHtml, docCache.DocumentNode.OuterHtml);
}

Best regards,
Anton

@JonathanMagnan
Member

... omg!

We will look at it again during the weekend.

Best Regards,

Jonathan

@JonathanMagnan
Member

Hello @AntonKedrov ,

We fixed the test. However, the code still works fine for us with v1.5.0-beta8.

Could you verify on your side that this is not an assembly-caching issue, with the old version still being used? For example, make sure to clean the solution first.

Best Regards,

Jonathan

@JonathanMagnan
Member

Hello @AntonKedrov ,

We still have not received any update from you concerning this issue.

Best Regards,

Jonathan

@AntonKedrov
Author

Hi,

I'm really sorry for the delayed response.

  1. To ensure that I had the latest package, I removed the C:\Users\Anton\.nuget\packages folder
  2. Removed my cache folder C:\Users\Anton\Documents\Cache
  3. Then created a new .NET Core Console App project (.NET Framework 4.7)
  4. Installed HtmlAgilityPack 1.5.0-beta92
  5. Copy-pasted the code from the first post

As a result, the C:\Users\Anton\Documents\Cache\github.com\zzzprojects directory was created with two files: html-agility-pack and html-agility-pack.h.xml. But HtmlWeb.Load(url) still returns an empty document (i.e. doc.DocumentNode.FirstChild is null, doc.DocumentNode.InnerText is an empty string, etc.).
And once I set HtmlWeb.UsingCache to false, it works as expected.

I'm using the following workaround: I check whether the page is cached (using HtmlWeb.GetCachePath(uri) to get the path and then File.Exists(cachedPagePath)), and if it is, I set HtmlWeb.CacheOnly to true before calling HtmlWeb.Load(url). In this case, the cached document is loaded correctly.
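The workaround above could be sketched roughly like this. It is a minimal sketch, assuming only the HtmlWeb members named in this thread (CachePath, UsingCache, CacheOnly, GetCachePath(Uri), Load(string)); the helper name LoadPreferringCache is hypothetical, and the cache folder and URL are the ones from the first post. On the very first run nothing is cached yet, so the affected versions would still return an empty document; later runs read the page from the cache.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

class Program
{
    // Hypothetical helper illustrating the workaround described above:
    // if the page already exists in the cache folder, switch HtmlWeb to
    // CacheOnly mode so Load() reads the cached file instead of going
    // through the download-and-cache path that returns an empty document.
    static HtmlDocument LoadPreferringCache(HtmlWeb web, string url)
    {
        // Location where HtmlWeb stores this URL under CachePath.
        string cachedPagePath = web.GetCachePath(new Uri(url));

        // Assumed: CacheOnly may only be enabled while UsingCache is true.
        web.CacheOnly = File.Exists(cachedPagePath);

        return web.Load(url);
    }

    static void Main()
    {
        var web = new HtmlWeb
        {
            CachePath = @"C:\Users\Anton\Documents\Cache",
            UsingCache = true
        };

        var url = "https://github.com/zzzprojects/html-agility-pack";

        HtmlDocument doc = LoadPreferringCache(web, url);
        Console.WriteLine(doc.DocumentNode.InnerText);
    }
}
```

Setting CacheOnly from File.Exists (rather than always enabling it) keeps the first request able to reach the network and populate the cache.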

@JonathanMagnan
Member

Hello @AntonKedrov ,

Unfortunately, we are not able to reproduce it.

This feature works very well for my developer and me.

We are currently not sure how to reproduce it, since we don't get the error whatever we try ;(

I believe the only option currently is for you to download the source and try to figure out what is not working.

Best Regards,

Jonathan

@AntonKedrov
Author

Sure, I'll try to help.

I cloned the repo, and it seems that the problem is in the NETSTANDARD version of the Get(Uri uri, string method, string path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) method. Note that this method is present in two projects: HtmlAgilityPack and HtmlAgilityPack.Shared.

The non-NETSTANDARD code looks like this:

if (UsingCache)
{
    // NOTE: LastModified does not contain milliseconds, so we remove them to the file
    SaveStream(s, cachePath, RemoveMilliseconds(resp.LastModified), _streamBufferSize);

    // save headers
    SaveCacheHeaders(req.RequestUri, resp);

    if (path != null)
    {
        // copy and touch the file
        IOLibrary.CopyAlways(cachePath, path);
        File.SetLastWriteTime(path, File.GetLastWriteTime(cachePath));
    }

    if (_usingCacheAndLoad)
    {
        doc.Load(cachePath);
    }
}

And the NETSTANDARD code looks like this:

if (UsingCache)
{
    // NOTE: LastModified does not contain milliseconds, so we remove them to the file
    SaveStream(s, cachePath, RemoveMilliseconds(response.Content.Headers.LastModified), _streamBufferSize);

    // save headers
    SaveCacheHeaders(request.RequestUri, response);

    if (path != null)
    {
        // copy and touch the file
        IOLibrary.CopyAlways(cachePath, path);
        File.SetLastWriteTime(path, File.GetLastWriteTime(cachePath));
    }
}

I think the following doc.Load(cachePath) code is missing here:

if (_usingCacheAndLoad)
{
    doc.Load(cachePath);
}

Best regards,
Anton

@JonathanMagnan
Member

Thank you,

We will fix it today and, at the same time, check why our .NET Core project didn't raise this error.

Best Regards,

Jonathan

@JonathanMagnan
Member

JonathanMagnan commented Jul 11, 2017

Hello @AntonKedrov ,

We have fixed our .NET Core project and we can now successfully reproduce this issue.

v1.5.2-beta2 has been released.

It should now correctly work with your scenario.

Thank you for giving us the solution ;)

Best Regards,

Jonathan

@AntonKedrov
Author

I can confirm that it works now. Thank you! :)

Best regards,
Anton

@JonathanMagnan
Member

Closing Comment: Fix confirmed


2 participants