Add DAS FS fallback to old layout, fix health check #2510

Merged 2 commits on Jul 22, 2024

Tristan-Wilson
Member

This PR

  • adds a fallback when getting data from the LocalFileStorageService to also try getting the data using the old filesystem layout, and
  • fixes an issue with the health check creating too many by-expiry-timestamp entries.

As of release 3.1.0, the DAS filesystem layout automatically migrates to the new trie-based layout if the migration is detected not to have occurred yet, based on the absence of the by-data-hash and by-expiry-timestamp directories. If an operator rolls back to an older version, writes some more batch data with the old layout, and then rolls forward again, only the new-layout data will be detected, because the migration is considered to have already happened. This PR makes the GetByHash method also try to get the data using the old layout to handle these cases gracefully.

The issue of the health check creating too many by-expiry-timestamp entries is also fixed. The /health REST endpoint and the HealthCheck JSON-RPC method of the RPC server, when used with the local file store, do a full end-to-end test of the storage backend by writing some data and reading it back. The health check always writes the same data to save on storage space, but previously it wrote a different expiry time each time. This caused the by-expiry-timestamp index to be spammed with entries for this same data and eventually to fail with "too many links".

Testing done

Backwards compat

Set up a data directory with some data in new layout and some in old layout.

$ find data/ -name 'e85af8282a9d103e2b5ada07ee6834ffa0f8ceaa6ea051378c9d0177a5cbe643'
data/e85af8282a9d103e2b5ada07ee6834ffa0f8ceaa6ea051378c9d0177a5cbe643
$ find data/ -name '5a19d56910317880cc56d88c2ee757f64fcf661e8babd9cba1b08ce9841932b9'
data/by-data-hash/5a/19/5a19d56910317880cc56d88c2ee757f64fcf661e8babd9cba1b08ce9841932b9
data/by-expiry-timestamp/171963/2466/5a19d56910317880cc56d88c2ee757f64fcf661e8babd9cba1b08ce9841932b9

Check that the data from both layouts is fetchable.

$ curl -v http://localhost:9877/get-by-hash/e85af8282a9d103e2b5ada07ee6834ffa0f8ceaa6ea051378c9d0177a5cbe643
* Host localhost:9877 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying 127.0.0.1:9877...
* Connected to localhost (127.0.0.1) port 9877
> GET /get-by-hash/e85af8282a9d103e2b5ada07ee6834ffa0f8ceaa6ea051378c9d0177a5cbe643 HTTP/1.1
> Host: localhost:9877
> User-Agent: curl/8.6.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Cache-Control: public, max-age=1
< Date: Mon, 22 Jul 2024 12:39:40 GMT
< Content-Length: 708
< Content-Type: text/plain; charset=utf-8
< 
{"data":"ABtDAwBkAK+1QOq6AC/LT0Aujsc2s10glGAQSsTPHGyMiWd07E/i4aC2g5mkcfCxt47rA/sAPOyPLUjNEAU0D/YnAAXfhoFY7nTV3CcGW8fwwvFrliJDQgW1p/JqFgzy6AH7iVmxFAAAAAAAmDxrBTacCRhwNHjZ9sNRp/O044l+7VEWTGCryFrlbiij0DLaJu3rX2fQ3jbFzhHwJZlMjXzvSMwKMR/c7JVYmtA06s3RMSVRQoHAoQ3096nLKc/sQvq5nYdye2EhEjYfUqz0XGezQCNur1ui8srmiwaoCRjQJ5TJkHp7oM3HpY+WvfGt7+56zUaS4pYphvI52PwgQwBkN8Fh/45gElCR4I49OXzjnz95K68vyQzrKpwxHKQZbAqBw4DAww18T9AnpzrzyG/kCjB85/9zWeXq/v5/SU4PknlCV+BDFPpjIdCyVd5ENBiVkdgk4fYslL1o3rsQTOt5ixIfnXjT0DxK8+Y44IJODPI3oAj5NHNsTs1YM7g+vEQ9KHzJjywssEP8XgcroAtiw0LhIJYhUHgIwvZA/g/KcuqNobck18o+uobh0toOW9O5qhBpKNCnOZJqdl25FCD3vVBjIhr8hYNZ23/qfGzmBaeQyDj0tOorJfKYiTM//8XfTLiFWRBUqf+fxpP+RfT2l8d6TwXDjyVsCslLM09UqyA+VzlBnUk="}


$ curl -v http://localhost:9877/get-by-hash/5a19d56910317880cc56d88c2ee757f64fcf661e8babd9cba1b08ce9841932b9
* Host localhost:9877 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying 127.0.0.1:9877...
* Connected to localhost (127.0.0.1) port 9877
> GET /get-by-hash/5a19d56910317880cc56d88c2ee757f64fcf661e8babd9cba1b08ce9841932b9 HTTP/1.1
> Host: localhost:9877
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Cache-Control: public, max-age=1
< Date: Mon, 22 Jul 2024 12:40:13 GMT
< Content-Type: text/plain; charset=utf-8
< Transfer-Encoding: chunked
<
{"data":"ABtNu0BmAM65LesRz+ZafH7iIl5PfyJe/xOn1srscQdIRrJKnfC7Vdsd7OBuiGgSdkCWVGp9ks7WhKZGbS7EjtnFXRyB6iC
+RIkzByqAO1U9VT8PeuerxcwTK/tCuJe3WyfU16p79cI9lAzNZnHytfoaJ4NEi/+PR8NE2oYeSkbDRgK5v/h0eAJgQU8ukLJfHOCI6Q9...

Check that data that doesn't exist (last byte of hash changed) is still handled correctly.

$ curl -v http://localhost:9877/get-by-hash/5a19d56910317880cc56d88c2ee757f64fcf661e8babd9cba1b08ce9841932b1
* Host localhost:9877 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying 127.0.0.1:9877...
* Connected to localhost (127.0.0.1) port 9877
> GET /get-by-hash/5a19d56910317880cc56d88c2ee757f64fcf661e8babd9cba1b08ce9841932b1 HTTP/1.1
> Host: localhost:9877
> User-Agent: curl/8.6.0
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Cache-Control: public, max-age=1
< Date: Mon, 22 Jul 2024 12:40:32 GMT
< Content-Length: 0
< 
* Connection #0 to host localhost left intact

Health check fix

Used the /health endpoint several times to create the test data.

$ curl -v http://localhost:9877/health
* Host localhost:9877 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying 127.0.0.1:9877...
* Connected to localhost (127.0.0.1) port 9877
> GET /health HTTP/1.1
> Host: localhost:9877
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Cache-Control: public, max-age=1
< Date: Mon, 22 Jul 2024 12:19:05 GMT
< Content-Length: 0
<

Test that the health check data is pruned on startup, along with other files needing pruning, if pruning is enabled. Then create it again and check that it is pruned on the next pruning iteration.

$ ../../nitro/target/bin/daserver --conf.file test-backwards-compat.cfg  2>&1 | tee back-02.txt
INFO [07-22|14:18:40.641] Starting REST server                     addr=localhost port=9877 revision=4fdae71-modified vcs.time=2024-07-22T11:41:58Z
INFO [07-22|14:18:40.643] Local file store pruned expired batches  count=4 pruneTil=2024-07-22T14:18:40+0200 duration=1.894951ms
INFO [07-22|14:23:40.645] Local file store pruned expired batches  count=1 pruneTil=2024-07-22T14:23:40+0200 duration="561.789µs"

@cla-bot cla-bot bot added the s label Jul 22, 2024
@Tristan-Wilson Tristan-Wilson enabled auto-merge July 22, 2024 13:31
Contributor

@gligneul gligneul left a comment

LGTM. The explanation made it super easy to review.

Member

@joshuacolvin0 joshuacolvin0 left a comment

LGTM

@Tristan-Wilson Tristan-Wilson merged commit 1905917 into master Jul 22, 2024
15 checks passed
@Tristan-Wilson Tristan-Wilson deleted the das-fs-backwards-compat branch July 22, 2024 13:41
Labels: design-approved, s
4 participants