Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot perform any retrieval for 32GB files #5409

Closed
kernelogic opened this issue Jan 25, 2021 · 25 comments
Closed

Cannot perform any retrieval for 32GB files #5409

kernelogic opened this issue Jan 25, 2021 · 25 comments
Labels
area/markets/client area/markets/retrieval team/ignite Issues and PRs being tracked by Team Ignite at Protocol Labs

Comments

@kernelogic
Copy link

kernelogic commented Jan 25, 2021

Describe the bug
I tried several retrievals to several different miners for the deals I made. None of them went through. I do not believe they all offline. I suspect it has something to do with the deal size. These are all 32GB offline deals, made with --fast-retrieval=false.

Example 1 - stuck after DealStatusOngoing
Provider is doing unsealing but it takes hours without going further

lotus client retrieve --miner f024008 bafykbzaced6noeziyglm2frycaknydeijbimvhpk72eimhfupoxd72dehp2k4 abc

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance)
Recv: 0 B, Paid 0 FIL, ClientEventDealAccepted (DealStatusAccepted)
Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelCreateInitiated (DealStatusPaymentChannelCreating)
Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelReady (DealStatusPaymentChannelAllocatingLane)
Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing)

Example 2 - unmarshalling error
The provider told me his miner crashed upon receiving my retrieval request

lotus client retrieve --miner f083550 bafykbzacecu4qt4tlr5vqojgtlt553rbskqremq67z37x5cufaonjqfh5y7no mysql-2016-04-19.tar.gz.partbl
2021-01-25T04:59:20.145Z WARN rpc [email protected]/client.go:541 unmarshaling failed {"message": "{"Err":"exhausted 5 attempts but failed to open stream, err: failed to dial 12D3KooWFJ6iPAiW82pR7REB8pJfKAsoKoEAhgWKYduoQT1734a9: all dials failed\n * [/ip4/127.0.0.1/tcp/45989] dial tcp4 127.0.0.1:45989: connect: connection refused\n * [/ip6/::1/tcp/41337] dial tcp6 [::1]:41337: connect: connection refused\n * [/ip4/59.12.56.212/tcp/45989] dial tcp4 59.12.56.212:45989: connect: connection refused\n * [/ip4/59.12.56.215/tcp/45989] dial tcp4 0.0.0.0:33463-\u003e59.12.56.215:45989: i/o timeout","Root":null,"Piece":null,"Size":0,"MinPrice":"\u003cnil\u003e","UnsealPrice":"\u003cnil\u003e","PaymentInterval":0,"PaymentIntervalIncrease":0,"Miner":"f083550","MinerPeer":{"Address":"f083550","ID":"12D3KooWFJ6iPAiW82pR7REB8pJfKAsoKoEAhgWKYduoQT1734a9","PieceCID":null}}"}
ERROR: RPC client error: unmarshaling result: failed to parse big string: '"\u003cnil\u003e"'

Example 3 - incomplete response
One of the provider told me his miner is set as allowUnseal=false but his worker is set as allowUnseal=true, yet the retrieval failed before the worker picked up the unseal.

lotus client retrieve --miner f064218 baga6ea4seaqgsbrsupr6az5zf2yjqai5t4xtgn7lf2zd62wnxxq2mgyhlvbxyda publicdomainmovies.tar.05

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance)
Recv: 0 B, Paid 0 FIL, ClientEventUnsealPaymentRequested (DealStatusAccepted)
Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelAddingFunds (DealStatusPaymentChannelAllocatingLane)
Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing)
Recv: 0 B, Paid 0 FIL, ClientEventPaymentRequested (DealStatusFundsNeeded)
Recv: 0 B, Paid 0 FIL, ClientEventSendFunds (DealStatusSendFunds)
Recv: 0 B, Paid 0.1 FIL, ClientEventPaymentSent (DealStatusOngoing)
Recv: 0 B, Paid 0.1 FIL, ClientEventDataTransferError (DealStatusErrored)
Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
ERROR: retrieval failed: Retrieve: Retrieval Error: error generated by data transfer: deal data transfer failed: incomplete response

lotus client retrieve --miner f01278 bafykbzaceaj7ube4k2vhniqgdb6vq7ggcoebf4c6punse7ike2almme6esdxi mysql-2018-11-01.tar.gz.partcy

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance)
Recv: 0 B, Paid 0 FIL, ClientEventUnsealPaymentRequested (DealStatusAccepted)
Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelCreateInitiated (DealStatusPaymentChannelCreating)
Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelReady (DealStatusPaymentChannelAllocatingLane)
Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing)
Recv: 0 B, Paid 0 FIL, ClientEventPaymentRequested (DealStatusFundsNeeded)
Recv: 0 B, Paid 0 FIL, ClientEventSendFunds (DealStatusSendFunds)
Recv: 0 B, Paid 0.000000000000000002 FIL, ClientEventPaymentSent (DealStatusOngoing)
Recv: 0 B, Paid 0.000000000000000002 FIL, ClientEventDataTransferError (DealStatusErrored)
Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
ERROR: retrieval failed: Retrieve: Retrieval Error: error generated by data transfer: deal data transfer failed: incomplete response

Example 4 - miner is not accepting online retrieval deals
lotus client retrieve --miner f047419 baga6ea4seaqgsbrsupr6az5zf2yjqai5t4xtgn7lf2zd62wnxxq2mgyhlvbxyda publicdomainmovies.tar.05

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance)
Recv: 0 B, Paid 0 FIL, ClientEventDealRejected (DealStatusRetryLegacy)
Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptanceLegacy)
Recv: 0 B, Paid 0 FIL, ClientEventDealRejected (DealStatusRejected)
Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
ERROR: retrieval failed: Retrieve: Retrieval Proposal Rejected: deal rejected: miner is not accepting online retrieval deals

Example 5 - normal shutdown of state machine
This provider is able to serve my retrieval for a 8GB file, but not 32GB ones.

lotus client retrieve --miner f022352 baga6ea4seaqoyyc52ajdq2v7723d3uixeem5ulwvbtv7h45776vrrg53nnf2gjq trusted_setup_phase2.tar.47

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance)
Recv: 0 B, Paid 0 FIL, ClientEventDealAccepted (DealStatusAccepted)
Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelAddingFunds (DealStatusPaymentChannelAllocatingLane)
Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing)
Recv: 0 B, Paid 0 FIL, ClientEventProviderCancelled (DealStatusCancelling)
Recv: 0 B, Paid 0 FIL, ClientEventDataTransferError (DealStatusErrored)
Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
ERROR: retrieval failed: Retrieve: Retrieval Error: error generated by data transfer: unable to send cancel to channel FSM: normal shutdown of state machine

lotus version 1.4.1+git.d6c06881e

@flyworker
Copy link

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
ERROR: retrieval failed: Retrieve: Retrieval Error: error generated by data transfer: deal data transfer failed: incomplete response

@kernelogic kernelogic changed the title Cannot perform any retrival Cannot perform any retrieval Jan 25, 2021
@kernelogic kernelogic changed the title Cannot perform any retrieval Cannot perform any retrieval for 32GB files Jan 25, 2021
@dgreuel2002
Copy link

This is the error I get when running,

lotus-miner retrieval-deals list
1 (1).txt

Also, This is my miner log when the message hit and then it came to an abrupt stop!

2.txt

@dirkmc
Copy link
Contributor

dirkmc commented Jan 25, 2021

Thank you for the bug reports, it's very helpful when they're so well structured like this 👍

@dgreuel2002 I believe the crash you're seeing is coming from this line:
https://github.com/filecoin-project/go-fil-markets/blob/master/retrievalmarket/impl/requestvalidation/revalidator.go#L87
Probably because deal.PricePerByte is zero. Is your miner accepting deals for free?
Related: #3420

@kernelogic

Example 1 - stuck after DealStatusOngoing

Is it possible there are messages stuck in the message pool preventing the deal from proceeding?

Example 2 - unmarshalling error

This sounds like a crash on the miner side, as you suggested.

We will investigate all these issues, thanks again for the detailed report.

@kernelogic
Copy link
Author

@dirkmc do you mean his deal price for "retrieval" is free? How to set the retrieval price anyway? Is it always the same as storage price for unverified deals? I looked into the help of "lotus-miner storage set-ask" but there's no mention of retrieval.

USAGE:
lotus-miner storage-deals set-ask [command options] [arguments...]

OPTIONS:
--price PRICE Set the price of the ask for unverified deals (specified as FIL / GiB / Epoch) to PRICE.
--verified-price PRICE Set the price of the ask for verified deals (specified as FIL / GiB / Epoch) to PRICE
--min-piece-size SIZE Set minimum piece size (w/bit-padding, in bytes) in ask to SIZE (default: 256B)
--max-piece-size SIZE Set maximum piece size (w/bit-padding, in bytes) in ask to SIZE (default: miner sector size)

@dgreuel2002
Copy link

@kernelogic Kernelogic it is
lotus-miner retrieval deals set-ask

@dirkmc yes I do have it set to 0 and price per byte set to .0000000000002

I have no messages stuck as the rest of miner operations are fine and local mpool is clear. the miner abruptly haults as soon as the unseal message hits it. The log I attached is the end of the kiner log as it had haulted and didnt indicate to why. Please let me know if you need more info on this. @stuberman also has this same error when he runs lotus-miner retrieval-deals list.

I am on the road but please reach out for more info if needed.

@dgreuel2002
Copy link

@dirkmc @kernelogic @jennijuju
Here is my get-ask
lotus-miner retrieval-deals get-ask
Price per Byte Unseal Price Payment Interval Payment Interval Increase
0 FIL 0.000000000000000002 FIL 1MiB 1MiB

@stuberman
Copy link

I am storing both sealed and unsealed sectors. I my miner logs I see attempts to retrieve deals, both from unsealed sectors and sealed sectors that all seem to fail. Two sectors transferred to my worker/sealing repo to unseal and are still sitting there idle (sectors 305 and 440). My miner logs also show an unsealed retrieval attempt (sector 330) which also fails.

Here is a sample of logs for sector 330 on miner

2021-01-25T04:53:03.039Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventPaymentReceived", "deal ID": "8", "receiver": "12D3KooWEdkgB5RQCAZToi4d3U9GSYwjLR38tqaNVtrwy5KGH81q", "state": "DealStatusUnsealing", "message": ""}
2021-01-25T04:53:03.065Z DEBUG advmgr sector-storage/sched.go:354 SCHED 1 queued; 4 open windows
2021-01-25T04:53:03.068Z DEBUG advmgr sector-storage/sched.go:449 SCHED windows: [{allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]}]
2021-01-25T04:53:03.068Z DEBUG advmgr sector-storage/sched.go:450 SCHED Acceptable win: [[0 1]]
2021-01-25T04:53:03.068Z DEBUG advmgr sector-storage/sched.go:465 SCHED try assign sqi:0 sector 330 to window 0
2021-01-25T04:53:03.069Z DEBUG advmgr sector-storage/sched.go:472 SCHED ASSIGNED sqi:0 sector 330 task seal/v0/unsealread to window 0
2021-01-25T04:53:03.069Z DEBUG advmgr sector-storage/sched_worker.go:368 assign worker sector 330
2021-01-25T04:53:03.069Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows
2021-01-25T04:53:03.081Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 }
2021-01-25T04:53:03.081Z DEBUG advmgr sector-storage/sched_worker.go:278 task done {"workerid": "65d50ee7-57fb-40bc-86c4-ec86ab936cbd"}
2021-01-25T04:53:03.081Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows
2021-01-25T04:53:03.093Z INFO graphsync responsemanager/responsemanager.go:477 response failed: %!w(traversal.SkipMe={})
2021-01-25T04:53:03.105Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 }
2021-01-25T04:53:03.140Z DEBUG advmgr sector-storage/sched.go:354 SCHED 1 queued; 4 open windows
2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched.go:449 SCHED windows: [{allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]}]
2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched.go:450 SCHED Acceptable win: [[0 3]]
2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched.go:465 SCHED try assign sqi:0 sector 330 to window 0
2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched.go:472 SCHED ASSIGNED sqi:0 sector 330 task seal/v0/unseal to window 0
2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched_worker.go:368 assign worker sector 330
2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows
2021-01-25T04:53:03.156Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:6; a:0): {{0 0} /pool/storage/sealed/s-t01278-330 /pool/storage/cache/s-t01278-330}
2021-01-25T04:53:03.174Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 }
2021-01-25T04:53:03.175Z DEBUG advmgr sector-storage/sched_worker.go:278 task done {"workerid": "65d50ee7-57fb-40bc-86c4-ec86ab936cbd"}
2021-01-25T04:53:03.175Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows
2021-01-25T04:53:03.191Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 }
2021-01-25T04:53:03.191Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:6; a:0): {{0 0} /pool/storage/sealed/s-t01278-330 /pool/storage/cache/s-t01278-330}
2021-01-25T04:53:03.199 INFO filcrypto::proofs::api > unseal_range: start
2021-01-25T04:53:03.202 INFO filecoin_proofs::api > unseal_range:start
2021-01-25T04:53:06.493Z INFO miner miner/miner.go:384 Time delta between now and our mining base: 6s (nulls: 0)
2021-01-25T04:53:30.046Z INFO graphsync impl/graphsync.go:299 Graphsync ReceiveError: stream reset
2021-01-25T04:53:37.258Z INFO miner miner/miner.go:384 Time delta between now and our mining base: 7s (nulls: 0)
2021-01-25T04:54:06.311Z INFO miner miner/miner.go:384 Time delta between now and our mining base: 6s (nulls: 0)
2021-01-25T04:54:23.446 INFO storage_proofs_porep::stacked::vanilla::graph > using parent_cache[1073741824 / 1073741824]
2021-01-25T04:54:23.449 INFO storage_proofs_porep::stacked::vanilla::cache > parent cache: opening /home/stuart/parms/v28-sdr-parent-21981246c370f9d76c7a77ab273d94bde0ceb4e938292334960bce05585dc117.cache, verify enabled: false
2021-01-25T04:54:23.449 INFO storage_proofs_porep::stacked::vanilla::proof > multi core replication
2021-01-25T04:54:23.450 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > create labels
2021-01-25T04:54:23.501 INFO storage_proofs_porep::stacked::vanilla::memory_handling > initializing cache
2021-01-25T04:54:23.501 WARN storage_proofs_porep::stacked::vanilla::memory_handling > failed to lock map Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }, falling back
2021-01-25T04:54:23.503 WARN storage_proofs_porep::stacked::vanilla::memory_handling > failed to lock map Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }, falling back
2021-01-25T04:54:23.503 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > Layer 1
2021-01-25T04:54:23.503 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > Creating labels for layer 1
... (more porep logs)
2021-01-25T08:18:27.278 INFO filecoin_proofs::api > unseal_range:finish
2021-01-25T08:18:28.970 INFO filcrypto::proofs::api > unseal_range: finish
2021-01-25T08:18:28.997Z DEBUG advmgr sector-storage/sched.go:354 SCHED 1 queued; 4 open windows
2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched.go:449 SCHED windows: [{allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]}]
2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched.go:450 SCHED Acceptable win: [[0 1]]
2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched.go:465 SCHED try assign sqi:0 sector 330 to window 0
2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched.go:472 SCHED ASSIGNED sqi:0 sector 330 task seal/v0/unsealread to window 0
2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched_worker.go:368 assign worker sector 330
2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows
2021-01-25T08:18:29.025Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 }
2021-01-25T08:18:29.025Z DEBUG advmgr sector-storage/sched_worker.go:278 task done {"workerid": "65d50ee7-57fb-40bc-86c4-ec86ab936cbd"}
2021-01-25T08:18:29.025Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows
2021-01-25T08:18:29.042Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 }
2021-01-25T08:18:36.008Z INFO miner miner/miner.go:384 Time delta between now and our mining base: 6s (nulls: 0)
2021-01-25T08:18:37.358Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventDataTransferError", "deal ID": "8", "receiver": "12D3KooWEdkgB5RQCAZToi4d3U9GSYwjLR38tqaNVtrwy5KGH81q", "state": "DealStatusErrored", "message": "deal data transfer failed: incomplete response"}
2021-01-25T08:18:46.451Z WARN storagemarket_network network/deal_status_stream.go:26 stream reset
2021-01-25T08:18:46.451Z ERROR storagemarket_impl impl/provider.go:464 failed to read DealStatusRequest from incoming stream: stream reset

Logs for sector 440 from miner

2021-01-25T05:48:20.221Z INFO dt-impl impl/events.go:338 received new channel request from 12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv
2021-01-25T05:48:20.355Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventPaymentRequested", "deal ID": "8", "receiver": "12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv", "state": "DealStatusFundsNeededUnseal", "message": ""}
2021-01-25T05:48:20.488Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventDealAccepted", "deal ID": "8", "receiver": "12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv", "state": "DealStatusFundsNeededUnseal", "message": ""}
2021-01-25T05:48:20.803Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventPaymentReceived", "deal ID": "8", "receiver": "12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv", "state": "DealStatusUnsealing", "message": ""}
2021-01-25T05:48:20.815Z DEBUG advmgr sector-storage/sched.go:354 SCHED 1 queued; 4 open windows
2021-01-25T05:48:20.824Z DEBUG advmgr sector-storage/sched.go:449 SCHED windows: [{allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]}]
2021-01-25T05:48:20.824Z DEBUG advmgr sector-storage/sched.go:450 SCHED Acceptable win: [[0 3]]
2021-01-25T05:48:20.824Z DEBUG advmgr sector-storage/sched.go:465 SCHED try assign sqi:0 sector 440 to window 0
2021-01-25T05:48:20.824Z DEBUG advmgr sector-storage/sched.go:472 SCHED ASSIGNED sqi:0 sector 440 task seal/v0/unseal to window 0
2021-01-25T05:48:20.824Z DEBUG advmgr sector-storage/sched_worker.go:368 assign worker sector 440
2021-01-25T05:48:20.825Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows
2021-01-25T05:48:20.833Z DEBUG advmgr sector-storage/sched_worker.go:278 task done {"workerid": "5036f473-8aed-478a-b65e-3b1f3281ed20"}
2021-01-25T05:48:20.833Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows
2021-01-25T05:48:20.839Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventUnsealError", "deal ID": "8", "receiver": "12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv", "state": "DealStatusFailing", "message": "storage call error 0: unsealing sector: acquire unsealed sector path (existing): allocate local sector for fetching: couldn't find a suitable path for a sector"}
2021-01-25T05:48:20.888Z INFO graphsync responsemanager/responsemanager.go:477 response failed: %!w(traversal.SkipMe={})
2021-01-25T05:48:20.956Z INFO dt-impl impl/impl.go:315 close channel 12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv-12D3KooWQvdhZSSArWTSBiyjVZa3mAmzoLHXV2QS9nFoaGLv3fL7-2229
2021-01-25T05:48:20.981Z WARN dt-impl impl/impl.go:323 unable to close channel 12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv-12D3KooWQvdhZSSArWTSBiyjVZa3mAmzoLHXV2QS9nFoaGLv3fL7-2229: channel not found
2021-01-25T05:48:20.981Z INFO dt-impl impl/impl.go:326 12D3KooWQvdhZSSArWTSBiyjVZa3mAmzoLHXV2QS9nFoaGLv3fL7: sending close channel to 12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv for channel 12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv-12D3KooWQvdhZSSArWTSBiyjVZa3mAmzoLHXV2QS9nFoaGLv3fL7-2229
2021-01-25T05:48:20.997Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventDataTransferError", "deal ID": "8", "receiver": "12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv", "state": "DealStatusErrored", "message": "deal data transfer failed: incomplete response"}

Logs from lotus-worker for 440

2021-01-25T05:38:36.418Z INFO stores stores/remote.go:224 Fetch http://192.168.1.95:2345/remote/sealed/s-t01278-440 -> /seal/worker/sealed/fetching/s-t01278-440
2021-01-25T05:43:30.991Z DEBUG stores stores/util_unix.go:28 move sector data {"from": "/seal/worker/sealed/fetching/s-t01278-440", "to": "/seal/worker/sealed/s-t01278-440"}
2021-01-25T05:43:30.994Z INFO stores stores/remote.go:224 Fetch http://192.168.1.95:2345/remote/cache/s-t01278-440 -> /seal/worker/cache/fetching/s-t01278-440
2021-01-25T05:43:31.905Z DEBUG stores stores/util_unix.go:28 move sector data {"from": "/seal/worker/cache/fetching/s-t01278-440", "to": "/seal/worker/cache/s-t01278-440"}
2021-01-25T05:43:31.908Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 440} 8} (e:6; a:0): {{0 0} /seal/worker/sealed/s-t01278-440 /seal/worker/cache/s-t01278-440}
2021-01-25T05:48:20.831Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 440} 8} (e:6; a:0): {{0 0} /seal/worker/sealed/s-t01278-440 /seal/worker/cache/s-t01278-440}

@kernelogic
Copy link
Author

That align with my attempt, @stuberman . All end up with deal data transfer failed: incomplete response.

@s0nik42
Copy link

s0nik42 commented Jan 26, 2021

have you tried on fast-retrieval=true deals ?

@s0nik42
Copy link

s0nik42 commented Jan 26, 2021

fast-retrieval=true just send you a local copy of the file that doesn't required any computation

@s0nik42
Copy link

s0nik42 commented Jan 26, 2021

I'm going to try now. Let you know.

@kernelogic
Copy link
Author

No I haven't tried. I don't want to waste too much space of my providers so all my slingshot deals were sent as fast-retrieval=false. Now this decision might come back bite me as slingshot judges might deduct rewards for failed retrievals.

@s0nik42
Copy link

s0nik42 commented Jan 26, 2021

seems like miners are now blocking retrieval deals to protect their miners ...

@kaptinlin
Copy link

We have the same unmarshalling error issue.

ERROR: RPC client error: unmarshaling result: failed to parse big string: '"\u003cnil\u003e"'

@dgreuel2002
Copy link

I am still getting this when I do "lotus-miner retrieval-deals list" this happened after @kernelogic sent me this deal in this post and it crashed miner....has there been any movement on this or should I open a separate issue? @s0nik42

3.txt

@dirkmc
Copy link
Contributor

dirkmc commented Feb 3, 2021

@dgreuel2002 the fix is in #5493, however this has not been released yet. It should go out in the next lotus release.

@s0nik42
Copy link

s0nik42 commented Feb 4, 2021

@dirkmc are you talking about

  • lotus-miner retrieval-deals list output garbage or
  • miner crash when unsealing ?

@dgreuel2002
Copy link

@s0nik42 I have updated to the new version and am still getting the lotus-miner retrieval-deals list output garbage

@s0nik42
Copy link

s0nik42 commented Feb 4, 2021

yes, I'm not sure these 2 issues are related

@dirkmc
Copy link
Contributor

dirkmc commented Feb 5, 2021

Sorry I didn't look at the log, I just read your message.

#5493 should fix the crash when deal price per byte is zero

@jennijuju
Copy link
Member

We believe the unsealing issue is in the proof instead of lotus, track here filecoin-project/rust-fil-proofs#1402.

@neondragon
Copy link

neondragon commented Mar 17, 2021

I am seeing 'Stuck after DealStatusOngoing' on my miner, when trying to retrieve a file that is already unsealed. The miner tries to unseal it again.

lotus-miner version 1.5.2+mainnet+git.cd293aa72.dirty

Client:

~# lotus client retrieve --miner f019551 bafykbzaceb3lrwftnl5vn7msktfal44jqexs2yz6z2kkz4xvnh7cnf5towzdk test.file
Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew)
Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance)
Recv: 0 B, Paid 0 FIL, ClientEventDealAccepted (DealStatusAccepted)
Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelAddingFunds (DealStatusPaymentChannelAllocatingLane)
Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing)

miner.log:

{"level":"info","ts":"2021-03-17T09:23:14.227Z","logger":"dt-impl","caller":"impl/events.go:338","msg":"received new channel request from 12D3KooWGNLxbsJXQYfaSnMRemgVaVYFCBr9w9NTkYaKmLipY6g5"}
{"level":"info","ts":"2021-03-17T09:23:14.236Z","logger":"markets","caller":"loggers/loggers.go:30","msg":"retrieval provider event","name":"ProviderEventOpen","deal ID":"1","receiver":"12D3KooWGNLxbsJXQYfaSnMRemgVaVYFCBr9w9NTkYaKmLipY6g5","state":"DealStatusNew","message":""}
{"level":"info","ts":"2021-03-17T09:23:14.237Z","logger":"markets","caller":"loggers/loggers.go:30","msg":"retrieval provider event","name":"ProviderEventDealAccepted","deal ID":"1","receiver":"12D3KooWGNLxbsJXQYfaSnMRemgVaVYFCBr9w9NTkYaKmLipY6g5","state":"DealStatusUnsealing","message":""}

~# lotus-miner sealing jobs

ID Sector Worker Hostname Task State Time
4fe081b0 1369 1224d871 gb-wlv1-filecoin2 RD running 19m59.9s

~# lotus-miner storage find 1369

In 2aaec1c2-9ad6-4563-8f59-0ac57179649e (Unsealed, Sealed, Cache)
Sealing: false; Storage: true
...

UPDATE:
After a while, the lotus client began to download the data. The cause of the delay seems to be lotus-miner taking a long time copying the unsealed sector from the storage location into .lotusminer/datastore/staging/ before starting to send it to the client.
The deal status transitioned through "DealStatusUnsealing" which was misleading -- no actual unseal task was scheduled or run. The transfer is in progress (13GB so far), and is certainly a lot slower than a file transfer could be on the local 10GbE network, but it is working.

UPDATE2:
The 32GiB download completed, it just took quite a long time. It also seemed to start faster and then get progressively slower the closer it got to 32GiB.

@flyworker
Copy link

with lotus 1.5.0

Mar 17 15:42:11 miner7 lotus-miner[1701810]: 2021-03-17T15:42:11.349Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventUnsealComplete", "deal ID": "12", "receiver": "12D3KooWLb3vya3V3ZU39jZRfVsTEY6KB93czht8AdVAF3pMCgqA", "state": "DealStatusUnsealed", "message": ""}
Mar 17 15:42:11 miner7 lotus-miner[1701810]: 2021-03-17T15:42:11.376Z INFO dt-impl impl/impl.go:370 resume channel 12D3KooWLb3vya3V3ZU39jZRfVsTEY6KB93czht8AdVAF3pMCgqA-12D3KooWHhJjBsoAom7ong97Q4FJeNP2TbCdKCsYvnFB7DHMn5RM-12
Mar 17 15:42:11 miner7 lotus-miner[1701810]: 2021-03-17T15:42:11.376Z INFO graphsync responsemanager/responsemanager.go:477 response failed: %!w(traversal.SkipMe={})
Mar 17 15:42:11 miner7 lotus-miner[1701810]: 2021-03-17T15:42:11.442Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventDataTransferError", "deal ID": "12", "receiver": "12D3KooWLb3vya3V3ZU39jZRfVsTEY6KB93czht8AdVAF3pMCgqA", "state": "DealStatusErrored", "message": "deal data transfer failed: incomplete response"}

@dkkapur
Copy link
Contributor

dkkapur commented Apr 26, 2021

@kernelogic - any updates on this based on changes introduced in 1.6 and 1.7? Is this still an ongoing problem?

@dkkapur
Copy link
Contributor

dkkapur commented May 10, 2021

Going to close this out for the time being. Please open a new Issue for specific issues you may still be experiencing.

@dkkapur dkkapur closed this as completed May 10, 2021
@TippyFlitsUK TippyFlitsUK removed the need/team-input Hint: Needs Team Input label Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/markets/client area/markets/retrieval team/ignite Issues and PRs being tracked by Team Ignite at Protocol Labs
Projects
None yet
Development

No branches or pull requests