Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seeing lockups writing to NSUserDefaults #518

Closed
gestrich opened this issue Mar 19, 2016 · 21 comments
Closed

Seeing lockups writing to NSUserDefaults #518

gestrich opened this issue Mar 19, 2016 · 21 comments

Comments

@gestrich
Copy link

We are seeing NSUserDefaults locking up for a small number of our users after integrating Segment. One thread is waiting on the mutex for CFPrefsSource, but no other threads seem to have acquired the mutex. This is the only thread writing/reading to NSUserDefaults. Then, if another thread tries to write to NSUserDefaults from the main thread, the app will lock up. Any idea why this call is getting stuck? We can't reproduce it but the users that have the problem have it consistently until we disable segment analytics for them. We assume maybe there is an issue with the data attempting to be written (ex mutable data in the structure?)

Thread 15:
0 libsystem_kernel.dylib 0x00000001835a7f90 __psynch_mutexwait + 8
1 CoreFoundation 0x0000000183a4260c -[CFPrefsSource setValue:forKey:] + 44
2 CoreFoundation 0x0000000183a41bbc +[CFPrefsSource withSourceForIdentifier:user:byHost:container:perform:] + 836
3 CoreFoundation 0x0000000183a453cc _CFPreferencesSetValueWithContainer + 300
4 Foundation 0x00000001842f1b78 -[NSUserDefaults(NSUserDefaults) setObject:forKey:] + 52
5 ForeFlight 0x00000001010fcba0 -SEGSegmentIntegration persistQueue
6 ForeFlight 0x00000001010fb5c8 -SEGSegmentIntegration queuePayload:
7 ForeFlight 0x00000001010fb4a4 __71-[SEGSegmentIntegration enqueueAction:dictionary:context:integrations:]_block_invoke (SEGSegmentIntegration.m:389)
8 libdispatch.dylib 0x0000000183459630 _dispatch_call_block_and_release + 20
9 libdispatch.dylib 0x00000001834595f0 _dispatch_client_callout + 12
10 libdispatch.dylib 0x0000000183465634 _dispatch_queue_drain + 860
11 libdispatch.dylib 0x000000018345d0f4 _dispatch_queue_invoke + 460
12 libdispatch.dylib 0x00000001834595f0 _dispatch_client_callout + 12
13 libdispatch.dylib 0x0000000183467a88 _dispatch_root_queue_drain + 2136
14 libdispatch.dylib 0x0000000183467224 _dispatch_worker_thread3 + 108
15 libsystem_pthread.dylib 0x000000018366d470 _pthread_wqthread + 1088
16 libsystem_pthread.dylib 0x000000018366d020 start_wqthread + 0

@f2prateek
Copy link
Contributor

Do you know which thread has acquired the lock?

@gestrich
Copy link
Author

Oddly enough, no other thread shows that it has acquired the lock. We can take thread snapshots by forcing the app to crash. Out of all the threads, only one involves NSUserDefaults/CFPrefsSource, -- the one that shows it is waiting. This issue is only occurring for a small handful of our users but the locked thread cannot be resolved until disabling analytics. Is it possible that persistQueue is doing an endless cycle of updates?

@f2prateek
Copy link
Contributor

Unlikely since the code isn't recursive. We call persistQueue under the following conditions —

  1. Event is queued
  2. Event is uploaded/dequeued
  3. App is terminating

@gestrich
Copy link
Author

We now have > 10 customers with this lockup issue now which gave the opportunity to do a lot of experimentation with the problem over the last few weeks.

Customers were reporting that their physical location seemed to be triggering the lockups, even while connected to the same network. For instance, we had a customer on a cellular connection that triggered the bug by altering his position within a few dozen meters. Walk to location A, he has the freeze-ups. Walk to location B, the bug immediately goes away. Location A and B both have a strong cellular connection. This seemed very peculiar. Nonetheless, we asked him to turn off Location Services to see if location was really playing a role. The freeze ups immediately stopped with location services disabled. He repeated this test about 10 times, turning location services on/off and was able to confirm that it was trigging the lockups.

We asked this customer to download a special build of ours using SegAnalytics but with shouldUseLocationServices == NO (it was set to YES). We verified that this change stops the freeze ups (Even with Location Services now ON in Apple Settings). It is our plan in our next release to set shouldUseLocationServices = NO for SegAnalytics. (side note that we use Location Services extensively throughout our app and have never seen these lockups before)

Admittedly, the underlying cause of this issue is not clear to us. If you have any information or ideas for collecting location diagnostic data, please let us know.

@f2prateek
Copy link
Contributor

Thanks, definitely helpful. Are you seeing this for all customers and on your own devices, or just a few?

@gestrich
Copy link
Author

We are only receiving reports for a small subset of our customers; however, those that do get it will continually get lockups (every few minutes). We have not been able to duplicate the issue at our location.

@underscoretang
Copy link

we're seeing this as well. getting this error: "Attempt to set a non-property-list object as an NSUserDefaults/CFPreferences value for key SEGQueue"

The convention for NSUserDefaults is for storing user settings, not for persisting objects.

https://github.com/segmentio/analytics-ios/blob/master/Pod/Classes/Internal/SEGSegmentIntegration.m#L581

@vphamdev
Copy link

NSUserDefaults should not be used as a cache like this. This is going to cause lots of issues for clients with large numbers of events.

@f2prateek
Copy link
Contributor

f2prateek commented Apr 23, 2016

@underscoretang

"Attempt to set a non-property-list object as an NSUserDefaults/CFPreferences value for key SEGQueue"

That is related to #511 (comment).

@johnryan
Copy link
Contributor

Any update on a fix for this?

@f2prateek
Copy link
Contributor

@johnryan which issue? there are a few different ones in this thread

@johnryan
Copy link
Contributor

@f2prateek I was referring to the deadlocking in user defaults.

@f2prateek
Copy link
Contributor

@johnryan do you have a reproducible case we can try to simulate this?

@johnryan
Copy link
Contributor

@f2prateek it seems related to #511 If you try and track an event that has NSNull values in the dictionary, NSUserDefaults becomes unresponsive.

@f2prateek
Copy link
Contributor

f2prateek commented May 27, 2016

I see, I still think it's still best to avoid NSNull in the dictionary as I explained above #511 (comment).

That being said I'm happy to revert the change (#488) and have a better way to support tvOS. I'm out but @wcjohnson11 will take care of this this week or early next week.

@underscoretang
Copy link

I think it's in the best interest of everyone for Segment to handle NSNull as values gracefully and not cause our apps to crash instead of coming up with some nonsense about how NSNulls shouldn't be in dictionaries because it's ambiguous.

If there was even a mention of this here: https://segment.com/docs/libraries/ios/ I would give you some credit, but there isn't. You can search "null" on this page and there's no results. Your documentation should say in bold letters at the top.

"WARNING: ADDING NSNULL AS A VALUE TO PROPERTIES DICTIONARIES WILL CAUSE YOUR APP TO HANG."

But alas, it doesn't say that. And we've literally waited a month for you to make this change, and were told early this week that it was in your sprint now. Yet still.. here you are defending your position when you could do something as simple as parsing the dictionary for NSNull.. and removing those properties before inserting them into NSUserDefaults and preventing apps from hanging.

If the tone of this comment has come of as unprofessional, it's because I'm livid.

@danieljimenez
Copy link

^ couldn't agree more.

@f2prateek
Copy link
Contributor

f2prateek commented May 28, 2016

I think there are 2 different issues being mixed up here and let me address them separately.

First one is using NSUserDefaults as the event queue. As I mentioned earlier, Will is reverting the change this sprint and improving how we support tvOS by detecting the platform and using the appropriate storage mechanism.

The 2nd issue is about null values.

coming up with some nonsense about how NSNulls shouldn't be in dictionaries because it's ambiguous.

Maybe my explanation wasn't clear enough, but unfortunately the reality is that this is very much an ambiguous case with Segment.

Let me try to explain with a more concrete example. Say you identified a user with some address, and later a null address as a trait as a way to indicate that the users address should be deleted. But since our integrations are stateless, they are not aware the user previously had an address and may interpret the null differently.

e.g. consider AppBoy and Kahuna. AppBoy will simply ignore the address for this call and leave the old address on this user profile https://github.com/segment-integrations/integration-appboy/blob/master/lib/mapper.js#L147. On the other hand, Kahuna will simply use this null address as is and overwrite any old address associated with the user
https://github.com/segment-integrations/integration-kahuna/blob/master/lib/mapper.js#L52.

I hope this example explains why the behaviour is ambiguous. My preferencre is to avoid ambiguous cases like this and prevent people from running into it, which is why I suggested we disallow null values entirely. I do agree that we can document this limitation better, and the current experience of seeing lockups because of this was a sub par experience, and I'm sorry about that.

Moving forwards on the null issue, we have 2 options:

  1. Clean up null values in the library before we process the message.
  2. Throw a runtime error that the library disallows invalid values.

My personal preference is for the latter (with appropriate documentation) since the former could make some people confused why Segment is ignoring a null field. But I can see why as a consumer of the library 1 is more convenient if you understand that Segment will ignore null fields.

It does seem to me that the general preference from a this thread is option 1, so I can definitely implement that as the default behavior!

@johnryan
Copy link
Contributor

I don't really agree with the runtime error but I do see how it can be confusing with different integrations. The problem with the runtime error is that you may not know about this behavior until it starts crashing in production. It seems like it would be ok to replace nil or NSNull values with a "null" string for integrations that ignore null values. That's atleast what I would expect as a consumer of the API.

@f2prateek
Copy link
Contributor

Interesting, but would a "null' string will work in all cases? Since we can't guess the type of a custom field if the value provided is null. The type of that field might be a number or dictionary, and if we replace NSNull with "null", it could cause more confusing behavior.

Makes sense about the runtime error. But it's hard to say at what point it's appropriate to try to automatically correct vs throw an error (e.g. NSUserDefaults throws an error).

Though since we inadvertently did support accepting null values in previous versions, I think it makes sense to automatically clean up the dictionary for now (either by deleting files with nulls or accepting as is and let tools interpret it differently).

Will's upcoming change will revert the null handling behavior to old vers. If we do decide to change this behavior, we can make it part of a major bump as a breaking change.

@f2prateek
Copy link
Contributor

Reverted as of 3.1.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants