Loosing events after upgrading to 1.5.7 #324

Closed
xmollv opened this issue Apr 4, 2024 · 11 comments

@xmollv

xmollv commented Apr 4, 2024

Describe the bug

A few days ago our team realized that some analytics were wrong, and after some investigation it seems that something changed after 1.5.5 that causes Segment to lose tracked events. Let me explain:

We had build 1.2 on the App Store using Segment 1.5.5. When we released 1.3 (and 1.4), the only change about analytics was updating Segment to 1.5.7. After doing that, we realized on Mixpanel (what we use to read Segment's data) that some events were totally wrong. We have an onboarding flow where you do steps A → B → C. To reach C, you must have gotten through A & B. Since 1.3, we were seeing that some users reached C without going through A & B, which is impossible.
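For anyone who wants to check their own data for the same symptom, the "reached C without A and B" condition can be tested on an export of tracked events. The following is only a minimal sketch under stated assumptions: the step names, the `(user_id, event_name)` tuple format, and the function name are hypothetical and are not taken from this report or from any Segment or Mixpanel API.

```python
from collections import defaultdict

# Hypothetical step names; substitute the real onboarding event names.
FUNNEL = ["Step A", "Step B", "Step C"]


def users_with_skipped_steps(events, funnel=FUNNEL):
    """Map each user to the earlier funnel steps that are missing.

    `events` is an iterable of (user_id, event_name) pairs, for example
    rows from an analytics export. A user is reported when they have a
    later step recorded but at least one earlier step is absent.
    """
    seen = defaultdict(set)
    for user_id, name in events:
        if name in funnel:
            seen[user_id].add(name)

    violations = {}
    for user_id, steps in seen.items():
        # Index of the furthest step this user is recorded as reaching.
        last = max(funnel.index(step) for step in steps)
        missing = [step for step in funnel[:last] if step not in steps]
        if missing:
            violations[user_id] = missing
    return violations


# Made-up sample data for illustration only.
events = [
    ("u1", "Step A"), ("u1", "Step B"), ("u1", "Step C"),
    ("u2", "Step C"),
    ("u3", "Step A"), ("u3", "Step C"),
]
print(users_with_skipped_steps(events))
# → {'u2': ['Step A', 'Step B'], 'u3': ['Step B']}
```

A non-empty result for a flow where skipping is impossible suggests events went missing somewhere between the app and the destination, though it does not say where.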

We then released 1.4.1 where the only change was downgrading Segment to 1.5.5. Since then, we see all events back to normal. Not a single user has managed to skip A and/or B, which leads me to believe that in some of the latest versions of Segment there's a bug somewhere that's leading to data loss.

Here's a Mixpanel graph of the discrepancy in the data. Note how the lines start diverging on March 23rd (the release of our 1.3 app version, which included Segment's 1.5.7 SDK) and keep diverging until April 3rd, by which point almost all of our users are on 1.4.1 (which means they're using the downgraded version of Segment, 1.5.5).

[Image: Screenshot 2024-04-04 at 10 45 50]

To Reproduce

Unclear. Seems to start happening on versions > 1.5.5 of the SDK.

Expected behavior

No data loss of tracked events.

Screenshots

See above.

Platform (please complete the following information):

  • Library Version in use: 1.5.5 and 1.5.7
  • Platform being tested: iOS
  • Integrations in use: Mixpanel

Additional context

N/A

xmollv added the triage label Apr 4, 2024
xmollv changed the title from "Loosing events after updating to 1.5.7" to "Loosing events after upgrading to 1.5.7" Apr 4, 2024
@alanjcharles
Collaborator

Hi @xmollv would you mind reaching out to [email protected] with this report? They have more access to your segment event logs than we do on our side. They will be able to help with a more detailed investigation to get to the bottom of this and will escalate any outstanding issues from that investigation to us. Thanks!

@bsneed
Contributor

bsneed commented Apr 4, 2024

In addition to what @alanjcharles stated, I'd first try the latest. There were some previously resolved issues that could result in a situation that you're describing. Feel free to reopen and/or reply to this ticket if you have any more questions for us.

bsneed closed this as completed Apr 4, 2024
@xmollv
Author

xmollv commented Apr 5, 2024

> Hi @xmollv would you mind reaching out to [email protected] with this report? They have more access to your segment event logs than we do on our side. They will be able to help with a more detailed investigation to get to the bottom of this and will escalate any outstanding issues from that investigation to us. Thanks!

Cool! I've just got in touch with that email, if and when there's a 'solution' found I'll update this ticket for anyone that might have experienced the same issue.

> In addition to what @alanjcharles stated, I'd first try the latest. There were some previously resolved issues that could result in a situation that you're describing. Feel free to reopen and/or reply to this ticket if you have any more questions for us.

The thing is, we already lost 2 weeks of 'good' data on production. We can't afford to push a build with a broken analytics SDK. How confident are you that what could have been broken on > 1.5.5 is fixed on 1.5.9? If I had a way to reproduce the issue reliably I wouldn't mind testing it myself, but I don't know exactly how to reproduce the events being lost.
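One possible way to look for lost events without a known reproduction is to reconcile what the app believes it sent against what the destination actually received. The following is a rough sketch, not a documented procedure: it assumes each event carries a unique ID that can be logged on the client and also found in a downstream export, and the `reconcile` function and the sample IDs are hypothetical.

```python
def reconcile(sent_ids, received_ids):
    """Compare client-side logged event IDs with IDs seen downstream.

    Returns the sorted list of IDs that were logged as sent but never
    received, plus the loss rate as a fraction of unique sent events.
    """
    sent = set(sent_ids)
    received = set(received_ids)
    lost = sorted(sent - received)
    loss_rate = len(lost) / len(sent) if sent else 0.0
    return lost, loss_rate


# Made-up IDs for illustration only.
sent = ["m1", "m2", "m3", "m4"]
received = ["m1", "m4"]
lost, rate = reconcile(sent, received)
print(lost, f"{rate:.0%}")
# → ['m2', 'm3'] 50%
```

Running the same test flow against two SDK versions and comparing the loss rates could help narrow down whether the version change is responsible.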

@xmollv
Author

xmollv commented Apr 5, 2024

@bsneed @alanjcharles I've been poking at the diffs from 1.5.5...1.5.9. I'm pretty sure the issue we are facing was introduced in PR #304, which seems to have shipped with 1.5.6. After that, I don't see any fixes related to that refactor (besides access levels), which leads me to believe that the issue is not fixed in 1.5.9.

Of course, I might be totally wrong here since I don't know this codebase at all. Just trying to help figure out what's wrong because I really don't like not being able to be on the latest version of an SDK we use! 🙏🏼

@tristan-warner-smith

tristan-warner-smith commented Apr 9, 2024

Hi @bsneed @alanjcharles, we're starting to see this hit us significantly across all tracked events after upgrading from 1.5.5 to 1.5.8. I can see there were no significant changes between 1.5.8 and 1.5.9 unless the privacy policy changes were blocking network traffic entirely.

This is the only relevant change over the time period shown below. A drop of nearly 75% in this particular event case.

Can you re-open this ticket / otherwise guide us on getting this looked at as a priority?
[Image: drop-in-events]

@alanjcharles
Collaborator

@tristan-warner-smith sorry to hear you're running into issues. We pushed 1.5.9 after reports that Apple was blocking all traffic to Segment because we had included privacy domains in the Privacy Manifest which was resulting in lost events.

I am unable to replicate losing events in my local environment on 1.5.9 at the moment. Are you noticing a particular event type is missing? Are you able to successfully send events in a dev environment? Any additional info you can provide would be greatly appreciated. Looking forward to helping you get to the bottom of this.

FYI: you can add a breakpoint here to see your events being batched/sent.

@tristan-warner-smith

> @tristan-warner-smith sorry to hear you're running into issues. We pushed 1.5.9 after reports that Apple was blocking all traffic to Segment because we had included privacy domains in the Privacy Manifest which was resulting in lost events.
>
> I am unable to replicate losing events in my local environment on 1.5.9 at the moment. Are you noticing a particular event type is missing? Are you able to successfully send events in a dev environment? Any additional info you can provide would be greatly appreciated. Looking forward to helping you get to the bottom of this.
>
> FYI: you can add a breakpoint here to see your events being batched/sent.

With this in mind we're putting out a hotfix bumping from 1.5.8 to 1.5.9 to see if it resolves what we're seeing. I'll let you know if we spot any change.

@xmollv
Author

xmollv commented Apr 10, 2024

> > @tristan-warner-smith sorry to hear you're running into issues. We pushed 1.5.9 after reports that Apple was blocking all traffic to Segment because we had included privacy domains in the Privacy Manifest which was resulting in lost events.
> > I am unable to replicate losing events in my local environment on 1.5.9 at the moment. Are you noticing a particular event type is missing? Are you able to successfully send events in a dev environment? Any additional info you can provide would be greatly appreciated. Looking forward to helping you get to the bottom of this.
> > FYI: you can add a breakpoint here to see your events being batched/sent.
>
> With this in mind we're putting out a hotfix bumping from 1.5.8 to 1.5.9 to see if it resolves what we're seeing. I'll let you know if we spot any change.

Eager to see what comes out of this! We have started testing 1.5.9 internally, but we don't have enough data to tell whether it's fixed or not. On Production we rolled back to 1.5.5 in the last release and the data seems to be back to normal. This is the current state of the graph that I posted in the original report, where the only change is rolling back Segment from 1.5.7 to 1.5.5.

[Image]

@xmollv
Author

xmollv commented May 28, 2024

I know y'all didn't make a big fuss of this, but things are still pretty broken for us. We were on 1.5.11 and I just had to roll back to 1.5.5 (again) because users were not showing up on Mixpanel (the identify calls had no properties). Running the same flows on 1.5.5 does work (we see the profiles created as expected on Mixpanel), which means something really bad has been going on internally since 1.5.5.

Leaving it here for anyone searching online that Segment is missing events.

@alanjcharles
Collaborator

Hi @xmollv would you mind reaching out to [email protected] with the details you've shared here? They will be better able to assist in getting to the bottom of the issue you're experiencing and will be able to escalate any bugs resulting from the investigation to us so we can prioritize them accordingly. Thanks so much!

@xmollv
Author

xmollv commented May 30, 2024

> Hi @xmollv would you mind reaching out to [email protected] with the details you've shared here? They will be better able to assist in getting to the bottom of the issue you're experiencing and will be able to escalate any bugs resulting from the investigation to us so we can prioritize them accordingly. Thanks so much!

I already did that last time, and after two weeks of back-and-forth emails it ended up as 'we don't know what's wrong, so we can't reproduce it, therefore we can't fix it'. I'm not going to do it again to get the same result. I think I've already done enough; if y'all don't want to prioritize this, that's fine.

I told my team that I've pinned the dependency to 1.5.5 and it won't be updated again until one of these two things happens:
a) There's a new feature that we want to use and it's available only on a newer SDK version.
b) The app stops compiling due to the SDK being so old that something broke along the way.

PS: This is the message I saw this morning when I logged into work from the person that's in charge of Analytics:

> Just checked the data we got from today and it looks perfect. I looked at all users who viewed a wall and they also triggered sign in events.

The only change between it being broken and it being perfect is this:
[Image]

4 participants