
Finding Dupes #1

Open
pedromorgan opened this issue Aug 28, 2016 · 5 comments

Comments

@pedromorgan
Contributor

There is a problem where we have raw logs etc.

But also missing pieces, or an attempt to re-insert the same data etc..

So my questions to @geoffmcl are:

  • can we assume that from cflog.pl the rows will have the same timestamp?
  • otherwise, do we need to basically check that the same lat/lon position does not already exist?

The main thing here is the timestamp, and I assume that is from the last_updated json field from cf.ffs?

@geoffmcl

@pedromorgan hi Pete, yes, you are correct: the timestamp, the 2nd-last CSV field, update, does in fact come directly from the json last_updated field...

So to break the current CSV raw logs back into each json fetch, you can just switch on the update field... each json fetch should represent some 3 to about 54 flights...

Normally, each json block will be some 5.5 seconds since the last, except if there has been a break... but with supervisor re-running the script if it breaks, this should always be something less than 15-20 seconds later... no real problem...

And of course each CSV line commences with the UNIQUE Flight ID assigned by crossfeed...
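The splitting described above can be sketched as follows. This is only an illustration, not the real cflog.pl/cfcsvlogs.pl logic: it assumes, per the comments in this thread, that the flight ID is the first CSV field and the update timestamp is the second-to-last field; the full column layout is not shown here.

```python
import csv
from itertools import groupby

def split_into_fetches(path):
    """Group raw crossfeed CSV rows into per-fetch blocks.

    Consecutive rows sharing the same 'update' timestamp (assumed to be
    the second-to-last field) belong to the same json fetch.
    """
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))
    return [list(block) for _, block in groupby(rows, key=lambda r: r[-2])]
```

Each returned block should then hold the 3 to ~54 flights of one json fetch, and a jump in the update value between blocks marks where the script was restarted.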

And the static html pages, like http://htmlpreview.github.io/?https://github.com/fgx/crossfeed-dailies/blob/gh-pages/html/20160806.htm, show how I broke the logs into Models and Callsigns...

Of course the callsign can be a crazy string like ---! One day we should try to enforce some CALLSIGN standards for MP usage, but I got some negative feedback the last time I suggested this, years back now...

And Theo has produced a great 3D global view of each CSV file here: http://fgx.github.io/sandbox/globe-crossfeed-replay/

Hope this answers your question here?

There are no duplicated records...

@pedromorgan
Contributor Author

So the way it's working at the moment is that the CSV is imported into a staging database table, and from there processed into the various data tables: callsign, aircraft, flight (which includes the path as a geom).

Check here https://pg.daffodil.uk.com/

So that way, we can actually check for dupes (probably a unique index of fid+callsign+aircraft+ts).
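The unique-index idea above can be sketched like this, using sqlite3 as a stand-in for the real Postgres setup. The table and column names (flight_point, fid, callsign, aircraft, ts) are assumptions taken from the comment, not the actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE flight_point (
        fid      INTEGER,
        callsign TEXT,
        aircraft TEXT,
        ts       INTEGER,
        lat      REAL,
        lon      REAL,
        UNIQUE (fid, callsign, aircraft, ts)
    )
""")

def insert_point(row):
    # INSERT OR IGNORE silently drops rows that collide with the
    # unique index, so re-importing the same file cannot create dupes.
    con.execute(
        "INSERT OR IGNORE INTO flight_point VALUES (?, ?, ?, ?, ?, ?)", row
    )
```

With this in place, the staging step can just insert everything and let the index reject any record that was already imported.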

@geoffmcl

@pedromorgan your link requires authentication, so not sure what you are pointing to...

I do a similar thing in cfcsvlogs.pl, to pre-process the CSV log, first into FIDs - that is a unique set of flight records for that FID...

Naturally for any one FID there will also be ONE callsign, and one MODEL, repeated in each record, with a set of lat,lon,... over time... updates...
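The per-FID pre-processing described above could look something like this. It is only a sketch of the idea, not the real cfcsvlogs.pl, and it assumes the FID is the first CSV field.

```python
from collections import defaultdict

def records_by_fid(rows):
    """Collect each FID's set of flight records, in log order.

    Every record for one FID should carry the same callsign and model,
    with lat/lon changing over time as the flight progresses.
    """
    flights = defaultdict(list)
    for row in rows:
        flights[row[0]].append(row)
    return flights
```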

To re-state: there are no duplicate records... I do not understand this idea of dupes!!!

@peteffs

peteffs commented Aug 29, 2016

From my end.. I am banging the files into a database.. and even the same file could go in twice == dupes.


@geoffmcl

@peteffs ok, well yes, if you are just banging files... even the same file more than once...

Then yes, the SAME fid+callsign+model+update would indicate a dupe ;=))

But you could just as easily keep track of CSV files done...
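Keeping track of CSV files already done could be sketched like this. Hashing the file contents (rather than the name) is my own suggested twist, so the same log is recognised even if it arrives under a different filename; in real use the set of digests would need to be persisted between runs.

```python
import hashlib

def file_digest(path):
    """Return a SHA-256 hex digest of the file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

done = set()  # in real use, persist this set between runs

def should_import(path):
    # Skip any file whose contents have already been imported.
    digest = file_digest(path)
    if digest in done:
        return False
    done.add(digest)
    return True
```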
