Skip to content
This repository has been archived by the owner on Jun 5, 2024. It is now read-only.

cfinch/MH370_data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Data for Flight MH370

Aircraft Location Data
----------------------
To my knowledge, precise aircraft location data has not been released. If this
data is available, please add it!

Satellite Data
--------------
This data was extracted from the PDF file provided by the government of
Malaysia to the public at the URL shown in Reference 1. The process used to
extract the data is described below.

KNOWN ERRORS
The OCR software (tesseract) had some problems recognizing periods and colons
in the time stamps for the in-flight data. For some reason, the pre-flight data
and the Appendices did not have this issue.

NOTES
I split the first column (Time) into separate columns for Date and Time. Note
that the date format is day/month/year.

EXTRACTION PROCESS
Convert a multi-page PDF to a series of images:
gs -r300 -sDEVICE=pngmono -dTextAlphaBits=4 -o Pages/mh370_p%02d.png mh370.pdf 

To crop out headers and page numbers, first find the bounding box [2]:
gs -q -dBATCH -dNOPAUSE -sDEVICE=bbox -dLastPage=1 mh370.pdf 2>&1 | grep %%BoundingBox

Result in points (72p = 1 inch): %%BoundingBox: 42 51 768 544
lower left corner: (42, 51) upper right corner: (768, 544)
Need to multiply size by 720/72 because of 720dpi output

Produce a test PNG by cropping table header and page number:
gs -dNOPAUSE -dBATCH -sDEVICE=pnggray -r720 -dFirstPage=4 -dLastPage=4 -sOutputFile=test.png -g7690x3940 -c "<</Install {0 -75 translate}>> setpagedevice" -f mh370.pdf

gs -dNOPAUSE -dBATCH -sDEVICE=pnggray -r720 -dFirstPage=21 -dLastPage=21 -sOutputFile=test.png -g7690x4100 -c "<</Install {0 -75 translate}>> setpagedevice" -f mh370.pdf

Cropping for pages 4-20:
gs -dNOPAUSE -dBATCH -sDEVICE=pngmono -r720 -sOutputFile=Pages/mh370_p%02d.png -g7690x3940 -dFirstPage=1 -dLastPage=20 -c "<</Install {0 -75 translate}>> setpagedevice" -f mh370.pdf

Cropping for pages 21-41:
gs -dNOPAUSE -dBATCH -sDEVICE=pngmono -r720 -sOutputFile=Pages/mh370_p%02d.png -g7690x4100 -c "<</Install {0 -75 translate}>> setpagedevice" -f mh370.pdf

OCR to convert image to text:
tesseract mh370_p04.png text_p04 -psm 6 

REFERENCES
[1] http://www.dca.gov.my/mainpage/MH370%20Data%20Communication%20Logs.pdf
[2] http://stackoverflow.com/questions/12484353/how-to-crop-a-section-of-a-pdf-file-to-png-using-ghostscript
[3] https://code.google.com/p/tesseract-ocr/

About

Flight data for Malaysian Airlines Flight MH370

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages