GSoC 2024 Piyush Sharma
- Name: Piyush Sharma
- University: IIT Roorkee
- Major: Integrated BS-MS in Mathematics And Computing
- Timezone: IST (UTC +5:30)
- Contact Email: [email protected]
- Github Id: ViciousEagle03
- Element.io Username: VEagle
- OS: Ubuntu 22.04
- Editor: VS Code
I am a second-year undergraduate student at the Indian Institute of Technology, Roorkee, majoring in Mathematics and Computing. I have been programming for the past three years, primarily in Python, and I am also proficient in Java, C/C++, and JavaScript. Python has been instrumental in my development, and I'm comfortable with frameworks like Flask and Django.
I have a few Python projects hosted on my GitHub account, which I believe reflect my ongoing journey and dedication to the language. [Relevant Project]
- https://github.com/astropy/astropy/pull/15985 [Merged]
- https://github.com/sunpy/sunpy/pull/7453 [Merged]
- https://github.com/sunpy/ndcube/pull/666 [Open]
- Assisted with issue #663 in sunpy/ndcube repository
The ndcube package is a SunPy-affiliated package that provides a collection of objects for representing and manipulating data and world coordinates simultaneously. As of now, ndcube does not support saving an NDCube object to a file or loading one back from a file.
This document proposes adding the ability to save any NDCube object to an ASDF file and then load the NDCube object back in. ASDF is chosen as the file format because it is well suited to accommodating the complex metadata requirements of modern scientific data. Unlike FITS, ASDF offers a hybrid text and binary format that addresses these challenges by providing a hierarchical metadata structure, human-editable content, and efficient storage of binary data. ASDF's design also allows for the adoption of new conventions for storing complex metadata in a highly structured manner. Incorporating ASDF serialization into NDCube will enhance its flexibility and compatibility, enabling the storage of richer metadata structures.
Efforts were made to implement a save method for NDCube objects in issue #111, aiming to write them to FITS files. However, the APE-14 API did not provide a generalized way to serialize a gWCS to FITS and recover the same gWCS object on deserialization. Instead, serialization of gWCS to the ASDF format was proposed as a solution. This also called for a method to serialize FITS WCS objects to ASDF, since ASDF was identified as the primary serialization option for gWCS.
There was discussion regarding the serialization of FITS WCS objects to ASDF in issue #9735. Initial attempts to add serialization support were made in the astropy.io.misc.asdf module. However, since that module will be removed in future versions of Astropy, this proposal aims to add serialization support through the asdf-astropy plugin instead. It also aims to address the limitations imposed by the APE-14 API and to enable the serialization of NDCube objects with gWCS and astropy.wcs.WCS representations to a more flexible and compatible format.
The document outlines the objectives in the subsequent sections:
The initial objective is to enable the serialization of the basic attributes to ASDF, where .data is a numpy array and .wcs is a gwcs.WCS object. A Converter for the NDCube class will be defined to achieve this goal, implementing the to_yaml_tree method, which takes an NDCube object and returns a YAML representation of it. Additionally, a tag URI will be assigned to the Converter class to identify the specific object type it handles. The resulting YAML representation will be validated against a schema designed for the basic NDCube object.
The schema will validate and standardize the structure of an NDCube object when it is stored in a YAML file. The structure of the schema needs to comply with JSON Schema Draft 4. To ensure the schema designed for NDCube is valid and adheres to the ASDF schema specifications, I would add a custom meta-schema. To uniquely identify the schema, I would assign an ID to it and provide a title and description to document its purpose and content. To associate the type of object with the schema, the type validators need to be set to NDCube. The properties validator will initially contain the basic attributes of NDCube, but it will be updated to account for extra properties and metadata as the project progresses.
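As a rough illustration of the schema described above, the sketch below writes a minimal NDCube schema as a Python dictionary and checks a serialized node against it. ASDF schemas are normally YAML files; the dictionary only mirrors that structure. The schema id is a placeholder, the extra_coords and global_coords property names are assumptions, the meta-schema URI shown is the standard ASDF one rather than a custom one, and check_node is a deliberately tiny helper, not a full JSON Schema validator.

```python
# Hypothetical schema for a basic NDCube node. In ASDF this would live in a
# YAML file; the id URI below is a placeholder, not a registered schema.
NDCUBE_SCHEMA = {
    "$schema": "http://stsci.edu/schemas/yaml-schema/draft-01",
    "id": "asdf://sunpy.org/ndcube/schemas/ndcube-0.1.0",
    "title": "Represents an ndcube.NDCube object",
    "description": "Data array plus the WCS that maps pixels to world coordinates.",
    "properties": {
        "data": {"description": "The n-dimensional data array."},
        "wcs": {"description": "The gwcs.WCS object describing the data."},
        # Optional properties to be added later in the project (assumed names).
        "extra_coords": {},
        "global_coords": {},
    },
    "required": ["data", "wcs"],
    "additionalProperties": False,
}


def check_node(node, schema=NDCUBE_SCHEMA):
    """Return a list of problems found by a simplified structural check.

    Only `required` and `additionalProperties: false` are honoured here; a
    real implementation would rely on ASDF's own schema validation.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in node:
            problems.append(f"missing required property: {key}")
    if schema.get("additionalProperties") is False:
        for key in node:
            if key not in schema["properties"]:
                problems.append(f"unexpected property: {key}")
    return problems
```

Keeping data and wcs in required while leaving the other properties optional matches the plan of starting with the basic attributes and widening the schema later.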
Leveraging the existing Converters that turn gwcs.WCS objects into YAML, the .wcs attribute can be passed as a nested complex object in the to_yaml_tree method. Similarly, a tag will be assigned to the schema to uniquely identify it during serialization. To support deserialization, I would implement the from_yaml_tree method to reconstruct an NDCube object from the passed YAML data.
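The Converter described above could look roughly like the sketch below. To keep the example runnable without ndcube or asdf installed, SimpleCube is a hypothetical stand-in for NDCube, and the Converter is written as a plain Python class; in the real implementation it would subclass asdf.extension.Converter and list the NDCube type in types. The tag URI is a placeholder.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SimpleCube:
    """Hypothetical stand-in for ndcube.NDCube with only .data and .wcs."""
    data: Any
    wcs: Any


class NDCubeConverter:
    """Sketch of a Converter; a real one would subclass asdf.extension.Converter."""

    # Placeholder tag URI, assumed for illustration only.
    tags = ["tag:sunpy.org:ndcube/ndcube/ndcube-0.1.0"]
    # A real converter would reference the NDCube class here.
    types = [SimpleCube]

    def to_yaml_tree(self, obj, tag, ctx):
        # Return a plain dict. Nested objects such as the WCS are left as-is
        # so that their own converters can serialize them.
        return {"data": obj.data, "wcs": obj.wcs}

    def from_yaml_tree(self, node, tag, ctx):
        # Rebuild the object from the already-deserialized child nodes.
        return SimpleCube(data=node["data"], wcs=node["wcs"])


# Round trip through the converter, with a string standing in for the WCS.
converter = NDCubeConverter()
cube = SimpleCube(data=[[1, 2], [3, 4]], wcs="gwcs-placeholder")
node = converter.to_yaml_tree(cube, converter.tags[0], ctx=None)
restored = converter.from_yaml_tree(node, converter.tags[0], ctx=None)
```

Because to_yaml_tree hands the WCS back unchanged, the existing gWCS converter stays responsible for that part of the tree.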
Serializing an NDCube object to ASDF requires the compulsory attributes .data and .wcs. However, I will extend this support to include the additional world coordinates associated with pixel axes, governed by the ExtraCoords class, and the extra information related to the NDCube object, governed by the GlobalCoords class. Thus, I will define Converter classes for these as well and implement the to_yaml_tree and from_yaml_tree methods to convert ExtraCoords and GlobalCoords instances to YAML and to reconstruct these objects from a valid YAML file, respectively. The schema for the NDCube object would then be adjusted to include optional properties for ExtraCoords and GlobalCoords.
To associate each Converter class with the serialization process, we would need to register the Converter classes with ASDF using the extension URIs.
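A minimal sketch of that registration step follows, assuming the usual ASDF pattern in which an extension object exposes an extension_uri, a list of converters, and the tags they handle, and a package entry point in the asdf.extensions group points at a function returning the extensions. The classes here are plain stand-ins so the example runs without asdf installed, and every URI is a placeholder.

```python
class _StubConverter:
    """Hypothetical minimal converter carrying only the tag it handles."""

    def __init__(self, tag):
        self.tags = [tag]


class NDCubeExtension:
    """Sketch of an extension; a real one would subclass asdf.extension.Extension."""

    # Placeholder URI, assumed for illustration.
    extension_uri = "asdf://sunpy.org/ndcube/extensions/ndcube-0.1.0"

    def __init__(self, converters):
        self.converters = list(converters)

    @property
    def tags(self):
        # An extension advertises every tag handled by its converters.
        return [tag for conv in self.converters for tag in conv.tags]


def get_extensions():
    """Function that an `asdf.extensions` entry point would point at."""
    converters = [
        _StubConverter("tag:sunpy.org:ndcube/ndcube/ndcube-0.1.0"),
        _StubConverter("tag:sunpy.org:ndcube/extra_coords/extracoords-0.1.0"),
        _StubConverter("tag:sunpy.org:ndcube/global_coords/globalcoords-0.1.0"),
    ]
    return [NDCubeExtension(converters)]
```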
Unlike gWCS objects, astropy.wcs.WCS does not support serialization to ASDF. To enable it, I would implement the necessary methods in a Converter class and design the corresponding schema within the asdf-astropy package.
We would also need to assign tag URIs to both the Converter class and the schema. This would help distinguish between astropy.wcs.WCS and gwcs.WCS objects while serializing an NDCube object, so that the correct Converter is used for the WCS object attached to the cube. This part of the project aims to broaden the scope of serialization support to include the astropy.wcs.WCS class and other relevant WCS wrapper classes.
The next phase would be to enable a manipulated NDCube object to be saved to ASDF. This requires Converters for the various WCS wrapper classes, that is, SlicedLowLevelWCS, ResampledLowLevelWCS, CompoundLowLevelWCS, and ReorderedLowLevelWCS, so that when an NDCube object is manipulated or transformed, the serialization process takes the changes into account and accurately saves the object to an ASDF file. An Extension class will need to be written to incorporate this serialization logic into the ASDF ecosystem.
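One way a wrapper-class Converter might work is to store the wrapped WCS together with the operation applied to it. The sketch below does this for slicing, under the assumption that the wrapper can be rebuilt from a WCS and a tuple of slices or integers. SlicedWCSStandIn is a hypothetical stand-in rather than astropy's SlicedLowLevelWCS, the tag is a placeholder, and the encoding of Python slice objects as small dictionaries is only one possible choice.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass
class SlicedWCSStandIn:
    """Hypothetical stand-in for a sliced WCS wrapper: a WCS plus the slices applied."""
    wcs: Any
    slices: Tuple[Union[slice, int], ...]


def encode_slice(item):
    # Integers are already YAML-friendly; slices become small dicts.
    if isinstance(item, slice):
        return {"start": item.start, "stop": item.stop, "step": item.step}
    return int(item)


def decode_slice(node):
    # Inverse of encode_slice: dicts become slices, integers pass through.
    if isinstance(node, dict):
        return slice(node["start"], node["stop"], node["step"])
    return node


class SlicedWCSConverter:
    """Sketch only; the tag is a placeholder and the asdf base class is omitted."""

    tags = ["tag:sunpy.org:ndcube/wcs/slicedwcs-0.1.0"]
    types = [SlicedWCSStandIn]

    def to_yaml_tree(self, obj, tag, ctx):
        # The wrapped WCS is passed through for its own converter to handle.
        return {"wcs": obj.wcs, "slices": [encode_slice(s) for s in obj.slices]}

    def from_yaml_tree(self, node, tag, ctx):
        slices = tuple(decode_slice(s) for s in node["slices"])
        return SlicedWCSStandIn(wcs=node["wcs"], slices=slices)
```

The same shape (wrapped WCS plus a small description of the transformation) could plausibly carry over to the resampled, compound, and reordered wrappers, though each would need its own parameters.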
I would then restructure the schema of the NDCube object to take into account the added support for various types of WCS objects. The test suite would then be expanded to cover a diverse range of scenarios, ensuring comprehensive support for different WCS objects.
Once the basic serialization of the ndcube.NDCube object with various WCS representations is implemented, the next step would be to extend the serialization process to include optional properties and metadata such as mask, uncertainty, and PSF. A well-structured schema and a Converter must be added for the astropy.nddata.NDUncertainty class within asdf_astropy to preserve the metadata during serialization. After registering the Converter, I would update the schema for NDCube to include these as optional properties. This development will enable NDCube to handle complex metadata configurations and convert them accurately to ASDF format.
NDCubeSequence is a class for handling multiple contiguous NDCube objects, and NDCollection is a container class for unordered groupings of NDCube or NDCubeSequence instances. Both classes depend on NDCube instances. Leveraging the serialization methods already developed for ndcube.NDCube objects, the Converters for NDCubeSequence and NDCollection would use compound serialization. Additionally, the schema definitions for these classes would reference the schema of the NDCube class, ensuring consistency within the serialization framework. After this, the Converters will be registered with the ASDF library, and the existing test suite for NDCube will be extended to cover the NDCollection and NDCubeSequence classes.
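The compound serialization mentioned above can be sketched as a Converter that returns its cubes as a plain list and leaves each cube to the Converter registered for the cube type. CubeStandIn and SequenceStandIn are hypothetical stand-ins for NDCube and NDCubeSequence, the node keys and the tag are assumptions, and the asdf base class is omitted so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class CubeStandIn:
    """Hypothetical stand-in for ndcube.NDCube."""
    data: Any
    wcs: Any


@dataclass
class SequenceStandIn:
    """Hypothetical stand-in for NDCubeSequence."""
    cubes: List[CubeStandIn]
    common_axis: Optional[int] = None
    meta: dict = field(default_factory=dict)


class SequenceConverter:
    """Sketch only; a real converter would subclass asdf.extension.Converter."""

    # Placeholder tag URI, assumed for illustration.
    tags = ["tag:sunpy.org:ndcube/ndcube/ndcube_sequence-0.1.0"]
    types = [SequenceStandIn]

    def to_yaml_tree(self, obj, tag, ctx):
        # The cubes are stored as a plain list of objects; serializing each
        # cube is delegated to the converter registered for the cube type.
        node = {"data": list(obj.cubes)}
        if obj.common_axis is not None:
            node["common_axis"] = obj.common_axis
        if obj.meta:
            node["meta"] = obj.meta
        return node

    def from_yaml_tree(self, node, tag, ctx):
        # By this point the child cubes have already been rebuilt.
        return SequenceStandIn(
            cubes=list(node["data"]),
            common_axis=node.get("common_axis"),
            meta=node.get("meta", {}),
        )
```

A Converter for NDCollection could follow the same pattern with a mapping of names to cubes or sequences in place of the list.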
For clarity and maintainability, I would document the work in parallel, detailing the Converter classes, the schema design, and the tags and URIs associated with each Converter class, and I would write tests alongside to validate the functionality of the implementation.
During the community bonding period, having developed a fair bit of knowledge of ASDF serialization, I aim to contribute to solving some issues within the asdf-astropy repository to get acquainted with the practical side of schema design and Converters. I would discuss with the mentors whether any changes need to be made to the proposal regarding the timeline and weekly project goals. I will be taking my end-term examinations until 8th May, so I will be fairly busy until then. I plan to start working on the initial phase of the project and develop a prototype of the schema and Converter class for the ndcube.NDCube object, so that I can discuss the prototype and get feedback on whether it aligns with the project requirements. I will also try to complete my pending [PR] by the end of the Community Bonding Period.
To ensure efficient implementation, I would commit each change I make to the WIP PR to keep my work on track and give mentors the visibility to offer guidance if I veer off course from my goals.
- Discuss the structure of the schema design of NDCube with its .wcs attribute as gwcs.WCS.
- Make a WIP PR to the ndcube repository to add the support of serialization to ASDF.
- Implement the Converter for the NDCube class.
- Documentation for the same.
- Design the schema for GlobalCoords and ExtraCoords classes.
- Write Converters for both the classes.
- Make schemas installable as ASDF library resources for easy access.
- Implement the Extension class to register the Converters.
- Documentation and tests for the same.
- Review and refine the schemas for NDCube, GlobalCoords, and ExtraCoords.
- Further testing to ensure smooth interaction within the ASDF ecosystem.
- Document the implemented Converters and schema.
- Ensure the complete implementation of the ASDF extension infrastructure to the ndcube package.
- Test serialization and deserialization of NDCube objects using ASDF.
- Extend the existing test suite with additional examples of saving and loading NDCube objects backed by different gwcs.WCS objects.
- Discuss with mentors regarding the existing ASDF extension framework, and whether any enhancement or adjustment is needed to optimize the framework's efficiency.
- Begin designing the schema and implementing the Converters for astropy.wcs.WCS within the asdf-astropy package.
- Implement the Extension class to register the Converter written within asdf-astropy.
- Complete any remaining task before 1st evaluation.
- Design the schema and implement Converters for the remaining WCS wrapper classes, SlicedLowLevelWCS, ResampledLowLevelWCS, CompoundLowLevelWCS, and ReorderedLowLevelWCS classes.
- Documentation and tests for the same.
- Discuss with mentors regarding the progress and address any issues raised during the midterm evaluation.
- Get the PR merged for ndcube to support the existing serialization of NDCube objects.
- Implement the Extension class to register the converters for all the WCS wrapper classes.
- Document the support enabling the saving of manipulated NDCube objects to ASDF format.
- Test serialization and deserialization of manipulated NDCube objects using ASDF.
- Buffer week for leftover tasks.
Week 10 (July 29th - August 4th)
- Implement Converters and schemas for the astropy.nddata.NDUncertainty class in asdf-astropy.
- Documentation and tests for the same.
- Write the Extension class to register the Converters.
- Test serialization and deserialization of NDCube object with optional properties set.
- Implement Converters and schemas for the astropy.nddata.NDUncertainty class in asdf-astropy.
- Documentation and tests for the same.
- Write the Extension class to register the Converters.
- Test serialization and deserialization of NDCube object with optional properties set.
- Design the schema and implement the Converters for NDCubeSequence and NDCollection.
- Documentation and tests for the same.
- Implement the Extension classes to register the Converters.
- Test serialization and deserialization of NDCubeSequence and NDCollection instances.
- Write up documentation for remaining tasks.
- Write documentation about the usage of the project.
- Ensure all the documentation related to the project is properly organized.
- Get feedback on the overall project from the mentors.
- Complete any leftover task.
No
No, I am solely applying for this project within this organization.
My end-term examinations are scheduled from May 1st to May 8th. However, since this period falls within the Community Bonding phase, I don't think this will be an issue, as the coding period starts on May 26th. From May 8th to July 20th, I will have college vacations, so I would easily be able to dedicate 40 hours/week to the project given its medium-sized (175 hours) scope. I also do not have any major commitments during my summer vacations. Even after classes resume on July 21st, I am confident that I will still be able to allocate over 35 hours per week, as there will be no examinations or major commitments.
Yes, I am eligible to receive payment from Google.