MAGFILO: Manually Annotated GONG Filaments in H-Alpha Observations
MAGFiLO is one of the products of the MLEcoFi project, funded by NSF (#2209912, #2433781). This dataset represents the largest collection of manually annotated filaments from H-Alpha observations captured by the Global Oscillation Network Group (GONG). It is publicly accessible for the community and is ready to be used for training complex machine learning models to enhance our understanding of solar filaments.
10,244 Annotated Filaments
958 Unique H-Alpha Observations
Annotated properties: Chirality, Polygon, Spine
5' 20'' Annotation Time per Filament
1,066 Person-Hours of Annotations
Observation: 20150518145654Bh
Data Format
The annotations are structured into a COCO-style data format which is a JSON file containing dictionaries and lists as shown below:
info: corresponds to a dictionary containing metadata description of the dataset.
images: corresponds to a list of image dictionaries.
annotations: corresponds to a list of annotation dictionaries.
licenses: corresponds to a dictionary containing the license information of the NSO/GONG H-Alpha images.
categories: corresponds to a list of category dictionaries.
image: corresponds to a dictionary containing information about an annotated image, including its name and downloadable URL.
annotation: corresponds to a dictionary containing information about an annotated filament, including its segmentation and bounding box.
license: corresponds to a dictionary describing the license of an image.
category: corresponds to a dictionary containing the category information that each filament may be described with.
{
"info": info,
"images": [image],
"annotations": [annotation],
"licenses": [license],
"categories": [category],
}
info{
"year": int,
"version": str,
"description": str,
"contributor": str,
"url": str,
"date_created": datetime,
}
license{
"id": int,
"name": str,
"url": str,
}
categories[
{
"id": int,
"name": str,
"supercategory": str,
}]
annotation{
"id": str, # int in the standard COCO format
"image_id": int,
"category_id": int,
"segmentation": RLE or [[]],
"area": float,
"spine": [], # Added
"bbox": [x, y, width, height],
"iscrowd": 0 or 1,
}
image{
"id": str, # int in the standard COCO format
"width": int,
"height": int,
"file_name": str,
"license": int,
"flickr_url": str, "url": str,
"coco_url": str,
"date_captured": datetime,
}
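As a minimal sketch of how a file with this layout might be read, the snippet below loads the JSON and groups annotations by image. The helper names (`load_magfilo`, `index_by_image`) and the toy records are illustrative assumptions, not part of the dataset or of any official tooling.

```python
import json
from collections import defaultdict


def load_magfilo(path):
    """Load a COCO-style JSON file from disk and return it as a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def index_by_image(dataset):
    """Group annotation dicts by the id of the image they belong to."""
    by_image = defaultdict(list)
    for ann in dataset["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return dict(by_image)


# Tiny in-memory stand-in with the same top-level keys.
# All field values below are made up for illustration.
toy = {
    "info": {"version": "1.0"},
    "images": [
        {"id": "010401-20160920230134Lh", "file_name": "20160920230134Lh.jpg"}
    ],
    "annotations": [
        {"id": "a1", "image_id": "010401-20160920230134Lh", "category_id": 1},
        {"id": "a2", "image_id": "010401-20160920230134Lh", "category_id": 3},
    ],
    "licenses": [],
    "categories": [],
}

grouped = index_by_image(toy)
print({image_id: len(anns) for image_id, anns in grouped.items()})
```

With a real file, `index_by_image(load_magfilo(path))` would be the equivalent call, where `path` points to wherever the downloaded JSON is stored.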
Note
The dataset differs from the standard COCO format in a few minor ways: image and annotation ids are strings rather than integers, each annotation carries an additional spine field, and each image stores its downloadable URL under url.
A bounding box in annotation["bbox"] is represented as a list of 4 values, [x, y, width, height], where (x, y) is the top-left corner of the box. All values are in pixels.
A segmentation in annotation["segmentation"] is a list of floats representing a closed path. Each path forms a polygon capturing a filament's shape. A polygon made with n points is represented as [x_0, y_0, x_1, y_1, ..., x_(n-1), y_(n-1)] where x_0 = x_(n-1) and y_0 = y_(n-1).
The area in annotation["area"] is the segmentation area (not the bbox area). It is computed with the COCO API (the pycocotools package) by first converting the polygon into RLE.
A spine in annotation["spine"] is a list of floats representing a path. Each path captures a filament's spine. Disconnected line segments are not allowed. A spine with n points is represented as [x_0, y_0, x_1, y_1, ..., x_(n-1), y_(n-1)] .
The tag annotation["iscrowd"] is always zero, indicating that each segmentation corresponds to a single filament (i.e., no group of filaments is annotated together).
Although annotation["segmentation"] is a list, it only contains one polygon. That is, each filament is annotated using a single-piece polygon.
The class labels listed in category["name"] are: "Left" (id = 1), "Right" (id = 2), "Unidentifiable" (id = 3), and "Ambiguous" (id = 4).
Each image id in image["id"] is a string that combines the annotator's batch name with the image name; together they are unique. For example, id 010401-20160920230134Lh indicates that the image named 20160920230134Lh.jpg was annotated by annotator "010401", and that it may also have been annotated by two other annotators, 010402 and 010403, in the same group.
Each annotation id in annotation["id"] is a string (e.g., a7639f8a-c76d-43a7-b392-e75842262b75) generated by the annotation platform and is unique to each filament.
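The conventions in the notes above can be illustrated with a few small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the coordinates are made up, and the box derived here is the tight box around the polygon points, which may not match the stored annotation["bbox"] exactly.

```python
import math


def to_points(flat):
    """Turn [x_0, y_0, x_1, y_1, ...] into [(x_0, y_0), (x_1, y_1), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    return list(zip(flat[0::2], flat[1::2]))


def is_closed(points):
    """A closed path repeats its first point as its last point."""
    return len(points) > 1 and points[0] == points[-1]


def bbox_of(points):
    """Return [x, y, width, height], with (x, y) the top-left corner."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def path_length(points):
    """Sum of the straight-line segment lengths along a path, in pixels."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def split_image_id(image_id):
    """Split '<batch>-<image name>' at the first hyphen."""
    batch, _, name = image_id.partition("-")
    return batch, name


# Made-up annotation fragment following the conventions described above.
segmentation = [[10.0, 20.0, 14.0, 20.0, 14.0, 23.0, 10.0, 20.0]]
spine = [10.0, 20.0, 13.0, 24.0]

polygon = to_points(segmentation[0])  # the list holds a single polygon
print(is_closed(polygon))
print(bbox_of(polygon))
print(path_length(to_points(spine)))
print(split_image_id("010401-20160920230134Lh"))
```

Because image coordinates grow downward, the smallest y value is the top edge of the box, which is why `bbox_of` uses the minimum of both coordinates for the top-left corner.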
Observation: 20160604163414Mh
Analysis of Annotations
This section presents statistics (from PSQL and the COCO API) about the annotation effort.
Filament Statistics
The final dataset contains 10,244 annotated filaments from 1,593 annotated H-Alpha observations captured by the GONG network, counting an observation once for each annotator who worked on it. For each annotated filament, its polygon, spine, minimum bounding box, and chirality are identified. Our annotation pipeline permitted each observation to be annotated by up to three independent annotators. In the released dataset v1.0, a total of 958 unique observations are annotated, of which 548 (57.20%) are annotated once, 185 (19.31%) are annotated twice, and 225 (23.49%) are annotated thrice. These observations are sampled from the years 2011 through 2022. The annotations comprise 3,128 (30.53%) left-chiral, 3,273 (31.95%) right-chiral, and 3,843 (37.51%) unidentifiable filaments.
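The counts quoted above are mutually consistent, as the following arithmetic check shows. It uses only the numbers stated in this section; the variable names are illustrative.

```python
# Observations annotated by one, two, or three annotators (from the text).
once, twice, thrice = 548, 185, 225

# Each unique observation is counted a single time.
unique_observations = once + twice + thrice

# Each observation is counted once per annotator who worked on it.
annotated_observations = once * 1 + twice * 2 + thrice * 3

# Chirality label counts (from the text).
left, right, unidentifiable = 3128, 3273, 3843
total_filaments = left + right + unidentifiable

print(unique_observations, annotated_observations, total_filaments)

# Share of unique observations per annotation count, in percent.
for label, count in [("once", once), ("twice", twice), ("thrice", thrice)]:
    print(label, f"{100 * count / unique_observations:.2f}%")
```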
Q & A
> What software did the team use for manual annotation?
We used Darwin V7 for manual annotation of filaments. Due to the growing demand for labeled data, there are a number of annotation platforms out there, such as Label Studio, CVAT, LabelBox, and SuperAnnotate, to name just a few.
> What image format was used for manual annotation?
The annotators were provided JPEG images of H-Alpha observations. As you may know, unlike the typical 8-bit images, pixel values in FITS observations range from 0 to 2^14. Therefore different transformation functions, such as log or squared, are used for viewing those images. To provide the annotators with a unified view, we converted the FITS files to JPEG format.
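To illustrate why the choice of transformation matters, here is a minimal sketch comparing a linear and a logarithmic mapping from the raw range quoted above (0 to 2^14) to 8-bit values. This is not the fits2jpeg conversion; both functions and their names are assumptions made purely for illustration.

```python
import math

MAX_RAW = 2 ** 14  # upper end of the raw pixel range quoted above


def linear_to_8bit(value, max_raw=MAX_RAW):
    """Map a raw pixel value to 0..255 with a plain linear stretch."""
    value = min(max(value, 0), max_raw)  # clamp to the valid range
    return round(255 * value / max_raw)


def log_to_8bit(value, max_raw=MAX_RAW):
    """Map a raw pixel value to 0..255 with a logarithmic stretch."""
    value = min(max(value, 0), max_raw)  # clamp to the valid range
    return round(255 * math.log1p(value) / math.log1p(max_raw))


# The same raw values look very different under the two mappings:
# the log stretch brightens faint pixels far more than the linear one.
for raw in (0, 100, 1000, 8000, MAX_RAW):
    print(raw, linear_to_8bit(raw), log_to_8bit(raw))
```

Two viewers applying different mappings would see different contrast in the same observation, which is the motivation for giving all annotators one fixed JPEG rendering.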
> How was the FITS-to-JPEG conversion done?
We used the fits2jpeg software implemented by William Cotton. This is the same conversion that is used by NSO to archive observations in JPEG format.
> What does a double-blind annotation process mean here?
The annotation process to create MAGFiLO was double-blind. That is, the annotators in each group worked independently so that they would not be impacted by the decisions made by the other annotators in the same group (working on the same set of images). Moreover, each annotation was independently reviewed and verified. That is, a reviewer did not know the decisions made by any other annotators working on the same batch. No annotator was assigned more than one batch from the same group. This ensured that no duplicate observations were annotated by a single annotator.
> What is the overall agreement among the annotators?
Since nearly half of the observations are annotated by more than one person (independently), we can quantify the overall quality of the assigned labels. To do so, we ran a cross-comparison analysis of the annotations using Cohen's kappa score, which showed substantial agreement (kappa = 0.66) among the annotators where Left or Right chirality labels are assigned.
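For readers unfamiliar with the metric, the textbook two-rater form of Cohen's kappa can be sketched as follows. How filaments were matched across annotators and how the reported score was computed are not described here, so this is only an illustration of the formula on made-up labels.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's label
    frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up chirality labels from two hypothetical annotators.
rater_1 = ["Left", "Left", "Right", "Right", "Left", "Right"]
rater_2 = ["Left", "Right", "Right", "Right", "Left", "Left"]

kappa = cohens_kappa(rater_1, rater_2)
print(kappa)
```

In this toy case the raters agree on 4 of 6 items while chance agreement is 0.5, so kappa corrects the raw agreement downward to account for matches that would occur by chance.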
> What happened to the "bad" images?
In short, we removed those with significant issues. This was done in three steps: (1) automatic filtering, (2) manual filtering before images were assigned to annotators, and (3) filtering by annotators if they came across any.
Our definition of "bad" (or "anomalous") observations encompasses only observations with significant issues, leaving the rest, such as mildly blurry observations, to be considered "normal". Some of the typical issues that may render an observation anomalous, by our definition, are as follows: (1) the solar disk is not centered, (2) some regions of the solar disk are over-saturated, (3) parts of an observation are obscured by clouds, (4) a placeholder image is used instead of an actual observation, (5) an observation is occluded by shadows of objects such as airplanes and transmission towers, (6) an observation is significantly blurry. By our estimate, 3-5% of GONG's H-Alpha observations fall under this definition.
> Why are some blurry observations also annotated?
ML models trained to detect filaments from a stream of H-Alpha observations naturally come across many blurry images. Keeping slight imperfections in the data ensures that trained models are exposed to such samples during training.
> Are filaments annotated as "segmentation masks" or "polygons"?
The annotators used the Brush/Eraser tool to create a pixel-precise annotation. Their annotations were then stored as polygons (instead of binary masks) with as many nodes (corner points) as needed to maintain the original precision. We still call it a "segmentation" though, because a "polygon" annotation implies that the annotators created polygons by adding nodes, one at a time. Such polygons are much less precise than ours. To learn about the granularity of our polygons, see the plots above showing the distribution of Spine Dimension and Segmentation Dimension.
> Why are small barb-like features ("threads") also captured in segmentations, in addition to the prominent barbs?
While it might be difficult for multiple viewers to agree on which features qualify as "barbs" and which features do not, our intention is to highlight the features we would like future ML models to focus on and learn from. Therefore, in addition to prominent barbs, our segmentations capture small barb-like features, a.k.a. threads.
> Why aren't the annotators' labels aggregated?
While a filament's chirality might be considered as Unidentifiable by one annotator, another annotator may believe they see patterns indicating either Left or Right chirality. Being only equipped with H-Alpha images, it is not always possible to favor one vote over the other, even after careful examination of the observation, or by looking at the filament's evolution. Therefore, we refrain from aggregating the labels just because 2 annotators disagreed with the 3rd one.
This becomes more challenging when we come across cases like the one illustrated in the figure below. In this figure, one or more filaments (top) are annotated by Annotator 1 as a single filament with a Right chirality (middle), whereas they are labeled as two distinct filaments by Annotator 2, one with an Unidentifiable chirality and the other with a Right chirality (bottom). Annotator 1 used the GONG H-Alpha Viewer to investigate whether the two pieces belong to a single filament, whereas Annotator 2 relied mostly on what is visible in this very observation. Such decisions are subjective and intrinsic to how individuals may perceive 3D filaments in 2D images.
Figure. The figure shows one or more filaments and two different ways that two annotators annotated them. The methods differ in the number of filaments annotated as well as the identified chirality. (Original observation)
> What else did the annotators use to identify filaments' chirality?
To help the annotators in the process, we developed a web application, named the GONG H-Alpha Viewer, for searching through the GONG archive of H-Alpha observations. Using this app, the annotators could see a filament's evolution in time. Of course, this is very time-consuming, so they used this platform only for "larger" filaments whose chirality was not clear in the given observation but could be identified with some extra effort.
> Can I use the web app that the annotators used to check the evolution of filaments?
Yes. This web app is made publicly available. Visit https://dmlab.cs.gsu.edu/MLEco/GONGHAlphaViewer/.
> Are there any incorrect labels in this dataset?
Most likely. It is important to note that it is not possible to annotate complex phenomena such as solar filaments without significant simplifications and compromises. Our manual annotation efforts, despite employing best practices, should be seen in that light.
Cite this work!
If you use our dataset in your work, we kindly ask that you cite it. Proper citation helps give credit to the effort behind creating the dataset and supports the continued sharing of valuable data for the research community. Thank you!
Cite the paper:
Ahmadzadeh, A., Adhyapak, R., Chaurasiya, K. et al. A dataset of manually annotated filaments from H-alpha observations. Sci Data 11, 1031 (2024). https://doi.org/10.1038/s41597-024-03876-y
@Article{Ahmadzadeh2024magfilo,
author={Ahmadzadeh, Azim and Adhyapak, Rohan and Chaurasiya, Kartik and Nagubandi, Laxmi Alekhya and Aparna, V. and Martens, Petrus C. and Pevtsov, Alexei and Bertello, Luca and Pevtsov, Alexander and Douglas, Naomi and McDonald, Samuel and Bawa, Apaar and Kang, Eugene and Wu, Riley and Kempton, Dustin J. and Abdelkarem, Aya and Copeland, Patrick M. and Seelamneni, Sri Harsha},
title={A dataset of manually annotated filaments from H-alpha observations},
journal={Scientific Data},
year={2024},
month={Sep},
day={27},
volume={11},
number={1},
pages={1031},
issn={2052-4463},
doi={10.1038/s41597-024-03876-y},
url={https://doi.org/10.1038/s41597-024-03876-y}}
Cite the dataset:
@data{ahmadzadeh2024magfilodata,
author = {Ahmadzadeh, Azim and Adhyapak, Rohan and Chaurasiya, Kartik and Nagubandi, Laxmi Alekhya and Aparna, V. and Martens, Petrus C. and Pevtsov, Alexei and Bertello, Luca and Pevtsov, Alexander and Douglas, Naomi and McDonald, Samuel and Bawa, Apaar and Kang, Eugene and Wu, Riley and Kempton, Dustin J. and Abdelkarem, Aya and Copeland, Patrick M. and Seelamneni, Sri Harsha},
publisher = {Harvard Dataverse},
title = {{MAGFILO: Manually Annotated GONG Filaments in H-Alpha Observations}},
year = {2024},
version = {V1},
doi = {10.7910/DVN/J6JNVK},
url = {https://doi.org/10.7910/DVN/J6JNVK}
}
Ahmadzadeh, Azim; Adhyapak, Rohan; Chaurasiya, Kartik; Nagubandi, Laxmi Alekhya; Aparna, V.; Martens, Petrus C.; Pevtsov, Alexei; Bertello, Luca; Pevtsov, Alexander; Douglas, Naomi; McDonald, Samuel; Bawa, Apaar; Kang, Eugene; Wu, Riley; Kempton, Dustin J.; Abdelkarem, Aya; Copeland, Patrick M.; Seelamneni, Sri Harsha, 2024, "MAGFILO: Manually Annotated GONG Filaments in H-Alpha Observations", https://doi.org/10.7910/DVN/J6JNVK, Harvard Dataverse, V1
The Annotators
We are deeply grateful to everyone who enthusiastically contributed to the creation of this dataset. In particular, we extend our heartfelt appreciation to those who undertook the challenging task of manually annotating the filaments. Your meticulous efforts are invaluable and greatly appreciated.
Naomi Douglas | Eugene Kang | Riley Wu | Rijul Mehta | Aya Abdelkarem | Guranggad Singh | Tony Vargas-Miguel | Wei Fan Wang | Nidhi Mahajan | Humayra Mahmood | Niruthiya Narashiman Srinivasan | Justin Heyer | Olivia Allen | Rahul Chawla | Hung Nguyen | Raya Deb | Lucy Hopwood | Mukta Deshmukh | Allen Dasari | Kayla Thornton | Sammam Zaman | cristal cervantes | Valencia Jenkins | Chioma Nwokedi | Tran Ha | Hoangyen Nguyen | Muhammad Abdullah Nasir | Zainab Sirajo | Suzal Regmi | Edward Abel-Guobadia | Aditya Biyani | Maris Ashu | Diyorahon Rakhimova | Zakiyyah Saleem | Pramit Bhatia | Nathnael Damte | Brendan Krafty | Katelyn Vercher | Aaron Channer
Report an Issue!
We do appreciate issues being reported to us so that we can improve the quality of the current dataset in future releases.
Please email any issues to Dr. Azim Ahmadzadeh at ahmadzadeh@umsl.edu.