Introducing Sources and Samples

In this tutorial, I will explain the basic concepts around which XGA is designed, and why it’s been set up this way. There won’t be much detail on how to use the module for any kind of analysis, but by the end you should understand how to get started defining sources; in the next tutorials we can actually start to analyse them.

from astropy.units import Quantity
import numpy as np
import pandas as pd

# Here we import various types of source class from XGA
from xga.sources import BaseSource, NullSource, ExtendedSource, PointSource, GalaxyCluster
# And here we import one of the sample classes
from xga.samples.extended import ClusterSample

What are sources?

XGA revolves around ‘source’ objects, which are representative of X-ray sources in real life. These are at the heart of any analysis performed with XGA, and just as there are different types of objects that emit X-rays, there are different types of source object built into XGA. We make distinctions between the different types of source due to the different information they can require for their analysis (clusters need overdensity radii for instance, whereas that isn’t a useful concept for an AGN). Different source classes also have some different procedures and methods built into them, as we often wish to measure different things for different types of source.

At their most basic, all that is required to define a source is a position on the sky. The first time XGA is run on a new system, it makes an ‘observation census’ of the data that you have pointed it to, and finds what observations are available and what their pointing coordinates are; when a new source is defined XGA searches through the census to check whether there are data available for the given coordinates, and if there are relevant observations then they are ‘associated’ with the source object. An observation will be associated with a source if the aimpoint of the observation is within 30 arcminutes of the source coordinates, though some source classes support further cleaning steps to remove observations that don’t cover the entire object.
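The association rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not XGA's implementation: the function names, the census layout, and the ObsID labels and aimpoints are all hypothetical, and only the 30 arcminute threshold comes from the text.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions in decimal degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which is well behaved for small separations
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def associate(src_ra, src_dec, census, max_sep_arcmin=30.0):
    """Return the ObsIDs whose aimpoint lies within max_sep_arcmin of the source."""
    return [obs_id for obs_id, (ra, dec) in census.items()
            if angular_separation_deg(src_ra, src_dec, ra, dec) * 60.0 <= max_sep_arcmin]


# Hypothetical census: made-up labels and aimpoints, for illustration only
census = {
    "obs_a": (113.70, 31.90),     # a few arcminutes from the source
    "obs_b": (113.65833, 32.30),  # roughly 26 arcminutes north of the source
    "obs_c": (115.00, 31.87083),  # more than a degree away
}

matched = associate(113.65833, 31.87083, census)
print(matched)
```

Only the first two hypothetical observations fall inside the 30 arcminute radius, so only they would be associated with a source at these coordinates.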

This approach means that the user doesn’t have to deal directly with data if they don’t want to; XGA will fetch all available data by itself. When it comes to actually analysing and measuring quantities from the source, all the data will be used, not just single observations.

As we ask you to supply region files to XGA in the configuration file, the module can be aware of where other detected sources are in the data it has chosen. That allows it to define any ‘interloper’ sources that have to be excluded from spectrum generation and photometric analysis.

What types of source are there?

  • BaseSource - The superclass for all the other source classes, and the simplest of them all. There are very few circumstances where this class should be initialised by users. BaseSource only needs an RA and Dec to be initialised, and if a name is not supplied by the user then one will be generated from those coordinates.

  • NullSource - This class of source is an exception to XGA’s design philosophy that an XGA source represents a real X-ray emitting object. By default NullSource associates every available ObsID with itself (though you may specify which ObsIDs to associate with it), and as such shouldn’t be used for astrophysical analysis. This class of source should only be used for bulk generation of products such as images and exposure maps.

  • ExtendedSource - This is a general class for extended X-ray sources; it is also the superclass of the GalaxyCluster class. XGA will attempt to find a matching extended source from the supplied region files, and if it does then that region will be used for any analysis. The user may also supply a custom circular region in which to analyse the object. Unless it is told not to, XGA will also attempt to find the X-ray peak of this extended object.

  • GalaxyCluster - This class is specifically for the analysis of galaxy clusters, and is a subclass of ExtendedSource. Defining an instance of this class requires a redshift to be passed, as well as at least one overdensity radius (\(R_{200}\), \(R_{500}\), and \(R_{2500}\) are supported). It also supports passing weak lensing mass and richness values, for use in multi-wavelength analyses. Point sources close to the centre of the cluster will not be removed, as they could be a misidentified cool core; please see the _source_type_match method of the GalaxyCluster class for more information.

  • PointSource - Similar to the ExtendedSource class in that this is a superclass for more specific point source classes. There are no methods in this class to produce radial plots, for instance, as for point-like sources the idea of a radial profile has very little meaning. When a PointSource is declared, an attempt will be made to match to a point source in region files, if they are supplied.

If you would like a more specific class implemented for the type of object you’re working on, please get in contact with me and I will see what I can do.

What are samples?

An XGA sample is a group of the same type of object that we wish to analyse as a population; for instance you might want to analyse multiple Galaxy Clusters and derive a scaling relation from them.

There is a secondary benefit to using a sample object rather than multiple source objects: a sample can be passed into any function that will accept a source, and the function will perform its job on every source in it. This not only makes writing your code easier and cleaner, but can also be more efficient when running SAS and XSPEC, as XGA will run any such jobs in parallel, rather than you having to run them sequentially in a loop, for instance.
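The general pattern of a function that accepts either one source or a whole collection, and spreads the work across workers, might look something like the sketch below. It is a generic standard-library illustration rather than XGA's own machinery; run_on_sources and fake_job are hypothetical names, and plain strings stand in for source objects.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_sources(job, sources, num_workers=4):
    """Apply job to a single source, or to every source in a list, in parallel."""
    # Wrap a lone source so that both cases follow the same code path
    if not isinstance(sources, (list, tuple)):
        sources = [sources]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns the results in the same order as the inputs
        return list(pool.map(job, sources))


def fake_job(name):
    # Placeholder for an expensive external call (hypothetical)
    return f"{name}: done"


results = run_on_sources(fake_job, ["XCSSDSS-124", "XCSSDSS-2789"])
single = run_on_sources(fake_job, "XCSSDSS-290")
print(results)
print(single)
```

The calling code is identical for one source and for many, which is the convenience the paragraph above describes.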

When a sample is declared (other than the BaseSample class), it checks that images and exposure maps exist for every observation associated with its constituent sources, and generates any that are missing.

What types of sample are there?

These mostly mirror the types of source that are present in XGA. Specific sample types can have properties or methods unique to that type of astrophysical object.

  • BaseSample - The superclass for all the other sample classes, there are very few circumstances where this class should be initialised by users. All a BaseSample requires to be instantiated are two numpy arrays, containing RA and Dec values. Arrays of names and redshifts may also be supplied (though names supplied to sample definitions must be unique).

  • PointSample - For a population of some generic type of point source. Again only RA and Dec values have to be supplied, though redshift information can be provided if available. As this is a general point source class, no methods to generate scaling relations have been provided.

  • ClusterSample - For a population of Galaxy Clusters. Just as with the GalaxyCluster source class, here we require that redshift and overdensity radius information be provided on declaration. Many convenient features have been added to this sample class, for instance you can retrieve temperatures of all clusters in the sample (if measured) using a ClusterSample method. You can also easily generate common scaling relations by calling methods of the ClusterSample class, using several different fitting methods.
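The requirements on sample inputs mentioned in this list (matching RA and Dec arrays, and unique names) can be checked before a sample is declared. The helper below is a hypothetical pre-flight check written for this tutorial, not part of XGA; it uses the cluster names and coordinates that appear later in this tutorial.

```python
from collections import Counter


def check_sample_inputs(ra, dec, names=None):
    """Validate coordinate and name lists; return the number of sources."""
    if len(ra) != len(dec):
        raise ValueError("RA and Dec must contain the same number of entries")
    if names is not None:
        if len(names) != len(ra):
            raise ValueError("There must be exactly one name per coordinate pair")
        # Names identify sources within a sample, so repeats are not allowed
        repeated = sorted(n for n, count in Counter(names).items() if count > 1)
        if repeated:
            raise ValueError(f"Names must be unique; repeated: {repeated}")
    return len(ra)


n_sources = check_sample_inputs(
    [0.80057775, 0.95553986, 2.7226392, 4.9083898],
    [-6.0918182, 2.068019, 29.161021, 3.6098177],
    ["XCSSDSS-124", "XCSSDSS-2789", "XCSSDSS-290", "XCSSDSS-134"],
)
print(n_sources)
```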

Defining your first source

Here I demonstrate just how simple it is to define a PointSource object; all I’ve done is supply the Right Ascension and Declination of Castor, a famous sextuple star system that emits X-rays. All coordinates used with XGA must be passed as decimal degrees; sexagesimal coordinates are not supported by this module.
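If your catalogue lists positions in sexagesimal form, they need converting before being passed in. A minimal sketch of that arithmetic is given here; the helper names are hypothetical, and the sexagesimal values were chosen because they convert to the decimal coordinates used for Castor in this tutorial (to five decimal places).

```python
def hms_to_degrees(hours, minutes, seconds):
    """Convert a right ascension in hours, minutes, seconds to decimal degrees."""
    # One hour of right ascension corresponds to 15 degrees
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dms_to_degrees(sign, degrees, arcmin, arcsec):
    """Convert a declination to decimal degrees; sign is +1 or -1.

    The sign is passed separately so that values between 0 and -1 degrees
    (where the degrees field is zero) keep their sign.
    """
    return sign * (degrees + arcmin / 60.0 + arcsec / 3600.0)


ra = hms_to_degrees(7, 34, 38.0)
dec = dms_to_degrees(+1, 31, 52, 15.0)
print(round(ra, 5), round(dec, 5))
```

Astropy's coordinate tools can also perform this kind of conversion, should you prefer not to write it by hand.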

PointSource also accepts various other keyword arguments that you may wish to change from their defaults; please see this documentation for a full list. A particularly useful keyword argument is cosmology, which allows you to pass an Astropy cosmology object; the default cosmology is currently Planck15. This ability to set the cosmology is present in all source and sample objects, and the chosen cosmology is used throughout any analysis done with that source/sample.

demo_src = PointSource(113.65833, 31.87083, name='Castor')
Generating products of type(s) ccf: 100%|██████████| 3/3 [00:18<00:00,  6.05s/it]

We can see from the progress bar above that XGA has detected that no appropriate XMM calibration files were available for the data associated with Castor, and so it used its SAS interface to automatically create the necessary files. It will also have automatically generated combined images and exposure maps, using all available data.

We can now use the info() method to see a summary of the information we have about this particular source. You will notice that XGA has used the input coordinates to find an nH value (using a HEASoft tool). You may also notice that a custom region radius of 0.01 degrees has been used to calculate a SNR value; this is the default region radius for the PointSource class, and may be changed using a keyword argument when the PointSource is defined.

demo_src.info()
Source Name - Castor
User Coordinates - (113.65833, 31.87083) degrees
X-ray Peak - (113.65833, 31.87083) degrees
nH - 0.0446 1e+22 / cm2
XMM ObsIDs - 3
PN Observations - 3
MOS1 Observations - 3
MOS2 Observations - 3
On-Axis - 3
With regions - 3
Total regions - 200
Obs with one match - 3
Obs with >1 matches - 0
Images associated - 18
Exposure maps associated - 18
Combined Ratemaps associated - 1
Spectra associated - 0

A lot of information is stored in all source objects, and I would advise you look at the BaseSource API documentation (or use the dir() command on any source object) to explore what it can tell you.

Here I demonstrate how easy it is to retrieve simple information such as the hydrogen column density at the source coordinates, which ObsIDs are associated with the source, and which ObsIDs are considered ‘on-axis’ observations. The instruments property provides a dictionary with the associated ObsIDs as keys, and the instruments associated with them as values; this is necessary because we cannot take for granted that all observations have data from all cameras.

# This property returns an astropy quantity with the hydrogen column density
print(demo_src.nH)

# This property is just the ObsIDs associated with the source
print(demo_src.obs_ids)

# However this property returns a dictionary of ObsIDs and which of their instruments are valid
print(demo_src.instruments)

# And finally we can easily see which observations are considered on-axis
print(demo_src.on_axis_obs_ids)
0.0446 1e+22 / cm2
['0123710201', '0123710101', '0112880801']
{'0123710201': ['pn', 'mos1', 'mos2'], '0123710101': ['pn', 'mos1', 'mos2'], '0112880801': ['pn', 'mos1', 'mos2']}
['0123710201', '0123710101', '0112880801']
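Because the instruments property is an ordinary dictionary, it can be summarised with standard Python tools. The sketch below copies the dictionary printed above rather than querying a live source object, counts the observations available per camera, and lists any observation that lacks one of the three EPIC cameras.

```python
from collections import Counter

# The dictionary printed above by the instruments property
instruments = {
    '0123710201': ['pn', 'mos1', 'mos2'],
    '0123710101': ['pn', 'mos1', 'mos2'],
    '0112880801': ['pn', 'mos1', 'mos2'],
}

# Count how many observations provide data from each camera
per_instrument = Counter(inst for insts in instruments.values() for inst in insts)

# Map each incomplete observation to the cameras it is missing
expected = {'pn', 'mos1', 'mos2'}
incomplete = {obs_id: sorted(expected - set(insts))
              for obs_id, insts in instruments.items()
              if expected - set(insts)}

print(dict(per_instrument))
print(incomplete)
```

For Castor every camera contributes three observations, which agrees with the PN, MOS1 and MOS2 counts in the info() summary, and no observation is incomplete.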

Defining your first sample

This is a simple demonstration of how you can define a sample of GalaxyClusters, with four clusters from the XCS-SDSS sample (Giles et al. (in prep)).

First I create a Pandas dataframe, simply because it’s a convenient way to store the initial sample data (you don’t have to use it), and because I often read in samples using Pandas. Then it is as simple as passing the different columns into the ClusterSample class. Note that the radii are supplied as Astropy quantities - I use quantities throughout XGA, so most values that have a unit will be one.

column_names = ['name', 'ra', 'dec', 'z', 'r500', 'r200', 'richness', 'richness_err']
cluster_data = np.array([['XCSSDSS-124', 0.80057775, -6.0918182, 0.251, 1220.11, 1777.06, 109.55, 4.49],
                         ['XCSSDSS-2789', 0.95553986, 2.068019, 0.11, 1039.14, 1519.79, 38.90, 2.83],
                         ['XCSSDSS-290', 2.7226392, 29.161021, 0.338, 935.58, 1359.37, 105.10, 5.99],
                         ['XCSSDSS-134', 4.9083898, 3.6098177, 0.273, 1157.04, 1684.15, 108.60, 4.79]])

sample_df = pd.DataFrame(data=cluster_data, columns=column_names)
sample_df[['ra', 'dec', 'z', 'r500', 'r200', 'richness', 'richness_err']] = \
    sample_df[['ra', 'dec', 'z', 'r500', 'r200', 'richness', 'richness_err']].astype(float)

           name        ra        dec      z     r500     r200  richness  richness_err
0   XCSSDSS-124  0.800578  -6.091818  0.251  1220.11  1777.06    109.55          4.49
1  XCSSDSS-2789  0.955540   2.068019  0.110  1039.14  1519.79     38.90          2.83
2   XCSSDSS-290  2.722639  29.161021  0.338   935.58  1359.37    105.10          5.99
3   XCSSDSS-134  4.908390   3.609818  0.273  1157.04  1684.15    108.60          4.79

Just as with the definition of the PointSource object, there are many keyword arguments that can be supplied here, and I recommend examining the documentation to see whether you need any of the other options.

It is not necessary to supply two different overdensity radii (as I have done here), but it can be very useful. The richness information I passed in is also not needed to define the sample. Remember that you must pass redshift information to define GalaxyCluster objects, and as such redshift information is required for ClusterSample objects as well.

Note that this ClusterSample definition generates some images and exposure maps - those are combined images and exposure maps, and they have to exist for the individual GalaxyCluster objects to perform peak finding on the data.

demo_smp = ClusterSample(sample_df["ra"].values, sample_df["dec"].values, sample_df["z"].values,
                         sample_df["name"].values, r200=Quantity(sample_df["r200"].values, "kpc"),
                         r500=Quantity(sample_df["r500"].values, 'kpc'), richness=sample_df['richness'].values,
                         richness_err=sample_df['richness_err'].values)
Declaring BaseSource Sample: 100%|██████████| 4/4 [00:02<00:00,  1.93it/s]
Generating products of type(s) ccf: 100%|██████████| 4/4 [00:18<00:00,  4.54s/it]
Generating products of type(s) image: 100%|██████████| 4/4 [00:02<00:00,  1.61it/s]
Generating products of type(s) expmap: 100%|██████████| 4/4 [00:00<00:00,  6.38it/s]
Setting up Galaxy Clusters: 100%|██████████| 4/4 [00:06<00:00,  1.58s/it]

All sample classes have an info() method, just like sources, though it reports less information than the source info() methods. It is also simple to retrieve properties for all sources, such as name and redshift.

demo_smp.info()
print(demo_smp.names)
print(demo_smp.redshifts)
Number of Sources - 4
Redshift Information - True

['XCSSDSS-124' 'XCSSDSS-2789' 'XCSSDSS-290' 'XCSSDSS-134']
[0.251 0.11  0.338 0.273]

Interacting with a source object in a sample

Just as with many Python objects (lists, dictionaries, etc.), a sample can be indexed to retrieve individual elements from the whole (in this case individual source objects). What is slightly different about XGA sample objects is that you may use an integer value or a name to retrieve the specific source object you want:

# Looking at the first source stored in the sample
chosen_src = demo_smp[0]
# And printing its name
print(chosen_src.name)

# Now showing that the name of a source can also be used to retrieve the object
chosen_src = demo_smp['XCSSDSS-124']
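Support for both integer and name indexing comes down to how a container implements __getitem__. The toy class below shows one way this could be done; it is a sketch of the idea only, MiniSample is a hypothetical stand-in rather than an XGA class, and plain dictionaries take the place of source objects.

```python
class MiniSample:
    """Toy container whose elements can be retrieved by position or by name."""

    def __init__(self, names, redshifts):
        self._names = list(names)
        # Plain dictionaries stand in for source objects here
        self._sources = {n: {"name": n, "redshift": z}
                         for n, z in zip(names, redshifts)}

    def __len__(self):
        return len(self._names)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Translate a position into the name stored at that position
            key = self._names[key]
        elif not isinstance(key, str):
            raise TypeError("Index with an integer position or a source name")
        return self._sources[key]


smp = MiniSample(['XCSSDSS-124', 'XCSSDSS-2789', 'XCSSDSS-290', 'XCSSDSS-134'],
                 [0.251, 0.11, 0.338, 0.273])

by_index = smp[0]
by_name = smp['XCSSDSS-124']
print(by_index is by_name)
```

Both lookups return the very same object, so changes made through one handle are visible through the other.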

Removing an ObsID from a source object

It is possible that you may want to remove an ObsID from a source object, and as such throw away the data associated with that ObsID. In XGA this is called disassociating, and you can choose to remove as many ObsIDs as you like, or even individual instruments from an ObsID.

You may pass a string if you just wish to remove one ObsID, or a dictionary if you wish to be more precise.

# This removes all reference to observation 0123710201 from the source object
demo_src.disassociate_obs('0123710201')

# This, however, removes only the MOS1 and MOS2 data from observation 0112880801
demo_src.disassociate_obs({'0112880801': ['mos1', 'mos2']})

We can look at the source summary and the instruments property again to confirm that the data have been removed.

demo_src.info()
print(demo_src.instruments)
Source Name - Castor
User Coordinates - (113.65833, 31.87083) degrees
X-ray Peak - (113.65833, 31.87083) degrees
nH - 0.0446 1e+22 / cm2
XMM ObsIDs - 2
PN Observations - 2
MOS1 Observations - 1
MOS2 Observations - 1
On-Axis - 2
With regions - 2
Total regions - 170
Obs with one match - 2
Obs with >1 matches - 0
Images associated - 8
Exposure maps associated - 8
Combined Ratemaps associated - 0
Spectra associated - 0

{'0123710101': ['pn', 'mos1', 'mos2'], '0112880801': ['pn']}
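The bookkeeping behind those two removals can be reproduced on a plain dictionary. This is a sketch of the logic only: the disassociate function here is hypothetical and says nothing about how XGA handles the associated products internally. It starts from the instruments dictionary printed earlier for Castor.

```python
def disassociate(instruments, to_remove):
    """Return a new ObsID -> instruments mapping with the requested data removed.

    to_remove is either an ObsID string (drop the whole observation) or a
    dictionary mapping ObsIDs to the instruments that should be dropped.
    """
    if isinstance(to_remove, str):
        # Removing a whole observation means removing all of its instruments
        to_remove = {to_remove: list(instruments.get(to_remove, []))}

    updated = {}
    for obs_id, insts in instruments.items():
        kept = [inst for inst in insts if inst not in to_remove.get(obs_id, [])]
        # An observation with no instruments left is dropped entirely
        if kept:
            updated[obs_id] = kept
    return updated


original = {
    '0123710201': ['pn', 'mos1', 'mos2'],
    '0123710101': ['pn', 'mos1', 'mos2'],
    '0112880801': ['pn', 'mos1', 'mos2'],
}

remaining = disassociate(original, '0123710201')
remaining = disassociate(remaining, {'0112880801': ['mos1', 'mos2']})
print(remaining)
```

The result matches the instruments dictionary shown above after the two disassociate_obs calls.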

Removing a source from a sample

You may also wish to remove a source from a sample, which is even easier than removing observations from sources. We can simply use the Python ‘del’ statement, identifying the source to be removed either with its name, or with its index in the sample object.

# Here we remove a source using its name
del demo_smp['XCSSDSS-124']

# But we can also remove a source just using an index
del demo_smp[2]

And if we look again at the sources included in this sample, we can see that two have been removed.

demo_smp.info()
print(demo_smp.names)
print(demo_smp.redshifts)
Number of Sources - 2
Redshift Information - True

['XCSSDSS-2789' 'XCSSDSS-290']
[0.11  0.338]
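One detail worth keeping in mind when deleting by index is that positions shift after each removal. The toy class below (again a hypothetical stand-in, not an XGA class) implements __delitem__ for both names and indices and replays the two deletions from this section.

```python
class MiniSample:
    """Toy container supporting deletion by position or by source name."""

    def __init__(self, names, redshifts):
        self.names = list(names)
        self.redshifts = list(redshifts)

    def __delitem__(self, key):
        # A name is converted to its current position before deletion
        index = self.names.index(key) if isinstance(key, str) else key
        del self.names[index]
        del self.redshifts[index]


smp = MiniSample(['XCSSDSS-124', 'XCSSDSS-2789', 'XCSSDSS-290', 'XCSSDSS-134'],
                 [0.251, 0.11, 0.338, 0.273])

del smp['XCSSDSS-124']  # XCSSDSS-2789 now sits at index 0
del smp[2]              # removes XCSSDSS-134, which moved from index 3 to index 2
print(smp.names)
print(smp.redshifts)
```

This leaves XCSSDSS-2789 and XCSSDSS-290, the same two sources that remain in the output above.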