Mark Taylor
December 2015
These are notes for the Advanced TOPCAT tutorial at the ASTERICS VO school at ESAC, December 2015. They cover use of the desktop GUI tool TOPCAT and its command-line evil twin STILTS.
These tools are designed to do things with tabular data - typically source catalogues. They don't do science for you, but they let you do the mechanical manipulation of tables that you need to do to understand their science content in detail.
TOPCAT and STILTS can do basically the same things, but are used in different ways. TOPCAT is easier to learn, and good for interactive use, especially exploring data to get a feel for what's there. For production work, it is sometimes better to move on to STILTS, which has a steeper learning curve but can be scripted for repeated or reproducible work. STILTS can also, for some purposes, be used for larger datasets.
Most links in the document are to the TOPCAT and STILTS manuals.
These manuals contain much more detailed reference documentation, so should be consulted for more information.
Some of the data files used below are:
TOPCAT/STILTS like FITS files. They can work with some other formats (CSV, VOTable, IPAC, a few others) but those are slower. If you're going to work with a table (especially a large one) a lot, it's a good idea to convert it to FITS first.
stilts tpipe in=xxx.csv ifmt=csv out=xxx.fits
VOTable is good for storing metadata - column descriptions, units, UCDs, history information etc. FITS isn't. TOPCAT uses a home-made FITS variant it calls FITS-plus that has the best of both worlds. You don't need to worry about the details, but if you use topcat/stilts to convert from VOTable to FITS, the metadata isn't lost.
There are various ways to get data from the VO. Some are more straightforward than others.
with Cone Selection
SELECT ... FROM table WHERE 1=CONTAINS(POINT(..),CIRCLE(..))
SELECT ... FROM table WHERE ...
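The CONTAINS/POINT/CIRCLE pattern above can be filled in for a concrete position. The following Python sketch only builds the query text; it does not talk to any service. The table and column names (twomass.data, raj2000, dej2000) are borrowed from the GAVO examples later in these notes, the position is an approximate Pleiades centre, and cone_adql is a made-up helper name.

```python
def cone_adql(table, ra_col, dec_col, ra_deg, dec_deg, radius_deg, top=None):
    """Return ADQL text selecting rows of `table` inside a cone on the sky.

    Hypothetical helper for illustration: all angles are in degrees,
    and `top` optionally limits the number of rows returned.
    """
    top_clause = "TOP %d " % top if top is not None else ""
    return (
        "SELECT %s* FROM %s "
        "WHERE 1=CONTAINS(POINT('ICRS', %s, %s), "
        "CIRCLE('ICRS', %s, %s, %s))"
        % (top_clause, table, ra_col, dec_col, ra_deg, dec_deg, radius_deg)
    )


# Approximate Pleiades position, 2 degree radius (illustrative values).
query = cone_adql("twomass.data", "raj2000", "dej2000", 56.75, 24.12, 2.0, top=1000)
print(query)
```

The resulting text can be pasted into the ADQL box of the TAP window, or supplied as the adql parameter of a STILTS tapquery command.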
Open the VizieR load window using the VO|VizieR Catalogue Service menu item.
Enter "pleiades" in the Object Name field.
Enter "2" (degrees) in the Radius field.
Find Tycho-2 in the list of catalogues (they're alphabetical) and click on it.
Load the table (I_259_tyc2).
Relabel it as "tycho" by typing in the Label field of the main control window with that table selected.
Here we plot data in one parameter space, identify the points in a subregion, and see where those fall in a different parameter space. First we do this using topcat windows, then, using SAMP, between topcat and another application.
Linked views in TOPCAT:
Plot a colour-magnitude diagram: X=VTmag-BTmag, Y=VTmag. Use the Axes control to flip the Y axis.
In another plot window, plot proper motions: X=pmRA, Y=pmDE.
Note that clicking on points highlights the corresponding row and vice versa.
Mark out the cluster in the proper motion plot (click the region-drawing button, drag out the cluster region, and then click the same button again). The New Subset window will pop up; choose a name (e.g. "comoving") and hit Add Subset.
Open the Subsets window from the main control window toolbar. You can see the new subset.
You can use this for other things.
Many VO tools can communicate with each other using the Simple Application Messaging Protocol (SAMP). A SAMP Hub process needs to be running on the desktop. Usually, this just works, since topcat and other tools start such a hub when they start up.
Here is an example of it in operation:
Start Aladin, enter "pleiades" in the Location field, and zoom in.
In the TOPCAT Subsets window, select the comoving subset, and use the Interop|Send Subset To...|Aladin menu item.
Positions of the comoving subset are highlighted in Aladin.
TAP is the Table Access Protocol. It lets you perform queries against a remote relational database. In TOPCAT, you write queries in ADQL (Astronomical Data Query Language). This is essentially a dialect of SQL.
TOPCAT gives you the TAP Load Window. Open it using the button on the main toolbar; it's also available from the VO menu or the Load window.
It's a complicated window, because it does a lot of stuff. It's organised in several tabs, and some of the tabbed parts have tabs of their own.
The most important jobs it lets you do are:
Use the Select Service tab (this is visible by default when you open the TAP window). The job of this tab is to let you choose a service to talk to. (If you try to open the Use Service tab now it won't let you, because you haven't chosen one.)
You should see a list of all known TAP services (this is all the registered TAP services; some others for non-public use may exist as well). If you don't see that after a short delay ... there's probably a network or registry error. The services are listed in order of the number of tables they contain (this number is in brackets after the name).
If you know what service you want to use, you can browse the list and just click its name.
More often, you want to search for a given data set. To search for (e.g.) the RAVE survey:
A result such as "GAVO DC TAP (3/142)" means the service "GAVO DC" has 142 tables altogether, and 3 of them (maybe) have something to do with RAVE.
When you see the table you want (e.g. rave.main), click on that, or the service name, to tell topcat this service is the one you want to use.
Note: not all TAP services are equal. This GAVO one (and some others using the same software) generally works very well, and supports many optional features like table upload and examples. Some other services may work less smoothly. Don't be too surprised if some things break. Hopefully, this will improve in the future...
When you open the Use Service tab, after a short delay, you should see on the left of the window a list of tables in the service's database. The top level of this tree shows you Schemas; these are just an organisational level that (sometimes) groups tables into related sets. Click on the handles to expand schemas and see the tables inside. The tabs to the right contain information depending on what schema/table you have selected in the tree.
Now you can type an ADQL query into the text field at the bottom of the window. If you don't know how to do that, don't worry; there is help at hand.
When you've got the idea, try your own queries, or edit the examples to taste. Useful hints:
One toolbar button opens a new blank tab, and another copies the current text into a new tab.
Undo and redo edits using the toolbar buttons or Ctrl-Z/Ctrl-Shift-Z.
A toolbar button inserts the currently selected column names into the query. Another button works the same way for the currently selected table name.
There are various different ways to crossmatch tables against each other. Which is best depends on the details of what you want to do, how big the tables you want to match are, and what data services are available that provide the (large) tables of interest.
There are four main options available from TOPCAT (and STILTS), described below. The examples here identify objects in the region of the Pleiades with detections in both Tycho-2 and 2MASS, though note some of these methods are more appropriate than others for this job. These examples use the Tycho catalogue from the previous section. Alternatively, just download it from http://andromeda.star.bris.ac.uk/data/tycho-pleiades.fits.
The Crossmatch Window is generally the easiest way to do it, very flexible, and usually quite fast.
It can run out of memory for large tables; giving Java more memory (e.g. -Xmx1000M) may help.
Load the 2MASS data (2MASS-PSC) from VizieR like for the Tycho data above. You may need to increase the Maximum Row Count (plot the result on the sky when loaded to check that the region looks like a circle, not a circle with some parts missing). You can relabel it 2mass.
Open the Crossmatch window using the button in the main toolbar (or the Join menu).
Select the tycho and 2mass tables.
When the match has completed, use the Plot Result button.
The corresponding STILTS commands are tskymatch2, tmatch2, tmatch1, tmatchn. Note these are still vulnerable to running out of memory.
The above example in STILTS looks like:
stilts tskymatch2 \
in1=tycho-pleiades.fits ra1=_RAJ2000 dec1=_DEJ2000 \
in2=2mass-pleiades.fits ra2=_RAJ2000 dec2=_DEJ2000 \
join=1and2 find=best error=1 \
out=tycho-2mass.fits
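As a rough illustration of what a positional match with find=best and a 1 arcsec error does (for each row of the first table, keep the closest row of the second table within the radius), here is a brute-force Python sketch. It is not how STILTS is implemented (this loops over every pair of rows, which only works for tiny tables), the coordinates are invented, and best_matches is a made-up helper name.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular distance between two sky positions (degrees in, arcsec out)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which behaves well for small separations.
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def best_matches(table1, table2, error_arcsec):
    """For each (ra, dec) in table1, give the index of the nearest table2
    row within error_arcsec, or None if there is no such row."""
    result = []
    for ra1, dec1 in table1:
        best_index, best_sep = None, error_arcsec
        for j, (ra2, dec2) in enumerate(table2):
            sep = separation_arcsec(ra1, dec1, ra2, dec2)
            if sep <= best_sep:
                best_index, best_sep = j, sep
        result.append(best_index)
    return result


# Invented positions in degrees, standing in for the two catalogues.
tycho = [(56.7500, 24.1200), (57.0000, 24.5000)]
twomass = [(56.7501, 24.1200), (56.75002, 24.12001), (58.0, 25.0)]
matches = best_matches(tycho, twomass, error_arcsec=1.0)
print(matches)
```

With join=1and2, only the rows that found a partner (the non-None entries here) would appear in the output table.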
The CDS X-Match window is based on the X-Match service provided by CDS. If you have one table loaded in topcat, and want to match against another one that's in VizieR (or the SIMBAD database), it works well, and is very fast, even for large tables.
Open the CDS X-Match window using the button in the main toolbar (or VO menu).
Select 2MASS in the VizieR Table ID/Alias selector. If the table you want isn't there, try searching at http://tapvizier.u-strasbg.fr/adql/ and use the table identifier (e.g. II/246/out) in the right hand column.
button and select the new table.
Vary the marker Shape and Size (in the Form tab) and Colour (in the Subsets tab) so that all the different markers are visible at the same time.
The corresponding STILTS command is cdsskymatch. This will work with a local table of unlimited size.
The above example in STILTS looks like:
stilts cdsskymatch cdstable=2MASS \
in=tycho-pleiades.fits \
ra=_RAJ2000 dec=_DEJ2000 radius=1 \
find=best out=tycho-2mass.fits
TAP joins use the TAP window.
If there are two huge tables in a remote TAP database,
use TAP to join them.
If you want to join a local table with a huge table in a remote TAP
database, use a TAP Upload query (if available).
Open the TAP window using the button in the main toolbar (or VO menu).
Find GAVO DC TAP (near the top, probably). Select that row, and hit the Use Service button at the bottom.
Enter 2mass in the Find box. No results. Try clicking the Description checkbox, which will check for text matches in the table description as well as name. Open the twomass entry in the tree and select table twomass.data.
Make sure twomass.data is selected in the TAP window. Then click the Examples button at the bottom of the screen, and select Upload|Upload Join. It puts some text (an upload join query) into the ADQL text box.
The example query contains "TOP 1000", which limits the result to 1000 rows.
The corresponding STILTS commands are tapquery and tapskymatch. tapskymatch only does positional sky matches, but it works on tables of unlimited size, and is more straightforward to use.
The above example in STILTS looks like:
stilts tapskymatch \
tapurl=http://dc.g-vo.org/tap \
taptable=twomass.data taplon=raj2000 taplat=dej2000 \
in=tycho-pleiades.fits inlon=_RAJ2000 inlat=_DEJ2000 \
sr=1./3600. find=best sync=true \
out=tycho-2mass.fits
or
stilts tapquery tapurl=http://dc.g-vo.org/tap \
nupload=1 upload1=tycho-pleiades.fits \
ucmd1='colmeta -name ra _RAJ2000' ucmd1='colmeta -name dec _DEJ2000' \
adql="SELECT * FROM twomass.data AS twom \
JOIN TAP_UPLOAD.up1 AS tycho \
ON 1=CONTAINS(POINT('ICRS', twom.raj2000, twom.dej2000), \
CIRCLE('ICRS', tycho.ra, tycho.dec, 1./3600.))" \
sync=true out=tycho-2mass.fits
The Multi-Cone window lets you make one cone search for each row of a local table, effectively joining it with the remote one that is exposed via a Cone Search service.
Multi-SIA and Multi-SSA do the same thing for Simple Image and Spectral Access services (though TAP (the ivoa.obscore table) may be another way to get image/spectral observation data).
Open the Multi-Cone window using the VO|Multicone menu item.
Enter "2mass" in the Keywords field and hit Find Services. It gives you too many results ... try "2mass point". Pick one that looks OK (hint: the VizieR one ("II/246") will probably work best). When selected this will fill in the Cone Search URL field.
Select tycho as the Input Table.
The corresponding STILTS command is coneskymatch.
The above example in STILTS looks like:
stilts coneskymatch \
serviceurl='http://vizier.u-strasbg.fr/viz-bin/votable/-A?-out.all&-source=II%2F246%2Fout&' \
in=tycho-pleiades.fits icmd=progress \
sr=0.0002777 find=best \
out=tycho-2mass.fits
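Note the match radius is given in different units in the examples above: tskymatch2 (error) and cdsskymatch (radius) take arcseconds, while tapskymatch and coneskymatch (sr) take degrees, hence the 1./3600. and 0.0002777 values. A trivial Python sketch of the conversion (the helper name is made up):

```python
def arcsec_to_degrees(arcsec):
    """Convert an angle in arcseconds to degrees (3600 arcsec per degree)."""
    return arcsec / 3600.0


# A 1 arcsec match radius expressed in degrees, as needed for sr=...
radius_deg = arcsec_to_degrees(1.0)
print("%.7f" % radius_deg)
```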
To summarise, the best choice mainly depends on whether your tables are Small (< a thousand rows), Medium (< a million rows), or Huge.
T1           | T2           | Crossmatch | CDS X-Match | TAP | Multi-Cone
-------------|--------------|------------|-------------|-----|-----------
Small/Medium | Small/Medium | YES        | yes         | yes | no
Small        | Huge         | no         | YES         | yes | yes
Medium       | Huge         | no         | YES         | yes | no
Huge         | Huge         | no         | (web iface) | yes | no
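The summary can also be read as a simple lookup. Here is a Python sketch of that, using the Small/Medium/Huge row-count thresholds given above; the function and variable names are made up for illustration, and combinations that the table does not list simply return None.

```python
def size_class(nrows):
    """Classify a table as in the summary: Small (<1e3 rows), Medium (<1e6), else Huge."""
    if nrows < 1000:
        return "Small"
    if nrows < 1000000:
        return "Medium"
    return "Huge"


# Verdicts copied from the summary table; upper case marks the preferred option.
BOTH_MODEST = {"Crossmatch": "YES", "CDS X-Match": "yes", "TAP": "yes", "Multi-Cone": "no"}
WITH_HUGE = {
    ("Small", "Huge"): {"Crossmatch": "no", "CDS X-Match": "YES", "TAP": "yes", "Multi-Cone": "yes"},
    ("Medium", "Huge"): {"Crossmatch": "no", "CDS X-Match": "YES", "TAP": "yes", "Multi-Cone": "no"},
    ("Huge", "Huge"): {"Crossmatch": "no", "CDS X-Match": "(web iface)", "TAP": "yes", "Multi-Cone": "no"},
}


def match_options(nrows1, nrows2):
    """Return the verdict row for tables T1 and T2, or None if not listed."""
    class1, class2 = size_class(nrows1), size_class(nrows2)
    if class1 != "Huge" and class2 != "Huge":
        return BOTH_MODEST
    return WITH_HUGE.get((class1, class2))


# A few hundred local rows against a catalogue of hundreds of millions.
print(match_options(500, 500000000))
```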
Some miscellaneous tips:
button on main toolbar).
Make sure you have (2MASS) J, H and K magnitudes, heliocentric radial velocity (rv), parallax (Plx) and calibrated metallicity (met). If they don't have those names, you can rename the columns by double-clicking on the names in the columns display and typing the new name.
View the RAVE data on the sky.
Open a Sky Plot window.
You may want to float the controls using the toolbar button, especially if your screen is small/short.
and adjusting the options:
For fun, you can overplot the Messier objects on the same plot.
Select the Name or ID column in the Label selector.
This is how you do that in STILTS:
stilts plot2sky \
viewsys=galactic \
in1=rave-dr4.fits lon1=raj2000 lat1=dej2000 \
layer1=mark \
in2=messier.xml lon2=RA lat2=DEC color2=cyan \
layer2a=mark size2a=2 \
layer2b=label label2b=name
Do a colour-colour plot of RAVE data:
Open a Plane Plot window. You may want to float the controls using the toolbar button, especially if your screen is small/short.
Enter X as h-j and Y as h-k. Note these are using topcat's expression language; in this case the expressions are pretty simple.
Auto (default):
you can get a fairly good idea that the density
is bimodal, but not all the details
Flat:
hopeless for this kind of high-density plot
Translucent:
adjust the Transparency Level slider.
Can see bimodality, but hard to see all
levels of detail at once.
Transparent:
like Translucent, but behaves differently if you zoom in/out
Density:
more options to explore density profiles.
Fiddle with colour maps, scaling, clips, quantisation, ...
(probably need to increase Smoothing)
Next, look at how the metallicity (met, [M/H] or something else, depending where you got the data) varies over the colour-colour space. You can do this by using the metallicity value to colour-code plotted points with the remaining Shading Modes, Aux and Weighted. In both these cases, a colour ramp is displayed on the right, and you can control the map using the Aux Axis control that appears in the control stack on the left. Play around with the colour map etc in the Map tab and the data range in the Range tab for best results.
The two modes work as follows:
Aux:
select the metallicity column
as the Aux coordinate.
You can see some low-metallicity hotspots, but they're a bit
noisy because points are plotted over each other.
Adjusting transparency, zooming in, or making points smaller
may help a bit.
Weighted:
select the metallicity column as the Weight coordinate.
This averages the values at each pixel and makes it easier to see
what's going on.
The combination method is Mean by default;
you can experiment with the others.
Median may give you a more robust view, but
note it's slower.
This is how you do that in STILTS:
stilts plot2plane \
in=rave-dr4.fits \
icmd='colmeta -name met c[M/H]K' \
x=h-j y=h-k \
xmin=-1.3 xmax=0.2 ymin=-0.2 ymax=0.7 \
auxmap=cubehelix auxmin=-3 auxmax=0 auxfunc=square \
layer1=mark shading1=weighted weight1=met
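What the Weighted mode does with the Mean combination method can be sketched in a few lines of Python: points are binned onto a pixel grid, and each pixel takes the mean of the weight values of the points that fall in it. This is only a conceptual sketch with invented data and a made-up pixel_means helper, not TOPCAT's actual rendering code.

```python
from collections import defaultdict


def pixel_means(points, x_min, y_min, pixel_size):
    """Map each occupied (ix, iy) pixel to the mean weight of its points.

    points is a sequence of (x, y, weight) triples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, weight in points:
        pixel = (int((x - x_min) // pixel_size), int((y - y_min) // pixel_size))
        sums[pixel] += weight
        counts[pixel] += 1
    return {pixel: sums[pixel] / counts[pixel] for pixel in sums}


# Invented (h-j, h-k, met) triples; the first two land in the same pixel.
points = [(-0.62, 0.11, -0.5), (-0.61, 0.12, -1.5), (-0.31, 0.07, 0.0)]
means = pixel_means(points, x_min=-1.3, y_min=-0.2, pixel_size=0.05)
for pixel in sorted(means):
    print(pixel, means[pixel])
```

Swapping the mean for a median would need the individual values kept per pixel rather than just a running sum and count, which gives some idea why Median is slower.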
Get some theoretical data from the Millennium simulation in one of the following ways:
Use the VO|GAVO Millennium Run Query menu item.
Plot X,Y,Z positions:
Open a Cube Plot window. You may want to float the controls using the toolbar button, especially if your screen is small/short.
Enter x, y and z as the X, Y and Z coordinates.
Try the different shading modes. There's no Auto because it plays badly with multi-dataset plots. The transparency options are OK. But if you have a single-dataset plot you can use Density mode.
Navigate in 3D:
Use the toolbar button to rescale if you get lost.
Plot velocities:
Plot vectors using velX/Y/Z as the components. See the little arrows.
Plot separate subsets:
There is a column type which identifies galaxies as central/FOF (0), central/subhalo (1) or satellite (2). It's useful to plot these as separate subsets.
You can define the subsets by hand using the expressions type==0, type==1 and type==2.
Alternatively, use the column classification tool (in the Subsets menu, or on the toolbar): select type in the Classification Value selector, and hit the Classify button.
This is how you do one of those plots in STILTS:
stilts plot2cube \
in=millennium-g2.fits \
x=x y=y z=z \
layer1=mark \
xdelta2=velx ydelta2=vely zdelta2=velz \
shading2=transparent opaque2=3 \
layer2a=xyzvector icmd2a='select type==1' color2a=cyan \
layer2b=xyzvector icmd2b='select type==2' color2b=magenta \
layer2c=xyzvector icmd2c='select type==0' color2c=606060 \
seq=2a,2b,2c
Activation Actions are things you can cause to happen when you
"activate" a row. Activation happens when you click on a row in the
Data Window
or click on a point in a plot.
Some of these use SAMP.
They are controlled from the Activation Window, which you can reach using the Activation Action button in the main Control Window.
This example uses a list of HST spectral observations of
positions in the Asiago supernova catalog (B/sn in VizieR).
It was obtained by using the
Multiple SSA window
from the VO menu with B/sn as the local
table and ivo://mast.stsci/ssap/hst as the remote service.
Alternatively, like this (which took about 5 minutes):
stilts coneskymatch \
in='http://vizier.u-strasbg.fr/viz-bin/votable?-source=B%2fsn' \
icmd='replacecol -units deg RAJ2000 hmsToDegrees(RAJ2000)' \
icmd='replacecol -units deg DEJ2000 dmsToDegrees(DEJ2000)' \
icmd='select RAJ2000>0' \
icmd=cache icmd=progress \
servicetype=ssa \
serviceurl='http://archive.stsci.edu/ssap/search2.php?id=HST&' \
dataformat=fits \
ra=RAJ2000 dec=DEJ2000 sr=5./3600. \
usefoot=false find=best \
parallel=5 emptyok=false \
ocmd='colmeta -utype ssa:Access.Reference utype$ssa_access_reference' \
ostream=true \
out=sn-hst-spectra.vot
You can find the data file at http://andromeda.star.bris.ac.uk/data/sn-hst-spectra.vot.
We will try a few activation actions. Click the Activation Action button in the Control Window to open the Activation Window and select different options by selecting one of the radio buttons on the left, filling in the corresponding panel, and clicking OK at the bottom (which closes the window).
Select the url column (may get filled in automatically).
Activating rows loads the corresponding spectrum into SPLAT.
for the options.
Among the most useful are the exec functions
listed under System.
These let you execute a shell command.
If you enter
'exec("curl", url, "-o", sn+".fits")',
then every row activation will result in a FITS file being written to
the current directory,
named after the nearby supernova (e.g. 1987A.fits),
containing the corresponding HST spectrum.
For more control over what happens, you could write a custom shell script
that takes command-line parameters supplied by such an exec
invocation.
STILTS is a set of command line tools that can do more or less the same things as TOPCAT. For full details see the manual, but here is some explanation of the basic concepts.
Make the stilts script executable (chmod u+x stilts) and use that. Alternatively, do java -jar stilts.jar.
All commands are of the form
stilts <task-name> <param1>=<value1> <param2>=<value2> ...
A basic task is tpipe. This takes an input file, lets you apply a number of filters (zero or more cmd=... parameters) that modify the table data or metadata, and outputs it in some way (omode=... parameter). It works like a Unix pipeline, but processing table data and metadata, not lines of an input stream. If there are no cmd parameters, no changes are made. If there is no omode parameter, the result is just written to file (format default or as specified).
stilts tpipe in=messier.xml out=messier.fits
stilts tpipe in=rave-dr4.fits cmd="select abs(hrv)>100" omode=count
In tpipe, filters are given as cmd= arguments. For some other commands which distinguish between different table streams they can be e.g. icmd/ocmd (for input/output) or icmd1/icmd2 (for two different input tables) etc.
Here are some examples of using filters:
cmd='addcol B_R BMAG-RMAG'
cmd='select skyDistanceDegrees(RA,DEC,78.63,-8.20)<0.001'
cmd='sort RMAG-BMAG' cmd='head 10'
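Expressions can refer to columns by name, or use the "$n" syntax as an alias for column #n.
Many functions are available (exp(x), janskyToAb(flux), substring(str,index), ...); use stilts funcs to investigate.
Boolean operators include && (and), || (or), == (equals), ! (not), ...
Conditional expressions take the form <test> ? <value-if-true> : <value-if-false>
Help is available from the command line too: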
stilts <task-name> -help for help on the command
stilts <task-name> help=<param-name> (or ? at the prompt) for help on a parameter.
Admittedly, the plotting commands can be quite complicated... They have their own chapter in the manual to help.
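Because every invocation has the same "stilts <task-name> <param>=<value>" shape, it's easy to generate commands from a script. Here is a minimal Python sketch that assembles (but does not run) such a command line; stilts_command is a made-up helper name, and the parameters are taken from the tpipe example above.

```python
import shlex


def stilts_command(task, params):
    """Build an argument list of the form: stilts <task> <name>=<value> ...

    params is a list of (name, value) pairs rather than a dict, so that a
    parameter such as cmd can be given more than once and keeps its order.
    """
    return ["stilts", task] + ["%s=%s" % (name, value) for name, value in params]


args = stilts_command("tpipe", [
    ("in", "rave-dr4.fits"),
    ("cmd", "select abs(hrv)>100"),
    ("omode", "count"),
])
# Quote each argument so the line could be pasted into a shell.
command_line = " ".join(shlex.quote(arg) for arg in args)
print(command_line)
```

The unquoted list form (args) is what you would hand to something like subprocess.run if stilts is on your path.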
JyStilts is a Jython (like python, but implemented in Java and with no C extensions) front-end to stilts.
jython -jar
jystilts.jar
.
stilts tskymatch2 in1=survey.fits \
icmd1='addskycoords fk4 fk5 RA1950 DEC1950 RA2000 DEC2000' \
in2=mycat.csv ifmt2=csv \
icmd2='select VMAG>18' \
ra1=ALPHA dec1=DELTA ra2=RA2000 dec2=DEC2000 \
error=10 join=2not1 \
out=matched.fits
In JyStilts the same match looks like:
>>> import stilts
>>> t1 = stilts.tread('survey.fits')
>>> t1 = t1.cmd_addskycoords('fk4', 'fk5', 'RA1950', 'DEC1950', 'RA2000', 'DEC2000')
>>> t2 = stilts.tread('mycat.csv', 'csv')
>>> t2 = t2.cmd_select('VMAG>18')
>>> tm = stilts.tskymatch2(in1=t1, in2=t2, ra1='ALPHA', dec1='DELTA',
...                        ra2='RA2000', dec2='DEC2000',
...                        error=10, join='2not1')
>>> tm.write('matched.fits')
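Tables behave like python sequences: len(t), t[100:200], for row in t:
Cells in a row can be accessed by column name or index: row['BMAG'] or row[2]
Documentation is available through python's help, e.g. help(t.cmd_addcol)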