Considerable effort has gone into making TOPCAT capable of dealing with large datasets. In particular it does not in general have to read entire files into memory in order to do its work, so it's not restricted to using files which fit into the Java Virtual Machine's 'heap memory' or into the physical memory of the machine. As a rule of thumb, the program will work at reasonable speed with tables up to about 1-10 million rows, depending on the machine it's running on. It may well work with hundreds of millions of rows, but performance may be more sluggish. The number of columns is less of an issue, though see below concerning performance.
However, the way you invoke the program affects how well it can cope with large tables; you may in some circumstances get a message that TOPCAT has run out of memory (either a popup or a terse "OutOfMemoryError" report on the console), and there are some things you can do about this:
Increase Java's heap memory:
When a Java program runs, the Java Virtual Machine reserves a fixed maximum amount of memory for its use. You can increase this limit with the -Xmx flag, followed by the maximum heap memory, for instance "topcat -Xmx1000M" or "java -Xmx1000M -jar topcat-full.jar". Don't forget the "M" to indicate megabytes or "G" for gigabytes. It's generally reasonable to increase this value up to nearly the amount of free physical memory in your machine if you need to (taking account of the needs of other processes running at the same time), but attempting any more will usually result in abysmal performance.
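If you routinely work with large tables, one convenient approach is to bake the flag into a shell alias; this is just a sketch for a Bourne-like shell, and the 4G figure is an arbitrary example:

    # In ~/.bashrc or similar: always launch TOPCAT with a 4 Gbyte heap
    alias topcat='topcat -Xmx4G'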
Use FITS files:
Because of the way FITS files are organised, TOPCAT can load a table stored as an uncompressed FITS binary table on local disk almost instantly and using hardly any memory, however large it is, so you may not need the -Xmx flags as above. If you have a large table in some other format which you expect to use more than once, it may be worth converting it to FITS first; see Section 10.2.2. Feather format also has the same advantages.
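One way to do such a conversion, assuming the STILTS package (TOPCAT's command-line counterpart) is installed, is with its tcopy command; the filenames here are purely illustrative:

    # One-off conversion of a large CSV catalogue to FITS
    # for fast subsequent loading
    stilts tcopy ifmt=csv in=catalogue.csv out=catalogue.fits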
Use a 64-bit Java Virtual Machine:
If you are working with very large files (more than about 2 Gbyte), you will need a 64-bit version of Java. Running "java -version" (or "topcat -version") will probably say something about 64-bit-ness if it is 64-bit.
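For instance, a 64-bit OpenJDK installation typically reports something like the following (the exact wording varies with vendor and version):

    $ java -version
    openjdk version "17.0.2" 2022-01-18
    OpenJDK Runtime Environment (build 17.0.2+8)
    OpenJDK 64-Bit Server VM (build 17.0.2+8, mixed mode)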
Use colfits format:
For very large tables, the column-oriented colfits variant of the FITS format can give much better performance, especially when only a few of the columns are actually in use. It is also possible to use column-oriented storage for non-FITS files by specifying the flag -Dstartable.storage=sideways. This is like using the -disk flag but uses column-oriented rather than row-oriented temporary files. However, using it for such large files means that the conversion is likely to be rather slow, so you may be better off converting the original file to colfits format in a separate step and using that.
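Again assuming STILTS is available, such a separate conversion step might look like this (colfits-basic names the column-oriented FITS output format; the filenames are illustrative):

    # One-off conversion to column-oriented FITS,
    # then use the result in TOPCAT
    stilts tcopy in=huge.fits out=huge.colfits ofmt=colfits-basic
    topcat huge.colfits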
Tweak the storage policy:
The default storage policy, "adaptive", means that the data for relatively small tables are stored in memory, and for larger ones in temporary disk files. This usually works fairly well and you're not likely to need to change it. However, you can experiment if you like, and a small amount of memory may be saved if you encourage it to store all table data on disk, by specifying the -disk flag on the command line. You can achieve the same effect by adding the line "startable.storage=disk" to the .starjava.properties file in your home directory. See Section 10.1, Section 10.2.3.
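For example, either of the following has the same effect (the second makes the setting permanent; the table name is just for illustration):

    # One-off: force all table data into temporary disk files
    topcat -disk big_table.fits

    # Permanent: add this line to ~/.starjava.properties
    startable.storage=disk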
As far as performance goes, the memory size of the machine you're using does make a difference. If the size of the dataset you're dealing with (for a FITS file this is the size of the table HDU; for other formats it may be greater or less than the file size) will fit into unused physical memory, then generally everything will run very quickly, because the operating system can cache the data in memory. If it's larger than physical memory, the data has to keep being re-read from disk, and most operations will be much slower, though use of column-oriented storage can help a lot in that case.