Considerable effort has gone into making TOPCAT capable of dealing with large datasets. In particular it does not in general have to read entire files into memory in order to do its work, so it's not restricted to using files which fit into the Java Virtual Machine's 'heap memory' or into the physical memory of the machine. As a rule of thumb, the program will work at reasonable speed with tables up to about 1-10 million rows, depending on the machine it's running on. It may well work with hundreds of millions of rows, but performance may be more sluggish. The number of columns is less of an issue, though see below concerning performance.
However, the way you invoke the program affects how well it can cope with large tables; you may in some circumstances get a message that TOPCAT has run out of memory (either a popup or a terse "OutOfMemoryError" report on the console), and there are some things you can do about this:
Use the -Xmx flag, followed by the maximum heap memory, for instance "topcat -Xmx1000M" or "java -Xmx1000M -jar topcat-full.jar". Don't forget the "M" to indicate megabytes or "G" for gigabytes. It's generally reasonable to increase this value up to nearly the amount of free physical memory in your machine if you need to (taking account of the needs of other processes running at the same time), but attempting any more will usually result in abysmal performance. See Section 10.2.2.
-Xmx flags as above. Feather format also has the same advantages.
The command "java -version" (or "topcat -version") will probably say something about 64-bit-ness if the JVM is 64-bit.
It is also possible to use column-oriented storage for non-FITS files by specifying the relevant flag. This is like using the -disk flag but uses column-oriented rather than row-oriented temporary files. However, using it for such large files means that the conversion is likely to be rather slow, so you may be better off converting the original file to colfits format in a separate step and using that.
The default storage policy, "adaptive", means that the data for relatively small tables are stored in memory, and for larger ones in temporary disk files. This usually works fairly well and you're not likely to need to change it. However, you can experiment if you like, and a small amount of memory may be saved if you encourage it to store all table data on disk, by specifying the -disk flag on the command line. You can achieve the same effect by adding the corresponding line to the file .starjava.properties in your home directory. See Section 10.1, Section 10.2.3.
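The invocation advice above can be sketched as a pair of small shell helpers. This is only an illustration under stated assumptions: the function names is_64bit_jvm and heap_flag, and the 500 MB reserve left for other processes, are invented for this example and are not part of TOPCAT. Only the -Xmx flag and the "java -version" and "topcat -Xmx..." invocations come from the text.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of TOPCAT.

# is_64bit_jvm TEXT
# Succeeds if TEXT (the output of "java -version") mentions 64-bit.
is_64bit_jvm() {
    printf '%s\n' "$1" | grep -qi '64-bit'
}

# heap_flag FREE_MB [RESERVE_MB]
# Prints an -Xmx flag somewhat below the free physical memory.
# RESERVE_MB defaults to 500, an arbitrary assumption standing in for
# "the needs of other processes running at the same time".
heap_flag() {
    free_mb=$1
    reserve_mb=${2:-500}
    heap_mb=$((free_mb - reserve_mb))
    if [ "$heap_mb" -le 0 ]; then
        echo "not enough free memory to suggest a heap size" >&2
        return 1
    fi
    printf '%s\n' "-Xmx${heap_mb}M"
}

# Example usage (commented out so the sketch has no side effects).
# "java -version" writes to standard error, hence the 2>&1.
#   is_64bit_jvm "$(java -version 2>&1)" && echo "64-bit JVM"
#   topcat "$(heap_flag 8000)"     # runs: topcat -Xmx7500M
```

The free-memory figure is passed in as an argument because the way to obtain it differs between operating systems.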
As far as performance goes, the memory size of the machine you're using does make a difference. If the size of the dataset you're dealing with (this is the size of the FITS HDU if it's in FITS format, but may be greater or less than the file size for other formats) will fit into unused physical memory, then in general everything will run very quickly because the operating system can cache the data in memory. If it's larger than physical memory, then the data has to keep being re-read from disk and most operations will be much slower, though use of column-oriented storage can help a lot in that case.
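As a rough illustration of that rule, the comparison can be written as a one-line check. The helper name cache_outlook and its two labels are assumptions made for this sketch, and the commented-out way of reading free memory applies to Linux only.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of TOPCAT.

# cache_outlook DATASET_KB FREE_KB
# Prints "cached" if the dataset fits into unused physical memory,
# otherwise "disk-bound". Both sizes are in kilobytes.
cache_outlook() {
    if [ "$1" -le "$2" ]; then
        echo "cached"
    else
        echo "disk-bound"
    fi
}

# Example usage (Linux only, commented out; /proc/meminfo reports kB):
#   free_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
#   cache_outlook 2000000 "$free_kb"
```

For a FITS table the first argument should be the size of the HDU rather than of the whole file, as noted above.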