Okay, second article. Probably should have been the first, but I was still writing it, so...
Shadow's Technical Notes on Compiled TS (CTS) scripts
October 1, 2019
========================================================================================
1. OVERVIEW (What is a CTS anyway?)
Short answer: a .cts file is compiled bytecode that can be loaded and executed quickly by TWX Proxy. It is not an encrypted .ts file (in fact, no real encryption is done in the compilation process; the only thing close to it is the basic obfuscation of parameter values described in section 3).
Longer answer:
TWX Proxy reads and executes scripts in either .ts (twx script) or .cts (compiled twx script) format. Any .ts script can be compiled to a .cts using TWXC.EXE, shipped with the proxy distribution.
The primary benefit of a .cts file is that it loads faster because it is compiled bytecode that can be immediately processed into a Script object and executed. When loading a .ts file with twx, the file is first compiled (internally) and then processed into a Script object. Using a pre-compiled .cts file saves the compilation time and results in a much faster script starting time. It does NOT, however, have any effect on the speed of the script once it starts, since both are compiled before being executed.
2. WHAT'S NOT IN A CTS FILE?
While much of the information in the original TS file is maintained in the CTS output, some data is lost during the compilation to bytecode, and is impossible to restore through decompiling. Specifically, this includes:
a) Comments. This one is particularly frustrating if you are trying to understand how something works (assuming it was commented originally), but it is part of the process - the compiler simply throws out any line that starts with "#" and does not compile it into the CTS.
b) Formatting. When converting all commands and parameters to bytecode, the compiler ignores any extra spaces, tabs, and things like extra parentheses around statements. These cannot be restored by the decompiler, though I have added an indent option that will indent using a standard method with tabs (if enabled), to make the code more readable.
c) Filesystem structure (for includes). This one is probably not an issue if you are decompiling single .ts files that do not call includes. However, for includes, the CTS only contains the filename of the included script and not its original location; so the decompiler will drop the includes in the same directory as the original script.
As a side effect of the lack of file structure, if you happen to include scripts with the same name from different directories, you have to either append the includes into a single file or create multiple files. For example, if you have the following includes in your script:
include "source\bot_includes\player\quikstats\player"
include "source\bot_includes\player\currentprompt\player"
include "source\bot_includes\player\startcnsettings\player"
include "source\bot_includes\player\getinfo\player"
include "source\bot_includes\player\moveintosector\player"
include "source\bot_includes\ship\getshipcapstats\ship"
include "source\bot_includes\ship\getshipstats\ship"
include "source\bot_includes\ship\savetheship\ship"
include "source\bot_includes\ship\loadshipinfo\ship"
This gets stored in the CTS file header as the following list of includes (note that the base script retains its full name with extension):
MOMBOT.TS
PLAYER
PLAYER
PLAYER
PLAYER
PLAYER
SHIP
SHIP
SHIP
SHIP
So, unless you recreate the original directory structure for the includes, you have to either append files with the same namespace into a single file per namespace, or leave everything in the original ts file. The file names have to stay the same for the code to work, because the namespace that is used for includes is based on the filename -- so a variable called "test" in "player.ts" is referenced as "$player~test". I chose, in my decompiler, to append the files to a single include in the same directory as the base script, to keep a clean namespace.
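To make the collision concrete, here is a minimal sketch (not the decompiler's actual code) that derives a namespace from each include path and groups the paths that would end up sharing one. The class and method names are made up for illustration, and it assumes the namespace is simply the upper-cased last path component, as the header list above suggests.

Code:
```csharp
using System;
using System.Collections.Generic;

public static class IncludeNamespaces
{
    // Assumption: the namespace is the last path component, upper-cased,
    // which matches the PLAYER / SHIP entries in the header list above.
    public static string NamespaceOf(string includePath)
    {
        int lastSeparator = includePath.LastIndexOfAny(new[] { '\\', '/' });
        return includePath.Substring(lastSeparator + 1).ToUpperInvariant();
    }

    // Collect the include paths that share a namespace and would
    // therefore have to be appended into one file.
    public static Dictionary<string, List<string>> GroupByNamespace(IEnumerable<string> includePaths)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (string path in includePaths)
        {
            string ns = NamespaceOf(path);
            if (!groups.TryGetValue(ns, out List<string> members))
            {
                members = new List<string>();
                groups[ns] = members;
            }
            members.Add(path);
        }
        return groups;
    }

    public static void Main()
    {
        var paths = new[]
        {
            @"source\bot_includes\player\quikstats\player",
            @"source\bot_includes\player\getinfo\player",
            @"source\bot_includes\ship\getshipstats\ship",
        };
        foreach (var group in GroupByNamespace(paths))
            Console.WriteLine($"{group.Key}: {group.Value.Count} file(s) share this namespace");
    }
}
```

Any group with more than one member is a set of files that cannot sit side by side in one directory under their original names.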
3. FILE STRUCTURE
Each .cts file has the following overall structure:
Header (program name, twxscript version [1-5], description size, code size)
Description
Parameters (all of the parameters used by the bytecode -- strings, values, etc)
Includes (a list of all included files/namespaces)
Labels (for gosub/goto, if/else/elseif/while)
Bytecode (a series of commands, parameters, and labels)
More details on each are as follows:
(a) Header
The header is defined in "ScriptCmp.pas" in the Delphi release of TWX. My port to C# is as follows:
Code:
// header at top of compiled script file (from ScriptCmp.pas)
[StructLayout(LayoutKind.Explicit)]
public struct TScriptFileHeader
{
    [FieldOffset(0)]
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 12)]
    public string ProgramName;

    [FieldOffset(12)]
    public ushort Version;

    [FieldOffset(16)]
    public int DescSize;

    [FieldOffset(20)]
    public int CodeSize;
}
Version is one of the following, mostly used for backward compatibility (to know what versions can be read by each version of twx):
// TWX Proxy 2.02 is version 1
// TWX Proxy 2.03Beta is version 2?
// TWX Proxy 2.03Final is version 3
// TWX Proxy 2.04 is version 4
// TWX Proxy 2.05 is version 5
The two size fields are used when reading the cts file, to know how many bytes to read for the description and for the bytecode.
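If you would rather not marshal the struct, the same fields can be pulled out with a BinaryReader. This is only a sketch that follows the field offsets in the struct above: the CtsHeader class is a hypothetical name, the two bytes at offsets 14-15 are assumed to be alignment padding, and the program name is kept as raw bytes because its encoding is not covered here.

Code:
```csharp
using System;
using System.IO;

public sealed class CtsHeader
{
    public const int Size = 24; // 12 (name) + 2 (version) + 2 (gap) + 4 + 4

    public byte[] ProgramName = new byte[0]; // raw bytes, not decoded
    public ushort Version;
    public int DescSize;
    public int CodeSize;

    public static CtsHeader Read(BinaryReader br)
    {
        var header = new CtsHeader();
        header.ProgramName = br.ReadBytes(12);       // offset 0
        if (header.ProgramName.Length != 12)
            throw new EndOfStreamException("File is too short to hold a header.");
        header.Version = br.ReadUInt16();            // offset 12
        br.ReadUInt16();                             // offsets 14-15, assumed padding
        header.DescSize = br.ReadInt32();            // offset 16
        header.CodeSize = br.ReadInt32();            // offset 20
        if (header.DescSize < 0 || header.CodeSize < 0)
            throw new InvalidDataException("Negative section size in header.");
        return header;
    }

    public static void Main()
    {
        // Build a fake 24-byte header in memory and read it back.
        var ms = new MemoryStream();
        var bw = new BinaryWriter(ms);
        bw.Write(new byte[12]);   // program name placeholder
        bw.Write((ushort)5);      // version
        bw.Write((ushort)0);      // gap
        bw.Write(40);             // description size
        bw.Write(1200);           // code size
        bw.Flush();
        ms.Position = 0;

        CtsHeader h = Read(new BinaryReader(ms));
        Console.WriteLine($"version={h.Version} desc={h.DescSize} code={h.CodeSize}");
        // prints: version=5 desc=40 code=1200
    }
}
```

BinaryReader always reads little-endian values, which is why no byte swapping appears here.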
(b) Description
Description allows for the contents of an additional description file to be appended to the script. This is mostly unused today.
(c) Parameters
Immediately following the description are the parameters. Parameters are used by nearly all of the commands in the bytecode. Parameters are stored as one of the following types:
Code:
public const int PARAM_VAR = 1;      // User variable prefix
public const int PARAM_CONST = 2;    // Compiler string constant prefix
public const int PARAM_SYSCONST = 3; // Read only system value
public const int PARAM_PROGVAR = 4;  // Program variable
public const int PARAM_CHAR = 5;     // Single character value
Each parameter has a type, a length and a value. They are stored in order, at the beginning of the code section of the file, and must be read and stored and then dereferenced while processing the bytecode.
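In the file itself, each stored parameter is one of two record types. The first type, TCmdParam, holds command parameters (a value only). The second type, TVarParam, holds variables (a value plus the variable name).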
Here is an example of some CmdParams:
Code:
CmdParam = You have a corporate memo from
CmdParam = RELOG
CmdParam = :CONNECTIVITY~KEEPALIVE
CmdParam = CONNECTION LOST
CmdParam = ONLINE_WATCH
CmdParam = :CONNECTIVITY~ONLINE_WATCH
CmdParam = Your session will be terminated in
CmdParam = KEEPALIVE
CmdParam = :CONNECTIVITY~KEEPALIVE
CmdParam = 30000
And some VarParams:
Code:
VarParamName = $BOT~UNLIMITEDGAME
VarParamName = $~UNLIMITEDGAME
VarParamName = $SHIP~CAP_FILE
VarParamName = $PLANET~PLANET_FILE
VarParamName = $GAME~MBBS
VarParamName = $BOT~_CK_PTRADESETTING
VarParamName = $BOT~RYLOS
VarParamName = $BOT~ALPHA_CENTAURI
VarParamName = $BOT~STARDOCK
The code to read the parameters, in C#, is as follows:
Code:
ParamType = br.ReadByte();
while (ParamType > 0)
{
    if (ParamType == 1) // TCmdParam (1)
    {
        Len = br.ReadInt32();
        Val = br.ReadBytes(Len);
        TCmdParam Param = new TCmdParam();
        ValStr = System.Text.Encoding.Default.GetString(Val);
        Param.Value = ApplyEncryption(ValStr, 113);
        FParamList.Add(Param);
    }
    else // TVarParam (2)
    {
        Len = br.ReadInt32();
        byte[] PVal = br.ReadBytes(Len);
        TVarParam Param = new TVarParam();
        String PValStr = System.Text.Encoding.Default.GetString(PVal);
        Param.Value = ApplyEncryption(PValStr, 113);
        Len = br.ReadInt32();
        byte[] PName = br.ReadBytes(Len);
        String PNameStr = System.Text.Encoding.Default.GetString(PName);
        Param.Name = ApplyEncryption(PNameStr, 113);
        FParamList.Add(Param);
    }
    ParamType = br.ReadByte();
}
You will note the "ApplyEncryption" functions used here. There is a very very basic "encryption" done on all command parameters -- which includes all text values -- in the cts file. It's really more aptly described as "obfuscation" than encryption, but getting this to decrypt properly took some time.
One of the other challenges in reading the parameters is that there is no length provided for the parameters section; it terminates with a ParamType of 0, and this is the only way to know when you've reached the end (so if you are not checking the length of each value you are reading, you will over-read this section and miss the next).
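One way to guard against over-reading is to validate every length before trusting it. The sketch below is an illustration only: CheckedReads, ReadBlock and SkipParameters are hypothetical names, and it assumes a seekable stream (a file or MemoryStream) so the remaining byte count is known. It walks the parameter section the same way as the loop above, but only counts entries instead of decoding them.

Code:
```csharp
using System;
using System.IO;

public static class CheckedReads
{
    // Read a 4-byte length followed by that many bytes, rejecting lengths
    // that are negative or that would run past the end of the stream.
    public static byte[] ReadBlock(BinaryReader br)
    {
        long lengthOffset = br.BaseStream.Position;
        int len = br.ReadInt32();
        long remaining = br.BaseStream.Length - br.BaseStream.Position;
        if (len < 0 || len > remaining)
            throw new InvalidDataException($"Implausible length {len} at offset {lengthOffset}.");
        return br.ReadBytes(len);
    }

    // Walk the parameter section until the terminating type byte of 0.
    // Returns the number of parameters; the stream is left positioned
    // on the first byte of whatever section follows.
    public static int SkipParameters(BinaryReader br)
    {
        int count = 0;
        byte paramType = br.ReadByte();
        while (paramType > 0)
        {
            ReadBlock(br);           // value
            if (paramType != 1)
                ReadBlock(br);       // name (variable parameters only)
            count++;
            paramType = br.ReadByte();
        }
        return count;
    }

    public static void Main()
    {
        var ms = new MemoryStream();
        var bw = new BinaryWriter(ms);
        bw.Write((byte)1); bw.Write(3); bw.Write(new byte[] { 1, 2, 3 });   // command parameter
        bw.Write((byte)2); bw.Write(1); bw.Write(new byte[] { 9 });         // variable value
        bw.Write(2); bw.Write(new byte[] { 8, 7 });                         // variable name
        bw.Write((byte)0);                                                  // terminator
        bw.Write((byte)0x7F);                                               // first byte of the next section
        bw.Flush();
        ms.Position = 0;

        var br = new BinaryReader(ms);
        Console.WriteLine($"parameters: {SkipParameters(br)}");
        Console.WriteLine($"next byte: {br.ReadByte()}");
    }
}
```

A corrupt or misaligned length then fails loudly at the offset where it was read, instead of silently swallowing the sections that follow.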
(d) Includes
Some scripts include other files, as mentioned above. This section is a simple list of all included files. Like the parameters, it ends with a null (here, a length of 0).
Reading it is very easy:
Code:
Len = br.ReadInt32();
while (Len > 0)
{
    Val = br.ReadBytes(Len);
    String inc = Encoding.UTF8.GetString(Val, 0, Val.Length);
    IncludeList.Add(inc);
    Len = br.ReadInt32();
}
These include file names are very important later if you want to split the include files back out, because that requires processing any variables or labels that include $INCLUDE~ or INCLUDE~: strings.
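As a rough illustration of that processing, the sketch below splits a variable or label name into its leading sigil, its include namespace, and the bare name. The NamespacedName class is hypothetical, and the splitting rule (an optional leading "$" or ":", then everything up to the first "~" is the namespace) is an assumption inferred from the sample names in this article, not something taken from the TWX source.

Code:
```csharp
using System;

public static class NamespacedName
{
    // Split names such as "$BOT~RYLOS", ":CONNECTIVITY~KEEPALIVE" or "BOT~:19".
    // A name without "~" is reported with an empty namespace.
    public static (string Sigil, string Namespace, string Name) Split(string raw)
    {
        string sigil = "";
        string rest = raw;
        if (rest.StartsWith("$") || rest.StartsWith(":"))
        {
            sigil = rest.Substring(0, 1);
            rest = rest.Substring(1);
        }

        int tilde = rest.IndexOf('~');
        if (tilde < 0)
            return (sigil, "", rest);
        return (sigil, rest.Substring(0, tilde), rest.Substring(tilde + 1));
    }

    public static void Main()
    {
        foreach (string raw in new[] { "$BOT~RYLOS", "$~UNLIMITEDGAME", ":CONNECTIVITY~KEEPALIVE", "BOT~:19", "KILLTHETRIGGERS" })
        {
            var parts = Split(raw);
            Console.WriteLine($"{raw} -> sigil='{parts.Sigil}' namespace='{parts.Namespace}' name='{parts.Name}'");
        }
    }
}
```

This should only be applied to values known to be variables or labels; ordinary text parameters can contain a "~" of their own.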
(e) Labels
The labels section is next in the cts file. Labels designate "code branches" and are used by the GOTO/GOSUB functions as well as by triggers.
In addition to the labels created by the script writer, the compilation process creates additional labels that it uses to track branches. I will address the whole topic of branches in a different article.
Labels have a value and a location, and look like this:
Code:
Label = CHECKSTARTINGPROMPT (Location 1556)
Label = KILLTHETRIGGERS (Location 1856)
Label = :17 (Location 1607)
Label = :18 (Location 1607)
Label = BOT~:19 (Location 1850)
Label = BOT~:20 (Location 1850)
Label = BOT~BIGDELAY_KILLTHETRIGGERS (Location 1889)
Label = BOT~UNFREEZEBOT (Location 1922)
Label = BOT~WAIT_FOR_COMMAND (Location 1966)
The labels with a ~ in them refer to include files, which I will also deal with in another article.
The reason for the location - and this might be the least fun part of writing a decompiler for cts files - is that the labels are not in order like the parameters, but instead have a location (in bytes from the start of the bytecode section) and must be inserted in the output file at the given offset while processing the bytecode.
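One simple way to handle this is to index the labels by location up front and look them up each time a new command is about to be decoded. The sketch below shows the idea with a hypothetical LabelTable class; the command offsets in Main are invented for the demo. Note that several labels can share one location (":17" and ":18" above both sit at 1607), so each offset maps to a list.

Code:
```csharp
using System;
using System.Collections.Generic;

public sealed class LabelTable
{
    private readonly Dictionary<int, List<string>> byOffset = new Dictionary<int, List<string>>();

    // Register a label at its byte offset from the start of the bytecode.
    public void Add(string name, int location)
    {
        if (!byOffset.TryGetValue(location, out List<string> names))
        {
            names = new List<string>();
            byOffset[location] = names;
        }
        names.Add(name);
    }

    // Labels to emit before the command that starts at this offset,
    // in the order they were added (empty if there are none).
    public IReadOnlyList<string> At(int offset)
    {
        if (byOffset.TryGetValue(offset, out List<string> names))
            return names;
        return Array.Empty<string>();
    }

    public static void Main()
    {
        var labels = new LabelTable();
        labels.Add("CHECKSTARTINGPROMPT", 1556);
        labels.Add(":17", 1607);
        labels.Add(":18", 1607);

        // Hypothetical command start offsets, in the order a decoder would visit them.
        foreach (int offset in new[] { 1556, 1580, 1607 })
        {
            foreach (string name in labels.At(offset))
                Console.WriteLine($"label {name}");
            Console.WriteLine($"command at {offset}");
        }
    }
}
```

A label whose location never matches a command boundary would be silently dropped by this approach, so it is worth checking at the end that every label was emitted.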
(f) Bytecode
Finally, the bytecode section. This contains the most information, and is surprisingly simple (for fast execution), but reading and processing this is the most complex part of decompiling a script; the code for processing the bytecode alone, in C#, is over 1200 lines.
The bytecode is read, byte by byte, until the end is reached (as defined in the header). This starts out as follows:
Code:
// fetch script ID
byte ExecScriptID = CodeRef.ReadByte();
// fetch line number
CodeLine = (int)CodeRef.ReadUInt16();
// fetch command from code
CmdLine.ID = CodeRef.ReadUInt16();
ScriptCmd Cmd = TCmd.Cmds(ref CmdLine.ID);
CmdLine.Name = Cmd.Name;
After the bytecode header, a list of parameter references follows, terminated by a null. Each reference is an integer that refers back to a parameter number (from the previously read CmdParams). In some cases, references can point to more than one parameter. I will write more detail on this in another article.