Norbert Juffa's Turbo Pascal 6.0 bug list 2 of 2
From: dmurdoch@watstat.waterloo.edu (Duncan Murdoch)
Newsgroups: comp.lang.pascal
Subject: Re: Norbert Juffa's bug list (part 2 of 2)
Organization: University of Waterloo
Message-ID: <1992Apr7.155831.23316@watdragon.waterloo.edu>
Date: Tue, 7 Apr 1992 15:58:31 GMT
16. Incorrect assembly of certain JMPs and CALLs by inline assembler
The inline assembler accepts certain invalid JMPs and CALLs that
the MASM and TASM assemblers reject, and assembles them into
garbage. It also incorrectly assembles JMPs and CALLs to
destinations declared with the ABSOLUTE directive. The bugs can
be demonstrated by the following program:
PROGRAM Jmp_Call;
VAR AbsPointer: POINTER ABSOLUTE $1234:$5678;
NormPointr: POINTER;
PROCEDURE FarProc; FAR; ASSEMBLER;
ASM
END;
PROCEDURE NearProc; NEAR; ASSEMBLER;
ASM
END;
BEGIN
ASM
JMP NEAR PTR AbsPointer { This is illegal in MASM / TASM }
JMP FAR PTR AbsPointer { incorrectly assembled by inline assmbl.! }
JMP AbsPointer { This is illegal in MASM / TASM }
JMP NEAR PTR NormPointr { This is illegal in MASM / TASM }
JMP FAR PTR NormPointr
JMP NormPointr
JMP NEAR PTR FarProc
JMP FAR PTR FarProc
JMP FarProc
JMP NEAR PTR NearProc
JMP FAR PTR NearProc
JMP NearProc
CALL NEAR PTR AbsPointer { This is illegal in MASM / TASM }
CALL FAR PTR AbsPointer { incorrectly assembled by inline assmbl.! }
CALL AbsPointer { This is illegal in MASM / TASM }
CALL NEAR PTR NormPointr { This is illegal in MASM / TASM }
CALL FAR PTR NormPointr
CALL NormPointr
CALL NEAR PTR FarProc
CALL FAR PTR FarProc
CALL FarProc
CALL NEAR PTR NearProc
CALL FAR PTR NearProc
CALL NearProc
END;
END.
The instructions marked as "illegal in MASM / TASM" should be
flagged as errors by the inline assembler. "JMP NEAR PTR AbsPointer"
and "JMP NEAR PTR NormPointr" are near jumps to a different CS,
and in "JMP AbsPointer", AbsPointer can't be addressed with the
currently assumed registers. The same remarks apply to the
equivalent CALL statements in the source. Consequently, the
inline assembler produces garbage for the illegal statements.
"JMP FAR PTR AbsPointer" should assemble to "JMP 1234:5678", but
the inline assembler produces something very different. Instead
of the absolute segment 1234h it uses the value of CS and in
addition mangles the offset value. The assembly language program
below, which is equivalent to the above PASCAL program, shows
that "JMP FAR PTR AbsPointer" and "CALL FAR PTR AbsPointer" can
be assembled correctly by TASM / MASM, so the inline assembler
should do this as well.
DOSSEG
AbsSeg SEGMENT AT 1234h
ORG 5678H
AbsPointer DD ?
AbsSeg ENDS
DATA SEGMENT WORD PUBLIC 'DATA'
ASSUME DS:DATA
NormPointr DD ?
DATA ENDS
CODE SEGMENT BYTE PUBLIC 'CODE'
ASSUME CS:CODE, DS:DATA
FarProc PROC FAR
RET
FarProc ENDP
NearProc PROC NEAR
RET
NearProc ENDP
Main: MOV AX, SEG (NormPointr)
MOV DS, AX
; JMP NEAR PTR AbsPointer ; error !
JMP FAR PTR AbsPointer
; JMP AbsPointer ; error !
; JMP NEAR PTR NormPointr ; error !
JMP FAR PTR NormPointr
JMP NormPointr
JMP NEAR PTR FarProc
JMP FAR PTR FarProc
JMP FarProc
JMP NEAR PTR NearProc
JMP FAR PTR NearProc
JMP NearProc
; CALL NEAR PTR AbsPointer ; error !
CALL FAR PTR AbsPointer
; CALL AbsPointer ; error !
; CALL NEAR PTR NormPointr ; error !
CALL FAR PTR NormPointr
CALL NormPointr
CALL NEAR PTR FarProc
CALL FAR PTR FarProc
CALL FarProc
CALL NEAR PTR NearProc
CALL FAR PTR NearProc
CALL NearProc
CODE ENDS
STACK SEGMENT STACK
DB 100h DUP (?)
STACK ENDS
END MAIN
17. Other bugs in the inline assembler (ASM directive)
Several instructions that have a format with an immediate operand
support senseless or incorrect ranges for the immediate value. The
IN, OUT, and INT instructions will accept values between -128 and
255. Since negative values make no sense here, the possible range
should be restricted to 0 to 255. The ENTER instruction will also
take negative arguments. Again, this is not a very sensible choice.
How are -5 bytes reserved for local variables? The allowed range
for the arguments should be 0 to 255 and 0 to 65535, respectively.
Although it is not officially documented by Intel, the AAM and AAD
instructions may take additional arguments that indicate the base
on which to perform the conversion. This is supported by the inline
assembler. However, it accepts arguments between -128 and 127 while
it should accept bases between 0 and 255, since the bases available
with AAM and AAD must be positive.
The inline assembler performs no check on the index in
the stack top relative addressing mode of the coprocessor.
Very large or even negative values are allowed. For example,
FADD ST, ST(123456) will be accepted as perfectly legal.
This must be fixed to make sure the index is between zero and
seven.
There is no way to code an absolute far jump such as JMP F000:FFF0
(to perform a warm start). The same restriction applies to far
calls. In a conventional assembler such a jump could be coded as
follows:
BIOS SEGMENT AT 0F000h
ORG 0FFF0h
Restart LABEL FAR
BIOS ENDS
CODE SEGMENT BYTE PUBLIC 'CODE'
JMP Restart
CODE ENDS
There are no segment declarations available with the inline
assembler, so the jump has to be either hand coded with DBs or
changed to a memory indirect far jump using an appropriately
initialized pointer. This problem should be documented in
the Turbo-Pascal manuals.
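The two workarounds just mentioned might look roughly as follows in
Turbo Pascal 6.0. This is only a sketch under assumptions: the names
WarmStartDB, WarmStartPtr and RestartVec are made up for illustration,
and storing 1234h at 0040:0072 is the usual PC BIOS convention for
requesting a warm start, not something taken from the text above.
Neither routine returns, and running the program restarts the machine.

```pascal
PROGRAM FarJump;
{ Sketch only: two ways to reach F000:FFF0 from the inline assembler. }
{ All identifiers are hypothetical. Running this restarts the machine. }

VAR RestartVec: POINTER;              { holds F000:FFF0 for the indirect jump }

PROCEDURE WarmStartDB; ASSEMBLER;
ASM
        DB    0EAh                    { opcode of a direct far JMP }
        DW    0FFF0h                  { offset part of the target  }
        DW    0F000h                  { segment part of the target }
END;

PROCEDURE WarmStartPtr; ASSEMBLER;
ASM
        JMP   DWORD PTR RestartVec    { memory indirect far jump }
END;

BEGIN
  MemW [$0040:$0072] := $1234;        { assumed BIOS flag: warm start }
  RestartVec := Ptr ($F000, $FFF0);
  WarmStartPtr;                       { or: WarmStartDB }
END.
```

The DB version needs no data segment access, which may matter if DS
is not guaranteed to be valid at the point of the jump.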
One of the standard syntaxes available with the IMUL instruction is
not accepted by the inline assembler. IMUL reg16, immed8 is not
allowed; instead the inline assembler expects this to be coded as
IMUL reg16, reg16, immed8, where the two registers are identical.
The form IMUL reg16, immed8 is commonly accepted by assemblers and is
also listed in Intel's documentation. Therefore, this syntax should
be supported by the inline assembler.
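A minimal sketch of the form the inline assembler does accept is given
here. TimesTen is a hypothetical name chosen for illustration, and the
$G+ directive is needed because IMUL with an immediate operand is a
186/286 instruction.

```pascal
PROGRAM ImulDemo;
{$G+}  { IMUL with an immediate operand requires 286 code generation }

{ Hypothetical helper: multiplies its argument by ten }
FUNCTION TimesTen (X: INTEGER): INTEGER; ASSEMBLER;
ASM
        MOV   AX, X
        IMUL  AX, AX, 10    { three operand form, both registers identical }
      { IMUL  AX, 10          two operand form, rejected by TP 6.0 }
END;

BEGIN
  WriteLn (TimesTen (7));   { prints 70 }
END.
```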
Some 286 protected mode instructions (LLDT, LMSW, LTR, SMSW, VERR,
VERW) when used with memory operands require the use of the PTR
directive to establish operand size with an untyped operand (e.g.,
[BX+SI]). Two other instructions, SGDT and SIDT, do not require the
use of PTR in these cases. This usage is inconsistent. Since the
operand size can be deduced from the instruction itself (just as
can be done in the case of a MOV AX, [BX]) no PTR directive at all
should be required. Likewise, the POP mem16 instruction should not
require a WORD PTR directive with an untyped memory operand, since
memory operand size is obvious from the instruction.
18. Errors / problems / documentation deficiencies using coprocessor
instructions with the new ASM directive of TP 6.0
In the $G+,N+ compiler mode the inline assembler does not
assemble coprocessor instructions into emulator interrupts
regardless of the $E switch setting. Instead it always generates
optimized coprocessor instructions (without inserted WAITs).
This causes programs compiled with $G+,N+,E+ to fail if no
coprocessor is present. The assembler must ensure that the
$E switch is off before performing this optimization.
Coprocessor instructions in the no-wait form (e.g. FNINIT,
FNSTSW, FNSTCW, FNSTENV, FNCLEX) are not encoded into emulator
interrupts, since it makes no sense to use them with the
emulator which cannot work in parallel with the CPU. This
may lead to problems if programmers are not aware of the
fact that these instructions will have absolutely no effect
in an emulator environment. Since it is desirable to have the
no-wait instructions available, programmers should be warned
by the documentation not to use them in programs or routines
that may be executed by the emulator or to explicitly code
around this problem by using the system variable Test8087.
An example of a workaround follows.
ASM
. { some other code }
.
CMP Test8087, 0 { coprocessor present ? }
JNE @Emulate { no, do specific code for emulator}
FNINIT { can be safely used with 8087 }
JMP @Continue { skip emulator code }
@Emulate:
FINIT { this can be emulated }
@Continue: { continue with more code }
.
.
END;
19. Inconsistent error messages emitted by inline assembler
When constants that are out of range are supplied to assembler
instructions that take some kind of immediate operand, two different
error messages are emitted depending on the type of the destination
operand. If the destination operand is a byte operand, as in
ADD AL, 256 the compilation will result in error 155, 'Invalid
combination of opcode and operands'. However, if the destination
is a word operand as in ADD AX, 65536 the resulting error will be
#76, 'Constant out of range'. This discrepancy should be resolved
by always emitting the 'Constant out of range' error when an
immediate value is not within the specified limits called for by the
destination operand.
Instructions that require one of their operands to be a memory
reference (BOUND, LDS, LES, LEA, SGDT, SIDT, LGDT, LIDT) should
cause compile error 156 (memory reference expected) to be emitted
when a register is supplied instead of a memory reference. This
will give a more detailed description of the error than the
currently used error 155 (invalid combination of opcode and
operand).
There are space saving sign extending encodings available for OR,
AND, and XOR instructions that the inline assembler fails to use.
These encodings are the equivalents of the sign extending encodings
used with the ADD, ADC, SUB, SBB, and CMP instructions. A list of
the additional instructions follows:
Instruction | Encoding
---------------------+-------------------------------------------
OR reg16, const8 | 83 mod 001 r/m data8
OR mem16, const8 | 83 mod 001 r/m (disp) (disp) data8
AND reg16, const8 | 83 mod 100 r/m data8
AND mem16, const8 | 83 mod 100 r/m (disp) (disp) data8
XOR reg16, const8 | 83 mod 110 r/m data8
XOR mem16, const8 | 83 mod 110 r/m (disp) (disp) data8
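To make the table concrete, the following sketch shows when a 16-bit
immediate qualifies for the short form and how the register variant
is encoded. It is plain Pascal, not assembler source, and the names
FitsImm8, EmitShort and HexByte are hypothetical. Only the register
case (mod = 11) is covered.

```pascal
PROGRAM SignExt;
{ Sketch: decide whether a 16-bit immediate fits the sign-extended }
{ 83h form and build the register encoding from the table above.  }
{ FitsImm8, EmitShort and HexByte are hypothetical names.          }

CONST OpOR = 1; OpAND = 4; OpXOR = 6;   { opcode extension: 001, 100, 110 }
      RegAX = 0; RegCX = 1; RegDX = 2; RegBX = 3;

{ TRUE if Value equals its own low byte sign-extended to 16 bits }
FUNCTION FitsImm8 (Value: WORD): BOOLEAN;
BEGIN
  FitsImm8 := (Value <= $007F) OR (Value >= $FF80);
END;

FUNCTION HexByte (B: BYTE): STRING;
CONST Digits: ARRAY [0..15] OF CHAR = '0123456789ABCDEF';
BEGIN
  HexByte := Digits [B SHR 4] + Digits [B AND 15];
END;

{ Returns the three encoded bytes "83 modrm data8" as text }
FUNCTION EmitShort (Op, Reg: BYTE; Value: WORD): STRING;
VAR ModRM: BYTE;
BEGIN
  ModRM := $C0 OR (Op SHL 3) OR Reg;    { mod = 11: register operand }
  EmitShort := HexByte ($83) + ' ' + HexByte (ModRM) + ' ' +
               HexByte (Lo (Value));
END;

BEGIN
  IF FitsImm8 ($FFF0) THEN
    WriteLn ('AND BX, 0FFF0h -> ', EmitShort (OpAND, RegBX, $FFF0));
  IF FitsImm8 (1) THEN
    WriteLn ('OR  AX, 1      -> ', EmitShort (OpOR, RegAX, 1));
  IF NOT FitsImm8 ($0080) THEN
    WriteLn ('XOR CX, 0080h needs the long form');
END.
```

The first two lines should show the byte sequences 83 E3 F0 and
83 C8 01, each one byte shorter than the form with a 16-bit
immediate.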
20. Compiler Switch /V doesn't export names of SYSTEM routines
When using the /V switch of TPC or choosing standalone debugging within
the Turbo Pascal IDE, all public identifiers are supposed to be
included into the EXE file for debugging purposes. However, the
names of just about every routine from the SYSTEM unit are not
included, although the variables (such as HeapOrg) are included.
Among the few exceptions are the MemAvail and MaxAvail routines,
which are sometimes included into the debug information. This bug
is very annoying when programs are profiled with the Turbo Profiler
and one wants to know how much time the program spends in certain
SYSTEM routines. Also, when debugging programs with Turbo Debugger
one would rather like the disassembly to display a call as e.g.
CALL SYSTEM.LONGMUL instead of a cryptic CALL 152E:05B8. This makes
the disassembled code hard to follow. I therefore urge Borland to
assure correct inclusion of *all* public symbols in the debug
information generated by the /V switch.
21. Problem with AAM xx and AAD xx instructions when stepping/tracing
through inline ASM code with Turbo-Pascal's built-in debugger
The inline assembler correctly allows parameters with the AAM and
AAD instructions. Although this feature is not officially documented
by Intel, it works on all 80x86 processors and compatibles, such as
NEC's V30. The inline assembler will correctly assemble an instruction
like AAM 16, which is quite useful when one wants to print a number
in hexadecimal format. Turbo-Pascal's internal debugger does not
recognize AAM opcodes other than the plain AAM opcode. When stepping/
tracing through inline assembly code, it seems to skip the instruction,
causing the program to behave differently than in an ordinary run.
For example, the following instruction sequence will give 0505 in AX
when run on any 80x86, but will give 0055 in AX when stepped through
with Turbo-Pascal's internal debugger.
.
.
MOV AX, 0055h
AAM 16
. { AX should contain 0505h now }
Another example involving the AAD instruction will give 0066h in AX
when simply run, but AX will contain 00CEh when the code is stepped.
.
.
MOV AX, 0606h
AAD 16
. { AX should contain 0066h now }
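The expected register contents can be checked against a small
reference model in plain Pascal. This is a sketch only: EmulAAM and
EmulAAD are hypothetical names, the flags are not modelled, and a
base of zero (a divide error on the CPU) is not handled.

```pascal
PROGRAM AamAad;
{ Reference model of AAM n and AAD n in plain Pascal. A sketch only: }
{ EmulAAM and EmulAAD are hypothetical names, flags are not modelled, }
{ and a base of 0 (a divide error on the CPU) is not handled.         }

{ AAM n: AH = AL DIV n, AL = AL MOD n }
FUNCTION EmulAAM (AX: WORD; Base: BYTE): WORD;
VAR Quot, Rest: WORD;
BEGIN
  Quot := Lo (AX) DIV Base;            { goes to AH }
  Rest := Lo (AX) MOD Base;            { goes to AL }
  EmulAAM := Quot * 256 + Rest;
END;

{ AAD n: AL = low byte of (AH * n + AL), AH = 0 }
FUNCTION EmulAAD (AX: WORD; Base: BYTE): WORD;
VAR Sum: WORD;
BEGIN
  Sum := WORD (Hi (AX)) * Base + Lo (AX);
  EmulAAD := Sum AND $00FF;
END;

BEGIN
  IF EmulAAM ($0055, 16) = $0505 THEN WriteLn ('AAM 16 matches');
  IF EmulAAD ($0606, 16) = $0066 THEN WriteLn ('AAD 16 matches');
END.
```

Both messages are printed, which agrees with the values given above
for an ordinary run.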
22. Error in heap manager (GetMem, New)
Turbo-Pascal 6.0 allows memory allocation functions to allocate
data structures of more than 65528 bytes on the heap. Data
structures on the heap of size greater than 65528 bytes may
cause segment wrap-around, thereby destroying other data on the
heap or causing a general protection exception on processors
from the 80286 on upwards. This general protection exception
#GP(0) is triggered when a word is accessed at offset FFFFh in
a segment, even when the processor is in real mode. With no valid
#GP(0) handler present, the system will crash upon returning
from the INT 0Dh service routine since the exception has pushed
an error code *after* pushing the return address, which will not
be removed from the stack without a valid #GP(0) handler present
when the INT 0Dh handler executes its IRET. 386 memory managers like
QEMM or the DOS box of Windows in 386-Enhanced mode catch a #GP(0)
exception, but plain DOS, even MS-DOS 5.0, crashes. The
following program illustrates the problems:
PROGRAM HeapBug;
TYPE SpcRecord = RECORD
W1: WORD;
W2: WORD;
B1: BYTE;
END;
SmallArray = ARRAY [1..8] OF CHAR;
BigArray = ARRAY [1..65535] OF CHAR;
SpcArray = ARRAY [1..13107] OF SpcRecord;
VAR P1 : ^SmallArray;
P2 : ^BigArray;
P3 : ^SpcArray;
Hptr: POINTER;
BEGIN
HPtr := HeapPtr; { save initial value of heap pointer }
WHILE HeapPtr = HPtr DO BEGIN
New (P1); { use up blocks in freelist }
END;
IF Ofs (HeapPtr^) <> 8 THEN
New (P1); { make sure large array will have ofs of 8 }
New (P2);
FillChar (P1^, 8, 'A'); { initialize 1st array }
FillChar (P2^, 65534, 'B');{ initialize 2nd array -> trashes 1st array }
IF P1^[6] <> 'A' THEN { chk if 1st array's integrity was violated }
WriteLn ('First array trashed!');
P3 := Pointer (P2);
P3^[13106].W2 := $55AA; { access at ofs FFFF causes #GP(0) on 80286 }
END.
The problem here is that 80x86 segments start at 16-byte boundaries
(paragraph boundaries), while allocation of data structures on the
heap is aligned at 8-byte boundaries. If a data structure in the
heap has a start address with an offset of 8 and is greater than
65528 bytes, accessing the very last bytes of that data structure
will cause undesired segment wrap around. Therefore, maximum allowed
allocation for data structures on the heap should be 65528 bytes.
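The offset arithmetic behind the wrap-around can be sketched in a few
lines of plain Pascal. The names LastOffset, Wraps and MaxSafeSize are
hypothetical, and nothing here touches the real heap manager; it only
recomputes the offsets involved in the HeapBug program.

```pascal
PROGRAM WrapCheck;
{ Sketch of the offset arithmetic behind the wrap-around; the names }
{ are hypothetical and nothing here touches the real heap manager.  }

CONST MaxSafeSize = 65528;             { limit proposed in the text }

{ Offset of the last byte of a block, before 16-bit truncation }
FUNCTION LastOffset (StartOfs: WORD; Size: LONGINT): LONGINT;
BEGIN
  LastOffset := StartOfs + Size - 1;
END;

{ TRUE if the block runs past offset FFFFh of its segment }
FUNCTION Wraps (StartOfs: WORD; Size: LONGINT): BOOLEAN;
BEGIN
  Wraps := LastOffset (StartOfs, Size) > 65535;
END;

BEGIN
  WriteLn (Wraps (8, 65535));                  { BigArray at offset 8: TRUE }
  WriteLn (LastOffset (8, 65534) AND 65535);   { last byte FillChar hits: 5 }
  WriteLn (Wraps (8, MaxSafeSize));            { proposed limit: FALSE }
END.
```

Offset 5 is the sixth byte of the segment, which is why the HeapBug
program finds P1^[6] overwritten when the small array sits at
offset 0.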
23. Logical error in GRAPH.TextWidth function
The TextWidth function delivers incorrect results when fonts
are scaled with the SetUserCharSize procedure. To compute the
width of the string passed to it, the TextWidth function adds
the width of all characters in the string. Depending on the
current setting of the Direction parameter within GRAPH the
resulting value is then multiplied and divided by either the
MultX and DivX or the MultY and DivY scaling factors. If these
scaling factors are not unity, this method will compute the
wrong text width. Since text justification using the OutTextXY
and SetTextJustify procedures relies on the TextWidth function
for computing the starting position for string output, this
output is not correctly justified. The TextWidth function, when
used with user supplied font scaling factors, usually returns
a width that is bigger than the actual width of the string. The
correct way to compute text width is to compute the actual size
of every character in the string using the scale factors supplied
by the user and add these values up. An example:
Suppose we want to compute the width of the string 'World'.
Assume that the unscaled width of the characters as taken from
the font information is 10, 7, 7, 5, 7, the output direction is
horizontal and that the scale factors are MultX = 5 and DivX = 8.
The current implementation of TextWidth would compute the width
as ((10+7+7+5+7) * 5) DIV 8 = (36 * 5) DIV 8 = 180 DIV 8 = 22.
A correct implementation, however, would calculate the width as
follows: (10*5) DIV 8 + 3 * ((7*5) DIV 8) + (5*5) DIV 8 = 6+12+3
= 21. This version is correct since it uses character sizes as
used in the OutText and OutTextXY procedures.
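The two ways of computing the width can be compared with a short
program. It is a sketch that only reproduces the arithmetic of the
example; SumThenScale and ScaleThenSum are hypothetical names and the
GRAPH unit is not involved.

```pascal
PROGRAM WidthDemo;
{ Recomputes the 'World' example both ways. Sketch only: the function }
{ names are hypothetical and the GRAPH unit is not used.              }

CONST NumChars = 5;
      CharWidth: ARRAY [1..NumChars] OF INTEGER = (10, 7, 7, 5, 7);
      MultX = 5;
      DivX  = 8;

{ Method described for the current TextWidth: add, then scale once }
FUNCTION SumThenScale: INTEGER;
VAR I, Sum: INTEGER;
BEGIN
  Sum := 0;
  FOR I := 1 TO NumChars DO
    Sum := Sum + CharWidth [I];
  SumThenScale := (Sum * MultX) DIV DivX;
END;

{ Proposed method: scale every character, then add }
FUNCTION ScaleThenSum: INTEGER;
VAR I, Sum: INTEGER;
BEGIN
  Sum := 0;
  FOR I := 1 TO NumChars DO
    Sum := Sum + (CharWidth [I] * MultX) DIV DivX;
  ScaleThenSum := Sum;
END;

BEGIN
  WriteLn ('sum then scale: ', SumThenScale);   { 22 }
  WriteLn ('scale then sum: ', ScaleThenSum);   { 21 }
END.
```

The difference comes from truncation: each DIV discards a fraction,
so dividing once over the total keeps more than dividing per
character.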
24. Length of descender not taken into account by SetTextJustify
If text is to be written at the very bottom of the current
graphics window, one uses SetTextJustify (AnyMode, BottomText)
and OutTextXY (AnyX, ViewMaxY, AnyText) to accomplish that.
However, if the text contains letters that descend below the
base line for letters, descenders are outside the window and
clipped off. If one wants to output text in the manner described,
this is very annoying, since the programmer has to adjust the
Y-coordinate himself according to the font size in effect. The
same problem occurs if text is to be written horizontally at
the very right of the graphics window. Obviously, the TextHeight
function used by the SetTextJustify procedure does not account
for descender length. To fix the problem described, justification
should be changed to account for the overall height of characters
including descenders.
25. "Snow" prevention fails on CGA due to unsafe algorithm
The internal DirectWrite routine of module CRT is designed to prevent
"snow" when writing directly to the CGA screen. However, a logical
error keeps this snow checking from being 100% safe. The same
criticism applies to the WriteView method in module VIEWS of Turbo
Vision. The following is an excerpt from CRT.DirectWrite:
.
.
@@2: LODSB ; 1; get char
MOV BL,AL ; 2;
@@3: IN AL,DX ; 3; wait until out of current horiz. sync,if in
TEST AL,1 ; 4;
JNE @@3 ; 5;
CLI ; 6;
@@4: IN AL,DX ; 7; wait until next horiz. sync starts
TEST AL,1 ; 8;
JE @@4 ; 9;
MOV AX,BX ;10;
STOSW ;11; write to screen
STI ;12;
LOOP @@2 ;13;
.
.
If an interrupt occurs after line 3 and before line 6 in the
above code fragment, the program will *not* wait for the *start* of
the horizontal sync but only test if the CGA is *in* a horizontal
sync upon returning from the interrupt service routine. Since
horizontal sync allows only for the output of exactly one character
if output starts at the very beginning of horizontal sync, there
is a good chance that the above program writes to the screen after
the horizontal sync has been completed, thereby causing the CGA to
"snow". Of course, failure of the above code to prevent "snow" is
only noticeable in a system with very high interrupt rates e.g.
running serial communication as a background TSR. One additional
disadvantage of the above code is that it makes only use of the
horizontal sync period, although this is much shorter than the
vertical retrace period.
The following enhanced code is 100% safe in preventing snow and uses
both the vertical and horizontal retrace periods. It has been tested on
an original IBM CGA. Interrupt latency is only marginally higher than
with the original code and still allows running interrupt driven
serial communication at the highest possible rate of 115200 baud.
DirectWrite:
CMP SI, DI ; start address = end address ?
JE EmptyStr ; yes, nothing to write
PUSH CX ; save
PUSH DX ; registers
PUSH DI ; that
PUSH DS ; must be
PUSH ES ; preserved
MOV CX, DI ; string end address
SUB CX, SI ; number of characters to write
MOV DL, CheckSnow ; get flag for snow check
MOV DH, TextAttr ; get current attribute
XOR AX, AX ; address BIOS data area
MOV DS, AX ; via segment 0
MOV AL, DS:CrtWidth+400h; width of scan line in current mode
MUL BH ; multiply by cursor y-position
XOR BH, BH ; clear hi-byte to prepare for addition
ADD AX, BX ; add cursor x-position
ADD AX, AX ; two screen bytes for every character
XCHG AX, DI ; offset into screen memory to DI
MOV AX, DS:Addr6845+400h; get 6845 base address
ADD AX, 6 ; 6845 status port
XCHG AX, DX ; AX = CheckSnow/TextAttr, DX = port
MOV BX, 0B800H ; screen segment for color modes
CMP DS:CrtMode+400h, 7 ; monochrome mode ?
JNE ColorMode ; no, one of the color modes
MOV BH, 0B0H ; screen at segment B000h if mono
ColorMode:PUSH ES ; address character string
POP DS ; via DS
MOV ES, BX ; extra segment addresses screen seg
CLD ; autoincrement for string instruct.
OR AL, AL ; CheckSnow = TRUE ? (AH=attribute)
JE OutLoop ; no, don't check for snow
WriteChr: LODSB ; get character to write, AH = attrib
XCHG AX, BX ; save character/attribute to write
WaitHor: CLI ; interrupts disturb critical timing
IN AL, DX ; read 6845 status
TEST AL, 8 ; in vertical retrace ?
JNZ WriteScr ; yes, it is safe to write to screen
TEST AL, 1 ; in horizontal retrace ?
JNZ WaitHor ; yes, wait until out of hor. retrace
WaitHor2: IN AL, DX ; read 6845 status
TEST AL, 1 ; horizontal or vertical retrace ?
JZ WaitHor2 ; no, wait until either kind of retr.
WriteScr: XCHG AX, BX ; in horiz. or vert. retrace: get ch
STOSW ; write character and attribute
STI ; interrupts ok now
LOOP WriteChr ; write next character until all thru
JMPS WriteDone ; screen write done
OutLoop: LODSB ; get character to write
STOSW ; write character and attribute
LOOP OutLoop ; until all characters printed
WriteDone:POP ES ; restore
POP DS ; destroyed
POP DI ; registers
POP DX
POP CX
EmptyStr: RET
26. GetDir doesn't report use of invalid drive number
The GetDir procedure should emit run time error 15 "Invalid
drive number" when passed an invalid drive number. However, the
procedure does not do the required check on the DOS return code
and therefore never raises run time error 15. Instead, it always
returns the String "X:\", where the X stands for any character
in the IBM character set. The bug can easily be fixed by adding
a few lines of code to the source module DIRH.ASM. The following
program will demonstrate the bug:
PROGRAM GetDirBug;
VAR DriveNr: INTEGER;
PathName: STRING;
BEGIN
REPEAT
Write ('Enter Drivenumber (try also numbers > 100, 99 exits): ');
ReadLn (DriveNr);
GetDir (DriveNr, PathName);
WriteLn('The path on drive ', DriveNr, ' is ', PathName);
UNTIL DriveNr = 99;
END. {GetDirBug}
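Until GetDir itself is fixed, a caller can test the drive number
first. The sketch below asks DOS directly through function 47h (get
current directory), which sets the carry flag for an invalid drive.
DriveValid is a hypothetical name, and the approach is an assumption
about a reasonable workaround, not the fix to DIRH.ASM referred to
above.

```pascal
PROGRAM DriveCheck;
{ Sketch of a caller-side check before GetDir. DriveValid is a       }
{ hypothetical helper; it relies on DOS function 47h returning with  }
{ the carry flag set when the drive number is invalid.               }
USES Dos;

FUNCTION DriveValid (Drive: BYTE): BOOLEAN;
VAR Regs: Registers;
    Buf : ARRAY [0..63] OF CHAR;     { function 47h needs a 64 byte buffer }
BEGIN
  Regs.AH := $47;
  Regs.DL := Drive;                  { 0 = default, 1 = A:, 2 = B:, ... }
  Regs.DS := Seg (Buf);
  Regs.SI := Ofs (Buf);
  MsDos (Regs);
  DriveValid := (Regs.Flags AND FCarry) = 0;
END;

VAR PathName: STRING;

BEGIN
  IF DriveValid (26) THEN BEGIN
    GetDir (26, PathName);
    WriteLn ('The path on drive 26 is ', PathName);
  END
  ELSE
    WriteLn ('Invalid drive number');
END.
```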
27. Help bug
Context sensitive help (Ctrl-F1) for the predefined arrays Port
and PortW is missing. There was no help for these arrays in TP 5.5
either.
28. Problems with the file selector box in IDE
The history list of a file selector box contains only those
files that were selected by entering the file name in the input
box, not those selected by double clicking the name in the
file list, which is the standard way to select a file if the
mouse is heavily used. Even when working mainly with the mouse
a history list is still useful, since the desired files may
be at the end of a file list 100 files long and one has to
get to the right part of the file list before being able to
double click the file name. By the way, this is also a problem
on the Apple Macintosh, since its file select boxes do not
have a history list feature at all. This can really be a pain
in the neck. It is therefore strongly recommended that all
files that have been selected with either method (that is, by
entering the name in the input box or by double clicking the
name in the file list) be put in the history list.
29. Possible problems in unit APP.PAS
APP.PAS contains an assembler function ISqr that computes the
integral part of the square root of its integer argument. This
function has several shortcomings. First of all, it should more
appropriately be named ISqrt. Second, for all arguments > 32760, it
will enter an endless loop. Finally, it is not very fast, since
it makes use of the IMUL instruction. Unfortunately, it is not
clear to me whether the shortcomings pointed out pose any threat
to program integrity. If it is desirable to fix the function,
the following substitute could be used. It uses a more elegant
and faster algorithm and returns the correct result for all
non-negative INTEGERs. The code length is identical to the original
routine ISqr.
{ ISqrt (I) computes INT (SQRT (I)), that is, the integral part of the }
{ square root of integer I. It does not check for negative arguments. }
{ For all arguments 0..MaxInt the correct result is returned. The }
{ algorithm exploits the following property: }
{ n }
{ n**2 = Sigma (2i-1) }
{ i=1 }
FUNCTION ISqrt (I: INTEGER): INTEGER; ASSEMBLER;
ASM
MOV CX, I { load argument }
MOV AX, -1 { init result }
CWD { init odd numbers to -1 }
XOR BX, BX { init perfect squares to 0 }
@loop:INC AX { increment result }
INC DX { compute }
INC DX { next odd number }
ADD BX, DX { next perfect square }
CMP BX, CX { perfect square > argument ? }
JBE @loop { until square greater than argument }
END;
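For readers who prefer to check the algorithm without an assembler,
the same sum of odd numbers can be written in plain Pascal. This is a
sketch with hypothetical names (ISqrtPas, NextOdd), not a replacement
for the routine above; the main program verifies the result for every
argument from 0 to MaxInt.

```pascal
PROGRAM ISqrtRef;
{ Plain Pascal model of the odd number summation used by ISqrt.   }
{ Sketch only; ISqrtPas and NextOdd are hypothetical names.       }

FUNCTION ISqrtPas (I: INTEGER): INTEGER;
VAR Root   : INTEGER;
    NextOdd: LONGINT;                  { 1, 3, 5, ... }
    Square : LONGINT;                  { 1, 4, 9, ... }
BEGIN
  Root := 0;
  NextOdd := 1;
  Square := 1;
  WHILE Square <= I DO BEGIN           { (Root+1)**2 still fits }
    Inc (Root);
    Inc (NextOdd, 2);
    Inc (Square, NextOdd);             { next perfect square }
  END;
  ISqrtPas := Root;
END;

VAR N, R: LONGINT;
    Ok  : BOOLEAN;

BEGIN
  Ok := TRUE;
  FOR N := 0 TO MaxInt DO BEGIN
    R := ISqrtPas (N);
    IF (R * R > N) OR ((R + 1) * (R + 1) <= N) THEN
      Ok := FALSE;                     { R is not the integral root }
  END;
  WriteLn ('all results correct: ', Ok);
END.
```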
30. Poor performance of REAL type arithmetic
Although this does not constitute a real bug, an analysis of the
poor performance of the REAL type arithmetic will be given. The
rationale here is that a 'TURBO product' should also deliver
turbo performance wherever it can be achieved. One obvious example
that there is ample room for speed improvements is the REAL-Sqrt
function. It will take more time to compute the square root to 12
decimal places than the coprocessor emulator needs to compute the
function result to 19 decimal places. I feel that such a performance
is unacceptable. Unfortunately, there were no improvements in TP6.0
over TP 5.5.
Improvements are also possible in the LONGINT arithmetic, especially
the division, which will enjoy accelerations of factor four to six
(depending on the CPU) when coded using the DIV instruction.
Performance can be enhanced by careful register scheduling within
all routines, thus avoiding unnecessary memory accesses. This
measure will also reduce the overall instruction count for a routine.
Wherever possible, time saving CPU instructions such as MUL or DIV
should be used. This will vastly improve performance especially on
the 286, 386, and 486 CPUs. Most important is the choice of the
appropriate algorithm for each function. Tests show that the REAL
division uses the slowest out of four possible algorithms. This
clearly indicates that not much time was invested in finding short
but fast algorithms. On the other hand, the square root routine
uses a basically fast algorithm (Newton's iteration), but
obliterates its advantages by poor implementation. The transcendental
functions are based on polynomial approximations. It seems that no
care was taken to find the shortest and most accurate polynomials
possible. The speed advantages possible by a careful recoding of
the complete REAL arithmetic range from a few percent for simple
functions like LONGINT to REAL conversion to up to a factor of 20
for the Sqrt function.
31. Inefficient string handling
The string handling operations Insert, Delete, and Pos have
always been implemented in a very simple but quite inefficient
manner in Turbo-Pascal. There were no improvements in Turbo-
Pascal 6.0. Since an acceleration of 300% - 400% can be
achieved, this is hard to accept.
*** Note: The above mentioned improvements have been realized in a
replacement for the original SYSTEM.TPU. The source has been
made available to BORLAND, but will not be given here. The
library replacement (not the source though) is available
as TPL60NEW.ZIP via anonymous FTP from garbo@uwasa.fi
++++++++++++++++++++ Suggestions for enhancements ++++++++++++++++++++++++++
1. Suggested improvements for coprocessor / emulator arithmetic
The routine that patches the emulator interrupts (INT 34h to INT 3Dh)
back to coprocessor instructions at runtime if a coprocessor is
present always inserts WAITs (9Bh) before the coprocessor instruction.
However, for all coprocessors except the 8087 these WAITs are
unnecessary, since the 287 and 387 synchronize with the CPU at
hardware level, using ports F0h thru FFh. These WAITs can therefore
be replaced by NOPs, resulting in somewhat faster code. Performance
improvements of up to 6% were observed with programs that make heavy
use of simple coprocessor instructions (linear equation solver) by
this simple change. A new routine, which inserts NOPs instead
of WAITs where appropriate, is presented here.
CODE SEGMENT BYTE PUBLIC 'CODE'
ASSUME CS:CODE
JMPS EQU <JMP SHORT>
;------------------------------------------------------------
; PATCH87 is the routine responsible for converting emulator
; interrupts back to coprocessor opcodes if a coprocessor is
; detected by the startup code.
;
; This routine is 1 byte shorter than the original one and has
; been enhanced to generate NOPs instead of WAITs before each
; coprocessor instruction when the coprocessor is a 287 or 387.
;
; INPUT: No input or output. The desired sideeffect is
; OUTPUT: patching the code at run-time.
;
; DESTROYS: -
;
; All rights reserved (c) 1988-1992 Norbert Juffa
;
; Borland is free to use this code if desired !
;-------------------------------------------------------------
JMPS EQU <JMP SHORT>
PATCH87 PROC FAR
PUSH BP ; save TURBO-Pascal framepointer
MOV BP, SP ; make new framepointer
PUSH AX ; save
PUSH SI ; destroyed
PUSH DS ; registers
TEST BYTE PTR [BP+7], 2; interrupts allowed before int ?
JZ $intdis ; no
STI ; yes, enable interrupts
$intdis:LDS SI, [BP+2] ; load return address
DEC SI ; point to int data
MOV AX, WORD PTR [SI] ; get interrupt number & data
DEC SI ; point to patch
SUB AL, 34h ; 34..3D --> 0..9
CMP AL, 9 ; interrupt valid (between 0..9) ?
JA $invald ; invalid interrupt
JE $fwait ; interrupt $3D --> FWAIT
CMP AL, 8 ; interrupt $3C ?
JE $spcial ; yes, handle segment overrides
ADD AL, 0D8h ; new opcode
$tst286:MOV AH, AL ; second byte of opcode
MOV AL, 90h ; first byte is a nop
PUSH SP ; test if
POP BP ; 286 or
CMP SP, BP ; higher
JE $patch ; 286
MOV AL, 9Bh ; convert nop to wait
$patch: MOV WORD PTR [SI], AX ; store new opcode
MOV BP, SP ; address stack via BP
MOV WORD PTR [BP+8],SI; set new return address
$endptc:POP DS ; restore
POP SI ; destroyed
POP AX ; registers
POP BP ; restore TURBO-Pascal frameptr
IRET ; done
$fwait: MOV AX, 9B90h ; store FWAIT
JMPS $patch ; patch it in
$spcial:TEST AH, 20h ; bit 5 set indicates spec. func.
JNZ $invald ; not supported, invalid
MOV AL, AH ; generate
AND AX, 07C0h ; segment
SHR AL, 1 ; override
SHR AL, 1 ; byte
SHR AL, 1 ; and
XOR AL, 18h ; coprocessor
ADD AX, 0D826h ; opcode
MOV BYTE PTR [SI+2],AH; set new opcode
JMPS $tst286 ; put in new opcode
$invald:JMPS $endptc ; no error handling, ignore
PATCH87 ENDP
CODE ENDS
END
Another optimization could be performed if a program is
compiled in the $N+,E- mode. Since no emulator is used
anyhow, the compiler could give up generating emulator
interrupts and generate real coprocessor instructions
instead. On CPUs > 286 neither NOPs nor WAITs would have to be
inserted before NDP instructions. This would save space
as well as time.
Those functions that use the Borland shortcut interrupt 3Eh
could test which NDP is present whenever this interrupt is
called. If Test8087 = 3, the enhanced instructions (e.g. FSIN,
FCOS) available on the 387/486/287XL could be executed. There
would be only minimum timing overhead, but vast performance
improvements on 386/486 machines. Since no elaborate argument
reduction schemes are necessary, the additional code would be
quite short.
The Borland shortcut interrupt provides some functions not
accessible from Turbo-Pascal 6.0. These functions are the tangent
Tan (subcode F0h), the dyadic logarithm Ld (subcode F6h), the
common logarithm Log (subcode F8h), power of two (subcode FCh),
and power of ten (subcode FEh). Tests show that these undocumented
functions are provided with a coprocessor as well as with the
emulator and are fully operational. These functions should be
made available to programmers through the SYSTEM unit and be
documented. Especially the Tan is quite useful since it only
takes 40% of the time of the equivalent construct Sin/Cos.
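For comparison, the "equivalent construct" that programmers currently
have to write is shown here as a sketch. TanViaSinCos is a
hypothetical name; it calls Sin and Cos separately and therefore
cannot profit from the single shortcut function described above.

```pascal
PROGRAM TanDemo;
{ Sketch of the Sin/Cos construct mentioned in the text. The name }
{ TanViaSinCos is hypothetical; Cos (X) = 0 is not checked.       }

FUNCTION TanViaSinCos (X: REAL): REAL;
BEGIN
  TanViaSinCos := Sin (X) / Cos (X);   { two function evaluations }
END;

BEGIN
  WriteLn (TanViaSinCos (Pi / 4) : 0 : 6);   { should be close to 1 }
END.
```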
2. Inclusion of LOADALL in the inline assembler's valid instructions
Since the undocumented AAM xx and AAD xx instructions are provided
by the inline assembler, the undocumented LOADALL instruction
(opcode 0F05h) could be provided as well when the compiler is in
$G+ mode. The Turbo-Debugger will correctly disassemble LOADALL.
3. Suggestions regarding 286 code generation feature ($G+)
Programs compiled with the $G+ switch will have reduced memory
requirements and will execute somewhat faster on a 286/386/486
CPU. Typically, memory and time savings will not execeed 2%.
Additionally, setting the $G switch on will allow the use of
real and protected mode 286 instructions. As explained in section
five of the README file, programs compiled with $G+ will not
check for the presence of an appropriate processor at runtime. It
is strongly recommended that this behavior be changed. At least
two cases are known (one involving Borland's biggest competitor)
where programs were shipped that had been compiled with a 286
switch setting. Customers using them on PC type machines were
puzzled when they discovered that programs crashed on their systems
although they had performed flawlessly on their office computer.
Finally someone found the bug by tracing the program with a debugger.
To avoid such unpleasant confusion, programs compiled with $G+
should execute a short routine at startup to determine whether a 286
or later processor is present. If this is not the case, it should
emit an error message and abort the program, just as programs
compiled with $N+,$E- abort if they fail to detect a coprocessor.
Since 286 real mode instructions can also be executed on NEC's
V20/V30 processors and on the 80186/188, it might be desirable
to have a 186 code generation feature. This would effectively
split the $G switch into two separate switches. No changes would
have to be made to the code generator, since it generates no 286
protected mode instructions. Thus, generated code would be the
same with either the 186 or the 286 switch on. However, the inline
assembler would only recognize protected mode instructions when
the 286 switch is on. This would allow maximum utilization of the
286 real mode instructions and a run time check for the CPU at the
same time. Below is some code that can be used to distinguish between
8086/8088, 80188/186/V20/V30, and 80286/386/486.
;--------------------------------------------------------------------
; CPU_Test distinguishes between three groups of CPUs commonly used
; in computers and returns an associated code for each.
;
; OUTPUT: AX = 0 Group #0 may execute 8086 code only (8086/8088)
; AX = 1 Group #1 may additionally execute 286 real mode
; instructions (V20/V30, 80186/80188)
; AX = 2 Group #2 may additionally execute 286 protected
; mode instructions
;--------------------------------------------------------------------
CPU_Test PROC FAR
PUSH SP ; test updating
POP AX ; of stackpointer
CMP AX, SP ; stackpointer updated before push ?
JE @Grp2 ; no, must be 286, 386 or 486
CLC ; make sure carry clear
PUSHA ; PUSHA executed on 88/86 as JMP $+2
STC ; carry set if V20/V30 or 186/188
@8086: JC @Grp1 ; yes, it's group #1
XOR AX, AX ; CPU is 8088/8086
RET ; done
@Grp1: POPA ; remove pushed bytes
MOV AX, 1 ; CPU is V20/V30 or 80186/80188
RET ; done
@Grp2: MOV AX, 2 ; CPU is 286/386/486
RET ; done
CPU_Test ENDP
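The startup check suggested above could look roughly like the
following sketch, which ports the CPU_Test listing to the inline
assembler. The names CpuGroup and CpuGuard and the message text are
made up for illustration. PUSHA and POPA are emitted as DB bytes on
the assumption that the inline assembler does not accept them in $G-
mode, and the check itself is compiled with $G- so that it contains
no 286 instructions.

```pascal
{$G-}  { the check itself must not contain 286 instructions }
program CpuGuard;

{ Hypothetical port of CPU_Test: returns 0, 1 or 2 as in the listing. }
function CpuGroup: Integer; assembler;
asm
        PUSH    SP
        POP     AX
        CMP     AX, SP          { equal only on 286 and later }
        JE      @Grp2
        CLC                     { make sure carry is clear }
        DB      60H             { PUSHA; acts as JMP $+2 on 8086/8088 }
        STC                     { skipped on 8086/8088 }
        JC      @Grp1
        XOR     AX, AX          { 8086/8088 }
        JMP     @Done
@Grp1:  DB      61H             { POPA; removes the pushed registers }
        MOV     AX, 1           { V20/V30 or 80186/80188 }
        JMP     @Done
@Grp2:  MOV     AX, 2           { 286/386/486 }
@Done:
end;

begin
  if CpuGroup < 2 then
  begin
    WriteLn('This program requires an 80286 or later processor.');
    Halt(1);
  end;
  { ... rest of the program ... }
end.
```

A check placed in the main block like this runs only after the
initialization code of any units used, so a real implementation would
have to live in the compiler's own startup code, as proposed.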
4. Suggestions for enhancements in the code generator
4.1 Enhancing procedure entry/exit code in $G+ mode (286 code generation)
When a procedure/function does not use local variables, the
standard exit code in $G- mode is:
POP BP
RET
This is replaced by the following code in $G+ mode:
LEAVE
RET
However, for procedures/functions that have no local variables,
it would be advantageous to always use the first sequence in
either mode, $G- and $G+. Although both sequences take the same
number of clock cycles on 286 and 386 processors, the first is
considerably faster on the 486. Since the code generator already
checks if no local variables are declared to generate optimized
entry code in $G+ mode, the optimized exit code could be
generated just as easily.
Although the use of the ENTER imm16, 0 instruction does produce
shorter code when a procedure/function has both parameters and
local variables, the equivalent but longer (by two or three bytes)
standard procedure entry code will execute faster than ENTER on
all Intel processors. Therefore, it should be considered whether
it is really desirable to use ENTER at all. Tests indicate that
many programs really do run slower on a 386DX machine if compiled
with $G+ instead of $G-.
processor | ENTER imm16, 0 | standard entry sequence
-----------+--------------------+------------------------
80286 | 11 clocks | 3 + 2 + 3 = 8 clocks
80386 | 10 clocks | 5 + 2 + 2 = 9 clocks
80486 | 14 clocks | 1 + 2 + 1 = 4 clocks
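For reference, the two entry sequences compared in the table can be
written out as in the sketch below. The procedure Fill and its 18
bytes of locals are made up for illustration, and the three
instructions of the standard sequence are the conventional ones, which
the text does not spell out. The frame size a given compiler version
actually allocates may differ.

```pascal
program EntryDemo;

procedure Fill(Value: Byte);
var
  Buf: array[0..15] of Byte;    { 16 bytes of local data }
  I: Integer;                   { 2 more bytes }
begin
  { Entry code for 18 bytes of locals, as discussed in the text:   }
  {                                                                }
  {   $G+ :  ENTER 18, 0        4 bytes                            }
  {                                                                }
  {   $G- :  PUSH  BP           1 byte                             }
  {          MOV   BP, SP       2 bytes                            }
  {          SUB   SP, 18       3 bytes (4 with a 16-bit constant) }
  {                                                                }
  { The three clock counts summed in the table's right-hand column }
  { correspond to these three instructions in this order.          }
  for I := 0 to 15 do
    Buf[I] := Value;
end;

begin
  Fill(0);
end.
```

The standard sequence is thus 6 or 7 bytes against 4 for ENTER, which
is the two or three extra bytes mentioned above.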
4.2 Optimizing entry code for non-nested procedures without parameters
and local variables
If a procedure/function takes neither any parameters nor declares
any local variables and is not statically nested within another
procedure/function, there is no need for any entry code. Turbo
Pascal performs this optimization only for assembler procedures,
but skips it for normal procedures, probably so that nested and
non-nested procedures can use the same branch of the code
generator. The code generator could be enhanced to generate
procedure entry code only for those procedures that are either
statically nested (and thus have a hidden parameter, namely the
framepointer of the preceding procedure in the static chain),
take parameters, or declare local variables.
5. Suggestions for IDE
The status line for the edit mode should be enhanced to include the
shortcuts F5 Zoom and F6 Next. These additional hints will exactly
fit into the remaining space. When the IDE is in the stepping/debugging
mode, shortcuts F4 Goto Cursor and Ctrl-F9 Run should be added to
the status line. This would accelerate debugging sessions, since
all program flow control could be exerted using simple mouse clicks
on the status line.
6. Suggestion regarding TURBO command-line options
There should be a help switch like /? or /Help on the Turbo-Pascal
Programmer's Platform command line that displays a help screen
which describes the other command-line switches that are available
and explains what they will do.