Norbert Juffa's Turbo Pascal 6.0 bug list 2 of 2


From: dmurdoch@watstat.waterloo.edu (Duncan Murdoch)
Newsgroups: comp.lang.pascal
Subject: Re: Norbert Juffa's bug list (part 2 of 2)
Organization: University of Waterloo
Message-ID: <1992Apr7.155831.23316@watdragon.waterloo.edu>
Date: Tue, 7 Apr 1992 15:58:31 GMT




16. Incorrect assembly of certain JMPs and CALLs by inline assembler

     The inline assembler accepts and assembles certain JMPs and
     CALLs that are invalid and are rejected by the MASM and TASM
     assemblers. It also incorrectly assembles JMPs and CALLs to
     destinations declared with the ABSOLUTE directive. The bugs can
     be demonstrated by the following program:

     PROGRAM Jmp_Call;


     VAR AbsPointer: POINTER ABSOLUTE $1234:$5678;
         NormPointr: POINTER;

     PROCEDURE FarProc; FAR; ASSEMBLER;
     ASM
     END;

     PROCEDURE NearProc; NEAR; ASSEMBLER;
     ASM
     END;

     BEGIN
     ASM
        JMP  NEAR PTR AbsPointer   { This is illegal in MASM / TASM }
        JMP  FAR PTR AbsPointer    { incorrectly assembled by inline assmbl.! }
        JMP  AbsPointer            { This is illegal in MASM / TASM }
        JMP  NEAR PTR NormPointr   { This is illegal in MASM / TASM }
        JMP  FAR PTR NormPointr
        JMP  NormPointr
        JMP  NEAR PTR FarProc
        JMP  FAR PTR FarProc
        JMP  FarProc
        JMP  NEAR PTR NearProc
        JMP  FAR PTR NearProc
        JMP  NearProc
        CALL NEAR PTR AbsPointer   { This is illegal in MASM / TASM }
        CALL FAR PTR AbsPointer    { incorrectly assembled by inline assmbl.! }
        CALL AbsPointer            { This is illegal in MASM / TASM }
        CALL NEAR PTR NormPointr   { This is illegal in MASM / TASM }
        CALL FAR PTR NormPointr
        CALL NormPointr
        CALL NEAR PTR FarProc
        CALL FAR PTR FarProc
        CALL FarProc
        CALL NEAR PTR NearProc
        CALL FAR PTR NearProc
        CALL NearProc
     END;
     END.


     The instructions marked as "illegal in MASM / TASM" should be
     flagged as errors by the inline assembler. "JMP NEAR PTR AbsPointer"
     and "JMP NEAR PTR NormPointr" are near jumps to a different CS,
     and in "JMP AbsPointer", AbsPointer can't be addressed with the
     currently assumed registers. The same remarks apply to the
     equivalent CALL statements in the source. Consequently, the
     inline assembler produces garbage for the illegal statements.
     "JMP FAR PTR AbsPointer" should assemble to "JMP 1234:5678", but
     the inline assembler produces something very different. Instead
     of the absolute segment 1234h it uses the value of CS and in
     addition mangles the offset value. The assembly language program
     below, which is equivalent to the above PASCAL program, shows
     that "JMP FAR PTR AbsPointer" and "CALL FAR PTR AbsPointer" can
     be assembled correctly by TASM / MASM, so the inline assembler
     should do this as well.


                DOSSEG

     AbsSeg     SEGMENT AT 1234h
                ORG     5678H
     AbsPointer DD      ?
     AbsSeg     ENDS


     DATA       SEGMENT WORD PUBLIC 'DATA'
                ASSUME  DS:DATA
     NormPointr DD      ?
     DATA       ENDS


     CODE       SEGMENT BYTE PUBLIC 'CODE'
                ASSUME  CS:CODE, DS:DATA

     FarProc    PROC    FAR
                RET
     FarProc    ENDP

     NearProc   PROC    NEAR
                RET
     NearProc   ENDP

     Main:      MOV     AX, SEG (NormPointr)
                MOV     DS, AX
     ;          JMP     NEAR PTR AbsPointer ; error !
                JMP     FAR PTR AbsPointer
     ;          JMP     AbsPointer          ; error !
     ;          JMP     NEAR PTR NormPointr ; error !
                JMP     FAR PTR NormPointr
                JMP     NormPointr
                JMP     NEAR PTR FarProc
                JMP     FAR PTR FarProc
                JMP     FarProc
                JMP     NEAR PTR NearProc
                JMP     FAR PTR NearProc
                JMP     NearProc
     ;          CALL    NEAR PTR AbsPointer ; error !
                CALL    FAR PTR AbsPointer
     ;          CALL    AbsPointer          ; error !
     ;          CALL    NEAR PTR NormPointr ; error !
                CALL    FAR PTR NormPointr
                CALL    NormPointr
                CALL    NEAR PTR FarProc
                CALL    FAR PTR FarProc
                CALL    FarProc
                CALL    NEAR PTR NearProc
                CALL    FAR PTR NearProc
                CALL    NearProc

     CODE       ENDS


     STACK      SEGMENT STACK
                DB      100h DUP (?)
     STACK      ENDS

                END     MAIN


17. Other bugs in the inline assembler (ASM directive)

     Several instructions that have a format with an immediate operand
     support senseless or incorrect ranges for the immediate value. The
     IN, OUT, and INT instructions will accept values between -128 and
     255. Since negative values make no sense here, the possible range
     should be restricted to 0 to 255. The ENTER instruction will also
     take negative arguments. Again, this is not a very sensible choice.
     How are -5 bytes reserved for local variables? The allowed range
     for the arguments should be 0 to 255 and 0 to 65535, respectively.
     Although it is not officially documented by Intel, the AAM and AAD
     instructions may take additional arguments that indicate the base
     on which to perform the conversion. This is supported by the inline
     assembler. However, it accepts arguments between -128 and 127 while
     it should accept bases between 0 and 255, since the bases available
     with AAM and AAD must be positive.

     The inline assembler performs no check on the index in
     the stack top relative addressing mode of the coprocessor.
     Very large or even negative values are allowed. For example,
     FADD ST, ST(123456) will be accepted as perfectly legal.
     This must be fixed to make sure the index is between zero and
     seven.
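
     The range checks proposed above can be summarized in a small
     table. The following Python sketch is only a model of the checks
     the inline assembler could perform, not its actual code. The names
     RANGES and operand_in_range are hypothetical, and the ENTER entry
     assumes that the 0 to 65535 limit refers to the frame size operand.

```python
# Hypothetical table of the immediate operand ranges proposed in the text.
# "ST" stands for the index i in the coprocessor operand ST(i).
RANGES = {
    "IN": (0, 255),        # port number
    "OUT": (0, 255),       # port number
    "INT": (0, 255),       # interrupt vector
    "ENTER": (0, 65535),   # frame size (assumed reading of the text)
    "AAM": (0, 255),       # conversion base
    "AAD": (0, 255),       # conversion base
    "ST": (0, 7),          # stack-top relative register index
}


def operand_in_range(mnemonic, value):
    """Return True if value lies inside the proposed range for mnemonic."""
    low, high = RANGES[mnemonic.upper()]
    return low <= value <= high


if __name__ == "__main__":
    for name, value in (("INT", 255), ("IN", -1), ("ENTER", -5), ("ST", 123456)):
        print(name, value, operand_in_range(name, value))
```

     With such a table, FADD ST, ST(123456) and ENTER with a frame size
     of -5 would both be rejected.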

     There is no way to code an absolute far jump such as JMP F000:FFF0
     (to perform a warm start). The same restriction applies to far
     calls. In a conventional assembler such a jump could be coded as
     follows:

     BIOS     SEGMENT AT 0F000h
     ORG      0FFF0h
     Restart  LABEL FAR
     BIOS     ENDS

     CODE     SEGMENT BYTE PUBLIC 'CODE'
     JMP      Restart
     CODE     ENDS

     There are no segment declarations available with the inline
     assembler, so the jump has to be either hand coded with DBs or
     changed to a memory indirect far jump using an appropriately
     initialized pointer. This limitation should be documented in
     the Turbo-Pascal manuals.
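
     For the hand-coded variant, the bytes to place in the DB statement
     follow from the direct far JMP (opcode EAh) and far CALL (opcode
     9Ah) encodings: opcode, offset low byte first, then segment low
     byte first. The Python sketch below merely computes those bytes.
     The helper names far_transfer_bytes and as_db_line are made up for
     this illustration.

```python
def far_transfer_bytes(segment, offset, call=False):
    """Encode a direct far JMP (0EAh) or far CALL (09Ah) to segment:offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    opcode = 0x9A if call else 0xEA
    # Operand layout is little-endian: offset word first, then segment word.
    return bytes([opcode,
                  offset & 0xFF, offset >> 8,
                  segment & 0xFF, segment >> 8])


def as_db_line(code):
    """Format the encoded bytes as a DB statement with hex constants."""
    return "DB " + ", ".join("0%02Xh" % b for b in code)


if __name__ == "__main__":
    print(as_db_line(far_transfer_bytes(0xF000, 0xFFF0)))
```

     For JMP F000:FFF0 this yields the five bytes EA F0 FF 00 F0.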

     One of the standard syntaxes available for the IMUL instruction
     is not accepted by the inline assembler. IMUL reg16, immed8 is
     not allowed; instead the inline assembler expects this to be coded
     as IMUL reg16, reg16, immed8, where the two registers are
     identical. The IMUL reg16, immed8 form is commonly accepted by
     assemblers and is also listed in Intel's documentation. Therefore,
     this syntax should be supported by the inline assembler.

     Some 286 protected mode instructions (LLDT, LMSW, LTR, SMSW, VERR,
     VERW) when used with memory operands require the use of the PTR
     directive to establish operand size with an untyped operand (e.g.,
     [BX+SI]). Two other instructions, SGDT and SIDT, do not require the
     use of PTR in these cases. This usage is inconsistent. Since the
     operand size can be deduced from the instruction itself (just as
     can be done in the case of a MOV AX, [BX]) no PTR directive at all
     should be required. Likewise, the POP mem16 instruction should not
     require a WORD PTR directive with an untyped memory operand, since
     memory operand size is obvious from the instruction.



18. Errors / problems / documentation deficiencies using coprocessor
     instructions with the new ASM directive of TP 6.0

     In the $G+,N+ compiler mode the inline assembler does not
     assemble coprocessor instructions into emulator interrupts
     regardless of the $E switch setting. Instead it always generates
     optimized coprocessor instructions (without inserted WAITs).
     This causes programs compiled with $G+,N+,E+ to fail if no
     coprocessor is present. The assembler must ensure that the
     $E switch is off before performing this optimization.

     Coprocessor instructions in the no-wait form (e.g. FNINIT,
     FNSTSW, FNSTCW, FNSTENV, FNCLEX) are not encoded into emulator
     interrupts, since it makes no sense to use them with the
     emulator which cannot work in parallel with the CPU. This
     may lead to problems if programmers are not aware of the
     fact that these instructions will have absolutely no effect
     in an emulator environment. Since it is desirable to have the
     no-wait instructions available, programmers should be warned
     by the documentation not to use them in programs or routines
     that may be executed by the emulator or to explicitly code
     around this problem by using the system variable Test8087.

     An example of a workaround follows.

     ASM
        .                         { some other code }
        .
        CMP   Test8087, 0         { coprocessor present ? }
        JE    @Emulate            { no, do specific code for emulator }
        FNINIT                    { can be safely used with 8087 }
        JMP   @Continue           { skip emulator code }

        @Emulate:
        FINIT                     { this can be emulated }

        @Continue:                { continue with more code }
        .
        .
     END;



19. Inconsistent error messages emitted by inline assembler

     When constants that are out of range are supplied to assembler
     instructions that take some kind of immediate operand, two different
     error messages are emitted depending on the type of the destination
     operand. If the destination operand is a byte operand, as in
     ADD AL, 256, the compilation will result in error 155, 'Invalid
     combination of opcode and operands'. However, if the destination
     is a word operand, as in ADD AX, 65536, the resulting error will be
     error 76, 'Constant out of range'. This discrepancy should be resolved
     by always emitting the 'Constant out of range' error when an
     immediate value is not within the specified limits called for by the
     destination operand.

     Instructions that require one of their operands to be a memory
     reference (BOUND, LDS, LES, LEA, SGDT, SIDT, LGDT, LIDT) should
     cause compile error 156 (memory reference expected) to be emitted
     when a register is supplied instead of a memory reference. This
     will give a more detailed description of the error than the
     currently used error 155 (invalid combination of opcode and
     operands).

     There are space saving sign extending encodings available for OR,
     AND, and XOR instructions that the inline assembler fails to use.
     These encodings are the equivalents of the sign extending encodings
     used with the ADD, ADC, SUB, SBB, and CMP instructions. A list of
     the additional instructions follows:

     Instruction          | Encoding
     ---------------------+-------------------------------------------
     OR  reg16, const8    | 83   mod 001 r/m    data8
     OR  mem16, const8    | 83   mod 001 r/m   (disp)   (disp)   data8
     AND reg16, const8    | 83   mod 100 r/m    data8
     AND mem16, const8    | 83   mod 100 r/m   (disp)   (disp)   data8
     XOR reg16, const8    | 83   mod 110 r/m    data8
     XOR mem16, const8    | 83   mod 110 r/m   (disp)   (disp)   data8


20. Compiler Switch /V doesn't export names of SYSTEM routines

     When using the /V switch of TPC or choosing standalone debugging
     within the Turbo Pascal IDE, all public identifiers are supposed
     to be included in the EXE file for debugging purposes. However,
     the names of just about every routine from the SYSTEM unit are not
     included, although the variables (such as HeapOrg) are included.
     Among the few exceptions are the MemAvail and MaxAvail routines,
     which are sometimes included in the debug information. This bug
     is very annoying when programs are profiled with the Turbo Profiler
     and one wants to know how much time the program spends in certain
     SYSTEM routines. Also, when debugging programs with Turbo Debugger
     one would like the disassembly to display a call as, e.g.,
     CALL SYSTEM.LONGMUL instead of a cryptic CALL 152E:05B8, which
     makes the disassembled code hard to follow. I therefore urge
     Borland to ensure correct inclusion of *all* public symbols in the
     debug information generated by the /V switch.



21. Problem with AAM xx and AAD xx instructions when stepping/tracing
    through inline ASM code with Turbo-Pascal's built-in debugger

     The inline assembler correctly allows parameters with the AAM and
     AAD instructions. Although this feature is not officially documented
     by Intel, it works on all 80x86 processors and compatibles, such as
     NEC's V30. The inline assembler will correctly assemble an instruction
     like AAM 16, which is quite useful when one wants to print a number
     in hexadecimal format. Turbo-Pascal's internal debugger does not
     recognize AAM opcodes other than the plain AAM opcode. When stepping/
     tracing through inline assembly code, it seems to skip the instruction,
     causing the program to behave differently than in an ordinary run.
     For example, the following instruction sequence will give 0505 in AX
     when run on any 80x86, but will give 0055 in AX when stepped through
     with Turbo-Pascal's internal debugger.
     .
     .
     MOV  AX, 0055h
     AAM  16
     .                   { AX should contain 0505h now }

     Another example involving the AAD instruction will give 0066h in AX
     when simply run, but AX will contain 00CEh when the code is stepped.
     .
     .
     MOV  AX, 0606h
     AAD  16
     .                   { AX should contain 0066h now }
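
     The expected register contents in both examples can be checked
     with a small model of the two instructions. The Python sketch
     below assumes the usual semantics (AAM: AH = AL div base,
     AL = AL mod base; AAD: AL = AL + AH * base, AH = 0) and models only
     the AX register, not the flags. The function names aam and aad are
     chosen for this illustration.

```python
def aam(ax, base=10):
    """Model AAM base: AH := AL div base, AL := AL mod base."""
    al = ax & 0xFF
    # A base of 0 raises ZeroDivisionError here; the CPU raises a
    # divide error in that case.
    return ((al // base) << 8) | (al % base)


def aad(ax, base=10):
    """Model AAD base: AL := (AL + AH * base) mod 256, AH := 0."""
    al = ax & 0xFF
    ah = (ax >> 8) & 0xFF
    return (al + ah * base) & 0xFF


if __name__ == "__main__":
    print("%04X" % aam(0x0055, 16))   # the AAM 16 example above
    print("%04X" % aad(0x0606, 16))   # the AAD 16 example above
```

     With base 16, AAM splits AL into its two hexadecimal digits, which
     is what makes it handy for hexadecimal output.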



22. Error in heap manager (GetMem, New)

     Turbo-Pascal 6.0 allows memory allocation functions to allocate
     data structures of more than 65528 bytes on the heap. Data
     structures on the heap of size greater than 65528 bytes may
     cause segment wrap-around, thereby destroying other data on the
     heap or causing a general protection exception on processors
     from the 80286 on upwards. This general protection exception
     #GP(0) is triggered when a word is accessed at offset FFFFh in
     a segment, even when the processor is in real mode. With no valid
     #GP(0) handler present, the system will crash upon returning
     from the INT 0Dh service routine, since the exception has pushed
     an error code *after* pushing the return address, and that error
     code is not removed from the stack when the INT 0Dh handler
     executes its IRET. 386 memory managers like QEMM, or the DOS box
     of Windows in 386 Enhanced mode, catch a #GP(0) exception, but
     plain DOS, even MS-DOS 5.0, crashes. The
     following program illustrates the problems:


     PROGRAM HeapBug;

     TYPE SpcRecord  = RECORD
                          W1: WORD;
                          W2: WORD;
                          B1: BYTE;
                       END;
          SmallArray = ARRAY [1..8] OF CHAR;
          BigArray   = ARRAY [1..65535] OF CHAR;
          SpcArray   = ARRAY [1..13107] OF SpcRecord;


     VAR P1  : ^SmallArray;
         P2  : ^BigArray;
         P3  : ^SpcArray;
         Hptr: POINTER;

     BEGIN
        HPtr := HeapPtr;           { save initial value of heap pointer }
        WHILE HeapPtr = HPtr DO BEGIN
           New (P1);               { use up blocks in freelist }
        END;
        IF Ofs (HeapPtr^) <> 8 THEN
           New (P1);               { make sure large array will have ofs of 8 }
        New (P2);
        FillChar (P1^, 8, 'A');    { initialize 1st array }
        FillChar (P2^, 65534, 'B');{ initialize 2nd array -> trashes 1st array }
        IF P1^[6] <> 'A' THEN      { chk if 1st array's integrity was violated }
           WriteLn ('First array trashed!');
        P3 := Pointer (P2);
        P3^[13106].W2 := $55AA;    { access at ofs FFFF causes #GP(0) on 80286 }
     END.

     The problem here is that 80x86 segments start at 16-byte boundaries
     (paragraph boundaries), while allocation of data structures on the
     heap is aligned at 8-byte boundaries. If a data structure in the
     heap has a start address with an offset of 8 and is greater than
     65528 bytes, accessing the very last bytes of that data structure
     will cause undesired segment wrap around. Therefore, maximum allowed
     allocation for data structures on the heap should be 65528 bytes.
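
     The arithmetic behind the 65528 byte limit can be checked with a
     few lines. The Python sketch below only models the offset
     calculation within one 64 KB segment. The helper names are
     hypothetical, and the block start offset of 8 is taken from the
     description above.

```python
SEGMENT_SIZE = 0x10000  # offsets 0000h..FFFFh are addressable in one segment


def last_offset(start_offset, size):
    """Offset of the last byte of a block of size bytes at start_offset."""
    return start_offset + size - 1


def wraps(start_offset, size):
    """True if the block runs past offset FFFFh and wraps around."""
    return last_offset(start_offset, size) >= SEGMENT_SIZE


def max_safe_size(start_offset):
    """Largest block size that still fits without wrap-around."""
    return SEGMENT_SIZE - start_offset


if __name__ == "__main__":
    print(wraps(8, 65535), max_safe_size(8))
    # Offset of P3^[13106].W2 in HeapBug: block start 8, 5-byte records,
    # W2 at byte 2 of the record.
    print(hex(8 + (13106 - 1) * 5 + 2))
```

     The last line shows that the word access in HeapBug starts at
     offset FFFFh, so its second byte lies beyond the end of the
     segment.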



23. Logical error in GRAPH.TextWidth function

     The TextWidth function delivers incorrect results when fonts
     are scaled with the SetUserCharSize procedure. To compute the
     width of the string passed to it, the TextWidth function adds
     the width of all characters in the string. Depending on the
     current setting of the Direction parameter within GRAPH the
     resulting value is then multiplied and divided by either the
     MultX and DivX or the MultY and DivY scaling factors. If these
     scaling factors are not unity, this method will compute the
     wrong text width. Since text justification using the OutTextXY
     and SetTextJustify procedures relies on the TextWidth function
     for computing the starting position for string output, this
     output is not correctly justified. The TextWidth function, when
     used with user supplied font scaling factors, usually returns
     a width that is bigger than the actual width of the string. The
     correct way to compute text width is to compute the actual size
     of every character in the string using the scale factors supplied
     by the user and add these values up. An example:

     Suppose we want to compute the width of the string 'World'.
     Assume that the unscaled width of the characters as taken from
     the font information is 10, 7, 7, 5, 7, the output direction is
     horizontal and that the scale factors are MultX = 5 and DivX = 8.
     The current implementation of TextWidth would compute the width
     as ((10+7+7+5+7) * 5) DIV 8 = (36 * 5) DIV 8 = 180 DIV 8 = 22.
     A correct implementation, however, would calculate the width as
     follows: (10*5) DIV 8 + 3 * ((7*5) DIV 8) + (5*5) DIV 8 = 6+12+3
     = 21. This version is correct since it uses character sizes as
     used in the OutText and OutTextXY procedures.
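
     The difference between the two methods comes from where the
     integer division truncates. The following Python sketch reproduces
     both calculations for the example. The function names are made up
     for this illustration, and Python's // operator stands in for
     Pascal's DIV, which is equivalent for the non-negative values used
     here.

```python
def width_sum_then_scale(widths, mult, div):
    """Current behaviour as described: add all widths, then scale once."""
    return (sum(widths) * mult) // div


def width_scale_then_sum(widths, mult, div):
    """Proposed behaviour: scale every character width, then add up."""
    return sum((w * mult) // div for w in widths)


if __name__ == "__main__":
    world = [10, 7, 7, 5, 7]          # unscaled widths of 'W','o','r','l','d'
    print(width_sum_then_scale(world, 5, 8))   # 22
    print(width_scale_then_sum(world, 5, 8))   # 21
```

     Scaling once after summing can never give a smaller result than
     scaling per character, which matches the observation that
     TextWidth usually returns too large a width.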



24. Length of descender not taken into account by SetTextJustify

     If text is to be written at the very bottom of the current
     graphics window, one uses SetTextJustify (AnyMode, BottomText)
     and OutTextXY (AnyX, ViewMaxY, AnyText) to accomplish that.
     However, if the text contains letters that descend below the
     base line for letters, descenders are outside the window and
     clipped off. If one wants to output text in the manner described,
     this is very annoying, since the programmer has to adjust the
     Y-coordinate himself according to the font size in effect. The
     same problem occurs if text is to be written horizontally at
     the very right of the graphics window. Obviously, the TextHeight
     function used by the SetTextJustify procedure does not account
     for descender length. To fix the problem described, justification
     should be changed to account for the overall height of characters
     including descenders.




25. "Snow" prevention fails on CGA due to unsafe algorithm

     The internal DirectWrite routine of module CRT is designed to prevent
     "snow" when writing directly to the CGA screen. However, a logical
     error keeps this snow check from being 100% safe. The same
     criticism applies to the WriteView method in module VIEWS of Turbo
     Vision. The following is an excerpt from CRT.DirectWrite:

     .
     .
     @@2:    LODSB             ; 1; get char
             MOV     BL,AL     ; 2;
     @@3:    IN      AL,DX     ; 3; wait until out of current horiz. sync,if in
             TEST    AL,1      ; 4;
             JNE     @@3       ; 5;
             CLI               ; 6;
     @@4:    IN      AL,DX     ; 7; wait until next horiz. sync starts
             TEST    AL,1      ; 8;
             JE      @@4       ; 9;
             MOV     AX,BX     ;10;
             STOSW             ;11; write to screen
             STI               ;12;
             LOOP    @@2       ;13;
     .
     .


     If an interrupt occurs after line 3 and before line 6 in the
     above code fragment, the program will *not* wait for the *start* of
     the horizontal sync but only test if the CGA is *in* a horizontal
     sync upon returning from the interrupt service routine. Since
     horizontal sync allows only for the output of exactly one character
     if output starts at the very beginning of horizontal sync, there
     is a good chance that the above program writes to the screen after
     the horizontal sync has been completed, thereby causing the CGA to
     "snow". Of course, failure of the above code to prevent "snow" is
     only noticeable in a system with very high interrupt rates, e.g.
     one running serial communication as a background TSR. One
     additional disadvantage of the above code is that it makes use
     only of the horizontal sync period, although this is much shorter
     than the vertical retrace period.

     The following enhanced code is 100% safe in preventing snow and
     uses both the vertical and horizontal retrace periods. It has been
     tested on an original IBM CGA. Interrupt latency is only marginally
     higher than with the original code and still allows interrupt-driven
     serial communication to run at the highest possible rate of
     115000 baud.

     DirectWrite:

               CMP     SI, DI              ; start address = end address ?
               JE      EmptyStr            ; yes, nothing to write
               PUSH    CX                  ; save
               PUSH    DX                  ;  registers
               PUSH    DI                  ;   that
               PUSH    DS                  ;    must be
               PUSH    ES                  ;     preserved
               MOV     CX, DI              ; string end address
               SUB     CX, SI              ; number of characters to write
               MOV     DL, CheckSnow       ; get flag for snow check
               MOV     DH, TextAttr        ; get current attribute
               XOR     AX, AX              ; address BIOS data area
               MOV     DS, AX              ;  via segment 0
               MOV     AL, DS:CrtWidth+400h; width of scan line in current mode
               MUL     BH                  ; multiply by cursor y-position
               XOR     BH, BH              ; clear hi-byte before addition
               ADD     AX, BX              ; add cursor x-position
               ADD     AX, AX              ; two screen bytes per character
               XCHG    AX, DI              ; offset into screen memory to DI
               MOV     AX, DS:Addr6845+400h; get 6845 base address
               ADD     AX, 6               ; 6845 status port
               XCHG    AX, DX              ; AX = CheckSnow/TextAttr, DX = port
               MOV     BX, 0B800H          ; screen segment for color modes
               CMP     DS:CrtMode+400h, 7  ; monochrome mode ?
               JNE     ColorMode           ; no, one of the color modes
               MOV     BH, 0B0H            ; screen at segment B000h if mono
     ColorMode:PUSH    ES                  ; address character string
               POP     DS                  ;  via DS
               MOV     ES, BX              ; extra segment addresses screen seg
               CLD                         ; autoincrement for string instruct.
               OR      AL, AL              ; CheckSnow = TRUE ? (AH=attribute)
               JE      OutLoop             ; no, don't check for snow
     WriteChr: LODSB                       ; get character to write, AH = attrib
               XCHG    AX, BX              ; save character/attribute to write
     WaitHor:  CLI                         ; interrupts disturb critical timing
               IN      AL, DX              ; read 6845 status
               TEST    AL, 8               ; in vertical retrace ?
               JNZ     WriteScr            ; yes, it is safe to write to screen
               TEST    AL, 1               ; in horizontal retrace ?
               JNZ     WaitHor             ; yes, wait until out of hor. retrace
     WaitHor2: IN      AL, DX              ; read 6845 status
               TEST    AL, 1               ; horizontal or vertical retrace ?
               JZ      WaitHor2            ; no, wait until either kind of retr.
     WriteScr: XCHG    AX, BX              ; in horiz. or vert. retrace: get ch
               STOSW                       ; write character and attribute
               STI                         ; interrupts ok now
               LOOP    WriteChr            ; write next character until all thru
               JMPS    WriteDone           ; screen write done
     OutLoop:  LODSB                       ; get character to write
               STOSW                       ; write character and attribute
               LOOP    OutLoop             ; until all characters printed
     WriteDone:POP     ES                  ; restore
               POP     DS                  ;  destroyed
               POP     DI                  ;   registers
               POP     DX
               POP     CX
     EmptyStr: RET




26. GetDir doesn't report use of invalid drive number

     The GetDir procedure should emit run time error 15 "Invalid
     drive number" when passed an invalid drive number. However, the
     procedure does not do the required check on the DOS return code
     and therefore never raises run time error 15. Instead, it always
     returns the string "X:\", where the X stands for any character
     in the IBM character set. The bug can easily be fixed by adding
     a few lines of code to the source module DIRH.ASM. The following
     program will demonstrate the bug:


     PROGRAM GetDirBug;

     VAR DriveNr:  INTEGER;
         PathName: STRING;

     BEGIN
        REPEAT
           Write  ('Enter Drivenumber (try also numbers > 100, 99 exits): ');
           ReadLn (DriveNr);
           GetDir (DriveNr, PathName);
           WriteLn('The path on drive ', DriveNr, ' is ', PathName);
        UNTIL DriveNr = 99;
     END. {GetDirBug}




27. Help bug

     Context sensitive help (Ctrl-F1) for the predefined arrays Port
     and PortW is missing. There was no help for these arrays in TP 5.5
     either.



28. Problems with the file selector box in IDE

     The history list of a file selector box contains only those
     files that were selected by entering the file name in the input
     box, not those selected by double clicking the name in the
     file list, which is the standard way to select a file if the
     mouse is heavily used. Even when working mainly with the mouse
     a history list is still useful, since the desired files may
     be at the end of a file list 100 files long and one has to
     get to the right part of the file list before being able to
     double click the file name. By the way, this is also a problem
     on the Apple Macintosh, since its file select boxes do not
     have a history list feature at all. This can really be a pain
     in the neck. It is therefore strongly recommended that all
     files that have been selected with either method (that is, by
     entering the name in the input box or by double clicking the
     name in the file list) be put in the history list.



29. Possible problems in unit APP.PAS

     APP.PAS contains an assembler function ISqr that computes the
     integral part of the square root of its integer argument. This
     function has several shortcomings. First of all, it should more
     appropriately be named ISqrt. Second, for all arguments > 32760
     it enters an endless loop. Finally, it is not very fast, since
     it makes use of the IMUL instruction. Unfortunately, it is not
     clear to me whether these shortcomings pose any threat to
     program integrity. If it is desirable to fix the function,
     the following substitute could be used. It uses a more elegant
     and faster algorithm and returns the correct result for all
     non-negative INTEGERs. The code length is identical to the
     original routine ISqr.

     { ISqrt (I) computes INT (SQRT (I)), that is, the integral part of the }
     { square root of integer I. It does not check for negative arguments.  }
     { For all arguments 0..MaxInt the correct result is returned. The      }
     { algorithm exploits the following property:                           }
     {          n                                                           }
     {  n**2 =  Sigma (2i-1)                                                }
     {          i=1                                                         }

     FUNCTION ISqrt (I: INTEGER): INTEGER; ASSEMBLER;

     ASM
           MOV   CX, I   { load argument }
           MOV   AX, -1  { init result }
           CWD           { init odd numbers to -1 }
           XOR   BX, BX  { init perfect squares to 0 }
     @loop:INC   AX      { increment result }
           INC   DX      { compute }
           INC   DX      {  next odd number }
           ADD   BX, DX  { next perfect square }
           CMP   BX, CX  { perfect square > argument ? }
           JBE   @loop   { until square greater than argument }
     END;
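
     The loop can be traced with a register-level model. The Python
     sketch below mirrors the routine instruction by instruction, using
     16-bit wrap-around arithmetic and the unsigned comparison performed
     by JBE. It is a model for checking the algorithm, not a
     replacement for the assembler code, and the name isqrt16 is
     hypothetical.

```python
MASK = 0xFFFF  # all registers are 16 bits wide


def isqrt16(i):
    """Model of ISqrt for arguments 0..MaxInt (32767)."""
    cx = i & MASK          # MOV CX, I
    ax = 0xFFFF            # MOV AX, -1    result starts at -1
    dx = 0xFFFF            # CWD           odd number starts at -1
    bx = 0                 # XOR BX, BX    running sum of odd numbers
    while True:
        ax = (ax + 1) & MASK   # INC AX
        dx = (dx + 2) & MASK   # INC DX / INC DX: next odd number
        bx = (bx + dx) & MASK  # ADD BX, DX: next perfect square
        if bx > cx:            # JBE @loop repeats while BX <= CX (unsigned)
            return ax


if __name__ == "__main__":
    for value in (0, 1, 15, 16, 32760, 32767):
        print(value, isqrt16(value))
```

     After k passes BX holds k squared and AX holds k-1, so the loop
     exits with AX equal to the integral part of the square root. For
     arguments up to MaxInt the largest square computed is 182*182 =
     33124, which still fits in 16 bits.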




30. Poor performance of REAL type arithmetic

     Although this does not constitute a real bug, an analysis of the
     poor performance of the REAL type arithmetic will be given. The
     rationale here is that a 'TURBO product' should also deliver
     turbo performance wherever it can be achieved. One obvious example
     showing that there is ample room for speed improvement is the REAL
     Sqrt function. It takes more time to compute the square root to 12
     decimal places than the coprocessor emulator needs to compute the
     function result to 19 decimal places. I feel that such performance
     is unacceptable. Unfortunately, there were no improvements in TP 6.0
     over TP 5.5.

     Improvements are also possible in the LONGINT arithmetic, especially
     the division, which will enjoy accelerations of factor four to six
     (depending on the CPU) when coded using the DIV instruction.

     Performance can be enhanced by careful register scheduling within
     all routines, thus avoiding unnecessary memory accesses. This
     measure will also reduce the overall instruction count for a routine.
     Wherever possible, time-saving CPU instructions such as MUL or DIV
     should be used. This will vastly improve performance, especially on
     the 286, 386, and 486 CPUs. Most important is the choice of the
     appropriate algorithm for each function. Tests show that the REAL
     division uses the slowest out of four possible algorithms. This
     clearly indicates that not much time was invested in finding short
     but fast algorithms. On the other hand, the square root routine
     uses a basically fast algorithm (Newton's iteration), but
     obliterates its advantages by poor implementation. The transcendental
     functions are based on polynomial approximations. It seems that no
     care was taken to find the shortest and most accurate polynomials
     possible. The speed advantages possible by a careful recoding of
     the complete REAL arithmetic range from a few percent for simple
     functions like LONGINT to REAL conversion to up to a factor of 20
     for the Sqrt function.
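
     As an illustration of the Newton iteration mentioned above, here
     is a minimal sketch in plain Pascal. It is not the SYSTEM unit
     routine and not the replacement library code; the name NewtonSqrt
     and the crude starting value are assumptions made for this
     example. A tuned implementation would presumably derive the
     starting value from the exponent of the argument so that only a
     few iterations are needed.

```pascal
program NewtonSqrtSketch;

{ Hypothetical helper for illustration only: computes the square root
  of X with Newton's iteration  Y := (Y + X / Y) / 2. }
function NewtonSqrt(X: Real): Real;
var
  Y, Prev: Real;
begin
  if X <= 0.0 then
  begin
    NewtonSqrt := 0.0;   { sketch only: negative arguments not handled }
    Exit;
  end;
  { Assumed starting value: any Y >= Sqrt(X) works, because from
    there the iterates decrease towards the root. }
  if X > 1.0 then Y := X else Y := 1.0;
  repeat
    Prev := Y;
    Y := 0.5 * (Y + X / Y);
  until Y >= Prev;       { stop as soon as the iterate no longer decreases }
  NewtonSqrt := Prev;
end;

begin
  WriteLn(NewtonSqrt(2.0):0:10);
  { compare against the built-in function }
  WriteLn(Abs(NewtonSqrt(2.0) - Sqrt(2.0)) < 1E-9);
end.
```

     Starting from X itself costs one iteration per halving of the
     initial error, which is exactly the kind of overhead a good
     starting value avoids.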



31. Inefficient string handling

     The string handling operations Insert, Delete, and Pos have
     always been implemented in a very simple but quite inefficient
     manner in Turbo-Pascal. There were no improvements in Turbo-
     Pascal 6.0. Since an acceleration of 300% - 400% can be
     achieved, this is hard to accept.


***  Note: The above mentioned improvements have been realized in a
           replacement for the original SYSTEM.TPU. The source has been
           made available to BORLAND, but will not be given here. The
           library replacement (not the source though) is available
           as TPL60NEW.ZIP via anonymous FTP from garbo.uwasa.fi




++++++++++++++++++++ Suggestions for enhancements ++++++++++++++++++++++++++


1. Suggested improvements for coprocessor / emulator arithmetic

     The routine that patches the emulator interrupts (INT 34 to INT 3D)
     back to coprocessor instructions at runtime if a coprocessor is
     present always inserts WAITs (9Bh) before the coprocessor
     instruction. However, for all coprocessors except the 8087 these
     WAITs are unnecessary, since the 287 and 387 synchronize with the
     CPU at hardware level, using ports F0h thru FFh. These WAITs can
     therefore be replaced by NOPs, resulting in somewhat faster code.
     With this simple change, performance improvements of up to 6% were
     observed in programs that make heavy use of simple coprocessor
     instructions (linear equation solver). A new routine, which
     inserts NOPs instead of WAITs where appropriate, is presented here.


     CODE    SEGMENT BYTE PUBLIC 'CODE'

             ASSUME  CS:CODE

     JMPS    EQU     <JMP SHORT>

     ;------------------------------------------------------------
     ; PATCH87 is the routine responsible for converting emulator
     ; interrupts back to coprocessor opcodes if a coprocessor is
     ; detected by the startup code.
     ;
     ; This routine is 1 byte shorter than the original one and has
     ; been enhanced to generate NOPs instead of WAITs before each
     ; coprocessor instruction when the coprocessor is a 287 or 387.
     ;
     ; INPUT:     None.
     ; OUTPUT:    None. The desired side effect is patching the
     ;            code at run-time.
     ;
     ; DESTROYS:  -
     ;
     ; All rights reserved (c) 1988-1992 Norbert Juffa
     ;
     ; Borland is free to use this code if desired !
     ;-------------------------------------------------------------

     PATCH87 PROC  FAR
             PUSH  BP                ; save TURBO-Pascal framepointer
             MOV   BP, SP            ; make new framepointer
             PUSH  AX                ; save
             PUSH  SI                ;  destroyed
             PUSH  DS                ;   registers
             TEST  BYTE PTR [BP+7], 2; interrupts allowed before int ?
             JZ    $intdis           ; no
             STI                     ; yes, enable interrupts
     $intdis:LDS   SI, [BP+2]        ; load return address
             DEC   SI                ; point to int data
             MOV   AX, WORD PTR [SI] ; get interrupt number & data
             DEC   SI                ; point to patch
             SUB   AL, 34h           ; 34..3D --> 0..9
             CMP   AL, 9             ; interrupt valid (between 0..9) ?
             JA    $invald           ; invalid interrupt
             JE    $fwait            ; interrupt $3D --> FWAIT
             CMP   AL, 8             ; interrupt $3C ?
             JE    $spcial           ; yes, handle segment overrides
             ADD   AL, 0D8h          ; new opcode
     $tst286:MOV   AH, AL            ; second byte of opcode
             MOV   AL, 90h           ; first byte is a nop
             PUSH  SP                ; test if
             POP   BP                ;  286 or
             CMP   SP, BP            ;   higher
             JE    $patch            ; 286
             MOV   AL, 9Bh           ; convert nop to wait
     $patch: MOV   WORD PTR [SI], AX ; store new opcode
             MOV   BP, SP            ; address stack via BP
             MOV   WORD PTR [BP+8],SI; set new return address
     $endptc:POP   DS                ; restore
             POP   SI                ;  destroyed
             POP   AX                ;   registers
             POP   BP                ; restore TURBO-Pascal frameptr
             IRET                    ; done
     $fwait: MOV   AX, 9B90h         ; store FWAIT
             JMPS  $patch            ; patch it in
     $spcial:TEST  AH, 20h           ; bit 5 set indicates spec. func.
             JNZ   $invald           ; not supported, invalid
             MOV   AL, AH            ; generate
             AND   AX, 07C0h         ;  segment
             SHR   AL, 1             ;   override
             SHR   AL, 1             ;    byte
             SHR   AL, 1             ;     and
             XOR   AL, 18h           ;      coprocessor
             ADD   AX, 0D826h        ;       opcode
             MOV   BYTE PTR [SI+2],AH; set new opcode
             JMPS  $tst286           ; put in new opcode
     $invald:JMPS  $endptc           ; no error handling, ignore
     PATCH87 ENDP

     CODE    ENDS

     END
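

     The byte transformation performed by PATCH87 can also be modelled
     in plain Pascal, which may make the three cases easier to follow.
     This is only a sketch derived from the listing above. The names
     TCodeBytes and PatchEmuInt are hypothetical, the CPU test is
     replaced by a Boolean parameter, and the check for the CDh opcode
     is an addition of the sketch that the assembler routine does not
     perform.

```pascal
program PatchSketch;

type
  { INT opcode (CDh), interrupt number, byte following the interrupt }
  TCodeBytes = array[0..2] of Byte;

{ Hypothetical model of the patch step. Returns False and leaves Code
  untouched if the sequence is not a supported emulator interrupt. }
function PatchEmuInt(var Code: TCodeBytes; Cpu286OrLater: Boolean): Boolean;
var
  Prefix, N, B: Byte;
begin
  PatchEmuInt := False;
  { the CDh check is an assumption of this sketch }
  if (Code[0] <> $CD) or (Code[1] < $34) or (Code[1] > $3D) then Exit;
  if Cpu286OrLater then Prefix := $90    { NOP is sufficient with a 287/387 }
  else Prefix := $9B;                    { the 8087 needs a WAIT }
  N := Code[1] - $34;                    { 34h..3Dh --> 0..9 }
  case N of
    0..7: begin                          { INT 34h..3Bh --> opcodes D8h..DFh }
            Code[0] := Prefix;
            Code[1] := $D8 + N;
          end;
    8:    begin                          { INT 3Ch: segment override form }
            B := Code[2];
            if (B and $20) <> 0 then Exit;   { special function, unsupported }
            Code[0] := Prefix;
            Code[1] := (((B and $C0) shr 3) xor $18) + $26;  { override prefix }
            Code[2] := $D8 + (B and $07);                    { coprocessor opcode }
          end;
    9:    begin                          { INT 3Dh --> NOP, FWAIT }
            Code[0] := $90;
            Code[1] := $9B;
          end;
  end;
  PatchEmuInt := True;
end;

var
  Code: TCodeBytes;
begin
  Code[0] := $CD; Code[1] := $35; Code[2] := $C0;
  if PatchEmuInt(Code, True) then
    WriteLn(Code[0], ' ', Code[1], ' ', Code[2]);   { writes 144 217 192 }
end.
```

     In the example, the bytes CDh 35h C0h become 90h D9h C0h on a
     286 or later, i.e. a NOP followed by the real coprocessor opcode.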


     Another optimization could be performed if a program is
     compiled in the $N+,E- mode. Since no emulator is used
     anyhow, the compiler could give up generating emulator
     interrupts and generate real coprocessor instructions
     instead. On CPUs > 286 neither NOPs nor WAITs would have
     to be inserted before NDP instructions. This would save
     space as well as time.

     Those functions that use the Borland shortcut interrupt 3Eh
     could test which NDP is present whenever this interrupt is
     called. If Test8087 = 3, the enhanced instructions (e.g. FSIN,
     FCOS) available on the 387/486/287XL could be executed. There
     would be only minimum timing overhead, but vast performance
     improvements on 386/486 machines. Since no elaborate argument
     reduction schemes are necessary, the additional code would be
     quite short.

     The Borland shortcut interrupt provides some functions not
     accessible from Turbo-Pascal 6.0. These functions are the tangent
     Tan (subcode F0h), the dyadic logarithm Ld (subcode F6h), the
     common logarithm Log (subcode F8h), power of two (subcode FCh),
     and power of ten (subcode FEh). Tests show that these undocumented
     functions are provided with a coprocessor as well as with the
     emulator and are fully operational. These functions should be
     made available to programmers through the SYSTEM unit and be
     documented. Especially the Tan is quite useful since it only
     takes 40% of the time of the equivalent construct Sin/Cos.
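
     For reference, the "equivalent construct" a programmer has to
     fall back on in the meantime looks like the following sketch. The
     function name TanViaSinCos is hypothetical, and no attempt is made
     to handle arguments where Cos(X) is zero.

```pascal
program TanSketch;

{ Hypothetical wrapper: the tangent expressed through the documented
  Sin and Cos functions, which costs two function evaluations and one
  division. }
function TanViaSinCos(X: Real): Real;
begin
  TanViaSinCos := Sin(X) / Cos(X);
end;

begin
  WriteLn(TanViaSinCos(0.5):0:6);
end.
```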



2. Inclusion of LOADALL in the inline assembler's valid instructions

     Since the undocumented AAM xx and AAD xx instructions are provided
     by the inline assembler, the undocumented LOADALL instruction
     (opcode 0F05h) could be provided as well when the compiler is in
     $G+ mode. The Turbo-Debugger will correctly disassemble LOADALL.



3. Suggestions regarding 286 code generation feature ($G+)

     Programs compiled with the $G+ switch will have reduced memory
     requirements and will execute somewhat faster on a 286/386/486
     CPU. Typically, memory and time savings will not exceed 2%.
     Additionally, setting the $G switch on will allow the use of
     real and protected mode 286 instructions. As explained in section
     five of the README file, programs compiled with $G+ will not
     check for the presence of an appropriate processor at runtime. It
     is strongly recommended that this behavior be changed. At least
     two cases are known (one involving Borland's biggest competitor)
     where programs were shipped that had been compiled with a 286
     switch setting. Customers using them on PC type machines were
     puzzled when they discovered that programs crashed on their systems
     although they had performed flawlessly on their office computer.
     Finally someone found the bug by tracing the program with a debugger.
     To avoid such unpleasant confusion, programs compiled with $G+
     should execute a short routine at startup to determine if a 286
     or later processor is present. If this is not the case, it should
     emit an error message and abort the program, just as programs
     compiled with $N+,E- abort if they fail to detect a coprocessor.

     Since 286 real mode instructions can also be executed on NEC's
     V20/V30 processors and on the 80186/188, it might be desirable
     to have a 186 code generation feature. This would effectively
     split the $G switch into two separate switches. No changes would
     have to be made to the code generator, since it generates no 286
     protected mode instructions. Thus, generated code would be the
     same with either the 186 or the 286 switch on. However, the inline
     assembler would only recognize protected mode instructions when
     the 286 switch is on. This would allow maximum utilization of the
     286 real mode instructions and a run time check for the CPU at the
     same time. Below is some code that can be used to distinguish between
     8086/8088, 80188/186/V20/V30, and 80286/386/486.


     ;--------------------------------------------------------------------
     ; CPU_Test distinguishes between three groups of CPUs commonly used
     ; in computers and returns an associated code for each.
     ;
     ; OUTPUT:  AX  = 0   Group #0 may execute 8086 code only (8086/8088)
     ;          AX  = 1   Group #1 may additionally execute 286 real mode
     ;                    instructions (V20/V30, 80186/80188)
     ;          AX  = 2   Group #2 may additionally execute 286 protected
     ;                    mode instructions (80286/386/486)
     ;--------------------------------------------------------------------

     CPU_Test PROC    FAR
              PUSH    SP                   ; test updating
              POP     AX                   ;  of stackpointer
              CMP     AX, SP               ; stackpointer updated before push ?
              JE      @Grp2                ; no, must be 286, 386 or 486
              CLC                          ; make sure carry clear
              PUSHA                        ; PUSHA executed on 88/86 as JMP $+2
              STC                          ; carry set if V20/V30 or 186/188
     @8086:   JC      @Grp1                ; yes, it's group #1
              XOR     AX, AX               ; CPU is 8088/8086
              RET                          ; done
     @Grp1:   POPA                         ; remove pushed bytes
              MOV     AX, 1                ; CPU is V20/V30 or 80186/80188
              RET                          ; done
     @Grp2:   MOV     AX, 2                ; CPU is 286/386/486
              RET                          ; done
     CPU_Test ENDP



4. Suggestions for enhancements in the code generator

 4.1 Enhancing procedure entry/exit code in $G+ mode (286 code generation)


     When a procedure/function does not use local variables, the
     standard exit code in $G- mode is:

     POP   BP
     RET

     This is replaced by the following code in $G+ mode:

     LEAVE
     RET

     However, for procedures/functions that have no local variables,
     it would be advantageous to always use the first sequence in
     either mode, $G- and $G+. Although both sequences take the same
     number of clock cycles on 286 and 386 processors, the first is
     considerably faster on the 486. Since the code generator already
     checks if no local variables are declared to generate optimized
     entry code in $G+ mode, the optimized exit code could be
     generated just as easily.

     Although the use of the ENTER imm16, 0 instruction does produce
     shorter code when a procedure/function has both parameters and
     local variables, the equivalent but longer (two or three bytes more)
     standard procedure entry code will execute faster than ENTER on
     all Intel processors. Therefore, it should be considered whether
     it is really desirable to use ENTER at all. As tests indicate, a
     lot of programs really do run slower on a 386DX machine if
     compiled with $G+ instead of $G-.

     processor  |  ENTER imm16, 0    | standard entry sequence
     -----------+--------------------+------------------------
     80286      |  11 clocks         | 3 + 2 + 3 = 8 clocks
     80386      |  10 clocks         | 5 + 2 + 2 = 9 clocks
     80486      |  14 clocks         | 1 + 2 + 1 = 4 clocks



 4.2 Optimizing entry code for non nested procedures without parameters
     and local variables

     If a procedure/function takes neither any parameters nor declares
     any local variables and is not statically nested within another
     procedure/function, there is no need for any entry code. Turbo
     Pascal performs this optimization only for assembler procedures,
     but skips it for normal procedures, probably so that nested and
     non-nested procedures can use the same branch of the code
     generator. The code generator could be enhanced to generate
     procedure entry code only for those procedures that are either
     statically nested (and thus have a hidden parameter, namely the
     framepointer of the preceding procedure in the static chain),
     take parameters, or declare local variables.



5. Suggestions for IDE

     The status line for the edit mode should be enhanced to include the
     shortcuts F5 Zoom and F6 Next. These additional hints will exactly
     fit into the remaining space. When the IDE is in the
     stepping/debugging mode, shortcuts F4 Goto Cursor and Ctrl-F9 Run
     should be added to the status line. This would accelerate debugging
     sessions, since all program flow control could be exerted using
     simple mouse clicks on the status line.



6. Suggestion regarding TURBO command-line options

     There should be a help switch like /? or /Help on the Turbo-Pascal
     Programmer's Platform command line that displays a help screen
     which describes the other command-line switches that are available
     and explains what they will do.





