Evaluation Relevance Reduction


Preliminary note:
This article was originally written in German. The English translation is also by the author; its quality cannot be guaranteed.

# Update from April/May/June 2020: new form ‘Interactive Evaluation Relevance Reduction’; article revision.

Notes on the form:

Not all 9 input fields need to be filled in. If the programme is missing information, various error messages are displayed in flashing red.

• Alternative move evaluation(s) for White / Black:
Relevant for the analysis of the high and suboptimal move evaluations.
• High 'move evaluation (-##)#(.##)':
Between -999.99 and 999.99.
• Suboptimal 'move evaluation (-##)#(.##)':
Between -999.99 and 999.99. If 'White' is selected, it must be smaller than the high move evaluation; if 'Black' is selected, it must be numerically higher, i.e. further to the right on the horizontal x-axis.
• 2 × 3 win/draw/loss percent values, 'Half-move to calculate Stockfish-WDL-statistics':
The percent values (between 0 and 100) following each of the two move evaluations are not needed if the 'half-move' field is filled in with a number of 1 or more. If the programme does not find valid percent values, it calculates the 3 win/draw/loss percent values automatically. If no half-move is entered, 2 of the 3 percent values are sufficient; the programme calculates the third. Percent values that total more than 100, or that lead to an evaluation relevance outside the range 0 to 1, are rejected.
• 'evaluation at 0.75-game-res.-probab. (e=0.75) (##)#(.##)' (abbreviated 'e=0.75'):
This is the evaluation, from White's point of view, on the horizontal x-axis ('x') at which the average probabilistic game result is 0.75(:0.25) in favour of White. In the initial version of this article it was called 'win draw balance'. At this evaluation, the evaluation relevance on the vertical y-axis is 0.5.
• 'evaluat. at 0.75+-game-res.-prob. (e>0.75) > e=0.75 (##)#(.##)' (abbreviated 'e>0.75'):
This is the evaluation, from White's point of view, on the horizontal x-axis ('x') at which the average probabilistic game result is higher than 0.75(:0.25) in favour of White. It is higher than the previous evaluation. Compared with the initial version, it is a new parameter that provides additional precision, and it belongs together with the last parameter below. At this evaluation, the evaluation relevance on the vertical y-axis is less than 0.5.
• '1.00 > 0.75-plus-game-result-probability > 0.75 0.#(####)' = 1 - (r>0.75 / 2):
This is the average probabilistic game result in favour of White on the vertical y-axis ('y') at the previously entered evaluation 'e>0.75'; r>0.75 denotes the evaluation relevance there. The result lies above 0.75(:0.25) and is likewise a new parameter compared with the initial version, adding precision.

Displaying the results requires JavaScript to be enabled in the browser.

[Interactive form 'Interactive Evaluation Relevance Reduction'. Inputs: move evaluations and Stockfish WDL data (high move evaluation; suboptimal move evaluation, worse than the high move evaluation) and the parameters of the user evaluation relevance reduction (evaluation at the 0.75 game-result probability, > 0; evaluation and result at the 0.75-plus game-result probability).
Results: data of the Stockfish WDL statistics (evaluation, colour, source, win-%, draw-%, loss-%, half-move); evaluation comparison user/Stockfish ERR (probabilistic game result 0.75 at evaluation relevance 0.50, the values for the entered evaluations and their averages ∅); absolute evaluation difference and relevant evaluation differences for user ERR and Stockfish WDL ERR, from White's and from Black's perspective; move evaluation symbols (‼ ! !? ?! ? ??) with thresholds in an extensive scheme (sectors 1/7, 1/7, 1/7, 1/14, 1/14, 1/7, 1/7, 1/7) and a restrictive scheme (8 × 1/8); position evaluation symbols (from ++– to ––+) with thresholds in 9 and 7 sectors, adjusted either at identical position evaluation sectors (9 × 1/9 or 7 × 1/7) or, for the user ERR, at probabilistic game results and, for the Stockfish WDL ERR, at both evaluations; user ERR irrelevance start evaluations and Stockfish WDL ERR evaluations with evaluation relevance = 0; graphs of the user ERR and of the Stockfish WDL ERR (drawn with Flot 0.8.3, © 2007-2014 IOLA and Ole Laursen).]

# Move and position evaluations together with NAG and Informator symbols

Chess players usually assess moves and positions on the board with symbols such as the following:

• brilliant move (‼): NAG $3
• impressive move (!): NAG $1
• attractive move (!?): NAG $5
• questionable move (?!): NAG $6
• weak move (?): NAG $2
• miserable move (??): NAG $4

• balanced position or draw (=): NAG $10
• slight advantage for White (⩲ or +/=): NAG $14
• slight advantage for Black (⩱ or =/+): NAG $15
• moderate advantage for White (± or +/-): NAG $16
• moderate advantage for Black (∓ or -/+): NAG $17
• clear advantage for White (+-): NAG $18
• clear advantage for Black (-+): NAG $19
• extreme advantage for White (++-): NAG $20
• extreme advantage for Black (--+): NAG $21

In addition, the unclear position (∞), NAG $13, should be mentioned. Strictly speaking it does not belong here, because it merely states that a position evaluation is (supposedly) not possible.

Caveat: the descriptions of all these symbols given above are the author's own and of course not binding. More information can be found here.

Incidentally, 'NAG' stands for 'Numeric Annotation Glyphs'.

Such move and position assessments are quite practical: they take up little space and reveal an evaluation range at a glance. The only question is how they come about. Rule of thumb? Or is there a somewhat more accurate way? It would be a real advance if they were defined in terms of the evaluations in pawn units with which chess engines numerically express positional imbalances, that is, positional advantages or disadvantages. But where to get such definitions, short of stealing them? From which engine evaluation can one speak, for example, of a slight advantage for White: from 0.10 pawn units or from 0.20, quite apart from the individual tendency of engines to over- or understate their evaluations? And how can an objective scale be found here?

The rest of this article presents several mathematically derived proposals with the corresponding formulas. First, however, various statistical and mathematical foundations have to be worked out.

# Blunder relevance or, more politely: evaluation relevance

Chess programs usually rate positions in hundredths of a pawn unit. Comparing the variations the number cruncher outputs for a position reveals the evaluation difference, or margin of error, between the best and an inferior variation.

But how relevant are faulty moves and their evaluations actually? Example: in a position that is already lost after the uncompensated loss of the queen, one needlessly gives away another piece. The chess program will acknowledge this mishap with a much higher evaluation in favour of the opponent. But how relevant is such a difference between the new and the previous evaluation in a game that is practically lost already? Objectively, that is, leaving aside later mistakes by the opponent: not at all! In all probability the bungler would not have been able to save the game even without the latest faulty move, assuming best play on both sides.

To take it to the extreme: what is the threshold at which a game can objectively be considered won or lost? It depends. One could say ironically: the bigger the bungler, the higher the threshold. The higher the evaluation, the more one can count on the advantage no longer being squandered, and one may place more confidence in today's chess programs than in Homo sapiens. And if you are dealing with a potential patzer, you should not, for example, resign prematurely in an apparently lost position, as Kasparov did in his second match against Deep Blue in 1997.

# Computer chess statistics

So what to do? One leaves human bungler chess behind and turns to the strongest chess engines. In principle, two paths are now open:

Stockfish WDL ERR:

Since mid-2020, Stockfish has provided win/draw/loss ratios ("WDL") alongside its actual evaluations. In the words of the Stockfish development team:

'UCI_ShowWDL
If enabled, show approximate WDL statistics as part of the engine output. These WDL numbers model expected game outcomes for a given evaluation and game ply for engine self-play at fishtest LTC conditions (60+0.6s per game).'

These WDL statistics, or probabilities, take the course of the game into account insofar as they depend on the half-move being evaluated. The formulas on which they are based can be found in the Stockfish source code ('win rate model'). Their use need not be limited to game analyses carried out with Stockfish, because this engine is the ultimate in positional analysis and therefore sets the standard for evaluation.

The Stockfish WDL statistics make it possible to derive the evaluation relevances and differences, optimum rates, probabilistic game results, and move and position evaluation symbols including thresholds discussed in this article, in much the same way as with the User ERR presented below. Their real highlight, however, is that the resulting average values (cf. 'evaluation comparison User/Stockfish ERR', line 3, columns 3 and 6 in the programme) can provide valuable clues for adjusting the parameters of the User ERR.
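As a minimal sketch of how a WDL triple might be turned into the quantities used here, the following Python function assumes that the probabilistic game result is the average score W + D/2 (the text does not spell out the programme's exact rule) and that the percentages are already given from White's point of view. It inverts the relations 'probabilistic game result = 1 - (evaluation relevance / 2)' for positive and '= evaluation relevance / 2' for negative evaluations from the section on probabilistic game results. The function name is hypothetical.

```python
def wdl_to_pgr_and_relevance(evaluation, win_pct, draw_pct):
    """Hypothetical helper: WDL percentages (White's view) -> (PGR, relevance).

    The loss percentage is derived as 100 - win - draw, mirroring the form,
    where two of the three percent values are enough.
    """
    loss_pct = 100.0 - win_pct - draw_pct
    if win_pct < 0 or draw_pct < 0 or loss_pct < -1e-9:
        raise ValueError("percent values must be non-negative and total at most 100")
    # Average score from White's point of view: a draw counts as half a point.
    pgr = (win_pct + 0.5 * draw_pct) / 100.0
    # Invert "PGR = 1 - relevance/2" (evaluation >= 0) or "PGR = relevance/2" (< 0).
    relevance = 2.0 * (1.0 - pgr) if evaluation >= 0 else 2.0 * pgr
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("evaluation relevance outside the range 0 to 1")
    return pgr, relevance


pgr, rel = wdl_to_pgr_and_relevance(1.0, 40, 55)  # loss = 5 %
print(round(pgr, 4), round(rel, 4))  # 0.675 0.65
```

For a positive evaluation this amounts to relevance = 1 - (W - L)/100, which illustrates why a WDL triple that contradicts the sign of the evaluation ends up outside the range 0 to 1 and is rejected.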

A small programme trick, by the way: entering '0' (zero) in both move evaluation fields and in the 'half-move' field clears the programme's internal memory of these two average values, which are otherwise retained when the parameters are saved and loaded.

One small limitation should not go unmentioned: when the percent values for an evaluation are determined automatically from the half-move, the results differ slightly from those calculated by Stockfish itself. The reason might be a divisor called "PawnValueEg" in the Stockfish code whose value is unknown here (possibly a dynamically modified variable?); the programme sets it to 1.
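To show where a divisor such as 'PawnValueEg' enters, here is a Python sketch of the general shape of Stockfish's win rate model as recalled from its source around 2020: a logistic function of the evaluation in centipawns whose two parameters a and b are cubic polynomials in the ply, capped at 240 plies. This shape is an assumption to be checked against the source of the Stockfish version in use; the coefficients are deliberately left as parameters, and the values in the example are placeholders, not Stockfish's.

```python
import math


def win_rate_permille(eval_pawns, ply, as_coeffs, bs_coeffs):
    """Logistic win-rate model in the shape assumed for Stockfish around 2020.

    eval_pawns: evaluation in pawn units from the point of view of the side
    whose win rate is wanted. as_coeffs / bs_coeffs: four coefficients each of
    the cubic polynomials a(m) and b(m); take them from the Stockfish source.
    """
    m = min(240, ply) / 64.0  # the model only covers up to 240 plies
    a = ((as_coeffs[0] * m + as_coeffs[1]) * m + as_coeffs[2]) * m + as_coeffs[3]
    b = ((bs_coeffs[0] * m + bs_coeffs[1]) * m + bs_coeffs[2]) * m + bs_coeffs[3]
    # Assumed: Stockfish computes 100 * v / PawnValueEg from its internal value v;
    # for an evaluation already given in pawns this is simply the centipawn value.
    x = max(-1000.0, min(1000.0, 100.0 * eval_pawns))
    return int(0.5 + 1000.0 / (1.0 + math.exp((a - x) / b)))


def wdl_permille(eval_pawns, ply, as_coeffs, bs_coeffs):
    """Win, draw and loss in per mille; the draw is what remains."""
    win = win_rate_permille(eval_pawns, ply, as_coeffs, bs_coeffs)
    loss = win_rate_permille(-eval_pawns, ply, as_coeffs, bs_coeffs)
    return win, 1000 - win - loss, loss


# Placeholder coefficients for illustration only (NOT Stockfish's values):
AS = (0.0, 0.0, 0.0, 150.0)
BS = (0.0, 0.0, 0.0, 70.0)
print(wdl_permille(1.0, 60, AS, BS))
```

The loss is modelled as the win rate of the negated evaluation, and the draw rate is simply whatever remains of 1000 per mille.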

User ERR:

The traditional approach presented in this article is to analyse engine games by asking at which evaluations these programs won their games, or failed to. The most meaningful games can probably be found on the Internet under 'TCEC' ('Top Chess Engine Championship') in the 'Superfinals'. Reasons: long thinking times, the opponents were in each case the two apparently strongest chess engines, and all position evaluations can be followed move by move.

From a statistical point of view, is there a kind of 'point of no return', an evaluation (apart from a concrete mate announcement, of course) from which victory is undoubtedly settled and a drawing liquidation can no longer happen? Theoretically, no. The following table shows that in various TCEC Superfinals, chess engines were unable to convert evaluations as high as 5.01 into wins. And nobody can say where the absolute limit for such evaluation errors lies (assuming best play in the following moves), since this limit could only be determined with an infinite number of test games.

Even though such outliers are very rare, they rule out equating any evaluation (even 5.01, as shown) with a win or a loss. In other words, there is no evaluation-based 'point of no return'.

Now one must turn to the question of at which evaluations particular average game results are reached. Of particular interest are evaluations at which, once reached, the average result of all games concerned amounts to 0.75 (from White's point of view). Such a value results from an equal number of wins and draws, or from a number of losses and three times as many wins. For the sake of completeness, losses are mentioned here too, although they rarely occur once this special balance evaluation has been reached.

For illustration, the following table first shows the results of the TCEC Superfinals from Season 9 onwards, together with the FIDE Candidates' Tournament 2018 evaluated by Stockfish 8 with a thinking time of 30 seconds, as found on 'www.chessbomb.com'.

tournament | analysis engine | wins | evaluation e=0.75 at average game result 0.75(:0.25) | maximum evaluation e>0.75 without win | average game result at maximum evaluation e>0.75 without win | alternative pair of values: evaluation e>0.75 / average game result ≅ 0.875
TCEC Superfinal No. 9 | Stockfish | 16 | 1.75 | 0.62
TCEC Superfinal No. 10 | Houdini | 15 | 2.00 | 0.66
TCEC Superfinal No. 12 | Stockfish | 29 | 1.48 | 0.52
TCEC Superfinal No. 13 | Stockfish | 16 | 2.79 | 1.14
TCEC Superfinal No. 14 | Stockfish | 10 | 2.42 | 1.45
FIDE Candidates' Tournament 2018 | Stockfish 8 | 20 | 0.67 | 16.68 | 0.9762 | 2.39 / 0.8800
TCEC Superfinal No. 16 | Stockfish 19092522 | 14 | 1.24 | 3.33 | 0.9667 | 1.65 / 0.8684
TCEC Superfinal No. 16 | AllieStein v0.5-dev_7b41f8c-n11 | 5 | 3.96 | 8.18 | 0.9167 | 8.03 / 0.8571
TCEC Superfinal No. 17 | LCZero v0.24-sv-t60-3010 | 17 | 1.34 | 5.01 | 0.9722 | 1.89 / 0.8810
TCEC Superfinal No. 17 | Stockfish 20200407DC | 12 | 1.49 | 2.76 | 0.9615 | 1.89 / 0.8750
TCEC Superfinal No. 18 | Stockfish 202006170741 | 23 | 0.87 | 3.74 | 0.9792 | 1.41 / 0.8710
TCEC Superfinal No. 18 | LCZero v0.25.1-svjio-t60-3972-mlh | 16 | 0.69 | 2.12 | 0.9706 | 1.57 / 0.8636

The analysis above, explained using Superfinal No. 17 and the winning engine LCZero v0.24-sv-t60-3010 as an example:

LCZero won 17 games, so 83 games ended in draws or losses for LCZero. Among all these games, one now looks for the 17th-highest evaluation that LCZero gave in its own favour; mind you, a positive evaluation that could not be converted into a win. So one takes the 17 highest such evaluations, and the lowest of them is 1.26. There are therefore 17 draws or losses in each of which an evaluation of at least 1.26 occurred. In other words: LCZero reached an evaluation of 1.26 in 34 games, 17 of which ended in a draw or loss and 17 in a win.

But these numbers contain a small complication: LCZero had to concede defeat in the 16th game, although it had already output an evaluation of 1.89, which lies above the threshold of 1.26 determined above. Because of this '0' result, the actual results do not yield an average game result of 0.75. Instead, the average amounts to

$\frac{\left(17\cdot 1\right)+\left(16\cdot 0.5\right)+\left(1\cdot 0\right)}{34}=0.7353$

instead of 0.75. If the real numbers are stubborn, mathematics has to step in. For average game results between 0.5 and 0.75 on the y-axis, the average game result is a linear function of the evaluation:

$\text{average game result}=\frac{\left(2\cdot {\mathrm{e}}_{\text{=0.75}}\right)+\text{evaluation}}{4\cdot {\mathrm{e}}_{\text{=0.75}}}$

What we are looking for is the elusive evaluation at a game result of 0.75 (abbreviated 'e=0.75'), so the equation has to be solved for it:

${\mathrm{e}}_{\text{=0.75}}=\frac{\text{evaluation}}{\left(4\cdot \text{average game result}\right)-2}$

In the present LCZero case, therefore, it is to be calculated:

${\mathrm{e}}_{\text{=0.75}}=\frac{1.26}{\left(4\cdot 0.7353\right)-2}=1.3387$

The result lies slightly above the empirically determined threshold of 1.26, as was to be expected.
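The procedure just described can be sketched in Python. The per-game evaluations passed to the first function are hypothetical input (the real TCEC evaluation lists are not reproduced in this article); the last lines recompute the LCZero figures from the text, using the unrounded average instead of 0.7353.

```python
def empirical_threshold(n_wins, non_win_evaluations):
    """Evaluation reached in as many non-won games as the engine has wins.

    non_win_evaluations: for each draw or loss, the highest evaluation the
    engine gave in its own favour during that game (hypothetical input).
    """
    ranked = sorted(non_win_evaluations, reverse=True)
    return ranked[n_wins - 1]  # the n-th highest value


def average_game_result(wins, draws, losses):
    return (wins * 1.0 + draws * 0.5 + losses * 0.0) / (wins + draws + losses)


def corrected_e075(evaluation, avg_result):
    """Solve 'average game result = (2*e + evaluation) / (4*e)' for e=0.75."""
    if not 0.5 < avg_result <= 0.75:
        raise ValueError("the linear formula only covers results between 0.5 and 0.75")
    return evaluation / (4.0 * avg_result - 2.0)


# Superfinal 17, LCZero: 17 wins, 16 draws and 1 loss at an evaluation of at least 1.26
avg = average_game_result(17, 16, 1)
print(round(avg, 4), round(corrected_e075(1.26, avg), 3))  # 0.7353 1.339
```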

The engine Houdini 6 was released in September 2017. The following can be read on this website:

'The evaluations have again been calibrated to correlate directly with the win expectancy in the position. A +1.00 pawn advantage gives a 75% chance of winning the game against an equal opponent at blitz time control. At +1.50 the engine will win 90% of the time, and at +2.50 about 99% of the time. To win nearly 50% of the time, you need and advantage of about +0.60 pawn.'

Houdini kept its word. In the TCEC Superfinal of Season 10 against Komodo, Houdini won 15 games, and among the 15 draws or losses with Houdini's highest evaluations, the minimum evaluation was 0.57. An almost exact landing.

The table above allows the cautious conclusion that the Stockfish versions used since TCEC Superfinal 13 give significantly higher evaluations than their predecessors. One thing should not be overlooked when interpreting these results: Stockfish 10 was given a 'contempt' of 0.24 (Stockfish 9: 0.20), which should raise its evaluations. It therefore seems reasonable to subtract this contempt margin from the evaluation thresholds listed in the table for one's own analysis purposes. One tip, however: analyses with Stockfish should only be carried out with 'contempt' switched off, so as not to inflate the evaluations artificially.

Finally, it should be noted that the TCEC website has recently added win/draw probabilities and, for the engine Stockfish, places e=0.75 at around 1.56 (Superfinal 17) or even 1.91 (Superfinal 18). In view of the table above, this is quite a plausible value. What is problematic, however, is that only percentages for 'W' (win?) and 'D' (draw? = 100% minus the 'W' percentage) are given there, while the loss probability is swept under the carpet. The TCEC e=0.75 of 1.56 assumed above therefore necessarily rests on the assumption that 'D' also includes the probability of a loss.

# Mathematical evaluation relevance reduction

Let us note: as the evaluation moves from 0.00 towards infinity (∞), its relevance decreases continuously. It starts at 100% for an evaluation of 0.00, passes 50% at the e=0.75 evaluation (the TCEC value of 1.56 is used below as an example) and ends at 0% at infinity.

An example: the evaluation of the best move is 2.00. Now a mishap happens: a faulty move that loses a piece, evaluated at -3.00. The absolute evaluation difference is -5.00. How relevant is this loss of a piece? Obviously less than the full -5.00.

In detail:
between the evaluations 2.00 and 1.56 the relevance grows continuously while staying below 50%;
at 1.56 it should amount to 50%, because this is the mean of 100% and 0%; furthermore, the probabilistic game result of 0.75 at the evaluation 1.56 is the mean of 0.5 at the 0.00 evaluation and 1 at a maximum engine evaluation;
at 0.00 the relevance reaches its maximum value of 100%;
at -1.56 it is again 50%; and
at -3.00 it ends with a value well below 50%.

The sum of these percentages would now be interesting. It can be calculated, but somewhat laboriously. Mathematically minded readers will long since have recognised that this rise and fall has to be expressed by a mathematical function with the following property: the further one moves away from the y-axis on either side, the smaller the ordinates, i.e. the evaluation relevance values at these points of the x-axis, until the curve finally approaches the x-axis asymptotically at infinity on both sides. The x-axis thus represents the (engine) evaluations, the y-axis the evaluation relevance values.

At this point, the first version of the article proposed an exponential function of the general form f(x) = a^(x*b). Such exponential functions have the advantage that they always pass through the point P(0;1) and approach the x-axis towards (positive) infinity. Their drawback, however, is that only 2 points can be fixed: the point P(0;1) already mentioned and the point P(e=0.75;0.5). A further defining point P(e>0.75;r>0.75) is urgently needed for better precision, for example to capture the highest TCEC engine evaluations without a win and the corresponding game results, which lie far above 0.75.

Solution: 3 equations for 3 negative and 3 positive sectors along the x-axis (x stands for the engine evaluation):

1st positive and negative sector:

${\mathrm{y}}_{\mathrm{Rel}}=1-\frac{|\mathrm{x}|}{2\cdot {\mathrm{e}}_{\text{=0.75}}}$ linear equation with 𝔻 {x | -e=0.75 ≤ x ≤ e=0.75}

2nd positive and negative sector:

${\mathrm{y}}_{\mathrm{Rel}}=-\frac{2\cdot {\mathrm{e}}_{\text{=0.75}}\cdot {\mathrm{r}}_{\text{>0.75}}-2\cdot |\mathrm{x}|\cdot {\mathrm{r}}_{\text{>0.75}}-{\mathrm{e}}_{\text{>0.75}}+|\mathrm{x}|}{2\cdot \left({\mathrm{e}}_{\text{>0.75}}-{\mathrm{e}}_{\text{=0.75}}\right)}$ linear equation with 𝔻 {x | -e>0.75 ≤ x ≤ -e=0.75 or e=0.75 ≤ x ≤ e>0.75}

3rd positive and negative sector:

${\mathrm{y}}_{\mathrm{Rel}}={{\mathrm{r}}_{\text{>0.75}}}^{\frac{|\mathrm{x}|}{{\mathrm{e}}_{\text{>0.75}}}}$ exponential equation with 𝔻: {x | -∞ < x ≤ -e>0.75 or e>0.75 ≤ x < ∞}
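A direct Python transcription of the three sector equations, as a sketch. The parameter names are chosen here for illustration, and the example values e=0.75 = 2, e>0.75 = 3 and r>0.75 = 0.3 are those of the worked example later in the article. For x = 0, 2, 2.5, 3, 6 and -2 it gives 1, 0.5, 0.4, 0.3, 0.09 and 0.5, so the sectors join without jumps.

```python
def evaluation_relevance(x, e075, e_plus, r_plus):
    """Evaluation relevance y_Rel for an engine evaluation x (piecewise, symmetric).

    e075   : e=0.75, evaluation with relevance 0.5
    e_plus : e>0.75, a larger evaluation with relevance r_plus
    r_plus : r>0.75, relevance at e>0.75 (0 < r_plus < 0.5)
    """
    if not (0 < e075 < e_plus and 0 < r_plus < 0.5):
        raise ValueError("expected 0 < e=0.75 < e>0.75 and 0 < r>0.75 < 0.5")
    ax = abs(x)
    if ax <= e075:  # 1st sector: linear from 1 down to 0.5
        return 1.0 - ax / (2.0 * e075)
    if ax <= e_plus:  # 2nd sector: linear from 0.5 down to r_plus
        return -(2 * e075 * r_plus - 2 * ax * r_plus - e_plus + ax) / (2 * (e_plus - e075))
    return r_plus ** (ax / e_plus)  # 3rd sector: exponential decay towards 0


for x in (0.0, 2.0, 2.5, 3.0, 6.0, -2.0):
    print(x, round(evaluation_relevance(x, 2.0, 3.0, 0.3), 4))
```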

The evaluation relevance functions are now defined. But how is the truly relevant evaluation difference over a given stretch of the x-axis calculated, for example between 2.00 and -3.00? The evaluation relevance function only returns the y-value at a single point of the x-axis. The answer is as ingenious as it is simple: integration. The area between the x-axis and the function curve, from the best evaluation (for example 2.00) to the inferior evaluation (for example -3.00), is the definite integral of this function, i.e. the relevant evaluation difference.

To calculate the integral, the antiderivatives of the evaluation relevance functions are required. These are as follows:

1st positive and negative sector:

${\mathrm{y}}_{\text{Int}}=\mathrm{x}-\frac{\mathrm{x}\cdot |\mathrm{x}|}{4\cdot {\mathrm{e}}_{\text{=0.75}}}$ quadratic equation with 𝔻 {x | -e=0.75 ≤ x ≤ e=0.75}

2nd positive and negative sector:

${\mathrm{y}}_{\text{Int}}=-\frac{\mathrm{x}\cdot \left(4\cdot {\mathrm{e}}_{\text{=0.75}}\cdot {\mathrm{r}}_{\text{>0.75}}-2\cdot |\mathrm{x}|\cdot {\mathrm{r}}_{\text{>0.75}}-2\cdot {\mathrm{e}}_{\text{>0.75}}+|\mathrm{x}|\right)}{4\cdot \left({\mathrm{e}}_{\text{>0.75}}-{\mathrm{e}}_{\text{=0.75}}\right)}$ quadratic equation with 𝔻 {x | -e>0.75 ≤ x ≤ -e=0.75 or e=0.75 ≤ x ≤ e>0.75}

3rd positive sector:

${\mathrm{y}}_{\text{Int}}=\frac{{\mathrm{e}}_{\text{>0.75}}\cdot {{\mathrm{r}}_{\text{>0.75}}}^{\frac{\mathrm{x}}{{\mathrm{e}}_{\text{>0.75}}}}}{\mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)}$ exponential equation with 𝔻 {x | e>0.75 ≤ x < ∞}

3rd negative sector:

${\mathrm{y}}_{\text{Int}}=-\frac{{\mathrm{e}}_{\text{>0.75}}}{{{\mathrm{r}}_{\text{>0.75}}}^{\frac{\mathrm{x}}{{\mathrm{e}}_{\text{>0.75}}}}\cdot \mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)}$ exponential equation with 𝔻 {x | -∞ < x ≤ -e>0.75}

Note that the computer program Maxima uses the notation log(x) instead of the usual ln(x) for the natural logarithm, and so does JavaScript ('Math.log()'). To use the equations above in such programs, 'ln' has to be replaced by 'log'.
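The definite integral can be evaluated in Python by splitting the interval at the sector boundaries ±e=0.75 and ±e>0.75 and applying the matching antiderivative to each piece; Python's math.log is, like Maxima's log and JavaScript's Math.log, the natural logarithm. This is a sketch under the assumption that the relevant evaluation difference is reported as a positive area; the helper names are hypothetical. With the parameters of the example that follows in the text, it returns 2.64, 2.65 and 2.65 (rounded) for high evaluations of 15, 18 and 1000.

```python
import math


def _antiderivative(x, sector, e075, e_plus, r_plus):
    """y_Int of the given sector (1, 2 or 3); math.log is the natural logarithm."""
    e, big_e, r = e075, e_plus, r_plus
    if sector == 1:
        return x - x * abs(x) / (4 * e)
    if sector == 2:
        return -x * (4 * e * r - 2 * abs(x) * r - 2 * big_e + abs(x)) / (4 * (big_e - e))
    if x >= 0:  # 3rd positive sector
        return big_e * r ** (x / big_e) / math.log(r)
    return -big_e / (r ** (x / big_e) * math.log(r))  # 3rd negative sector


def relevant_evaluation_difference(high, low, e075, e_plus, r_plus):
    """Area under the evaluation relevance curve between two evaluations."""
    a, b = min(high, low), max(high, low)
    # Cut the interval at the sector boundaries so that each piece lies in one sector.
    inner = [c for c in (-e_plus, -e075, e075, e_plus) if a < c < b]
    cuts = [a] + inner + [b]
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = abs((lo + hi) / 2)
        sector = 1 if mid <= e075 else (2 if mid <= e_plus else 3)
        total += (_antiderivative(hi, sector, e075, e_plus, r_plus)
                  - _antiderivative(lo, sector, e075, e_plus, r_plus))
    return total


for high in (15, 18, 1000):
    print(high, round(relevant_evaluation_difference(high, 0, 2, 3, 0.3), 2))
```

Splitting at the boundaries matters because the three antiderivatives are only valid within their own sectors and do not join into one continuous function.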

If you experiment with the interactive form above, you will soon notice that for extreme evaluations the relevant evaluation difference hardly changes when these evaluations are made even more extreme. Example for White:
high evaluation = 15
suboptimal evaluation = 0
e=0.75 = 2
e>0.75 = 3
probabilistic game result at e>0.75 = 0.85 (corresponds to r>0.75 = 0.3)
resulting relevant evaluation difference = 2.64

If the high evaluation is increased to 18, the relevant evaluation difference amounts to 2.65, and a high evaluation of 1000 also results in 2.65. The same results occur if the high evaluation is 0 and the suboptimal evaluation is -15, -18 or -1000.

The relevant evaluation differences are rounded to 2 decimal places in the form. To calculate the high or suboptimal evaluation (hereafter called the 'irrelevance start evaluation') beyond which any further increase or decrease, all the way to infinity, can raise the relevant evaluation difference (at 2 decimal places) by at most 0.01, and even that with a probability of at most 50%, you need the following formula. It sets the remaining area from this evaluation to infinity equal to 0.005, half of the rounding step:

$-\frac{{\mathrm{e}}_{\text{>0.75}}\cdot {{\mathrm{r}}_{\text{>0.75}}}^{\frac{\text{irrelevance start evaluation}}{{\mathrm{e}}_{\text{>0.75}}}}}{\mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)}=0.005$

solved for the irrelevance start evaluation, allowing for positive and negative values ('±'):

$\text{irrelevance start evaluation}=±\frac{{\mathrm{e}}_{\text{>0.75}}\cdot \mathrm{ln}\left(-\frac{\mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)}{200\cdot {\mathrm{e}}_{\text{>0.75}}}\right)}{\mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)}$

With the above parameters, the result is ±15.477.

The formula shows that the irrelevance start evaluation is independent of the second evaluation (0 in the above case) and of e=0.75 (2 in the above case).

Normally, this formula locates the irrelevance start evaluation in the 3rd positive or negative sector. With unusual values of e>0.75 and r>0.75, however, the irrelevance start evaluation slips into the 2nd positive or negative sector, where far more complicated formulas are required. This happens when even the entire area beyond e>0.75 is smaller than 0.005, i.e. when the following applies:

${\mathrm{e}}_{\text{>0.75}}<-\frac{0.005\cdot \mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)}{{\mathrm{r}}_{\text{>0.75}}}$

For example, if e>0.75 < 0.978 and the probabilistic game result at e>0.75 = 0.99 (r>0.75 = 0.02), or if e>0.75 < 0.0277 and the probabilistic game result at e>0.75 = 0.875 (r>0.75 = 0.25). Highly unrealistic!
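Both formulas can be checked with a short Python sketch. The function names and the tail parameter (0.005, half of the 0.01 rounding step) are introduced here for illustration; the comments show the values quoted in the text.

```python
import math


def second_sector_limit(r_plus, tail=0.005):
    """e>0.75 below which the irrelevance start slips into the 2nd sector."""
    return -tail * math.log(r_plus) / r_plus


def irrelevance_start_evaluation(e_plus, r_plus, tail=0.005):
    """Evaluation (to be taken as ±) beyond which the area left to infinity is `tail`.

    Only valid while the result lies in the 3rd sector.
    """
    if e_plus < second_sector_limit(r_plus, tail):
        raise ValueError("irrelevance start lies in the 2nd sector; formula not valid")
    ln_r = math.log(r_plus)
    return e_plus * math.log(-tail * ln_r / e_plus) / ln_r


print(round(irrelevance_start_evaluation(3.0, 0.3), 3))  # 15.477
print(round(second_sector_limit(2 * (1 - 0.99)), 3))     # 0.978
print(round(second_sector_limit(2 * (1 - 0.875)), 4))    # 0.0277
```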

The evaluation relevance reduction and all the refinements mentioned in this article (automatic move and position evaluation symbols as well as the probabilistic game results) have been implemented in the ScpcPGN program, available free of charge on this website, and in the AquaPGN program (latest update 12 August 2020), also available free of charge on this website.

# Probabilistic game results

Why 'probabilistic' game results? Because they are derived from an engine evaluation and other parameters and therefore make a stochastic statement about the presumed average outcome of the game. The situation was different in the discussion of the TCEC results, where only 'average' game results were mentioned, because actual games were available from which the real average results could be calculated.

The probabilistic game result is always given here from White's point of view. If White wins, the result is 1-0; if Black wins, 0-1; and in the case of a draw, ½-½. Taking the first number in each case gives the probabilistic game result as used here.

It can be derived directly from the evaluation relevance:

for positive evaluations:
probabilistic game result = 1 - (evaluation relevance / 2);

for negative evaluations:
probabilistic game result = evaluation relevance / 2.
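In Python, the two relations and their inverses could look like this (hypothetical helper names). The example reproduces the correspondence between a result of 0.85 and r>0.75 = 0.3 used in the worked example above.

```python
def pgr_from_relevance(evaluation, relevance):
    """Probabilistic game result (White's view) from the evaluation relevance."""
    return 1.0 - relevance / 2.0 if evaluation >= 0 else relevance / 2.0


def relevance_from_pgr(evaluation, pgr):
    """Inverse of pgr_from_relevance for the same sign of the evaluation."""
    return 2.0 * (1.0 - pgr) if evaluation >= 0 else 2.0 * pgr


# A result of 0.85 at e>0.75 corresponds to r>0.75 = 0.3
print(round(relevance_from_pgr(3.0, 0.85), 4))  # 0.3
```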

An engine evaluation of exactly 0.00, with an evaluation relevance of 1.00, gives a probabilistic game result of 0.50, i.e. a presumed draw. A probabilistic game result of approximately 1.00 would be an almost certain win for White, and one of approximately 0.00 an almost certain win for Black; mathematically, 1.00 and 0.00 are never reached exactly. An engine evaluation of exactly e=0.75 leads to a result of 0.75, i.e. a value exactly halfway between a win for White and a draw. This makes the results easier to interpret from White's point of view.

Clarification: the probabilistic game result is in no way equivalent to a win probability.

Many people make this mistake. The programme Nibbler, for example, confuses what is in reality the probabilistic game result with the 'Winrate': in the starting position after 1. e4, this 'Winrate' exceeds 50%, while the actual win probability in the 'WDL' display is only a modest 15%. The programme's author apparently does not notice this.

In a nutshell:

To fulfil the duty of a chronicler, here are the game result equations as well:

1st positive and negative sector:

${\mathrm{y}}_{\mathrm{PGR}}=\frac{1}{2}+\frac{\mathrm{x}}{4\cdot {\mathrm{e}}_{\text{=0.75}}}$ linear equation with 𝔻 {x | -e=0.75 ≤ x ≤ e=0.75}

2nd positive sector:

${\mathrm{y}}_{\mathrm{PGR}}=\frac{2\cdot {\mathrm{e}}_{\text{=0.75}}\cdot {\mathrm{r}}_{\text{>0.75}}-2\cdot \mathrm{x}\cdot {\mathrm{r}}_{\text{>0.75}}+3\cdot {\mathrm{e}}_{\text{>0.75}}-4\cdot {\mathrm{e}}_{\text{=0.75}}+\mathrm{x}}{4\cdot \left({\mathrm{e}}_{\text{>0.75}}-{\mathrm{e}}_{\text{=0.75}}\right)}$ linear equation with 𝔻 {x | e=0.75 ≤ x ≤ e>0.75}

2nd negative sector:

${\mathrm{y}}_{\mathrm{PGR}}=-\frac{2\cdot {\mathrm{e}}_{\text{=0.75}}\cdot {\mathrm{r}}_{\text{>0.75}}+2\cdot \mathrm{x}\cdot {\mathrm{r}}_{\text{>0.75}}-{\mathrm{e}}_{\text{>0.75}}-\mathrm{x}}{4\cdot \left({\mathrm{e}}_{\text{>0.75}}-{\mathrm{e}}_{\text{=0.75}}\right)}$ linear equation with 𝔻 {x | -e>0.75 ≤ x ≤ -e=0.75}

3rd positive sector:

${\mathrm{y}}_{\mathrm{PGR}}=-\frac{{{\mathrm{r}}_{\text{>0.75}}}^{\frac{\mathrm{x}}{{\mathrm{e}}_{\text{>0.75}}}}-2}{2}$ exponential equation with 𝔻 {x | e>0.75 ≤ x < ∞}

3rd negative sector:

${\mathrm{y}}_{\mathrm{PGR}}=\frac{1}{2\cdot {{\mathrm{r}}_{\text{>0.75}}}^{\frac{\mathrm{x}}{{\mathrm{e}}_{\text{>0.75}}}}}$ exponential equation with 𝔻 {x | -∞ < x ≤ -e>0.75}

Of course, the probabilistic game results can also be found in the interactive form.
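The five game result equations can be transcribed directly, for example to check that the sectors join up without gaps and that the results for Black mirror those for White. The following is a minimal Python sketch, not the JavaScript of the interactive form; the function name `pgr` and the parameter names `e075`, `e_gt` and `r_gt` (standing for e=0.75, e>0.75 and r>0.75) are hypothetical. Note that, as printed, both the 2nd and the 3rd positive sector equation yield 1 - r>0.75/2 at x = e>0.75, so they require 0 < r>0.75 < 1; under these equations, a probabilistic game result of 0.875 at e>0.75 corresponds to r>0.75 = 0.25.

```python
def pgr(x: float, e075: float, e_gt: float, r_gt: float) -> float:
    """Probabilistic game result from White's point of view (hypothetical helper).

    x     -- engine evaluation in pawn units
    e075  -- e=0.75, the evaluation at which the game result is 0.75
    e_gt  -- e>0.75, the end of the linear 2nd sector
    r_gt  -- r>0.75, parameter of the 2nd and 3rd sectors (0 < r_gt < 1)
    """
    if -e075 <= x <= e075:                       # 1st positive and negative sector
        return 0.5 + x / (4 * e075)
    den = 4 * (e_gt - e075)
    if e075 < x <= e_gt:                         # 2nd positive sector
        return (2*e075*r_gt - 2*x*r_gt + 3*e_gt - 4*e075 + x) / den
    if -e_gt <= x < -e075:                       # 2nd negative sector
        return -(2*e075*r_gt + 2*x*r_gt - e_gt - x) / den
    if x > e_gt:                                 # 3rd positive sector
        return -(r_gt ** (x / e_gt) - 2) / 2
    return 1 / (2 * r_gt ** (x / e_gt))          # 3rd negative sector (x < -e_gt)


if __name__ == "__main__":
    E075, E_GT, R_GT = 1.5, 3.0, 0.25            # illustrative values only
    for x in (-6.0, -3.0, -1.5, 0.0, 1.5, 3.0, 6.0):
        print(f"{x:6.2f} -> {pgr(x, E075, E_GT, R_GT):.4f}")
```

A quick plausibility check of such a transcription: `pgr(x, ...)` and `pgr(-x, ...)` should always add up to 1, and the values on both sides of ±e=0.75 and ±e>0.75 should coincide.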

How not to do it:

Sune Fischer and Pradu Kannan have examined the mathematical relation between 'winning probability W and the pawn advantage P' in the article 'Pawn Advantage, Win Percentage, and Elo'.

Whether 'winning probability' really means the actual (lower) win probability or perhaps only the (higher, since draws are taken into account) probabilistic game result can be deduced from another passage of the article:

'When applying the condition that the win probability is 0.5 if there is no pawn advantage …'

If 'the win probability is 0.5' and the 'pawn advantage' is zero, the loss probability would necessarily also have to be 0.5 for the position to be balanced. But where, then, would the draws be? In a balanced position, win and loss probabilities of 50% each would leave no room for them at all. It seems that the authors' knowledge of the game of chess is quite limited. This nonsense must therefore be corrected to the effect that the authors are not referring to the 'win probability' but to the probabilistic game result discussed in this article, which takes draws and losses into account. This is how the calculation works: a probabilistic game result of 0.5 corresponds to an evaluation (or, if you like, a 'pawn advantage') of 0.00.

'Data was taken from a collection of 405,460 computer games in PGN format. Whenever exactly 5 plys in a game had gone by without captures, the game result was accumulated twice in a table indexed by the material configuration. … Only data pertaining to the material configuration was taken. This was considered reasonable because the material configuration is the most important quantity that affects the result of a game.'

That 'material configuration' refers to the material balance, i.e. the difference between the two sides' piece values, can be assumed, because elsewhere it says:

'For each material configuration, a pawn value was computed using conventional pawn-normalized material ratios that are close to those used in strong chess programs (P=1, N=4, B=4.1, R=6, Q=12).'

Apart from the fact that these piece values seem quite generous, the material balance is very coarse compared with the evaluations of chess engines, which are based on much more subtle criteria and, last but not least, on considerable search depths. But all this would still be bearable if the relation between win probability and material balance presented by the authors were conclusive. Yet an ominous parameter 'K' appears in their final formula:

$\mathrm{W}=\frac{1}{1+{10}^{\frac{-\mathrm{P}}{\mathrm{K}}}}\phantom{\rule{2em}{0ex}}\text{or}\phantom{\rule{2em}{0ex}}{\mathrm{y}}_{\mathrm{PGR}}=\frac{1}{1+{10}^{\frac{-\mathrm{x}}{\mathrm{K}}}}$

And they estimate this parameter 'K' at '4' – roughly.

If you solve this formula for K, you get:

$\mathrm{K}=\frac{\mathrm{ln}\left(10\right)\cdot \mathrm{P}}{\mathrm{ln}\left(-\frac{\mathrm{W}}{\mathrm{W}-1}\right)}\phantom{\rule{2em}{0ex}}\mathrm{or}\phantom{\rule{2em}{0ex}}\mathrm{K}=\frac{\mathrm{ln}\left(10\right)\cdot \mathrm{x}}{\mathrm{ln}\left(-\frac{{\mathrm{y}}_{\mathrm{PGR}}}{{\mathrm{y}}_{\mathrm{PGR}}-1}\right)}$

And if you insert into this formula, for example, the Ps and Ws determined above for the winning engines of TCEC 17 (LCZero) and 18 (Stockfish), you get very different Ks between 1.7 and 3.2. Conversely, a K of 4 with a probabilistic game result of 0.75 would result in an evaluation of 1.91, which is not very realistic according to the above table values. Obviously, it is illusory to try to mathematically force the desired relation into a single sigmoid function with only one parameter ('K'). In contrast, the form 'Interactive Evaluation Relevance Reduction' presented at the beginning of this article works with a total of 5 formulas and 3 parameters to calculate the probabilistic game results. Precision instead of simplification!
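The one-parameter sigmoid and its solution for K are easy to check numerically. This Python sketch uses hypothetical helper names; `math.log(y / (1 - y))` is the same as ln(-W/(W-1)) in the formula above. It reproduces the evaluation of about 1.91 for K = 4 and a game result of 0.75 and, for illustration only, shows the K that an e=0.75 of 1.50 would imply.

```python
import math


def k_from(x: float, y: float) -> float:
    """Solve y = 1 / (1 + 10**(-x/K)) for K; needs x != 0 and 0 < y < 1, y != 0.5."""
    return math.log(10) * x / math.log(y / (1 - y))


def x_from(k: float, y: float) -> float:
    """Evaluation x at which the one-parameter sigmoid reaches game result y."""
    return k * math.log10(y / (1 - y))


def y_from(k: float, x: float) -> float:
    """Game result of the one-parameter sigmoid for evaluation x."""
    return 1 / (1 + 10 ** (-x / k))


if __name__ == "__main__":
    print(round(x_from(4, 0.75), 2))    # 1.91: evaluation for a result of 0.75 at K = 4
    print(round(k_from(1.5, 0.75), 2))  # 3.14: K implied by an illustrative e=0.75 of 1.50
```

Because a single K fixes the whole curve, any pair of evaluation and game result pins it down; different engines or data sets therefore immediately show up as different Ks, which is the point made above.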

It may seem tasteless to derive the move evaluation symbols below more or less automatically from engine evaluations, since these symbols are often chosen on the basis of a deeper understanding of the position rather than of engine evaluations. Example: in a position there is quite clearly only one reasonable move that any child can find, and all other moves would be miserable. It would be more than silly to award this one move the quality mark '‼'. Or, a little more subtly: in a lost position, an objectively weak, i.e. theoretically refutable, move sets a trap that holds out a chance of revival. This is a typical 'interesting move (!?) - NAG 5', which perhaps should not be marked with '?' or the like. Nevertheless, in many cases it can make good sense to determine such move evaluation symbols by comparing the engine evaluations of two alternative moves, especially when there is no opportunity to examine a position more carefully, for example in automatic game analyses.

The intention of Grandmaster Robert Hübner cannot be followed in this way. The English-language Wikipedia quotes him as follows: 'German grandmaster Robert Hübner prefers an even more specific and restrained use of move evaluation symbols: "I have attached question marks to the moves which change a winning position into a drawn game, or a drawn position into a losing one, according to my judgment; a move which changes a winning game into a losing one deserves two question marks ..."' Vague assessments such as 'winning position', 'drawn game', 'drawn position' or 'losing one' do not become any more suitable for programming by adding 'according to my judgment'.

The starting point for classifying the move evaluation symbol is, of course, the move actually played, compared with the best alternative move in the case of bad moves and with the second-best alternative move in the case of good moves.
For these two moves, as explained above, the relevance-reduced evaluation difference has to be determined, and this in turn has to be translated into the move evaluation symbol. For this purpose, the definite integral over the entire evaluation range from -∞ to +∞ is divided not into 6 but into 7 or 8 sectors of equal size. Besides the 6 sectors to which a move evaluation symbol is assigned, there is also the neutral sector of a move that is approximately equal to the best or second-best move. Half of this neutral sector lies in the positive evaluation direction and half in the negative. One can either use a neutral sector with the same integral size as the remaining sectors, or a neutral sector twice as large, consisting of 2 sectors of the usual integral size, one for each evaluation direction. This adds up to either 7 or 8 equal integral areas (in the latter variant, 2 of them form the neutral sector).

Mind you: we are talking here about integral sectors and sizes in the sense of definite integrals, i.e. relevant evaluation differences, not to be confused with the absolute differences between 2 move evaluations on the x-axis. For a given relevant evaluation difference, the latter vary considerably depending on where the move evaluations lie on the x-axis: the further they move away from the y-axis, i.e. from the move evaluation 0.00, the larger their distance from each other becomes. Mathematically it is even possible, based on e=0.75, e>0.75, r>0.75 and a given move evaluation, to calculate the limit value that the evaluation of an alternative move must reach in order to earn a particular move evaluation symbol.

Hard to grasp, so here is an example: given is a faulty move by White with an evaluation of -0.30, an e=0.75 of 1.50, an e>0.75 of 3.00 and a probabilistic game result at e>0.75 of 0.875. From which evaluation onwards would an alternative good move by White deserve the move evaluation symbol '‼' compared with this weak move, which is at the same time the next best move? Depending on the scheme used, the answer is, for example, 1.52 or 1.62.

Of course, such move evaluation symbols only come into effect if correspondingly large definite integrals (pardon: relevant evaluation differences) are available at all. A correct move by White with an engine evaluation of 100.00 will hardly earn a '!?', '!' or '‼', even if the second-best move only reaches 10.00. This positive evaluation difference is simply irrelevant and is therefore reflected in a relevant evaluation difference of almost 0.00: a won position usually cannot be spoiled by playing the second-best move. That is precisely the effect of the evaluation relevance reduction.

How large should the relevant evaluation differences for the move evaluation symbols be? Possibly with the exception of the neutral sector, the entire integral area could be divided into equal parts, or the subdivision could be chosen so that a move already counts as brilliant if it exceeds e=0.75 while the next best move has to make do with an evaluation of 0.00. The first alternative awards move evaluation symbols more sparingly, the second more generously.

Extensive move evaluation symbol scheme 1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7:
Here the relevant evaluation difference between the initial evaluation and the threshold for reaching the move evaluation symbol is:
• brilliant move (‼): 1/14 + 1/7 + 1/7 = 5/14 of the total integral towards a better evaluation
• impressive move (!): 1/14 + 1/7 = 3/14 of the total integral towards a better evaluation
• attractive move (!?): 1/14 of the total integral towards a better evaluation
• questionable move (?!): 1/14 of the total integral towards a suboptimal evaluation
• weak move (?): 1/14 + 1/7 = 3/14 of the total integral towards a suboptimal evaluation
• miserable move (??): 1/14 + 1/7 + 1/7 = 5/14 of the total integral towards a suboptimal evaluation
From this, the thresholds of the move evaluations for White and Black can be calculated with formulas that are not shown here but are available as JavaScript code in the browser inspector. In general, move evaluation symbols are assigned more generously here than in the following scheme '1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8'.

Restrictive move evaluation symbol scheme 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8:
Here the relevant evaluation difference between the initial evaluation and the threshold for reaching the move evaluation symbol is:
• brilliant move (‼): 1/8 + 1/8 + 1/8 = 3/8 of the total integral towards a better evaluation
• impressive move (!): 1/8 + 1/8 = 1/4 of the total integral towards a better evaluation
• attractive move (!?): 1/8 of the total integral towards a better evaluation
• questionable move (?!): 1/8 of the total integral towards a suboptimal evaluation
• weak move (?): 1/8 + 1/8 = 1/4 of the total integral towards a suboptimal evaluation
• miserable move (??): 1/8 + 1/8 + 1/8 = 3/8 of the total integral towards a suboptimal evaluation
In general, move evaluation symbols are assigned less generously here than in the previous scheme '1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7'.

In the interactive form, both scheme tables list the thresholds between the symbols as far as algebra allows, i.e. as far as the margin of relevant evaluation difference remaining after the initial evaluation allows an award. If not, the character string '-----' is output.

Optimum rate:
In the results of the form 'Interactive Evaluation Relevance Reduction' you will also find the 'optimum rate of the suboptimal move evaluation'. It is a precise numerical counterpart to the move evaluation symbol of the suboptimal move evaluation (none, ?!, ?, ??).
It is calculated as a quotient of two definite integrals: for move evaluations of White, the integral from -∞ to the suboptimal move evaluation divided by the integral from -∞ to the better move evaluation; for move evaluations of Black, the integral from +∞ to the suboptimal move evaluation divided by the integral from +∞ to the better move evaluation. It is therefore normally below 100% and reaches the optimum of 100% only in the exceptional case of 2 equal move evaluations.

# Concretisation of the position evaluation sectors

The 9 evaluation sectors listed at the beginning of the article can now be described in more detail using the mathematical foundations developed so far. Four evaluation sectors each are positive and negative, with the balanced sector in between. The balanced position is meant to cover minimal advantages for White and Black around the value zero, the part for White and the part for Black each making up 50% of the total balanced sector.

9 position evaluation sectors with threshold adjustment at the probabilistic game results:
Here an assumption is made that is not mandatory but very plausible: the end of the sector 'moderate advantage for White' and the beginning of the sector 'clear advantage for White' should coincide exactly with the evaluation e=0.75, for which the probabilistic game result amounts to 0.75. Conversely for Black: the end of the sector 'moderate advantage for Black' and the beginning of the sector 'clear advantage for Black' should coincide exactly with the evaluation -e=0.75, for which the probabilistic game result amounts to 0.25 from White's point of view. This basic assumption implies that a slight or moderate advantage probabilistically represents a tendency towards a draw, and a clear or extreme advantage a tendency towards a win. A further assumption: the end of the sector 'clear advantage for White' and the beginning of the sector 'extreme advantage for White' should coincide exactly with the evaluation e>0.75. Conversely for Black: the end of the sector 'clear advantage for Black' and the beginning of the sector 'extreme advantage for Black' should coincide exactly with the evaluation -e>0.75. When using this scheme, it would probably make sense to set the probabilistic game result at e>0.75 to 0.875, so that it lies exactly halfway between 0.75 and 1.00.

Now some mathematics again: the task is to quantify these individual advantage sectors. If, for example, one compared a White move with an overwhelming advantage of 100.00 with a patzer move leading to a draw (0.00), the absolute evaluation difference would be 100.00, but the relevant evaluation difference would only be the practically complete definite integral of all functions in the exclusively positive range of the x-axis (which in turn is identical to the definite integral in the exclusively negative range of the x-axis). By the way, the formula for the complete integral from -∞ to +∞ is:

$\frac{2\cdot {\mathrm{e}}_{\text{>0.75}}\cdot {\mathrm{r}}_{\text{>0.75}}\cdot \mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)-2\cdot {\mathrm{e}}_{\text{=0.75}}\cdot {\mathrm{r}}_{\text{>0.75}}\cdot \mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)+{\mathrm{e}}_{\text{>0.75}}\cdot \mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)+2\cdot {\mathrm{e}}_{\text{=0.75}}\cdot \mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)-4\cdot {\mathrm{e}}_{\text{>0.75}}\cdot {\mathrm{r}}_{\text{>0.75}}}{2\cdot \mathrm{ln}\left({\mathrm{r}}_{\text{>0.75}}\right)}$

Next thought experiment: if one now compared a White move with an advantage of exactly e=0.75, i.e. right at the border between moderate and clear advantage, with a patzer move that leads to a draw (0.00), the absolute evaluation difference would be e=0.75, but the relevant evaluation difference would only be the complete definite integral in the 1st positive sector of the x-axis. As a formula: 0.75 * e=0.75.
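The relevant evaluation difference is a definite integral over an evaluation relevance curve, and the move evaluation symbols and the optimum rate follow from it. The following Python sketch assumes, as a reading of the formulas in this article rather than a quotation of the form's JavaScript, that the evaluation relevance equals 2 · (1 - y_PGR) for positive and 2 · y_PGR for negative evaluations. With this reading, the relevance is 1.00 at 0.00 and 0.5 at e=0.75, the 1st positive sector integrates to 0.75 * e=0.75, and twice the integral from 0 to +∞ agrees with the closed-form total integral above. On that basis, the sketch computes relevant evaluation differences, move evaluation symbol thresholds (found by bisection and returned as None where the form would print '-----') and the optimum rate. All function names and the share dictionaries are hypothetical; for Black, pass the negated share, since a better evaluation for Black is a lower one.

```python
import math


def cum(t: float, e075: float, e_gt: float, r_gt: float) -> float:
    """Integral of the assumed evaluation relevance from 0 to t; odd in t."""
    s, a = math.copysign(1.0, t), abs(t)
    if a <= e075:                                   # relevance 1 - a/(2*e075)
        return s * (a - a * a / (4 * e075))
    g1 = 0.75 * e075                                # integral of the 1st sector
    if a <= e_gt:                                   # relevance falls linearly to r_gt
        d = a - e075
        return s * (g1 + 0.5 * d - (0.5 - r_gt) * d * d / (2 * (e_gt - e075)))
    g2 = g1 + (e_gt - e075) * (0.5 + r_gt) / 2      # plus integral of the 2nd sector
    return s * (g2 + e_gt * (r_gt ** (a / e_gt) - r_gt) / math.log(r_gt))


def half_total(e075: float, e_gt: float, r_gt: float) -> float:
    """Integral from 0 to +infinity (the limit of cum for t -> infinity)."""
    return 0.75 * e075 + (e_gt - e075) * (0.5 + r_gt) / 2 - e_gt * r_gt / math.log(r_gt)


def total_closed_form(e075: float, e_gt: float, r_gt: float) -> float:
    """Integral from -infinity to +infinity, transcribed from the formula above."""
    ln = math.log(r_gt)
    return (2*e_gt*r_gt*ln - 2*e075*r_gt*ln + e_gt*ln + 2*e075*ln - 4*e_gt*r_gt) / (2*ln)


def relevant_diff(a: float, b: float, *p: float) -> float:
    """Relevant evaluation difference between evaluations a and b."""
    return cum(b, *p) - cum(a, *p)


def threshold(start: float, share: float, *p: float):
    """Evaluation whose relevant difference from start is share * total integral.

    share > 0 searches towards higher, share < 0 towards lower evaluations.
    Returns None if the remaining margin does not allow it (the form's '-----').
    """
    half = half_total(*p)
    target = cum(start, *p) + share * 2 * half
    if not -half < target < half:
        return None
    far, step = start, (1.0 if share >= 0 else -1.0)
    for _ in range(200):                            # walk outwards until bracketed
        if (cum(far, *p) - target) * step >= 0:
            break
        far += step
        step *= 2
    lo, hi = min(start, far), max(start, far)
    for _ in range(200):                            # bisection; cum is increasing
        mid = (lo + hi) / 2
        if cum(mid, *p) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def optimum_rate(best: float, sub: float, white: bool, *p: float) -> float:
    """Integral from -inf (White) or +inf (Black) to sub, divided by the one to best."""
    half = half_total(*p)
    if white:
        return (cum(sub, *p) + half) / (cum(best, *p) + half)
    return (half - cum(sub, *p)) / (half - cum(best, *p))


# Shares of the total integral for White; positive = towards a better evaluation
EXTENSIVE = {"‼": 5/14, "!": 3/14, "!?": 1/14, "?!": -1/14, "?": -3/14, "??": -5/14}
RESTRICTIVE = {"‼": 3/8, "!": 1/4, "!?": 1/8, "?!": -1/8, "?": -1/4, "??": -3/8}

if __name__ == "__main__":
    P = (1.5, 3.0, 0.25)                            # illustrative e=0.75, e>0.75, r>0.75
    for sym, share in EXTENSIVE.items():            # White move, reference move at 0.00
        print(sym, threshold(0.0, share, *P))
    print(optimum_rate(1.0, 0.2, True, *P))
```

The bisection only relies on the cumulative integral being strictly increasing, which holds because the assumed relevance is positive everywhere; a closed-form inverse per sector would also be possible but is longer to write out.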
If one now sets out to quantify the definite integrals between x = 0 and the beginning of the slight advantage, between the latter and the beginning of the moderate advantage, and again between the latter and the beginning of the clear advantage, in each case for White/Black, the integral value of 0.75 * e=0.75 would have to be divided into 3 sectors: 20% = 0.15 * e=0.75 for the sector balanced position from 0.00, 40% = 0.30 * e=0.75 for the sector slight advantage for White/Black and 40% = 0.30 * e=0.75 for the sector moderate advantage for White/Black. From this, the thresholds of the position evaluations for White and Black can be calculated with formulas that are not shown here but are available as JavaScript code in the browser inspector.

7 position evaluation sectors with threshold adjustment at the probabilistic game results:
'Extreme advantage for White (+--) or Black (-++) - NAG 20/21' may not be everyone's cup of tea. For them, here is a repetition of the previous proposal, this time with only 7 evaluation sectors and no extremes. Here, the end of the sector 'slight advantage for White' and the beginning of the sector 'moderate advantage for White' coincide exactly with e=0.75, for which the probabilistic game result amounts to 0.75, and the end of the sector 'moderate advantage for White' and the beginning of the sector 'clear advantage for White' coincide exactly with e>0.75. Conversely for Black: the end of the sector 'slight advantage for Black' and the beginning of the sector 'moderate advantage for Black' coincide exactly with -e=0.75, for which the probabilistic game result amounts to 0.25 from White's point of view, and the end of the sector 'moderate advantage for Black' and the beginning of the sector 'clear advantage for Black' coincide exactly with the evaluation -e>0.75. This basic assumption implies that a slight or moderate advantage probabilistically represents a tendency towards a draw and a clear advantage a tendency towards a win. If one sets out here to quantify the definite integrals between x = 0 and the beginning of the slight advantage, and between the latter and the beginning of the moderate advantage, in each case for White/Black, the integral value of 0.75 * e=0.75 would have to be divided into 2 sectors: 1/3 = 0.25 * e=0.75 for the sector balanced position from 0.00 and 2/3 = 0.50 * e=0.75 for the sector slight advantage for White/Black.

9 position evaluation sectors with threshold adjustment at identical evaluation sectors 1/9 1/9 1/9 1/9 1/18 1/18 1/9 1/9 1/9 1/9 of the total integral:
If one discards the above guideline of threshold adjustment at the probabilistic game results and again prefers 4.5 positive or negative position evaluation sectors, but this time of equal size, the evaluation areas turn out as shares of the total integral as follows: 1/18 for the sector balanced position from 0.00, 1/9 for the sector slight advantage for White/Black, 1/9 for the sector moderate advantage for White/Black, 1/9 for the sector clear advantage for White/Black and 1/9 for the sector extreme advantage for White/Black.

7 position evaluation sectors with threshold adjustment at identical evaluation sectors 1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7 of the total integral:
If one discards the above guideline of threshold adjustment at the probabilistic game results and is also no friend of 4.5 positive or negative position evaluation sectors with extremes, this scheme with sectors of equal size remains: 1/14 for the sector balanced position from 0.00, 1/7 for the sector slight advantage for White/Black, 1/7 for the sector moderate advantage for White/Black and 1/7 for the sector clear advantage for White/Black.
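The limit values of the four position evaluation schemes can be sketched in the same way. This Python sketch assumes, as a reading of this article's formulas rather than the form's own code, that the evaluation relevance is 2 · (1 - y_PGR) for positive evaluations, so that its integral from 0 to e=0.75 is 0.75 * e=0.75. Within the 1st sector, the limit for an integral value g then even has the closed form x = 2 · e=0.75 · (1 - sqrt(1 - g / e=0.75)). The function names are hypothetical, the limits are found by bisection, and Black's limits are the same values with a negative sign.

```python
import math


def cum(t: float, e075: float, e_gt: float, r_gt: float) -> float:
    """Integral of the assumed evaluation relevance from 0 to t (t >= 0)."""
    if t <= e075:                                  # relevance 1 - t/(2*e075)
        return t - t * t / (4 * e075)
    if t <= e_gt:                                  # relevance falls linearly to r_gt
        d = t - e075
        return 0.75 * e075 + 0.5 * d - (0.5 - r_gt) * d * d / (2 * (e_gt - e075))
    g2 = 0.75 * e075 + (e_gt - e075) * (0.5 + r_gt) / 2
    return g2 + e_gt * (r_gt ** (t / e_gt) - r_gt) / math.log(r_gt)   # exponential tail


def half_total(e075: float, e_gt: float, r_gt: float) -> float:
    """Integral from 0 to +infinity, i.e. half of the total integral."""
    return 0.75 * e075 + (e_gt - e075) * (0.5 + r_gt) / 2 - e_gt * r_gt / math.log(r_gt)


def inverse(g: float, *p: float) -> float:
    """Evaluation t >= 0 with cum(t) == g, found by bisection (cum is increasing)."""
    if not 0 <= g < half_total(*p):
        raise ValueError("integral value out of range")
    lo, hi = 0.0, 1.0
    while cum(hi, *p) < g and hi < 1e300:          # widen the bracket
        lo, hi = hi, 2 * hi
    for _ in range(200):
        mid = (lo + hi) / 2
        if cum(mid, *p) < g:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def position_thresholds(e075: float, e_gt: float, r_gt: float) -> dict:
    """Positive limit values between neighbouring sectors for the four schemes."""
    p = (e075, e_gt, r_gt)
    total = 2 * half_total(*p)
    return {
        # balanced|slight, slight|moderate, moderate|clear, clear|extreme
        "9 sectors, game results": [inverse(0.15 * e075, *p), inverse(0.45 * e075, *p), e075, e_gt],
        # balanced|slight, slight|moderate, moderate|clear
        "7 sectors, game results": [inverse(0.25 * e075, *p), e075, e_gt],
        # cumulative shares 1/18, 3/18, 5/18, 7/18 of the total integral
        "9 sectors, equal shares": [inverse(k / 18 * total, *p) for k in (1, 3, 5, 7)],
        # cumulative shares 1/14, 3/14, 5/14 of the total integral
        "7 sectors, equal shares": [inverse(k / 14 * total, *p) for k in (1, 3, 5)],
    }


if __name__ == "__main__":
    for name, limits in position_thresholds(1.5, 3.0, 0.25).items():   # illustrative parameters
        print(f"{name}: " + "  ".join(f"{x:.2f}" for x in limits))
```

The two schemes adjusted to the probabilistic game results need the search only for their innermost limits, since e=0.75 and e>0.75 are limits by definition there.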
The interactive form lists the position evaluation symbols and the limit values between them, the latter in a separate line for each of the 4 schemes. A tip, by the way: if the well-disposed reader would like to use the position evaluation symbols but cannot get hold of them, the AqChessUnicode font could help. It is also included with the chess GUI Aquarium. And Windows users who would like to type these special chess characters directly on the keyboard when annotating texts can find out more in 'Keyboard layout for chess annotation with special symbols in Windows programs via AutoHotkey'.

# The human factor

An evaluation of around 1.50 pawn units for an average game result of 0.75 applies to largely optimal chess play, as practised by today's best chess engines, but not necessarily to human chess players, not even to grandmasters, who far too often play bullshit as well and should therefore theoretically make do with a clearly higher e=0.75. The reason would be their tendency to make mistakes, which lets them draw or even lose games they believed were already won. One objection to this, however, is that this measured value would be pushed down again by the mistakes of their opponents of the genus Homo sapiens, because these mistakes often lead to wins that were not necessarily inevitable, in positions that good chess engines might have been able to defend under pressure. In this way, many games that should actually have been draws, with temporarily high evaluations, could end up statistically among the wins without pushing e=0.75 up or, conversely, even lowering it, since with every additional win the next lower evaluation in the ranking moves up to become the new e=0.75. In this respect, suboptimal chess play would be upgraded by the opponent's suboptimal play.
Which of these two human influences on e=0.75 prevails is uncertain. Even if chess grandmasters still had the nerve to face the best chess engines, their true e=0.75 could probably not be determined either. After all, when would they ever have a clear advantage in such games, let alone win them? Maybe in extreme handicap games? These could be used to test how many pawns the computer opponent would have to give up in the starting position for the master, let off so lightly, to scrape together wins and draws on a significant scale. Or how a given opening would have to be constructed to send the chess engine off into a questionable position. In this way, the grandmasterly e=0.75 could be determined after all. But since contemporary chess luminaries have for a long time increasingly avoided such comparisons in order to escape disgrace, the question hardly arises any more. Since game material from matches between man and machine is hardly available, there currently remains (and presumably will remain for all eternity) only the half-baked option of evaluating games between humans, always bearing in mind that the resulting scores are diluted by the dubious play of the opponent. So be it.

No sooner said than done: 144 world championship games between Karpov and Kasparov from the years 1984 to 1990 were analysed. The very last game was left out, since Kasparov agreed a draw with Karpov there despite a clear advantage, although the win was, as chess slang has it, only a matter of technique: a draw was enough for him to win the world championship title. All games were analysed superficially by Stockfish with a short thinking time and an average depth of just over 20 half-moves. To make a long story short: Kasparov won 21 times, Karpov 19 times. The 21 and 19 highest evaluations, respectively, in drawn games were between 3.67 and 1.00 for Kasparov and between 7.80 and 1.04 for Karpov. If you like, you can read from this an e=0.75 of at least 1.00 … 5 games were still botched despite a positive evaluation of at least 1.26. Kasparov even messed up the 18th game of the 1986 world championship match despite a clear 3.67!

# Excursus: "Draw range"

At this point, the term "draw range", which haunts discussions like a ghost from time to time, deserves a little critical scrutiny, because it wrongly suggests that it coincides with the evaluation range "balanced position or draw (=) - NAG 10". To the reader's chagrin, however, quite different understandings of this term come to light.

Variant 1:

"Houdini insists on Rxc6 and gave an evaluation of 0.76 at depth 25, which probably does not yet exceed the draw range." (thread "Endspielkönnen gefragt" by Joe Boden, 2013-02-09 13:03)

"Therefore one believes with Houdini that a (won) endgame is still in the draw range when it shows +0.80..." (Schachfeld).

This suggests that a statement about a drawn outcome of the game could be made on the basis of a low position evaluation by a chess engine. Yet every won game starts small, namely with a minimal advantage, perhaps even right after the first move. And if, after a game won in this way, you put the chess engine on its first moves and let yourself be convinced that the game by no means began with an initial advantage of significantly more than +0.80, you might start to think long and hard. The counter-argument that later mistakes caused the disaster will not work if the patzer is called Stockfish, for example, and has an Elo rating of approximately 3500. Take Stockfish's lost TCEC games to heart. There you will find many games that ended in disaster for this engine despite a negative "draw range" evaluation of about -0.76 or -0.80, although it is not exactly known for handling its positions within the alleged "draw range" carelessly. Who else but Stockfish should be able to hold such positions to a draw?

Variant 2:

"If during a game no side has a winning advantage, it is also said that "the game is within the draw range"." (Wikipedia).

"Draw range
Scope for a position evaluation, which will lead in the end with the best possible play on both sides to a draw. In the example, White is worse, but is still in the draw range, because he can prevent the pawn from promotion with his king. But if he had the idea to play 1.Kh1, e.g. hoping for 1...f2 and stalemate, he would have left the draw range and Black could now force the victory with best play, namely by 1...Kg4 including gain of the opposition. Whether the starting position of the chess game is in the draw range, or whether perhaps White could force the victory, is too complex to be answered." (www.schwachspieler.de).

Here, the term "draw range" is associated with an ominous "scope for a position evaluation" in the case of a draw that can apparently be proven to be forced by certain moves with best play. In connection with a demonstrable draw, however, even uttering the word "range" is a sign of distorted logic. The draw is 0.00, nothing else. In this case, a chess program would have to deliver not only a position evaluation of 0.00 but also one or more drawing variations that are forced according to the rules of logic or according to endgame tablebases. This only works in special positions, in particular in all positions with at most 7 pieces, which have been completely analysed; all others are simply so complex that one has to be content with a position evaluation between zero and checkmate, without being able to draw any compelling conclusions about the outcome of the game. And if a chess program showed a nonsensical evaluation other than 0.00 in a genuinely drawn position, the program would have a bug, and this would not justify the illogical term "draw range".

If, as is usually the case, a draw cannot be proven, one should certainly not use the term "draw range" to make the reader believe in supposed knowledge that nobody can have, given the complexity of chess. Then only statistics and probability theory (the actual topic of this article) govern all considerations about the outcome of the game, and opening databases with win, draw and loss rates for one and the same position can tell a tale or two about that.

Contact: mail@konrod.info