Evaluation Relevance Reduction

Preliminary note:
This article was originally written in German. The English translation is also by the author; its quality cannot be guaranteed.

Move and position evaluations together with NAG and Informator symbols


Chess players usually assess moves and positions on the board using symbols such as the following:

brilliant move (!!) - NAG $3,
impressive move (!) - NAG $1,
attractive move (!?) - NAG $5,
questionable move (?!) - NAG $6,
weak move (?) - NAG $2,
miserable move (??) - NAG $4,

balanced position or draw (=) - NAG $10,
slight advantage for White (⩲ or +/=) - NAG $14,
slight advantage for Black (⩱ or =/+) - NAG $15,
moderate advantage for White (± or +/-) - NAG $16,
moderate advantage for Black (∓ or -/+) - NAG $17,
clear advantage for White (+-) - NAG $18,
clear advantage for Black (-+) - NAG $19,
extreme advantage for White (++-) - NAG $20,
extreme advantage for Black (--+) - NAG $21.

In addition, the unclear position (∝) - NAG $13 should be mentioned. Strictly speaking it does not belong here, because it merely states that a position evaluation is (supposedly) not possible.

Reservation: the above descriptions of these symbols are the author's own and of course not binding. More information can be found here.

By the way, "NAG" stands for "Numeric Annotation Glyphs".

Such move and position assessments are quite practical: they take up little space and reveal a rating range at a glance. The only question is how such evaluations come about. By rule of thumb? Or is there a somewhat more accurate way? It would be real progress if they were defined by the evaluations in pawn units with which chess engines numerically express positional imbalances, that is, positional advantages or disadvantages. But where are such definitions to come from? From which engine evaluation onwards can one speak, for example, of a slight advantage for White: from 0.10 pawn units or from 0.20 - quite apart from the tendency of individual engines to over- or understate their evaluations? And how can an objective scale be found here?

In the further course of this article, several mathematically derived proposals with corresponding formulas will be submitted. Before that, however, various statistical and mathematical foundations have to be worked out.

Blunder relevance or, more politely, evaluation relevance


Chess programs usually rate positions in hundredths of a pawn unit. Comparing the variations the number cruncher outputs for one position reveals the evaluation difference, or margin of error, between the best and an inferior variation.

But how relevant are faulty moves and their evaluations really? Example: in a lost position, after losing the queen without compensation, a player gives away another piece for no reason. The chess program will acknowledge this mishap with a much higher evaluation in favour of the opponent. But how relevant is such a difference between the new and the previous position evaluation in a game that is practically lost already? Objectively - in other words, disregarding possible mistakes by the opponent - not at all! In all probability the bungler could not have saved the game with best play on both sides even without the latest faulty move.

This conclusion raises the question of the evaluation from which a game can objectively be regarded as won or lost. It depends. One could say ironically: the bigger the bungler, the higher. The higher the evaluation, the sooner one can count on the advantage no longer being squandered, and one may place more confidence in today's chess programs than in Homo sapiens. And if you are dealing with a potential patzer, you should not throw in the towel prematurely in an apparently lost position, as Kasparov did in the 2nd match against Deep Blue in 1997.

Computer chess statistics


So what to do? One takes leave of human bungler chess, turns to the strongest computer chess program and analyzes its games, asking from which evaluation onwards this program has won its games - or not. The best program at the moment is Stockfish. It has an Elo rating of about 3400 and is downloadable. The most meaningful games can probably be found on the Internet under "TCEC" ("Top Chess Engine Championship"), "Season 9 - 01.05.2016" and "Superfinal 11.11.2016". Reasons: long thinking time, the opponent was the second-best chess engine "Houdini", and all position evaluations can be followed move by move.

What can be read from the 100 games in all of this epic competition?

Stockfish officially won 17 games, but actually only 16. At the end of the 17th game, in which Stockfish was awarded an undeserved victory, both engines showed an identical evaluation of 0.00 because Stockfish could not win the position within 50 moves. So not 75 games, as officially stated, but 76 ended in draws. In each of these drawn games one can look up the highest evaluation that Stockfish indicated in its own favour - mind you, a positive evaluation that could not be converted into a win. The highest of these evaluations, 1.75, occurred - who would have thought it - in the ominous game 17, which could not be won because of the 50-move rule. Now look for the lowest evaluation above 1.75 that appears in the 16 victories. It is 1.79 and appears in three games: the 43rd, 55th and 83rd. In all 16 wins Stockfish reached at least this evaluation and always won.

By the way, the value 1.75 is confirmed by a small competition in which Stockfish participated under the name Brainfish. The highest evaluation in the 7 draws was 1.71. An amazing coincidence!

Finally, a rumour may be passed on that the ChessBase GUI treats an evaluation of 1.60 as a presumed win. That seems a little too low.

So is the evaluation 1.79 the statistical point of no return (the point from which no return to a draw takes place)? Not at all. A database of just over 100 games is far too small for a reliable judgement. Consider, for example, the games of the TCEC Superfinal of season 8 (6.11.2015) between the engines Stockfish and Komodo. In game 22, move 62, there is a position which Stockfish evaluated at 26.13 (Komodo, by the way, at 4.22). Two moves later both engines remorsefully reduced their evaluations to 0.00 (Stockfish after 2,873,800 kN): the alleged win turned out to be a hallucination, more precisely a perpetual check.

Even if such outliers happen very rarely, they forbid equating any evaluation (even 26.13, as you could see) with a win or a loss. In other words, there is no point of no return.

Now one must turn to the question of where something like a balance between drawn and won games is located. It would have to be an evaluation at which, once reached, the games concerned divide into 50% wins and 50% draws. If you take the TCEC game material from the "Superfinal", in which 16 Stockfish wins were recorded as you know, you will find 16 draws in which an evaluation of at least 0.62 occurs. In other words: Stockfish reached an evaluation of at least 0.62 in 32 games, of which 16 ended in a draw and 16 in 1-0. Put differently again: an evaluation of 0.62 was statistically half the battle for Stockfish.

In June 2017, 150 games with a longer thinking time between "Stockfish 090617 64 POPCNT" and "Komodo 11.01 64-bit" were published here. Stockfish won 25 games, Komodo only 6, so Stockfish was still number 1 at that time. In the 25 draws with the highest Stockfish evaluations, these evaluations ranged between 0.52 (game 43) and 1.85 (game 4). The TCEC results were thus largely confirmed.

In September 2017 the engine "Houdini 6" was released. You can read the following on this website:

"The evaluations have again been calibrated to correlate directly with the win expectancy in the position. A +1.00 pawn advantage gives a 75% chance of winning the game against an equal opponent at blitz time control. At +1.50 the engine will win 90% of the time, and at +2.50 about 99% of the time. To win nearly 50% of the time, you need an advantage of about +0.60 pawn."

"Houdini" kept its word. In the TCEC Superfinal of Season 10 against Komodo, "Houdini" gained 15 victories, and in the 15 draws with the highest "Houdini" evaluations the minimum evaluation was 0.57. An almost precise landing.

Does the win draw balance perhaps fall as the playing strength of the engines rises, in a kind of reciprocal proportionality? If one takes into account the certainly much higher win draw balance in grandmaster chess (cf. the references in the chapter "The human factor"), the idea suggests itself. The future will tell.

Let's note: as the evaluation moves from 0.00 towards infinity (∞), its relevance decreases continuously. It starts at 100%, passes 50% at the value of 0.62 boldly assumed here, and ends at 0% in infinity. What can be done with that now?

One example: the evaluation of the best move is 2.00. Now a mishap happens: a faulty move that loses a piece and is evaluated at -1.00. The evaluation difference is -3.00. How relevant is this loss of a piece? Obviously its relevant value is smaller in magnitude than 3.00.

In detail:
between 2.00 and 0.62 the relevance grows continuously;
at 0.62 it is 50%;
at 0.00 it reaches its maximum value of 100%;
at -0.62 it is again 50% and
at -1.00 it ends with a value well below 50%.

Mathematical relevance reduction


The sum of these individual values would now be interesting. It can be calculated, but this is somewhat complicated. The mathematically adept will long since have recognized that this up and down can be expressed by a mathematical function for which the following applies: the further you move away from the y-axis on either side, the smaller the ordinate, i.e. the reduced evaluation at this point of the x-axis, until the curve finally "touches" the x-axis at infinity on both sides.

Proposal: an exponential function of the general form f(x) = a^(x*b). Such exponential functions have the advantage that they always pass through the point P(0; 1) and, for a suitable base, approach the x-axis towards (positive) infinity. Just what we need. Instead of committing at once to the point P(0.62; 0.5), insert the general point P(x_0_5; 0.5) into the function. The name "x_0_5" was chosen because at this point on the x-axis the evaluation relevance is 50%, i.e. 0.5. The resulting exponential function is

f(x) = 0.5^(x/x_0_5).

For the value x_0_5 = 0.62 this looks as follows:

[Figure: Exponentialfunktion.jpg - graph of f(x) = 0.5^(x/x_0_5) for x_0_5 = 0.62]

This function is not symmetric about the y-axis and for now applies only to x ≧ 0. One could use f(x) = 0.5^(|x|/x_0_5) to make it artificially symmetric about the y-axis, but that form alone does not get us any further yet. More on that later.
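
As a minimal sketch (Python is used here purely for illustration; the function name `relevance` is an assumption and not taken from AquaPGN), the symmetric relevance function can be evaluated at the sample points of the example above:

```python
def relevance(x, x_0_5=0.62):
    """Evaluation relevance f(x) = 0.5^(|x| / x_0_5), a value in (0, 1]."""
    return 0.5 ** (abs(x) / x_0_5)


# Sample points from the 2.00 / -1.00 example; x_0_5 = 0.62 is the
# win draw balance assumed in the article.
for x in (2.00, 0.62, 0.00, -0.62, -1.00):
    print(f"{x:+.2f} -> {relevance(x):.3f}")
```

The values at 0.62 and -0.62 come out at one half and the value at 0.00 at one, as required by the list above.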

The function equation is set. But how is the really relevant evaluation difference calculated over a certain distance on the x-axis, for example between 2.00 and -1.00? The function f(x) only returns the y-value at a single point of the x-axis. The answer is as ingenious as it is simple: by integration. All values between the x-axis and the function curve summed up, i.e. the area there between the best evaluation (for example 2.00) and the inferior evaluation (for example -1.00), form the definite integral of this function - and that is the relevant evaluation difference.

To calculate the integral, the antiderivative of the function equation is required. According to the computer program Maxima it is:

F(x) = -x_0_5/(ln(2)*2^(x/x_0_5)).

However, it can only be used for x ≧ 0. For the absolute-value variant mentioned earlier, f(x) = (1/2)^(|x|/x_0_5), Maxima cannot determine an antiderivative. But that is no big deal.

Note that Maxima itself uses the notation log(x) instead of the usual ln(x) for the natural logarithm. The same applies to JavaScript ("Math.log()"). If Maxima formulas containing "log" are to be used elsewhere, "log" may have to be replaced by "ln".

The y-axis symmetric antiderivative is

F(x) = -x_0_5/(ln(2)*2^(|x|/x_0_5))

and for x_0_5 = 0.62 its graph looks as follows:

[Figure: Integralfunktion.jpg - graph of F(x) = -x_0_5/(ln(2)*2^(|x|/x_0_5)) for x_0_5 = 0.62]


The relevant evaluation difference can then be calculated as a definite integral as follows:

in the exclusively negative x-range (x ≦ 0) with F(integral_start) - F(integral_end) and
in the exclusively positive x-range (x ≧ 0) with F(integral_end) - F(integral_start),
where integral_start is the lower value and integral_end is the higher value on the x-axis - irrespective of whether the values are positive or negative.

The complete formula for calculating the integral in Maxima is:

if integral_start<0 and integral_end<=0 then
(x_0_5*2^(integral_end/x_0_5))/log(2)-(x_0_5*2^(integral_start/x_0_5))/log(2)
elseif integral_start>=0 and integral_end>0 then
x_0_5/(log(2)*2^(integral_start/x_0_5))-x_0_5/(log(2)*2^(integral_end/x_0_5))
else
-(x_0_5*2^(integral_start/x_0_5))/log(2)-x_0_5/(log(2)*2^(integral_end/x_0_5))+(2*x_0_5)/log(2)
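
The case distinction of the Maxima formula can be sketched in Python as follows. This is only an illustrative port, not the AquaPGN code; the names `relevant_difference` and `tail` are assumptions introduced for this example.

```python
import math


def relevant_difference(a, b, x_0_5=0.62):
    """Area under 0.5^(|x|/x_0_5) between the evaluations a and b."""
    start, end = min(a, b), max(a, b)
    c = x_0_5 / math.log(2)  # area over one half-axis, from 0 to infinity

    def tail(x):
        # area from |x| outwards to infinity on the same side of the y-axis
        return c * 2.0 ** (-abs(x) / x_0_5)

    if end <= 0:    # both evaluations on the negative side
        return tail(end) - tail(start)
    if start >= 0:  # both evaluations on the positive side
        return tail(start) - tail(end)
    # the interval straddles 0.00: both half-areas minus the two tails
    return 2 * c - tail(start) - tail(end)


print(round(relevant_difference(2.00, -1.00), 3))
```

For the example used earlier (best move 2.00, faulty move -1.00, x_0_5 = 0.62) the absolute difference of 3.00 shrinks to a relevant difference of roughly 1.40.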


The evaluation relevance reduction and all the delicacies mentioned in this article (automatic move and position evaluation symbols as well as the probabilistic game results) have been implemented in the AquaPGN program (latest update 25 March 2018), available free of charge on this website.

And those who long to put the cryptic formula above to a practical test may enjoy the interactive form below - after first enabling the execution of JavaScript code.

Probabilistic game result


A nice side effect of the above integral calculation is the determination of the probabilistic ("related to probability") game result for each move evaluation.
This requires first of all the definite integral in the range from -∞ to the evaluation of the relevant move (called "x" in the formula) as a dividend. The Maxima formula is

if x<=0 then
(x_0_5*2^(x/x_0_5))/log(2)
else
(-x_0_5/(log(2)*2^(x/x_0_5))+(2*x_0_5)/log(2))

Once again: "log" here means the natural logarithm!

Furthermore, the complete definite integral is required in the entire evaluation range from -∞ to +∞ as a divisor. It is (2*x_0_5)/ln(2) or in Maxima (2*x_0_5)/log(2). Strictly speaking, this is a simplified limit value that is never fully reached even with extreme x values on both sides.

Then the probabilistic game result is calculated as the quotient of the two values. The resulting formula needs no logarithm and is surprisingly slim:

if x<=0 then 2^(x/x_0_5-1) else 1-2^(-x/x_0_5-1)

Written out a little more clearly:
for negative evaluations 2^[(x/x_0_5) - 1] and
for positive evaluations 1 - {2^[(-x/x_0_5) - 1]}

An engine evaluation of exactly 0.00 results in 0.50, thus in a draw. A result of 0.00 would be an almost certain win for Black, a result of 1.00 an almost certain win for White. And an engine evaluation of exactly x_0_5, i.e. the win draw balance, leads to a result of 0.75, a value that lies exactly between a win for White and a draw. The results are therefore to be read from White's point of view.
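
A short Python sketch of this two-branch formula (the function name `game_result` is an assumption made for this illustration):

```python
def game_result(x, x_0_5=0.62):
    """Probabilistic game result from White's point of view (0 .. 1)."""
    if x <= 0:
        return 2.0 ** (x / x_0_5 - 1)
    return 1 - 2.0 ** (-x / x_0_5 - 1)


# 0.00 is the balanced position, +/-0.62 the assumed win draw balance
for x in (-0.62, 0.00, 0.62, 2.00):
    print(f"{x:+.2f} -> {game_result(x):.3f}")
```

The two branches mirror each other: game_result(x) + game_result(-x) is always 1.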

One can therefore conclude that in this simple formula for the probabilistic game result the logarithms cancel out completely.

As an aside: the probabilistic game result could also be approximated quite well with a sigmoid function. A candidate for such a function would be:

f(x) = x/(2*sqrt(3*x_0_5^2+x^2))+1/2
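
How closely this candidate follows the two-branch formula can be checked numerically. The following Python sketch is only an illustration; the function names and the grid from -5.00 to 5.00 in steps of 0.01 are assumptions chosen for this example.

```python
import math


def exact_result(x, x_0_5=0.62):
    """Probabilistic game result from the two-branch formula."""
    if x <= 0:
        return 2.0 ** (x / x_0_5 - 1)
    return 1 - 2.0 ** (-x / x_0_5 - 1)


def sigmoid_result(x, x_0_5=0.62):
    """Sigmoid candidate x/(2*sqrt(3*x_0_5^2 + x^2)) + 1/2."""
    return x / (2 * math.sqrt(3 * x_0_5 ** 2 + x ** 2)) + 0.5


# Largest absolute deviation between the two curves on the grid
grid = [i / 100 for i in range(-500, 501)]
worst = max(abs(exact_result(x) - sigmoid_result(x)) for x in grid)
print(f"largest deviation on [-5, 5]: {worst:.4f}")
```

By construction both functions agree at 0.00 (result 0.5) and at the win draw balance x_0_5 (result 0.75).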

Of course, the probabilistic game results can also be found in the interactive form.

Concretisation of the position evaluation ranges


The 9 evaluation ranges listed at the beginning of the article can now be described in more detail using the mathematical foundations developed above. Four of these ranges are positive and four are negative. The balanced position covers minimal advantages for White and Black around the value zero. The minimal advantage for White, like that for Black, therefore makes up only 50% of the total balanced range.

Proposal for the 9 evaluation ranges - scheme 5/40 5/40 4/40 4/40 2/40 2/40 4/40 4/40 5/40 5/40 of the total integral:

Here an assumption is made that is not mandatory but very plausible: the end of the range "moderate advantage for White" and the beginning of the range "clear advantage for White" should coincide exactly with the x value of the win draw balance, for example 0.62. This value is called "x_0_5" in the formulas above. Conversely for Black: the end of the range "moderate advantage for Black" and the beginning of the range "clear advantage for Black" should coincide exactly with the x value of the negative win draw balance, for example -0.62. With this basic assumption it is clear that a slight or moderate advantage statistically represents a tendency to draw and a clear or extreme advantage statistically represents a tendency to win. By the way, it is now also understandable why 4 advantage ranges for each side (following the NAG conventions) were preferred here to the three that are usually used.

Now some mathematics again:

The task now is to quantify these individual advantage ranges. If, for example, one compares a white move with an overwhelming advantage of 100.00 with a patzer move leading to a draw (0.00), the absolute evaluation difference is 100.00, but the relevant evaluation difference is only the practically complete definite integral over the positive range of the x-axis (which in turn is identical to the definite integral over the negative range). As a formula: x_0_5/ln(2). For x_0_5 = 0.62 this is 0.89447.

Next thought experiment: if one compares a white move with an advantage of x_0_5, exactly at the border between moderate and clear advantage, with a patzer move that leads to a draw (0.00), the absolute evaluation difference is x_0_5, but the relevant evaluation difference is only half of the complete definite integral over the positive range of the x-axis. As a formula: x_0_5/(2*ln(2)). For x_0_5 = 0.62 this is 0.89447 / 2 = 0.4472.

If one now sets out to quantify the definite integrals between x = 0 and the beginning of the slight advantage, between the latter and the beginning of the moderate advantage, and between that and the beginning of the clear advantage, in each case for White/Black, the integral value x_0_5/(2*ln(2)) has to be divided into 3 ranges:

20% = x_0_5/(10*ln(2)) for the range balanced position,
40% = x_0_5/(5*ln(2)) for the range slight advantage for White/Black and
40% = x_0_5/(5*ln(2)) for the range moderate advantage for White/Black.

The following limit values of the position evaluations can now be calculated for White and Black between

balanced position and slight advantage for White/Black:
±(ln(10/9)*x_0_5)/ln(2),
for x_0_5 = 0.62: ±0.094;

slight and moderate advantage for White/Black:
±(ln(10/7)*x_0_5)/ln(2),
for x_0_5 = 0.62: ±0.319;

moderate and clear advantage for White/Black:
±x_0_5 (who would have thought?),
for x_0_5 = 0.62: ±0.62.

The limit value between clear and extreme advantage for White/Black is calculated in a similar way, the only difference being that the second half of the total integral value on the positive/negative x-axis is shared by just two evaluation ranges of equal size.

The limit value is: ±2*x_0_5
and in the case of x_0_5 = 0.62: ±1.24

The last two limit values, ±x_0_5 and ±2*x_0_5, make things extraordinarily easy: they are simply once and twice the assumed win draw balance and are therefore known in practice without extensive calculation.
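
The four limit values can be reproduced with a few lines of Python. The sketch rests on one observation: the integral from 0 to x equals the share 1 - 2^(-x/x_0_5) of the one-sided integral x_0_5/ln(2), so a given share q is reached at x = -x_0_5*log2(1 - q). The function name `boundary` and the labels are assumptions made for this illustration.

```python
import math


def boundary(q, x_0_5=0.62):
    """Evaluation at which the integral from 0 reaches the share q
    (0 <= q < 1) of the one-sided integral x_0_5/ln(2)."""
    return -x_0_5 * math.log2(1 - q)


# First half of the one-sided integral: 20% + 40% + 40%,
# second half: two equal parts. Cumulative shares of the whole side:
shares = [
    ("balanced | slight", 0.10),
    ("slight | moderate", 0.30),
    ("moderate | clear", 0.50),
    ("clear | extreme", 0.75),
]
for name, q in shares:
    print(f"{name}: +/-{boundary(q):.3f}")
```

With x_0_5 = 0.62 this reproduces the limits of about 0.094, 0.319, 0.62 and 1.24 stated above.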

Alternative proposal for the 9 evaluation ranges - scheme 1/9 1/9 1/9 1/9 1/18 1/18 1/9 1/9 1/9 1/9 of the total integral:

If one discards the above guideline, according to which the end of the range "moderate advantage for White/Black" and the beginning of the range "clear advantage for White/Black" coincide exactly with the positive and negative win draw balance, and again prefers 4 positive and 4 negative advantage ranges, this time all of the same size, the formulas for calculating the limit values become somewhat more complicated:

between balanced position and slight advantage for White/Black:
±((ln(9)-3*ln(2))*x_0_5)/ln(2),
for x_0_5 = 0.62: ±0.1053;

between slight and moderate advantage for White/Black:
±((ln(3)-ln(2))*x_0_5)/ln(2),
for x_0_5 = 0.62: ±0.362;

between moderate and clear advantage for White/Black:
±((ln(9)-2*ln(2))*x_0_5)/ln(2),
for x_0_5 = 0.62: ±0.7253;

between clear and extreme advantage for White/Black:
±((ln(9)-ln(2))*x_0_5)/ln(2),
for x_0_5 = 0.62: ±1.3453.



Alternative proposal for only 7 evaluation ranges - scheme 1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7 of the total integral:

"Extreme advantage for White (+--) or Black (-++) - NAG $20/$21" may not be everyone's cup of tea. For these readers here is a mathematical analysis with 3 advantage ranges of equal size each for White and Black. The formulas for calculating the limit values are as follows:

between balanced position and slight advantage for White/Black:
±((ln(7/3)-ln(2))*x_0_5)/ln(2),
for x_0_5 = 0.62: ±0.137;

between slight and moderate advantage for White/Black:
±((ln(7)-2*ln(2))*x_0_5)/ln(2),
for x_0_5 = 0.62: ±0.5005;

between moderate and clear advantage for White/Black:
±((ln(7)-ln(2))*x_0_5)/ln(2),
for x_0_5 = 0.62: ±1.1205.
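
Both equal-size schemes follow one pattern, which the following Python sketch makes explicit. It assumes an odd number n of equal ranges with the balanced range centred on 0.00, so that the k-th limit lies at the share (2k - 1)/n of the one-sided integral; the function name `equal_scheme_limits` is an assumption made for this example.

```python
import math


def equal_scheme_limits(n_ranges, x_0_5=0.62):
    """Positive limit values for n_ranges equal shares of the total
    integral (n_ranges odd, balanced range centred on 0.00)."""
    limits = []
    for k in range(1, (n_ranges - 1) // 2 + 1):
        q = (2 * k - 1) / n_ranges  # share of the one-sided integral
        limits.append(-x_0_5 * math.log2(1 - q))
    return limits


print([round(x, 4) for x in equal_scheme_limits(9)])
print([round(x, 4) for x in equal_scheme_limits(7)])
```

The results agree with the limit values listed for the two alternative proposals up to the last printed digit, where small differences can arise depending on whether a value is rounded or cut off.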

The interactive form lists the position evaluation symbols and the limit values between the symbols, the latter in a separate line for each of the 3 schemes.

A tip by the way: readers who would like to use the position evaluation symbols but cannot get hold of them may find the following link to the AqChessUnicode font helpful. This font is also supplied with the chess GUI Aquarium.

And those who would like to enter these special chess characters directly from the keyboard when annotating texts, and who use a Windows operating system, can find out about the "Keyboard Layout Creator" on this website.

Concretisation of the move evaluation ranges


It may seem tasteless to derive the move evaluation symbols below in a quasi-automated way from engine evaluations, since they are often chosen on the basis of a deeper understanding of the position and are not oriented towards engine evaluations. Example: in a position there is quite clearly only one reasonable move, which every child can find, and all other moves would be miserable. It would be more than silly to award this one move the distinction ‼. Or, a little more subtly: in a lost position, a move that is objectively weak, i.e. theoretically refutable, sets a trap that offers a chance of revival. This is a typical "interesting move (!?) - NAG $5", which should perhaps not be marked with "?" or the like. Nevertheless, in many cases it can make perfect sense to determine such move evaluation symbols by comparing the engine evaluations of two alternative moves, especially when there is no opportunity to examine a position more carefully, for example in automatic game analyses.

The starting point for assigning the move evaluation symbol is, on the one hand, the move actually played and, on the other hand, the best alternative move (for bad moves) or the second-best alternative move (for good moves). For these two moves the relevance-reduced evaluation difference has to be determined, as explained above, and this in turn has to be translated into the move evaluation symbol. For this purpose the definite integral over the entire evaluation range from -∞ to +∞, which is (2*x_0_5)/ln(2), is divided not into 6 but into 7 or 8 areas of equal size. Besides the 6 ranges to which a move evaluation symbol is assigned, there is also the neutral range of a move that is approximately equal to the best or second-best move. Half of this neutral range lies in the positive evaluation direction and half in the negative. One can use either a neutral range with the same integral size as the other ranges, or a neutral range twice as large, consisting of 2 ranges of the usual integral size, one for each evaluation direction. This gives a total of either 7 or 8 integral areas of equal size (in the latter variant, 2 of them form the neutral range).

Mind you: we are talking here about integral ranges and sizes in the sense of definite integrals, i.e. relevant evaluation differences, not to be confused with the absolute differences between 2 move evaluations on the x-axis. For a given relevant evaluation difference, the latter differ considerably depending on where the move evaluations lie on the x-axis. The further they are from the y-axis, i.e. from the move evaluation 0.00, the larger their distance from each other becomes for a given relevant evaluation difference, since the curve, and with it the definite integrals, is flatter there.

Mathematically it is even possible, starting from a certain win draw balance and a given move evaluation (x_old), to calculate the limit value of a new move evaluation (x_new) from which a move would earn a particular move evaluation symbol. This is hard to grasp, so here is an example: given are a faulty move by White with an evaluation of -0.30 and a win draw balance of 0.62. From which evaluation onwards would an alternative good move by White, compared with this weak move that is at the same time the next best one, deserve the move evaluation symbol "!!"? Depending on the scheme used, the answer is for example 0.50, 0.56 or 0.21. The following general Maxima formula is used for this:

if x_old<=0 and (x_0_5*2^(x_old/x_0_5))/log(2)+(2*factor_plus_minus*x_0_5)/log(2) <= x_0_5/log(2) then
(x_0_5*2^(x_old/x_0_5))/log(2)+(2*factor_plus_minus*x_0_5)/log(2) = (x_0_5*2^(x_new/x_0_5))/log(2)
elseif x_old<=0 and (x_0_5*2^(x_old/x_0_5))/log(2)+(2*factor_plus_minus*x_0_5)/log(2) > x_0_5/log(2) then
(x_0_5*2^(x_old/x_0_5))/log(2)+(2*factor_plus_minus*x_0_5)/log(2) = (2*x_0_5)/log(2)-x_0_5/(log(2)*2^(x_new/x_0_5))
elseif x_old>0 and -x_0_5/(log(2)*2^(x_old/x_0_5))+(2*factor_plus_minus*x_0_5)/log(2)+(2*x_0_5)/log(2) <= x_0_5/log(2) then
-x_0_5/(log(2)*2^(x_old/x_0_5))+(2*factor_plus_minus*x_0_5)/log(2)+(2*x_0_5)/log(2) = (x_0_5*2^(x_new/x_0_5))/log(2)
else
-x_0_5/(log(2)*2^(x_old/x_0_5))+(2*factor_plus_minus*x_0_5)/log(2)+(2*x_0_5)/log(2) = (2*x_0_5)/log(2)-x_0_5/(log(2)*2^(x_new/x_0_5))

The "factor_plus_minus" depends on the scheme and on the move evaluation symbol and contains a positive or negative fraction of the total integral, for example for "!!"
1/14 + 1/7 + 1/7 = 5/14,
1/8 + 1/8 + 1/8 = 3/8,
1/20 + 2/20 + 2/20 = 1/4 or
1/12 + 1/12 + 1/12 = 1/4.
With "??" these values would be negative.

In addition to this factor, the concrete values for x_0_5 and x_old must be inserted into the above system of equations so that the applicable alternative equation can be identified. Finally, this equation has to be solved for "x_new", which can confidently be left to Maxima via "solve(equation, x_new), numer". If Maxima is not available, for example because you are working in another program, you can use the following formulas to determine "x_new", provided the applicable alternative equation is known:

for x_old and x_new <= 0:

x_new = (x_0_5*log(2^(x_old/x_0_5)+2*factor_plus_minus))/log(2)

for x_old <= 0 and x_new > 0:

x_new = (x_0_5*log(-1/(2^(x_old/x_0_5)+2*factor_plus_minus-2)))/log(2)

for x_old > 0 and x_new <= 0:

x_new = (x_0_5*log(factor_plus_minus*2^(x_old/x_0_5+1)+2^(x_old/x_0_5+1)-1)-log(2)*x_old)/log(2)

for x_old and x_new > 0:

x_new = (x_0_5*log(-2^(x_old/x_0_5)/(factor_plus_minus*2^(x_old/x_0_5+1)-1)))/log(2)

Once again: Maxima requires the notation log(x) instead of the usual ln(x) for the natural logarithm, as does JavaScript ("Math.log()"). If the above formulas are to be used elsewhere, "log" may therefore have to be replaced by "ln".
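
The four cases can also be handled in one step by working with shares of the total integral: the cumulative share of x_old plus factor_plus_minus gives the cumulative share of x_new, which is then inverted. The following Python sketch illustrates this; it is not the AquaPGN implementation, and the helper names `cumulative` and `inverse_cumulative` are assumptions.

```python
import math


def cumulative(x, x_0_5):
    """Integral from -infinity to x as a share of the total integral."""
    if x <= 0:
        return 2.0 ** (x / x_0_5 - 1)
    return 1 - 2.0 ** (-x / x_0_5 - 1)


def inverse_cumulative(p, x_0_5):
    """Evaluation whose cumulative share equals p (0 < p < 1)."""
    if p <= 0.5:
        return x_0_5 * math.log2(2 * p)
    return -x_0_5 * math.log2(2 * (1 - p))


def x_new(x_old, factor_plus_minus, x_0_5=0.62):
    """Limit evaluation that lies factor_plus_minus (a signed share of
    the total integral) away from x_old, or None if none exists."""
    p = cumulative(x_old, x_0_5) + factor_plus_minus
    if not 0 < p < 1:
        return None  # no finite evaluation reaches this share
    return inverse_cumulative(p, x_0_5)


# The example from the text: x_old = -0.30, symbol "!!"
for factor in (5 / 14, 3 / 8, 1 / 4):
    print(round(x_new(-0.30, factor), 3))
```

For the example with x_old = -0.30 and x_0_5 = 0.62 this yields values close to the 0.50, 0.56 and 0.21 quoted above.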

Of course, such move evaluation symbols only come into play if correspondingly large definite integrals - pardon: relevant evaluation differences - are present at all. A correct move by White with an engine evaluation of 100.00 will hardly earn a "!?", "!" or "!!", even if the second-best move is evaluated at only 10.00. This positive evaluation difference is simply irrelevant, which is reflected in a relevant evaluation difference of almost 0.00. A lost position is and remains lost, even with the best moves. That is precisely the effect of the evaluation relevance reduction.

How large should these relevant evaluation differences for the move evaluation symbols be? Possibly with the exception of the neutral range, the entire integral area could be divided into equal parts. Alternatively, the subdivision could be arranged so that a move already counts as brilliant if its evaluation exceeds the win draw balance while the next best move has to make do with an evaluation of 0.00. The first alternative uses the move evaluation symbols more sparingly, the second more generously.

Proposal for economical move evaluation symbols:

Here, each of the integral areas, which in turn means nothing else than the relevant evaluation differences, amounts to
in the case of 7 ranges (2*x_0_5)/(7*ln(2)), in the event of x_0_5 = 0.62 therefore 0.255, and
in the case of 8 ranges x_0_5/(4*ln(2)), in the event of x_0_5 = 0.62 thus 0.223.

Accordingly, the interactive form below assigns to a move

in the case of scheme 1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7 and 7 equal ranges of the total integral a

“!!” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
(0.5 + 1 + 1) * (2*x_0_5)/(7*ln(2)) = (5*x_0_5)/(7*ln(2)),
in the case of x_0_5 = 0.62 thus 0.638;

“!” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
(0.5 + 1) * (2*x_0_5)/(7*ln(2)) = (3*x_0_5)/(7*ln(2)),
in the case of x_0_5 = 0.62 thus 0.383;

“!?” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
0.5 * (2*x_0_5)/(7*ln(2)) = x_0_5/(7*ln(2)),
in the case of x_0_5 = 0.62 thus 0.127;

“?!” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
0.5 * (2*x_0_5)/(7*ln(2)) = x_0_5/(7*ln(2)),
in the case of x_0_5 = 0.62 thus 0.127;

“?” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
(0.5 + 1) * (2*x_0_5)/(7*ln(2)) = (3*x_0_5)/(7*ln(2)),
in the case of x_0_5 = 0.62 thus 0.383;

“??” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
(0.5 + 1 + 1) * (2*x_0_5)/(7*ln(2)) = (5*x_0_5)/(7*ln(2)),
in the case of x_0_5 = 0.62 thus 0.638;

in the case of scheme 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8 and 8 equal ranges of the total integral a

“!!” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
(1 + 1 + 1) * x_0_5/(4*ln(2)) = (3*x_0_5)/(4*ln(2)),
in the case of x_0_5 = 0.62 thus 0.671;

“!” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
(1 + 1) * x_0_5/(4*ln(2)) = x_0_5/(2*ln(2)),
in the case of x_0_5 = 0.62 thus 0.447;

“!?” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
1 * x_0_5/(4*ln(2)) = x_0_5/(4*ln(2)),
in the case of x_0_5 = 0.62 thus 0.223;

“?!” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
1 * x_0_5/(4*ln(2)) = x_0_5/(4*ln(2)),
in the case of x_0_5 = 0.62 thus 0.224;

“?” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
(1 + 1) * x_0_5/(4*ln(2)) = x_0_5/(2*ln(2)),
in the case of x_0_5 = 0.62 thus 0.447;

“??” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
(1 + 1 + 1) * x_0_5/(4*ln(2)) = (3*x_0_5)/(4*ln(2)),
in the case of x_0_5 = 0.62 thus 0.671.

The advantage of this scheme is that the move evaluation symbol changes from "!?" to "!" exactly at the win draw balance, and from "?!" to "?" exactly at the negative win draw balance, provided the evaluation of the comparative move is 0.00. For example, with x_0_5 = 0.62, a correct move evaluated at 0.62 and a next best move evaluated at 0.00, the move evaluation symbol is "!", whereas a correct move evaluated at 0.61 receives the less euphoric "!?". A further advantage is that the symbol changes from "!" to "!!" exactly at the double win draw balance, and from "?" to "??" exactly at the double negative win draw balance, again provided the evaluation of the comparative move is 0.00.
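These transitions can be checked numerically. The sketch below is not the code behind the interactive form. It assumes that the relevant evaluation difference is the area under the weighting 2^(-|x|/x_0_5) between the two evaluations, an assumption that reproduces the limits of the scheme with 8 equal ranges listed above. All function names are made up for illustration.

```python
import math

X_HALF = 0.62                # win draw balance x_0_5 used in the text
Q = X_HALF / math.log(2)     # half of the total integral a (a = 2*Q)

def reduced(x):
    """Signed area under 2**(-|t|/X_HALF) from 0 to x (assumed weighting)."""
    return math.copysign(Q * (1 - 2 ** (-abs(x) / X_HALF)), x)

def relevant_difference(move_eval, other_eval):
    """Relevant evaluation difference between two evaluations in pawn units."""
    return reduced(move_eval) - reduced(other_eval)

# Limits of the scheme with 8 equal ranges: 3/8, 2/8 and 1/8 of a = 2*Q.
LIMITS = [("!!", 3 * Q / 4), ("!", Q / 2), ("!?", Q / 4)]

def symbol_for_correct_move(move_eval, next_best_eval, eps=1e-12):
    """Return the symbol for a correct move, or "" if no limit is reached."""
    diff = relevant_difference(move_eval, next_best_eval)
    for symbol, limit in LIMITS:
        if diff >= limit - eps:   # eps guards against floating-point noise
            return symbol
    return ""

print(symbol_for_correct_move(0.62, 0.00))  # !
print(symbol_for_correct_move(0.61, 0.00))  # !?
print(symbol_for_correct_move(1.24, 0.00))  # !!
```

Under this assumption, an evaluation of 0.62 against 0.00 yields exactly half of Q, which is the limit x_0_5/(2*ln(2)) for "!", and an evaluation of 1.24 yields three quarters of Q, the limit for "!!".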

By the way, this scheme almost corresponds to the intention of grandmaster Robert Hübner, who is quoted in the English Wikipedia as follows:

“German grandmaster Robert Hübner prefers an even more specific and restrained use of move evaluation symbols: 'I have attached question marks to the moves which change a winning position into a drawn game, or a drawn position into a losing one, according to my judgment; a move which changes a winning game into a losing one deserves two question marks ...'”

Proposal for generous move evaluation symbols:

Here the individual integral areas are distributed quite differently. In the interactive form below, a move is assigned the symbol

in the case of scheme 5/20 2/20 2/20 1/20 1/20 2/20 2/20 5/20 each of the total integral a

“!!” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
(1 + 2 + 2) * x_0_5/(10*ln(2)) = x_0_5/(2*ln(2)),
in the case of x_0_5 = 0.62 thus 0.447;

“!” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
(1 + 2) * (x_0_5)/(10*ln(2)) = (3*x_0_5)/(10*ln(2)),
in the case of x_0_5 = 0.62 thus 0.268;

“!?” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
1 * x_0_5/(10*ln(2)) = x_0_5/(10*ln(2)),
in the case of x_0_5 = 0.62 thus 0.089;

“?!” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
1 * x_0_5/(10*ln(2)) = x_0_5/(10*ln(2)),
in the case of x_0_5 = 0.62 thus 0.089;

“?” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
(1 + 2) * (x_0_5)/(10*ln(2)) = (3*x_0_5)/(10*ln(2)),
in the case of x_0_5 = 0.62 thus 0.268;

“??” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
(1 + 2 + 2) * x_0_5/(10*ln(2)) = x_0_5/(2*ln(2)),
in the case of x_0_5 = 0.62 thus 0.447;

in the case of scheme 3/12 1/12 1/12 1/12 1/12 1/12 1/12 3/12 each of the total integral a

“!!” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
(1 + 1 + 1) * x_0_5/(6*ln(2)) = x_0_5/(2*ln(2)),
in the case of x_0_5 = 0.62 thus 0.447;

“!” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
(1 + 1) * x_0_5/(6*ln(2)) = x_0_5/(3*ln(2)),
in the case of x_0_5 = 0.62 thus 0.298;

“!?” - if between its evaluation and the evaluation of the next best move there is a relevant evaluation difference of at least
1 * x_0_5/(6*ln(2)) = x_0_5/(6*ln(2)),
in the case of x_0_5 = 0.62 thus 0.149;

“?!” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
1 * x_0_5/(6*ln(2)) = x_0_5/(6*ln(2)),
in the case of x_0_5 = 0.62 thus 0.149;

“?” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
(1 + 1) * x_0_5/(6*ln(2)) = x_0_5/(3*ln(2)),
in the case of x_0_5 = 0.62 thus 0.298;

“??” - if between its evaluation and the evaluation of the best move there is a relevant evaluation difference of at least
(1 + 1 + 1) * x_0_5/(6*ln(2)) = x_0_5/(2*ln(2)),
in the case of x_0_5 = 0.62 thus 0.447.

In the interactive form, the limit values between the symbols are listed after the lines naming the schemes, wherever the algebra yields a value. If it does not, the character string "------" is output. The limit values appear in the order of the move evaluation symbols !! ! !? ?! ? ??, with the correct or faulty move as the starting point.
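The limit values quoted for the schemes follow mechanically from the scheme fractions. As a minimal sketch (the function name and the parsing of the scheme string are illustrative assumptions, not the code behind the interactive form), each limit is the sum of the fractions between the centre of the scheme and the boundary in question, multiplied by the total integral a = 2*x_0_5/ln(2):

```python
import math
from fractions import Fraction

def limits(scheme, x_half=0.62):
    """Return the limits for !? (?!), ! (?) and !! (??) as relevant differences."""
    a = 2 * x_half / math.log(2)             # total integral a
    parts = [Fraction(p) for p in scheme.split()]
    upper = parts[len(parts) // 2:]          # ranges from the centre outwards
    if len(parts) % 2:                       # odd count: centre range straddles 0.00
        upper[0] /= 2
    total, result = Fraction(0), []
    for part in upper[:-1]:                  # the outermost range has no outer limit
        total += part
        result.append(round(float(total) * a, 3))
    return result

print(limits("5/20 2/20 2/20 1/20 1/20 2/20 2/20 5/20"))  # [0.089, 0.268, 0.447]
print(limits("3/12 1/12 1/12 1/12 1/12 1/12 1/12 3/12"))  # [0.149, 0.298, 0.447]
```

The schemes are symmetric, so the same three values serve as limits for the faulty move. The same function also handles schemes with an odd number of ranges, where only half of the centre range lies on either side of 0.00.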

The human factor


Mind you, the win draw balance of about 0.62 or 0.60 applies to the largely optimal play that chess engines like Stockfish practise today, but not to human chess players, not even to grandmasters, who far too often play rubbish as well and would therefore have to make do with a much higher win draw balance. The objection that this value would be pushed down again by the blunders of their opponents of the species Homo sapiens does not hold: suboptimal play cannot be upgraded by an opponent's suboptimal play. Human chess players would have to face the best chess engines to determine their true win draw balance. Grandmasters increasingly avoid such comparisons in order to escape disgrace. Wins would be out of the question and even draws would be rare. Only handicap games would remain. They could be used to test how many pawns would have to be removed from the computer opponent in the initial position to allow the master, who keeps all his material, wins and draws to any substantial extent. In this way, the grandmaster's win draw balance could be determined after all.

Since such game material from matches between man and machine is hardly available, the only option left at present, and presumably for all time, is the half-baked one of evaluating games between humans. One should always keep in mind that the resulting values are diluted by the dubious play of the opponent. Forget it.

No sooner said than done: an analysis of the 144 world championship games between Karpov and Kasparov in the years 1984 to 1990. The very last game is left out of consideration, since Kasparov agreed a draw with Karpov there despite a clear advantage, although the win was, as chess slang has it, only a question of technique. A draw was enough for him to win the world championship title. All games were analysed superficially by Stockfish with a short thinking time and an average depth of just over 20 half-moves.

To make a long story short: Kasparov won 21 times, Karpov 19 times. The 21 and 19 highest evaluations in drawn games, respectively, were between 3.67 and 1.00 for Kasparov and between 7.80 and 1.04 for Karpov. If you like, you can read from this a win draw balance of at least 1.00 …
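One possible reading of this procedure, sketched under assumptions: for a player with a given number of wins, take the same number of highest evaluations he reached in drawn games, and use the smallest of them as a lower bound for the win draw balance. The function name is hypothetical, and the numbers in the example are invented toy values, not the Karpov and Kasparov data.

```python
def win_draw_balance_lower_bound(wins, peak_evals_in_draws):
    """Smallest of the `wins` highest peak evaluations reached in drawn games."""
    if wins < 1:
        raise ValueError("need at least one win")
    top = sorted(peak_evals_in_draws, reverse=True)[:wins]
    return min(top)

# Hypothetical toy numbers, NOT taken from the analysed games:
peaks = [2.10, 0.35, 1.40, 0.80, 0.15, 1.05, 0.60]
print(win_draw_balance_lower_bound(3, peaks))  # 1.05
```

Applied to the figures quoted above, this reading would give 1.00 for Kasparov (21 wins) and 1.04 for Karpov (19 wins).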

Oh yes, another peculiarity that hardly ever comes to light in Stockfish games: in 5 games the game was still thrown away despite a positive evaluation of at least 1.26. And 1.26 means at least a clear advantage according to the above specifications, though probably not so much for grandmasters. Kasparov even messed up the 18th game of the 1986 world championship match despite a clear 3.67! I’d like to see anyone else do that. So it would be better here to replace the term “win draw balance” with “win draw/loss balance”.









[Interactive form; the input fields and computed values are not reproduced here. Its sections are:]

input - each in pawn units in the format "(-##)#(.##)"

results: evaluation differences

probabilistic game results
 in the case of correct move
 in the case of faulty move

position evaluation symbols: +–– +– ± ⩲ = ⩱ ∓ –+ –++
 scheme 5/40 5/40 4/40 4/40 4/40 4/40 4/40 5/40 5/40
 scheme 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9
 scheme 1/7 1/7 1/7 1/7 1/7 1/7 1/7

move evaluation symbols: !! ! !? ?! ? ??
 2 economical schemes in the case of correct move
 2 generous schemes in the case of correct move
 2 economical schemes in the case of faulty move
 2 generous schemes in the case of faulty move







Excursus: "Draw range"


At this point the term "draw range", which haunts the discussion like a ghost from time to time, should be scrutinized a little. It wrongly suggests that it coincides with the evaluation range "balanced position or draw (=) - NAG $10". To the reader's chagrin, however, quite different understandings of this term come to light.

Variant 1:

"Houdini insists on Rxc6 and gave at depth 25 an evaluation of 0.76, which probably does not exceed the draw range yet." (thread "Endspielkönnen gefragt" by Joe Boden, dated 2013-02-09 13:03)

"Therefore one believes with Houdini that a (won) endgame is still in the draw range when it shows +0.80..." (Schachfeld).

This suggests that a low position evaluation by a chess engine permits a statement about a drawn outcome of the game. Clearly, every won game starts small, namely with a minimal advantage, maybe even after the first move. If you set the chess engine on the first moves of a game won in this way and see for yourself that the game by no means began with an initial advantage of significantly more than +0.80, you might start to think long and hard. And the counter-argument that later mistakes caused the disaster does not hold if the patzer is called Stockfish, for example, and has an Elo rating of approximately 3500. Take a look at the TCEC games that Stockfish lost. There you will find many games that ended in disaster for this engine despite a negative "draw range" of about -0.76 or -0.80, although it is not known for handling its positions negligently within the alleged "draw range". Who else but Stockfish should be able to hold such positions to a draw?

Variant 2:

"If during a game no side has a winning advantage, it is also said that "the game is within the draw range"." (Wikipedia).

"Draw range
Scope for a position evaluation, which will lead in the end with the best possible play on both sides to a draw. In the example, White is worse, but is still in the draw range, because he can prevent the pawn from promoting with his king. But if he had the idea to play 1.Kh1, e.g. hoping for 1...f2 and stalemate, he would have left the draw range and Black could now force the victory with best play, namely by 1...Kg4 including gain of the opposition. Whether the starting position of the chess game is in the draw range, or whether perhaps White could force the victory, is too complex to be answered." (www.schwachspieler.de).

Here, the term "draw range" is associated with an ominous "scope for a position evaluation" in the case of a draw that can be forced by certain moves with best play and can apparently be proven. In connection with a provable draw, however, even uttering the word "range" is a sign of distorted logic. The draw is 0.00, nothing else. In this case a chess program would have to deliver not only a position evaluation of 0.00, but also one or more drawing variations that are compelling according to the rules of logic or according to endgame tablebases. This only works in special positions, in particular in all positions with at most 7 men, which are completely analysed. All others are simply so complex that one has to be content with a position evaluation between zero and checkmate without being able to draw any compelling conclusions about the outcome of the game. And if a chess program showed a rubbish evaluation differing from 0.00 in a genuinely drawn position, the program would have a code problem, and this would not justify the illogical term "draw range".

If, as usual, a draw cannot be proven, one should certainly not use the term "draw range" to lead the reader to believe in would-be knowledge that one cannot have in view of the complexity of a chess game. Then only statistics (the actual topic of this article) apply to all considerations about the outcome of the game, and opening databases with win, draw and loss rates for one and the same position can tell a tale about it.






 

Finito ♦ Aus die Maus ♦ Schicht im Schacht ♦ Klappe zu - Affe tot

So long ♦ See You Later, Alligator - In A While, Crocodile ♦ Over And Out