Evaluation Relevance Reduction


Preliminary note:
This article was originally written in German. The English translation is also by the author, and its quality cannot be guaranteed.

Update from February 2025:
new input fields for the piece values of the position to be analysed, used for calculating the WDL evaluations;
the old input fields 'move number ...' and 'minimum evaluation ...' are no longer required;
the Stockfish engine calculates the WDL values from the position evaluation and the sum of the piece values; this web programme now does the same if the user does not enter sufficient WDL values;
the user can supply the piece value sum in 3 different ways: as a total calculated by the user, as the number of pieces of each type, or by inserting the position FEN (the first FEN part up to the first space is sufficient);
the 'minimum evaluation > 0 for White win = 100%', which previously had to be entered to calculate the WDL evaluations, is now calculated automatically from the piece value sum alone and displayed under 'data of the engine WDL statistic'; from this evaluation onwards, positions are statistically considered 100% won for White (lost for Black), and from its negative counterpart onwards 100% lost for White (won for Black); almost all WDL analyses in the programme are based on this value and gain significantly in precision through it.

Notes on the form:

Not all 18 input fields have to be filled in. If the programme lacks required information, it displays error messages in flashing red.

Alternative move evaluation(s) for ⊙ White ⊚ Black:
Important for the analysis of the high and the suboptimal move evaluation.
High 'move evaluation (-##)#(.##)':
Between -999.99 and 999.99.
Suboptimal 'move evaluation (-##)#(.##)':
Between -999.99 and 999.99. If 'White' is selected, this value must be lower than the high move evaluation; if 'Black' is selected, it must be numerically higher, i.e. further to the right on the horizontal x-axis.
2 x 3 win/draw/loss percent values:
The percent values between 0 and 100 following each of the two move evaluations are not required if a valid entry is made for one of the alternatives in the 'position figure values - 3 input alternatives' area. If the programme does not find valid percent values, it calculates the 3 win/draw/loss percent values automatically. 2 percent values are sufficient; the programme calculates the third. Percent values that sum to more than 100 or lead to an evaluation relevance outside the range 0 to 1 are rejected.
'position figure values of the highest evaluation in terms of amount':
If 2 evaluations are entered, the programme requires the piece value sum of the position for which an engine calculated the evaluation with the highest absolute value. Example: for 2 evaluations of -1 and -2, enter the piece value sum of the position that the engine evaluated with -2.
'sum figure values for calculating the WDL analyses (##)# (q = 9 / r = 5 / b = 3 / n = 3 / p = 1)':
Here the user calculates the sum of the piece values for White and Black together, using the following values:
queen = 9
rook = 5
bishop = 3
knight = 3
pawn = 1
The starting position has a piece value sum of 78.
The Stockfish engine limits the piece value sum to a maximum of 78 and a minimum of 17, as does the programme.
If the number entered in this first alternative is valid, i.e. greater than 0 and less than or equal to 206, the programme ignores the two other input alternatives for the piece value sum. Otherwise, it checks the second input alternative.
'alternatively: queens ... rooks ... bishops ... knights ... pawns':
In this alternative the user enters the number of pieces of each type for White and Black. The entry '0' or an empty field is permitted for individual piece fields, but not for all of them.
Again, if the entries in this second alternative are valid, the programme ignores the third input alternative for the piece value sum. Otherwise it checks the third input alternative.
'alternatively FEN board position insertion':
In this alternative the user inserts the FEN of the position to be analysed; the first FEN part up to the first space is sufficient. To be valid, this FEN part must consist of 8 rank descriptions (letters/digits) separated by 7 '/'.
Chess programs usually provide a function that copies the FEN of a given position to the clipboard, from where it can be pasted into the FEN field with Ctrl-V.
'minimum evaluation > 0 for White win = 100% for calculating the WDL analyses #(.##)':
According to the update note above, this value is no longer entered by the user but calculated by the programme from the piece value sum. Empirically it was around 2.35 when using the Stockfish engine; from this evaluation onwards, positions are statistically considered 100% won, and from its negative counterpart onwards 100% lost; almost all WDL analyses in the programme are based on this value and gain significantly in precision through it.
'evaluation at 0.75-game-res.-probab. (e_=0.75) (##)#(.##)' (abbreviated 'e_=0.75'):
This is the evaluation from White's point of view on the horizontal x-axis ('x') at which the average probabilistic game result is 0.75(:0.25) in favour of White. In the initial version of this article it was referred to as 'win draw balance'. At this point the evaluation relevance on the vertical y-axis is 0.5.
'evaluat. at 0.75+-game-res.-prob. (e_>0.75) > e_=0.75 (##)#(.##)' (abbreviated 'e_>0.75'):
This is the evaluation from White's point of view on the horizontal x-axis ('x') at which the average probabilistic game result is higher than 0.75(:0.25) in favour of White. This evaluation is higher than the previous one. Compared to the initial form, it is a new parameter that provides additional precision. It is paired with the last parameter below. At this point the evaluation relevance on the vertical y-axis is less than 0.5.
'1.00 > 0.75-plus-game-result-probability > 0.75 0.#(####)' = 1 - (r_>0.75 / 2):
This is the average probabilistic game result in favour of White on the vertical y-axis ('y') at the previously entered evaluation 'e_>0.75'. It lies above 0.75(:0.25) and is likewise a new parameter compared to the initial form, providing additional precision.
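
All three input alternatives lead to one number, the piece value sum. As a rough illustration, the following Python sketch derives it from the first FEN part, using the piece values listed above and the limits of 17 and 78 mentioned for Stockfish and the programme. The function name, the `clamp` switch and the error handling are assumptions made for this example, not the programme's actual code.

```python
PIECE_VALUES = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}  # kings count as 0


def piece_value_sum(fen: str, clamp: bool = True) -> int:
    """Sum the piece values of both sides from a FEN string.

    Only the first FEN part (the board) is used. Hypothetical helper,
    not the programme's own implementation.
    """
    board = fen.split()[0]
    ranks = board.split("/")
    if len(ranks) != 8:
        raise ValueError("board part must contain 8 ranks separated by 7 '/'")
    # Digits (empty squares), '/' and kings contribute nothing.
    total = sum(PIECE_VALUES.get(ch.lower(), 0) for ch in board)
    if clamp:
        # Limits stated in the notes: at most 78, at least 17.
        total = max(17, min(78, total))
    return total


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(piece_value_sum(START))  # 78
```

With `clamp=False` the raw sum is returned, which is convenient for checking an entry by hand.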

Displaying the results requires permission to execute JavaScript code in the browser.

Interactive evaluation relevance reduction ('ERR')
Update February 2025

Move evaluation(s) for White / Black

Inputs

Move evaluations and any engine WDL data

high move evaluation: move evaluation (-##)#(.##); win % according to engine WDL statistic 100/(#)#(.#); draw % according to engine WDL statistic 100/(#)#(.#); loss % according to engine WDL statistic 100/(#)#(.#)

suboptimal move evaluation - worse than high move evaluation: move evaluation (-##)#(.##); win % according to engine WDL statistic 100/(#)#(.#); draw % according to engine WDL statistic 100/(#)#(.#); loss % according to engine WDL statistic 100/(#)#(.#)

position figure values - 3 input alternatives: sum figure values for calculating the WDL analyses (##)# (q = 9 / r = 5 / b = 3 / n = 3 / p = 1)
alternatively: queens, rooks, bishops, knights, pawns
alternatively: FEN board position insertion

Parameters of the user evaluation relevance reduction

0.75-game-result-probability: evaluation > 0; evaluation at 0.75-game-res.-probab. (e_=0.75) (##)#(.##)

0.75-plus-game-result-probability: evaluation and 0.75-plus; evaluat. at 0.75+-game-res.-prob. (e_>0.75) > e_=0.75 (##)#(.##); 1.00 > 0.75-plus-game-result-probability > 0.75 0.#(####)

Results:

data of the engine WDL statistic
evaluation	colour	source	win %	draw %	loss %	figure values	zero	⇐ -zero ⇔ 0 ⇔ zero ⇒

evaluation comparison user/engine ERR
probabilistic game result = 0.75 evaluation relevance = 0.50			probabilistic game result = ? evaluation relevance = ?
user ERR	engine WDL ERR	engine WDL ERR ∅	user ERR	engine WDL ERR	engine WDL ERR ∅

evaluation differences
absolute evaluation difference
relevant evaluation difference with user ERR
relevant evaluation difference with engine WDL ERR

probabilistic game results
	user ERR	user ERR
	high evaluation	suboptimal evaluation
from White's point of view
from Black's point of view

optimum rate of the suboptimal move evaluation
user ERR
engine WDL ERR
move evaluation symbols (‼ ! !? ?! ? ??) and thresholds
extensive scheme: 1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7
colour	high evaluation		suboptimal evaluation
	user ERR	engine WDL ERR	user ERR	engine WDL ERR
move evaluation symbol
threshold ! ⇒ ‼
threshold !? ⇒ !
threshold ./. ⇒ !?
threshold ?! ⇐ ./.
threshold ? ⇐ ?!
threshold ?? ⇐ ?

move evaluation symbols (‼ ! !? ?! ? ??) and thresholds
restrictive scheme: 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8
colour	high evaluation		suboptimal evaluation
	user ERR	engine WDL ERR	user ERR	engine WDL ERR
move evaluation symbol
threshold ! ⇒ ‼
threshold !? ⇒ !
threshold ./. ⇒ !?
threshold ?! ⇐ ./.
threshold ? ⇐ ?!
threshold ?? ⇐ ?

position evaluation symbols and thresholds with user/engine WDL ERR
threshold adjustment to identical position evaluation sectors
9 sectors: 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9
	position evaluation symbols
colour	high evaluation		suboptimal evaluation
	user ERR	engine WDL ERR	user ERR	engine WDL ERR
for evaluation(s) (White/Black irrelevant)
	thresholds
	user ERR		engine WDL ERR
clear/extreme advantage for White (+– ⇒ ++–)
moderate/clear advantage for White (± ⇒ +–)
slight/moderate advantage for White (⩲ ⇒ ±)
equality/slight advantage for White (= ⇒ ⩲)
equality/slight advantage for Black (⩱ ⇐ =)
slight/moderate advantage for Black (∓ ⇐ ⩱)
moderate/clear advantage for Black (–+ ⇐ ∓)
clear/extreme advantage for Black (––+ ⇐ –+)

position evaluation symbols and thresholds with user/engine WDL ERR
threshold adjustment to identical position evaluation sectors
7 sectors: 1/7 1/7 1/7 1/7 1/7 1/7 1/7
	position evaluation symbols
colour	high evaluation		suboptimal evaluation
	user ERR	engine WDL ERR	user ERR	engine WDL ERR
for evaluation(s) (White/Black irrelevant)
	thresholds
	user ERR		engine WDL ERR
moderate/clear advantage for White (± ⇒ +–)
slight/moderate advantage for White (⩲ ⇒ ±)
equality/slight advantage for White (= ⇒ ⩲)
equality/slight advantage for Black (⩱ ⇐ =)
slight/moderate advantage for Black (∓ ⇐ ⩱)
moderate/clear advantage for Black (–+ ⇐ ∓)
position evaluation symbols and thresholds
with user ERR: threshold adjustment to probabilistic game results
with engine WDL ERR: threshold adjustment to both evaluations
9 sectors
	position evaluation symbols
colour	high evaluation		suboptimal evaluation
	user ERR	engine WDL ERR	user ERR	engine WDL ERR
for evaluation(s) (White/Black irrelevant)
	thresholds
	user ERR		engine WDL ERR
clear/extreme advantage for White (+– ⇒ ++–)
moderate/clear advantage for White (± ⇒ +–)
slight/moderate advantage for White (⩲ ⇒ ±)
equality/slight advantage for White (= ⇒ ⩲)
equality/slight advantage for Black (⩱ ⇐ =)
slight/moderate advantage for Black (∓ ⇐ ⩱)
moderate/clear advantage for Black (–+ ⇐ ∓)
clear/extreme advantage for Black (––+ ⇐ –+)

position evaluation symbols and thresholds
with user ERR: threshold adjustment to probabilistic game results
with engine WDL ERR: threshold adjustment to both evaluations
7 sectors
	position evaluation symbols
colour	high evaluation		suboptimal evaluation
	user ERR	engine WDL ERR	user ERR	engine WDL ERR
for evaluation(s) (White/Black irrelevant)
	thresholds
	user ERR		engine WDL ERR
moderate/clear advantage for White (± ⇒ +–)
slight/moderate advantage for White (⩲ ⇒ ±)
equality/slight advantage for White (= ⇒ ⩲)
equality/slight advantage for Black (⩱ ⇐ =)
slight/moderate advantage for Black (∓ ⇐ ⩱)
moderate/clear advantage for Black (–+ ⇐ ∓)
extreme evaluations in the positive/negative range
user ERR	irrelevance start evaluations
engine WDL ERR	start evaluations with evaluation relevance = 0

Graph of the user ERR:

Graph of the engine WDL ERR:

Move and position evaluations
together with NAG and Informator symbols

Chess players usually annotate moves and positions on the board with symbols such as the following:

brilliant move (‼) - NAG $3,
impressive move (!) - NAG $1,
attractive move (!?) - NAG $5,
questionable move (?!) - NAG $6,
weak move (?) - NAG $2,
miserable move (??) - NAG $4,

balanced position or draw (=) - NAG $10,
slight advantage for White (⩲ or +/=) - NAG $14,
slight advantage for Black (⩱ or =/+) - NAG $15,
moderate advantage for White (± or +/-) - NAG $16,
moderate advantage for Black (∓ or -/+) - NAG $17,
clear advantage for White (+-) - NAG $18,
clear advantage for Black (-+) - NAG $19,
extreme advantage for White (++-) - NAG $20,
extreme advantage for Black (--+) - NAG $21.

In addition, the unclear position (∝) - NAG $13 should be mentioned. Strictly speaking it does not belong here, because it merely states that a position evaluation is (supposedly) not possible.

Caveat: the above descriptions of these symbols are the author's own and of course not binding.

Incidentally, 'NAG' stands for 'Numeric Annotation Glyph'.
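
For readers who process PGN files, the list above can be held in a small lookup table. The following is only a sketch: the dictionary reproduces the codes and symbols given in this article, while the function name and the pass-through behaviour for unknown codes are assumptions chosen for this example.

```python
# NAG codes and symbols exactly as listed in the article above.
NAG_SYMBOLS = {
    1: "!", 2: "?", 3: "‼", 4: "??", 5: "!?", 6: "?!",
    10: "=", 13: "∝",
    14: "⩲", 15: "⩱", 16: "±", 17: "∓",
    18: "+-", 19: "-+", 20: "++-", 21: "--+",
}


def nag_to_symbol(token: str) -> str:
    """Translate a PGN token such as '$14' into its symbol.

    Tokens that are not a known NAG are returned unchanged
    (a design choice of this sketch).
    """
    number = token[1:].strip()
    if token.startswith("$") and number.isdigit():
        return NAG_SYMBOLS.get(int(number), token)
    return token


print(nag_to_symbol("$14"))  # ⩲
```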

Such move and position assessments are quite practical: they take up little space and reveal an evaluation range at a glance. The only question is how such assessments come about. By rule of thumb? Or is there a more accurate way? It would be real progress if they were defined by chess engine evaluations in pawn units, with which engines numerically express positional imbalances, that is, advantages or disadvantages. But where should such definitions come from? From which engine evaluation onwards can one, for example, speak of a slight advantage for White: from 0.10 pawn units or from 0.20 (leaving aside the tendency of individual engines to over- or understate their evaluations)? And how can an objective scale be found?

Later in this article, several mathematically derived proposals with corresponding formulas are presented. First, however, some statistical and mathematical foundations have to be laid.

Blunder relevance or, more politely, evaluation relevance

Chess programs usually rate positions in hundredths of a pawn unit. Comparing the variations an engine outputs for a position reveals the evaluation difference, i.e. the margin of error, between the best and an inferior variation.

But how relevant are faulty moves and their evaluations really? Example: in a position already lost after losing the queen without compensation, a player gives away another piece for no reason. The chess program will acknowledge this mishap with a much higher evaluation in favour of the opponent. But how relevant is such a difference between the new and the previous position evaluation in a game that is practically lost anyway? Objectively, that is, disregarding possible mistakes by the opponent, not at all! With best play on both sides, the bungler would in all probability not have saved the game even without the latest mistake.

To take it to the extreme: what is the threshold at which a game can objectively be considered won or lost? That depends. One could say ironically: the weaker the players, the higher the threshold. The higher the evaluation, the more one can rely on the advantage no longer being squandered, and today's chess engines deserve more confidence in this respect than Homo sapiens. And if you are dealing with a potential patzer, you should not throw in the towel prematurely in an apparently lost position, as Kasparov once did in the 2nd match against Deep Blue in 1997.

Computer chess statistics

So what to do? One takes leave of error-prone human chess and turns to the strongest chess engines. In principle, two paths are open:

Engine WDL ERR:

Since mid-2020, Stockfish has provided win/draw/loss ("WDL") probabilities alongside the actual evaluations. In the words of the Stockfish development team:

'Stockfish's "centipawn" evaluation is decoupled from the classical value of a pawn, and is calibrated such that an advantage of "100 centipawns" means the engine has a 50% probability to win from this position in selfplay at fishtest LTC time control.
If the option UCI_ShowWDL is enabled, the engine will show Win-Draw-Loss probabilities alongside its "centipawn" evaluation. These probabilities depend on the engine's evaluation and the material left on the board …'

These WDL statistics or probabilities reflect the stage of the game in that they take into account the piece value sum of the position to be analysed; the Stockfish engine limits this sum to a maximum of 78 and a minimum of 17. The underlying formulas can be found in the Stockfish program code ('win rate model'). The use of these statistics need not be limited to game analyses made with Stockfish, because this engine is the ultimate in positional analysis and therefore sets the standard for evaluation.

The engine WDL statistics not only allow the quantities discussed in this article (evaluation relevances and differences, optimum rates, probabilistic game results, move and position evaluation symbols including thresholds) to be derived in a similar way as in the user ERR presented below. The average values resulting from them (cf. 'evaluation comparison user/engine ERR', line 3, columns 3 and 6 in the programme) can also provide valuable clues for adjusting the parameters of the user ERR.

By the way, here is a little programme trick: entering '0' (zero) in the two move evaluation fields and in the 'sum figure values …' field clears the programme's internal memory for these two average values, which are otherwise retained when parameters are loaded and saved.

Determining the absolute value of the minimum evaluation at which the engine WDL statistic predicts a one hundred per cent win for a position with a given piece value sum for White and Black: with the Stockfish engine as of April 2025 (subject to programme changes), this value ranges from exactly 2.285243936060993 for a piece value sum of 17 or less to 3.138276944691337 for a piece value sum of 78 or more (starting position). From this evaluation onwards, positions are statistically considered 100% won for White, and from its negative counterpart onwards 100% lost. The programme calls this value 'zero'. Why 'zero'? Because at this positive and negative evaluation the mathematical WDL evaluation relevance functions have a zero. In other words: from these zeros onwards, the relevance of evaluations is zero (absolute 'evaluation relevance reduction')!

Experimentally it was found that, on average, an evaluation of 1.27 corresponds to a probabilistic game result of 0.75 for White and an evaluation of 1.90 to a probabilistic game result of 0.875.

User ERR:

The traditional variant presented in this article is to analyse engine games by asking at what evaluation these programs won their games, or failed to. The most meaningful games can probably be found on the Internet under 'TCEC' ('Top Chess Engine Championship') in the 'Superfinals'. Reasons: long thinking times, the opponents were in each case the two apparently strongest chess engines, and all position evaluations can be traced move by move.

From a statistical point of view, is there a kind of 'point of no return', an evaluation (apart from a concrete mate announcement, of course) from which the victory is settled beyond doubt and a draw can no longer be considered? Theoretically no. The following table shows that in various TCEC Superfinals chess engines failed to convert evaluations of up to 5.01 into victories. And nobody can say where the absolute upper limit for such evaluation errors lies (assuming best play in the following moves), since nobody can determine this limit with an infinite number of test games.

Even if such outliers occur very rarely, they rule out equating any evaluation (even one of 5.01, as shown) with a win or loss. In other words, there is no evaluation-based 'point of no return'.

Now to the question at which evaluations particular average game results are located. Of particular interest are evaluations at which, once reached, the average result of all games concerned amounts to 0.75 (from White's point of view). Such a value results from an equal number of wins and draws, or from a number of losses and three times as many wins. For the sake of completeness, losses are mentioned here as well, although they rarely occur once this particular balance evaluation has been reached.
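
The two ways of reaching an average of 0.75 can be checked with a few lines of Python. This is a minimal sketch; the function name and the game counts are chosen purely for illustration and do not come from any tournament.

```python
def average_game_result(wins: int, draws: int, losses: int) -> float:
    """Average result per game: a win counts 1, a draw 0.5, a loss 0."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("at least one game is required")
    return (wins + 0.5 * draws) / games


# Equal number of wins and draws:
print(average_game_result(10, 10, 0))  # 0.75
# Three times as many wins as losses:
print(average_game_result(30, 0, 10))  # 0.75
```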

For clarification, first the results of the Superfinals from Season 9 onwards in tabular form, together with the FIDE Candidates' Tournament 2018 with the evaluations of Stockfish 8 at 30 seconds thinking time, as found on 'www.chessbomb.com'.

tournament	analysis engine	wins	evaluation e_=0.75 at average game result 0.75(:0.25)	maximum evaluation e_>0.75 without win	average game result at maximum evaluation e_>0.75 without win	alternative pair of values: evaluation e_>0.75 / average game result ≅ 0.875
9	Stockfish	16	1.75	0.62
10	Houdini	15	2.00	0.66
12	Stockfish	29	1.48	0.52
13	Stockfish	16	2.79	1.14
14	Stockfish	10	2.42	1.45
FIDE Candidates' Tournament 2018	Stockfish 8	20	0.67	16.68	0.9762	2.39 / 0.8800
Superfinal TCEC No. 16	Stockfish 19092522	14	1.24	3.33	0.9667	1.65 / 0.8684
Superfinal TCEC No. 16	AllieStein v0.5-dev_7b41f8c-n11	5	3.96	8.18	0.9167	8.03 / 0.8571
Superfinal TCEC No. 17	LCZero v0.24-sv-t60-3010	17	1.34	5.01	0.9722	1.89 / 0.8810
Superfinal TCEC No. 17	Stockfish 20200407DC	12	1.49	2.76	0.9615	1.89 / 0.8750
Superfinal TCEC No. 18	Stockfish 202006170741	23	0.87	3.74	0.9792	1.41 / 0.8710
Superfinal TCEC No. 18	LCZero v0.25.1-svjio-t60-3972-mlh	16	0.69	2.12	0.9706	1.57 / 0.8636
Superfinal TCEC No. 19	LCZero v0.26.3-rc1_T60.SV.JH.92-190	9	0.69	2.72	0.9500	0.81 / 0.8750
Superfinal TCEC No. 19	Stockfish 202009282242_nn-baeb9ef2d183	18	1.05	3.72	0.9737	1.51 / 0.8750

A LibreOffice spreadsheet (in German) for calculating the above values for TCEC Superfinals 16 to 19 is available for download:

LibreOffice spreadsheet TCEC values (ODS file, 52 KB)

The above analysis, explained using the example of Superfinal No. 17 and its winner LCZero v0.24-sv-t60-3010:

LCZero won 17 games, so 83 games ended in draws or losses for LCZero. Among these 83 games one now looks for the 17th-highest evaluation that LCZero reported in its own favour, that is, a positive evaluation that could not be converted into a win. So one takes the 17 highest such evaluations, and the lowest of them is 1.26. There are therefore 17 draws or losses in each of which an evaluation of at least 1.26 was reached. In other words: LCZero reached an evaluation of at least 1.26 in 34 games, 17 of which it won and 17 of which ended in a draw or loss.
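
The counting procedure just described can be sketched in Python as follows. The function name is hypothetical, and the evaluations in the example are made-up numbers for illustration only, not TCEC data.

```python
def threshold_evaluation(wins: int, peak_evals_not_won: list[float]) -> float:
    """Return the lowest of the `wins` highest evaluations in games not won.

    `peak_evals_not_won` holds, for every drawn or lost game, the highest
    evaluation the engine reported in its own favour.
    """
    if not 1 <= wins <= len(peak_evals_not_won):
        raise ValueError("need at least as many non-won games as wins")
    return sorted(peak_evals_not_won, reverse=True)[wins - 1]


# Made-up example: 3 wins and 6 games that were drawn or lost.
not_won = [0.30, 1.10, 0.85, 2.40, 0.10, 1.75]
print(threshold_evaluation(3, not_won))  # 1.1
```

In the LCZero case above, the same procedure with 17 wins and the 83 games not won yields the threshold of 1.26.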

But these numbers contain a small complication: LCZero had to concede defeat in the 16th game although it had already reported an evaluation of 1.89, which lies above the previously determined threshold of 1.26. Because of this '0' result, the actual results do not yield an average game result of 0.75, because the average amounts to

$\frac{(17 \cdot 1) + (16 \cdot 0.5) + (1 \cdot 0)}{34} = 0.7353$

instead of 0.75. If the real numbers are stubborn, mathematics must step in. For average game results between 0.5 and 0.75 on the y-axis, the average game result is a linear function of the evaluation:

$\text{average game result} = \frac{(2 \cdot e_{=0.75}) + \text{evaluation}}{4 \cdot e_{=0.75}}$

What is sought is the 0.75 game result evaluation (abbreviated 'e_=0.75'), so the formula has to be rearranged:

$e_{=0.75} = \frac{\text{evaluation}}{(4 \cdot \text{average game result}) - 2}$

In the present LCZero case this gives:

$e_{=0.75} = \frac{1.26}{(4 \cdot 0.7353) - 2} = 1.3387$

The result lies slightly above the threshold of 1.26 taken from the actual games, which was to be expected.
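
The LCZero calculation can be reproduced with a short Python sketch. The function name is an assumption for this example; the numbers are those given above, including the average rounded to 0.7353.

```python
def e_075(evaluation: float, average_result: float) -> float:
    """Solve average_result = (2*e + evaluation) / (4*e) for e (= e_=0.75)."""
    denominator = 4 * average_result - 2
    if denominator <= 0:
        raise ValueError("average game result must be above 0.5")
    return evaluation / denominator


# 17 wins, 16 draws and 1 loss in the 34 games concerned:
average = (17 * 1 + 16 * 0.5 + 1 * 0) / 34
print(round(average, 4))               # 0.7353
print(round(e_075(1.26, 0.7353), 4))   # 1.3387
```

If the average game result is exactly 0.75, the function returns the evaluation unchanged, as the formula requires.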

In September 2017 the engine Houdini 6 was released. The following can be read on its website:

'The evaluations have again been calibrated to correlate directly with the win expectancy in the position. A +1.00 pawn advantage gives a 75% chance of winning the game against an equal opponent at blitz time control. At +1.50 the engine will win 90% of the time, and at +2.50 about 99% of the time. To win nearly 50% of the time, you need an advantage of about +0.60 pawn.'

Houdini kept its word. In the TCEC Season 10 Superfinal against Komodo, Houdini scored 15 wins, and among the 15 draws or losses with Houdini's highest evaluations the lowest evaluation was 0.57: an almost precise landing.

The above table allows the cautious conclusion that the Stockfish versions used since TCEC Superfinal 13 give significantly higher evaluations than earlier versions. One thing should not be overlooked when interpreting these results: Stockfish 10 was given a 'contempt' of 0.24 (Stockfish 9: 0.20), which tends to raise the respective evaluation. It therefore seems reasonable to subtract this contempt margin from the evaluation thresholds listed in the table for one's own analysis purposes. One tip, however: analyses with Stockfish should only be carried out with 'contempt' switched off, so as not to drive up the evaluations artificially.

Finally, it should be noted that the TCEC website has recently been extended with win/draw probabilities and places the e_=0.75 for Stockfish at around 1.56 (Superfinal 17) or even 1.91 (Superfinal 18). In view of the table above, these are quite plausible values. What is problematic, however, is that only percentages for 'W' (win?) and 'D' (draw? = 100% minus the 'W' percentage) are given, while the loss probability is swept under the carpet. The TCEC e_=0.75 of 1.56 assumed above necessarily rests on the assumption that 'D' also includes the probability of loss.

Mathematical evaluation relevance reduction

Let's note: as the evaluation moves from 0.00 towards infinity (∞), its relevance decreases continuously. Starting at 100% for an evaluation of 0.00, it passes 50% at the e_=0.75 evaluation (the TCEC value of 1.56 is used below as an example) and ends at 0% at infinity.

One example: the evaluation of the best move is 2.00. Now a mishap occurs: a faulty move that loses a piece, with an evaluation of -3.00. The absolute evaluation difference is -5.00. How relevant is this loss of a piece? Obviously the relevant difference is smaller in magnitude than 5.00.

In detail:
between the evaluations 2.00 and 1.56 the relevance lies below 50% and grows continuously;
at 1.56 it should amount to 50%, since this is the mean between 100% and 0%; moreover, the probabilistic game result of 0.75 at the evaluation 1.56 is the mean between 0.5 at the evaluation 0.00 and 1 at a maximum engine evaluation;
at 0.00 the relevance reaches its maximum of 100%;
at -1.56 it is back at 50%, and
at -3.00 it ends at a value well below 50%.

The sum of these percentages would now be of interest. It can be calculated, but this is somewhat complicated. The mathematically adept will have long recognised that this up and down has to be expressed by a mathematical function for which the following holds: the further one moves away from the y-axis on either side, the smaller the ordinates, i.e. the evaluation relevance values at these points on the x-axis, until the curve finally approaches the x-axis asymptotically on both sides at infinity. The x-axis thus represents the (engine) evaluations, the y-axis the evaluation relevance values.

At this point, the first version of this article proposed an exponential function of the general form f(x) = a^(x*b). Such exponential functions have the advantage that they always pass through the point P(0;1) and approach the x-axis towards (positive) infinity. Their disadvantage, however, is that they can only be fitted to 2 points: the point P(0;1) already mentioned and the point P(e_=0.75;0.5). A further defining point P(e_>0.75;r_>0.75) is urgently needed for better precision, for example to capture the highest TCEC engine evaluations without a win and the corresponding game results, which lie far above 0.75.
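
To illustrate the two-point limitation: once P(0;1) and P(e_=0.75;0.5) are fixed, one possible choice of constants is a = 0.5 and b = 1/e_=0.75, and no freedom remains for a third point. The following Python sketch assumes this parametrisation and mirrors the curve for negative evaluations via the absolute value; the function name and the default of 1.56 (the TCEC example value used above) are chosen for illustration only.

```python
def relevance_exponential(x: float, e_075: float = 1.56) -> float:
    """Two-point exponential relevance f(x) = 0.5 ** (|x| / e_075).

    Passes through P(0;1) and P(e_075;0.5); every other value is then
    already determined, e.g. 0.25 at twice e_075.
    """
    return 0.5 ** (abs(x) / e_075)


for x in (0.0, 1.56, 3.12):
    print(x, relevance_exponential(x))
```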

Solution: 3 equations for 3 negative and 3 positive sectors along the x-axis (x stands for engine evaluation):

1st positive and negative sector:

$y_{Rel} = 1 - \frac{| x |}{2 \cdot e_{=0.75}}$ linear equation with 𝔻 {x | -e_=0.75 ≤ x ≤ e_=0.75}

2nd positive and negative sector:

$y_{Rel} = - \frac{2 \cdot e_{=0.75} \cdot r_{>0.75} - 2 \cdot | x | \cdot r_{>0.75} - e_{>0.75} + | x |}{2 \cdot (e_{>0.75} - e_{=0.75})}$ linear equation with 𝔻 {x | -e_>0.75 ≤ x ≤ -e_=0.75 or e_=0.75 ≤ x ≤ e_>0.75}

3rd positive and negative sector:

$y_{Rel} = {r_{>0.75}}^{\frac{| x |}{e_{>0.75}}}$ exponential equation with 𝔻: {x | -∞ < x ≤ -e_>0.75 or e_>0.75 ≤ x < ∞}
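The three sector equations above can be sketched as one piecewise function. This is a minimal sketch, not the code of the interactive form; the short names `e1`, `e2` and `r` for e_=0.75, e_>0.75 and r_>0.75 are assumptions made here for readability, and `e2 > e1` is assumed.

```python
def relevance(x, e1, e2, r):
    """Evaluation relevance y_Rel for an engine evaluation x.

    e1 = e_=0.75, e2 = e_>0.75, r = r_>0.75 (shorthand names, an assumption).
    """
    ax = abs(x)
    if ax <= e1:
        # 1st sector: linear, from 1.0 at x = 0 down to 0.5 at e1
        return 1 - ax / (2 * e1)
    if ax <= e2:
        # 2nd sector: linear, from 0.5 at e1 down to r at e2
        return -(2 * e1 * r - 2 * ax * r - e2 + ax) / (2 * (e2 - e1))
    # 3rd sector: exponential decay towards 0
    return r ** (ax / e2)


# Sample values for e_=0.75 = 2, e_>0.75 = 3, r_>0.75 = 0.3
for x in (0.0, 2.0, 3.0, -3.0, 6.0):
    print(x, round(relevance(x, e1=2.0, e2=3.0, r=0.3), 4))
```

The function is continuous at the sector borders: it returns 0.5 at ±e_=0.75 and r_>0.75 at ±e_>0.75.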

The evaluation relevance functions are set. But how is the really relevant evaluation difference calculated over a certain distance on the x-axis, for example between 2.00 and -3.00? The evaluation relevance function only returns the y-value at one particular point on the x-axis. The answer is as ingenious as it is simple: by integration. The sum of all values between the x-axis and the function curve, i.e. the area there, between the best evaluation (for example 2.00) and the inferior evaluation (for example -3.00) is the definite integral of this function, i.e. the relevant evaluation difference.

To calculate the integral, the antiderivatives of the evaluation relevance functions are required. These are as follows:

1st positive and negative sector:

$y_{Int} = x - \frac{x \cdot | x |}{4 \cdot e_{=0.75}}$ quadratic equation with 𝔻 {x | -e_=0.75 ≤ x ≤ e_=0.75}

2nd positive and negative sector:

$y_{Int} = - \frac{x \cdot (4 \cdot e_{=0.75} \cdot r_{>0.75} - 2 \cdot | x | \cdot r_{>0.75} - 2 \cdot e_{>0.75} + | x |)}{4 \cdot (e_{>0.75} - e_{=0.75})}$ quadratic equation with 𝔻 {x | -e_>0.75 ≤ x ≤ -e_=0.75 or e_=0.75 ≤ x ≤ e_>0.75}

3rd positive sector:

$y_{Int} = \frac{e_{>0.75} \cdot {r_{>0.75}}^{\frac{x}{e_{>0.75}}}}{\ln (r_{>0.75})}$ exponential equation with 𝔻 {x | e_>0.75 ≤ x < ∞}

3rd negative sector:

$y_{Int} = - \frac{e_{>0.75}}{{r_{>0.75}}^{\frac{x}{e_{>0.75}}} \cdot \ln (r_{>0.75})}$ exponential equation with 𝔻 {x | -∞ < x ≤ -e_>0.75}

Note regarding the equations above that the computer program Maxima uses the notation log(x) instead of the usual ln(x) for the natural logarithm. The same applies to Javascript ('Math.log()'). If the above equations are to be used in such programs, 'ln' has to be replaced by 'log'.

If you experiment with the interactive form above, you will soon realize that for extreme evaluations the relevant evaluation difference hardly changes when even more extreme evaluations are entered. Example for White:
high evaluation = 15
suboptimal evaluation = 0
e_=0.75 = 2
e_>0.75 = 3
probabilistic game result at e_>0.75 = 0.85 (corresponds to a r_>0.75 = 0.3)
result of the relevant evaluation difference = 2.64

If the high evaluation is increased to 18, the relevant evaluation difference amounts to 2.65. And a high evaluation of 1000 results in a relevant evaluation difference of 2.65. The same results occur if the suboptimal evaluation is -15, -18 or -1000 and the high evaluation amounts to 0.
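The example can be reproduced with a short sketch. Instead of switching between the sector antiderivatives, it accumulates the area sector by sector starting from 0, which is an assumption about how the pieces are stitched together and not necessarily how the interactive form does it. The names `cumulative`, `relevant_difference`, `e1`, `e2` and `r` are chosen here for illustration, and `e2 > e1` is assumed.

```python
import math


def cumulative(x, e1, e2, r):
    """Area under the relevance curve between 0 and x (negative for x < 0)."""
    ax = abs(x)
    a = min(ax, e1)
    area = a - a * a / (4 * e1)                  # 1st sector
    if ax > e1:                                  # 2nd sector (linear piece)
        d = min(ax, e2) - e1
        area += 0.5 * d + (r - 0.5) / (e2 - e1) * d * d / 2
    if ax > e2:                                  # 3rd sector (exponential tail)
        area += e2 * (r ** (ax / e2) - r) / math.log(r)
    return math.copysign(area, x)


def relevant_difference(high, low, e1, e2, r):
    """Definite integral of the relevance function between low and high."""
    return cumulative(high, e1, e2, r) - cumulative(low, e1, e2, r)


params = dict(e1=2.0, e2=3.0, r=0.3)
print(round(relevant_difference(15, 0, **params), 2))     # 2.64
print(round(relevant_difference(18, 0, **params), 2))     # 2.65
print(round(relevant_difference(1000, 0, **params), 2))   # 2.65
print(round(relevant_difference(0, -15, **params), 2))    # 2.64
```

Note that Python's `math.log` is the natural logarithm, in line with the remark on 'ln' and 'log' above.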

In the form, the relevant evaluation differences are rounded to 2 decimal places. Suppose you want to calculate the high or suboptimal evaluation (from now on called 'irrelevance start evaluation') beyond which every further increase or reduction, all the way to infinity, adds at most 0.005 to the relevant evaluation difference, so that the value rounded to 2 decimal places changes by 0.01 with a probability of at most 50%. Then you need the following formula:

$- \frac{e_{>0.75} \cdot {r_{>0.75}}^{\frac{\text{irrelevance start evaluation}}{e_{>0.75}}}}{\ln (r_{>0.75})} = 0.005$

solved for the irrelevance start evaluation and taking into account high and low results ("±"):

$\text{irrelevance start evaluation} = \pm \frac{e_{>0.75} \cdot \ln (- \frac{\ln (r_{>0.75})}{200 \cdot e_{>0.75}})}{\ln (r_{>0.75})}$

The result with the above parameters amounts to ±15.477.

The formula shows that the irrelevance start evaluation is independent of the 2nd evaluation (0 in the above case) and of e_=0.75 (2 in the above case).

This formula applies in the normal case, in which the irrelevance start evaluation is located in the 3rd positive or negative sector. With unusual values of e_>0.75 and r_>0.75, however, the irrelevance start evaluation slips into the 2nd positive or negative sector, so that far more complicated formulas are required. This happens when the following applies:

$e_{>0.75} < - \frac{0.005 \cdot \ln (r_{>0.75})}{r_{>0.75}}$

For example, this happens if e_>0.75 < 0.978 at a probabilistic game result at e_>0.75 of 0.99, or if e_>0.75 < 0.0277 at a probabilistic game result at e_>0.75 of 0.875. Highly unrealistic!
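The solved formula and the condition for slipping into the 2nd sector can be checked numerically with a few lines. This is a minimal sketch; the function names and the shorthand `e2` for e_>0.75 and `r` for r_>0.75 are assumptions made here.

```python
import math


def irrelevance_start(e2, r):
    """Positive irrelevance start evaluation; the negative one is its mirror image."""
    return e2 * math.log(-math.log(r) / (200 * e2)) / math.log(r)


def second_sector_bound(r):
    """Value of e_>0.75 below which the formula above no longer applies."""
    return -0.005 * math.log(r) / r


print(round(irrelevance_start(e2=3.0, r=0.3), 3))    # 15.477
# r = 2 * (1 - probabilistic game result at e_>0.75)
print(round(second_sector_bound(r=0.02), 3))         # 0.978  (game result 0.99)
print(round(second_sector_bound(r=0.25), 4))         # 0.0277 (game result 0.875)
```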

The evaluation relevance reduction and all the delicacies mentioned in this article (automatic move and position evaluation symbols as well as the probabilistic game results) have been implemented
in the ScpcPGN program, available free of charge on this website
and in the AquaPGN program (latest update 12th August 2020), available free of charge on this website.

Probabilistic game results

Why is there talk of 'probabilistic' game results? Because they are derived from an engine evaluation and other parameters and therefore contain a stochastic statement about the presumed average game outcome. The situation was different in the discussion of the TCEC results, where only the 'average' game results were mentioned, because there was game material available with which the factual average game results could be calculated.

The probabilistic game result is always presented here from the point of view of White. If White wins, the result is 1-0, vice versa 0-1, and in the case of a draw ½-½. If you take the leading number in each case, you have the probabilistic game result used here.

It can be derived directly from the evaluation relevance:

for positive evaluations:
probabilistic game result = 1 - (evaluation relevance / 2);

for negative evaluations:
probabilistic game result = evaluation relevance / 2.

An engine evaluation of exactly 0.00 with an evaluation relevance of 1.00 results in a probabilistic game result of 0.50, i.e. a presumed draw. A probabilistic game result of approximately 1.00 would be an almost certain win for White, and one of approximately 0.00 would be an almost certain win for Black. 1.00 and 0.00 are mathematically never exactly reached. And an engine evaluation of exactly e_=0.75 leads to a result of 0.75, i.e. a value that lies exactly between a win for White and a draw. The results are therefore easier to interpret from White's point of view.
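The derivation of the probabilistic game result from the evaluation relevance can be sketched as follows. The relevance function is repeated so that the block runs on its own; the names `relevance`, `probabilistic_game_result`, `e1`, `e2` and `r` are assumptions made here, not identifiers from the interactive form.

```python
def relevance(x, e1, e2, r):
    """Evaluation relevance (e1 = e_=0.75, e2 = e_>0.75, r = r_>0.75)."""
    ax = abs(x)
    if ax <= e1:
        return 1 - ax / (2 * e1)
    if ax <= e2:
        return -(2 * e1 * r - 2 * ax * r - e2 + ax) / (2 * (e2 - e1))
    return r ** (ax / e2)


def probabilistic_game_result(x, e1, e2, r):
    """Probabilistic game result from White's point of view."""
    rel = relevance(x, e1, e2, r)
    # Positive evaluations: 1 - relevance / 2, negative ones: relevance / 2.
    # At x = 0 both branches give 0.5.
    return 1 - rel / 2 if x >= 0 else rel / 2


for x in (0.0, 2.0, 3.0, -2.0, -3.0):
    print(x, round(probabilistic_game_result(x, e1=2.0, e2=3.0, r=0.3), 3))
```

With e_=0.75 = 2, e_>0.75 = 3 and r_>0.75 = 0.3, the evaluation 3 yields 0.85, which matches the pairing of 0.85 and 0.3 used in the example further above.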

Clarification: the probabilistic game result is in no way equivalent to a win probability.

Many people make this mistake. For example, the programme Nibbler manages to confuse the - in reality - probabilistic game result with the 'Winrate', although in the starting position after 1. e4, for example, this 'Winrate' exceeds 50%, while the actual win probability in the 'WDL' display is only a modest 15%. But the programme author apparently does not notice this.

Put succinctly:

$\text{win probability} = \text{probabilistic game result} - \frac{\text{draw probability}}{2}$
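A small numerical illustration of this relation. The win, draw and loss probabilities below are hypothetical values, chosen only to resemble the order of magnitude in the Nibbler example; they are not taken from any engine output.

```python
def probabilistic_game_result(win, draw):
    """Expected score for White: a win counts 1, a draw counts 1/2."""
    return win + draw / 2


def win_probability(game_result, draw):
    """The relation from the text, solved for the win probability."""
    return game_result - draw / 2


# Hypothetical WDL values (assumption): 15% win, 72% draw, 13% loss
win, draw, loss = 0.15, 0.72, 0.13
pgr = probabilistic_game_result(win, draw)
print(round(pgr, 2))                          # 0.51, i.e. above 50%
print(round(win_probability(pgr, draw), 2))   # 0.15, the actual win probability
```

A value slightly above 50% is thus perfectly compatible with a win probability of only 15%, as long as most of the remaining probability mass is draws.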

For the record, here are the game result equations as well:

1st positive and negative sector:

$y_{PGR} = \frac{1}{2} + \frac{x}{4 \cdot e_{=0.75}}$ linear equation with 𝔻 {x | -e_=0.75 ≤ x ≤ e_=0.75}

2nd positive sector:

$y_{PGR} = \frac{2 \cdot e_{=0.75} \cdot r_{>0.75} - 2 \cdot x \cdot r_{>0.75} + 3 \cdot e_{>0.75} - 4 \cdot e_{=0.75} + x}{4 \cdot (e_{>0.75} - e_{=0.75})}$ linear equation with 𝔻 {x | e_=0.75 ≤ x ≤ e_>0.75}

2nd negative sector:

$y_{PGR} = - \frac{2 \cdot e_{=0.75} \cdot r_{>0.75} + 2 \cdot x \cdot r_{>0.75} - e_{>0.75} - x}{4 \cdot (e_{>0.75} - e_{=0.75})}$ linear equation with 𝔻 {x | -e_>0.75 ≤ x ≤ -e_=0.75}

3rd positive sector:

$y_{PGR} = - \frac{{r_{>0.75}}^{\frac{x}{e_{>0.75}}} - 2}{2}$ exponential equation with 𝔻 {x | e_>0.75 ≤ x < ∞}

3rd negative sector:

$y_{PGR} = \frac{1}{2 \cdot {r_{>0.75}}^{\frac{x}{e_{>0.75}}}}$ exponential equation with 𝔻 {x | -∞ < x ≤ -e_>0.75}

Of course, the probabilistic game results can also be found in the interactive form.

How not to do it, though:

Sune Fischer and Pradu Kannan have examined the mathematical relation between 'winning probability W and the pawn advantage P' in the article 'Pawn Advantage, Win Percentage, and Elo'.

Whether 'winning probability' really means the actual (lower) 'winning probability' or perhaps only the probabilistic game result (higher, since draws are taken into account) can be deduced from another passage of the article:

'When applying the condition that the win probability is 0.5 if there is no pawn advantage …'

If 'the win probability is 0.5' and the 'pawn advantage' is zero, the loss probability would necessarily also have to be 0.5 for the position to count as balanced. But where, then, are the draws? With a win probability of 50% and an equal loss probability there is no room left for them. It seems that the authors' knowledge of the game of chess is quite limited. This nonsense must therefore be corrected to the effect that the authors are not referring to the 'win probability' but to the probabilistic game result discussed in this article, which includes draws and losses. Then the calculation works: a probabilistic game result of 0.5 is equivalent to an evaluation, or if you like a 'pawn advantage', of 0.00.

'Data was taken from a collection of 405,460 computer games in PGN format. Whenever exactly 5 plys in a game had gone by without captures, the game result was accumulated twice in a table indexed by the material configuration. … Only data pertaining to the material configuration was taken. This was considered reasonable because the material configuration is the most important quantity that affects the result of a game.'

It is to be assumed that 'material configuration' means the material balance, i.e. the difference between the piece values of both sides, because it is stated elsewhere:

'For each material configuration, a pawn value was computed using conventional pawn-normalized material ratios that are close to those used in strong chess programs (P=1, N=4, B=4.1, R=6, Q=12).'

Apart from the fact that these piece values seem to be quite generous, the material balance is very coarse compared to the evaluations of chess engines, which are based on much more subtle criteria and, last but not least, on considerable search depths. But all this would still be bearable if the relation between win probability and material balance presented by the authors were stringent. Instead, an ominous parameter 'K' appears in their final formula:

$W = \frac{1}{1 + 10^{\frac{- P}{K}}} \quad \text{or} \quad y_{PGR} = \frac{1}{1 + 10^{\frac{- x}{K}}}$

And they estimate this parameter 'K' at '4' – roughly.

If you solve this formula for K, you get:

$K = \frac{\ln (10) \cdot P}{\ln (- \frac{W}{W - 1})} \quad \text{or} \quad K = \frac{\ln (10) \cdot x}{\ln (- \frac{y_{PGR}}{y_{PGR} - 1})}$

And if you insert into this formula, for example, the Ps and Ws determined above for the winning engines of TCEC 17 (LCZero) and 18 (Stockfish), you get very different Ks between 1.7 and 3.2.

Conversely, a K of 4 with a probabilistic game result of 0.75 would result in an evaluation of 1.91, which is not very realistic according to the above table values. This assessment is confirmed by the following test: Determine within the Stockfish WDL calculation the evaluations for different half-moves, each with a probabilistic game result of 0.75. One obtains
in half-move 1 an evaluation of 1.50,
in half-move 10 an evaluation of 1.40,
in half-move 100 an evaluation of 1.15
and never an evaluation of 1.91.
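The figures in this comparison can be recomputed from the formula of Fischer and Kannan and its rearrangement for K. A minimal sketch; the function names are chosen here for illustration.

```python
import math


def k_from(p, w):
    """K for a pawn advantage p and a (probabilistic) game result w."""
    return math.log(10) * p / math.log(w / (1 - w))


def pawn_advantage(w, k):
    """Pawn advantage that the sigmoid assigns to the game result w."""
    return k * math.log10(w / (1 - w))


# K = 4 at a probabilistic game result of 0.75
print(round(pawn_advantage(0.75, 4), 2))       # 1.91

# K implied by the Stockfish WDL evaluations quoted above for a game result of 0.75
for evaluation in (1.50, 1.40, 1.15):
    print(evaluation, round(k_from(evaluation, 0.75), 2))
```

The three Stockfish WDL evaluations correspond to K values of roughly 3.14, 2.93 and 2.41, so even within one engine a single fixed K does not fit all half-moves.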

Obviously, it is illusory to try to mathematically force the desired relation into a single sigmoid function with only one parameter ('K'). In contrast, the form 'Interactive Evaluation Relevance Reduction' presented at the beginning of this article calculates the probabilistic game results in the user ERR with a total of 5 formulas and 3 parameters, and in the Stockfish WDL with very accurate win, draw and loss probabilities. Precision instead of simplification!

Concretisation of the move evaluation sectors

It may seem tasteless to derive the move evaluation symbols quasi-automatically from engine evaluations in what follows, as they are often chosen on the basis of a deeper understanding of the position and are not oriented towards engine evaluations. Example: in a position there is quite clearly only one reasonable move that every child can find, while all other moves would be miserable. It would be more than stupid to award this one move the quality mark '‼'. Or a little more subtle: in a lost position, a move that is objectively weak, i.e. theoretically refutable, sets a trap that holds the chance of revival. This is a typical "interesting move (!?) - NAG $5", which should perhaps not be marked with "?" or the like. Nevertheless, in many cases it can certainly make sense to determine such move evaluation symbols from a comparison of the engine evaluations for two alternative moves, especially if there is no opportunity to examine a position more carefully, for example in automatic game analyses.

The intention of Grandmaster Robert Hübner cannot be followed in this way. In the English-language Wikipedia he is quoted as follows:

'German grandmaster Robert Hübner prefers an even more specific and restrained use of move evaluation symbols: "I have attached question marks to the moves which change a winning position into a drawn game, or a drawn position into a losing one, according to my judgment; a move which changes a winning game into a losing one deserves two question marks ..."'

Uncertain assessments such as 'winning position', 'drawn game', 'drawn position' or 'losing one' do not become more suitable for programming by the addition 'according to my judgment'.

The starting point for the classification of the move evaluation symbol is, of course, the move actually played, and in addition the best alternative move for bad moves or the second-best alternative move for good moves. For these two moves, as explained above, the relevance-reduced evaluation difference has to be determined, and this in turn has to be translated into the move evaluation symbol. For this purpose, the definite integral of the entire evaluation range from -∞ to +∞ is divided not into just 6, but into 7 or 8 sectors of equal size. There are not only the 6 sectors to which a move evaluation symbol is assigned, but also the neutral sector of a move that is approximately equal to the best or second-best move. Half of this neutral sector lies in the positive evaluation direction and half in the negative. One can use either a neutral sector with the same integral size as the remaining sectors or a neutral sector twice as large, consisting of 2 sectors with the usual integral size, one for each evaluation direction. This totals either 7 or 8 equal integral areas (in the latter variant with 2 integral sectors for the neutral sector).

Mind you: We are talking here about integral sectors and sizes respectively in the sense of definite integrals, i.e. the relevant evaluation differences, not to be confused with the absolute differences between 2 move evaluations on the x-axis. For a given relevant evaluation difference, the latter are quite different, depending on where the move evaluations are located on the x-axis. The further they move away from the y-axis, i.e. from the move evaluation 0.00, the more their distance to each other increases with a given relevant evaluation difference.

Mathematically it is even possible, based on e_=0.75, e_>0.75, r_>0.75 and a given move evaluation, to calculate the limit value that a new move evaluation must reach for the move to earn a given move evaluation symbol. This is hard to understand, therefore an example: given is a faulty move of White with an evaluation of -0.30, an e_=0.75 of 1.50, an e_>0.75 of 3.00 and a probabilistic game result at e_>0.75 of 0.875. From which evaluation onwards would an alternative good move of White, compared with this weak and at the same time next-best move, deserve the move evaluation symbol '‼'? Depending on the scheme used, the answer will be for example 1.52 or 1.62.

Of course, such move evaluation symbols only come into effect if correspondingly high definite integrals (pardon: relevant evaluation differences) are available at all. A correct move of White with an engine evaluation of 100.00 will hardly earn a '!?', '!' or '‼', even if the second-best move is evaluated at only 10.00. This positive evaluation difference is simply irrelevant and accordingly yields a relevant evaluation difference of almost 0.00. A won position can usually not be spoiled by second-best moves. That is precisely the effect of the evaluation relevance reduction.

How big should these relevant evaluation differences for the move evaluation symbols turn out now? Possibly with the exception of the neutral sector, the entire integral area could be divided into equal parts, or the subdivision could be aligned in that way that a brilliant move can already be stated if it exceeds the win draw balance and the next best move has to make do with an evaluation of 0.00. The first alternative deals with the move evaluation symbols more economically, the second is more generous.

Extensive move evaluation symbol scheme
1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7:

Here the relevant evaluation difference between the initial evaluation and the threshold for reaching the move evaluation symbol is

brilliant move (‼) - 1/14 + 1/7 + 1/7 = 5/14 of the total integral towards a better evaluation,
impressive move (!) - 1/14 + 1/7 = 3/14 of the total integral towards a better evaluation,
attractive move (!?) - 1/14 of the total integral towards a better evaluation,
questionable move (?!) - 1/14 of the total integral towards a suboptimal evaluation,
weak move (?) - 1/14 + 1/7 = 3/14 of the total integral towards a suboptimal evaluation and
miserable move (??) - 1/14 + 1/7 + 1/7 = 5/14 of the total integral towards a suboptimal evaluation.

From this, the thresholds of the move evaluations for White and Black can now be calculated with formulas which are not shown here, but are available in a browser inspector via Javascript code.

Generally, move evaluation symbols are assigned more generously here than in the following scheme '1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8'.

Restrictive move evaluation symbol scheme
1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8:

Here the relevant evaluation difference between the initial evaluation and the threshold for reaching the move evaluation symbol is

brilliant move (‼) - 1/8 + 1/8 + 1/8 = 3/8 of the total integral towards a better evaluation,
impressive move (!) - 1/8 + 1/8 = 1/4 of the total integral towards a better evaluation,
attractive move (!?) - 1/8 of the total integral towards a better evaluation,
questionable move (?!) - 1/8 of the total integral towards a suboptimal evaluation,
weak move (?) - 1/8 + 1/8 = 1/4 of the total integral towards a suboptimal evaluation and
miserable move (??) - 1/8 + 1/8 + 1/8 = 3/8 of the total integral towards a suboptimal evaluation.

Generally, move evaluation symbols are assigned less generously here than in the previous scheme '1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7'.

In the interactive form, the thresholds between the symbols in both scheme tables are listed, as far as algebra allows it, i.e. as far as the margin of relevant evaluation difference remaining after the initial evaluation allows an award. If not, the character string '-----' is output.
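The article does not show the threshold formulas. As a rough illustration only, the following sketch finds a threshold numerically: it looks for the evaluation whose relevant evaluation difference to the initial evaluation equals a given share of the total integral, using plain bisection instead of closed formulas. All names are assumptions made here, `e2 > e1` is assumed, and the results need not agree digit for digit with the Javascript of the interactive form.

```python
import math


def cumulative(x, e1, e2, r):
    """Area under the relevance curve between 0 and x (negative for x < 0)."""
    ax = abs(x)
    a = min(ax, e1)
    area = a - a * a / (4 * e1)                  # 1st sector
    if ax > e1:                                  # 2nd sector
        d = min(ax, e2) - e1
        area += 0.5 * d + (r - 0.5) / (e2 - e1) * d * d / 2
    if ax > e2:                                  # 3rd sector
        area += e2 * (r ** (ax / e2) - r) / math.log(r)
    return math.copysign(area, x)


def threshold(start, share, e1, e2, r):
    """Evaluation whose relevant difference to `start` is share * total integral.

    share > 0 searches towards better, share < 0 towards worse evaluations.
    Returns None if the remaining integral is too small for the symbol.
    """
    half = cumulative(math.inf, e1, e2, r)       # half of the total integral
    target = cumulative(start, e1, e2, r) + share * 2 * half
    if abs(target) >= half:
        return None                              # the form shows '-----' in such cases
    lo, hi = -1.0, 1.0
    while cumulative(lo, e1, e2, r) > target:    # widen the bracket downwards
        lo *= 2
    while cumulative(hi, e1, e2, r) < target:    # widen the bracket upwards
        hi *= 2
    for _ in range(200):                         # bisection on the monotonic area
        mid = (lo + hi) / 2
        if cumulative(mid, e1, e2, r) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


SCHEMES = {
    "extensive": {"!?": 1 / 14, "!": 3 / 14, "‼": 5 / 14},
    "restrictive": {"!?": 1 / 8, "!": 2 / 8, "‼": 3 / 8},
}

for name, shares in SCHEMES.items():
    for symbol, share in shares.items():
        print(name, symbol, threshold(0.0, share, e1=2.0, e2=3.0, r=0.3))
```

Passing a negative share gives the thresholds for '?!', '?' and '??' in the same way.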

Optimum rate:

In the results under the form 'Interactive Evaluation Relevance Reduction' you will also find the 'optimum rate of the suboptimal move evaluation'. This contains the precise numerical expression for the move evaluation symbol of the suboptimal move evaluation (nothing, ?!, ?, ??).

It is calculated as follows:

1 - (relevant evaluation difference / total integral)

The total integral is the definite integral over the entire x-axis with the evaluations from -∞ to +∞.

So the optimum rate is regularly below 100% and reaches the optimum of 100% only exceptionally with 2 move evaluations without relevant evaluation difference.
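As a small numerical illustration, the optimum rate can be computed as follows. The total integral is taken from an algebraically simplified form of the closed formula the article gives for the complete integral; the function names and the shorthand `e1`, `e2`, `r` for e_=0.75, e_>0.75 and r_>0.75 are assumptions made here.

```python
import math


def total_integral(e1, e2, r):
    """Definite integral of the relevance function from -infinity to +infinity."""
    return e1 + e2 / 2 + r * (e2 - e1) - 2 * e2 * r / math.log(r)


def optimum_rate(relevant_difference, total):
    """1 - (relevant evaluation difference / total integral)."""
    return 1 - relevant_difference / total


total = total_integral(e1=2.0, e2=3.0, r=0.3)
# Relevant evaluation difference 2.64 from the example with the evaluations 15 and 0
print(round(optimum_rate(2.64, total), 3))   # 0.501
print(optimum_rate(0.0, total))              # 1.0, no relevant difference at all
```

In this example the suboptimal move gives away almost exactly half of the total integral, which is why the rate lands close to 50%.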

Concretisation of the position evaluation sectors

The 9 evaluation sectors listed at the beginning of the article can now be described in more detail using the mathematical foundations developed above. Four evaluation sectors each are positive and negative. The balanced position covers minimal advantages for White and Black around the value zero. The sector of the minimal advantage for White or Black each makes up 50% of the total balanced sector.

9 position evaluation sectors with threshold adjustment at the probabilistic game results:

Here an assumption is made that is not mandatory, but very plausible: The end of the sector 'moderate advantage for White' and the beginning of the sector 'clear advantage for White' should coincide exactly with the evaluation e_=0.75 for which the probabilistic game result amounts to 0.75. Conversely for Black: The end of the sector 'moderate advantage for Black' and the beginning of the sector 'clear advantage for Black' should coincide exactly with the evaluation -e_=0.75, for which the probabilistic game result amounts to 0.25 from White's point of view. This basic assumption implies that a slight or moderate advantage probabilistically represents a tendency to draw and a clear or extreme advantage probabilistically represents a tendency to win.

Further assumption: The end of the sector 'clear advantage for White' and the beginning of the sector 'extreme advantage for White' should exactly coincide with the evaluation e_>0.75. Conversely for Black: The end of the sector 'clear advantage for Black' and the beginning of the sector 'extreme advantage for Black' should coincide exactly with the evaluation -e_>0.75.

When using this scheme, it would probably be useful to adjust the probabilistic game result at e_>0.75 to 0.875, so that it lies exactly in the middle between 0.75 and 1.00.

Now some mathematics again:

The task now is to quantify these individual advantage sectors. For example, if one compared a white move with an overwhelming advantage of 100.00 to a patzer move leading to a draw (0.00), the absolute evaluation difference would be 100.00, but the relevant evaluation difference would be only the practically complete definite integral of all functions in the exclusively positive range of the x-axis (which in turn is identical to the definite integral in the exclusively negative range of the x-axis).

By the way, the mathematical formula for the complete integral from -∞ to +∞ is:

$\frac{2 \cdot e_{>0.75} \cdot r_{>0.75} \cdot \ln (r_{>0.75}) - 2 \cdot e_{=0.75} \cdot r_{>0.75} \cdot \ln (r_{>0.75}) + e_{>0.75} \cdot \ln (r_{>0.75}) + 2 \cdot e_{=0.75} \cdot \ln (r_{>0.75}) - 4 \cdot e_{>0.75} \cdot r_{>0.75}}{2 \cdot \ln (r_{>0.75})}$
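For checking by hand, the fraction can be split term by term. Dividing every summand of the numerator by 2 · ln(r_>0.75) gives an algebraically identical and somewhat more readable form (a rearrangement added here, not a new result):

$e_{=0.75} + \frac{e_{>0.75}}{2} + r_{>0.75} \cdot (e_{>0.75} - e_{=0.75}) - \frac{2 \cdot e_{>0.75} \cdot r_{>0.75}}{\ln (r_{>0.75})}$

Half of this value is the sum of the three positive sectors: 0.75 · e_=0.75 for the 1st sector, (0.5 + r_>0.75) / 2 · (e_>0.75 - e_=0.75) for the trapezoid of the 2nd sector and -e_>0.75 · r_>0.75 / ln(r_>0.75) for the exponential tail. With the parameters of the earlier example (e_=0.75 = 2, e_>0.75 = 3, r_>0.75 = 0.3) this gives

$2 + 1.5 + 0.3 - \frac{1.8}{\ln (0.3)} \approx 5.295$

so that half of the total integral is about 2.65, in agreement with the relevant evaluation difference obtained there for a high evaluation of 1000.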

Next thought experiment: If one would now compare a white move with an advantage of e_=0.75 exactly at the border between moderate and clear advantage with a patzer move that leads to a draw (0.00), the absolute evaluation difference would be e_=0.75, but the relevant evaluation difference would only be the complete definite integral in the 1st positive sector of the x-axis. As mathematical formula: 0.75 * e_=0.75.

If one now sets out to quantify the definite integrals between x = 0 and the beginning of the slight advantage, between the latter and the beginning of the moderate advantage, and between the latter and the beginning of the clear advantage, in each case for White/Black, the integral value of 0.75 * e_=0.75 would have to be divided into 3 sectors:

20% = 0.15 * e_=0.75 for the sector balanced position from 0.00,
40% = 0.30 * e_=0.75 for the sector slight advantage for White/Black and
40% = 0.30 * e_=0.75 for the sector moderate advantage for White/Black.

From this, the thresholds of the position evaluations for White and Black can now be calculated with formulas which are not shown here, but are available in a browser inspector via Javascript code.
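Since all three sectors lie within the 1st positive sector, one possible way to obtain these thresholds is to invert the quadratic antiderivative x - x² / (4 · e_=0.75) directly: setting it equal to c · e_=0.75 and solving for x gives x = 2 · e_=0.75 · (1 - √(1 - c)). The following sketch applies this; it is a derivation made here under that assumption, not the Javascript of the interactive form, and the function name is hypothetical.

```python
import math


def first_sector_threshold(c, e1):
    """Evaluation x in [0, e1] whose integral from 0 to x equals c * e1.

    e1 stands for e_=0.75; c must lie between 0 and 0.75.
    """
    return 2 * e1 * (1 - math.sqrt(1 - c))


e1 = 1.5
# Cumulative shares: 0.15 (balanced), 0.15 + 0.30 (slight), 0.15 + 0.30 + 0.30 (moderate)
for label, c in (("balanced", 0.15), ("slight", 0.45), ("moderate", 0.75)):
    print(label, "ends at", round(first_sector_threshold(c, e1), 2))
```

With e_=0.75 = 1.50 the balanced sector ends at about 0.23, the slight advantage at about 0.78 and the moderate advantage exactly at 1.50; the thresholds for Black are the mirror images.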

7 position evaluation sectors with threshold adjustment at the probabilistic game results:

'Extreme advantage for White (+--) or Black (-++) - NAG $20/$21' may not be everyone's cup of tea. For these contemporaries, here is a repetition of the previous proposal, but this time with only 7 evaluation sectors without extremes.

Here, the end of the sector 'slight advantage for White' and the beginning of the sector 'moderate advantage for White' coincide exactly with e_=0.75, for which the probabilistic game result amounts to 0.75, and the end of the sector 'moderate advantage for White' and the beginning of the sector 'clear advantage for White' coincide exactly with e_>0.75. Conversely for Black: The end of the sector 'slight advantage for Black' and the beginning of the sector 'moderate advantage for Black' coincide exactly with -e_=0.75, for which the probabilistic game result amounts to 0.25 from the white point of view, and the end of the sector 'moderate advantage for Black' and the beginning of the sector 'clear advantage for Black' coincide exactly with the evaluation -e_>0.75. This basic assumption is accompanied by the fact that slight or moderate advantage probabilistically represents a tendency to draw and clear advantage probabilistically represents a tendency to win.

If one here sets out to quantify the definite integrals between x = 0 and the beginning of the slight advantage and between the latter and the beginning of the moderate advantage, in each case for White/Black, the integral value of 0.75 * e_=0.75 would have to be divided into 2 sectors:

1/3 = 0.25 * e_=0.75 for the sector balanced position from 0.00 and
2/3 = 0.50 * e_=0.75 for the sector slight advantage for White/Black.

9 position evaluation sectors with threshold adjustment at identical evaluation sectors 1/9 1/9 1/9 1/9 1/18 1/18 1/9 1/9 1/9 1/9 of the total integral:

If one discards the above guideline of threshold adjustment at the probabilistic game results and again prefers 4.5 positive or negative position evaluation sectors this time, however, of equal quantity, the evaluation areas would turn out as shares of the total integral as follows:

1/18 for the sector balanced position from 0.00,
1/9 for the sector slight advantage for White/Black,
1/9 for the sector moderate advantage for White/Black,
1/9 for the sector clear advantage for White/Black and
1/9 for the sector extreme advantage for White/Black.

7 position evaluation sectors with threshold adjustment at identical evaluation sectors 1/7 1/7 1/7 1/14 1/14 1/7 1/7 1/7 of the total integral:

If one discards the above guideline of threshold adjustment at the probabilistic game results and is also not a friend of 4.5 positive or negative position evaluation sectors with extremes, this scheme with sectors of equal quantity remains:

1/14 for the sector balanced position from 0.00,
1/7 for the sector slight advantage for White/Black,
1/7 for the sector moderate advantage for White/Black and
1/7 for the sector clear advantage for White/Black.

The interactive form lists the position evaluation symbols and the limit values between the symbols, the latter in a separate line for each of the 4 schemes.

A tip by the way: If the well-disposed reader strives to use the position evaluation symbols, however not getting hold of them, the following link to the AqChessUnicode font could be helpful. This by the way is also attached to the chess GUI Aquarium.

And those who would not be averse to entering these special chess characters directly from the keyboard when annotating texts, and who use a Windows operating system, may want to read about the 'Keyboard layout for chess annotation with special symbols in Windows programs via AutoHotkey'.

The human factor

An evaluation of around 1.50 pawn units for an average game result of 0.75 applies to largely optimal chess play, as practised by the best chess engines nowadays, but not necessarily to human chess players, not even to grandmasters, who also play bullshit far too often and should therefore theoretically have to make do with a clearly higher e_=0.75. The reason for this would be their tendency to make mistakes, which lets them draw or even lose games that were believed to be already won. One objection to this, however, is that this measured value would be pushed down again by the mistakes of their opponents of the genus Homo sapiens, because those mistakes often lead to wins that were not inevitable; good chess engines might have defended such positions under pressure. In this way, many games that should have been draws despite temporarily high evaluations could enter the statistics as wins without pushing the e_=0.75 up, or could even lower it, since with every additional win a lower evaluation in the waiting list rises to become the new e_=0.75. In this respect, suboptimal chess play would be upgraded by the opponent's suboptimal play. Which of these effects of the chess-playing Homo sapiens on the e_=0.75 prevails is uncertain.

Even if chess grandmasters still had the guts to face the best chess engines, their true e_=0.75 might not be determinable. After all, when would they have a clear advantage in such games or even carry off wins? Maybe in extreme handicap games? These could be used to test how many pawns would have to be taken away from the computer opponent in the starting position in order to wangle wins and draws on a significant scale for the master thus let off the hook. Or how a given opening would have to be constructed to release the chess engine into a questionable position. In this way the grandmasterly e_=0.75 could be determined after all. But since contemporary chess luminaries have long been avoiding such comparisons more and more in order to escape disgrace, such a question hardly arises any more.

Since such game material from matches between man and machine is hardly available, the only option left, now and presumably for all time, is the half-baked one of evaluating games between humans. One should always keep in mind that the resulting scores are diluted by the dubious playing style of the opponent. Forget it.

No sooner said than done, by analysing the 144 world championship games between Karpov and Kasparov in the years 1984 to 1990. The very last game is left out of consideration, since Kasparov agreed a draw with Karpov there in a position with a clear advantage, although the win, as chess slang has it, was only a question of technique. A draw was enough for him to win the world championship title. All games were superficially analyzed by Stockfish with a short thinking time and an average depth of just over 20 half-moves.

To make a long story short: Kasparov won 21 times, Karpov 19 times. The 21 and 19 highest evaluations respectively in drawn games were between 3.67 and 1.00 for Kasparov and between 7.80 and 1.04 for Karpov. If you like, you can read from this a win draw balance of at least 1.00 …

In 5 games, the game was still thrown away despite a positive evaluation of at least 1.26. Kasparov even messed up the 18th game of the 1986 world championship match despite a clear 3.67!

Excursus: "Draw range"

At this point the term "draw range", which is wandering around like a ghost from time to time, should be critically scrutinized a little. Because it suggests wrongly that it would coincide with the evaluation range "balanced position or draw (=) - NAG $10". To the reader's chagrin, however, there comes to light a pretty different understanding of this term.

Variant 1:

"Houdini insists on Rxc6 and gives an evaluation of 0.76 at depth 25, which probably does not exceed the draw range yet." (topic "Endspielkönnen gefragt" by Joe Boden, date 2013-02-09 13:03)

"Therefore one believes with Houdini that a (won) endgame is still in the draw range when it shows +0.80..." (Schachfeld).

This suggests that a low position evaluation by a chess engine allows a statement about a drawn outcome of the game. Clearly, every won game starts small, namely with a minimal advantage, perhaps even after the first move. If you set a chess engine on the first moves of a game won in this way and convince yourself that the game by no means started with an advantage of significantly more than +0.80, you might start to think long and hard. And the counterargument that later mistakes caused the disaster won't work if the patzer is called Stockfish, for example, and has an Elo rating of approximately 3500. Take the TCEC games lost by Stockfish to heart. There you will find plenty of games that ended in disaster for this engine despite an evaluation of about -0.76 or -0.80, that is, inside the negative "draw range", although it is not known for handling its positions within the alleged "draw range" carelessly. Who else but Stockfish should be able to hold such positions to a draw?

Variant 2:

"If during a game no side has a winning advantage, it is also said that 'the game is within the draw range'." (Wikipedia).

"Draw range
Scope for a position evaluation which, with best play on both sides, will lead to a draw in the end. In the example, White is worse but is still in the draw range, because he can prevent the pawn from promoting with his king. But if he had the idea of playing 1.Kh1, e.g. hoping for 1...f2 and stalemate, he would have left the draw range, and Black could now force the win with best play, namely by 1...Kg4, gaining the opposition. Whether the starting position of the chess game is in the draw range, or whether White could perhaps force the win, is too complex to be answered." (www.schwachspieler.de).

Here, the term "draw range" is associated with an ominous "scope for a position evaluation" in the case of a draw that is forced by certain moves with best play and can apparently be proven. In connection with a demonstrable draw, however, even to utter the word "range" is a sign of distorted logic. The draw is 0.00, nothing else. In this case a chess program would have to deliver not only a position evaluation of 0.00, but also one or more drawing variations that are compelling according to the rules of logic or according to endgame tablebases. This only works in special positions, especially in all positions with at most 7 men, which have been completely analysed. All others are simply so complex that one has to be content with a position evaluation between zero and checkmate, without being able to draw any compelling conclusions about the outcome of the game. And if a chess program showed a rubbish evaluation differing from 0.00 in a truly drawn position, the program would have a code problem, and this would not justify the illogical term "draw range".

If, as usual, a draw cannot be proven, one should certainly not use the term "draw range" to lead the reader to believe in would-be knowledge that nobody can have in view of the complexity of a chess game. Then only statistics and probability (the actual topic of this article) govern all considerations about the outcome of the game, and opening databases with win, draw and loss rates for one and the same position can tell a tale about it.
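How such database figures turn into a probabilistic statement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the counts are hypothetical, the counts are taken from White's point of view, and no real opening database is queried.

```python
from typing import Tuple


def wdl_rates(wins: int, draws: int, losses: int) -> Tuple[float, float, float]:
    """Empirical win, draw and loss rates for one position."""
    total = wins + draws + losses
    if total == 0:
        raise ValueError("no games recorded for this position")
    return wins / total, draws / total, losses / total


def expected_score(wins: int, draws: int, losses: int) -> float:
    """Expected score per game: a win counts 1, a draw 0.5, a loss 0."""
    win_rate, draw_rate, _ = wdl_rates(wins, draws, losses)
    return win_rate + 0.5 * draw_rate


# Hypothetical counts for a single position, not real database figures.
print(wdl_rates(25, 50, 25))       # (0.25, 0.5, 0.25)
print(expected_score(25, 50, 25))  # 0.5
```

Even a draw rate of 50 % in such a sample says nothing compelling about a single game. It only describes how often the position has ended in a draw so far.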

Contact: mail@konrod.info

article links:

Move and position evaluations
together with NAG and Informator symbols

Blunder relevance or more polite: evaluation relevance

Computer chess statistics

Mathematical evaluation relevance reduction

Probabilistic game results

Concretisation of the move evaluation sectors

Concretisation of the position evaluation sectors

The human factor

Excursus: "Draw range"

Ende Gelände ♦ Aus die Maus ♦ Schicht im Schacht ♦ Klappe zu - Affe tot

So long ♦ See You Later, Alligator - In A While, Crocodile ♦ Over And Out
