### Grading

### Basic principles

In all Science Fights, the Jury evaluates the Team performances by publicly showing integer scores called Grades or *G*.

The Grades *G* from all *n* Jurors in a Group are used to calculate the Average Point *P*. Two extreme Grades, one maximum and one minimum, are replaced with one grade equal to their arithmetic mean. In the next step, *P* is determined as the arithmetic mean of the new data set of *n*−1 Grades. This procedure has the advantage of weighting outliers less heavily. *P* is rounded to the nearest 0.1 points. An example below (see a pdf file) illustrates this procedure with some real data from the 2nd IYNT 2014.
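This two-step averaging can be sketched in a few lines of Python; the function name and the sample Grades are illustrative only, not part of the IYNT regulations:

```python
def average_point(grades):
    """Average Point P: one maximum and one minimum Grade are replaced
    with a single grade equal to their arithmetic mean; P is the mean
    of the resulting n-1 values, rounded to the nearest 0.1 points."""
    data = sorted(grades)
    g_min = data.pop(0)                # one minimum Grade
    g_max = data.pop(-1)               # one maximum Grade
    data.append((g_min + g_max) / 2)   # replaced by their arithmetic mean
    return round(sum(data) / len(data), 1)

# e.g. seven hypothetical Jurors grading one Report
print(average_point([25, 21, 20, 19, 18, 17, 12]))  # → 18.9
```

Note that only one maximum and one minimum Grade are replaced, even if several Jurors tie at the extremes.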

Each Team completes three performances (Reporter, Opponent, and Reviewer) in each Science Fight and earns three Average Points *P* which are summed up to obtain the Sum of Points *SP*. In case the Team violates particular rules of the IYNT, Yellow Cards are issued and the *SP* is consequently reduced.

The values of *SP* in each Group are used to calculate the Criterion of Victory *V* which is set to *V*=1 for the Team with the highest *SP* and for one or two Teams which have an *SP* that differs from the top result by no more than 2 points (*SP*≥*SP*_{max}−2.) For the Teams in the Group which have (*SP*_{max}−10)≤*SP*<(*SP*_{max}−2), the Criterion of Victory is set to *V*=½. For the Teams which have *SP*<*SP*_{max}−10, *V*=0. The Criterion of Victory, the primary parameter that determines the placing and rank of each Team, minimizes effects of statistical noise and Juror-to-Juror differences in grading.
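The thresholds above translate directly into code; a minimal sketch, with hypothetical *SP* values for a three-Team Group:

```python
def victory_criteria(sp_values):
    """Criterion of Victory V from the Sums of Points SP in one Group:
    V = 1   if SP >= SP_max - 2
    V = 1/2 if SP_max - 10 <= SP < SP_max - 2
    V = 0   if SP < SP_max - 10"""
    sp_max = max(sp_values)
    def v(sp):
        if sp >= sp_max - 2:
            return 1.0
        if sp >= sp_max - 10:
            return 0.5
        return 0.0
    return [v(sp) for sp in sp_values]

# e.g. a hypothetical three-Team Group
print(victory_criteria([37.4, 36.1, 25.0]))  # → [1.0, 1.0, 0.0]
```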

### Distributions of *G* and *P*

During the 1st, 2nd, 3rd, and 4th IYNTs taken together, 4098 Grades were delivered in 219 stages. Each of the 657 performances therefore obtained its Grades *G* from an average of *n*=6.2 Jurors. A total of 81 round-robin matches (Science Fights in a Group) were played with three or two Teams each. In them, the Teams collected 219 values of *SP* and 219 values of *V* (98 instances of *V*=1, 84 instances of *V*=½, and 37 instances of *V*=0; including the 1st IYNT 2013 retrospectively.)

The graph (hi-res image, raw ASCII data) shows a fitted histogram of the Grades for each type of performance (Report, Opposition, and Review.)

The spread in the *G* along the X-axis is broad, indicating that nearly the whole spectrum of available *G* is used by the Jury; the extreme *G*, however, are inherently less frequent. We encourage each Juror to stay centered around 15 points for a Report, around 10 points for an Opposition, and around 5 points for a Review, each time weighting their *G* against what they believe an *average* IYNT performance is. In reality, however, the actual mean and standard deviation are 18.7±5.1 for a Report, 12.6±3.5 for an Opposition, and 6.8±1.9 for a Review, indicating that an average Juror shifts all their *G* to the right as compared to our guidelines. The distributions are moderately asymmetric, with median Grades of 19 for a Report, 13 for an Opposition, and 7 for a Review.

In turn, the mean and standard deviation for the Average Points *P* are 18.6±4.0 for a Report, 12.6±2.8 for an Opposition, and 6.8±1.4 for a Review (raw ASCII data.) Such distributions of *P* are narrower than the respective distributions of *G*.

### Spread of the Grades

At this point it is important to realize that the IYNT requires a direct comparison of results from parallel Groups, whilst each Team does not play every other Team during one Science Fight. We should therefore be aware of the extent to which parallel boards of Jury can be influenced by fluctuations and what grading parameters are uniformly objective indications of the relative strengths of all IYNT contestants.

A consistent grading is extremely important for the IYNT as it must allow reliable identification of winners in each Group and the ultimate winners of the competition.

Let us consider three hypothetical and extreme cases, *Example 1*, *Example 2*, and *Example 3*.

In *Example 1*, all Jurors agree with each other, and there is high confidence that the Average Points *P* reflect objective differences between the two Teams. For each Juror and each Grade, *G−P*=0.

*Example 2* shows a situation where both Teams obtain equal Grades from the middle of the spectrum. Although it is natural that some games end in a tie, this scenario is less advantageous. If no Team can impress the Jury, or the Jury lacks sensitivity to hidden variances between the Teams, it is difficult to rank all Teams from top to bottom.

*Example 3* depicts an undesired event where different Jurors use radically different grading criteria. Team 1 earns 0.1 points more than Team 2, but the level of confidence in this difference is low because *σ*_{G−P}, the spread in the *G* given to one performance by several Jurors, is much wider than these eventual 0.1 points. Although both Average Points *P* are very close or equal to those in *Example 2*, these two *P* do not reflect the serious differences between the Teams that each Juror has noticed and highlighted in their grading. Concluding that Team 1 shows a *better* performance than Team 2, or than either Team in *Example 2*, is inconsistent with this set of Grades *G*.

**Our aim is that different Jurors put very similar Grades for one performance, however each Juror puts distinctly different Grades for different performances.**

### Grading criteria

During each IYNT, we brief Jurors and Teams on our grading and scoring criteria. Our guidelines have evolved since 2013, and as of now consist of four partial grading criteria. Our aim is to keep the guidelines clear and simple, and make sure that any Juror relies on the fixed, common criteria when evaluating performances across the parallel Groups. The criteria are printed directly on the individual Juror's protocols.

The Jurors are asked to add or subtract points from a *starting grade* (15, 10, or 5) and decide on their final Grade *G*. Such a decision is individual, and in upwards of 99% of cases there is a spread in the Grades. No Grade can be corrected retroactively, and each Juror must justify any of their Grades upon the request of Team Captains or the Chairperson. Each *G* is public.

Find below the blank Juror's protocols used at the 4th IYNT 2016, as well as the slides from the most recent introductory briefing for Jurors and Teams.

- Blank individual Juror's Protocol, A4 size (2016/03/29) [pdf]
- Briefing for Jurors and Teams, slides by Ilya Martchenko (2016/07/17) [pdf]

### Distributions of *G−P*

It is now good to look at the real data from the four previous IYNTs and analyze the spread in the *G* given by different Jurors to one performance in one Science Fight.

The graph (hi-res image, raw ASCII data) shows each of the 4098 Grades *G* given during the first four IYNTs. X-coordinate indicates which of the 30 possible Grades *G* was given. Y-coordinate indicates the difference between this particular *G* and the Average Point *P* that was calculated on its basis.

To interpret the spread along the Y-axis, one should remember that if *Example 1* took place in every IYNT Stage, each *G* would be equal to its respective *P*, and *G−P* would globally collapse to zero. Luckily, the IYNT is not a paper-and-pencil exam, and its Jurors have opinions which result in a distribution of individual *G* around the *P* in each Stage.

The standard deviations of the distributions of *G−P* for the three types of performances are found as follows: 3.26 for a Report, 2.25 for an Opposition, and 1.21 for a Review. These particular values are calculated with two extreme *G* in each Group taken with the weight of 1, rather than ½.

### Statistical significance of SF results

Since the extreme *G* contribute to statistics of *P* with the weight of ½, we can prepare the working dataset of 4098−3×219=3441 processed grades that we label *g* (raw ASCII data from three years only.) *g*=*G* if the respective *G* is not extreme in the Group, and *g*=½(*G*_{max}+*G*_{min}) for the pairs of extreme *G*_{max} and *G*_{min} in each Group.
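The construction of the processed grades *g* can be sketched as follows; the input Grades are hypothetical:

```python
def processed_grades(grades):
    """g = G for each non-extreme Grade; the pair (G_max, G_min)
    collapses into a single g = (G_max + G_min) / 2, so that n Grades
    yield n - 1 processed grades g."""
    data = sorted(grades)
    return data[1:-1] + [(data[0] + data[-1]) / 2]

g = processed_grades([25, 21, 20, 19, 18, 17, 12])
print(g)                           # [17, 18, 19, 20, 21, 18.5]
print(round(sum(g) / len(g), 1))   # the Average Point P = 18.9
```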

The statistical parameters of the distributions of residuals *g−P* now provide crucial information on the statistical significance of each *P* and in turn the IYNT rankings.

**In case the SF results do not permit rejecting the null hypothesis that a slightly higher SP in a round-robin Science Fight is observed by chance, more than one Team earns V=1. Unlike TSP, the Sum of Victories SV keeps track of such statistically significant cases wherein one or several SF winners step forward.**

We can define the significance of *V*=1 as the level of statistical confidence for the interval [*SP*−2...60] in one Science Fight Group. This statistical significance depends only on the values of *g−P* and number of Jurors *n* in the Group, and does not directly depend on the absolute magnitudes of *G*. This has the advantage of placing focus on congruence between opinions of Jurors, rather than ranking of Teams.

To illustrate how to compute the confidence of [*SP*−2...60] and [*SP*−10...60], let us analyze in depth one example of a round-robin Science Fight (Finals of the most recent 4th IYNT 2016, see a hi-res pdf file.)

The standard deviations *σ* of the distributions of *g−P* for each of three types of performances in this SF are found as follows: *σ*_{REP}=1.91 for a Report, *σ*_{OPP}=1.36 for an Opposition, and *σ*_{REV}=0.65 for a Review (each from a sample of 27 residuals *g−P*.) As per the IYNT procedure, *SP* is calculated as a sum of Average Points *P* for these three performances. By assuming *σ*^{2}_{SP}=*σ*^{2}_{REP}+*σ*^{2}_{OPP}+*σ*^{2}_{REV}, we can easily find *σ*_{SP}=2.43 in these Finals of the 4th IYNT. It is now easy to determine the root-mean-square deviation *ρ*_{SP} by defining *ρ*^{2}_{SP}=*σ*^{2}_{SP}/(*n*−1), where number of Jurors is *n*=10.

The value of *ρ*_{SP}=0.81 evaluates the standard error of the mean, i.e. the statistical uncertainty of the *SP* earned by any Team in the SF. This standard error is inherent in *estimating* the *true* value of *SP* from limited statistics. If the difference between any two sample-based *SP*_{i} and *SP*_{j} is comparable to or less than *ρ*_{SP}, they can be assumed statistically indistinguishable.
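The error propagation above can be reproduced directly; a sketch with the values from this Science Fight, not the official processing script:

```python
import math

# standard deviations of the residuals g - P in this Science Fight
sigma_rep, sigma_opp, sigma_rev = 1.91, 1.36, 0.65
n = 10  # number of Jurors in the Group

# variances of the three independent contributions add up
sigma_sp = math.sqrt(sigma_rep**2 + sigma_opp**2 + sigma_rev**2)

# standard error of the mean with n - 1 degrees of freedom
rho_sp = sigma_sp / math.sqrt(n - 1)

print(round(sigma_sp, 2))  # → 2.43
print(round(rho_sp, 2))    # → 0.81
```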

In the next step, we can find the confidence level for interval [*SP*−2...60] as a function of the number of degrees of freedom in a representative sample (i.e. *n*−1) and Student's *t*-score (i.e. 2/*ρ*_{SP}.) This is based on assuming that Student's t-distribution is a valid approximation.

This calculation yields the confidence level of 98.2% for the interval [*SP*−2...60], well above a two-sigma significance threshold. In the Finals of other previous IYNTs, confidence levels for the interval [*SP*−2...60] were 94.8% in 2015, 98.8% in 2014, and 98.2% in 2013. A similar calculation yields the confidence level of 99.99997% for the interval [*SP*−10...60] in 2016 (well above five-sigma), 99.99986% in 2015, 99.99971% in 2014, and 99.99827% in 2013 (well above four-sigma in each case.)
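The 98.2% figure can be checked with a stdlib-only evaluation of the one-sided Student's *t* CDF; a sketch via midpoint-rule integration, although a library routine such as `scipy.stats.t.cdf` would serve equally well:

```python
import math

def t_cdf(t, df, steps=200000):
    """One-sided CDF of Student's t-distribution, integrating the
    density from 0 to t with the midpoint rule and adding 1/2."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    h = t / steps
    area = sum(c * (1 + ((i + 0.5) * h) ** 2 / df) ** (-(df + 1) / 2) * h
               for i in range(steps))
    return 0.5 + area

rho_sp, n = 0.81, 10
confidence = t_cdf(2 / rho_sp, n - 1)  # t-score = 2 / rho_SP, df = n - 1
print(round(100 * confidence, 1))      # ≈ 98.2
```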

The Table below summarizes the parameters calculated in a similar manner for all 81 round-robin Science Fights of the past four IYNTs. We cordially acknowledge Dmitriy Baranov for his help in processing the data. Click on the headers to have the table sorted by any desired parameter.

SF | σ_{REP} | σ_{OPP} | σ_{REV} | σ_{SF} | ρ_{SP} | t_{2} | n | V=1 | V=½ |
---|---|---|---|---|---|---|---|---|---|
2013-1-A | 1.60 | 1.39 | 0.72 | 2.24 | 0.91 | 2.19 | 7 | 96.4% | 100.0% |
2013-1-B | 3.02 | 2.34 | 0.70 | 3.88 | 1.74 | 1.15 | 6 | 84.9% | 99.89% |
2013-1-C | 0.99 | 1.26 | 0.47 | 1.67 | 0.84 | 2.39 | 5 | 96.3% | 99.99% |
2013-1-D | 1.18 | 1.08 | 0.53 | 1.69 | 0.75 | 2.65 | 6 | 97.7% | 100.0% |
2013-1-E | 1.16 | 2.78 | 0.42 | 3.04 | 1.52 | 1.31 | 5 | 87.1% | 99.86% |
2013-1-F | 1.23 | 1.41 | 0.35 | 1.91 | 0.85 | 2.34 | 6 | 96.7% | 100.0% |
2013-2-A | 1.44 | 0.77 | 0.69 | 1.77 | 0.79 | 2.52 | 6 | 97.4% | 100.0% |
2013-2-B | 1.67 | 0.96 | 0.77 | 2.08 | 0.93 | 2.15 | 6 | 95.8% | 99.99% |
2013-2-C | 2.29 | 1.29 | 0.82 | 2.76 | 1.23 | 1.62 | 6 | 91.7% | 99.98% |
2013-2-D | 1.32 | 0.85 | 0.68 | 1.71 | 0.86 | 2.34 | 5 | 96.0% | 99.98% |
2013-2-E | 1.44 | 1.10 | 0.46 | 1.87 | 0.84 | 2.39 | 6 | 96.9% | 100.0% |
2013-2-F | 0.99 | 1.61 | 0.80 | 2.05 | 0.84 | 2.39 | 7 | 97.3% | 100.0% |
2013-3-A | 1.08 | 0.71 | 0.55 | 1.40 | 0.70 | 2.85 | 5 | 97.7% | 99.99% |
2013-3-B | 0.98 | 0.85 | 0.30 | 1.33 | 0.67 | 3.00 | 5 | 98.0% | 99.99% |
2013-3-C | 1.78 | 0.98 | 1.06 | 2.29 | 1.15 | 1.74 | 5 | 92.2% | 99.95% |
2013-3-D | 0.88 | 0.66 | 0.70 | 1.30 | 0.65 | 3.07 | 5 | 98.1% | 99.99% |
2013-3-E | 0.48 | 2.21 | 0.46 | 2.30 | 1.15 | 1.74 | 5 | 92.1% | 99.95% |
2013-3-F | 1.46 | 0.87 | 0.77 | 1.87 | 0.84 | 2.39 | 6 | 96.9% | 100.0% |
2013-4-A | 1.34 | 0.95 | 0.53 | 1.73 | 0.77 | 2.59 | 6 | 97.6% | 100.0% |
2013-4-B | 0.20 | 0.86 | 0.35 | 0.95 | 0.47 | 4.22 | 5 | 99.3% | 100.0% |
2013-4-C | 1.12 | 1.23 | 0.74 | 1.82 | 0.91 | 2.20 | 5 | 95.3% | 99.98% |
2013-4-D | 1.21 | 0.77 | 0.37 | 1.47 | 0.66 | 3.03 | 6 | 98.6% | 100.0% |
2013-4-E | 1.43 | 1.00 | 0.46 | 1.81 | 0.90 | 2.21 | 5 | 95.4% | 99.98% |
2013-4-F | 1.89 | 0.77 | 0.88 | 2.23 | 1.00 | 2.01 | 6 | 94.9% | 99.99% |
2013-S-A | 1.36 | 0.97 | 0.91 | 1.90 | 0.85 | 2.35 | 6 | 96.7% | 100.0% |
2013-S-B | 1.64 | 1.44 | 0.79 | 2.32 | 0.88 | 2.28 | 8 | 97.2% | 100.0% |
2013-S-C | 1.50 | 1.08 | 0.66 | 1.96 | 0.74 | 2.70 | 8 | 98.5% | 100.0% |
2013-F-A | 1.81 | 1.26 | 1.02 | 2.43 | 0.81 | 2.47 | 10 | 98.2% | 100.0% |
2014-1-A | 2.42 | 1.95 | 0.71 | 3.18 | 1.42 | 1.40 | 6 | 89.0% | 99.95% |
2014-1-B | 2.55 | 0.87 | 0.67 | 2.78 | 1.24 | 1.61 | 6 | 91.6% | 99.98% |
2014-2-A | 1.75 | 1.18 | 0.52 | 2.17 | 0.97 | 2.06 | 6 | 95.3% | 99.99% |
2014-2-B | 1.20 | 0.84 | 0.69 | 1.62 | 0.72 | 2.76 | 6 | 98.0% | 100.0% |
2014-3-A | 2.32 | 1.17 | 0.35 | 2.62 | 1.17 | 1.71 | 6 | 92.6% | 99.98% |
2014-3-B | 2.04 | 1.03 | 0.68 | 2.39 | 1.07 | 1.87 | 6 | 94.0% | 99.99% |
2014-4-A | 1.02 | 1.15 | 0.77 | 1.72 | 0.77 | 2.60 | 6 | 97.6% | 100.0% |
2014-4-B | 2.04 | 1.15 | 0.45 | 2.39 | 1.07 | 1.87 | 6 | 94.0% | 99.99% |
2014-F-A | 1.29 | 0.86 | 0.56 | 1.65 | 0.67 | 2.97 | 7 | 98.8% | 100.0% |
2015-1-A | 2.96 | 1.23 | 0.67 | 3.28 | 1.47 | 1.36 | 6 | 88.5% | 99.95% |
2015-1-B | 1.37 | 0.95 | 0.97 | 1.93 | 0.86 | 2.32 | 6 | 96.6% | 100.0% |
2015-1-C | 1.42 | 0.94 | 0.57 | 1.80 | 0.80 | 2.49 | 6 | 97.2% | 100.0% |
2015-1-D | 1.06 | 0.82 | 0.58 | 1.46 | 0.73 | 2.74 | 5 | 97.4% | 99.99% |
2015-2-A | 3.02 | 3.28 | 1.23 | 4.62 | 1.89 | 1.06 | 7 | 83.5% | 99.91% |
2015-2-B | 1.57 | 1.01 | 0.83 | 2.05 | 0.84 | 2.39 | 7 | 97.3% | 100.0% |
2015-2-C | 1.77 | 1.26 | 0.75 | 2.30 | 1.03 | 1.95 | 6 | 94.5% | 99.99% |
2015-2-D | 0.77 | 1.43 | 0.68 | 1.77 | 0.79 | 2.53 | 6 | 97.4% | 100.0% |
2015-3-A | 2.49 | 1.26 | 0.41 | 2.82 | 1.15 | 1.74 | 7 | 93.4% | 99.99% |
2015-3-B | 0.90 | 1.42 | 0.59 | 1.78 | 0.80 | 2.51 | 6 | 97.3% | 100.0% |
2015-3-C | 2.14 | 1.16 | 0.71 | 2.54 | 1.13 | 1.76 | 6 | 93.1% | 99.98% |
2015-3-D | 2.03 | 1.09 | 0.55 | 2.37 | 1.06 | 1.89 | 6 | 94.1% | 99.99% |
2015-4-A | 1.12 | 1.13 | 0.89 | 1.82 | 0.81 | 2.45 | 6 | 97.1% | 100.0% |
2015-4-B | 1.47 | 0.99 | 0.63 | 1.88 | 0.84 | 2.38 | 6 | 96.8% | 100.0% |
2015-4-C | 1.95 | 1.34 | 0.62 | 2.45 | 1.09 | 1.83 | 6 | 93.6% | 99.99% |
2015-4-D | 2.24 | 0.73 | 0.63 | 2.44 | 1.09 | 1.83 | 6 | 93.7% | 99.99% |
2015-S-A | 2.30 | 0.95 | 1.12 | 2.73 | 1.11 | 1.79 | 7 | 93.9% | 99.99% |
2015-S-B | 2.34 | 0.84 | 1.00 | 2.68 | 1.09 | 1.83 | 7 | 94.1% | 100.0% |
2015-F-A | 3.21 | 1.76 | 0.82 | 3.75 | 1.13 | 1.77 | 12 | 94.8% | 100.0% |
2016-1-A | 3.45 | 1.96 | 0.84 | 4.06 | 1.82 | 1.10 | 6 | 84.0% | 99.87% |
2016-1-B | 2.66 | 1.70 | 0.76 | 3.25 | 1.45 | 1.38 | 6 | 88.7% | 99.95% |
2016-1-C | 2.62 | 1.52 | 0.95 | 3.17 | 1.42 | 1.41 | 6 | 89.1% | 99.96% |
2016-1-D | 1.90 | 1.58 | 0.66 | 2.56 | 1.15 | 1.75 | 6 | 92.9% | 99.98% |
2016-1-E | 2.29 | 1.54 | 1.26 | 3.04 | 1.36 | 1.47 | 6 | 90.0% | 99.96% |
2016-1-F | 2.40 | 2.13 | 1.15 | 3.41 | 1.52 | 1.31 | 6 | 87.7% | 99.94% |
2016-2-A | 1.97 | 1.33 | 0.66 | 2.47 | 1.11 | 1.81 | 6 | 93.5% | 99.99% |
2016-2-B | 0.95 | 0.69 | 0.48 | 1.27 | 0.57 | 3.52 | 6 | 99.2% | 100.0% |
2016-2-C | 1.79 | 1.00 | 0.38 | 2.08 | 1.04 | 1.92 | 5 | 93.6% | 99.97% |
2016-2-D | 2.04 | 1.77 | 0.72 | 2.80 | 1.25 | 1.60 | 6 | 91.5% | 99.98% |
2016-2-E | 2.40 | 1.45 | 0.66 | 2.88 | 1.44 | 1.39 | 5 | 88.1% | 99.89% |
2016-2-F | 1.59 | 2.19 | 1.30 | 3.00 | 1.34 | 1.49 | 6 | 90.2% | 99.97% |
2016-3-A | 1.99 | 1.40 | 0.60 | 2.51 | 1.02 | 1.95 | 7 | 95.1% | 100.0% |
2016-3-B | 3.09 | 2.55 | 1.25 | 4.19 | 1.88 | 1.07 | 6 | 83.3% | 99.84% |
2016-3-C | 2.37 | 1.58 | 0.71 | 2.94 | 1.20 | 1.68 | 7 | 92.7% | 99.99% |
2016-4-A | 3.68 | 1.44 | 0.69 | 4.01 | 2.00 | 1.00 | 5 | 81.3% | 99.62% |
2016-4-B | 2.33 | 1.24 | 0.78 | 2.75 | 1.23 | 1.63 | 6 | 91.8% | 99.98% |
2016-4-C | 2.39 | 1.73 | 0.57 | 3.01 | 1.50 | 1.33 | 5 | 87.3% | 99.87% |
2016-4-D | 2.52 | 1.65 | 0.70 | 3.09 | 1.38 | 1.45 | 6 | 89.6% | 99.96% |
2016-4-E | 0.76 | 1.06 | 1.05 | 1.67 | 0.74 | 2.67 | 6 | 97.8% | 100.0% |
2016-4-F | 1.38 | 1.46 | 0.82 | 2.17 | 1.08 | 1.84 | 5 | 93.1% | 99.96% |
2016-S-A | 2.19 | 1.63 | 0.88 | 2.87 | 1.17 | 1.71 | 7 | 93.1% | 99.99% |
2016-S-B | 3.48 | 2.03 | 1.22 | 4.21 | 1.72 | 1.16 | 7 | 85.6% | 99.94% |
2016-S-C | 2.10 | 1.10 | 0.60 | 2.45 | 1.00 | 2.00 | 7 | 95.4% | 100.0% |
2016-F-A | 1.91 | 1.36 | 0.65 | 2.43 | 0.81 | 2.47 | 10 | 98.2% | 100.0% |

These results justify the importance of the Criterion of Victory *V*, and of the fact that no IYNT Stage has ever been graded by fewer than 5 Jurors. As argued below, besides having different opinions, individual Jurors may also work on different grading scales. At all times, when looking at the IYNT scores, we ask whether their difference is representative of a real difference between Teams or whether it is a statistical fluke. An especially high level of significance is demanded if grading parameters are used to resolve the placing of eventual Semi-Finalists and Finalists.

**In a typical Science Fight, earning a V=½ or above is a four-sigma event, with an expected confidence level of (99.97±0.05)% for the interval [SP−10...60]. Earning a much more rewarding V=1 is a two-sigma event, with an expected confidence level of (94±4)% for the interval [SP−2...60]. Earning each V is a statistically independent event, and earning several V=1 further contributes to the confidence in the placing of top IYNT Teams.**

Assuming that the grading parameters of Jurors do not improve considerably before the 5th IYNT 2017, we may estimate the *average* expected level of confidence for [*SP*−2...60] as a function of the number of Jurors *n* randomly selected to one Group. In this table, we re-calculate *ρ*_{SP} from a historically global *σ*_{SP}=2.5618.

No. Jurors, n | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
DFs, n−1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
Std error, ρ_{SP} | 2.6 | 1.8 | 1.5 | 1.3 | 1.1 | 1.0 | 1.0 | 0.9 | 0.9 | 0.8 | 0.8 |
[SP−2...60], % | 71 | 81 | 87 | 90 | 93 | 95 | 96 | 97 | 98 | 98 | 99 |
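The "Std error" row of this table follows directly from the single global estimate; a sketch, with rounding as in the table:

```python
import math

sigma_sp = 2.5618  # historically global sigma_SP over the first four IYNTs
for n in range(2, 13):
    rho_sp = sigma_sp / math.sqrt(n - 1)  # standard error with n - 1 DFs
    print(n, round(rho_sp, 1))
```

For example, *n*=10 gives *ρ*_{SP}=2.5618/3≈0.9, matching the tabulated value.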

### Grading parameters of individual Jurors

For reference purposes, this table summarizes the individual grading parameters (within one IYNT) of all 100 Jurors to date. We cordially acknowledge Dmitriy Baranov for his help in processing the data. The presented parameters give a glimpse of the Jurors' perceptions of the grading scale in the IYNT. Click on the headers to have the table sorted by any desired parameter.

Yr, St | Name | ⟨G⟩ | σ_{G} | σ_{G−P} | ⟨G−P⟩ | n_{G} | n_{SF} | n_{T} | κ |
---|---|---|---|---|---|---|---|---|---|
2016 | Ilya Martchenko | 10.3 | 6.1 | 3.0 | -1.9 | 42 | 5 | 8 | +1.9 |
2015 | Ilya Martchenko | 12.7 | 6.1 | 2.8 | -0.1 | 48 | 6 | 8 | +0.9 |
2014 | Ilya Martchenko | 12.2 | 6.7 | 2.1 | -1.0 | 39 | 5 | 5 | +1.7 |
2013 | Ilya Martchenko | 12.5 | 6.0 | 2.1 | -0.3 | 48 | 6 | 10 | +0.9 |
2016 | Mladen Matev | 11.8 | 5.6 | 2.4 | -1.1 | 51 | 6 | 11 | +0.8 |
2015 | Mladen Matev | 12.1 | 5.1 | 2.1 | -1.3 | 48 | 6 | 9 | +0.2 |
2014 | Mladen Matev | 10.9 | 5.3 | 2.5 | -1.4 | 30 | 4 | 4 | +0.9 |
2013 | Mladen Matev | 12.5 | 5.5 | 3.0 | +0.0 | 39 | 5 | 10 | +0.4 |
2015 | Gur. Mikaberidze | 11.9 | 4.6 | 1.9 | -0.9 | 39 | 5 | 7 | -0.3 |
2014 | Gur. Mikaberidze | 11.0 | 5.6 | 2.2 | -1.6 | 24 | 4 | 4 | +1.1 |
2013 | Gur. Mikaberidze | 11.1 | 5.1 | 2.0 | -0.3 | 33 | 4 | 10 | +0.6 |
2016 | Evgeny Yunosov | 13.3 | 5.5 | 1.6 | +0.1 | 18 | 2 | 5 | +0.1 |
2015 | Evgeny Yunosov | 14.3 | 6.1 | 1.8 | +0.4 | 33 | 4 | 9 | +0.3 |
2014 | Evgeny Yunosov | 14.4 | 6.7 | 1.7 | +1.2 | 39 | 5 | 5 | +0.8 |
2016 | Andrei Klishin | 10.8 | 6.0 | 3.0 | -1.8 | 51 | 6 | 10 | +1.6 |
2015 | Andrei Klishin | 13.5 | 6.9 | 3.5 | -0.4 | 48 | 6 | 9 | +1.4 |
2016 | Dina Izadi | 11.9 | 7.2 | 1.6 | +0.8 | 9 | 1 | 3 | +2.3 |
2013 | Dina Izadi | 10.4 | 4.9 | 2.1 | -1.0 | 48 | 5 | 11 | +0.7 |
2016 | Alena Kastenka | 10.5 | 5.9 | 2.5 | -1.0 | 30 | 4 | 8 | +1.6 |
2015 | Alena Kastenka | 11.5 | 5.6 | 2.2 | -0.5 | 33 | 4 | 6 | +0.9 |
2016 | Dmitry Zhukalin | 12.8 | 6.9 | 2.9 | +0.1 | 24 | 3 | 6 | +1.7 |
2015 | Dmitry Zhukalin | 13.6 | 5.7 | 1.4 | +0.7 | 36 | 5 | 7 | +0.1 |
2015 | Aleks. Dimić | 12.6 | 5.9 | 1.4 | -0.3 | 33 | 4 | 7 | +0.8 |
2014 | Aleks. Dimić | 12.9 | 6.1 | 1.9 | -0.3 | 30 | 4 | 4 | +0.8 |
2016 | Danko Marušić | 11.2 | 5.1 | 3.7 | -1.8 | 39 | 5 | 7 | +0.5 |
2015 | Danko Marušić | 11.5 | 5.3 | 2.8 | -0.7 | 39 | 5 | 8 | +0.6 |
2015 | Milen Kadiyski | 14.6 | 6.6 | 1.9 | +1.1 | 48 | 6 | 9 | +0.6 |
2014 | Milen Kadiyski | 16.3 | 6.9 | 1.8 | +1.1 | 42 | 5 | 5 | +0.2 |
2016 | Nika Sabashvili | 11.5 | 5.0 | 1.9 | -1.0 | 30 | 4 | 8 | +0.3 |
2015 | Nika Sabashvili | 12.6 | 5.4 | 1.9 | -0.1 | 33 | 4 | 7 | +0.3 |
2014 | Dmitriy Agarkov | 14.6 | 6.2 | 1.3 | -0.9 | 39 | 5 | 4 | +0.2 |
2013 | Dmitriy Agarkov | 12.7 | 5.6 | 1.4 | -0.6 | 45 | 5 | 10 | +0.4 |
2015 | Andrey Kravtsov | 13.6 | 5.6 | 2.1 | +1.1 | 21 | 3 | 7 | +0.0 |
2014 | Andrey Kravtsov | 15.1 | 6.7 | 2.1 | -0.4 | 39 | 5 | 4 | +0.5 |
2016 | Som. Mahmoodi | 12.6 | 7.1 | 1.3 | +0.7 | 33 | 4 | 9 | +2.0 |
2016 | N. Seliverstova | 12.8 | 7.2 | 1.5 | +0.2 | 12 | 2 | 4 | +2.0 |
2016 | Nikita Datsuk | 11.8 | 6.7 | 2.8 | -1.0 | 33 | 4 | 8 | +1.9 |
2015 | D. Radovanović | 12.3 | 6.8 | 3.6 | -0.4 | 39 | 5 | 8 | +1.8 |
2016 | Ivan Syulzhyn | 9.8 | 5.6 | 1.3 | -1.7 | 33 | 4 | 7 | +1.6 |
2015 | Ivan Reznikov | 11.7 | 6.4 | 3.3 | -1.1 | 48 | 6 | 9 | +1.6 |
2016 | Jalil Sedaghat | 11.7 | 6.4 | 3.1 | +0.2 | 24 | 3 | 7 | +1.6 |
2016 | Samuel Byland | 11.1 | 5.9 | 2.2 | -1.4 | 51 | 6 | 10 | +1.4 |
2013 | Igor Evtodiev | 10.1 | 5.4 | 2.0 | -1.4 | 42 | 5 | 9 | +1.3 |
2013 | Alina Astakhova | 11.8 | 6.1 | 2.2 | +0.1 | 57 | 6 | 14 | +1.3 |
2013 | Naime Arslan | 12.4 | 6.4 | 1.9 | +1.9 | 9 | 1 | 3 | +1.3 |
2016 | Ahmad Sheikhi | 13.1 | 6.6 | 3.0 | +1.4 | 36 | 4 | 10 | +1.3 |
2013 | Ismail Kiran | 11.7 | 6.0 | 1.8 | -0.1 | 42 | 5 | 8 | +1.2 |
2016 | Af. Montakhab | 12.0 | 6.1 | 1.8 | -0.1 | 21 | 3 | 6 | +1.2 |
2015 | Aleks. Suvorova | 14.2 | 7.0 | 3.0 | +0.8 | 45 | 6 | 7 | +1.2 |
2016 | Roya Radgohar | 14.3 | 7.0 | 2.7 | +1.9 | 30 | 4 | 8 | +1.2 |
2016 | Azizolah Azizi | 14.7 | 7.2 | 2.2 | +1.7 | 48 | 6 | 10 | +1.2 |
2016 | Laura Guerrini | 10.4 | 5.3 | 2.3 | -2.2 | 45 | 5 | 10 | +1.1 |
2013 | Celalettin Baykul | 12.7 | 6.3 | 2.9 | +0.4 | 18 | 2 | 5 | +1.1 |
2013 | Jeyhun Jabarov | 14.6 | 7.1 | 2.5 | +0.7 | 39 | 5 | 10 | +1.1 |
2016 | Dmitii Dorofeev | 10.2 | 5.2 | 2.3 | -2.1 | 51 | 6 | 9 | +1.0 |
2013 | Jevhen Olijnyk | 10.8 | 5.4 | 1.5 | -0.7 | 39 | 5 | 8 | +1.0 |
2013 | Ersin Karademir | 11.8 | 5.8 | 1.6 | +0.3 | 39 | 5 | 9 | +1.0 |
2016 | Marzieh Afkhami | 13.4 | 6.5 | 1.9 | +1.5 | 24 | 3 | 6 | +1.0 |
2016 | M. Sadat Tahami | 13.5 | 6.5 | 2.7 | +1.1 | 42 | 5 | 10 | +1.0 |
2014 | D. Karashanova | 15.4 | 7.3 | 1.6 | +0.6 | 33 | 4 | 5 | +1.0 |
2013 | Diana Kovtunova | 13.0 | 6.2 | 1.2 | -0.1 | 42 | 5 | 10 | +0.9 |
2013 | Ahmet Çabuk | 13.1 | 6.2 | 1.5 | +0.3 | 27 | 3 | 9 | +0.9 |
2013 | Antoan. Nikolova | 13.3 | 6.3 | 1.9 | +0.7 | 42 | 5 | 10 | +0.9 |
2015 | Vesna Vasić | 15.7 | 7.3 | 1.7 | +2.3 | 9 | 1 | 3 | +0.9 |
2013 | Aliaks. Mamoika | 12.2 | 5.8 | 1.6 | -0.4 | 42 | 5 | 10 | +0.8 |
2013 | Vlad. Vanovskiy | 13.2 | 6.2 | 2.2 | +0.0 | 54 | 6 | 12 | +0.8 |
2013 | Val. Lobyshev | 13.5 | 6.3 | 2.0 | +1.3 | 36 | 4 | 10 | +0.8 |
2015 | Dušan Dimić | 14.9 | 6.9 | 2.2 | +1.0 | 48 | 6 | 9 | +0.8 |
2013 | Buras Boljiev | 11.4 | 5.4 | 1.7 | -0.3 | 30 | 4 | 6 | +0.7 |
2016 | Ban. Rastegari | 13.9 | 6.4 | 1.9 | +1.3 | 27 | 3 | 7 | +0.7 |
2016 | Tatiana Fursova | 11.2 | 5.2 | 2.2 | -0.6 | 27 | 3 | 7 | +0.6 |
2015 | Jelena Vračević | 11.8 | 5.4 | 2.0 | +0.5 | 24 | 3 | 7 | +0.6 |
2016 | Jaf. Vatanparast | 12.3 | 5.6 | 1.6 | +1.1 | 36 | 4 | 8 | +0.6 |
2015 | Viktor Nechaev | 13.3 | 6.0 | 1.9 | +0.1 | 27 | 4 | 8 | +0.6 |
2016 | Sed. Forootan | 13.9 | 6.3 | 1.2 | +2.8 | 9 | 1 | 3 | +0.6 |
2013 | Ek. Mendeleeva | 14.3 | 6.4 | 1.4 | +0.7 | 48 | 5 | 11 | +0.6 |
2016 | Giorgi Khomeriki | 7.7 | 3.6 | 2.3 | -3.1 | 24 | 3 | 7 | +0.5 |
2013 | Timothy Timur | 9.8 | 4.5 | 2.1 | +0.3 | 15 | 2 | 5 | +0.5 |
2014 | Nasko Stamenov | 11.6 | 5.2 | 2.2 | -0.6 | 30 | 4 | 4 | +0.5 |
2013 | Alexander Sigeev | 11.7 | 5.3 | 1.7 | -0.9 | 54 | 6 | 9 | +0.5 |
2013 | Emel Alğin | 11.9 | 5.4 | 2.0 | -0.3 | 57 | 6 | 13 | +0.5 |
2013 | Sergey Sabaev | 12.0 | 5.4 | 1.7 | -0.1 | 9 | 1 | 3 | +0.5 |
2013 | Özge Özşen | 12.4 | 5.6 | 2.0 | +0.2 | 36 | 4 | 10 | +0.5 |
2013 | Ayset Yurt Ece | 13.0 | 5.8 | 2.6 | +0.7 | 33 | 4 | 9 | +0.5 |
2015 | Wang Sihui | 13.3 | 5.9 | 1.5 | -0.3 | 27 | 4 | 6 | +0.5 |
2016 | Som. Haj. Gooki | 15.1 | 6.7 | 2.3 | +2.4 | 45 | 5 | 9 | +0.5 |
2013 | Siarhei Seniuk | 11.2 | 5.0 | 1.6 | -0.8 | 42 | 5 | 13 | +0.4 |
2016 | Roya Pournejati | 12.5 | 5.5 | 1.5 | +0.2 | 33 | 4 | 10 | +0.4 |
2013 | Dursun Eser | 12.8 | 5.6 | 1.2 | +0.6 | 36 | 4 | 10 | +0.4 |
2016 | Zahra Yazdgerdi | 12.8 | 5.6 | 2.9 | -0.2 | 24 | 3 | 7 | +0.4 |
2013 | Louis D. Heyns | 13.2 | 5.8 | 1.6 | -0.3 | 42 | 5 | 12 | +0.4 |
2016 | Afshan Mohajeri | 14.1 | 6.2 | 2.8 | +1.4 | 27 | 3 | 7 | +0.4 |
2013 | Hakan Dal | 16.7 | 7.2 | 2.3 | +2.3 | 6 | 1 | 2 | +0.4 |
2013 | Sergey Zelenin | 12.8 | 5.5 | 1.5 | +0.1 | 42 | 5 | 9 | +0.3 |
2015 | Kirill Volosnikov | 12.9 | 5.6 | 2.3 | -0.5 | 45 | 6 | 9 | +0.3 |
2013 | Ebru Ataşlar | 14.9 | 6.4 | 2.3 | +1.7 | 36 | 4 | 9 | +0.3 |
2014 | Vasilka Krasteva | 17.1 | 7.3 | 1.8 | +1.5 | 39 | 5 | 4 | +0.3 |
2015 | Tatiana Besedina | 12.3 | 5.2 | 1.9 | -0.5 | 39 | 5 | 7 | +0.2 |
2016 | Mas. Tor. Azad | 13.3 | 5.6 | 2.2 | -0.1 | 51 | 6 | 12 | +0.2 |
2015 | Elena Chernova | 13.4 | 5.7 | 1.5 | +0.9 | 36 | 4 | 8 | +0.2 |
2016 | Maryam Bahrami | 14.4 | 6.1 | 2.1 | +1.8 | 45 | 5 | 10 | +0.2 |
2016 | Stan. Krasulin | 10.1 | 4.2 | 2.5 | -2.3 | 18 | 2 | 6 | +0.1 |
2016 | Milad Zangiabadi | 13.3 | 5.5 | 2.0 | +0.6 | 24 | 3 | 7 | +0.1 |
2015 | Yury Kartynnik | 10.4 | 4.2 | 1.3 | -0.9 | 27 | 3 | 8 | +0.0 |
2016 | M. R. Moghadam | 11.2 | 4.6 | 1.2 | +0.4 | 15 | 2 | 4 | +0.0 |
2013 | Fatih Akay | 12.8 | 5.2 | 1.7 | -0.3 | 42 | 5 | 10 | +0.0 |
2013 | Vladimir Shiltsev | 13.4 | 5.5 | 2.6 | -0.5 | 12 | 1 | 4 | +0.0 |
2015 | Drag. Jovković | 15.4 | 6.3 | 1.9 | +2.1 | 27 | 4 | 7 | +0.0 |
2016 | M. D. Aseman | 13.2 | 5.3 | 1.7 | +1.1 | 27 | 3 | 8 | -0.1 |
2013 | Marina Sergeeva | 13.5 | 5.4 | 1.5 | +0.3 | 51 | 6 | 12 | -0.1 |
2016 | A. Poostforush | 14.8 | 5.9 | 2.0 | +2.1 | 39 | 5 | 7 | -0.1 |
2014 | Elena Trufanova | 15.3 | 6.1 | 2.0 | +0.5 | 39 | 5 | 4 | -0.1 |
2015 | Lucija Papa | 13.1 | 5.1 | 1.3 | +0.1 | 21 | 3 | 5 | -0.2 |
2013 | Pınar Aytar | 13.5 | 5.3 | 1.3 | +0.2 | 30 | 4 | 10 | -0.2 |
2016 | Hassan Eslahi | 13.6 | 5.4 | 1.8 | +0.8 | 36 | 4 | 9 | -0.2 |
2015 | Oscar Rabinovich | 12.7 | 4.9 | 1.7 | -0.5 | 48 | 6 | 7 | -0.3 |
2013 | Natalia Borodina | 12.3 | 4.6 | 1.1 | +0.0 | 30 | 4 | 8 | -0.4 |
2013 | Necmettin Caner | 12.8 | 4.7 | 1.2 | -0.3 | 9 | 1 | 3 | -0.5 |
2013 | Sertaç Eroğlu | 11.7 | 3.9 | 1.9 | +0.5 | 15 | 2 | 5 | -0.9 |
2013 | Sabahattin Esen | 15.0 | 4.8 | 1.1 | +1.6 | 6 | 1 | 2 | -1.3 |
--- | Average | 12.7 | 5.8 | 2.0 | +0.1 | 34 | 4 | 8 | +0.6 |
--- | Best records | 10.1 | 7.3 | 1.1 | +0.0 | 57 | 6 | 14 | +2.3 |

- Color-coded **status tags** reflect various roles of the IYNT Juror: a violet tag marks a Juror who acted as Chairperson, while a green tag marks a Juror who did not; a blue tag marks an independent Juror, while a red tag marks a Team Leader who acted as a Juror;
- **⟨*G*⟩** is the arithmetic mean of all Grades delivered during the IYNT; ⟨*G*⟩=(5+10+15)/3=10 is our target for any Juror to allow for uniform and equally weighted grading scales throughout the parallel Groups; only 3 Jurors went below the target, while 97 Jurors went above the target;
- *σ*_{G} is the standard deviation of all Grades delivered during the IYNT; if *σ*_{G} is large, the Juror uses a broader spectrum of Grades and has more differentiation within their evaluation scale; note that *σ*_{G} does not necessarily reflect a clearer separation of the Teams, cf. two notable hypothetical extremes of *σ*_{G}=13.7 for the set {30; 1; 1; 30; 1; 1} and *σ*_{G}=11.1 for the set {30; 1; 1; 1; 20; 10} in a two-Team SF; various limits for the three types of performances result in a trend whereby a higher *σ*_{G} is more likely to appear for Jurors with a higher ⟨*G*⟩; the theoretical slope of this trend, or baseline, is *σ*_{G}/⟨*G*⟩ for the set {30; 20; 10}, or 0.40825; the parameter *κ* corrects for this baseline;
- *σ*_{G−P} is the standard deviation of all residuals *G−P*; if *σ*_{G−P} is small, the Grades are less scattered with respect to the Grades of other Jurors and contribute to a smaller *ρ*; an implausible theoretical minimum is *σ*_{G−P}=0; the real-life maximum and minimum records are 3.7 and 1.1;
- **⟨*G−P*⟩** is the arithmetic mean of all residuals *G−P*; if ⟨*G−P*⟩ is close to zero, the grading scale is less shifted with respect to the individual scales of other Jurors; it is easy to notice moderate statistical noise that hinders the inherent correlation between ⟨*G−P*⟩ and ⟨*G*⟩;
- *n*_{G} is the number of Grades given; a greater *n*_{G} means better statistics; theoretical caps depend on exact tournament brackets and were *n*_{G}=60 in 2013, *n*_{G}=45 in 2014, and *n*_{G}=54 in 2015 and 2016, though not attainable even for the Jurors working in each Science Fight due to such constraints as the distribution of two-Team Groups;
- *n*_{SF} is the number of Science Fights judged; a greater *n*_{SF} means better statistics; the theoretical cap with Semi-Finals is *n*_{SF}=6;
- *n*_{T} is the number of Teams judged; a greater *n*_{T} means more opportunities to observe stronger and weaker Teams and thus have a more comparative judgment; the theoretical cap is the number of Teams *N*, but no more than 18; note that some Teams are judged more than once by the same Juror within one IYNT;
- *κ* is the standard deviation of all delivered Grades corrected for the average Grade ⟨*G*⟩ via *κ*=*σ*_{G}−0.40825×⟨*G*⟩; *κ* reflects the *relative* width of the spectrum of Grades used by the Juror and can be more suitable for comparison of Jurors with distinctly different ⟨*G*⟩; note that *κ* is linearly proportional to the relative value of *σ*_{G}/⟨*G*⟩.
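The baseline correction *κ* is a one-liner; the sample values below are taken from the first row of the table above:

```python
def kappa(sigma_g, mean_g):
    """Width of a Juror's grading spectrum corrected for their average
    Grade: kappa = sigma_G - 0.40825 * <G>."""
    return round(sigma_g - 0.40825 * mean_g, 1)

# e.g. sigma_G = 6.1 and <G> = 10.3 from the first row of the table
print(kappa(6.1, 10.3))  # → 1.9, matching the tabulated +1.9
```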

The names are initially sorted by number of IYNTs judged, then by *κ*, then by ⟨*G*⟩. Click on the headers to have the table sorted by any desired parameter.

There is a complex interplay between each of the calculated parameters, and some of the crucial parameters depend not only on what *G* each Juror gives, but also on what *G* other Jurors in the same Group give. Other parameters depend on the lot, tournament brackets, or appointing decisions of the General Council, and are beyond the control of the individual Juror. Persisting regularities seen for Jurors who worked at more than one IYNT suggest that any shifts in *σ*_{G} or ⟨*G*⟩, observed consistently in several Jurors, may reflect objective differences between separate IYNTs, viz. in *diversity* or *average strength* of participants.

It is interesting to notice that although many listed Jurors demonstrate similar values, there are particular Juror-to-Juror differences which are recognizable in separate IYNTs and not obscured by limited statistics. These differences in particular explain why Jurors and Teams rotate between the Groups, and *V* is a more representative derivative grading parameter than *SP*.

Whilst the Criterion of Victory *V* already alleviates any scaling differences, further fine-grained data could be extracted if each future IYNT Juror

- is comfortable with lower Grades for weaker performances, and therefore stays centered closer to ⟨*G*⟩=10, with preferably ⟨*G*⟩<13;
- at the same time, works in a broader spectrum of high and low Grades, and thus has a larger *σ*_{G}, with preferably *σ*_{G}>6;
- at the same time, is balanced to have a moderate *σ*_{G−P}, with preferably *σ*_{G−P}<3.

These three goals can naturally clash with each other. It is therefore important to realize that each Juror must focus solely on assessing the immediate performances and on sticking to uniform, scientific, merit-based grading criteria, and furthermore that each *G* must be independent and given individually.
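The three preferred targets can be expressed as a simple screening rule. A minimal sketch, with the thresholds taken from the list above and purely invented Juror data:

```python
from statistics import mean, pstdev

def meets_targets(grades, residuals):
    """Screen a Juror's statistics against the three preferred targets:
    <G> < 13, sigma_G > 6, and sigma_{G-P} < 3.

    grades    -- all Grades G given by the Juror
    residuals -- the corresponding residuals G - P
    """
    return (mean(grades) < 13            # centered close to <G> = 10
            and pstdev(grades) > 6       # broad spectrum of Grades
            and pstdev(residuals) < 3)   # moderate scatter around P

# Invented Juror: wide scale, low center, small residuals.
grades = [2, 6, 9, 12, 16, 20, 24, 5, 10, 14]
residuals = [-1.2, 0.8, -0.5, 1.1, 0.3, -0.9, 1.4, -0.2, 0.6, -1.0]
print(meets_targets(grades, residuals))  # True
```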

### Effects of individual grading parameters

To illustrate the potential consequences of the spread in these grading parameters, let us consider a Gedankenexperiment with three Teams competing in one Science Fight.

Team 1 shows a relatively strong performance and receives the Grades which sit on the upper end of the [⟨*G*⟩−½*σ*_{G}…⟨*G*⟩+½*σ*_{G}] interval. In other words, should *G* be distributed normally for each selected Juror, the performances of Team 1 would be better than *Ф*(⟨*G*⟩+½*σ*_{G})=0.69 of all performances the Juror grades in the IYNT. Such a Team would potentially end up as a Finalist.

Team 2 shows an average performance and receives average Grades ⟨*G*⟩ from each Juror.

Team 3 shows a relatively weak performance and receives the Grades which sit on the lower end of the [⟨*G*⟩−½*σ*_{G}…⟨*G*⟩+½*σ*_{G}] interval. In other words, their performances are weaker than very approximately *Ф*=0.69 of all IYNT performances graded by the Juror. Such a Team would potentially not qualify for Semi-Finals.
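Under the normality assumption, the quantile *Ф*=0.69 quoted above is simply the normal CDF evaluated half a standard deviation above the mean. A quick standard-library check, using the overall IYNT values ⟨*G*⟩=12.7 and *σ*_{G}=6.1 quoted elsewhere on this page:

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# A Grade half a standard deviation above the Juror's mean beats
# about 69% of the performances that Juror grades:
print(round(norm_cdf(12.7 + 0.5 * 6.1, mu=12.7, sigma=6.1), 2))  # 0.69
```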

These three Teams are graded simultaneously by two boards of selected Jurors. One board is composed of six Jurors with some of the lowest observed ⟨*G*⟩, while the other board is composed of six Jurors with some of the highest observed ⟨*G*⟩. It is easy to determine the results of this hypothetical Science Fight because ⟨*G*⟩ and *σ*_{G} are publicly known for each Juror.

These results illustrate the level of tolerance of the Criterion of Victory *V* and the Sum of Points *SP* to the most severe effects of improbably unbalanced boards of Jurors. As seen from this calculation, the strong Team 1 graded by low-⟨*G*⟩ Jurors obtains fewer points than the average Team 2 graded by high-⟨*G*⟩ Jurors, and ties with the weak Team 3. The weak Team 3 graded by high-⟨*G*⟩ Jurors, in turn, earns more points than the average Team 2 graded by low-⟨*G*⟩ Jurors. The artificial selection of Jurors in this test leads to unrealistically small *σ*_{G−P} and *ρ*_{SP} in both boards of Jurors.

There is however *no negative effect* on the Criterion of Victory *V* and consequently the results of the Science Fight.
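This tolerance can be reproduced directly from the grading rules described at the top of the page (the trimmed Average Point *P* and the 2-point and 10-point thresholds for *V*). The Grades below are invented for illustration: a low-⟨*G*⟩ board centered near 10, and a high-⟨*G*⟩ board mimicked by shifting every Grade up by 7 points:

```python
def average_point(grades):
    """Average Point P: replace the maximum and minimum Grades with one
    grade equal to their arithmetic mean, then average the resulting
    n-1 values and round to 0.1."""
    g = sorted(grades)
    trimmed = g[1:-1] + [(g[0] + g[-1]) / 2]
    return round(sum(trimmed) / len(trimmed), 1)

def criterion_of_victory(sp_list):
    """V for one Group: 1 within 2 points of the top SP, 1/2 within
    10 points of it, 0 otherwise."""
    top = max(sp_list)
    return [1 if sp >= top - 2 else 0.5 if sp >= top - 10 else 0
            for sp in sp_list]

# Invented Grades from a hypothetical low-<G> board of six Jurors;
# each Team gives three performances, so SP sums three values of P.
stages_by_team = {
    "Team 1": [[14, 13, 12, 13, 14, 12]] * 3,
    "Team 2": [[11, 10, 9, 10, 11, 9]] * 3,
    "Team 3": [[8, 7, 6, 7, 8, 6]] * 3,
}
for shift in (0, 7):  # 0 = low-<G> board; +7 mimics a high-<G> board
    sps = [sum(average_point([g + shift for g in stage]) for stage in stages)
           for stages in stages_by_team.values()]
    print(sps, criterion_of_victory(sps))
```

In this invented case the boards produce *SP* of 39.0, 30.0, 21.0 and 60.0, 51.0, 42.0 respectively, so the strong Team on the low board even scores below the weak Team on the high board, yet *V* within each board is identically [1, ½, 0].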

In the next Gedankenexperiment, let us rotate and evenly distribute the same Jurors, as is routinely done before each real Science Fight.

Though the Jurors give the same Grades *G* as in the first experiment, their balanced distribution now mitigates the effects of Juror-to-Juror grading differences on *SP*. Note that this happens at the cost of an increased *σ*_{G−P} and *ρ*_{SP}, which both now fall in the range of typical IYNT values despite an artificial, bimodal distribution of ⟨*G*⟩. Although we cannot generalize from one example, the respective Sums of Points *SP* from the two boards of Jurors now differ by only 1.5, 0.6, and 0.6 points.

In this *extreme value analysis*, we test a statistically improbable scenario which would have some of the worst impacts on the stability of Science Fight results. We test extreme values of ⟨*G*⟩ and the most unrealistic distribution of Jurors, and observe the amplitude of *fluctuations* in *SP*, which always falls within a 2-point threshold.

### Grading parameters of separate IYNTs

The table below provides an overview of statistical parameters for the Grades given by all Jurors within one IYNT, and the overall statistics for all four IYNTs.

Year | ⟨G⟩ | σ_{G} | s_{J} | s_{SV4} | s_{TSP4} | σ_{G−P} | n_{G} | n_{st} | n_{T} | n_{J} | n_{ch} | V=1 | V=½ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2013 | 12.5 | 5.8 | 0.11 | 0.43 | 0.12 | 2.0 | 1422 | 78 | 16 | 41 | 11 | (96±3)% | (99.98±0.03)% |
2014 | 14.1 | 6.7 | 0.15 | 0.52 | 0.20 | 2.2 | 423 | 23 | 5 | 12 | 5 | (95±3)% | (99.99±0.01)% |
2015 | 13.0 | 6.0 | 0.10 | 0.29 | 0.09 | 2.4 | 969 | 49 | 10 | 27 | 10 | (94±3)% | (99.99±0.02)% |
2016 | 12.3 | 6.2 | 0.13 | 0.58 | 0.24 | 2.8 | 1284 | 69 | 16 | 40 | 9 | (91±5)% | (99.94±0.08)% |
All | 12.7 | 6.1 | 0.12 | 0.47 | 0.19 | 2.4 | 4098 | 219 | 47 | 100 | 23 | (94±4)% | (99.97±0.05)% |

- ⟨*G*⟩ is the arithmetic mean of all Grades delivered during the IYNT by all Jurors;
- *σ*_{G} is the standard deviation of all Grades delivered during the IYNT by all Jurors;
- *s*_{J}=*σ*_{⟨G⟩J}/⟨⟨*G*⟩_{J}⟩ is the relative spread in ⟨*G*⟩ between all Jurors at the IYNT; lower values correspond to smaller Juror-to-Juror grading differences;
- *s*_{SV4}=*σ*_{SV4}/⟨*SV*_{4}⟩ is the relative spread in the Sum of Victories after SF 4; higher values correspond to stronger diversity between Teams and better separation in the ranking;
- *s*_{TSP4}=*σ*_{TSP4}/⟨*TSP*_{4}⟩ is the relative spread in the Total Sum of Points after SF 4; higher values correspond to stronger diversity between Teams in terms of points and contribute to better separation in the ranking;
- *σ*_{G−P} is the standard deviation of all residuals *G−P* during the IYNT;
- *n*_{G} is the number of Grades given;
- *n*_{st} is the number of Stages in the IYNT;
- *n*_{T} is the number of Teams in the IYNT;
- *n*_{J} is the number of individual Jurors in the IYNT;
- *n*_{ch} is the number of individual Chairpersons in the IYNT;
- *V*=1 is the mean and standard deviation of the confidence levels for the interval [*SP*−2…60] in all SFs;
- *V*=½ is the mean and standard deviation of the confidence levels for the interval [*SP*−10…60] in all SFs.

By comparing *s*_{SV4} with *s*_{TSP4}, we can see that the Criterion of Victory prevents a *melting pot* effect in TSP where Total Sums of Points can converge to rather similar values for many Teams.
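Both spreads are plain coefficients of variation, so the comparison is easy to sketch. The two six-Team lists below are invented for illustration, not the 2013–2016 data:

```python
from statistics import mean, pstdev

def relative_spread(values):
    """Coefficient of variation: standard deviation over the mean."""
    return pstdev(values) / mean(values)

# Invented example: the Sums of Victories spread the Teams apart,
# while the Total Sums of Points cluster around similar values.
sv4 = [4, 3.5, 3, 2.5, 2, 1]
tsp4 = [177, 174, 170, 167, 160, 154]
print(round(relative_spread(sv4), 2))   # the larger spread
print(round(relative_spread(tsp4), 2))  # the "melting pot"
```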

### Grading parameters of separate SFs (upd 2015)

Teams and Jurors alike learn rapidly from SF to SF. One may argue that the Teams may start showing less diverse performances, or that the Jurors may start giving less diverse Grades *G*. The following data, for three IYNTs together, are of interest for assessing the importance of these two effects. Note that the Semi-Finals and Finals obviously include the data for the respective participants only.

SF No. | ⟨SP⟩ | σ_{SP} | s_{SP} | ⟨V⟩ | σ_{V} | s_{V} |
---|---|---|---|---|---|---|
Selective SF 1 | 35.7 | 9.1 | 0.26 | 0.63 | 0.34 | 0.53 |
Selective SF 2 | 38.0 | 6.9 | 0.18 | 0.66 | 0.39 | 0.59 |
Selective SF 3 | 37.8 | 7.0 | 0.18 | 0.63 | 0.36 | 0.57 |
Selective SF 4 | 38.3 | 6.1 | 0.16 | 0.68 | 0.35 | 0.52 |
Semi-Finals | 38.9 | 3.3 | 0.09 | 0.69 | 0.30 | 0.44 |
Finals | 44.8 | 4.7 | 0.11 | 0.70 | 0.40 | 0.57 |

### Summary

Overall, the presented results define the extent to which the results of single paired comparisons of *SP* are not yet obscured by statistical noise. With the available data, we can conclude that the IYNT procedures and in particular the Criterion of Victory *V* alleviate Group-to-Group and Juror-to-Juror scaling differences, and allow separation of each Team in the IYNT with a two-sigma significance threshold.

### Comparative results of real IYNT Teams

Click on the headers to have the table sorted by any desired parameter. The Teams are initially sorted by Criterion of Victory in the Finals (*V*_{F}), then by Sum of Points in the Finals (*SP*_{F}), then by Criterion of Victory in the Semi-Finals (*V*_{sF}), then by Sum of Points in the Semi-Finals (*SP*_{sF}), then by Sum of Victories after Selective SF 4 (*SV*_{4}), then by Total Sum of Points after Selective SF 4 (*TSP*_{4}). Final Rank (*R*_{F}) and the type of Medal (M) reflect the results of each IYNT according to the regulations valid at the time.

Year | Team name | SV_{4} | TSP_{4} | V_{sF} | SP_{sF} | V_{F} | SP_{F} | R_{F} | M |
---|---|---|---|---|---|---|---|---|---|
2013 | Belarus-Universum | 4 | 177.1 | 1 | 43.3 | 1 | 50.8 | 1 | |
2014 | Georgia "Georgians" | 3 | 190.6 | — | — | 1 | 50.5 | 1 | |
2014 | Bulgaria-Sofia | 4 | 206.1 | — | — | 1 | 49.3 | 2 | |
2016 | Georgia-Georgians | 4 | 172.4 | 1 | 45.0 | 1 | 46.7 | 1 | |
2015 | China | 3 | 163.0 | 1 | 40.6 | 1 | 45.3 | 1 | |
2015 | Georgia-Georgians | 3½ | 165.4 | 1 | 41.4 | 1 | 45.1 | 2 | |
2015 | Croatia | 3½ | 175.6 | 1 | 40.7 | 1 | 44.2 | 3 | |
2014 | Serbia | 3½ | 184.3 | — | — | ½ | 46.5 | 3 | |
2013 | Georgia-Raveko | 3 | 174.4 | 1 | 45.1 | ½ | 42.8 | 2 | |
2016 | Belarus-Pahonia | 4 | 165.1 | 1 | 42.5 | ½ | 42.4 | 2 | |
2016 | Croatia | 4 | 173.5 | 1 | 40.1 | ½ | 41.1 | 3 | |
2013 | Turkey-Bahçeşehir PES | 4 | 173.8 | ½ | 38.3 | 0 | 37.2 | 3 | |
2013 | Moldova-Eco Generation | 3½ | 154.6 | 1 | 41.7 | 0 | 36.4 | 4 | |
2013 | Russia-TMOLimpiycy | 3½ | 167.5 | 1 | 40.0 | — | — | 7 | |
2016 | Iran-Gifted | 2½ | 132.7 | 1 | 38.8 | — | — | 4 | |
2013 | Bulgaria-Bulgaria | 3 | 170.0 | ½ | 39.1 | — | — | 6 | |
2015 | Serbia-1 | 3½ | 174.3 | ½ | 39.0 | — | — | 4 | |
2013 | Russia-MG 12 | 3 | 160.1 | ½ | 39.0 | — | — | 8 | |
2016 | Iran-Khashayar | 2 | 115.5 | ½ | 37.4 | — | — | 8 | |
2013 | Russia-RLC | 3 | 173.0 | ½ | 37.1 | — | — | 5 | |
2015 | Serbia-2 | 2½ | 154.4 | ½ | 36.0 | — | — | 6 | |
2013 | Afghanistan-Ariana | 3 | 155.5 | ½ | 35.8 | — | — | 9 | |
2016 | Iran-Mehr | 2½ | 155.4 | ½ | 35.7 | — | — | 5 | |
2015 | Belarus-Spectrum | 3 | 174.4 | ½ | 32.9 | — | — | 5 | |
2016 | Iran-Maple | 2½ | 145.3 | ½ | 31.4 | — | — | 7 | |
2013 | Bulgaria-Science Girls | 3 | 153.4 | 0 | 32.5 | — | — | 10 | |
2016 | Russia-Voronezh | 3 | 147.7 | 0 | 30.4 | — | — | 6 | |
2016 | Iran-Besat 1 | 2½ | 113.0 | 0 | 30.2 | — | — | 9 | |
2014 | Russia "Vinegret" | 1½ | 146.4 | — | — | — | — | 4 | |
2014 | Bulgaria-Kyustendil "R/" | ½ | 113.4 | — | — | — | — | 5 | |
2016 | Iran-Black Intelligence | 3 | 161.9 | — | — | — | — | 10 | — |
2013 | Turkey-Fatih Eskişehir | 2½ | 146.9 | — | — | — | — | 11 | — |
2015 | Bulgaria | 2½ | 140.9 | — | — | — | — | 7 | — |
2013 | Kazakhstan-NIS D/team | 2½ | 130.3 | — | — | — | — | 15 | — |
2015 | Russia-Voronezh-1 | 2 | 161.6 | — | — | — | — | 8 | — |
2015 | Russia-Voronezh-2 | 2 | 143.3 | — | — | — | — | 9 | — |
2013 | Iran-Iran | 1½ | 145.9 | — | — | — | — | 12 | — |
2016 | Iran-Besat 2 | 1½ | 116.7 | — | — | — | — | 11 | — |
2015 | Russia-MG 12 | 1 | 138.4 | — | — | — | — | 10 | — |
2013 | Kyrgyzstan-Alatoo | 1 | 137.6 | — | — | — | — | 13 | — |
2013 | Ukraine-Richelieu | 1 | 137.5 | — | — | — | — | 14 | — |
2016 | Iran-Free Thought | 1 | 87.9 | — | — | — | — | 12 | — |
2016 | Iran-Amordad | ½ | 118.2 | — | — | — | — | 13 | — |
2016 | Iran-Paramount Notion | ½ | 106.4 | — | — | — | — | 14 | — |
2016 | Iran-Farhang | ½ | 85.2 | — | — | — | — | 15 | — |
2016 | Iran-Velayat | ½ | 75.5 | — | — | — | — | 16 | — |
2013 | Turkey-Samanyolu | 0 | 107.9 | — | — | — | — | 16 | — |