Grading

Basic principles

In all Science Fights, the Jury evaluates the Team performances by publicly showing integer scores called Grades or G.

The Grades G from all n Jurors in a Group are used to calculate the Average Point P. Two extreme Grades, one maximum and one minimum, are replaced with one grade equal to their arithmetic mean. In the next step, P is determined as the arithmetic mean of the new data set of n−1 grades. This procedure has the advantage of weighting outliers less heavily. P is rounded to the nearest 0.1 points. An example below (see a pdf file) illustrates this procedure with some real data from the 2nd IYNT 2014 (2014-1-A-II-Rep.)
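As a minimal sketch (the function name is our own), this averaging procedure can be written as:

```python
def average_point(grades):
    """Average Point P from a list of integer Grades G: the maximum and
    the minimum Grade are replaced with one grade equal to their
    arithmetic mean, and P is the mean of the resulting n-1 grades,
    rounded to the nearest 0.1 points."""
    g = sorted(grades)
    # replace the two extremes with one grade equal to their mean
    reduced = g[1:-1] + [(g[0] + g[-1]) / 2]
    return round(sum(reduced) / len(reduced), 1)
```

For instance, the Grades {14; 17; 18; 20; 22; 25} reduce to {17; 18; 20; 22; 19.5} and yield P=19.3.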



Each Team completes three performances (Reporter, Opponent, and Reviewer) in each Science Fight and earns three Average Points P which are summed up to obtain the Sum of Points SP. In case the Team violates particular rules of the IYNT, Yellow Cards are issued and the SP is consequently reduced.

The values of SP in each Group are used to calculate the Criterion of Victory V. V=1 is awarded to the Team with the highest SP and to any Team (one or two) whose SP differs from the top result by no more than 2 points (SP ≥ SPmax−2.) For the Teams in the Group which have (SPmax−10) ≤ SP < (SPmax−2), the Criterion of Victory is set to V=½. For the Teams which have SP < SPmax−10, V=0. The Criterion of Victory is the primary parameter that determines the placing and rank of each Team. It minimizes the effects of statistical noise and Juror-to-Juror differences in grading.
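A sketch of this rule (the function name is ours, and the SP values below are hypothetical):

```python
def criteria_of_victory(sps):
    """Map each Team's Sum of Points SP in one Group to its Criterion of
    Victory V: V=1 if SP >= SPmax-2, V=0.5 if SPmax-10 <= SP < SPmax-2,
    and V=0 if SP < SPmax-10."""
    sp_max = max(sps)

    def v(sp):
        if sp >= sp_max - 2:
            return 1.0
        if sp >= sp_max - 10:
            return 0.5
        return 0.0

    return [v(sp) for sp in sps]
```

For example, the hypothetical SP values [45.1, 43.9, 36.0] would yield V=[1, 1, ½], since the second Team is within 2 points of the top result.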


Distributions of G and P

During the IYNTs 2013 through 2017 taken together, 6042 Grades were delivered in 303 Stages. Each of the 303×3=909 performances therefore obtained its Grades G from an average of n=6.6 Jurors. A total of 109 round-robin matches (Science Fights in a Group) were played, each with two or three Teams. In these matches, the Teams collected 303 values of SP and 303 values of V.



The graph (hi-res image, raw ASCII data) shows a fitted histogram of the Grades for each type of performance (Report, Opposition, and Review.)

The spread in the G along the X-axis is broad, indicating that nearly the whole spectrum of available G is used by the Jury; the extreme G, however, are inherently less frequent. We encourage each Juror to stay centered around 15 points for a Report, around 10 points for an Opposition, and around 5 points for a Review, each time weighing their G against what they believe an average IYNT performance is. In reality, however, the actual mean and standard deviation are 17.9±5.1 for a Report, 12.3±3.4 for an Opposition, and 6.7±1.8 for a Review, indicating that an average Juror shifts all their G to the right of our guidelines. The distributions are moderately asymmetric, with median Grades of 18 for a Report, 13 for an Opposition, and 7 for a Review.

In turn, the mean and standard deviation for the Average Points P are 17.9±4.1 for a Report, 12.3±2.8 for an Opposition, and 6.7±1.4 for a Review (raw ASCII data.) Such distributions of P are narrower than the respective distributions of G, as expected when several Grades are averaged.


Spread of the Grades

At this point it is important to realize that the IYNT requires a direct comparison of results from parallel Groups, even though each Team does not play every other Team during one Science Fight. We should therefore be aware of the extent to which parallel boards of Jurors can be influenced by fluctuations, and of which grading parameters are uniformly objective indications of the relative strengths of all IYNT contestants.

A consistent grading is extremely important for the IYNT as it must allow reliable identification of winners in each Group and the ultimate winners of the competition.

Let us consider three hypothetical and extreme scenarios, Example 1, Example 2, and Example 3.



In Example 1, all Jurors agree with each other, and there is high confidence that the Average Points P reflect objective differences between the two Teams. For each Juror and each Grade, G−P=0.

Example 2 shows a situation where both Teams obtain equal Grades from the middle of the spectrum. Although it is natural that some games end in a tie, this scenario is less advantageous: if no Team can impress the Jury, or if the Jury lacks sensitivity to hidden differences between the Teams, it is difficult to rank all Teams from top to bottom.

Example 3 depicts an undesired event where different Jurors use radically different grading criteria. Team 1 earns 0.1 points more than Team 2, but the level of confidence in this difference is low, because σG−P, the spread in the G given to one performance by several Jurors, is much wider than these eventual 0.1 points. Although both Average Points P are nearly equal to the pair of P in Example 2, these two P do not reflect the serious differences between the Teams that each Juror has noticed and highlighted in their grading. Concluding that Team 1 showed a better performance than Team 2, or than either Team in Example 2, is inconsistent with this set of Grades G.

Our aim is that separate Jurors give very similar Grades for one performance, while each Juror gives distinctly different Grades for different performances.


Grading criteria

During each IYNT, we brief Jurors and Teams on our grading and scoring criteria. Our guidelines have evolved since 2013, and as of now consist of four partial grading criteria. Our aim is to keep the guidelines clear and simple, and make sure that any Juror relies on the fixed, common criteria when evaluating performances across the parallel Groups. The criteria are printed directly on the individual Juror's protocols.



The Jurors are asked to add or subtract points from a starting grade (15, 10, or 5) and decide on their final Grade G. Such a decision is individual, and upwards of 99.7% of performances cause some disagreement in the Grades given by Jurors, and hence a spread of these Grades. No Grade can be corrected retroactively, and each Juror must justify any of their Grades upon the request of Team Captains or the Chairperson. Each G is public.

Find below the blank Juror's protocols used at the 5th IYNT 2017, as well as the slides from the most recent introductory briefing for Jurors and Teams.

  • Blank individual Juror's Protocol, A4 size (2017/05/29) [pdf]
  • Briefing for Jurors and Teams, slides by Ilya Martchenko (2017/06/30) [pdf]


Distributions of G−P

It is instructive to look at the real data from the five previous IYNTs and to analyze the spread in the G given by different Jurors to one performance in one Science Fight.



The graph (hi-res image, raw ASCII data) shows each of the 6042 Grades G given during the first five IYNTs. The X-coordinate indicates which of the 30 possible Grades G was given. The Y-coordinate indicates the difference between this particular G and the Average Point P that was calculated on its basis.

To interpret the spread along the Y-axis, one should remember that if Example 1 took place in each IYNT Stage, each G would be equal to its respective P, and G−P would globally collapse to zero. Luckily, the IYNT is not a paper-and-pencil exam, and its Jurors have opinions, which results in a distribution of the individual G around the P in each Stage.

The standard deviations of the distributions of G−P for the three types of performances are as follows: 3.23 for a Report, 2.18 for an Opposition, and 1.23 for a Review. These particular values are calculated with the two extreme G in each Group taken with a weight of 1, rather than ½.


Statistical significance of SF results

Since the extreme G contribute to statistics of P with the weight of ½, we can prepare the working dataset of 6042−3×303=5133 processed grades that we label g (raw ASCII data for 2013, 2014, 2015, 2016, and 2017.) g=G if the respective G is not extreme in the Group, and g=½(Gmax+Gmin) for the pairs of extreme Gmax and Gmin in each Group.
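A minimal sketch of this construction for one performance (function names are ours):

```python
def processed_grades(grades):
    """The n-1 processed grades g for one performance: the pair of
    extreme Grades (Gmax, Gmin) is replaced with their arithmetic mean."""
    g = sorted(grades)
    return g[1:-1] + [(g[0] + g[-1]) / 2]


def residuals(grades):
    """Residuals g - P for one performance, where P is the mean of g."""
    g = processed_grades(grades)
    p = sum(g) / len(g)
    return [x - p for x in g]
```

The residuals g−P from all performances of a Science Fight form the samples whose standard deviations are analyzed below.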

The statistical parameters of the distributions of residuals g−P now provide crucial information on the statistical significance of each P, and in turn SP, and the IYNT rankings.

In case the SF results do not permit rejecting the null hypothesis that a slightly higher SP in a round-robin Science Fight is observed by chance, more than one Team earns V=1. The Sum of Victories SV keeps track of such statistically significant cases wherein one or several SF winners step forward.

We can define the significance of V=1 as the level of statistical confidence for the interval [SP−2...60] in one Science Fight Group. This statistical significance depends only on the values of g−P and the number of Jurors n in the Group, and does not directly depend on the absolute magnitudes of G. This has the advantage of placing the focus on the congruence between the opinions of Jurors, rather than on the ranking of Teams.

To illustrate how to compute the confidence of [SP−2...60] and [SP−10...60], let us analyze in depth one example of a round-robin Science Fight (Finals of the 4th IYNT 2016, see a hi-res pdf file.)



The standard deviations σ of the distributions of g−P for each of the three types of performances in this SF are found as follows: σREP=1.91 for a Report, σOPP=1.36 for an Opposition, and σREV=0.65 for a Review (each from a sample of 27 residuals g−P.) As per the IYNT procedure, SP is calculated as a sum of the Average Points P for these three performances. By assuming σSP² = σREP² + σOPP² + σREV², we can easily find σSP=2.43 in these Finals of the 4th IYNT. It is now easy to determine the root-mean-square deviation ρSP by defining ρSP² = σSP²/(n−1), where the number of Jurors is n=10.

The value of ρSP=0.81 evaluates the standard error of the mean, i.e. the statistical uncertainty of the SP earned by any Team in the SF. This standard error is inherent in estimating the true value of SP from limited statistics. If the difference between any two sample-based SPi and SPj is comparable to or less than ρSP, they can be considered statistically indistinguishable.

In the next step, we can find the confidence level for the interval [SP−2...60] as a function of the number of degrees of freedom in a representative sample (i.e. n−1) and Student's t-score (i.e. 2/ρSP.) This is based on the assumption that Student's t-distribution is a valid approximation.
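Under these assumptions, the whole chain of calculations for the Finals of the 4th IYNT 2016 can be sketched as follows; the numerical integration of the t-density is our own stand-in for statistical tables:

```python
import math

def t_cdf(t, df, steps=20000):
    """One-sided CDF of Student's t-distribution, by midpoint integration."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    h = t / steps
    area = sum(c * (1 + ((i + 0.5) * h) ** 2 / df) ** (-(df + 1) / 2) * h
               for i in range(steps))
    return 0.5 + area

# Finals of the 4th IYNT 2016, graded by n = 10 Jurors
n = 10
sigma_sp = math.sqrt(1.91**2 + 1.36**2 + 0.65**2)  # sigma_SP, about 2.43
rho_sp = sigma_sp / math.sqrt(n - 1)               # rho_SP, about 0.81
confidence = t_cdf(2 / rho_sp, df=n - 1)           # confidence of [SP-2...60]
```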

This calculation yields the confidence level of 98.2% for the interval [SP−2...60], above a two-sigma significance threshold. In the Finals of other previous IYNTs, confidence levels for the interval [SP−2...60] were 96.1% in 2017, 94.8% in 2015, 98.8% in 2014, and 98.2% in 2013. A similar calculation yields the confidence level of 99.99999% for the interval [SP−10...60] in 2017 (above five-sigma), 99.99997% in 2016 (above five-sigma), and 99.99986% in 2015, 99.99971% in 2014, and 99.99827% in 2013 (above four-sigma in each case.)

The Table below summarizes the parameters calculated in a similar manner for all 109 round-robin Science Fights of the past five IYNTs. We cordially acknowledge Dmitriy Baranov for his help in processing the data. Click on the headers to have the table sorted by any desired parameter.

SF | σREP | σOPP | σREV | σSP | ρSP | t2 | n | V=1 | V
2013-1-A1.601.390.722.240.912.19796.4%100.0%
2013-1-B3.022.340.703.881.741.15684.9%99.89%
2013-1-C0.991.260.471.670.842.39596.3%99.99%
2013-1-D1.181.080.531.690.752.65697.7%100.0%
2013-1-E1.162.780.423.041.521.31587.1%99.86%
2013-1-F1.231.410.351.910.852.34696.7%100.0%
2013-2-A1.440.770.691.770.792.52697.4%100.0%
2013-2-B1.670.960.772.080.932.15695.8%99.99%
2013-2-C2.291.290.822.761.231.62691.7%99.98%
2013-2-D1.320.850.681.710.862.34596.0%99.98%
2013-2-E1.441.100.461.870.842.39696.9%100.0%
2013-2-F0.991.610.802.050.842.39797.3%100.0%
2013-3-A1.080.710.551.400.702.85597.7%99.99%
2013-3-B0.980.850.301.330.673.00598.0%99.99%
2013-3-C1.780.981.062.291.151.74592.2%99.95%
2013-3-D0.880.660.701.300.653.07598.1%99.99%
2013-3-E0.482.210.462.301.151.74592.1%99.95%
2013-3-F1.460.870.771.870.842.39696.9%100.0%
2013-4-A1.340.950.531.730.772.59697.6%100.0%
2013-4-B0.200.860.350.950.474.22599.3%100.0%
2013-4-C1.121.230.741.820.912.20595.3%99.98%
2013-4-D1.210.770.371.470.663.03698.6%100.0%
2013-4-E1.431.000.461.810.902.21595.4%99.98%
2013-4-F1.890.770.882.231.002.01694.9%99.99%
2013-S-A1.360.970.911.900.852.35696.7%100.0%
2013-S-B1.641.440.792.320.882.28897.2%100.0%
2013-S-C1.501.080.661.960.742.70898.5%100.0%
2013-F-A1.811.261.022.430.812.471098.2%100.0%
2014-1-A2.421.950.713.181.421.40689.0%99.95%
2014-1-B2.550.870.672.781.241.61691.6%99.98%
2014-2-A1.751.180.522.170.972.06695.3%99.99%
2014-2-B1.200.840.691.620.722.76698.0%100.0%
2014-3-A2.321.170.352.621.171.71692.6%99.98%
2014-3-B2.041.030.682.391.071.87694.0%99.99%
2014-4-A1.021.150.771.720.772.60697.6%100.0%
2014-4-B2.041.150.452.391.071.87694.0%99.99%
2014-F-A1.290.860.561.650.672.97798.8%100.0%
2015-1-A2.961.230.673.281.471.36688.5%99.95%
2015-1-B1.370.950.971.930.862.32696.6%100.0%
2015-1-C1.420.940.571.800.802.49697.2%100.0%
2015-1-D1.060.820.581.460.732.74597.4%99.99%
2015-2-A3.023.281.234.621.891.06783.5%99.91%
2015-2-B1.571.010.832.050.842.39797.3%100.0%
2015-2-C1.771.260.752.301.031.95694.5%99.99%
2015-2-D0.771.430.681.770.792.53697.4%100.0%
2015-3-A2.491.260.412.821.151.74793.4%99.99%
2015-3-B0.901.420.591.780.802.51697.3%100.0%
2015-3-C2.141.160.712.541.131.76693.1%99.98%
2015-3-D2.031.090.552.371.061.89694.1%99.99%
2015-4-A1.121.130.891.820.812.45697.1%100.0%
2015-4-B1.470.990.631.880.842.38696.8%100.0%
2015-4-C1.951.340.622.451.091.83693.6%99.99%
2015-4-D2.240.730.632.441.091.83693.7%99.99%
2015-S-A2.300.951.122.731.111.79793.9%99.99%
2015-S-B2.340.841.002.681.091.83794.1%100.0%
2015-F-A3.211.760.823.751.131.771294.8%100.0%
2016-1-A3.451.960.844.061.821.10684.0%99.87%
2016-1-B2.661.700.763.251.451.38688.7%99.95%
2016-1-C2.621.520.953.171.421.41689.1%99.96%
2016-1-D1.901.580.662.561.151.75692.9%99.98%
2016-1-E2.291.541.263.041.361.47690.0%99.96%
2016-1-F2.402.131.153.411.521.31687.7%99.94%
2016-2-A1.971.330.662.471.111.81693.5%99.99%
2016-2-B0.950.690.481.270.573.52699.2%100.0%
2016-2-C1.791.000.382.081.041.92593.6%99.97%
2016-2-D2.041.770.722.801.251.60691.5%99.98%
2016-2-E2.401.450.662.881.441.39588.1%99.89%
2016-2-F1.592.191.303.001.341.49690.2%99.97%
2016-3-A1.991.400.602.511.021.95795.1%100.0%
2016-3-B3.092.551.254.191.881.07683.3%99.84%
2016-3-C2.371.580.712.941.201.68792.7%99.99%
2016-4-A3.681.440.694.012.001.00581.3%99.62%
2016-4-B2.331.240.782.751.231.63691.8%99.98%
2016-4-C2.391.730.573.011.501.33587.3%99.87%
2016-4-D2.521.650.703.091.381.45689.6%99.96%
2016-4-E0.761.061.051.670.742.67697.8%100.0%
2016-4-F1.381.460.822.171.081.84593.1%99.96%
2016-S-A2.191.630.882.871.171.71793.1%99.99%
2016-S-B3.482.031.224.211.721.16785.6%99.94%
2016-S-C2.101.100.602.451.002.00795.4%100.0%
2016-F-A1.911.360.652.430.812.471098.2%100.0%
2017-1-A2.121.861.002.991.131.77894.0%100.0%
2017-1-B2.541.480.783.041.151.74893.7%100.0%
2017-1-C2.281.830.753.011.231.63792.2%99.99%
2017-1-D2.371.190.642.731.121.79793.8%99.99%
2017-1-E2.291.170.842.701.101.81794.0%99.99%
2017-1-F2.361.210.752.761.131.77793.7%99.99%
2017-2-A2.521.790.933.221.321.52791.0%99.99%
2017-2-B3.802.291.354.641.751.14885.4%99.96%
2017-2-C1.831.421.222.621.071.87794.5%100.0%
2017-2-D2.591.440.673.031.241.62792.1%99.99%
2017-2-E1.481.240.832.110.802.51898.0%100.0%
2017-2-F2.551.160.732.891.181.69792.9%99.99%
2017-3-A2.040.951.172.541.041.93794.9%100.0%
2017-3-B2.611.220.953.031.241.62792.1%99.99%
2017-3-C2.021.961.243.071.261.59791.9%99.99%
2017-3-D3.091.180.753.391.381.45790.1%99.98%
2017-3-E1.841.390.762.430.922.18896.7%100.0%
2017-3-F2.761.621.003.351.371.46790.3%99.98%
2017-4-A1.840.890.562.120.872.31797.0%100.0%
2017-4-B2.051.370.992.661.081.84794.3%100.0%
2017-4-C2.241.961.053.161.291.55791.4%99.99%
2017-4-D2.331.290.612.731.121.79793.8%99.99%
2017-4-E2.221.340.912.751.121.78793.7%99.99%
2017-4-F2.481.340.972.981.221.64792.4%99.99%
2017-S-A2.041.771.112.921.031.94995.6%100.0%
2017-S-B3.561.981.094.221.491.34989.2%99.99%
2017-S-C3.491.911.074.121.461.37989.6%99.99%
2017-F-A3.431.961.044.091.061.891696.1%100.0%

In the IYNTs 2013 through 2017, there were 134 instances of V=1, 117 instances of V=½, and 52 instances of V=0 (including the 1st IYNT 2013 retrospectively.) Compared to the total number of round-robin Science Fights (109), these figures suggest that (134−109)/109=23% of cases reflect the event that a second or third Team in a Science Fight is indistinguishable from the Team with the maximum SP and also earns V=1 (raw ASCII data.)

In a typical Science Fight, earning a V=½ or above is a four-sigma event, with an expected confidence level of (99.98±0.05)% for the interval [SP−10...60]. Earning a much more rewarding V=1 is a two-sigma event, with an expected confidence level of (94±4)% for the interval [SP−2...60]. Earning each V is a statistically independent event, and earning several V=1 further contributes to the confidence in the placing of top IYNT Teams.

These results justify the importance of the Criterion of Victory V, and the importance of the fact that no IYNT Stage has ever been graded by fewer than 5 Jurors. As argued below, besides having different opinions, individual Jurors may also work on different grading scales. At all times, when looking at the IYNT scores, we ask whether their difference is representative of a real difference between Teams or whether it is a statistical fluke. An especially high level of significance is demanded if grading parameters are used to resolve the placing of eventual Semi-Finalists and Finalists.

By assuming that the grading parameters of Jurors would not improve considerably before the 6th IYNT 2018, we may estimate the average expected level of confidence for [SP−2...60] as a function of the number of Jurors n randomly selected to one Group. To do so, we can re-calculate ρSP from a historically global σSP.
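A sketch of such an estimate (function names are ours): since the confidence of [SP−2...60] depends only on ρSP=σSP/√(n−1) and on n−1 degrees of freedom, a historically global σSP taken from the data above can be fed into the same t-distribution calculation for any hypothetical number of Jurors n:

```python
import math

def t_cdf(t, df, steps=20000):
    """One-sided CDF of Student's t-distribution, by midpoint integration."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    h = t / steps
    return 0.5 + sum(c * (1 + ((i + 0.5) * h) ** 2 / df) ** (-(df + 1) / 2) * h
                     for i in range(steps))

def expected_confidence(sigma_sp, n):
    """Expected confidence of [SP-2...60] in a Group of n Jurors,
    assuming a historically global sigma_SP."""
    rho_sp = sigma_sp / math.sqrt(n - 1)
    return t_cdf(2 / rho_sp, df=n - 1)
```

For instance, with σSP=2.43 (the value observed in the Finals of the 4th IYNT 2016), expected_confidence(2.43, 10) reproduces the 98.2% figure quoted earlier, and the confidence falls as fewer Jurors are seated in the Group.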



Grading parameters of individual Jurors

For reference purposes, this table summarizes the individual grading parameters (within one IYNT) of all existing 134 Jurors. We cordially acknowledge Dmitriy Baranov for his help in processing the data. The presented parameters give a glimpse of the Jurors' perceptions of the grading scale in the IYNT. Click on the headers to have the table sorted by any desired parameter.

Yr, St | Name | ⟨G⟩ | σG | σG−P | ⟨G−P⟩ | nG | nSF | nT | κ
2017 Ilya Martchenko12.15.82.4-0.254612+0.9
2016 Ilya Martchenko10.36.13.0-1.94258+1.9
2015 Ilya Martchenko12.76.12.8-0.14868+0.9
2014 Ilya Martchenko12.26.72.1-1.03955+1.7
2013 Ilya Martchenko12.56.02.1-0.348610+0.9
2017 Mladen Matev11.14.72.7-0.954612+0.2
2016 Mladen Matev11.85.62.4-1.151611+0.8
2015 Mladen Matev12.15.12.1-1.34869+0.2
2014 Mladen Matev10.95.32.5-1.43044+0.9
2013 Mladen Matev12.55.53.0+0.039510+0.4
2017 Evgeny Yunosov14.26.72.9+2.345513+0.9
2016 Evgeny Yunosov13.35.51.6+0.11825+0.1
2015 Evgeny Yunosov14.36.11.8+0.43349+0.3
2014 Evgeny Yunosov14.46.71.7+1.23955+0.8
2017 Andrei Klishin12.66.93.2+0.054611+1.8
2016 Andrei Klishin10.86.03.0-1.851610+1.6
2015 Andrei Klishin13.56.93.5-0.44869+1.4
2017 Alena Kastenka10.85.81.9-0.345512+1.4
2016 Alena Kastenka10.55.92.5-1.03048+1.6
2015 Alena Kastenka11.55.62.2-0.53346+0.9
2015 Gur. Mikaberidze11.94.61.9-0.93957-0.3
2014 Gur. Mikaberidze11.05.62.2-1.62444+1.1
2013 Gur. Mikaberidze11.15.12.0-0.333410+0.6
2017 Danko Marušić10.64.32.4-0.545511+0.0
2016 Danko Marušić11.25.13.7-1.83957+0.5
2015 Danko Marušić11.55.32.8-0.73958+0.6
2017 Milen Kadiyski13.15.61.6+1.154611+0.3
2015 Milen Kadiyski14.66.61.9+1.14869+0.6
2014 Milen Kadiyski16.36.91.8+1.14255+0.2
2017 Dmitriy Agarkov12.34.91.7-0.31825-0.1
2014 Dmitriy Agarkov14.66.21.3-0.93954+0.2
2013 Dmitriy Agarkov12.75.61.4-0.645510+0.4
2017 Andrey Kravtsov14.86.01.8+1.43649+0.0
2015 Andrey Kravtsov13.65.62.1+1.12137+0.0
2014 Andrey Kravtsov15.16.72.1-0.43954+0.5
2016 Dina Izadi11.97.21.6+0.8913+2.3
2013 Dina Izadi10.44.92.1-1.048511+0.7
2017 Ivan Syulzhyn10.34.82.3-1.445512+0.6
2016 Ivan Syulzhyn9.85.61.3-1.73347+1.6
2017 Val. Lobyshev11.86.12.0+0.454612+1.3
2013 Val. Lobyshev13.56.32.0+1.336410+0.8
2016 Dmitry Zhukalin12.86.92.9+0.12436+1.7
2015 Dmitry Zhukalin13.65.71.4+0.73657+0.1
2015 Aleks. Dimić12.65.91.4-0.33347+0.8
2014 Aleks. Dimić12.96.11.9-0.33044+0.8
2017 Wang Sihui10.45.01.4-0.136410+0.8
2015 Wang Sihui13.35.91.5-0.32746+0.5
2017 Giorgi Khomeriki10.04.61.6-0.645513+0.5
2016 Giorgi Khomeriki7.73.62.3-3.12437+0.5
2017 Stan. Krasulin9.44.42.0-0.736411+0.6
2016 Stan. Krasulin10.14.22.5-2.31826+0.1
2016 Nika Sabashvili11.55.01.9-1.03048+0.3
2015 Nika Sabashvili12.65.41.9-0.13347+0.3
2016 Som. Mahmoodi12.67.11.3+0.73349+2.0
2016 N. Seliverstova12.87.21.5+0.21224+2.0
2016 Nikita Datsuk11.86.72.8-1.03348+1.9
2015 D. Radovanović12.36.83.6-0.43958+1.8
2016 Jalil Sedaghat11.76.43.1+0.22437+1.6
2015 Ivan Reznikov11.76.43.3-1.14869+1.6
2017 Klim Sladkov9.05.13.0-2.245513+1.4
2016 Samuel Byland11.15.92.2-1.451610+1.4
2013 Igor Evtodiev10.15.42.0-1.44259+1.3
2013 Alina Astakhova11.86.12.2+0.157614+1.3
2013 Naime Arslan12.46.41.9+1.9913+1.3
2016 Ahmad Sheikhi13.16.63.0+1.436410+1.3
2017 Marc Bitterli11.15.72.1-0.52737+1.2
2013 Ismail Kiran11.76.01.8-0.14258+1.2
2017 Wang Lin11.76.01.7+1.02739+1.2
2016 Af. Montakhab12.06.11.8-0.12136+1.2
2015 Aleks. Suvorova14.27.03.0+0.84567+1.2
2016 Roya Radgohar14.37.02.7+1.93048+1.2
2016 Azizolah Azizi14.77.22.2+1.748610+1.2
2016 Laura Guerrini10.45.32.3-2.245510+1.1
2013 Celalettin Baykul12.76.32.9+0.41825+1.1
2013 Jeyhun Jabarov14.67.12.5+0.739510+1.1
2016 Dmitii Dorofeev10.25.22.3-2.15169+1.0
2013 Jevhen Olijnyk10.85.41.5-0.73958+1.0
2013 Ersin Karademir11.85.81.6+0.33959+1.0
2016 Marzieh Afkhami13.46.51.9+1.52436+1.0
2016 M. Sadat Tahami13.56.52.7+1.142510+1.0
2014 D. Karashanova15.47.31.6+0.63345+1.0
2017 Gal. Onoprienko10.25.12.7-0.545510+0.9
2017 Song Yi11.65.61.7+0.836410+0.9
2013 Diana Kovtunova13.06.21.2-0.142510+0.9
2013 Ahmet Çabuk13.16.21.5+0.32739+0.9
2013 Antoan. Nikolova13.36.31.9+0.742510+0.9
2015 Vesna Vasić15.77.31.7+2.3913+0.9
2017 Luisa Schrempf10.85.22.1-0.854613+0.8
2013 Aliaks. Mamoika12.25.81.6-0.442510+0.8
2017 Alexandr Nadeev12.25.82.1+0.454614+0.8
2013 Vlad. Vanovskiy13.26.22.2+0.054612+0.8
2017 Liu Lisa13.16.12.0+1.454614+0.8
2015 Dušan Dimić14.96.92.2+1.04869+0.8
2013 Buras Boljiev11.45.41.7-0.33046+0.7
2016 Ban. Rastegari13.96.41.9+1.32737+0.7
2017 Nurzada Beissen10.95.01.5+0.03649+0.6
2016 Tatiana Fursova11.25.22.2-0.62737+0.6
2015 Jelena Vračević11.85.42.0+0.52437+0.6
2016 Jaf. Vatanparast12.35.61.6+1.13648+0.6
2017 Sergei Kozelkov12.45.71.3+0.136411+0.6
2017 Xiaobin Chen12.65.72.0+1.336411+0.6
2015 Viktor Nechaev13.36.01.9+0.12748+0.6
2017 Kseniia Wang13.86.21.6+1.036410+0.6
2016 Sed. Forootan13.96.31.2+2.8913+0.6
2013 Ek. Mendeleeva14.36.41.4+0.748511+0.6
2013 Timothy Timur9.84.52.1+0.31525+0.5
2017 Volha Uhnachova10.84.91.4+0.23649+0.5
2017 Michelle De Kock10.84.91.9-1.045511+0.5
2017 Chrisy Xiyu Du10.84.92.3-0.345510+0.5
2017 Polina Deviatova 11.05.02.1-0.745513 +0.5
2014 Nasko Stamenov11.65.22.2-0.63044+0.5
2013 Alexander Sigeev11.75.31.7-0.95469+0.5
2013 Emel Alğin11.95.42.0-0.357613+0.5
2013 Sergey Sabaev12.05.41.7-0.1913+0.5
2013 Özge Özşen12.45.62.0+0.236410+0.5
2013 Ayset Yurt Ece13.05.82.6+0.73349+0.5
2016 Som. Haj. Gooki15.16.72.3+2.44559+0.5
2013 Siarhei Seniuk11.25.01.6-0.842513+0.4
2016 Roya Pournejati12.55.51.5+0.233410+0.4
2013 Dursun Eser12.85.61.2+0.636410+0.4
2016 Zahra Yazdgerdi12.85.62.9-0.22437+0.4
2013 Louis D. Heyns13.25.81.6-0.342512+0.4
2016 Afshan Mohajeri14.16.22.8+1.42737+0.4
2013 Hakan Dal16.77.22.3+2.3612+0.4
2017 Murray Chisholm9.74.32.0-1.545513+0.3
2017 D. Permatasari12.05.22.3+0.754614+0.3
2013 Sergey Zelenin12.85.51.5+0.14259+0.3
2015 Kirill Volosnikov12.95.62.3-0.54569+0.3
2017 Qu Yanfu13.15.61.7+0.736411+0.3
2013 Ebru Ataşlar14.96.42.3+1.73649+0.3
2014 Vasilka Krasteva17.17.31.8+1.53954+0.3
2017 Ch.-Eung Ahn9.94.22.5-1.654611+0.2
2017 Li Ying10.24.41.5-0.71826+0.2
2017 Mar. Yavahchova11.95.11.8+0.845514+0.2
2015 Tatiana Besedina12.35.21.9-0.53957+0.2
2016 Mas. Tor. Azad13.35.62.2-0.151612+0.2
2015 Elena Chernova13.45.71.5+0.93648+0.2
2016 Maryam Bahrami14.46.12.1+1.845510+0.2
2017 Florian Koch9.23.92.3-1.745511+0.1
2016 Milad Zangiabadi13.35.52.0+0.62437+0.1
2015 Yury Kartynnik10.44.21.3-0.92738+0.0
2016 M. R. Moghadam11.24.61.2+0.41524+0.0
2017 A. Chervinskaya11.94.91.8+0.045512+0.0
2013 Fatih Akay12.85.21.7-0.342510+0.0
2013 Vladimir Shiltsev13.45.52.6-0.51214+0.0
2015 Drag. Jovković15.46.31.9+2.12747+0.0
2017 T. Gachechiladze10.64.22.2-0.136410-0.1
2017 Bi Jun12.04.81.4+0.83649-0.1
2016 M. D. Aseman13.25.31.7+1.12738-0.1
2013 Marina Sergeeva13.55.41.5+0.351612-0.1
2017 Wang Jin13.55.41.8+1.554613-0.1
2016 A. Poostforush14.85.92.0+2.13957-0.1
2014 Elena Trufanova15.36.12.0+0.53954-0.1
2017 Cao Xuewei10.64.11.7+0.52738-0.2
2017 Li Bin11.14.32.0-0.42738-0.2
2015 Lucija Papa13.15.11.3+0.12135-0.2
2013 Pınar Aytar13.55.31.3+0.230410-0.2
2016 Hassan Eslahi13.65.41.8+0.83649-0.2
2017 Thomas Broger9.83.71.5-1.31826-0.3
2017 Chen Xi10.23.91.9-0.136411-0.3
2015 Oscar Rabinovich12.74.91.7-0.54867-0.3
2017 Song Feng11.84.41.3+0.32739-0.4
2013 Natalia Borodina12.34.61.1+0.03048-0.4
2017 Li Hong12.54.61.5+0.736411-0.5
2013 Necmettin Caner12.84.71.2-0.3913-0.5
2017 Li Peng10.13.21.8-0.52739-0.9
2013 Sertaç Eroğlu11.73.91.9+0.51525-0.9
2013 Sabahattin Esen15.04.81.1+1.6612-1.3
---Average12.35.62.0+0.03649+0.6
---Best records10.07.31.1+0.057614+2.3

  • Color-coded status tags reflect the various roles of the IYNT Juror: a violet tag marks a Juror who acted as Chairperson, while a green tag marks a Juror who did not act as Chairperson; a blue tag marks an independent Juror, while a red tag marks a Team Leader who acted as Juror;
  • ⟨G⟩ is the arithmetic mean of all Grades delivered during the IYNT; ⟨G⟩=(5+10+15)/3=10 is our target for any Juror, to allow for uniform and equally weighted grading scales throughout the parallel Groups; a majority of Jurors go above the target;
  • σG is standard deviation of all Grades delivered during the IYNT; if σG is large, the Juror uses a broader spectrum of Grades and has more differentiation within their evaluation scale; note that σG does not necessarily reflect clearer separation of the Teams, cf. two notable hypothetical extremes of σG=13.7 for the set {30; 1; 1; 30; 1; 1} and σG=11.1 for the set {30; 1; 1; 1; 20; 10} in a two-Team SF; various limits for three types of performances result in a trend that a higher σG is more likely to appear for Jurors with a higher ⟨G⟩; theoretical slope of a trend, or baseline, is σG/⟨G⟩ for the set {30; 20; 10} or 0.40825; parameter κ corrects for this baseline;
  • σG−P is standard deviation of all residuals G−P; if σG−P is small, the Grades are less scattered respective to the Grades of other Jurors and contribute to a smaller ρ; an implausible theoretical minimum is σG−P=0; real-life maximum and minimum records are 3.7 and 1.1;
  • G−P⟩ is arithmetic mean of all residuals G−P; if ⟨G−P⟩ is close to zero, the grading scale is less shifted respective to the individual scales of other Jurors; it is easy to notice moderate statistical noise that hinders the inherent correlation between ⟨G−P⟩ and ⟨G⟩;
  • nG is the number of Grades given; greater nG means better statistics; the theoretical caps depend on the exact tournament brackets and were nG=60 in 2013, nG=45 in 2014, and nG=54 in 2015, 2016, and 2017; these caps were not attainable even for the Jurors working in every Science Fight, due to such constraints as the distribution of two-Team Groups;
  • nSF is the number of Science Fights judged; greater nSF means better statistics; a theoretical cap with Semi-Finals is nSF=6;
  • nT is the number of Teams judged; greater nT means more opportunities to observe stronger and weaker Teams, and thus to make a more comparative judgment; the theoretical cap is the number of Teams N, but no more than 18; note that some Teams are judged more than once by the same Juror within one IYNT;
  • κ is standard deviation of all delivered Grades corrected for the average Grade ⟨G⟩ via κ=σG−0.40825×⟨G⟩; κ reflects relative width of the spectrum of Grades used by the Juror and can be more suitable for comparison of Jurors with distinctly different ⟨G⟩; note that κ is linearly proportional to the relative value of σG/⟨G⟩.
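The κ correction from the last bullet can be sketched as follows (function names are ours; the 0.40825 baseline is σG/⟨G⟩ for the set {30; 20; 10}, with σG taken as the population standard deviation):

```python
import math

def pop_std(xs):
    """Population standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# baseline slope sigma_G / <G> for the set {30; 20; 10}
BASELINE = pop_std([30, 20, 10]) / 20  # = 0.40825...

def kappa(mean_g, sigma_g):
    """Spectrum width of a Juror's Grades corrected for their mean Grade."""
    return sigma_g - BASELINE * mean_g
```

For the first row of the table (⟨G⟩=12.1, σG=5.8), kappa(12.1, 5.8) gives about 0.86, which rounds to the tabulated κ=+0.9.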

The names are initially sorted by number of IYNTs judged, then by κ, then by ⟨G⟩. Click on the headers to have the table sorted by any desired parameter.

There is a complex interplay between each of the calculated parameters, and some of the crucial parameters depend not only on what G each Juror gives, but also on what G other Jurors in the same Group give. Other parameters depend on the lot, tournament brackets, or appointing decisions of the General Council, and are beyond the control of the individual Juror. Persisting regularities seen for Jurors who worked at more than one IYNT suggest that any shifts in σG or ⟨G⟩, observed consistently in several Jurors, may reflect objective differences between separate IYNTs, viz. in diversity or average strength of participants.

Particular grading preferences of a Juror are sometimes clearly recognizable in separate IYNTs and are not obscured by limited statistics. These differences in particular explain why Jurors and Teams rotate between the Groups, and why V is a more representative derivative grading parameter than SP.

Whilst the Criterion of Victory V already alleviates any scaling differences, further fine-grained data could be extracted if each future IYNT Juror

  • is comfortable with lower Grades for weaker performances, and therefore stays centered closer to ⟨G⟩=10, with preferably ⟨G⟩<13;
  • at the same time, works in a broader spectrum of high and low Grades, and thus has a larger σG, with preferably σG>6;
  • at the same time, is balanced to have a moderate σG−P, with preferably σG−P<3.

These three goals can naturally clash with each other. At this point it is important to realize that each Juror must be focused only on assessing immediate performances and sticking to uniform, scientific, merit-based grading criteria, and that furthermore each G must be given independently.


Effects of individual grading parameters

To illustrate the potential consequences of the spread in these grading parameters, let us consider a Gedankenexperiment with three Teams competing in one Science Fight.

Team 1 shows a relatively strong performance and receives Grades which sit at the upper end of the [⟨G⟩−½σG…⟨G⟩+½σG] interval. In other words, should G be distributed normally for each selected Juror, the performances of Team 1 would be better than a fraction Φ(½)≈0.69 of all performances the Juror grades in the IYNT. Such a Team would potentially end up as a Finalist.

Team 2 shows an average performance and receives average Grades ⟨G⟩ from each Juror.

Team 3 shows a relatively weak performance and receives Grades which sit at the lower end of the [⟨G⟩−½σG…⟨G⟩+½σG] interval. In other words, their performances are weaker than approximately Φ(½)≈0.69 of all IYNT performances graded by the Juror. Such a Team would potentially not qualify for the Semi-Finals.
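Assuming normally distributed G for each Juror, the Φ=0.69 figure used for Teams 1 and 3 follows directly from the standard normal CDF:

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# a performance graded at <G> + sigma_G/2 outranks about 69% of performances,
share_team1 = phi(0.5)
# while one graded at <G> - sigma_G/2 outranks only about 31%
share_team3 = phi(-0.5)
```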

These three Teams are graded simultaneously by two boards of selected Jurors. One board is composed of six Jurors with some of the lowest observed ⟨G⟩, while the other board is composed of six Jurors with some of the highest observed ⟨G⟩. It is easy to determine the results of this hypothetical Science Fight because ⟨G⟩ and σG are publicly known for each Juror.



These results illustrate the level of tolerance of the Criterion of Victory V and the Sum of Points SP to the most severe effects of improbably unbalanced boards of Jurors. As seen from this calculation, the strong Team 1 graded by low-⟨G⟩ Jurors obtains fewer points than the average Team 2 graded by high-⟨G⟩ Jurors, and ties with the weak Team 3. The weak Team 3 graded by high-⟨G⟩ Jurors, respectively, earns more points than the average Team 2 graded by low-⟨G⟩ Jurors. The artificial selection of Jurors in this test leads to unrealistically small σg−P and ρSP in both boards of Jurors.

There is however no negative effect on the Criterion of Victory V and consequently the results of the Science Fight.

In the next Gedankenexperiment, let us rotate and evenly distribute the same Jurors, as is routinely done before each real Science Fight.



Though the Jurors give the same Grades G as in the first experiment, their balanced distribution now mitigates the effects of Juror-to-Juror grading differences on SP. Note that this happens at the cost of increased σG−P and ρSP, which both now fall in the range of typical IYNT values despite an artificial, bimodal distribution of ⟨G⟩. Although we cannot generalize from one example, the respective Sums of Points SP from both boards of Jurors now differ by only 1.5, 0.6, and 0.6 points.

In this extreme value analysis, we test a statistically improbable scenario which would have some of the worst impacts on the stability of Science Fight results. We test extreme values of ⟨G⟩ and the most unrealistic distribution of Jurors, and observe the amplitude of fluctuations in SP, which always falls within the 2-point threshold.


Grading parameters of separate IYNTs

The table below provides an overview of statistical parameters for the Grades given by all Jurors within one IYNT, and the overall statistics for all five IYNTs.

| Year | ⟨G⟩ | σG | sJ | sSV4 | sTSP4 | σG−P | nG | nst | nT | nJ | nch | V=1 | V=½ |
|------|------|-----|------|------|-------|------|------|-----|----|----|-----|---------|---------------|
| 2013 | 12.5 | 5.8 | 0.11 | 0.43 | 0.12 | 2.0 | 1422 | 78 | 16 | 41 | 11 | (96±3)% | (99.98±0.03)% |
| 2014 | 14.1 | 6.7 | 0.15 | 0.52 | 0.20 | 2.2 | 423 | 23 | 5 | 12 | 5 | (95±3)% | (99.99±0.01)% |
| 2015 | 13.0 | 6.0 | 0.10 | 0.29 | 0.09 | 2.4 | 969 | 49 | 10 | 27 | 10 | (94±3)% | (99.99±0.02)% |
| 2016 | 12.3 | 6.2 | 0.13 | 0.58 | 0.24 | 2.8 | 1284 | 69 | 16 | 40 | 9 | (91±5)% | (99.94±0.08)% |
| 2017 | 11.4 | 5.4 | 0.12 | 0.43 | 0.18 | 2.3 | 1944 | 84 | 18 | 48 | 16 | (93±3)% | (99.99±0.01)% |
| All | 12.3 | 5.9 | 0.13 | 0.46 | 0.19 | 2.4 | 6042 | 303 | 65 | 134 | 30 | (94±4)% | (99.98±0.05)% |

  • ⟨G⟩ is the arithmetic mean of all Grades delivered during the IYNT by all Jurors;
  • σG is the standard deviation of all Grades delivered during the IYNT by all Jurors;
  • sJ=σ(⟨G⟩J)/⟨⟨G⟩J⟩ is the relative spread of the per-Juror mean Grades ⟨G⟩J at the IYNT; lower values correspond to smaller Juror-to-Juror grading differences;
  • sSV4=σSV4/⟨SV4⟩ is the relative spread in the Sum of Victories after SF 4; higher values correspond to stronger diversity between Teams and better separation in the ranking;
  • sTSP4=σTSP4/⟨TSP4⟩ is the relative spread in the Total Sum of Points after SF 4; higher values correspond to stronger diversity between Teams in terms of points and contribute to better separation in the ranking;
  • σG−P is the standard deviation of all residuals G−P during the IYNT;
  • nG is the number of Grades given;
  • nst is the number of Stages in the IYNT;
  • nT is the number of Teams in the IYNT;
  • nJ is the number of individual Jurors in the IYNT;
  • nch is the number of individual Chairpersons in the IYNT;
  • V=1 is the mean and standard deviation of the confidence levels for the interval [SP−2...60] in all SFs;
  • V=½ is the mean and standard deviation of the confidence levels for the interval [SP−10...60] in all SFs.
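For concreteness, the per-IYNT quantities ⟨G⟩, σG, sJ, and σG−P can be computed from a flat list of per-Grade records; the (juror, G, P) record layout below is an assumption made for illustration, not the actual data format.

```python
from statistics import mean, pstdev

def grading_summary(records):
    # records: iterable of (juror_id, grade_G, average_point_P) tuples.
    grades = [g for _, g, _ in records]
    by_juror = {}
    for juror, g, _ in records:
        by_juror.setdefault(juror, []).append(g)
    juror_means = [mean(gs) for gs in by_juror.values()]
    return {
        "mean_G": mean(grades),                          # <G>
        "sigma_G": pstdev(grades),                       # sigma_G
        "s_J": pstdev(juror_means) / mean(juror_means),  # Juror-to-Juror spread
        "sigma_G_P": pstdev([g - p for _, g, p in records]),  # residuals G-P
    }
```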

By comparing sSV4 with sTSP4, we can see that the Criterion of Victory prevents a melting-pot effect, in which the Total Sums of Points TSP converge to rather similar values for many Teams.


Grading parameters of separate SFs

Teams and Jurors alike learn rapidly from SF to SF. One may argue that the Teams may start showing less diverse performances, or that the Jurors may start giving less diverse Grades G.

Confidence levels for the interval [SP−2...60] in each SF, as well as distributions of G, G−P, and SP, are of interest to assess the importance of these effects.
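One way to picture such a confidence level is a Monte Carlo propagation of grading noise into SP. The sketch below is an assumption-laden illustration, not the procedure used for the tables here: it borrows the residual scale σG−P≈2.4 from above, treats each Average Point as carrying noise of order σG−P/√n with n=6 Jurors, and asks how often a Team's noisy SP stays within the [SPmax−2...60] interval.

```python
import math
import random

def confidence_within_2(sp, sp_max, sigma_gp=2.4, n_jurors=6,
                        trials=20_000, seed=7):
    # Assumed noise on one Average Point P: sigma_gp / sqrt(n_jurors).
    # SP sums three P values, so three independent draws are added.
    rng = random.Random(seed)
    sigma_p = sigma_gp / math.sqrt(n_jurors)
    hits = sum(
        sp + sum(rng.gauss(0, sigma_p) for _ in range(3)) >= sp_max - 2
        for _ in range(trials)
    )
    return hits / trials
```

Under these assumptions, a Team sitting exactly at SPmax keeps V=1 with probability Φ(2/(σG−P·√3/√6)) ≈ 0.88; the simulation reproduces this value.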



Summary

Overall, the presented results define the extent to which the results of single paired comparisons of SP are not yet obscured by statistical noise. With the available data, we can conclude that the IYNT procedures, and in particular the Criterion of Victory V, alleviate Group-to-Group and Juror-to-Juror scaling differences, and allow each Team in the IYNT to be separated at a two-sigma significance level.



Comparative results of real IYNT Teams

The Teams are initially sorted by Criterion of Victory in the Finals (VF), then by Sum of Points in the Finals (SPF), then by Criterion of Victory in the Semi-Finals (VsF), then by Sum of Points in the Semi-Finals (SPsF), then by Sum of Victories after Selective SF 4 (SV4), then by Total Sum of Points after Selective SF 4 (TSP4). Final Rank (RF) and the type of Medal (M) reflect the results of each IYNT according to the regulations valid at the time.
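The multi-key ordering described above can be sketched as a tuple sort; the row layout and the values below are illustrative, with None standing for an empty cell:

```python
rows = [
    {"team": "A", "VF": 1.0, "SPF": 50.8, "VsF": 1.0, "SPsF": 43.3, "SV4": 4.0, "TSP4": 177.1},
    {"team": "B", "VF": 1.0, "SPF": 49.3, "VsF": None, "SPsF": None, "SV4": 4.0, "TSP4": 206.1},
    {"team": "C", "VF": None, "SPF": None, "VsF": 0.5, "SPsF": 39.0, "SV4": 3.0, "TSP4": 160.1},
]

def rank_key(row):
    # Descending on each criterion in turn; blank cells sort below any value.
    def key(v):
        return (v is None, -v if v is not None else 0.0)
    return tuple(key(row[c]) for c in ("VF", "SPF", "VsF", "SPsF", "SV4", "TSP4"))

ordered = [r["team"] for r in sorted(rows, key=rank_key)]
print(ordered)  # ['A', 'B', 'C']
```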

| Year | Team name | SV4 | TSP4 | VsF | SPsF | VF | SPF | RF | M |
|------|-----------|-----|------|-----|------|----|-----|----|---|
| 2013 | Belarus-Universum | 4 | 177.1 | 1 | 43.3 | 1 | 50.8 | 1 | |
| 2014 | Georgia "Georgians" | 3 | 190.6 | | | 1 | 50.5 | 1 | |
| 2014 | Bulgaria-Sofia | 4 | 206.1 | | | 1 | 49.3 | 2 | |
| 2016 | Georgia-Georgians | 4 | 172.4 | 1 | 45.0 | 1 | 46.7 | 1 | |
| 2017 | New Zealand-Wellington | | 155.4 | 1 | 42.8 | 1 | 45.8 | 1 | |
| 2015 | China | 3 | 163.0 | 1 | 40.6 | 1 | 45.3 | 1 | |
| 2015 | Georgia-Georgians | | 165.4 | 1 | 41.4 | 1 | 45.1 | 2 | |
| 2017 | Switzerland | 4 | 177.8 | 1 | 46.7 | 1 | 44.6 | 2 | |
| 2015 | Croatia | | 175.6 | 1 | 40.7 | 1 | 44.2 | 3 | |
| 2014 | Serbia | | 184.3 | | | ½ | 46.5 | 3 | |
| 2013 | Georgia-Raveko | 3 | 174.4 | 1 | 45.1 | ½ | 42.8 | 2 | |
| 2016 | Belarus-Pahonia | 4 | 165.1 | 1 | 42.5 | ½ | 42.4 | 2 | |
| 2016 | Croatia | 4 | 173.5 | 1 | 40.1 | ½ | 41.1 | 3 | |
| 2013 | Turkey-Bahçeşehir PES | 4 | 173.8 | ½ | 38.3 | 0 | 37.2 | 3 | |
| 2013 | Moldova-Eco Generation | | 154.6 | 1 | 41.7 | 0 | 36.4 | 4 | |
| 2017 | China-NFLS Laplace | W/ | 169.2 | 1 | 37.1 | 0 | 34.1 | 3 | |
| 2013 | Russia-TMOLimpiycy | | 167.5 | 1 | 40.0 | | | 7 | |
| 2016 | Iran-Gifted | | 133.3 | 1 | 38.8 | | | 4 | |
| 2013 | Bulgaria-Bulgaria | 3 | 170.0 | ½ | 39.1 | | | 6 | |
| 2015 | Serbia-1 | | 174.3 | ½ | 39.0 | | | 4 | |
| 2013 | Russia-MG 12 | 3 | 160.1 | ½ | 39.0 | | | 8 | |
| 2017 | Georgia-Georgians | 4 | 153.6 | ½ | 37.5 | | | 4 | |
| 2016 | Iran-Khashayar | 2 | 115.5 | ½ | 37.4 | | | 8 | |
| 2013 | Russia-RLC | 3 | 173.0 | ½ | 37.1 | | | 5 | |
| 2017 | Croatia | 3 | 153.8 | ½ | 36.5 | | | 6 | |
| 2015 | Serbia-2 | | 154.4 | ½ | 36.0 | | | 6 | |
| 2013 | Afghanistan-Ariana | 3 | 155.5 | ½ | 35.8 | | | 9 | |
| 2016 | Iran-Mehr | | 155.4 | ½ | 35.7 | | | 5 | |
| 2017 | Bulgaria | 3 | 140.3 | ½ | 33.9 | | | 7 | |
| 2015 | Belarus-Spectrum | 3 | 174.4 | ½ | 32.9 | | | 5 | |
| 2016 | Iran-Maple | | 145.3 | ½ | 31.4 | | | 7 | |
| 2017 | China-Beijing RDFZ | | 157.0 | ½ | 28.8 | | | 5 | |
| 2017 | Indonesia-Labsky | 3 | 155.5 | 0 | 34.0 | | | 8 | |
| 2013 | Bulgaria-Science Girls | 3 | 153.4 | 0 | 32.5 | | | 10 | |
| 2017 | China-NFLS Unique | 3 | 154.2 | 0 | 32.3 | | | 9 | |
| 2016 | Russia-Voronezh | 3 | 147.7 | 0 | 30.4 | | | 6 | |
| 2016 | Iran-Besat 1 | | 113.0 | 0 | 30.2 | | | 9 | |
| 2014 | Russia "Vinegret" | | 146.4 | | | | | 4 | |
| 2014 | Bulgaria-Kyustendil "R/" | ½ | 113.4 | | | | | 5 | |
| 2016 | Iran-Black Intelligence | 3 | 161.9 | | | | | 10 | |
| 2013 | Turkey-Fatih Eskişehir | | 146.9 | | | | | 11 | |
| 2017 | China-Qingdao No. 2 | | 143.6 | | | | | 10 | |
| 2015 | Bulgaria | | 140.9 | | | | | 7 | |
| 2017 | Belarus-Pahonia | | 133.5 | | | | | 11 | |
| 2013 | Kazakhstan-NIS D/team | | 130.3 | | | | | 15 | |
| 2015 | Russia-Voronezh-1 | 2 | 161.6 | | | | | 8 | |
| 2015 | Russia-Voronezh-2 | 2 | 143.3 | | | | | 9 | |
| 2017 | Russia-Novosib/ 12FM | 2 | 126.2 | | | | | 12 | |
| 2017 | China-Shenzhen 2 M/S | 2 | 121.3 | | | | | 13 | |
| 2013 | Iran-Iran | | 145.9 | | | | | 12 | |
| 2017 | Russia-Voron/ Izolenta | | 130.3 | | | | | 14 | |
| 2017 | Russia-Dolgoprudny 5th | | 129.7 | | | | | 15 | |
| 2017 | China-Shenzhen 1 M/S | | 119.8 | | | | | 16 | |
| 2016 | Iran-Besat 2 | | 116.7 | | | | | 11 | |
| 2015 | Russia-MG 12 | 1 | 138.4 | | | | | 10 | |
| 2013 | Kyrgyzstan-Alatoo | 1 | 137.6 | | | | | 13 | |
| 2013 | Ukraine-Richelieu | 1 | 137.5 | | | | | 14 | |
| 2017 | Iran-AYIMI | 1 | 112.4 | | | | | 17 | |
| 2016 | Iran-Free Thought | 1 | 88.5 | | | | | 12 | |
| 2016 | Iran-Amordad | ½ | 118.2 | | | | | 13 | |
| 2016 | Iran-Paramount Notion | ½ | 106.4 | | | | | 14 | |
| 2016 | Iran-Farhang | ½ | 85.2 | | | | | 15 | |
| 2016 | Iran-Velayat | ½ | 76.2 | | | | | 16 | |
| 2013 | Turkey-Samanyolu | 0 | 107.9 | | | | | 16 | |
| 2017 | Kazakhstan-Bobek.kz | 0 | 68.3 | | | | | 18 | |