Basic principles

In all Science Fights, the Jury evaluates the Team performances by publicly showing integer scores called Grades or G.

The Grades G from all n Jurors in a Group are used to calculate the Average Point P. Two extreme Grades, one maximum and one minimum, are replaced with one grade equal to their arithmetic mean. In the next step, P is determined as the arithmetic mean of the new data set of n−1 grades. This procedure has the advantage of weighting outliers less heavily. P is rounded to the nearest 0.1 points. An example below (see a pdf file) illustrates this procedure with some real data from the 2nd IYNT 2014 (2014-1-A-II-Rep.)
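The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `average_point` and the sample Grades are hypothetical, and the handling of exact rounding ties (Python's built-in `round`) is an assumption, since the text only says that P is rounded to the nearest 0.1 points.

```python
def average_point(grades):
    """Average Point P from the integer Grades G of all n Jurors in a Group."""
    if len(grades) < 2:
        raise ValueError("at least two Grades are needed")
    ordered = sorted(grades)
    # One minimum and one maximum Grade are replaced with a single grade
    # equal to their arithmetic mean, leaving a data set of n-1 grades.
    merged = ordered[1:-1] + [(ordered[0] + ordered[-1]) / 2]
    # P is the arithmetic mean of the n-1 grades, rounded to 0.1 points
    # (tie-breaking of the rounding is an assumption of this sketch).
    return round(sum(merged) / len(merged), 1)


# Hypothetical Grades from n=7 Jurors for one Report.
print(average_point([18, 20, 17, 25, 12, 19, 18]))  # 18.4
```

In this example the extreme Grades 12 and 25 enter as a single grade of 18.5, so the outliers move P less than they would in a plain mean of all seven Grades.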

Each Team completes three performances (Reporter, Opponent, and Reviewer) in each Science Fight and earns three Average Points P which are summed up to obtain the Sum of Points SP. In case the Team obtains a Yellow Card, the SP is consequently reduced.

The values of SP in each Group are used to calculate the Criterion of Victory V which is set to V=1 for the Team with the highest SP and for one or two Teams which have an SP that differs from the top result by no more than 2 points (SP≥SPmax−2.) For the Teams in the Group which have (SPmax−10)≤SP<(SPmax−2), the Criterion of Victory is set to V=½. For the Teams which have SP<SPmax−10, V=0. The Criterion of Victory is the primary parameter that determines the placing and rank of each Team. It minimizes effects of statistical noise and Juror-to-Juror differences in grading.
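The thresholds for V can be written down as a short Python sketch. The function name `criterion_of_victory` and the sample Sums of Points are hypothetical; comparing in integer tenths of a point is a design choice of this sketch (to avoid floating-point artefacts at the exact 2- and 10-point boundaries), not something prescribed by the text.

```python
def criterion_of_victory(sums_of_points):
    """Criterion of Victory V for every Team in one Group, in input order."""
    # SP values carry one decimal, so work in integer tenths of a point.
    tenths = [round(sp * 10) for sp in sums_of_points]
    top = max(tenths)
    result = []
    for sp in tenths:
        if sp >= top - 20:        # SP >= SPmax - 2
            result.append(1.0)
        elif sp >= top - 100:     # SPmax - 10 <= SP < SPmax - 2
            result.append(0.5)
        else:                     # SP < SPmax - 10
            result.append(0.0)
    return result


# Hypothetical three-Team Groups.
print(criterion_of_victory([45.1, 43.1, 34.9]))  # [1.0, 1.0, 0.0]
print(criterion_of_victory([40.0, 37.9, 30.0]))  # [1.0, 0.5, 0.5]
```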

Distributions of G and P

During the IYNTs 2013 through 2019 taken together, 10290 Grades were delivered in 471 stages. Each of the 471×3=1413 performances therefore obtained its Grades G from an average of n=7.3 Jurors. A total of 165 round-robin meetings (Science Fights in a Group) were played with three or two Teams each. In them, the Teams collected 471 values of SP and 471 values of V.

The graph (hi-res image, raw ASCII data) shows a fitted histogram of the Grades for each type of performance (Report, Opposition, and Review.)

The spread in G along the X-axis is broad, indicating that nearly the whole spectrum of available G is used by the Jury; the extreme G are, however, inherently less frequent. We encourage each Juror to stay centered around 15 points for a Report, around 10 points for an Opposition, and around 5 points for a Review, each time weighting their G against what they believe an average IYNT performance is. In reality, however, the actual mean and standard deviation are 17.6±5.3 for a Report, 12.2±3.5 for an Opposition, and 6.6±1.8 for a Review, indicating that an average Juror shifts all their G to the right as compared to our guidelines. The distributions are moderately asymmetric, with the median Grades of 18 for a Report, 12 for an Opposition, and 7 for a Review.

In turn, the mean and standard deviation for the Average Points P are 17.6±4.2 for a Report, 12.1±2.8 for an Opposition, and 6.6±1.4 for a Review (raw ASCII data.) Such distributions of P are narrower than the respective distributions of G.

Spread of the Grades

At this point it is important to realize that the IYNT requires a direct comparison of results from parallel Groups, whilst each Team does not play every other Team during one Science Fight. We should therefore be aware of the extent to which parallel boards of Jury can be influenced by fluctuations and what grading parameters are uniformly objective indications of the relative strengths of all IYNT contestants.

A consistent grading is extremely important for the IYNT as it must allow reliable identification of winners in each Group and the ultimate winners of the competition.

Let us consider three hypothetical and extreme scenarios, Example 1, Example 2, and Example 3.

In Example 1, all Jurors agree with each other and there is high confidence that the Average Points P reflect objective differences between the two Teams. For each Juror and each Grade, G−P=0.

Example 2 shows a situation where both Teams obtain equal Grades from the middle of the spectrum. Although it is natural that some games end in a tie, this scenario is less advantageous. If no Team can impress the Jury, or the Jury lacks sensitivity to hidden differences between the Teams, it is difficult to rank all Teams from top to bottom.

Example 3 depicts an undesired event where different Jurors use radically different grading criteria. Team 1 earns 0.1 points more than Team 2; however, the level of confidence in this difference is low because σG−P, or the spread in G given to one performance by several Jurors, is much wider than these eventual 0.1 points. Although both Average Points P are nearly equal to the pair of P in Example 2, these two P do not reflect the serious differences between the Teams that each Juror has noticed and highlighted in their grading. Concluding that Team 1 shows a better performance than Team 2, or than either Team in Example 2, is inconsistent with this set of Grades G.

Our aim is that separate Jurors give very similar Grades for one performance, while each Juror gives distinctly different Grades for different performances.

Grading criteria

During each IYNT, we brief Jurors and Teams on our grading and evaluation criteria. Our guidelines have evolved since 2013, and as of now consist of four partial grading criteria. Our aim is to keep the guidelines clear and simple, and make sure that any Juror relies on the fixed, common criteria when evaluating performances across the parallel Groups. The criteria are printed directly on the individual Juror's protocols.

The Jurors are asked to add or subtract points from a starting grade (15, 10, or 5) and decide on their final Grade G. Such a decision is individual, and upwards of 99.8% of performances cause some disagreement in the Grades given by Jurors and a spread of these Grades. No Grade can be corrected retroactively, and each Juror must justify any of their Grades upon the request of Team Captains or the Chairperson. Each G is public.

Find below the blank Juror's protocols used at the 7th IYNT 2019, as well as the slides from the most recent introductory briefing for Jurors and Teams.

  • Blank individual Juror's Protocol, A4 size (2019/07/30) [pdf]
  • Briefing for Jurors and Teams, slides by Ilya Martchenko (2019/08/19) [pdf]

Distributions of G−P

It is now instructive to look at the real data from all previous IYNTs and analyze the spread in the G given by different Jurors to one performance in one Science Fight.

The graph (hi-res image, raw ASCII data) shows each of the 10290 Grades G given during all previous IYNTs. X-coordinate indicates which of the 30 possible Grades G was given. Y-coordinate indicates the difference between this particular G and the Average Point P that was calculated on its basis.

To interpret the spread along the Y-axis, one should remember that if Example 1 took place in each IYNT Stage, each G would be equal to its respective P, and G−P would globally collapse to zero. Luckily, the IYNT is not a paper-and-pencil exam, and its Jurors have opinions which result in a distribution of individual G around the P in each Stage.

Despite being statistically unlikely, extreme G−P occur at the IYNT as well. The table below lists each of the 16 rare events wherein G−P equaled or exceeded 10 points in absolute value.

Stage         G   P     G−P    Juror               Team               File
2015-F-A-II   7   22.0  -15.0  Andrei Klishin      Croatia            [pdf]
2016-2-E-I    10  22.1  -12.1  Danko Marušić       Iran-Khashayar     ---
2015-1-A-III  26  14.3  +11.7  Dušan Radovanović   Serbia-2           [pdf]
2019-1-C-I    9   20.6  -11.6  Domagoj Pluščec     Romania-Sol. Sq.   [pdf]
2016-4-A-I    5   16.3  -11.3  Ilya Martchenko     Iran-Amordad       ---
2016-3-C-II   11  21.9  -10.9  Danko Marušić       Iran-Maple         [pdf]
2019-F-A-III  13  23.5  -10.5  Giorgi Khomeriki    New Zealand        [pdf]
2019-1-G-I    29  18.5  +10.5  Hanna Karpenka      China-PML          ---
2018-F-A-III  13  23.5  -10.5  Ivan Syulzhyn       Switzerland        [pdf]
2016-2-D-III  7   17.5  -10.5  Zahra Yazdgerdi     Croatia            ---
2019-S-B-III  5   15.4  -10.4  Florian Koch        China-PML          [pdf]
2016-4-A-III  25  14.6  +10.4  Roya Radgohar       Georgia            ---
2016-2-A-I    11  21.2  -10.2  Giorgi Khomeriki    Iran-Mehr          [pdf]
2019-S-B-II   24  13.9  +10.1  Al. Falchevskaya    Greece-Anat. Col.  [pdf]
2019-1-G-III  12  22.0  -10.0  Mladen Matev        Russia-Moonlight   [pdf]
2016-1-A-I    10  20.0  -10.0  Andrei Klishin      Georgia            [pdf]

The standard deviations of the distributions of G−P for the three types of performances are found as follows: 3.21 for a Report, 2.18 for an Opposition, and 1.21 for a Review. These particular values are calculated with two extreme G in each Group taken with the weight of 1, rather than ½.

Statistical significance of SF results

Since the extreme G contribute to statistics of P with the weight of ½, we can prepare the working dataset of 10290−3×471=8877 processed grades that we label g (raw ASCII data for 2013, 2014, 2015, 2016, and 2017.)

g=G if the respective G is not extreme in the Group, and g=½(Gmax+Gmin) for the pairs of extreme Gmax and Gmin in each Group. The statistical parameters of the distributions of residuals g−P now provide crucial information on the statistical significance of each P, and in turn SP, and the IYNT rankings.
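A minimal Python sketch of how the processed grades g and the residuals g−P could be obtained is given here. The function names and the sample Grades are hypothetical. Two points are assumptions of this sketch rather than statements of the text: the residuals are taken against the published P (rounded to 0.1), and the spread is computed as a population standard deviation via `statistics.pstdev`.

```python
from statistics import pstdev


def processed_grades(grades):
    """Processed grades g: non-extreme G are kept as they are, and the pair
    (Gmax, Gmin) is replaced with one grade equal to half their sum."""
    ordered = sorted(grades)
    return ordered[1:-1] + [(ordered[0] + ordered[-1]) / 2]


def residuals(grades):
    """Residuals g-P for one performance (P rounded to 0.1: an assumption)."""
    g = processed_grades(grades)
    p = round(sum(g) / len(g), 1)
    return [x - p for x in g]


# Hypothetical Grades for the three Reports of one three-Team Science Fight,
# each graded by n=7 Jurors.
reports = [
    [18, 20, 17, 25, 12, 19, 18],
    [15, 16, 14, 22, 13, 15, 17],
    [21, 19, 20, 24, 16, 22, 20],
]
pooled = [r for grades in reports for r in residuals(grades)]
sigma_rep = pstdev(pooled)  # spread of g-P for Reports in this SF

print(len(pooled))  # 18 residuals, i.e. 3 performances x (n-1) grades
```

Each performance contributes n−1 processed grades, which is why the full data set shrinks by one grade per performance.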

In case the SF results do not permit rejecting the null hypothesis that a slightly higher SP in a round-robin Science Fight is observed by chance, more than one Team earns V=1. The Sum of Victories SV keeps track of such statistically significant cases wherein one or several SF winners step forward.

We can define the significance of V=1 as the level of statistical confidence for the interval [SP−2...60] in one Science Fight Group. This statistical significance depends only on the values of g−P and number of Jurors n in the Group, and does not directly depend on the absolute magnitudes of G. This has the advantage of placing focus on congruence between opinions of Jurors, rather than ranking of Teams.

To illustrate how to compute the confidence of [SP−2...60] and [SP−10...60], let us analyze in depth one example of a round-robin Science Fight (2016-F-A, see a hi-res pdf file.)

The standard deviations σ of the distributions of g−P for each of three types of performances in this SF are found as follows: σREP=1.91 for a Report, σOPP=1.36 for an Opposition, and σREV=0.65 for a Review (each from a sample of 27 residuals g−P.) As per the IYNT procedure, SP is calculated as a sum of Average Points P for these three performances. By assuming σ²SP=σ²REP+σ²OPP+σ²REV, we can easily find σSP=2.43 in these Finals of the 4th IYNT 2016. It is now easy to determine the root-mean-square deviation ρSP by defining ρ²SP=σ²SP/(n−1), where the number of Jurors is n=10.

The value of ρSP=0.81 estimates the standard error of the mean, i.e. the statistical uncertainty of the SP earned by any Team in the SF. This standard error is inherent in estimating the true value of SP from limited statistics. If the difference between any two sample-based SPi and SPj is comparable to or less than ρSP, they can be assumed statistically indistinguishable.
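The arithmetic for the 2016 Finals can be retraced with the following short Python sketch. The variable names are ours; the numerical inputs are the values quoted in the text.

```python
import math

# Spreads of g-P quoted for the Finals of the 4th IYNT 2016 (2016-F-A).
sigma_rep, sigma_opp, sigma_rev = 1.91, 1.36, 0.65
n = 10  # number of Jurors in the Group

# The three spreads are added in quadrature to obtain the spread of SP.
sigma_sp = math.sqrt(sigma_rep**2 + sigma_opp**2 + sigma_rev**2)
# Standard error of SP with n-1 processed grades per performance.
rho_sp = sigma_sp / math.sqrt(n - 1)

print(round(sigma_sp, 2), round(rho_sp, 2))  # 2.43 0.81
```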

In the next step, we can find the confidence level for interval [SP−2...60] as a function of the number of degrees of freedom in a representative sample (i.e. n−1) and Student's t-score (i.e. 2/ρSP.) This is based on assuming that Student's t-distribution is a valid approximation.

This calculation yields the confidence level of 98.2% for the interval [SP−2...60], above a two-sigma significance threshold. In the Finals of other previous IYNTs, confidence levels for the interval [SP−2...60] were 99.1% in 2019, 95.9% in 2018, 96.1% in 2017, 94.8% in 2015, 98.8% in 2014, and 98.2% in 2013. A similar calculation yields the confidence level of 99.99999% for the interval [SP−10...60] in 2017 (above five-sigma), 99.99997% in 2016 (above five-sigma), and 99.99986% in 2015, 99.99971% in 2014, and 99.99827% in 2013 (above four-sigma in each case.)
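The confidence level for the 2016 Finals can be retraced with a self-contained Python sketch. It assumes that the quoted confidence level is the one-sided cumulative probability of Student's t-distribution at t=2/ρSP with n−1 degrees of freedom; this reading reproduces the quoted 98.2%, but it is our interpretation. The function names are hypothetical, and the cumulative distribution is obtained by plain Simpson integration of the density so that only the standard library is needed (with SciPy installed, `scipy.stats.t.cdf` could be used instead).

```python
import math


def student_t_cdf(t, dof, steps=2000):
    """CDF of Student's t-distribution for t >= 0 (Simpson's rule, even steps)."""
    norm = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))

    def pdf(x):
        return norm * (1 + x * x / dof) ** (-(dof + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric, so the CDF is 1/2 plus the integral from 0 to t.
    return 0.5 + total * h / 3


def confidence_level(margin, rho_sp, n_jurors):
    """Confidence level for the interval [SP-margin...60] with n-1 degrees of freedom."""
    return student_t_cdf(margin / rho_sp, n_jurors - 1)


# Finals of the 4th IYNT 2016: spreads quoted in the text, n=10 Jurors.
rho_sp = math.sqrt(1.91**2 + 1.36**2 + 0.65**2) / math.sqrt(10 - 1)
print(f"{100 * confidence_level(2, rho_sp, 10):.1f}%")  # 98.2%
```

The same function can be called with a margin of 10 for the interval [SP−10...60], although resolving confidence levels that differ from 100% only in the seventh digit would call for a more careful numerical treatment than this sketch provides.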

The Table below summarizes the parameters calculated in a similar manner for all 165 round-robin Science Fights of past IYNTs. We cordially acknowledge Hieorhi Liaśnieŭski and Dmitriy Baranov for their help in processing the data. Click on the headers to have the table sorted by any desired parameter.


In the IYNTs 2013 through 2019, there were 203 instances of V=1, 170 instances of V=½, and 98 instances of V=0 (including the 1st IYNT 2013 retrospectively.) If compared to the total number of round-robin Science Fights (165), these figures suggest that in (203−165)/165=23% of cases, second or third Teams in a Science Fight are indistinguishable from the Team with the maximum SP and also earn their V=1 (raw ASCII data.)

In a typical Science Fight, earning a V=½ or above is a four-sigma event, with an expected confidence level of (99.98±0.04)% for the interval [SP−10...60]. Earning a much more rewarding V=1 is a two-sigma event, with an expected confidence level of (95±4)% for the interval [SP−2...60]. Earning each V is a statistically independent event, and collecting several V=1 further contributes to the confidence in the placing of top IYNT Teams.

These results justify the importance of the Criterion of Victory V and the importance of the fact that no Science Fight has ever been graded by fewer than 5 Jurors (while 90% of SFs, a total of 148, have been graded by 6 Jurors or more.) As argued below, besides having different opinions, individual Jurors may also work on different grading scales. At all times, when looking at the IYNT scores, we ask whether their difference is representative of a real difference between Teams or whether it is a statistical fluke. An especially high level of significance is demanded if grading parameters are used to resolve the placing of eventual Semi-Finalists and Finalists.

By assuming that the grading parameters of Jurors would not improve considerably before the 8th IYNT 2020, we may estimate the average expected level of confidence for [SP−2...60] as a function of the number of Jurors n randomly selected to one Group. To do so, we can re-calculate ρSP from a historically global σSP.

Grading parameters of individual Jurors

For reference purposes, this table summarizes the individual grading parameters (within one IYNT) of all existing 209 Jurors. We cordially acknowledge Dmitriy Baranov and Hieorhi Liaśnieŭski for their help in processing the data. The presented parameters give a glimpse of the Jurors' perceptions of the grading scale in the IYNT. Click on the headers to have the table sorted by any desired parameter.

As of 2019, the following 9 Jurors are ex-participants: Anton Khvalyuk (2013), Giorgi Tsereteli (2013), Ekaterine Dadiani (2014), Giorgi Khomeriki (2014), Diana Sokhashvili (2015), Toma Rtveliashvili (2015), Sofia Anisimova (2016), Sofiya Guzik (2016), Ekaterina Rosnovskaia (2018.)

Yr, St  Name  ⟨G⟩  σG  σG−P  ⟨G−P⟩  nG  nSF  nT  κ
2019 Ilya Martchenko
2018 Ilya Martchenko
2017 Ilya Martchenko
2016 Ilya Martchenko
2015 Ilya Martchenko
2014 Ilya Martchenko
2013 Ilya Martchenko
2019 Mladen Matev
2018 Mladen Matev
2017 Mladen Matev
2016 Mladen Matev
2015 Mladen Matev
2014 Mladen Matev
2013 Mladen Matev
2019 Evgeny Yunosov
2018 Evgeny Yunosov
2017 Evgeny Yunosov
2016 Evgeny Yunosov
2015 Evgeny Yunosov
2014 Evgeny Yunosov
2019 Milen Kadiyski
2018 Milen Kadiyski
2017 Milen Kadiyski
2015 Milen Kadiyski
2014 Milen Kadiyski
2019 Andrey Kravtsov
2018 Andrey Kravtsov
2017 Andrey Kravtsov
2015 Andrey Kravtsov
2014 Andrey Kravtsov
2018 Alena Kastenka
2017 Alena Kastenka
2016 Alena Kastenka
2015 Alena Kastenka
2019 Giorgi Khomeriki
2018 Giorgi Khomeriki
2017 Giorgi Khomeriki
2016 Giorgi Khomeriki
2017 Andrei Klishin
2016 Andrei Klishin
2015 Andrei Klishin
2018 Dina Izadi
2016 Dina Izadi
2013 Dina Izadi
2018 Ivan Syulzhyn
2017 Ivan Syulzhyn
2016 Ivan Syulzhyn
2018 Val. Lobyshev
2017 Val. Lobyshev
2013 Val. Lobyshev
2018 Stan. Krasulin
2017 Stan. Krasulin
2016 Stan. Krasulin
2019 Florian Koch
2018 Florian Koch
2017 Florian Koch
2019 Volha Uhnachova
2018 Volha Uhnachova
2017 Volha Uhnachova
2019 Murray Chisholm
2018 Murray Chisholm
2017 Murray Chisholm
2015 Gur. Mikaberidze
2014 Gur. Mikaberidze
2013 Gur. Mikaberidze
2017 Danko Marušić
2016 Danko Marušić
2015 Danko Marušić
2017 Dmitriy Agarkov
2014 Dmitriy Agarkov
2013 Dmitriy Agarkov
2018 Klim Sladkov
2017 Klim Sladkov
2019 Domagoj Pluščec
2018 Domagoj Pluščec
2019 Giorgi Bakhtadze
2018 Giorgi Bakhtadze
2018 Marc Bitterli
2017 Marc Bitterli
2018 A. Chervinskaia
2017 A. Chervinskaia
2018 Alexandr Nadeev
2017 Alexandr Nadeev
2018 Kseniia Wang
2017 Kseniia Wang
2016 Dmitry Zhukalin
2015 Dmitry Zhukalin
2015 Aleks. Dimić
2014 Aleks. Dimić
2019 Usev. Gaponenko
2018 Usev. Gaponenko
2019 Irina Valtcheva
2018 Irina Valtcheva
2017 Wang Sihui
2015 Wang Sihui
2019 Ivan Klimenko
2018 Ivan Klimenko
2018 Dmitrii Dorofeev
2016 Dmitrii Dorofeev
2019 Nasko Stamenov
2014 Nasko Stamenov
2019 K. Turekhanova
2018 K. Turekhanova
2019 Aleks. Zubankov
2018 Aleks. Zubankov
2019 Nik. Karavasilev
2018 Nik. Karavasilev
2016 Nika Sabashvili
2015 Nika Sabashvili
2019 Yury Kartynnik
2015 Yury Kartynnik
2018 Mar. Yavahchova
2017 Mar. Yavahchova
2019 Gerg. Visarieva
2018 Gerg. Visarieva
2019 Oscar Rabinovich
2015 Oscar Rabinovich
2019 Sofia Anisimova
2018 Sofia Anisimova
2019 Li Peng
2017 Li Peng
2018 B. Tevdorashvili
2016 N. Seliverstova
2016 Nikita Datsuk
2016 Som. Mahmoodi
2019 Yulia Yuts
2015 D. Radovanović
2018 Eduard Stefanov
2019 Z. Jelić Matošević
2016 Jalil Sedaghat
2015 Ivan Reznikov
2019 Anton Golyshev
2018 Andria Rogava
2018 Giorgi Tsereteli
2016 Samuel Byland
2019 Darya Snitavets
2013 Igor Evtodiev
2018 Rich. Fitzpatrick
2018 Niko Giorgadze
2013 Alina Astakhova
2019 Katerina Gotina
2013 Naime Arslan
2016 Ahmad Sheikhi
2013 Ismail Kiran
2016 Af. Montakhab
2015 Aleks. Suvorova
2016 Roya Radgohar
2016 Azizolah Azizi
2018 N. Tsimakuridze
2018 Manon Geijsen
2013 Celalettin Baykul
2018 Z. Vardanashvili
2013 Jeyhun Jabarov
2019 Hanna Karpenka
2016 Laura Guerrini
2013 Jevhen Olijnyk
2019 Valeria Burianova
2018 Domagoj Gajski
2018 Kirill Belyaev
2018 Tsotne Dadiani
2013 Ersin Karademir
2019 Maria Karaveli
2019 Konst. Vourlias
2019 Ap. Michaloudis
2016 Marzieh Afkhami
2016 M. Sadat Tahami
2019 Ivan Bosko
2014 D. Karashanova
2017 Gal. Onoprienko
2018 Ekaterine Dadiani
2017 Song Yi
2013 Diana Kovtunova
2013 Ahmet Çabuk
2013 Antoan. Nikolova
2015 Vesna Vasić
2019 Daniil Mnevets
2018 Vsev. Zhdanov
2019 Nastassia Lishai
2017 Luisa Schrempf
2018 Daria Vetoshkina
2017 Wang Lin
2013 Aliaks. Mamoika
2019 Denis Kolchanov
2018 Ilia Lomidze
2013 Vlad. Vanovskiy
2019 Artem Glukharev
2015 Dušan Dimić
2019 Pavel Kviatko
2018 Anton Khvalyuk
2013 Buras Boljiev
2018 Elena Zhurikova
2019 Artemii Kolenko
2017 Sergei Kozelkov
2017 Liu Lisa
2019 Anna Nikitina
2016 Ban. Rastegari
2017 Nurzada Beissen
2019 Phyllis Barth
2019 Zekai Wang
2015 Jelena Vračević
2017 Xiaobin Chen
2015 Viktor Nechaev
2016 Sed. Forootan
2013 Ek. Mendeleeva
2016 Som. Haj. Gooki
2013 Timothy Timur
2018 Grigol Peradze
2017 Michelle De Kock
2017 Chrisy Xiyu Du
2017 Polina Deviatova +0.5
2016 Tatiana Fursova
2019 Ek. Rosnovskaia
2013 Alexander Sigeev
2018 Diana Sokhashvili
2013 Emel Alğin
2013 Sergey Sabaev
2019 Vasileios Nousis
2016 Jaf. Vatanparast
2013 Özge Özşen
2019 Effimia Mavidou
2019 Svet. Izraileva
2013 Ayset Yurt Ece
2019 Tatiana Lyalina
2016 Afshan Mohajeri
2019 Elena Koida
2013 Siarhei Seniuk
2019 M.-N. Parpalea
2016 Roya Pournejati
2013 Dursun Eser
2016 Zahra Yazdgerdi
2013 Louis D. Heyns
2013 Hakan Dal
2018 Y. M. Sefidkhani
2018 Nat. Semenikhina
2019 H. Chubarova
2017 D. Permatasari
2019 Al. Falchevskaya
2019 Todor Todorov
2013 Sergey Zelenin
2015 Kirill Volosnikov
2013 Ebru Ataşlar
2014 Vasilka Krasteva
2017 Ch.-Eung Ahn
2017 Li Ying
2018 Pavel Kostryukov
2015 Tatiana Besedina
2017 Qu Yanfu
2015 Elena Chernova
2019 Kats. Korsak
2019 Aksana Hubich
2019 Bauyr. Smailov
2016 Maryam Bahrami
2016 Mas. Tor. Azad
2016 Milad Zangiabadi
2019 C.-F. Tanasescu
2016 M. R. Moghadam
2013 Fatih Akay
2016 M. D. Aseman
2013 Vladimir Shiltsev
2015 Drag. Jovković
2017 T. Gachechiladze
2018 Ramaz Khomeriki
2017 Bi Jun
2013 Marina Sergeeva
2016 Hassan Eslahi
2018 Saba Kharabadze
2014 Elena Trufanova
2019 Vera Somova
2017 Chen Xi
2017 Cao Xuewei
2018 Sergei Yudin
2019 Sofiya Guzik
2015 Lucija Papa
2013 Pınar Aytar
2017 Wang Jin
2016 A. Poostforush
2017 Thomas Broger
2017 Li Bin
2019 T. Rtveliashvili  12.1  4.65  2.0  -0.8  27  3  9  -0.3
2019 Artsiom Bury
2017 Song Feng
2013 Natalia Borodina
2017 Li Hong
2013 Necmettin Caner
2019 Hanna Bareika
2013 Sertaç Eroğlu
2013 Sabahattin Esen
--- Best records  10.0  7.6  1.0  +0.0  57  6  16  +2.3

  • Color-coded status tags reflect various roles of the IYNT Juror: violet tag is the Juror who acted as Chairperson while green tag is the Juror who did not act as Chairperson; blue tag is the independent Juror while red tag is the Team Leader who acted as Juror;
  • ⟨G⟩ is arithmetic mean of all Grades delivered during the IYNT; ⟨G⟩=(5+10+15)/3=10 is our target for any Juror to allow for uniform and equal weight grading scales throughout the parallel Groups; a majority of Jurors go above the target;
  • σG is standard deviation of all Grades delivered during the IYNT; if σG is large, the Juror uses a broader spectrum of Grades and has more differentiation within their evaluation scale; note that σG does not necessarily reflect clearer separation of the Teams, cf. two notable hypothetical extremes of σG=13.7 for the set {30; 1; 1; 30; 1; 1} and σG=11.1 for the set {30; 1; 1; 1; 20; 10} in a two-Team SF; various limits for three types of performances result in a trend that a higher σG is more likely to appear for Jurors with a higher ⟨G⟩; theoretical slope of the trend, or baseline, is σG/⟨G⟩ for the set {30; 20; 10} or 0.40825; parameter κ corrects for this baseline;
  • σG−P is standard deviation of all residuals G−P; if σG−P is small, the Grades are less scattered respective to the Grades of other Jurors and contribute to a smaller ρ; an implausible theoretical minimum is σG−P=0; real-life maximum and minimum records are 3.7 and 1.0;
  • ⟨G−P⟩ is arithmetic mean of all residuals G−P; if ⟨G−P⟩ is close to zero, the grading scale is less shifted respective to the individual scales of other Jurors; it is easy to notice moderate statistical noise that hinders the inherent correlation between ⟨G−P⟩ and ⟨G⟩;
  • nG is the number of Grades given; greater nG means better statistics; theoretical caps depend on exact tournament brackets and were nG=60 in 2013, nG=45 in 2014, and nG=54 in 2015...2019, though often not accessible even for the Jurors working in each Science Fight due to such constraints as distribution of two-Team Groups;
  • nSF is the number of Science Fights judged; greater nSF means better statistics; a theoretical cap with Semi-Finals is nSF=6;
  • nT is the number of Teams judged; greater nT means more opportunities to observe stronger and weaker Teams and thus have a more comparative judgment; a theoretical cap is number of Teams N but no more than 18; note that some Teams are judged more than once by the same Juror within one IYNT;
  • κ is standard deviation of all delivered Grades corrected for the average Grade ⟨G⟩ via κ=σG−0.40825×⟨G⟩; κ reflects relative width of the spectrum of Grades used by the Juror and can be more suitable for comparison of Jurors with distinctly different ⟨G⟩; note that κ is linearly proportional to the relative value of σG/⟨G⟩.
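The parameters defined in this list can be computed with a short Python sketch. The function name `juror_parameters` and the dictionary keys are hypothetical. The sketch uses the population standard deviation (`statistics.pstdev`), which reproduces the hypothetical extremes σG=13.7 and σG=11.1 and the baseline 0.40825 quoted above; whether the published tables use exactly this convention is an assumption.

```python
from statistics import mean, pstdev

BASELINE = 0.40825  # sigma_G / <G> for the set {30; 20; 10}, as quoted in the text


def juror_parameters(grades, average_points):
    """Grading parameters of one Juror within one IYNT.

    grades: all Grades G given by the Juror;
    average_points: the Average Point P of the respective performance for each G.
    """
    mean_g = mean(grades)
    sigma_g = pstdev(grades)
    residuals = [g - p for g, p in zip(grades, average_points)]
    return {
        "mean_G": mean_g,
        "sigma_G": sigma_g,
        "sigma_G_minus_P": pstdev(residuals),
        "mean_G_minus_P": mean(residuals),
        "n_G": len(grades),
        "kappa": sigma_g - BASELINE * mean_g,
    }


# The two hypothetical extreme sets from the list above.
print(round(pstdev([30, 1, 1, 30, 1, 1]), 1))    # 13.7
print(round(pstdev([30, 1, 1, 1, 20, 10]), 1))   # 11.1
# The baseline slope for the set {30; 20; 10}.
print(round(pstdev([30, 20, 10]) / mean([30, 20, 10]), 5))  # 0.40825
```

For a Juror who gives exactly {30; 20; 10}, κ evaluates to zero within rounding, which is the sense in which κ corrects for the baseline.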

The names are initially sorted by number of IYNTs judged, then by κ, then by ⟨G⟩. Click on the headers to have the table sorted by any desired parameter.

There is a complex interplay between each of the calculated parameters, and some of the crucial parameters depend not only on what G each Juror gives, but also on what G other Jurors in the same Group give. Other parameters depend on the lot, tournament brackets, or appointing decisions of the General Council, and are beyond control of the individual Juror. Persisting regularities seen for Jurors who worked at more than one IYNT suggest that any shifts in σG or ⟨G⟩, observed consistently in several Jurors, may reflect objective differences between separate IYNTs, viz. in diversity or average strength of participants.

Particular grading preferences for a Juror are sometimes clearly recognizable in separate IYNTs and not obscured by limited statistics. These differences in particular explain why Jurors and Teams rotate between the Groups, and V is a more representative derivative grading parameter than SP.

Whilst the Criterion of Victory V already alleviates scaling differences, it would allow for extracting further fine-grained data if each future IYNT Juror

  • is comfortable with lower Grades for weaker performances, and therefore stays centered closer to ⟨G⟩=10, with preferably ⟨G⟩<13;
  • at the same time, works in a broader spectrum of high and low Grades, and thus has a larger σG, with preferably σG>6;
  • at the same time, is balanced to have a moderate σG−P, with preferably σG−P<3.

These three goals can naturally clash with each other. At this point it is important to realize that each Juror must be focused only on assessing immediate performances and sticking to uniform, scientific, merit-based grading criteria, and that furthermore each G must be given independently.

Composition of Jury

Year  Int'l  Local  TLs  Total  Chairs  Experienced  % Experienced

Juror retention and Zipf's law


Effects of individual grading parameters

To illustrate the potential consequences of the spread in these grading parameters, let us consider a Gedankenexperiment with three Teams competing in one Science Fight.

Team 1 shows a relatively strong performance and receives the Grades which sit on the upper end of the [⟨G⟩−½σG…⟨G⟩+½σG] interval. In other words, should G be distributed normally for each selected Juror, the performances of Team 1 would be better than Ф(⟨G⟩+½σG)=0.69 of all performances the Juror grades in the IYNT. Such a Team would potentially end up as a Finalist.

Team 2 shows an average performance and receives average Grades ⟨G⟩ from each Juror.

Team 3 shows a relatively weak performance and receives the Grades which sit on the lower end of the [⟨G⟩−½σG…⟨G⟩+½σG] interval. In other words, their performances are weaker than very approximately Ф=0.69 of all IYNT performances graded by the Juror. Such a Team would potentially not qualify for Semi-Finals.
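The value Ф=0.69 used for Team 1 and Team 3 can be checked with the standard library, under the same assumption as in the text that G is normally distributed for each selected Juror. The values ⟨G⟩=12.1 and σG=4.65 in the second check are taken from one row of the Jurors table purely for illustration.

```python
from statistics import NormalDist

# Share of a Juror's Grades lying below <G> + sigma_G/2 for a normal distribution.
share_below = NormalDist().cdf(0.5)
print(round(share_below, 2))  # 0.69

# The share does not depend on the Juror's own scale.
mean_g, sigma_g = 12.1, 4.65
same_share = NormalDist(mean_g, sigma_g).cdf(mean_g + 0.5 * sigma_g)
print(round(same_share, 2))  # 0.69
```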

These three Teams are graded simultaneously by two boards of selected Jurors. One board is composed of six Jurors with some of the lowest observed ⟨G⟩, while the other board is composed of six Jurors with some of the highest observed ⟨G⟩. It is easy to determine the results of this hypothetical Science Fight because ⟨G⟩ and σG are publicly known for each Juror.

These results illustrate the level of tolerance of the Criterion of Victory V and Sum of Points SP to the most severe effects of improbably unbalanced boards of Jurors. As seen from this calculation, the strong Team 1 graded by low-⟨G⟩ Jurors obtains fewer points than the average Team 2 graded by high-⟨G⟩ Jurors and ties with the weak Team 3. The weak Team 3 graded by high-⟨G⟩ Jurors, respectively, earns more points than the average Team 2 graded by low-⟨G⟩ Jurors. An artificial selection of Jurors in this test leads to unrealistically small σg−P and ρSP in both boards of Jurors.

There is however no negative effect on the Criterion of Victory V and consequently the results of the Science Fight.

In the next Gedankenexperiment, let us rotate and evenly distribute the same Jurors as routinely made before each real Science Fight.

Though the Jurors give the same Grades G as in the first experiment, their balanced distribution now mitigates the effects on SP of Juror-to-Juror differences in grading. Note that this happens at the cost of increased σg−P and ρSP, which both now fall in the range of typical IYNT values despite an artificial, bimodal distribution of ⟨G⟩. Although we cannot generalize from one example, the respective Sums of Points SP from both boards of Jurors now differ by only 1.5, 0.6, and 0.6 points.

In this extreme value analysis, we test a statistically improbable scenario which would have some of the worst impacts on the stability of Science Fight results. We test extreme values of ⟨G⟩ and the most unrealistic distribution of Jurors, and observe that the amplitude of fluctuations in SP always falls within the 2-point threshold.

Grading parameters of separate IYNTs

The table below provides an overview of statistical parameters for the Grades given by all Jurors within one IYNT, and the overall statistics for all previous IYNTs.


  • ⟨G⟩ is arithmetic mean of all Grades delivered during the IYNT by all Jurors;
  • σG is standard deviation of all Grades delivered during the IYNT by all Jurors;
  • σJ=σ⟨G⟩J is standard deviation of all averages ⟨G⟩ for all Jurors; lower values correspond to less Juror-to-Juror scaling differences;
  • σG−P is standard deviation of all residuals G−P during the IYNT;
  • nG is the number of Grades given;
  • nst is the number of Stages in the IYNT;
  • nT is the number of Teams in the IYNT;
  • nJ is the number of individual Jurors in the IYNT (totals for unique Jurors);
  • nch is the number of individual Jury Chairpersons in the IYNT (totals for unique Chairpersons);
  • V=1 is the mean and standard deviation of the confidence levels for the interval [SP−2...60] in all SFs, in %;
  • V=½ is the mean and standard deviation of the confidence levels for the interval [SP−10...60] in all SFs, in %.

Grading parameters of separate SFs

Teams and Jurors alike rapidly learn from SF to SF. One may argue that the Teams may start showing less diverse performances, or the Jurors may start giving less diverse Grades G.

Confidence levels for the interval [SP−2...60] in each SF, as well as distributions of G, G−P, and SP, are of interest to assess the importance of these effects.


Overall, the presented results define the extent to which the results of single paired comparisons of SP are not yet obscured by statistical noise. With the available data, we can conclude that the IYNT procedures and in particular the Criterion of Victory V alleviate Group-to-Group and Juror-to-Juror scaling differences, and allow separation of each Team in the IYNT with a well-defined significance threshold.

The Criterion of Victory prevents a melting pot effect in TSP where Total Sums of Points can converge to rather similar values for many Teams.

Comparative results of real IYNT Teams

Click on the headers to have the table sorted by any desired parameter. The Teams are initially sorted by Criterion of Victory in the Finals (VF), then by Sum of Points in the Finals (SPF), then by Criterion of Victory in the Semi-Finals (VsF), then by Sum of Points in the Semi-Finals (SPsF), then by Sum of Victories after Selective SF 4 (SV4), then by Total Sum of Points after Selective SF 4 (TSP4). Final Rank (RF) and the type of Medal (M) reflect the results of each IYNT according to the regulations valid at the time.

Year Team name SV4 TSP4 VsF SPsF VF SPF RF M
2014Georgia "Georgians"3190.6150.51
2019New Zealand4188.5145.1147.91
2017New Zealand-Wellington155.4142.8145.81
2013Turkey-Bahçeşehir PES4173.8½38.3037.23
2013Moldova-Eco Generation154.6141.7036.44
2017China-NFLS LaplaceW/169.2137.1034.13
2019Bulgaria-Sofia A3175.8145.55
2018New Zealand175.1½40.54
2013Russia-MG 123160.1½39.08
2019Romania-Sol/ Sq/4173.2½37.64
2018Russia-12 FM150.4½34.66
2017China-Beijing RDFZ157.0½28.85
2013Bulgaria-Science Girls3153.4032.510
2017China-NFLS Unique3154.2032.39
2016Iran-Besat 1113.0030.29
2019Greece-Anatolia C/168.3023.47
2018Iran-AYIMI 23127.4018.79
2014Russia "Vinegret"146.44
2014Bulgaria-Kyustendil "R/"½113.45
2016Iran-Black Intelligence3161.910
2013Turkey-Fatih Eskişehir146.911
2017China-Qingdao No. 2143.610
2013Kazakhstan-NIS D/team130.315
2017Russia-Novosib/ 12FM2126.212
2017China-Shenzhen 2 M/S2121.313
2019Greece-Anatolia HS154.314
2017Russia-Voron/ Izolenta130.314
2017Russia-Dolgoprudny 5th129.715
2017China-Shenzhen 1 M/S119.816
2016Iran-Besat 2116.311
2015Russia-MG 121137.410
2018Russia-12 kids of sci/1113.610
2016Iran-Free Thought188.512
2018Russia-Voronezh Kvant/½114.611
2018Iran-AYIMI 1½114.112
2016Iran-Paramount Notion½106.414
2019Russia-13 Element½102.321
2018Russia-Easy Science0113.013
2018Bulgaria-Awkw/ Turtles091.015