
Comments on Judging By Fred

Fred Rose is the Program Director of the Minnesota FLL program, which has over 200 teams, and he wrote these comments on judging after some of their 2005 regional qualifying tournaments. I share many of the feelings, concerns, and thoughts that Fred describes in his message. I think it would be helpful if everyone took the time to read what Fred wrote and to think about what he has said. I have immense respect for Fred and appreciate his contribution to FLL and his taking the time to write up these comments.

Date: Thu, 08 Dec 2005 17:23:06 +0000
Subject: Comments on judging (long)

I have some comments on the judging/scoring/award process. This is very long, probably more than I have ever said publicly about judging before, and probably more than you want to know, but probably worth reading. At the end is something about where the value in this program lies.

Please read the scoring description “What the Heck do My Scores Mean” under downloads. This explains the process pretty thoroughly, and I am not going to repeat that here. In the email below, this document is referred to as the scoring document. You really need to read this to understand the scoring and award system. Then we can discuss it.

I’m always available by email. You may not always like my response, but I will always respond. :-)

One of my standard responses to constructive criticism is along these lines: this is a volunteer program, and things get done by someone with a passion to make a change and add some value to the program, so if you feel strongly, then volunteer. I don't quote Gandhi on this for the heck of it. Gandhi said you must be the change you seek in the world. We have made changes to the program many times based on survey results and feedback. EVERYTHING you see on our website (all the material, videos, etc.) was created by someone coming forward and doing it. We don't have an advisory board anymore, but maybe we need to bring that back.

I’ll also say I’ve received much more feedback this year on judging. While we have made a couple of mistakes (see Crosswinds below), there has been more to it. I’m open to suggestions here. The quality and commitment of our judges is first rate, but getting better is a continuous process. We all have the same objectives here.

Here are some common questions:

One team won five awards at Crosswinds Div. 2, aren't awards supposed to be balanced? You are right, we screwed up here. There is no short answer to this question in general, or to this specific case. Programming, Design, Teamwork, and Research are scored categories. A judge (or more than one) judges every team and gives them a score, which is made up of 5 subscores. Remember, judging is about assessment AND ranking. So the judge scores each team independently of any other judge or category. The judge follows the rubrics but also has to rank teams, so the rubric is more of a guide: rubrics are great for assessment, but ranking can only be done relative to other teams. Judging is really more like grading on a curve. The head judge will talk to the judges during breaks, see which teams are doing well, see how they are doing at the performance table, look in on the judging, etc. So by the end of judging, the overall strength of the teams is fairly clear. The judges from each division will discuss their top scores and anything else relevant to possible awards. If one team is on top in two of the four judged categories, that's OK, but they shouldn't win any other trophies, except for head-to-head. If they are on top in three or four, the judges should see if there are any close seconds, and a judge can modify the scores for that second team to bump it up. In some cases this doesn't work, because the one team is just so superior that changing scores would wreck the integrity of the judging. That sometimes happens in small regionals with one really dominant team. I don't particularly like changing scores, nor do the judges, but I don't particularly like seeing one team win all the awards either. It's a complicated thing, with lots of variables.

We've had over 60 FLL tournaments in MN since the inception in '99, and just a few different people have been head judges at them. So we have the experience to make these decisions and have seen many different circumstances. But there is a lot going on in the afternoon at these events, and even experienced people can make mistakes. In the case of Crosswinds, a combination of unrelated issues consumed the head judge's time, we had a process breakdown, and there was a dominant team. When we have made errors, we always bend over backwards to fix them. We made an error here in awarding too many trophies to one team, but even if we hadn't, it would have had no impact on which teams advanced. At this point, I'm really not sure how to fix this other than by apologizing to the teams.

The FLL Handbook talks about a limit on awards, doesn't MN follow that? I think the rubrics done by FIRST (actually they are done by committees formed from state partners) are pretty good, and I think their handbook is excellent. The FLL handbook, however, is all about what are termed "Official" tournaments. In layman's terms, that means state tournaments. Regional tournaments are called qualifiers in the FIRST vernacular, and these are not subject to the same terms and conditions. However, in MN we try as hard as possible to follow the same rigorous judging and organizational processes for regionals and state. But state tournaments are much bigger, which makes it extremely unlikely you would have one dominant team. We give out a ton of awards at State, as many as possible.

How many teams win awards? Statistically, 50% of teams in MN win trophies at a regional tournament, and about one third at the state tournament (even with some teams winning multiple trophies). MN also has two divisions, which means we give out twice as many trophies per tournament as any other state. We order 32 trophies for state. We order over 140 regional plaques. So where do we draw the line? There are 230 teams in the state; do we give each team an award? They all work hard, don't they? Do we scrap scoring teams in judging and only advance teams from regionals by robot performance score? Do we skip the state tournament and only have a bunch of regionals? Do we invite everyone to the state tournament and hold it at the Metrodome? Do we give fewer awards at regionals (maybe just performance and research)? Do we scrap divisions? Do we add more divisions? Do we put a restriction on trophies and give a trophy to the second-place team if the first-place team in a judged category has already won one? Do we judge teams but not return the score sheets, so we can move the placement of teams around? Do we use a different ranking system (ordinals, whatever)? FLL is unusual in having so many awards/judged categories that are correlated. Should we reduce this number? And what leeway do we have anyway as an official FLL partner? There are a hundred more questions like this. My only point is that this isn't simple. I'm always willing to listen to suggestions here.

Experienced teams seem to have an advantage. Yep, that’s true. That’s why we put in divisions way back in 2000. To date, MN is still the only FLL program in the world that does divisions. But even with that, FLL is a program that rewards teams that stick with it. They get better at programming and design and problem solving and teamwork, which is the whole point.

Where do officials come from? Local companies, ex-coaches, former students (yes, we have some college-age ex-FLLers as judges). The vast majority are engineers or scientists. With rare exceptions, all technical judges are engineers, programmers, or scientists. For example, the judges on the Crosswinds Div 2 panel were: a lawyer (teamwork judge, who coached a team to a Director's Award in the past and wrote the FIRST teamwork rubric); the CEO of a local research lab and former university professor (research judge); a computer engineer and lifelong Lego builder (design); a software engineer (programming); and, as head judge, a PhD in mechanical engineering, a VP at a local manufacturing company, and an FLL judge for 5 years. These judges have no connections with the schools involved and frankly usually don't even know anything about them. Many of our officials are world-class engineers and scientists who I would trust with anything, and they bring that same level of critical thinking to judging. But we grow, add judges, etc., so we can always get better. At the end of the day, all these people have the same objective you do: bringing good science, technology, and problem-solving opportunities to kids.

Let me just say a word here about kids and adults. We all know the kid story here but I believe that this program also changes adults. Whether you are a coach, mentor, or an official, you learn a lot about decision making, dealing with teams, handling pressure and difficult decisions. I’ve seen young people who are officials grow considerably from their involvement. And I know what coaches learn. I have a presentation I give to groups entitled “Everything I know about Management, I learned in Lego League, or why coaching kids teams is better than management training”.

The handling of the pipeline doesn't seem to be consistent; the pipeline is "fixed" but isn't aligned right, and the flags don't go up, so are points awarded? Arrgh, the sea, she is a harsh mistress. As Chris said during challenge training, this challenge is all about finesse. Many teams hit the pipeline piece hard or sideways, so it doesn't go in right, and when the kids push it, the flag doesn't go up. We try to consistently NOT give any points for that. There are some judgment calls, however: the robot may misalign the piece when the kids pick up the bot (if it stops there), or other similar cases. Refs always give kids the benefit of the doubt here. We'll work on this consistency, but I think it's fair to say, with all due respect to Scott, that the pipeline design wouldn't win any robust design awards. :-) Maybe that's true in real life too, which is why they leak so much.

How do you determine how many teams advance? This is explained clearly in the scoring document. There is no standard across states (or from FIRST) on how teams advance from regions to states. MN has been doing this the longest, and we have worked very hard to develop an overall fair scoring system.

Why does a simple robot that scores 200 on the performance table get a lower technical judge score than a robot that scores 90? Don't you follow the KISS principle? Very good question, and it comes up every year at regionals. Yeah, I'm an engineer and I understand this issue from both sides. The primary reason this happens is this: the challenge is specifically designed so that a simple point-and-shoot kind of robot can score around half the points. But that simple robot is maxed out at that level, no matter how much the kids work on it. A more complex, more capable design can do all the missions (400 points), but at the early stages, like at regionals, it may not be completely working yet. If you look at the winning teams in design at state, they always score very well on the table and have elegant, capable designs that are not overly complex. We have done analysis in the past showing positive correlations between performance scores and technical judging scores. Here is my philosophy on competitions and subjective/objective scoring. In junior high and below, I believe judging of things like programming is important for two reasons: it forces the kids to focus on it, and it ensures the kids are doing the programming. In high school, there should still be some judging, but it can be lowered in emphasis, with more focus on actual results. In HSR we have done that by making the performance score worth 60% rather than the 25% it is worth in FLL. And by college, competitions should be purely objective performance (like the solar car race), because that is the real world.
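Fred mentions past analysis showing a positive correlation between performance scores and technical judging scores. As a rough sketch of what such a check might look like (the scores below are made up for illustration, and the helper function is my own, not anything from the MN program), a Pearson correlation can be computed in a few lines:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical team data: robot performance (table) score vs. combined
# technical (design + programming) judge score.
perf = [90, 120, 150, 200, 250, 310, 340, 400]
tech = [8, 9, 11, 10, 14, 15, 16, 18]

print(pearson(perf, tech))  # a value near +1 means strong positive correlation
```

A coefficient well above zero on real data would support the claim that simple point-and-shoot robots do not generally out-judge more capable designs.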

My team’s presentation looked better than the one I saw at the awards ceremony? Certainly possible. Remember, there is also a 10-minute Q&A with the judges, and that is often where the judges get their real answers and insight. So you can’t go by the presentation alone. I will say that we are going to start reading a paragraph from the judges on why a team won. This will help give some context, since the presentation alone can be misleading.

The judging seems to favor teams with cool presentations over ones that are well researched. This comment bothers me, because I’m not sure what’s going on here. I know my philosophy, and I know the core research judges’ philosophy, and they couldn’t be further from this statement. So either the new research score sheets from last year (from FIRST) are driving us to this, or it’s simply that good teams also give good presentations. I’m open to further comments here. Here are some comments from a research judge on this topic.

“Many teams have good solid research but have: a) Poor or missing analysis. Teams consistently (though not always) clearly identify the issue. Most propose a solution. Very few of them explain how it resolves the issue, and maybe 3-4 teams explain why it’s a good way to resolve the issue. I look for ‘here is the issue, and why it is important; here is our proposed solution, and how it resolves the issue; here is why it’s a good solution.’ b) Missing follow-through. I will consistently award a higher score to a team that *completes the assignment* than to a team that excels only in a few areas. c) Poor presentation. Style and ability to communicate count as part of the overall score. It counts in the real world, too.”

The research project is about identifying a problem, proposing a solution and validating that solution (in this case as much as is practical). That’s problem solving. And teams that go through all of that, will do better.

The judge didn’t seem to ask many questions. This shouldn’t be true, but I know some judges aren’t very outgoing (this shouldn’t be a surprise; they are techies). I’ve judged programs like this a lot, and I know an experienced judge can tell a lot without asking any questions, but they still should ask. That’s the point. I’ll work on this with the judges.

Scores in one region are much higher than in another; it doesn’t seem fair that teams in a region with much lower scores can advance just as many. Sometimes life isn’t fair. This is what I call the Jamaican bobsled team theory. Jamaica sent a bobsled team to the Olympics even though the top 20 Swiss teams were probably all better than that one Jamaican team. Too bad; all those Swiss teams can’t go. This is common in sports: one conference or region is always tougher than the others. Such is life. Teams are assigned as much as possible to a geographic region. As head official, I don’t even look at which teams are in which region until I actually get to the event.

Some comments on the scoring range. This year, in each category, we are using a scale of 1-10 instead of the 1-20 we have always used in the past. The primary reason for this was ongoing discussions with FIRST and other state partners. As you know, the rubrics have 4 levels of performance. We think the rubrics are just fine and were well done. The recommendation is that the 4 levels translate into scores of 1-4. We don’t feel that gives enough range, and we ran experiments last year to test it. We felt a good compromise was 1-10, so that’s why we switched. There is no standard among partners on advancing teams from regionals (or qualifiers) to “Official” (state) tournaments. We feel our system is well tested, soundly based on principles, and fair. But we also recognize it’s complicated and that other methods are OK too. So until there is some consensus among state partners, we’ll keep our system as it is.

But this past weekend, in 3 of 4 tournaments, there were ties for the last spot going to state. This has never happened before. We believe this is a result of the 1-10 scoring system not giving judges enough range: as we go down toward the middle of the team scores, there is not enough point separation. This was exactly the concern with the 1-4 system (along with the fact that all good teams would likely get all 4’s). So after next week’s regionals, we will go back to 1-20 for the state tournaments and for next season.
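The tie problem has a simple statistical explanation: a coarser scale packs team totals into fewer possible values, so collisions become much more likely. A back-of-the-envelope simulation (illustrative only; the team count, category count, and uniform-score assumption are mine, not the actual MN judging model) shows the effect:

```python
import random

def tie_probability(scale_max, n_teams=12, n_categories=4, trials=2000, seed=1):
    """Estimate how often at least two teams end up with the same total
    when each category score is drawn uniformly from 1..scale_max."""
    rng = random.Random(seed)
    ties = 0
    for _ in range(trials):
        totals = [sum(rng.randint(1, scale_max) for _ in range(n_categories))
                  for _ in range(n_teams)]
        if len(set(totals)) < n_teams:  # at least one duplicate total
            ties += 1
    return ties / trials

# The 1-10 scale produces ties far more often than the 1-20 scale.
print(tie_probability(10), tie_probability(20))
```

Under these toy assumptions, the 1-10 scale ties in the large majority of simulated tournaments, which is consistent with the pattern Fred describes.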

So the bottom line is: yes, I think about this stuff a lot, I lose a lot of sleep over it, and we have put a lot of work into it, but I will always accept feedback, criticism, and suggestions, because it can always be better. And yes, sometimes we make mistakes.

I am going to close with two quotes: one I gave at last year’s state tournament, and one from a judge this year (we were discussing some feedback from a new team). These aren’t platitudes. I’m not a platitudes kind of guy. We need programs like FLL to help kids become good, critical-thinking problem solvers with technical and scientific literacy. None of us can do this alone, but as a community we can.


Quote of the week:

Growing up, my favorite aunt was a nun, Sister Martha Louise. I liked Sister Martha because she was funny and she always talked to me like a real person, no matter how old I was. Sister Martha used to say:

“Christianity is in the messiness”.

What did she mean by that? That you don’t learn and grow at a big formal thing like a mass, but in the every day, messy, interactions with others. This certainly holds true, no matter what your faith.

You know what? FLL is something like that. The value to you and your team isn’t at some big formal event, but in the daily struggles of working as a team and solving problems.

So remember that we are giving out awards here, but that you have all succeeded by getting here. Don’t base your feeling of success on an award that could come down to a robot acting weird, or a judge’s view you may not agree with. You have won a lot just by challenging yourself to get here.

And from the judge:

I have been involved as a parent, coach, or judge since the very first trial run of the Lego robotics competitions. In only a few years this has grown from fairly primitive, subjective awards to well-described, as-objective-as-reasonably-possible awards. We haven't figured out how to get the robots to do the judging yet; maybe then we could eliminate the rest of the subjectivity. :-)

It's hard on the kids when they don't win. It's especially hard to experience disaster on the day of a competition, or have to deal with a robot that suddenly doesn't perform as well as it did back on the home table. Being the coach on competition day is one of the most exhausting things I have ever done. I remember getting so wrapped up in wanting my kids to feel success that I felt tempted to blame someone else, anyone else, when we simply weren't able to show that we were the best.

I also remember the aggravation I felt, as a parent or a coach, when the judges or referees didn't do what I thought they should. Teaching my kids how to respond graciously to that was an unexpected challenge. This is the part of "gracious professionalism" and teamwork that the kids don't often see modeled -- after all, people being polite doesn't make interesting television or exciting news.

I extend my sympathy to the coach and the team members for their difficulties. I hope they realize the value of the things they learned prior to the competition. The competition is just one day out of many they have spent, and disappointment there should not be allowed to ruin their months of working together. There was no favoritism or bias at the competition, just serious competition and imperfect (human) judges.

I write too much, but I feel strongly about the value of this program. If this coach has specific suggestions of how we can make things better, perhaps they would share them. This program has improved an astonishing amount in very few years (and it was at least average to begin with), and will continue to improve with good input.


