Tuesday, April 22, 2008

One criterion is not enough

More from Scott Adams. This time, he proposes a rating system for movies to help him identify movies he might like. He has 11 different criteria, and that's just for a 1-3 hour experience. Don't we need at least that much to figure out what is going on in schools?

Mr. Adams's criteria are impressive: shape of story arc, star power, mumbling quotient, bladder, artistry, sadism, originality, incomprehensibility, humor, scariness, suspense. All but the first are on a ten-point scale, whereas story arc is a series of high/medium/lows.

It's important to note that he does not suggest adding all these up. Rather, together they give a profile of a film, so that people with different tastes -- or perhaps we can think about it as 'needs' -- can find what they are looking for.

Unfortunately, our current paradigm in education calls for a single rating for each school. Under NCLB, it is the simplest possible rating: yes/no. Either a school is making all the progress it is supposed to be making, or it is not. In New York City, we have a wider set of criteria, but it is still reduced to a single rating, A-F.

If I were to design a system to describe schools, I would never reduce it to a single rating; schools do too many different things. There's teaching basic skills and complex thinking. There's teaching analytical problem solving and teaching open-ended creativity. There's working with average children, special education students and the gifted. There are academic issues and non-academic ones (e.g. teamwork, perseverance, work ethic). All of these are important, but not equally important to every parent -- perhaps not equally important to every community. Reducing measures of schools to a single measure loses the detail that parents want to know about.

Moreover, losing that detail equates schools that do many things extraordinarily well at the expense of doing some things terribly with schools that do everything well but nothing extraordinarily. To put that another way, is a C average a result of a bunch of individual C's, or is it a result of a mixture of A's and F's?
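The C-average point can be sketched in a few lines of Python. The two schools, the four dimensions and the grades below are all made up for illustration; they are assumptions, not real report card data. The sketch only shows that two very different profiles can collapse to the same composite.

```python
from statistics import mean, pstdev

# Conventional grade points; the schools and their grades are hypothetical.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

steady_school = {
    "basic skills": "C",
    "complex thinking": "C",
    "special education": "C",
    "teamwork": "C",
}
uneven_school = {
    "basic skills": "A",
    "complex thinking": "F",
    "special education": "A",
    "teamwork": "F",
}

def composite(profile):
    """Average grade points across all dimensions (the single rating)."""
    return mean([GRADE_POINTS[g] for g in profile.values()])

def spread(profile):
    """Standard deviation of grade points (the detail the average hides)."""
    return pstdev([GRADE_POINTS[g] for g in profile.values()])

for name, profile in [("steady", steady_school), ("uneven", uneven_school)]:
    print(name, "composite:", composite(profile), "spread:", spread(profile))
```

Both schools come out with the same composite, a C, yet only the full profile tells a parent which one is strong in the area they care about.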

How do we judge schools, or those who work in them, with just a single composite measure? If Mr. Adams can even jokingly suggest that he needs a dozen criteria to understand a movie, shouldn't we need even more for a school?

Monday, April 21, 2008

What is patriotism?

Last week, Scott Adams (the author of Dilbert) posed the following question on his blog:
If a person is relatively certain that going to war will end his ability to enjoy the rest of his life, one way or another, and the war does not present a plausible threat to the homeland, is such a person unpatriotic for dodging the draft to save himself?
Some of the comments struck me as rather asinine. For example, they keep referring to "your country calling you to serve," or questioning the value of patriotism. And yet, they don't actually get to the real question Mr. Adams is posing: What is patriotism?

Many of those arguing that patriotism itself is a bad thing seem to be operating under the assumption that patriotism means blind acceptance of the president's views/beliefs/orders/desires. But plainly that can't be true.

Obviously, one element of patriotism is love of country. I would suggest that that is actually the root of patriotism, its essence. Love of one political party over another is not patriotism, nor is love of a particular leader, though perhaps either could be compatible with patriotism.

Refusing to defend one's country against an existential threat is clearly unpatriotic. No question.

Refusing to enable internal powers to twist one's country to support their new goals or ends is not unpatriotic. 

But the line is not clear. If you love your country enough, wouldn't you want to see its interests furthered? For example, if the United States needed to annex Canada in order to survive an energy crisis, it strikes me as being unpatriotic to refuse to take part in the invasion. Of course, allowing the country to get to that point -- or enabling others to do so -- would also be unpatriotic.

An essential element of modern (liberal) democracies is the peaceful handover of power from one faction/party to another. George Washington stepped down after two terms, peacefully. And Al Gore refused to challenge the results of the 2000 election past the Supreme Court decision, though many urged him to and the Constitution allowed for it. It is inevitable that power will change hands, and with that so will policy. Louis XIV said, "L'Etat c'est moi" (i.e. "The state is me"), but democracies' leaders cannot say such a thing. Therefore, aggrandizement of a particular leader or blind devotion to his/her policies cannot be patriotism, as both will change shortly, perhaps even radically.

Criticism of a leader is not unpatriotic, on its face. And refusal to serve an unjust cause that does not protect the nation or its interests is not unpatriotic, either.

Of course, one must come to some kind of answer about what it is that is loved when one proclaims love of country. Is it the land itself? Is it the original peoples? The powerful classes? The masses? The messy diversity that exists in that particular country, be it ethnic, socio-economic, regional, what have you? Their common denominator? Some set of values or principles of some sort?

I think that there are different kinds of countries and that there are different answers for these different countries. I don't think that Iceland and the United States can have the same kind of patriotism. Icelanders share a common language that goes back centuries, a homeland which their forebears have inhabited for at least as long, a common culture and way of life. The US is almost entirely made up of immigrants and their descendants. Have even a quarter of our population's families been here for even 100 years? (Half of New York City either immigrated here themselves or are the children of immigrants.) The French might love their common culture and language, both of which go back far further than this country's.

I think that this country, the United States, is especially a country of values. 230 years ago, it was not our ethnicity that set us apart from England, and yet we broke apart from the British Empire. Patriotism in the United States cannot simply be about our ethnic heritage, as that varies. It cannot be about the land, as we have grown through our history and our founding documents barely reference it at all. Clearly, the United States is largely about our governing values and principles.

However, I would not argue that the Constitution is the entirety of what matters about the US. There are other values that are a part of the very fabric of this nation that do not appear in the Declaration of Independence or the Constitution: pluralism, the common school, economic opportunity, freedom from many areas of discrimination.

I acknowledge that there are some questions as to which values actually should be included in that group. Simply projecting all of my own values cannot work, as others have different values. There will always be arguments about what values are the essential American values, and some incontestable values likely can be in conflict with others, so the prioritization of those values will also be an area of debate. 

But blind faith in the leader of the moment or the powers of the day? That is not patriotism. Temporary control over the levers of power does not make a group, however powerful, the same thing as the state.

So, dodging a draft that exists to fight an unjust war that does not protect whatever it is about a nation that is loved when one is patriotic? That is not unpatriotic. And it might even be patriotic itself. 

Wednesday, April 16, 2008

Wireless Disk Speed Test

There are so many ways to access data with Apple hardware. It could be on an internal drive. It could be on an external drive, either FireWire or USB. It could be on a disk attached to an AirPort Extreme (AirDisk), or a TimeCapsule. It could even be on the disk inside a TimeCapsule. How do the speeds of these various methods compare?

I used XBench, a free benchmarking tool, to compare the speeds of these different forms of storage. I don't have anything in particular to say about this tool, nor can I vouch for its usefulness generally. But it was free. Regardless of its details, it provides a common measuring stick, especially because I have run it from the same machine (i.e. a stock early 2008 black MacBook) for each test.

Unfortunately, other issues have varied across groups of tests. For the first group of tests, the external drive was a 300GB/7200rpm IDE drive in a Metal Gear Box USB2.0/FireWire enclosure. For the second group of tests, the drive was a 750GB/7200rpm SATA drive in a USB2.0 enclosure. 

                   XBench Disk Test Score
A: Internal Disk            28.70
B: FireWire Disk            49.01
C: USB Disk                 33.82
D: AirDisk (802.11n/2.4GHz)  3.91
E: AirDisk (100bT)           6.94


The internal disk is a 2.5" SATA disk, which explains why it scored so much lower than the USB and FireWire disks. Note that the USB and FireWire disks are actually the same disk, in the same enclosure. Of course, the two tests use different chipsets in the enclosure, and the pair serves to demonstrate the speed advantage of FireWire. Moreover, FireWire outperformed USB in every disk subtest. This same enclosure was attached by USB to an original AirPort Extreme BaseStation, which supported 100bT Ethernet. The wireless connection was 1/10 the speed of the direct connections. While the wired Ethernet connection was twice as fast as the wireless, that was still about 1/5 of the direct USB connection.  

For the second group of tests, I attached the same 750GB/7200rpm SATA/USB drive to the same Apple BaseStation (BS) and to an Apple TimeCapsule (TC), which also had its own internal 500GB/7200rpm SATA drive.

                         XBench Disk Test Score
F: USB 2.0                         30.67

G: BS AirDisk (802.11b/g/n)         3.55
H: BS AirDisk (802.11n/5GHz)        5.10
I: BS AirDisk (100bT)               7.03

J: TC AirDisk (802.11b/g/n)         2.41
K: TC AirDisk (802.11n-only 2.4GHz) 3.27
L: TC AirDisk (802.11n/5GHz)        6.64
M: TC AirDisk (1000bT)             12.96

N: TC Disk (802.11b/g/n)            3.03
O: TC Disk (802.11n-only 2.4GHz)    3.56
P: TC Disk (802.11n/5GHz)           6.32
Q: TC Disk (1000bT)                15.10



Clearly, switching to the 5GHz band -- at the cost of backwards compatibility with 802.11b & 802.11g, and the ability to penetrate as many walls -- gives a huge speed boost, both for the Extreme BaseStation and the TimeCapsule. Simply turning off compatibility itself results in a speed boost, though not as great. A wired connection, however, is always much faster than a wireless connection, topping out at 5x as fast as the b/g/n network. Note, however, that the same disk was still more than 2x as fast when hooked up directly with a USB connection.
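For anyone who wants to check those ratios, here is a rough sketch in Python that recomputes them from the scores in the second table. The dictionary layout and the names (`scores`, `speedup`, `usb_direct`) are my own shorthand for this example and have nothing to do with XBench itself.

```python
# XBench disk scores copied from the second table (rows F through Q).
usb_direct = 30.67  # row F: the 750GB drive attached directly over USB 2.0

scores = {
    "BS AirDisk": {"b/g/n": 3.55, "n 5GHz": 5.10, "wired": 7.03},
    "TC AirDisk": {"b/g/n": 2.41, "n-only 2.4GHz": 3.27, "n 5GHz": 6.64, "wired": 12.96},
    "TC Disk": {"b/g/n": 3.03, "n-only 2.4GHz": 3.56, "n 5GHz": 6.32, "wired": 15.10},
}

def speedup(device, faster, slower):
    """How many times higher one connection's score is than another's."""
    return scores[device][faster] / scores[device][slower]

for device in scores:
    print(
        f"{device}: 5GHz is {speedup(device, 'n 5GHz', 'b/g/n'):.1f}x b/g/n, "
        f"wired is {speedup(device, 'wired', 'b/g/n'):.1f}x b/g/n"
    )

# Same physical disk: direct USB versus the fastest networked setup (row M).
usb_vs_wired = usb_direct / scores["TC AirDisk"]["wired"]
print(f"Direct USB is {usb_vs_wired:.1f}x the TC AirDisk over gigabit Ethernet")
```

Given the margin of error in wireless benchmarks, these ratios are best read as rough orders of magnitude rather than precise figures.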

Wireless benchmarking is notoriously flaky, and so you have to assume some margin of error in each of these scores. That would explain why each of the disks had the top score in at least one test, despite clear trends suggesting that the TimeCapsule disk is the fastest and BaseStation's AirDisk is the slowest.

Thursday, April 10, 2008

Why K12 teachers need tenure

Brian Lehrer keeps asking why we need tenure, given the teacher shortage we face. Unfortunately, he keeps doing it in the context of other education discussions, and he never gets an answer.

So here I go.

******************

First, a little history. Tenure is not a product of collective bargaining. It is not in teacher union contracts. It is a matter of state law, and goes much further back than when teachers unions gained collective bargaining rights. Though in higher ed it has been about academic freedom, in K12 education it has equally been about patronage (i.e. new administrators replacing existing teachers with their cronies).

Second, a point of clarification. Tenure for K12 teachers is not guaranteed lifetime employment. Rather, after three years of teaching with good evaluations, tenured teachers cannot be fired at the drop of a hat or at the whim of an administrator. Instead, the principal must document the teacher's problems, let him/her know about them, give him/her an opportunity to correct them, and then check to see whether they were corrected. If all of these steps are documented, the teacher can be dismissed. For an employee who has already proven him/herself over years of employment, it's just good management practice.

*******************

So, why do we still need tenure? Patronage is not the problem it once was, certainly not with a teacher shortage and licensure requirements. And what kind of academic freedom do K12 teachers need?

A) The teacher shortage is not evenly distributed. High performing schools don't have the same problems attracting teachers. High paying districts don't have the same problems attracting teachers. English, social studies and art teachers are in good supply for most schools. Low performing schools, lower paying districts, math and science positions, these are the areas of teacher shortages. So, the shortage issue is not a factor for most teachers.

B) This really comes down to the question of why principals might want to be rid of a teacher. I would suggest that any manager would want to be rid of any employee who makes his/her job or life harder. Ideally, this would only be low performing teachers, but that is a fantasy view.

Any kind of rabble rouser can make a principal's job harder. Encouraging parents to express their concerns to a principal ends up costing principals a lot of time. Encouraging parents to go higher up in the organization if the principal does not give them a fair hearing can cost a principal far more. Encouraging students to organize and express their concerns to administrators is not always taken well, either. And yet, teachers who feel a strong drive to teach for social justice commonly do all of these things.

Obviously, union activists are already protected by other labor laws.

C) Academic freedom in K12 is not like in higher education, that's true. But it is still an issue.

A teacher who tries to raise the bar in his/her classes can create no end of problems for a principal. If standards in a school have been too low, and a teacher demands more than students are accustomed to, students and their parents can demand enormous amounts of a principal's time. This is a different form of rocking the boat, but can still be enough for a principal to wish to be rid of the teacher.

Principals cannot be experts on everything. Once, when I was teaching high school English, my principal was a former middle school math teacher. He insisted that I, as an English teacher, "not worry about critical and analytical thinking" and "just teach English." Though he had no training or experience with high school English, he had ideas about what it meant. He did not approve of the fact that I was spending as much time on teaching my students how to reason as on the mechanics of writing. His assistant principal insisted that we not teach students to write essays at all any more, and instead focus on other forms of non-narrative writing.

Another principal might be an old school traditionalist and insist that English classes only be about books. He might not approve of using film or video to teach about theme, plot, symbolism, character development, story arcs, allegory and any of the rest. But a teacher might feel that this would be the best way for students to learn these lessons. I never had this experience, but I have spoken to those who have. Quite simply, teachers who insist on doing something different than what has been done before can face blowback from administrators and from parents (i.e. "if it was good enough for me..."). 

******************

No, we don't need tenure if principals can be counted on to make good decisions in the best interests of children. But they are human, and therefore often make decisions in their own interests. Moreover, we have a real shortage of high quality principals, even as we are breaking up large schools into multiple small schools and opening up charter schools. 

Which brings me to something that Richard Rothstein talks about. We have regulations not to ensure the highest quality of anything, rather to prevent the lowest quality. Once a teacher proves him/herself over three years, tenure ensures that s/he will only be dismissed for a valid reason after s/he -- who has already demonstrated that s/he can do the job well -- has been given a chance to correct the situation. 

I do not suggest that there are not problems with our tenure system. A lot falls to principals, perhaps too much. Teacher observation and evaluation is not easy, and the tenure process is dependent on principals making good decisions about teachers during those first three years. Principals, coming from the teacher ranks, know far more about how to teach or have difficult conversations with children than with adults, and yet they are expected to do these things in the process of trying to remove a tenured teacher. Principals need support and training that they rarely receive.

And that is why we still need tenure. It takes a series of bad decisions over a number of years for a poor teacher to get tenure. But without tenure, it only takes one bad decision for a good teacher to be dismissed.

Can a teacher be expected to ensure every one of his/her students makes progress?

Can a teacher be expected to ensure every one of his/her students makes progress? 

I don't know. It's a tough question.

Obviously, we want the answer to be "Yes! Of course!" But what is the reality of the situation?

The question really comes from the debate about whether to hold teachers accountable for student progress. To what degree should individual student data be the basis for that kind of system, even leaving aside problems with the tests themselves?

During my last year teaching in New York City -- where the state maximum class size for high school classes is 34 -- I had as many as 37 students in class. Could I ensure that every one of those 37 students made progress? The previous year, I had a class roster of 41, but I was told that most of them would never show up. (This was a class for sophomores held during the last period of the day, a period that usually was only for freshmen, who did not start the day until second period.) Fewer than a dozen of them showed up even once, and only half a dozen showed up more than a small handful of times. That spring I also taught an extra class during first period for freshmen who had failed English the previous semester, giving them a chance to catch up on that missing credit. Again, a roster of 40+, but only a fraction of that showed up even once.

Would it have been appropriate to expect each of the students on my roster to make progress?

What about students who are pulled out of classes regularly for speech therapy or to speak to a social worker or psychologist, or who are even suspended? Should their teachers be expected to ensure that they make the same kind of progress as other students?

What about the students who show up to class, but don't do their homework? Or those who sleep through class most of the time? Should the teacher be accountable for their progress? How much progress? How about the student who misses the first 20 minutes of his/her first class every day because s/he drops off his/her little brother at school every morning before coming to his/her own school?

If a teacher is working one on one for even half an hour with just one student, of course the teacher should be accountable for the student making progress. But how much progress? Whether we like it or not, some kids are smarter than others, or have more aptitude in a particular area. Some are better prepared for the content of a particular class. And some have more support out of class (e.g. people who encourage them to do their homework, a quiet place to do their homework, etc.).

If a teacher is working with just five students for 38 minutes a day, I think that it is fair to hold the teacher accountable for them making progress. But if they were never properly taught arithmetic, it is not fair to hold the teacher accountable for their learning algebra at the same rate as another group who have mastered arithmetic. Each group might start knowing no algebra, but the educational experiences of one group make them much better prepared to learn algebra than the other.

I am not making excuses. Rather, I think that these different situations should result in different expectations for progress. If we are to hold teachers accountable, our system must take these kinds of issues into account. Moreover, I would suggest that these issues tend to cluster in schools, mostly as a function of socio-economic and immigration factors in neighborhoods.

But even if we take all this stuff into account, we still have the original question. How many students must a teacher have before we would expect one of them to fall through the cracks? 50? 100? 150? 300? 500? Surely there is a point where it would be inevitable, right? Surely there is some point where it becomes more likely than not -- which is what I mean by "expect" in this case. I mean, even for an above-average teacher, there's got to be a point.

Of course, I want the answer to be far above the student loads that teachers face. I want every teacher to be able to reach every student. But is that a reasonable expectation?

This is a great part of why Ted Sizer, Debbie Meier and Paul Schwarz call for teachers to have smaller student loads. This does not necessarily mean smaller classes. The same number of students in a class, but for twice as long (e.g. double periods), would result in halving student loads, whereas a 20% reduction in class size combined with a 20% increase in the number of classes taught would keep student loads roughly the same. Their point is that every student should be known well by at least one adult, and that is unlikely when every high school teacher has over 100 students. Some may be known well, but many will slip through the cracks.
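The student load arithmetic is easy to check. Below is a minimal sketch in Python; the function name `load_factor` is invented for this example, and it assumes that a teacher's student load is simply class size times the number of classes taught.

```python
from fractions import Fraction

def load_factor(class_size_factor, classes_taught_factor):
    """Relative change in student load, assuming load = class size x classes taught."""
    return class_size_factor * classes_taught_factor

# Double periods: same class size, but half as many separate classes.
double_periods = load_factor(Fraction(1), Fraction(1, 2))

# Classes 20% smaller, but 20% more classes taught.
smaller_but_more = load_factor(Fraction(4, 5), Fraction(6, 5))

# Increase in classes taught that exactly cancels a 20% cut in class size.
break_even_increase = 1 / Fraction(4, 5) - 1

print("Double periods:", double_periods)            # 1/2 of the original load
print("20% smaller, 20% more:", smaller_but_more)   # 24/25 of the original load
print("Break-even increase:", break_even_increase)  # 1/4, i.e. 25% more classes
```

So the 20%/20% trade leaves a teacher with 96% of the original load, which is close to unchanged; it would take 25% more classes to cancel the smaller class size exactly.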


Wednesday, April 9, 2008

Probably


I like Hillary and I like Obama. Both are smart. Both have demonstrated since their early adulthood that they care about the issues that I care about. Neither of them has any red flags that warn me off of them.

But I've favored Hillary all along. 

Obama's speeches are things of wonder, no question. He is capable of addressing issues and bringing them to the nation in a way that Hillary is not, and I don't just mean race. He is a big issues, ideals and principles guy. That is his nature, and some of his many gifts make him perhaps uniquely able to discuss them in ways that resonate with many Americans. (We'll see whether or not it is most Americans come the general election.) I don't think that he is just about speeches. He understands the legislative process. He has a sense of how to work with people. He is capable of discussing policy, no question.

But for some reason, I've favored Hillary all along. No knock on Obama, whom I find inspiring on many levels. My favor for one does not have to come at the expense of the other. In fact, I voted for Obama because at the time I felt that he was a better match up against John McCain.

But I've favored Hillary all along. 

Without getting into each of the candidates' strengths and weaknesses, I want to explain why. In fact, I don't think that it is about their general strengths and weaknesses. They are each a great candidate. They simply are different candidates. I don't think that one is definitively better than the other. Either would make a great president. Either can beat John McCain -- I'm now not so sure which would be a better match up. I've thought a lot about each of their strengths and weaknesses, but I don't think that that has really made a difference.

So, why have I favored Hillary all along? These are not reasons that I am necessarily proud of, but I am not ashamed of them either. They exist in the context in which I like both of them, and would happily support either of them. These are the results that I think I have uncovered in myself.

First, I feel like I know her better. Obviously, I've known her longer. I was in college when Bill ran for president and I really liked her then. After Nancy Reagan and Barbara Bush, Hillary was exciting to me. She was the kind of woman I respected. She was more like my Mom, in that she was a well-educated professional with her own career. She did not pretend to take a backseat to her husband, despite his political successes. She was interesting, smart and opinionated, even sarcastic. She was a real feminist, and was portrayed as a radical liberal. Every one of these things was good. (I must admit, though, she is not as liberal as she was painted, and not as liberal as my ideal president would be.)

I could only hope that I might marry so well. I don't mean that I wanted to marry her, but certainly I wanted someone with every one of those traits. 

Through the years, I've learned more about her. She was bookish and strong, clearly awkward during her formative years. She never was fashionable, even when she tried. But that didn't diminish her at all, as she demanded to be evaluated on entirely different standards. I learned how thoughtful she was, how she consciously came to liberalism and the Democratic party. She could have been one of my friends in high school or college.

Perhaps she was too aggressive or pushy for that. We might have clashed and fought. Maybe we couldn't have been friends, but I'd like to think that that was the kind of person I wanted as my friend. And even if my pushiness and her aggressiveness wouldn't have mixed well, that didn't mean that she wasn't suitable to lead. In fact, her aggressiveness might even make her more suitable to lead.

I just like her. Always have. Probably always will. 

Furthermore, there's something about the abuse that she has taken all these years, not just during this campaign. Sure, Chris Matthews has been horrendous. But when she was first lady she was a huge target of the right. Even when Bill was running for president she was a huge target. Even when Bill was governor of Arkansas, she was a huge target, this feminist from up north. I'm sympathetic to that. I feel for her. Perhaps that shouldn't matter, but I think that it does. I know that years of experience as a target of ad hominem, politically motivated attacks really shouldn't be a qualification for president of the United States, but if I am being honest I've got to admit that it has resonated within me.

On the other hand, there is Obama. He's handsome. He's smooth. He is, quite simply, more cool than I ever could be. He's arrogant -- which I like -- and he pulls it off with panache. Second City has a great line about him, "He's just the right amount of black." I think that that cool he's got is part of what they are referring to -- though clearly not the only part.

I was never the cool kid. Moreover, I never wanted to be the cool kid, or hang out with the cool kids. I wasn't in love with the homecoming queen, and I wasn't impressed with my high school's quarterback. Did I reject all of this because I knew I couldn't be a part of it? I don't think so, but who can really be sure? Regardless, I was one of the smart kids and I was happy to be one of the smart kids. I never looked enviously at the other groups, be it in high school, college or since.

Perhaps Obama was one of the smart kids. I mean, he was smart enough to be in that group. But he projects that he was always one of the cool kids. And I always felt like they were more about style than substance. 

I think that Hillary's near tears in New Hampshire is relevant here. Obama wouldn't have that moment, not publicly. It's not his style. He is an incredible public speaker and has an amazing presence about him. He's one of those guys who never seems to be working hard, even when he is. But in that moment, Hillary genuinely showed her frustration. She was at the end of her rope. She'd been working hard for decades for these causes that matter to her, getting more and more immersed over time (i.e. from being first lady of Arkansas while being a partner at a law firm, to being full time first lady of the U.S., to being a full time senator). She's been working her ass off, and it actually is hard work. She was sure that she was what America needs, and people weren't listening. She was frustrated. She was tired. She might even have been a touch despondent. What more did she have to do?

I've had moments like that, when I look back on all my efforts and feel like people just aren't listening, are making my work harder than it needs to be, when I feel that something important is not going to get done properly, despite everything that I have done for it. Passionate people everywhere have had those moments.

Does Obama have those moments? It's hard to think that he doesn't. But we'd never see them. That's not who he is. He doesn't project as a grinder who gets things done. Rather, he's the leader, the rock star.

Look, this isn't about which would be a better president. This is not about the most important strengths of each candidate, or their important weaknesses. They add up in my head to be just about comparable, though different. This is not about who had a tougher childhood, who faced more barriers to their success, who was touched more by MLK's death, who had tougher odds to overcome, who is smarter, who is a harder worker, who is more liberal or anything like that. This is about why, despite my conscious and rational conviction that these two candidates are both great, just different, I have consistently favored Hillary all along.

Use of test data

Today, the Brian Lehrer show attempted to address the question of the proper use of student achievement data in making tenure decisions for teachers. This was prompted by a NYT story from today's paper. 

Two important quotes from the story came from state legislation.
  1. Original, but scrapped language: "That section said teachers would be evaluated for tenure based on, among other things, an 'evaluation of the extent to which the teacher successfully utilized analysis of available student performance data.'"
  2. Final (?) language: "'The teacher shall not be granted or denied tenure based on student performance data.'"
I understand that on the surface this change seems a travesty, but it is not. Let me explain. 

To help you to understand the issue, I am going to give some other measurement examples that are likely more familiar to you.
  • Kitchen measuring cups come in two basic varieties: liquid (the transparent ones you read from the outside) and solid. You can use solid measuring cups for liquids, but it'll be a little inaccurate. At the same time, however, you really cannot use the liquid measuring cups for solids with any real accuracy. Note that even though the context (i.e. the kitchen) and the purpose (measuring and cooking food) are the same, different tools are needed to measure the different substances well. Especially note the asymmetry in substitution, in that one can come close to replacing the other, but not the other way around. Small changes in what you are trying to measure can necessitate a whole different tool to measure it well.
  • Most of us have a simple device to measure our own size in our homes: the bathroom scale. However, it only measures one aspect of size well; while it is a good tool to measure weight, it is a horrible tool to measure height. It's really not a good tool for width or depth, either. Or waist size. Or any number of other aspects of size. Of course, you could use it to measure height, a little bit. If you know that the person weighs 32 lbs, you've got a clue as to how tall they are. If you know that they weigh 180 lbs, you again have a clue. But in either case, you could be far off, depending on whether the person is thin or fat. And, of course, that assumes that you are measuring a person, instead of a dog or a box of books. Similar constructs, or different aspects of the same construct (e.g. size), are not necessarily measured the same way.
  • That bathroom scale has some other limitations. It assumes that it is being used in a particular context. If you take it to the moon to measure my identical twin, you will get quite a different answer. It is calibrated to earth gravity. Of course, that's usually not a problem. But if you work for NASA, it's something to keep in mind. Furthermore, while you can use your doctor's fancy scale on the moon, even it won't work in space. Context matters.
  • Bringing it back to the kitchen, some of us have food scales in our kitchens. We can't use them to measure people, and we can't use a bathroom scale to measure food for cooking. Though they each measure the same thing (i.e. weight), they are useful for different ranges and precision. My kitchen scale is 160 times as precise as my bathroom scale (1/20 ounce vs. 1/2 lb), but only goes up to 5 lbs. Precision matters, as does the range over which a tool can be accurate.
  • Another kitchen example is measuring flour. The best cooks measure it by mass rather than by volume, because flour volume is notoriously unreliable. (Think about sifted flour vs. unsifted, for example.) But flour is sensitive to moisture in the air. In a more humid environment, you use a little more flour, as whatever you measure will include some moisture that the flour absorbed from the air, and in a less humid environment, you need to use a little less flour. Environment matters.
  • One last kitchen example. We presume that if water is boiling, that means that it is 212 degrees Fahrenheit (100 Celsius, of course). But at higher elevations this is no longer true, because of differences in air pressure. So, in Denver, the boiling point is 201F/94C. If you don't take this into account when calibrating your kitchen thermometer, your meat will be underdone and your cheesecake will be runny. Calibration is critically important.
All of these issues are apparent in achievement tests for students, the ones we use in our schools for NCLB and many other purposes, too.

There are many problems with our tests, and none of them are new. E.F. Lindquist wrote about them in 1951 in a book called Educational Measurement. Now let's be clear here, he was no opponent of standardized testing. I mean, this is the guy who invented the scantron machine! But he was concerned that people would misuse test results, or misunderstand their meaning. 

One of his biggest concerns, and I think that this one underlies virtually all of the big problems with testing today, is that people can get confused between the items being tested and the "construct" that the items are supposed to stand for. 

For example, I might give an essay test on a particular novel. As the teacher, I act under the assumption that the scores students get on the test represent students' writing ability, understanding of the novel or even ability to analyze literature in general. Maybe that's a fair assumption. Or, maybe if I used different questions, some students would do better on the test, and some would do worse. Perhaps the boys would do better if I asked about some aspects of the novel, and the girls better if I asked about other aspects. Perhaps some questions might be easier for students who had read a particular other novel in another class the previous year, and those who were not in the same class last year -- and did not read that novel -- would do worse.

Technically, this is called "person x item interaction," meaning some students do better on some items (i.e. questions), and some do better on others. This is not to say simply that some items are harder than others, but rather that different items are harder for different students. And when we confuse their performance on those items with their underlying ability, we are making a big mistake.
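
To make the idea concrete, here is a minimal Python sketch. The students, the three questions and every score are invented for illustration; nothing here comes from a real test. Each student has the same average and each question has the same average, so no student looks stronger overall and no question looks harder overall. What is left over after removing those averages is the interaction.

```python
# Hypothetical scores (0-10) for three students on three essay questions.
scores = {
    "Ana":  [9, 5, 7],
    "Ben":  [5, 9, 7],
    "Cara": [7, 7, 7],
}

students = list(scores)
n_items = len(scores["Ana"])

# Overall average, per-student averages and per-question averages.
grand_mean = sum(sum(row) for row in scores.values()) / (len(students) * n_items)
student_means = {s: sum(row) / n_items for s, row in scores.items()}
item_means = [sum(scores[s][i] for s in students) / len(students) for i in range(n_items)]

def interaction(student, item):
    """Score left over once the student's and the question's averages are removed."""
    return scores[student][item] - student_means[student] - item_means[item] + grand_mean

for s in students:
    print(s, [interaction(s, i) for i in range(n_items)])
```

In this made-up data, a test built only from the first question would rank Ana well above Ben, and a test built only from the second would do the reverse, even though their averages are identical.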

I won't get into the ways that our current testing system makes that problem more likely, at least not today. But clearly taking a test designed for one purpose and using it for another makes that far more likely. 

**************

Now, to get back to the particular issues of the day. First, the change in the legislative language and the question of whether "performance data" (i.e. test scores) should be used to make tenure decisions. 

Our current tests are usually, at best, designed to measure current performance levels. But even Brian acknowledged that going by performance level would not be fair, because teachers working in low performing schools should not be punished for their commitment to work with the most needy students.

So, measure improvement in scores, year to year, right? Well, the tests are usually not designed to do that, and rarely do it well. Moreover, even if they were, recent studies have shown that lower income students lose more ground over the summer than higher income students. It's easy to imagine why, as higher income students are more likely to attend richer summer programs that build on their learning in schools. This is entirely beyond the control of schools and teachers, but if we measured year over year growth, it would imply weaker teacher performance in low income schools than in higher income schools.

There are issues with using the same tests for low performing schools and Stuyvesant High School.

Should we expect students' math and reading scores to go up by the same amount in 7th grade as in 3rd grade, or are some years more critical than others?

What about the subjects for which we don't have high stakes tests? Do students perform differently on high stakes tests than low- or no-stakes tests?

These problems go on and on. Sure, there are answers to most of them, but they usually require more costly tests and analysis procedures. And some problems do not have answers.

But the real point, the biggest problem, and what I wanted to mention on Brian's show today, is that these tests are not known to be "instructionally sensitive." That is, none of them have been designed to differentiate good instruction from bad instruction. None of them have been validated for that. None of them have even been checked for that. We have no reason to believe that these tests -- or the individual items on the tests -- are capable of providing information that would allow us to identify good teaching or good teachers. Heck, we don't even yet know how to design items that are instructionally sensitive. This is what James Popham was talking about two weeks ago at the annual meeting of the American Educational Research Association. (He is a former president of the association.)

Sure, if all you have is a hammer, then everything looks like a nail. But that'll break the lightbulb and it's not going to do anything useful when you are trying to change a flat tire. And that is what we are talking about here. We do not have the tool to accomplish that, and until we do we need to back off.

If the state of New York wants to invest a couple of million dollars in such a research effort, out of the $20 billion it spends on education each year, perhaps that would be a good idea. The Department of Defense spends billions of dollars each year on the next generation of weapons and equipment, paying defense contractors to invent/develop them before using them in the field.

**********************

None of the previous section addresses the original language in the legislation and why it might be bad. But there are problems there, too, even though it doesn't call for teachers to be evaluated on student performance.

I am all for teachers using performance data to help guide their teaching. I think that teacher education programs should teach pre-service teachers how to make sense of such data. But I'm not convinced that it should yet be a factor in tenure decisions.

First, we better make damn sure that the tests are good and actually provide meaningful measures of student performance before we demand teachers make use of them day to day. The original language calls for teaching to the test. That is what it means. It means that teachers should take test data to guide their instruction so that students do better on the next test. 

At its worst, it means teaching how to take these tests, or even how to answer the kinds of problems that appear on the test, rather than focusing on the core lessons of the topic. It means narrowing application of material to how it appears on tests, rather than real world use that might not be able to appear on the test.

It means that the tests -- regardless of their quality -- will drive instruction, rather than the tests providing information about student performance, or even about instruction. It confuses the cart and the horse.

And then there's the bottom line. It is principals who are responsible for evaluating teachers. This language is about what principals should consider. Unfortunately, principals do not know how to do this stuff, either. Moreover, no one teaches principals the dangers and concerns in depending on potentially problematic tests that might be used inappropriately to draw conclusions that the tests are not capable of supporting. I do not blame principals for this; rather, I look to their preparation programs and their districts, both of which have failed to teach them about this valuable material. But if they do not really understand it, why would anyone put them in a position to evaluate it?

***********************

But I am not just a hater. I am happy to recommend resources that might help principals and others learn more about this.

DataWise is a book about using assessment data to guide instruction. It addresses the immediate problem at hand, how principals and teachers can use test results.

More importantly, Measuring Up: What Educational Testing Really Tells Us is a brand new book about tests, testing and educational measurement in general. It is written for a lay audience, without all of the complex mathematics and statistics that underlie testing.

The credibility and expertise of the authors of both books are beyond reproach. These are not one-sided political screeds by any means. 

Tuesday, April 8, 2008

Research and Policy: What conferences can teach us

A couple of weeks ago, I attended the big education research conference (i.e. the annual meeting of the American Educational Research Association). It's a week long, with hundreds or even over 1,000 panels, thousands of papers, over 13,000 presenters and over 16,000 attendees. It takes over all the conference rooms in a few large hotels in the middle of the host city, New York this year.

There are always a few panels on the topic of how research can inform policy, or why it does not. I always attend a few of these, as I only care about doing research insofar as it will actually make a difference for schools and children. 

A good friend of mine had an odd experience during her session. One of the presenters failed to show up, meaning that there were only three papers to present. They had been grouped under a common theme, but the papers were quite different. The additional time gained by only having three presenters resulted in a real conversation between them, something that there hardly ever is time for at these things. Sure, panels that bring people together for discussions might take the form of a conversation, but those based on presenting papers (i.e. research) rarely do.

You see, the timing just doesn't work out. Each session gets about 90 minutes, with perhaps 5 minutes taken up for introductions and transitions. Each presenter gets about 15 minutes, and the discussant gets the same. That leaves ten minutes for questions from the audience, at best. If presenters or the discussant go long, that comes out of the Q&A. Obviously, there just isn't time for a real discussion. Moreover, if the panelists knew there would only be three papers, they'd present for 20 minutes each. It was just the accident of the missing panelist that gave them the extra time for discussion -- and likely a discussant who was more interested in letting the panelists speak than in hearing his/her own voice. Obviously, sessions with five papers are even more crunched for time.
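
The arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes that a standard session has four papers (inferred from the ten-minute figure, not stated by the association), and the function name is made up for the example.

```python
SESSION_MINUTES = 90
OVERHEAD_MINUTES = 5  # introductions and transitions

def qa_minutes(n_papers, minutes_per_speaker=15):
    """Minutes left for audience questions, assuming one discussant per session."""
    speakers = n_papers + 1  # paper presenters plus the discussant
    return SESSION_MINUTES - OVERHEAD_MINUTES - speakers * minutes_per_speaker

for n in (3, 4, 5):
    print(n, "papers:", qa_minutes(n), "minutes left")
```

A negative result for five papers just means that fifteen minutes per speaker no longer fits, so every talk has to be cut shorter.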

My friend remarked to me on what a great experience this was, and how she wished that all sessions could be like this. She conceded that it would mean that fewer papers would be accepted, but thought that it was worth it.

In my opinion, she was missing the purpose of this conference. She was accepting it on face value, that it was to spread knowledge and learning. I think that she was just wrong in this.

The purpose of this conference, like virtually every other academic conference, is to have somewhere to present papers. It is so that we can fill up our CVs. It is so we can show that we are doing real work. Of course, other people rarely actually read the papers we write for these conferences. If we are doing it right, we actually just present drafts of articles that we hope to get published in journals. Journals that few people will read except to find references and citations to add to the papers and articles that they themselves are writing.

An exaggeration? Sure, but not much of one.

Of course, these conferences also exist for networking purposes. There are any number of social events, and everyone in the field is in town for this one. I know that every year I see people whom I both like and respect, but whom I never see otherwise. 

But no one actually thinks that people are going to read their papers. The association tells presenters to print 12 copies of their paper, and two more with large text for those with poor vision. A small minority of presenters actually do this. Most presenters tell people to email them to ask for a copy of the paper, but they seem surprised when this actually happens. Some respond that their papers are still in process, or data is still being added, or they are being revised based on the feedback they received at the conference. 

This year, I emailed nearly 90 requests for copies of papers presented at the conference. Only about half replied by sending them to me, and most of them seemed surprised at the request. Five more just emailed me copies of their slides. Half a dozen said that their papers would be ready in 2-5 weeks. And the rest, more than a third, did not even respond at all. This matches my experience in years past. (Of course, I don't ask discussants for their notes, or people on discussion panels for papers. These numbers only refer to people who were supposed to present papers.)
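
For anyone who wants to check the tally, here is a rough Python sketch. The counts are rounded from the loose figures above ("nearly 90" is taken as 90 and "about half" as 45), so they are assumptions rather than exact records.

```python
requests = 90                 # "nearly 90", rounded for illustration
sent_paper = requests // 2    # "about half"
sent_slides = 5
promised_later = 6            # "half a dozen"

# Everyone not in one of the groups above never replied.
no_reply = requests - sent_paper - sent_slides - promised_later
share = no_reply / requests

print(no_reply, "no replies, a share of", round(share, 2))
```

With these rounded inputs the silent group is 34 of 90, which is indeed a bit more than a third.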

I don't think that this means that the conference is a fraud, or that the presentations are frauds. Rather, it indicates that the purpose of the conference is not actually the sharing of papers, that it is not learning on the part of attendees. And it is not fraudulent because everyone already knows this. I only think that one of my two papers this year was really any good, but I am not worried about it because I do not think that anyone is ever going to read it.

**************

And now, back to the point. Why don't people who make policy listen to researchers?

There are standard reasons given all the time. Researchers can't answer questions fast enough for policy-makers. Researchers come up with different answers over time. Researchers often can't explain what the single salient feature or trait is, making their answers too complicated to be the basis of policy -- or even too incomplete. Policy-makers care more about the appearance of making a difference than actually solving problems. Etc., etc.

These reasons may all be true. But I don't think that any of them are the most fundamental problem. 

Rather, I think that the major reason is that most researchers just don't care. 

Their job is to do research. Or, perhaps, it is to get papers published. But their job is not to make a difference in schools or for students. It is not even to have influence in their profession. And the rest is just not of concern to them. If so many researchers don't care if anyone reads their papers, why should anyone think that they care about influencing policy? Moreover, these attitudes are learned somewhere.

I had an interesting conversation with a professor from the midwest who used the phrase, "Caught, not taught" to explain how most lessons in schools are transmitted, be they for the better (e.g. the value of hard work, the importance of treating people with respect) or worse (e.g. the relative importance of style over substance, of politics over results). It is not that anyone means to teach young researchers (i.e. doctoral students and junior faculty) that making a difference in schools does not matter, but that view is caught by them nonetheless. 

We, as a group, are rewarded for publication and have not insisted on other measures of success, nor have we put those other measures first in our own careers. We are not actually interested in sharing our work with others, and certainly not in a form that does not contribute to the particular system in which we are locked.

So, is it any wonder that policy-makers don't listen to us? I don't think that we actually are trying to say anything to them. 

Tuesday, April 1, 2008

Drobo v. ReadyNAS V: Conclusion


So, what's the bottom line? I've written about what I am looking for, the obvious differences between Drobo and ReadyNAS, differences in connectivity and use of disks, and some of the more technical advantages that favor ReadyNAS. What does it all add up to?

Speed is not a huge issue for me. Sure, I would like a faster solution. But it is not that important. If things go well, I'll never need the fastest speeds. And I'd have to upgrade other equipment to take advantage of it. More importantly, Drobo is already faster than 802.11n wireless networking, so it is fast enough for me. However, others, especially those working with large graphics or video files over Ethernet, could really make use of ReadyNAS's huge speed advantages.

Cost, on the other hand, is a concern for me. Drobo makes better use of its disks, and costs less up front, especially if you don't need DroboShare to put it on a network. That's a major advantage for Drobo, regardless of how you use it.

ReadyNAS has many features that Drobo does not. I am especially intrigued by the ease with which you can back up the data on the unit. This would be important for desktop users, as their computers and any backups on their ReadyNAS are always in the same place, and if disaster strikes all the data could well be lost. It is also important if any data is kept exclusively on the device. However, I keep all my data on my laptop, so the device itself is a backup. I am a laptop user, and I usually have my machine with me, meaning that the laptop itself acts as offsite storage, much of the time. And if there were a fire in my home, you can be damn sure that I'm not leaving without my laptop, anyway. But older backups themselves are valuable, and Drobo does not make it easy to copy them. To be entirely honest, though, as much as I like this feature, I am not disciplined enough to use it. And ReadyNAS's other advanced features really are of no use for me.

Apple supports Time Machine -- its cool backup system built into the latest version of OS X -- to devices attached directly to its routers or other Mac OS X computers, but not to other network devices. For my purposes, this virtually cinches it for Drobo, as it can be plugged into my Apple router with USB. I'd rather not try unsupported hacks to make Time Machine work with ReadyNAS.

In fact, I tend to wonder why home users would want ReadyNAS. For basic use, Drobo is cheaper and simpler. ReadyNAS's advanced features are more appropriate for small offices than for most home offices, and even most offices wouldn't know what to do with them.

And so, despite the recommendations that I have received to check out ReadyNAS, I don't think that the decision is close at all. When my current network drive gets full, I'll get a Drobo and another internal drive or two and be good to go!

Drobo v. ReadyNAS IV: ReadyNAS Just Does More

In previous posts, I've discussed what I am looking for, some obvious differences between Drobo and ReadyNAS and differences in their connectivity and use of drives. They have tended to favor Drobo. But ReadyNAS does some things that Drobo does not.

  1. As I have already mentioned, ReadyNAS is faster. It does not have to deal with a USB bottleneck (40-60MB/sec, theoretically) when doing gigabit Ethernet (100MB/sec).
  2. Drobo is not even as fast as USB 2.0. Drobo claims throughput of up to 22 MB/sec, but most people report around 15MB/sec. On the other hand, ReadyNAS can push through as much as 37.9MB/sec, and there are reports of 50+MB/sec or even 70+MB/sec. ReadyNAS is known for its speed, and it blows away Drobo in this area.
  3. While you cannot connect ReadyNAS to a computer with USB, ReadyNAS can share devices that are hooked up to its own USB ports, including flash drives, USB hard drives and even printers. Drobo and DroboShare can only share the data on the Drobo itself.
  4. ReadyNAS supports many file service protocols (i.e. SMB, AFP, NFS, HTTP, FTP). Drobo only supports SMB. 
  5. ReadyNAS devices can be set to back themselves up, and Drobo cannot. This itself can be a very big deal, as it easily allows you to create backups to take offsite.
  6. ReadyNAS comes with backup software for other computers to back up to ReadyNAS (Retrospect for Windows and for Mac OS X). Drobo does not.
  7. ReadyNAS can automatically shut itself down if the UPS it is attached to notifies it of power loss. Drobo cannot.
I'm sure that there are even more differences that favor ReadyNAS, too. 

Of course, there are more similarities, too. They both have built-in system monitoring and notification of problems, though surely ReadyNAS's is more complete. They both support remote administration, though ReadyNAS's is more complex as there is more to administer. Moreover, I believe that ReadyNAS can be administered from a web browser -- so long as it supports Java -- whereas Drobo requires a particular application be installed on the machine.

When you pay for ReadyNAS, you clearly get more. In fact, if these features matter to you, they are easily worth more than the price difference between the devices. Be it data security, speed or protocols, these differences are so great that one might even think that ReadyNAS is in a whole different class than Drobo.

Drobo v. ReadyNAS III: Connectivity and use of drives

Previous posts have addressed my needs/criteria and some of the obvious differences between ReadyNAS and Drobo. This time, I'm getting a bit more technical.

Drobo does not do Ethernet by itself. That means that it is not truly a "network attached storage" device. However, DroboShare is designed to work with Drobo to put it on a network. There are some advantages and disadvantages to this setup.
  1. If you want to connect it to your computer by USB, you have that option, unlike with ReadyNAS (RN). You can even switch it up from time to time, without impacting your data.
  2. If you know that you don't need an Ethernet connection, you don't even need DroboShare, and can save the $200. 
  3. USB 2.0 is not the fastest way to connect a drive to a computer; FireWire is. Moreover, gigabit Ethernet can be faster than USB. Drobo is stuck with that USB bottleneck on speed, because it connects to DroboShare with USB.
However, my router -- Apple AirPort Extreme BaseStation -- can share USB drives plugged into its USB port. Therefore, I don't need DroboShare. Because my computers access the network wirelessly -- being laptops -- USB is not the bottleneck. Rather, the slower speed of wireless networking is the bottleneck. The only times that I might be able to take advantage of ReadyNAS's real speed advantage (i.e. gigabit Ethernet) would be when I plugged a network cable into my laptops. And to do that, I'd have to replace my router and the hub in my office with new gigabit equipment. For me, ReadyNAS's faster connections just don't matter.

The second big technical difference is how each device handles expandability. This expandability is the coolest thing about each of them. When they get close to full, you can add more drives until all the bays are full. If you are already using all four drive bays, you can replace the smallest drive with a larger drive and get more space. Yes, they each can use drives of varying sizes simultaneously! This means that you can just buy the most cost effective drive -- by which I mean the least $/GB -- at the time. Later, when you need more space and storage prices have dropped further, you can buy another larger disk. They both grow as you grow, and each allows you to take advantage of the fact that larger drives become available every few months and the cost per GB keeps going down.

However, there are some differences in how they handle disks of different sizes.

ReadyNAS treats every drive as though it is the size of the smallest drive. This means that the extra space on the larger drives is ignored. When you replace the smallest drive with a larger drive, ReadyNAS will then use more space on every drive. For example, if it has four different drives -- 100GB, 200GB, 300GB, 400GB -- it only uses 100GB on each disk. If you replace the smallest disk, say with a 500GB disk, it then checks what the new smallest disk is (in this case, 200GB), and only uses that much space on each drive. It's a very simple approach. If you have larger disks, it doesn't use that space now, but will use it later when other disks catch up.

Drobo also ignores some space, but far less. Rather than ignoring space on all the drives, it just ignores space on the single largest drive. It ignores the extra space on the largest drive in excess of the size of the second largest drive. So, in the previous example, it ignores 100GB, because 400GB - 300GB = 100GB. Unless the three smallest drives are the same size -- and the fourth can be the same size or larger -- Drobo ignores less space than ReadyNAS, and it never ignores more space. 

Both devices essentially use one drive for redundancy, so that if anything happens to any of the drives your data is still safe. For Drobo, the space used for redundancy and the space ignored add up to the capacity of the largest drive. With ReadyNAS, it's a bit more complicated, but never less than that, and usually more.

So, here's how it works out for each device in various configurations. 

Example 1: 
Drives: 100GB, 200GB, 300GB, 400GB
           ReadyNAS Drobo
Available   300GB   600GB
Redundancy  100GB   300GB
Ignored     600GB   100GB

Example 2: Replace the smallest drive above (100GB) with a 500GB drive.
Drives: 200GB, 300GB, 400GB & 500GB
           ReadyNAS Drobo
Available   600GB   900GB 
Redundancy  200GB   400GB
Ignored     600GB   100GB

Example 3: Start with the most cost efficient drives available today, and add larger drives later.
Drives: 750GB, 1000GB, 1250GB & 1500GB
           ReadyNAS Drobo
Available  2250GB   3000GB
Redundancy  750GB   1250GB
Ignored    1500GB    250GB
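
The two rules described above can be sketched in Python. This is a simplified model of how this post describes each device, not either vendor's actual algorithm; the function names are made up for illustration, and the sketch assumes at least two drives with all sizes in GB.

```python
def readynas_capacity(drives):
    """ReadyNAS rule as described in this post: treat every drive as the smallest one."""
    smallest = min(drives)
    used = smallest * len(drives)
    return {
        "available": used - smallest,   # one drive's worth goes to redundancy
        "redundancy": smallest,
        "ignored": sum(drives) - used,  # extra space on the larger drives
    }

def drobo_capacity(drives):
    """Drobo rule as described in this post: only the largest drive's excess is ignored."""
    ordered = sorted(drives)
    largest, second = ordered[-1], ordered[-2]
    return {
        "available": sum(drives) - largest,
        "redundancy": second,           # redundancy + ignored = largest drive
        "ignored": largest - second,
    }

for drives in ([200, 300, 400, 500], [750, 1000, 1250, 1500]):
    print(drives)
    print("  ReadyNAS:", readynas_capacity(drives))
    print("  Drobo:   ", drobo_capacity(drives))
```

Trying four identical drives, or three identical drives plus one larger one, shows the case where the two devices come out even.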

Almost regardless of your configuration, Drobo uses more of the drives' capacity. This means that, in addition to costing less upfront, it will cost less over time for a given amount of available storage. Or, you'll get more storage for the same amount of money. And once you have filled all four drive bays, regardless of your configuration, if you replace your smallest drive with a larger drive you will get more space. Whereas with ReadyNAS, if your smallest drive is not the only drive that size (i.e. you have another drive the same size as your smallest drive, or even all four drives are the same size), you have to replace multiple drives to get more usable space.

Clearly, advantage Drobo. 

Drobo v. ReadyNAS II: Obvious Differences

Yes, I am talking about ReadyNAS (from Netgear) and Drobo (from Drobo). 

But which one should I get? Well, my previous post laid out what I was looking for. This leaves the question of which of these two products best fits my needs.

First, what they have in common:
  1. Both use multiple drives so that if a single drive fails, you don't lose any data.
  2. Both can be expanded more or less on the fly, simply by replacing smaller drives with larger drives.
  3. Both are fairly easy to set up.
  4. Both start under $1000.
  5. They are about the same size, each about 5" wide and 6" tall and 9" deep. (Drobo a little deeper and wider, ReadyNAS a little taller.)  
There are some important obvious differences, however.

First, ReadyNAS looks like it is geared more towards pros at home (i.e. real techies), whereas Drobo is aimed more at advanced users. ReadyNAS does more, has more features, is more powerful. Drobo is simpler. To me, ReadyNAS just looks like a Windows product, both in terms of specs and appearance. Drobo seems more like an Apple product, both in appearance and simplicity.

Second, ReadyNAS is more expensive. The basic unit, without drives, is over $900. Drobo is around $450, plus $200 for DroboShare -- which you need if you want to hook it up with Ethernet, as opposed to via USB. To be fair, though, it is only marginally more expensive to get a ReadyNAS unit with 1TB of storage in it, though this only lessens the price difference, and does not actually wipe it out.

So, on the surface, if you want more features and power, get ReadyNAS. But if you want a prettier and easier to use solution, or you want to save money, get Drobo.

But that's not the end of the story. 

Drobo v. ReadyNAS I: My Needs

DJ and I have had some major technology failures the past few months, and this has prompted me to revisit our computer storage.

What does that mean? Well, individual computers store stuff on their hard drives. File servers -- often just called "servers" -- store stuff on their hard drives, in a way that other computers can access them. While it is pretty easy to set up a way to share files between our computers, that is not the only issue. We also need somewhere to back up our data.

You see, if a computer has a problem, it can be a pain to copy data off of it. If the problem is the hard drive, the data can even be lost. Which is bad.

In my view, Apple's Time Machine (TM) -- part of the latest version of Mac OS X -- is a great backup solution. However, it's not perfect. At times, it needs quite a bit of storage. The other issue -- one I've long pointed to -- is that laptop users cannot simply plug in an external drive as easily as desktop users. The laptop might not even be in the same room as the external drive, and the laptop moves around, even when it is in the same room. Therefore, the shared storage/backup drive should be on a network. Now that Time Machine allows backing up across a network, I've been thinking about how I want to set things up.

(Yes, TM has supported backing up to another computer's external drive for a while, but we don't have -- or even want -- that desktop computer. Nor would we want it on all the time, sucking up power. Yes, TM works with Apple's Time Capsule, but that's not what I want. 1) It's largely redundant with my existing wireless router. 2) The cost of storage is too high. 3) The storage is not expandable. 4) The storage is a single source of failure.)

The list in that previous parenthetical paragraph forms the basis for what I am looking for.
  1. I want to hook up my shared/backup storage to my network.
  2. I am cost conscious about it, especially looking forward. I understand startup costs, of course. But I don't want it to be extra expensive to add storage later. And I'd rather not spend money to replace equipment I already have, unless I am really adding something new.
  3. I want the storage to be expandable. I've been using a series of external drives to hold my backups, and I am sick of that. I don't want to have to look through multiple volumes to find something. Moreover, TM backup stores cannot span volumes. So, whatever I do must support expandable volumes.
  4. I want some sort of data redundancy, so a single drive failure does not mean lost data. You see, all hard drives die. The only question is whether you will still be using them when they do. If you are lucky, no. But if you are not, well, that really sucks.
These features strike me as really advanced. When I was working in IT, there were expensive solutions to handle all of this. You see, these are not new issues at all. The shocker is that there are consumer level products that can deal with all of this. 

Greetings and salutations...


It's about time that I got a blog, right? Here I go!

I expect that posts will be about education issues, technology & politics, primarily. There might be a little on cooking and the kitchen. Perhaps a bit on living in Brooklyn. And maybe even a little bit on my favorite sports.

A blog. The ultimate procrastination tool. But at least I'll be writing - though not doing much proofreading.