Tuesday, January 29, 2008

Arbitrary vs. Random

I bring up this point to a lot of people, and a lot of people find it confusing. Are random and arbitrary synonymous or aren't they? I admit I'm not always entirely sure how to describe the difference between the two words, but since I insist people use them appropriately I should probably offer an explanation.

Let's start by defining them both. Arbitrary, according to answers.com, describes things that are "1. Determined by chance, whim, or impulse, and not by necessity, reason, or principle" or "2. Based on or subject to individual judgment or preference." Random, on the other hand, applies to things "1. Having no specific pattern, purpose, or objective" or "2. Mathematics & Statistics. Of or relating to a type of circumstance or event that is described by a probability distribution." I'll limit the discussion, at first anyway, to the two most widely used definitions.

What we can gather from these definitions and what we know about the words is that they're both used to describe a selection process. Decisions can be made arbitrarily or they can be made randomly. How, then, do they differ? It really comes down to the outcome: an arbitrary decision is one that doesn't matter either way. Repeatedly making a decision whose outcome doesn't matter can, and oftentimes does, lead to the same decision being made over and over. A good example is asking people to select a "random" number between 1 and 10, inclusive. It's been demonstrated that, given a large enough pool of participants, 7 or 3 will usually be chosen more frequently than most of the other numbers. The reason for this is that people perceive these numbers as being random: they're prime numbers, they aren't even, they aren't right in the middle, and they aren't extremes. All of that makes them very appealing as "random" choices. In reality, a random selection gives each possible option equal weight. That is the basis for a random selection as opposed to an arbitrary one. It follows that, given a large enough pool of samples, every number between 1 and 10 should, statistically, be selected about as often as any other number (give or take a bit). A human's bias in the selection process is what removes the randomness.
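To make the difference concrete, here is a minimal sketch in Python that compares an equal-weight random pick with a biased "human" pick. The weights in HUMAN_WEIGHTS are made up purely for illustration (they only encode the tendency to favor 7 and 3 described above, not measured data), and the function names are my own.

```python
import random
from collections import Counter

# Hypothetical, illustrative weights: NOT measured data. They only encode
# the tendency described above, where 7 and 3 are picked more often.
HUMAN_WEIGHTS = {1: 2, 2: 5, 3: 15, 4: 8, 5: 8, 6: 8, 7: 30, 8: 10, 9: 6, 10: 2}


def pick_random(rng):
    """Random selection: every number from 1 to 10 has equal weight."""
    return rng.randint(1, 10)


def pick_arbitrary(rng):
    """Arbitrary (human-style) selection: some numbers are favored."""
    numbers = list(HUMAN_WEIGHTS)
    weights = list(HUMAN_WEIGHTS.values())
    return rng.choices(numbers, weights=weights, k=1)[0]


def tally(picker, trials, seed=0):
    """Count how often each number comes up over many picks."""
    rng = random.Random(seed)
    return Counter(picker(rng) for _ in range(trials))


if __name__ == "__main__":
    for name, picker in (("random", pick_random), ("arbitrary", pick_arbitrary)):
        counts = tally(picker, 100_000)
        print(name, [counts[n] for n in range(1, 11)])
```

With enough trials, the random tally should come out roughly flat across all ten numbers, while the arbitrary tally shows a clear pattern around the favored numbers.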

Going back to the definitions, one might argue that an arbitrary choice can also be random because there is no "necessity, reason, or principle" behind random selection. In fact, there is. It's all in the rhetoric here, which can make it seem tricky. I would counter that argument by reminding the person that random selection necessarily gives all options equal weight. Another argument that might come up is that random numbers have "no specific pattern, purpose, or objective" and that neither do arbitrary ones, which are chosen on a whim. That argument I counter by saying that arbitrary selections do, in fact, have a pattern as I mentioned in the example above with the numbers 3 and 7. Note that there is still room for debate here (there always is), so if you can think of a good argument let me know.

So what about randomly running into a friend at the store? Or all the random people who showed up to the party? Most of the time, random isn't used to describe a selection process at all. Instead it's used in place of more appropriate words such as coincidental, assorted, or unexpected, none of which mean random. However, I have to concede that this is the nature of language: if a word or phrase goes into widespread use it can become part of the language. This usage is still considered slang or colloquial, though, and should be treated as such.

To summarize, for those who need to explain this to others: Arbitrary describes a decision-making process in which the choice simply doesn't matter or is made on a whim. Random, on the other hand, describes a decision-making process as well, but one in which every possible option is given equal weight.


Friday, January 18, 2008

Real vs. Perceived Identity

The issue of real vs. perceived identity came up in a conversation I was having today. When I talk about identity here I don't quite mean it in the normal sense of the word. The way I'm using it here makes it almost synonymous with understanding. Hopefully you identify with what I'm saying :). Getting back, I've got a definition for perceived identity - your identity to another person based on the way they perceive you - but I'm not sure I've got anything for real identity. The first idea that comes to mind is a person's view of themselves. After all, they've got a complete picture, right? I don't know if this works just because of all the personal biases involved - it would take a very humble or very objective person to give themselves a fair evaluation of character. Does that bias matter? My next inclination is to say that real identity would be another person's view if they knew everything there was to know about the person being identified - but that seems wrong too. That person is bound to have their own set of biases and, though they might not have any personal interest vested in this other person, they're sure to relate to or disagree with certain things, which will skew things a bit. Ideally, this identity would be the same regardless of whose view it was, given they had all the information they needed. Maybe real identity, then, is just an abstract idea to talk about but something we might never achieve. I can deal with that I suppose. Perceived identity is much more practical if we assume that that's the case since it's the only thing we can actually measure or use. Of course everyone will perceive things differently but at least now we've got something to work with and opinions to compare. Another interesting idea to mention is that we might be able to say whether a particular perceived identity is or is not someone's real identity. Or maybe we'll just be able to say whether it isn't but never that it is? It certainly isn't, or it might be? Hmmm.
I think, as usual, this takes us back to what exactly we mean when we say "identity." Since I think that might undermine the idea behind this entire post, I'll save that for another day. Thoughts?


Wednesday, October 31, 2007

Wisdom

Wisdom showing wit alone
is worthless to this world.
This wisdom must be fed and grown
in order to unfurl.
For Steven, Albert, Thomas
did not sit idly.
They tried and asked and then alas
became what we now see.
Intellect requires knowledge
which from experience comes.
And all who try will gain an edge
o'er the foolish and the dumb.
“Be not afraid,” parents often say
to children in their youth.
While they're young they should be made
to be (moderately) uncouth.
So try and live and always give
the best you poss'bly may,
For those who die and never live
do their own selves betray.
Hawking, Einstein, Edison
would disapprove inaction.
Their effort, brain, and a bit of fun
Has formed their tiny faction.
Once again reiterating
for wisdom's sake and yours,
All these things are intertwined
Lack one and all are lost.


Saturday, October 13, 2007

Solaris File Server (w/ ZFS)

Preface
Building a terabyte file server (very predictably) proved to be a great learning experience and an exercise in hardware buying decisions and selecting the right software to run. My goal here isn't to provide a step-by-step how-to for building a home file server, but instead to provide insight into the decision-making process involved in such a project.

The Hardware/Software Relationship
When most people are buying PC hardware, they typically don't give much thought to the type of software they'll be running on it. Usually, though, the software they buy doesn't rely much on the hardware and as a result everything works out just great regardless. I faced a pretty unique situation in deciding which hardware to purchase for this server, especially when the financial constraints of a college student were factored in. Should I buy a low-end motherboard with relatively few features and use add-in cards wherever necessary? Should I stick to older hardware so that I could use components I already own to save some money? Should I buy a RAID card, or would I be using software RAID? Or should I just make sure to get a motherboard with on-board RAID? Obviously, these are only a few of the questions I faced going into this. What I found is that clearly defining goals will save you a lot of time when it comes to deciding which hardware to buy - and so I set my goals. I wanted all-new hardware so that upgrading and repurposing the guts of the machine would be feasible in the future. I wanted relatively low power consumption wherever possible but the ability to turn on the horsepower when I really needed it. Obviously, I wanted an obscene amount of storage to keep everything on. Lastly, I wanted this thing to be as resilient and robust as possible in the face of catastrophic hardware failures (and easy/inexpensive repairs would obviously also be nice) so I chose software-based RAID to keep it independent of the hardware.

Solaris and ZFS
To be totally honest, I was already sold on ZFS before I made any of the hardware buying decisions; any geek would be, considering that Sun has put together what just might be, as they've termed it, "the last word in file systems." The ease-of-use, guaranteed data integrity due to checksums, reliability, speed, efficient use of disk space and disks, etc. all really had me sold on ZFS as a file system, and that was the main factor that ultimately drove me to make the hardware choices I did.

The Hardware
Here were the final specs:
  • An old Lian-Li case capable of holding at least 8 3.5" devices. That's no joke.
  • Old PCI video card
  • EPoX EP-MF570SLI AM2 Motherboard
  • AMD Athlon64 X2 4000+ Brisbane
  • 430 watt Thermaltake PSU
  • 2x1 GB of DDR2 800 from Transcend
  • 80GB Seagate drive
  • 6x500 GB Western Digital Caviar SE16 (WD5000AAKS)
It should be noted that all the hardware excluding the 500 GB hard drives and stuff I already owned cost me less than $300. After hard drives, the whole system came to just under $900. I settled on not buying a new enclosure because the one I already owned had no practical limit on the number of drives I could store. Besides the 8 3.5" bays, I had 2 5.25"-to-3.5" bay converters just in case. Onboard video would have been ideal, but unfortunately most high-end motherboard manufacturers don't bother with it. Since I planned on running this thing out of a closet and SSHing for administration, the less power the better. I opted for an old PCI video card I had sitting around just to satisfy the requirement. The motherboard I chose was a beast, to put it subtly. This board has PCI and PCIe (including support for SLI, which is nice though I'll obviously never use it for this purpose), dual Gb NICs, which have been one of the nicest features, 8 SATA and 2 PATA connectors, dual-channel RAM support, and a bunch of smaller conveniences that make for a nice package. The complaints? 2 of the SATA ports and 1 PATA port require drivers to work since they're tacked onto the chipset as opposed to being part of it. In Solaris, this means they might as well not exist. This board also has two fans to keep it cool, one for the chipset and one hanging out off the I/O plate that can be disabled (but that I have enabled regardless). Even though this is going into a closet, I'd be more comfortable knowing it didn't need those two fans there. The fewer the better. Lastly, onboard video, while too much to expect from a high-end motherboard, would have made this thing perfect. For the $80 I paid for this thing (open box), it can't be beat. The AMD processor I chose was dual-core (something Solaris could easily take advantage of given its heritage), cost $65, which was a perfect price point, and was rated at 65 watts. Throw in overclockability in case I'm ever in the mood, and you've got a winner.
The PSU was selected because it was a) $40 and b) very highly rated with over 1400 reviews on newegg.com. Normally I wouldn't be so cheap with a power supply, but this offer was too good to pass up. The RAM, too, I got lucky with: DDR2 800, very fast timings, and it didn't use more than the standard 1.8 V for DDR2. The 80 GB Seagate drive and optical drive were things I had lying around, and the rest of the drives were the cheapest I could find online that weren't refurbs. As a bonus, all reviews indicate they're excellent quality drives.

The Software
I've already said a little bit about Solaris and ZFS, but I haven't even begun to do them justice. I'd heard about ZFS from a friend and done my own research and it was love at first sight. Problem was, I'd never worked with any version of Solaris before and support for those who aren't Sun's customers is relatively sparse compared to certain Linux communities (Ubuntu comes immediately to mind). Also, this needed to run well and run for a long time with occasional changes and easy recovery in case of failure. I was already sure Solaris would work with the hardware since someone mentioned in a review that they'd gotten it running just fine. So while I waited for my hardware to arrive, I found a few tutorials online and worked through them on a Solaris virtual machine I'd installed on my Windows desktop. Turns out ZFS was even easier to use than everyone made it out to be, and the only thing I wasn't really confident about was configuring Samba for sharing on the network, but I didn't let that faze me. How hard could that be?

Worst-Case Scenario
It didn't happen, but what if? What if everything came crashing down? With software and hardware decided upon I began wondering what my chances of losing data would be in the event of catastrophic failure. Let's see. If everything except my hard drives went up in flames but somehow left the drives themselves untouched, I'd be just fine. Pop the drives into another machine and import the ZFS pool. Done. So failure was isolated to the drives. I let a friend convince me to run the drives in a raid-z configuration as opposed to the raid-z2 that I'd originally planned. The extra data security wasn't worth the 500 GB it was costing me, we decided. If any one of the drives failed it could be replaced no problem. Give the pool some time to resilver once the new drive has been put in place and we're ready to go again. What about multiple drive failures? Hard drives suffer two types of failures - electronic and mechanical. The platters are stored inside an almost totally airtight tomb, with the electronics exposed to the world. Say we had multiple failures of the drives' electronics because the drives got wet... I did a thought experiment and found that as long as one drive had a working circuit board, I'd be fine. I'd be out a lot of money in replacing lost hardware, but I'd be just fine. I could use that one working circuit board on each drive to clone the data to other 500 GB drives and get the array back up and running. What about multiple mechanical failures? Well, then I'd be totally screwed. The chances of this happening? I didn't bother with probability calculations, but my guess is that it's significantly less than my winning the lottery twice in a week. And I don't play the lottery.
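The raid-z vs. raid-z2 trade-off above is simple arithmetic, and a rough sketch in Python makes it explicit. This assumes the usual rule of thumb that raid-z gives up one drive's worth of space to parity and raid-z2 gives up two; it ignores file system overhead and the GB vs. GiB difference, and the function name is my own.

```python
def usable_capacity_gb(drive_count, drive_size_gb, parity_drives):
    """Approximate usable space of a single RAID-Z style group.

    Rule-of-thumb estimate only: (drives - parity) * size. It ignores
    metadata overhead and GB-vs-GiB rounding.
    """
    if not 0 < parity_drives < drive_count:
        raise ValueError("parity_drives must be between 1 and drive_count - 1")
    return (drive_count - parity_drives) * drive_size_gb


DRIVES = 6        # the six WD5000AAKS drives from the spec list
SIZE_GB = 500     # size of each drive

raidz = usable_capacity_gb(DRIVES, SIZE_GB, parity_drives=1)   # survives 1 failed drive
raidz2 = usable_capacity_gb(DRIVES, SIZE_GB, parity_drives=2)  # survives 2 failed drives

if __name__ == "__main__":
    print("raid-z usable GB: ", raidz)
    print("raid-z2 usable GB:", raidz2)
    print("space traded for the second parity drive:", raidz - raidz2)
```

The difference between the two results is the 500 GB mentioned above: the price of being able to survive a second simultaneous drive failure.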

Conclusion
Overall, the operation was a success. I did run into an annoying bug in the version of Samba that ships with Solaris 10 Update 4, but once I realized what was happening it wasn't very difficult to work around. I ended up with 2.5 TB of usable storage with all the data integrity of raid-z and ZFS, and the reliability of Solaris.
