The SA'r

November 28, 2009

Network Management – The Art of Avoiding Outages

Filed under: Uncategorized — hex45 @ 12:46 am

“That which is not monitored is not managed.” – A wise System Administrator

Failure is a part of life.  This is especially true in the world of IT.  It is not a question of if, but when.  The key to successful enterprise management is to know when things fail.  This can only be accomplished through monitoring.  The name of this art – Network Management.  Well, actually it is more than network management, but I guess that is the title it gets because of its roots, kind of like we still say we are dialing the phone.  A better name would be – Enterprise Management, which is starting to catch on, but the old IT folks won’t understand.

Monitoring the Enterprise

If a tree falls in the forest and no one hears it, did it really fall?  Well, I don’t know the answer to that, but I do know that if a system fails, someone is going to hear it.  The goal of the IT staff is to be the first one to hear it (or better yet, know that it is going to fall/fail).  Nothing is more painful than having your customer point to a fallen tree and ask you if you heard it.  You must listen to your forest (Enterprise).

Referring back to the Enterprise Architecture post, I view systems as collections of services.  Going a bit further, we will note that services are composed of components.  These components are computers, switches, storage (SAN, NAS), and a bunch of other stuff.  Therefore, systems are composed of services, and services are composed of components.  And the collection of our systems, services, and components is our Enterprise.

So, what do we monitor?  Simple, as much as we can: systems, services, and components.  To simplify this discussion, let’s look at this in terms of three levels: Systems, Services, and Components.  At the top level, Systems, we are checking functionality.  For example, if the system is a website, we could perform an HTTP GET to check the functionality of the site.  For more detailed monitoring we might craft a special HTTP request that exercises the services that make up the site.  The data returned from this HTTP request could then be analyzed to determine if the site is operating normally.  In the case of Services, well, basically we are doing the same thing, keeping in mind that services are systems.  So looking at the website again, we connect to its database and run some queries to gather status information.  For the Components we can use SNMP to get a whole variety of data.  In the case of a computer we might collect CPU data, disk information, memory usage, and more.  For switches and routers we might collect system performance (CPU and memory), port information (usage, up/down status, error counts, etc.), and routing data (updates, errors, etc.).  The more data we collect, the more likely we are to spot issues.
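As a rough sketch of the system-level check described above, the Python below performs an HTTP GET and then analyzes what came back.  The function names, the `expected_text` marker, and the idea that a healthy page contains a known string are all assumptions made for illustration; they are not taken from any particular monitoring tool.

```python
import urllib.error
import urllib.request


def evaluate_response(status, body, expected_text):
    """Decide whether a response looks healthy.

    Assumption: healthy means HTTP 200 plus a known marker string in the body.
    """
    if status != 200:
        return False, f"unexpected status {status}"
    if expected_text not in body:
        return False, "expected text missing from response"
    return True, "ok"


def check_site(url, expected_text, timeout=5.0):
    """Fetch url and evaluate it; any network failure counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_response(resp.status, body, expected_text)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return False, f"unexpected status {exc.code}"
    except OSError as exc:
        # Covers URLError, refused connections, DNS failures, and timeouts.
        return False, f"request failed: {exc}"
```

A call such as `check_site("http://www.example.com/status", "OK")` (a made-up URL) returns a healthy/unhealthy flag and a short reason, which is the sort of result a scheduler could collect every few minutes.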

Relationships –

The point is that at each level we are collecting data that will be used to determine the operational status of all of the parts that make up a system, as well as monitoring the system itself.  Why not just monitor the system?  Monitoring all of the supporting services and components allows us to quickly address the actual cause of a problem.  If we know what caused the problem, we can fix it.  “The website is down” is not enough information.  Is the actual problem the server, switch, load balancer, router, database, firewall, or the user?  Not knowing leaves a lot of things to check.  If we are monitoring all of the supporting services and components, we will know what is wrong, and our efforts to fix the problem can be focused on what is broken.

When we monitor the Enterprise in terms of its architecture, we do not really need the “system” level monitoring, because we will be monitoring all of the services that comprise the system.  Well, ok, this is not really completely true.  The point is that in most cases, if we are monitoring components and services, problems that affect systems will be identified at a lower level.  And if the relationship of services to systems is known, and maybe even incorporated into our monitoring tool(s), we will understand why a system is down based on lower level issues.  This is the goal: service/component level monitoring of the Enterprise systems.
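To make the idea of known relationships a bit more concrete, here is a small Python sketch that walks a dependency map and reports the lowest-level things that are actually broken.  The map, the names in it, and the rule that anything without a check result is treated as up are all assumptions for illustration, not a description of how Zenoss, Nagios, or any other tool does it.

```python
def root_causes(target, depends_on, is_up):
    """Return failed items under target that have no failed dependency.

    depends_on maps a name to the names it relies on.
    is_up maps a name to its latest check result (True means healthy).
    Assumption: anything without a check result is treated as up.
    """
    causes = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        deps = depends_on.get(name, [])
        for dep in deps:
            visit(dep)
        # A failed item is a root cause only if everything beneath it is fine.
        if not is_up.get(name, True) and all(is_up.get(d, True) for d in deps):
            causes.append(name)

    visit(target)
    return causes


# Hypothetical website whose network depends on a switch and a load balancer.
depends_on = {
    "website": ["web", "database", "network"],
    "web": ["os", "network"],
    "database": ["os", "network"],
    "network": ["switch", "load_balancer"],
}
is_up = {
    "website": False, "web": False, "database": False, "network": False,
    "switch": False, "load_balancer": True, "os": True,
}
print(root_causes("website", depends_on, is_up))  # ['switch']
```

If every lower-level check passes but the website check still fails, the function returns the website itself, which is why the system-level check is still worth keeping.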

Reality –

All right, here is the truth, you can never get there.  The problem you run into is Zeno’s paradox of Achilles and the tortoise.  Once you monitor half of the stuff, there is still half to go.  You can get close, but close is all you can do.  That being said, it is well worth the effort.  Most basic stuff can be addressed quickly and easily with open source tools like Zenoss and Nagios.  And as your monitoring solution matures it will become more effective.  Just keep in mind that the task is never done.  The more you monitor, the less you will miss.

So what are you waiting on? Start monitoring.

- Carl

November 11, 2009

What is Enterprise Architecture?

Filed under: Uncategorized — hex45 @ 5:20 pm

I typically try to avoid “buzz” terms like Enterprise Architecture, but sometimes you just need a term.  This is one such case.  It is either use a term or write paragraphs of explanation.  This post is intended to be those paragraphs, defining the term Enterprise Architecture.

The term Enterprise Architecture (EA) is not well defined.  A quick look at Wikipedia makes this fact more than evident.  There are several competing interpretations of EA.  But all of these interpretations have a common element – The structure of components to address the needs of a larger system.  Typically in these definitions the larger system is the business goals or processes.  However, for the IT staff, business goals are not really the system they consider.  And if they say that it is, look for their ITIL coffee cup and it will probably be full of ITIL Kool-Aid.  No, for the IT folks, Enterprise Architecture is all about services and systems.

Before I give you my definition of EA, I need to clarify a couple of things.  A service is work that is offered by a system to other systems.  And a system is a collection of services.  Clear as mud, right?  No.  How about an example?

To make this easy, I am going to use a single server running a web-server.  The service in this example is a web site.  But the web-server application cannot function without an operating system (OS) and a network connection.  In this example, the OS and network are examples of services.  The system that is composed of those services is the website.  The example can also extend to the web-browser that uses the website, which is also dependent on an OS and network.  It is interesting to note that the web-browser is dependent on the web-server as well and can be viewed as a system, but that is going just a bit too far.

So, what is EA? – Enterprise Architecture is the view of the enterprise from the perspective of services and how those services are used as building-blocks for systems.  In other words, the system-level view of the Enterprise from the perspective of the services, with a focus on how those services are structured into systems.

Back to the web site example, but with a bit more complexity.  The web site is actually running on several servers with a MySQL cluster for data storage.  The network is composed of load-balancers and multiple switches.  In this example the system is the collection of a database service, web service, and network service.  Looking at the database service, it is also a composition of services: networking, OS, etc.  So, systems are made up of services, and services are typically systems.  Simple.
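The "services are typically systems" idea is really just a recursive structure, which the Python sketch below tries to show.  The `Service` class, the `building_blocks` helper, and the exact layout of the example website are all made up for illustration; a real inventory would carry a lot more detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    """A service that may itself be built from other services."""
    name: str
    parts: List["Service"] = field(default_factory=list)


def building_blocks(service):
    """Return the names of all lowest-level parts beneath a service."""
    if not service.parts:
        return {service.name}
    names = set()
    for part in service.parts:
        names |= building_blocks(part)
    return names


# Hypothetical layout of the multi-server website from the example.
operating_system = Service("os")
network = Service("network", [Service("load-balancer"), Service("switch")])
database = Service("database", [operating_system, network])
web = Service("web", [operating_system, network])
website = Service("website", [database, web, network])

print(sorted(building_blocks(website)))  # ['load-balancer', 'os', 'switch']
```

Note that the same `network` object sits under the database, the web service, and the website itself, so the shared service is written down once and reused.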

So what is the advantage of Enterprise Architecture?

When we start viewing the Enterprise in terms of services there are some distinct advantages.  The first is that sharing of services occurs naturally.  As new systems are implemented, connecting them to existing services simply happens.  This results in better structure in our solutions and a reduction in costs due to the sharing of resources.  Another advantage is that management of systems can be focused at the service level.  Instead of managing a web site, we manage the network, databases, and OS; we manage the services that comprise the web site.  The significance of this is that when problems arise, we are not trying to fix the website; we are fixing the service that is broken.  This results in faster restoration of services.  There are also advantages in terms of reliability, repeatability, and maintainability.  Once we shift focus from the systems to the services, the management of the Enterprise improves.

The goal of Enterprise Architecture is to improve IT operations.  Fuzzy term, but a positive goal.

- Carl
