Auditing IT departments

This post is spawned from a need to write down at least a minimal set of procedures that get performed over and over throughout my IT adventures. It typically starts on day one: I’m the new guy, and the IT department is in shambles, or no one other than the old sysadmin (who was mysteriously fired) has a clue as to how the whole thing is really put together. Where do we go from here?

For reasons untold, I always seem to be put in this position: brought in to document an unknown configuration or architecture, brought in to fix and rebuild, or brought in to “go in another direction”. In the past I have stepped into IT nightmares, only to have to figure out (usually with moderate to severe resistance from the established IT team) just what they have and how it is configured (or more precisely, which strips of duct tape are really holding the place together).

This post will be a general outline of how to assess a current IT situation. It is by no means the be-all and end-all of IT auditing, and in fact there are many official, sometimes very bureaucratic, methods of IT auditing. Again, this is not auditing in terms of items like HIPAA compliance, security auditing, or network/systems pen testing. This is for the basic “management has no idea what we have” situation, or for when you are the new sysadmin and the old one left without leaving any real documentation for you to start from.

One note before we start: this post can end up sounding condescending, or akin to a witch hunt. It is not meant to be either, although it has been my experience that by the end of this exercise, procedures or people are changed or put in place.

First things first, just what the heck do we have?

This seems so simple and obvious that we have to quote Captain Zapp Brannigan:

Now, like all great plans, my strategy is so simple an idiot could have devised it.

Literally, this is plain dead simple. However, dear reader, you would probably find it shocking just how many IT departments do not have a firm grasp of (or have a flat-out misunderstanding of) what they have, physically as well as logically. Let’s begin the day by asking the current IT staff/sysadmin some straightforward questions:

  • Can we take a look at the inventory control system?
  • Tell me, how old is the oldest server we have?
  • Does the networking department have a list of all switches/routers/hubs/firewalls/IDS/virtual switches? In the same inventory control system?
  • Does the networking department have a list of all LAN/VLANs? How about SDN configurations? In the same inventory control system?
  • How old is the oldest switch we have?
  • When was the last RAID array check performed?
  • When was the last time a disk was replaced? What server was it in?
  • When was the last server hardware failure?
  • When are your maintenance contracts up for renewal?
  • If I picked a random port on the wall, how long would it take for networking to tell me exactly where it goes (which switch+port, (V)LAN, gateway, trunk, and ACLs, if applicable)?
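The wall-port question gets a lot easier to answer if the patch records live in something keyed by the label on the jack. Here is a minimal sketch in Python of what that could look like. The jack labels, switch names, VLAN numbers, and field names are all made up for illustration and do not come from any particular inventory product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortRecord:
    """Where one wall jack terminates (all example values are hypothetical)."""
    switch: str
    switch_port: str
    vlan: int
    gateway: str
    trunk: bool = False
    acls: tuple = ()


# Hypothetical patch records keyed by the label printed on the wall jack.
WALL_PORTS = {
    "B2-104-A": PortRecord("sw-closet-v24", "Gi1/0/17", 120, "10.1.120.1"),
    "B2-104-B": PortRecord("sw-closet-v24", "Gi1/0/18", 130, "10.1.130.1",
                           acls=("deny-guest-to-servers",)),
}


def trace_wall_port(label: str) -> Optional[PortRecord]:
    """Return the record for a wall jack, or None if it is undocumented."""
    return WALL_PORTS.get(label)


def describe(label: str) -> str:
    """Produce a one-line, human-readable answer for a wall jack label."""
    record = trace_wall_port(label)
    if record is None:
        return f"{label}: not documented"
    kind = "trunk" if record.trunk else "access"
    acls = ", ".join(record.acls) or "none"
    return (f"{label}: {record.switch} {record.switch_port}, "
            f"VLAN {record.vlan} ({kind}), gateway {record.gateway}, ACLs: {acls}")
```

Calling `describe("B2-104-B")` returns the switch, port, VLAN, gateway, and ACLs in one line, and any label that comes back as "not documented" is itself a useful audit finding.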

While it may seem odd to ask questions like “When was the last time a disk was replaced?” or “When was the last server hardware failure?”, the answers can give deep insight into how well your IT department keeps its records. Departments that have physical infrastructure and are “on top of things” can easily pull up a sheet and say “18 days ago power supply 1 failed on server AA14-DC2”. Of course, we are assuming they have some kind of inventory control in place for this. You may get responses like “we just have a notebook with server names in it” (hey, that is a good start), or “we have nothing, we just replace machines as they break” (hrmm).

The real point is, we have to start somewhere, and that somewhere is: we need a list of everything you have. You should have this list already, but if you do not, we need to build one. It is imperative that the information in it is factual and accurate. It should include the following items:

    • Servers. For each server list:
      1. Server name (and any aliases)
      2. Current IP address
      3. Function (e.g. HTTPS server, hypervisor, cluster member, etc)
      4. Precise Location (e.g. rack A-9, unit 27)
      5. Make
      6. Model
      7. CPU (type, speed, number of cores, cache)
      8. Memory amount (total, plus number and size of individual sticks)
      9. Disk (size and number)
      10. Type of disk (SATA, SAS, NL-SAS, SSD, etc)
      11. Network (1/10 gigabit copper, 10/40/100 gigabit fiber, etc), and number of ports
      12. Any storage network adapters (Fiber Channel, iSCSI, FCoE, etc)
      13. Unit redundancies (power, disk, I/O, etc)
      14. Power requirements
      15. Cooling requirements
      16. Space requirements (e.g. 4RU, full 36″ depth)
      17. Purchase date
      18. Warranty expiration/renewal date
      19. Purchase price
      20. Annual maintenance cost
      21. Annual maintenance renewal date
      22. Manufacturer EOL date, if known
    • Spares/Graveyard. A lot of organizations have a spares or graveyard closet/rack. That’s OK, but what is in it?

      1. Servers
      2. Switches
      3. Parts (host bus adapters, memory, CPU, etc)
      4. Cables
      5. Software
      6. Test equipment

    • Networking (Physical)
      1. Switches
      2. Routers
      3. Hubs (if any)
      4. Firewalls
      5. IDS
      6. PBX/Asterisk/Telephony
      7. Phones (if in charge of them)
      8. WAP, WiFi Routers
      9. Storage Networks (Fiber Channel, iSCSI, FCoE, etc)
      10. Media converters
      11. Location of demarcation line(s)
      12. Location of all switch closets
      13. Cable and cable plants. (e.g. what is CAT{3, 5, 6, 6A}, OM4-MMF, etc)
      14. Test equipment

    • Networking (Topology)
      1. Overall Architecture (e.g. traditional access, distribution, core or two-tier collapsed core, etc)
      2. Maps, diagrams, drawings
      3. Physical Topology of each layer
      4. LAN overview. What classes are you using? Where?
      5. VLAN overview. Are you using them? Dump your VLAN database
      6. SDN. Are you using SDN? What is the current configuration?
      7. If the storage network is not unified with the Ethernet network, list it (Fiber Channel, iSCSI, FCoE, etc)
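If you have no inventory system at all, even a flat file is better than a notebook. The following is a minimal sketch, assuming Python and a CSV file as the storage format, of a record that holds a subset of the per-server fields listed above. The class name, field names, and example values are my own invention for illustration. Extend it with the remaining fields as needed.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class ServerRecord:
    """A subset of the per-server fields from the list (names are illustrative)."""
    name: str
    ip_address: str
    function: str
    location: str
    make: str
    model: str
    purchase_date: date
    warranty_expires: date
    annual_maintenance_cost: float
    eol_date: Optional[date] = None  # manufacturer EOL, if known


def to_csv(records) -> str:
    """Serialize records to CSV text so the list can be opened in any spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ServerRecord)])
    writer.writeheader()
    for record in records:
        # Dates are written in ISO form (YYYY-MM-DD); unknown values become empty cells.
        writer.writerow(asdict(record))
    return buffer.getvalue()


if __name__ == "__main__":
    example = ServerRecord(
        name="AA14-DC2", ip_address="10.0.0.14", function="hypervisor",
        location="rack A-9, unit 27", make="ExampleCo", model="X100",
        purchase_date=date(2019, 3, 1), warranty_expires=date(2024, 3, 1),
        annual_maintenance_cost=1200.0,
    )
    print(to_csv([example]))
```

The point of starting with something this plain is that it can be imported into whatever inventory control system the organization eventually settles on.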

    For each item under networking (physical), you should have the following entries:

    1. Make
    2. Model
    3. Number of ports
    4. Speed of each port
    5. Precise location (e.g. closet V-24 or Ceiling tile VC-104)
    6. Current IP address
    7. Current system name (and any aliases)
    8. What device is on each port
    9. MAC of all static devices on each port
    10. ACLs
    11. Systems allowed to do SNMP or monitoring of this device
    12. Copy of current configuration
    13. Copy of current running software/OS
    14. Unit redundancies (power, disk, I/O, etc)
    15. Power requirements
    16. Cooling requirements
    17. Space requirements (e.g. 4RU, full 36″ depth)
    18. Purchase Date
    19. Firmware/Software version
    20. Update schedule
    21. Manufacturer EOL if known
    22. Replacement schedule
    23. Annual maintenance cost
    24. Annual maintenance renewal date
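Once renewal and EOL dates are recorded, the obvious next step is to ask which ones are coming up. Here is a minimal sketch in Python, assuming each device carries a maintenance renewal date and an optional EOL date. The `Device` type, the 90-day default window, and the wording of the reasons are hypothetical choices, not anything prescribed.

```python
from datetime import date, timedelta
from typing import NamedTuple, Optional


class Device(NamedTuple):
    """Just the date fields needed for this check (hypothetical structure)."""
    name: str
    maintenance_renewal: date
    eol: Optional[date] = None


def needs_attention(devices, today, window_days=90):
    """Yield (device name, reason) for dates that are past or fall inside the window."""
    cutoff = today + timedelta(days=window_days)
    for device in devices:
        if device.maintenance_renewal <= cutoff:
            yield device.name, "maintenance renewal due"
        if device.eol is not None and device.eol <= cutoff:
            yield device.name, "manufacturer EOL reached or near"
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check repeatable, which is handy when you want to ask “what would this report have looked like last quarter?”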

    The following should in some way drop out of the research that you have done above. You will still need to get more information, but this is so that the security information is in one location. So, put this in a new spreadsheet/database/etc…

    We now want to know what kinds of ACLs, permissions, etc surround the equipment and networks that we just listed.

    • Logical Security – Authentication
      1. How is user/device authentication performed?
      2. Where is the authentication server?
      3. Who has access to the authentication server?
      4. Who has access to modify authentication?
      5. What is the process for modifying authentication (technical and managerial)?
      6. How are accounts which are no longer valid handled? (e.g. user leaves/terminated)
      7. How are new accounts added?
      8. What is the process by which we validate changes to authentication?
    • Logical Security – Servers/Clients
      1. On servers, what are the firewall rulesets?
      2. On clients, what are the firewall rulesets?
      3. Is there a single global firewall that goes on servers or clients by default? What is it?
      4. How do servers authenticate users?
      5. How do clients authenticate users?
      6. How do applications authenticate users?
      7. Do servers or clients run any IDS?
      8. What actions are taken in the event of intrusion detection?
      9. What actions are taken in the event of repeated unauthorized access attempts?
      10. Are security audits performed on clients and servers?
      11. How often are security audits performed on clients and servers?
      12. What is done with the results of those audits?
      13. How is logging performed on servers and clients?
      14. What actions are performed, if any, on logs from servers and clients?
    • Logical Security – Networking
      1. LANs, what are your ACLs between LANs?
      2. VLANs, what are your ACLs between VLANs?
      3. What servers/hosts have access to the management interfaces of switches? Routers? Hubs? Firewalls? IDS?
      4. Are you allowing SNMP? Which servers/hosts can perform SNMP on which equipment?
      5. How is user authentication to networking equipment handled on the equipment?
      6. How is user authentication to networking equipment handled from a management standpoint?
      7. What is the password change policy on networking equipment? (frequency, constraints, creation, etc)
      8. What is the network change policy (e.g. change management)?
      9. Is logging necessary or being performed? To where? With what level of access?
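For the “which hosts can perform SNMP on which equipment” question, it helps to write the answer down in a form you can check against. The following is a minimal sketch using Python’s standard `ipaddress` module. The device names and address ranges are hypothetical, and the allowlist is assumed to have been transcribed by hand from the device configurations you collected earlier.

```python
import ipaddress

# Hypothetical allowlist: device name -> networks permitted to poll it via SNMP.
SNMP_ALLOWED = {
    "core-sw-1": ["10.0.50.0/28"],
    "fw-edge-1": ["10.0.50.10/32"],
}


def may_poll(host: str, device: str) -> bool:
    """Return True if the host address falls inside any network allowed for the device."""
    address = ipaddress.ip_address(host)
    networks = SNMP_ALLOWED.get(device, [])  # unknown devices allow nobody
    return any(address in ipaddress.ip_network(net) for net in networks)


def unexpected_pollers(device: str, observed_hosts):
    """Return the observed hosts that the documented allowlist does not cover."""
    return [host for host in observed_hosts if not may_poll(host, device)]
```

Feeding `unexpected_pollers` a list of source addresses seen in a device’s logs is one quick way to find out whether the documented policy and reality agree.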

    Next in the list should be the client/employee machines, labs, kiosks, and signage. Wrapping that up should be classrooms or instructional rooms, auditoriums, etc (e.g. if IT does A/V, then include the A/V equipment).

    Monitoring and Inventory Control

    Now is a good time to stop and assess your pile of information. You should be getting just to the point where you realize you have a large pile of information, and it is about to get out of hand. Simultaneously, you are now in a much better position to get a feel for what the IT infrastructure here looks like, and how the organization operates. This is useful, because now is a good time to start thinking about what kind of inventory control and monitoring system would fit in well with this organization.

    While this is a subject that can span papers or even books, this post will be very brief and only point out a few items of note:

    The inventory control system does not have to be the monitoring system, but sometimes it helps. For example, say server AB46 is down. It is great if your monitoring system also knows what AB46 is made up of, when its maintenance contract runs out, what its service code or serial number is, and so on, as this can reduce the mean time to repair.

    Why are we talking about monitoring systems here? Mainly because they can help us map out the infrastructure, and additionally aid in creating and maintaining SLAs with our customers and our groups (which are your customers, right?). Eventually you want a monitoring system for your entire IT infrastructure. You should not be without one, even if you are in the cloud. (“Hey, we moved to AWS, we don’t need to monitor, we can just have instances re-instantiate themselves if they fail!” But even then, how do you know they are failing over and over if you are not “looking”?)
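To make the “failing over and over” point concrete, here is a minimal sketch in Python of the kind of check a monitoring system does for you. It assumes you can get a list of (instance name, restart timestamp) pairs from somewhere. Where those events come from, the one-hour window, and the threshold of three restarts are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def flapping_instances(restart_events, now, window=timedelta(hours=1), threshold=3):
    """Return {instance: restart count} for instances restarted at least
    `threshold` times within `window` before `now`."""
    recent = Counter(
        instance
        for instance, timestamp in restart_events
        if timedelta(0) <= now - timestamp <= window
    )
    return {name: count for name, count in recent.items() if count >= threshold}
```

An instance that quietly restarts itself three times an hour is “up” from the customer’s point of view right until it is not, which is exactly the case auto-recovery hides if nobody is counting.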

    There are many monitoring systems out there and many “asset control” and “inventory management” systems out there. My word of caution would be to ensure you are using the correct tool for the job. Many asset control and inventory management systems are very large and all encompassing. Some of them can cause more work than the problem they solve.

    Physical Infrastructure (physical plant)

    Now we should have a slightly better handle on the equipment we have. Note that we left out the actual physical infrastructure (the physical plant), although the location of each item should already be noted. Depending on your situation, the physical infrastructure could be handled by someone else completely. Even in that case, we would like to know our contracts and SLAs with them. If your physical infrastructure is being handled by another division at your organization and you do not have an SLA with that division, perhaps it is time to start having those conversations. Your physical infrastructure list should include at least the following:

    • Physical Infrastructure
      • Rooms
        1. Size
        2. Location
        3. Room number/designation
        4. Cooling (BTU)
        5. Power available
      • Racks
        1. Make
        2. Size (rack units and depth)
        3. Location
        4. Key number
        5. Room number/Designation
        6. Power available (Voltage and current, # of outlets)
        7. Cooling (BTU)
        8. Users who have access
      • Power Panels
        1. Panel Location
        2. Panel capacity
        3. Panel circuits
        4. Redundancy?
      • Battery Backup
        1. Make
        2. Model
        3. Unit locations
        4. Size
        5. Redundancy?
        6. Maintenance Agreements (cost, renewal date)
        7. Last battery change/next battery change
        8. Scheduled downtimes
      • HVAC
        1. Make
        2. Model
        3. Unit locations
        4. Maintenance Agreements (cost, renewal date)
        5. Maintenance schedule (filters, belts, etc)
        6. Scheduled downtimes
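With rack power and server power requirements both written down, you can sanity check whether a rack is oversubscribed and how much heat it produces. The following is a minimal sketch in Python. It assumes a single feed per rack, treats watts and volt-amps as equal (power factor of 1), and uses an 80% derating for continuous load, which is a common rule of thumb rather than something from this post. The conversion of roughly 3.412 BTU/hr per watt is standard.

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # 1 watt is roughly 3.412 BTU/hr


def rack_budget(voltage, amps, server_watts, derate=0.8):
    """Compare a rack's usable power with the summed draw of its servers.

    voltage, amps: the rack feed as recorded in the inventory
    server_watts:  iterable of per-server power requirements in watts
    derate:        fraction of the feed considered usable (assumed 0.8)
    """
    capacity_watts = voltage * amps * derate
    load_watts = sum(server_watts)
    return {
        "capacity_watts": capacity_watts,
        "load_watts": load_watts,
        "headroom_watts": capacity_watts - load_watts,
        "heat_btu_per_hour": load_watts * WATTS_TO_BTU_PER_HOUR,
    }
```

A negative `headroom_watts` means the rack is oversubscribed on paper, and summing `heat_btu_per_hour` across every rack in a room gives a number to compare against the room’s cooling (BTU) entry.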

      Map it out

      At this point, we should have a better picture of what we have physically, where it is located, and what it is connected to (power, network, etc). However, there is an item missing which some would say is equally or more important: how is all of this stuff logically connected? You should currently have a list of server names and the services on each server, but what we need is a map (probably more than one) so we can get a handle on what is doing what, and how it is all connected.

      Side note: some may argue that this is not really necessary. In fact, there are a number of organizations I have visited where this map/drawing/diagram does not exist. These organizations were very difficult to work in, as most employees in IT kept asking each other where services were or what cable went where. While a good deal of this is made less relevant by VMs, and more so by moving your services to “the cloud”, it is still a very useful item to have for everything else. (I would argue it is useful for everything. If service B is in AWS, note that on your diagram.)

      You are of course free to choose how to do this, but two maps (minimally, sometimes maximally) seems most useful. Those maps are:

      • The network map
      • The server+service map

      Let’s look at the network map first. If you have gone through the above exercise of gathering information, you are mostly there. Since humans are visual creatures, maps of the network and of the servers and services will help us quickly understand what we have. The actual act of building these maps will greatly enhance our understanding of the infrastructure.

      In my experience, it is easier to start by listing out all of the equipment on the map, then go back and fill in the cables that connect the equipment together. Sometimes this can be quite complex with large hierarchical networks, but the map is worth the effort. If you know ahead of time what the network looks like (3-tier, flat, etc) it is easier to place objects on your diagram. Once you have placed all of the objects on the diagram (switches, hubs, routers, firewalls, etc) you can start to connect them.

      Suggestions: Use a tool that has many format options (for output files) and is easy to share. This may seem trivial at first (“doesn’t everyone use (or own) Visio?” you might ask yourself), but ask around the organization first. Sometimes you can be asked to draw up all diagrams in Visio only to find out the organization has purchased exactly one license (for this exercise!). Also, it is strongly suggested that you color code your cables on the map by type (serial, copper CATn, MM fiber, SM fiber, etc). Be sure to include a legend, since there is no accepted standard for the color of a CATn cable on a map.
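If you would rather generate the map from your inventory than draw it by hand, one option is to emit Graphviz DOT text, which is plain text and easy to share. Here is a minimal sketch in Python that follows the same order described above: place every device first, then add the cables. The device names, cable type keys, and color choices are hypothetical, and the colors are just one possible legend.

```python
# Hypothetical legend: cable type -> color used on the map.
CABLE_COLORS = {
    "cat6": "blue",
    "mm-fiber": "orange",
    "sm-fiber": "yellow",
    "serial": "gray",
}


def build_dot(devices, links):
    """Return Graphviz DOT text for an undirected network map.

    devices: iterable of device names
    links:   iterable of (device_a, device_b, cable_type) tuples;
             an unknown cable_type raises KeyError so the legend stays complete
    """
    lines = ["graph network {"]
    for device in sorted(devices):
        lines.append(f'    "{device}";')  # place every device before any cables
    for a, b, cable in links:
        color = CABLE_COLORS[cable]
        lines.append(f'    "{a}" -- "{b}" [color={color}, label="{cable}"];')
    lines.append("}")
    return "\n".join(lines)


def unconnected(devices, links):
    """Return devices that appear in the inventory but have no recorded cable."""
    linked = {end for a, b, _ in links for end in (a, b)}
    return sorted(set(devices) - linked)
```

The `unconnected` helper is the useful side effect: any switch or hub that shows up in the inventory with no cables recorded is either in the graveyard or a gap in your documentation.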

      Where are we now?

      At this point, you should have the following items:

      • List of all servers and their components
      • List of all services, and where they are housed
      • List of all networking equipment
      • List of where all networking connections go
      • List of all networks (wired, wireless, virtual)
      • AAA (Authentication, Authorization, Accounting)
      • Physical infrastructure
      • Building infrastructure

      What is missing from this list? Backups.

      Unless what the organization does is ephemeral and your developers keep it all in Git and deploy from there, or you build it all out on the fly in AWS using scripts (Ansible, or AWS native scripts), which is happening more and more these days, you will need backups of your “things”.

      Let us ask the current administration some questions to see where we are:

      • How often are backups taken?
      • What is being backed up?
      • Where are the backups stored?
      • How often is a random backup restored and verified?
      • How is bit rot handled?
      • How are backup rotations handled?
      • Are there long term backup/storage needs? How are those handled?
      • Who is authorized to delete a backup?
      • How are backups deleted?
      • How are backups secured?
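The “restored and verified” and “bit rot” questions both come down to comparing what you get back against what you put in. The following is a minimal sketch in Python, assuming a manifest of SHA-256 checksums was recorded at backup time. The manifest format (relative path mapped to hex digest) and the function names are my own assumptions, not the behavior of any specific backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(restored_dir, manifest):
    """Check restored files against checksums recorded at backup time.

    manifest: dict of relative path -> expected SHA-256 hex digest
    Returns a list of (relative path, problem); an empty list means all good.
    """
    problems = []
    for relative, expected in sorted(manifest.items()):
        candidate = Path(restored_dir) / relative
        if not candidate.is_file():
            problems.append((relative, "missing"))
        elif sha256_of(candidate) != expected:
            problems.append((relative, "checksum mismatch"))
    return problems
```

Running something like this against a randomly chosen restore on a schedule turns “we think the backups work” into a dated record that they did.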

      From this list we can see where this is going. We need to answer these kinds of questions, and also ask them of ourselves when we implement a backup strategy (not to mention the ultimate question of “what should be backed up?”). Not to beat a dead horse, but with the recent advent of containerization and cloud services, much of what would normally be backed up no longer needs to be, and the configuration of those services is the item that should be backed up. For example, who cares about backing up AWS services “X, Y, Z” when an Ansible script builds them out and a deployment service deploys the software from a backed-up repository to those services? Obviously, only the build-out scripts and deployment scripts need to be backed up, as well as the code repository (which is often already being backed up).

      On a personal note here, however, more and more, software is relying on other software that is pulled in from outside repositories, some of which the software authors do not own. It may be prudent for developers (and administrators) to ensure that those repositories are backed up, and perhaps a local copy (local to the software developer) is kept.

      Now that we have this, there are really only two items to take care of (albeit, they are huge, sometimes multi-year projects):

      1. Organize this information and store it
      2. Analyze this information for:
        • Upgrades
        • Maintenance
        • Rebuilds
        • Refactoring
        • Re-architecting
        • Decommissioning
        • New installs

      Those two items are for additional posts. This post was not meant to be an all-inclusive post on auditing (from a technical standpoint). However, I hope that it gives you ideas and a basic framework of what data you might want to collect when auditing an IT infrastructure.

