Some useful info on implementing dbachecks.

This post will detail the SQL database/infrastructure tests we have configured on our data warehouse environment using dbachecks, the open source PowerShell module.

I will just say I think dbachecks is awesome.  I’m pretty grounded and not known to hype things up, but if anybody is questioning whether they should be using the module, I would strongly encourage them to do so.  There is a bit of a learning curve if you aren’t used to PowerShell, but it’s totally worth the effort: it makes it possible to be so much more proactive about checking the basic things that can go wrong.

Anyway, some background to start with.  I am a database administrator and developer, splitting my time roughly 50/50 between the two disciplines.  Our data warehouse environment is on-premises SQL Server 2016, and we do a full ETL load every night which takes around 4 hours to process.  This includes restoring copies of transactional databases onto the DW server, running staging ETL, DW ETL, processing SSAS cubes etc., so it’s quite an intensive workload, particularly in terms of IO and CPU.

What made us decide to implement dbachecks?  We had been considering it for several months, but various high-profile, short-turnaround projects meant that lately we hadn’t dedicated much time to this kind of thing.  About a month ago we had an ETL failure which dbachecks would’ve given us ample warning of, had we already been using it.  As part of our staging ETL we have a couple of IDENTITY columns which we seed from 500 million to create uniqueness from a legacy system which we are still required to derive history and report from.  One of these IDENTITY columns hit its maximum value, which caused the ETL to fail partway through.  This wasn’t a problem for the source column within the transactional system, which is seeded from 1 and isn’t likely to reach the maximum for a few years.

Moving on to the exciting stuff, here are some details about the checks we are currently running on a daily basis:

The Ola Hallengren maintenance tests: we are using all of them apart from log backups, because all databases on our DW server are in simple recovery mode.  We do take backups of some databases but not all (because some are restored from backups taken from transactional systems).  So we’ve had to customise our implementation so as not to receive failure notifications about database backups which we know we aren’t taking.  I will add some detail about how we did this in part 2.

We are only running two of the Agent checks at the moment (AgentServiceAccount and ValidJobOwner) but we will add others later, maybe as infrequent checks.  We probably wouldn’t configure FailedJob because some of our jobs routinely fail; we are notified separately and we don’t want these failures appearing on any reports.  When we implement on transactional systems it is likely that we will configure FailedJob, because we expect all jobs to succeed on OLTP systems.

Here is a list of the Database checks, along with a description of why we chose to implement each; a sample invocation follows the list:

DatabaseStatus: We have around 30 databases on our DW environment and we want to know if any of them are not accessible or are having some kind of problem.
IdentityUsage: As previously described, this is the test which would’ve avoided the ETL failure caused by an IDENTITY column hitting the maximum for its data type.
LastDiffBackup: We take nightly diff backups of some (but not all) of our databases on DW, and for those we do, we want to know that the last diff backup was within the last 24 hours (except weekends, when we do full backups).
LastFullBackup: For databases which are being backed up, full backups are taken every weekend.  We want to make sure that the full backups are never older than 1 week.
SuspectPage: We do not want any suspect pages in any of our databases.
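
If it helps, here is how a one-off run of just these checks might look (the instance name is the same placeholder used later in this post):

# Run only the database checks described above against one instance
Invoke-DbcCheck -SqlInstance SQL01 -Check DatabaseStatus, IdentityUsage, LastDiffBackup, LastFullBackup, SuspectPage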

Here is a selection of the Instance checks we are running.  There are so many that I haven’t had time to list and comment on them all, but I may edit this post to provide the full list when there is time:

ADUser: We want to check that our AD user accounts are mapped to users which exist and aren’t locked, etc.  This may give a useful indication of whether domain service accounts are locked out (which happens occasionally).
BackupPathAccess: As I found out last week, you never know when somebody will unshare or remove permissions from a folder dedicated to storing backups, so this is a handy check to perform.
NetworkLatency: We don’t want query or ETL load performance to be constrained by unacceptably high network latency.
OLEAutomation: We want to make sure that OLE Automation is disabled for security reasons; it is not required for any of the databases on the DW server.
OrphanedFile: Implemented to keep things tidy, and to avoid having large orphaned database files taking up space unnecessarily.
ErrorLog: Handy to know if we have any severity level 17-24 errors which need investigation.
LinkedServerConnection: To keep our long list of linked server connections tidy, as we aren’t always informed when another server has been renamed or decommissioned.

Finally here are some of the Server checks we have configured:

DiskCapacity: Our data centre team are already monitoring disk capacity, but we want some visibility for ourselves, to track capacity over time and compare it with the growth of individual databases (which we are capturing separately).
InstanceConnection: We want to know that the superficial health of the instance is okay and that the server is responding to ping requests.

This was just an overview of what we decided to test with dbachecks on a daily basis.  We are currently looking at which checks we may want to run less frequently (maybe once or twice per month), and again I may come back and edit this post at some point.

Part 2 will focus on how we implemented dbachecks on our DW server, including some things which tripped us up and that we need to remember for implementations on other environments.

Hope this is helpful; if further information is required please feel free to contact me.

Following on from my previous post about what we decided to test, here is an overview of how we implemented dbachecks on our data warehouse environment…

Step 1 – Install dbachecks and run a few tests

You only need to install dbachecks (and its dependencies) on the machine you will be running the tests from.  So if you will only ever run one-off tests directly from your desktop machine, then you only need to install it on your desktop, not the server.  If, however, you want to schedule your tests and have the data piped out to a database for later analysis/reporting/alerting, then you will probably want to install dbachecks on the server.

Chrissy LeMaire wrote a post on the dbatools website which goes into detail about methods of installation.  Our corporate firewall won’t allow us to use Install-Module, so we had to opt for an offline install, which involved downloading the required modules from GitHub and importing them from there.  If you are in the same situation then you will need to download four separate modules from GitHub; in the order they need to be imported, these are:

psframework
dbatools
Pester
dbachecks

We chose to download only from the master branch for each of the modules.  Download the zip file, then right-click the downloaded file and choose Properties > Unblock.  Now extract the contents of the zip files to their own folders in C:\Program Files\WindowsPowerShell\Modules.  Make sure to rename the folders to remove the name of the branch (i.e. the “-master” suffix).
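
If you prefer to script those manual steps, this minimal sketch does the same thing; the download location is an assumption, so adjust the paths to suit:

# Unblock, extract and rename each downloaded module zip (run elevated)
$modules = 'psframework', 'dbatools', 'Pester', 'dbachecks'
$dest = 'C:\Program Files\WindowsPowerShell\Modules'
foreach ($m in $modules) {
    $zip = "C:\Downloads\$m-master.zip"               # assumed download location
    Unblock-File -Path $zip                           # same as Properties > Unblock
    Expand-Archive -Path $zip -DestinationPath $dest
    Rename-Item -Path "$dest\$m-master" -NewName $m   # strip the "-master" suffix
}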

Now, for each of the folders you have downloaded and extracted, you will need to import the modules one by one.  To do so, open a PowerShell window and type:

Import-Module -Name PSFramework

Due to dependencies I had to import the modules in the order listed above.  Also, and this was the first learning point for me, the .psm1 module file needs to be in the root of each folder before you can import it, but for psframework it isn’t.  At the time of writing you need to use the “PSFramework” subfolder which contains the .psm1 file, rather than the “psframework” folder which contains “build”, “library” etc.  This sounds like a really silly and basic thing to get stuck on, and the error message is pretty self-explanatory, so maybe it was my fault for not reading it.
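
For the record, the full import sequence ended up looking something like this; the psframework path reflects the subfolder quirk described above and may have changed since:

# Import in dependency order; note the extra PSFramework subfolder
Import-Module -Name 'C:\Program Files\WindowsPowerShell\Modules\psframework\PSFramework'
Import-Module -Name dbatools
Import-Module -Name Pester
Import-Module -Name dbachecks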

You should now have a working installation of dbatools and dbachecks, which you can verify by running a few ad-hoc tests.  Here is the first check I ran, to make sure we didn’t have any more IDENTITY columns about to max out on us (which luckily we didn’t):

Invoke-DbcCheck -SqlInstance SQL01 -Check IdentityUsage

Step 2 – Decide which tests to run

We looked at each of the checks and decided which ones we wanted to run for our server, how frequently, and with what configurations.  To start, we ran the following, which gives a nice output of all the available checks:

Get-DbcCheck | Out-GridView

We copied and pasted these results into a spreadsheet to make it easier to build our list of requirements.
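
If you would rather not copy and paste, the same list can be written straight to a CSV for the spreadsheet (the output path is just an example):

# Export the list of available checks to open in Excel
Get-DbcCheck | Export-Csv -Path C:\temp\dbachecks-checks.csv -NoTypeInformation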

Step 3 – Configure your tests

I won’t go into detail about how to set configurations for each check per environment, because Rob Sewell has done a thorough job of that here.  But you need to figure out which configs are applicable to each check and then decide what the values for each config should be.  I read another blog post about an easy way to do this but I can’t find it now =(
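
As a rough illustration, configurations are read with Get-DbcConfig and written with Set-DbcConfig.  The config names and values below are examples along the lines of our backup settings; treat them as assumptions and confirm the names against Get-DbcConfig on your version:

# See the backup-related configs and their current values
Get-DbcConfig | Where-Object Name -like 'policy.backup*'
# Example: allow full backups up to 7 days old and diffs up to 26 hours
Set-DbcConfig -Name policy.backup.fullmaxdays -Value 7
Set-DbcConfig -Name policy.backup.diffmaxhours -Value 26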

The second thing I stumbled over was due to my excitement/haste to get everything set up.  Don’t just log in to the server using your admin account and apply the configurations, because they will only apply to the login you used to apply them.  If you set up an agent job (which should hopefully be running under a different, dedicated service account), or somebody else logs in to run checks, then the checks won’t be run using the configurations you applied.  You will need to either log in to the server using the agent service account and apply the configs, or read about SQL Authentication in Rob’s post via the above link.
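
One way to avoid re-applying everything by hand, assuming your version of dbachecks includes the Export-DbcConfig and Import-DbcConfig cmdlets, is to save the finished configuration to a file once and import it under whichever account runs the checks:

# Under your admin account: save the configuration once
Export-DbcConfig -Path C:\dbachecks\config.json
# Under the agent service account (or at the top of the scheduled script)
Import-DbcConfig -Path C:\dbachecks\config.json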

Step 4 – Schedule the tests

When we were happy with the configurations, we wanted to schedule the tests to run regularly.  There is a really great post here about different ways of scheduling; the short version is to configure the step as a CmdExec step (not PowerShell) within an agent job.
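
The command line for the CmdExec step was along these lines (the exact switches here are an assumption on my part, so see the scheduling post for the specifics):

powershell.exe -ExecutionPolicy Bypass -File "C:\dbachecks\dbachecks.ps1"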


“C:\dbachecks\dbachecks.ps1” contains the Invoke-DbcCheck command, and C:\ is the local C: drive of the server, in case that isn’t obvious.
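
At its simplest, the script needs little more than the following minimal sketch; our real version also takes care of the output, which the next step covers:

# C:\dbachecks\dbachecks.ps1 (minimal sketch)
Import-Module dbachecks
Invoke-DbcCheck -SqlInstance SQL01 -Check Database, Instance, Server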

We’ve got a bunch of tests running every day and there are some others we are thinking about running on a less frequent basis under a different job.

Step 5 – Output to table and alerts

Rob wrote another post here about outputting to a table for later analysis.  We took what Rob did and tweaked it slightly, because we had to do something a bit different due to the way our backups are configured.

The databases on our DW server have different backup regimes.  Some are not backed up at all (because they are restored every night from backups of transactional systems before being run through ETL), and some just have a full backup taken every day to make it easier to share the database with authorised partners.  In some cases we take a full backup weekly on a Sunday, then a differential every day except Sunday.  As you can guess, we were getting quite a few failures which, for the DW server, we fully expect and don’t really care about.  There may be a way of handling this purely in PowerShell, but we found it easier to split the checks into different job steps so they can be run against specific databases.  Our job steps currently look roughly like the sketch below (you can probably guess what goes into the first few):
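
The -ExcludeCheck and -Database parameters and the database names here are illustrative assumptions, so check what your version of Invoke-DbcCheck supports:

# Step 1: all database checks except the backup ones, every database
Invoke-DbcCheck -SqlInstance SQL01 -Check Database -ExcludeCheck LastFullBackup, LastDiffBackup
# Step 2: daily full backup check, just for the databases we share
Invoke-DbcCheck -SqlInstance SQL01 -Check LastFullBackup -Database SharedDB1, SharedDB2
# Step 3: diff backup check, for the weekly full / daily diff set
Invoke-DbcCheck -SqlInstance SQL01 -Check LastDiffBackup -Database WarehouseDB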


The fifth step makes sure that all of the checks come under the same summary ID (among other things).  That may seem a bit fussy, but it was required for the last step, which fires off an email to us if there are any failures.  It just uses sp_send_dbmail, which we use all the time for reporting data validation issues; it’s simple and it works for us very nicely.
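
For anyone curious, that last step is conceptually no more than this hypothetical sketch; the results table, column names, database and addresses are placeholders rather than our actual schema, and it assumes the SqlServer module’s Invoke-Sqlcmd:

# Hypothetical alert step; dbo.DbcResults and its columns are placeholders
$query = @'
IF EXISTS (SELECT 1 FROM dbo.DbcResults
           WHERE SummaryId = (SELECT MAX(SummaryId) FROM dbo.DbcResults)
             AND Passed = 0)
    EXEC msdb.dbo.sp_send_dbmail
         @recipients = 'dw-team@example.com',
         @subject    = 'dbachecks failures',
         @body       = 'One or more dbachecks tests failed; see the results table.';
'@
Invoke-Sqlcmd -ServerInstance SQL01 -Database DBAAdmin -Query $query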

If you’ve made it to the end of this post then congratulations!  I realise I haven’t shared the detail of the revised logging to table or the contents of job step 5 (above), because this would’ve made the post even longer, but I’m happy to share this stuff if anybody would find it useful.