
Welcome

You've found davewentzel.com.  I am an independent consultant and principal with David Wentzel LLC, focusing on anything related to data.  Here you'll find a collection of my thoughts, mostly on data architecture and SQL Server.  I host a blog as well as a Features section with searchable content.  I also showcase a Portfolio of open source projects I am working on, especially Drupal integration.  I host some family media to showcase some of those skills.  Lastly, I have done quite a few presentations on SQL Server topics that you may be interested in.  

Data is one of your company's most valuable assets.  We can help you make the most of it.  

Service Broker Demystified Series

Hypothetical questions:  

  • What are the three most important features added to SQL Server in the past 15 years (since SQL 7)?  
  • What will be the most important feature of SQL Server in the coming years?  

My answers:

  1. DTS/SSIS (especially the wizards): the ability to extract and load data without writing any "data marshalling" code allowed companies to migrate their old Access applications to SQL Server quite easily. For years I had tons of side work migrating applications from Access or FileMaker to SQL Server. The tool of choice was DTS/SSIS.
  2. The DMVs: I wrote about this recently. The ability to see what the server subsystem is waiting on is the key to understanding the performance characteristics of your application.
  3. Service Broker: The ability to perform asynchronous data eventing is going to be HUGE in the coming years. Also, the ability to let your app do asynchronous data work without the user waiting on an hourglass, and without additional asynchronous client-side coding, is the key to the "perception" of good application performance. Further, I'm convinced that Service Broker is going to be the tool that bridges the divide between SQL Server and NoSQL solutions. Two of the biggest architectural concerns with NoSQL solutions are sharding data and the asynchronous nature of mappers and reducers. Service Broker (SB) will likely be the underlying technology that allows Microsoft to finally give us true sharding in SQL Server. And what better way to implement a mapper than as an asynchronous SB message to another SQL instance?
Everyone's favorite features list will be different.  Perhaps you like the HADR features of the last few releases or even the in-memory table offerings.  To each his own.  We all have our own unique experiences.  This is my list.
 
I feel that not only has Service Broker (SB) been revolutionary over the past 9 years or so, but it is obviously only going to get more important in the future. Anecdotally, SB seems to scare people.  Perhaps the syntax is too esoteric and not very "TSQL-like".  Or the lack of a GUI scares people away.  Or maybe people feel asynchronous eventing is best left to JMS/MSMQ or another tier and not in the db (if so, I suggest you google "SODA", which is SOA with Data in the middle).  I've found two issues that scare people the most: there is a lot of new jargon that is unfamiliar to data people (contracts, messages, services), and people don't understand the little idiosyncrasies of SB.  Once you start to understand those, you'll find SB is actually quite intuitive.  
 
This post is the introductory post for a series I'm going to do called Service Broker Demystified.  This isn't a SB tutorial.  You won't learn the basics here.  I assume you know the basics already and just want to understand better some of the esoteric stuff that trips people up with SB.  Here are the topics we'll cover (these links may be bad until I get the posts finalized, check back often):  
 
Service Broker Demystified - Contracts and Message Types: people get confused about the difference between a contract and a message type, and then they don't understand why both are needed. The fact is, like foreign keys, neither is REQUIRED, but both will help you find bugs and ensure accurate workflows if you use them, just like FKs help to ensure accurate data.  
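
As a rough illustration (all names here are hypothetical), a message type describes what a single message may contain, while a contract lists which message types a conversation may use and which side may send them:

CREATE MESSAGE TYPE [//Acme/Order/Request] VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [//Acme/Order/Reply]   VALIDATION = WELL_FORMED_XML;

-- the contract binds the message types together and says who may send what
CREATE CONTRACT [//Acme/Order/Contract]
(
    [//Acme/Order/Request] SENT BY INITIATOR,
    [//Acme/Order/Reply]   SENT BY TARGET
);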
 
Service Broker Demystified - Why is there no ALTER CONTRACT statement? There isn't, get over it. DROP and CREATE if you must change a contract. We'll cover WHY it works this way.  This is a good thing.  
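
So changing a contract looks roughly like this (same hypothetical names as above; anything that references the contract may need to be dropped and recreated as well):

DROP CONTRACT [//Acme/Order/Contract];

CREATE CONTRACT [//Acme/Order/Contract]
(
    [//Acme/Order/Request] SENT BY ANY,
    [//Acme/Order/Reply]   SENT BY TARGET
);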

Service Broker Demystified - Why do we need Services and Queues? Why do we need both?  There always seems to be a one-to-one relationship between services and queues. So, why did Microsoft give us both?  Just to confuse us?  No, there are cases, albeit rare, when it is helpful to have something other than a 1:1 relationship.  
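
As a minimal sketch (hypothetical names again): the queue physically holds the messages, the service is the addressable endpoint bound to it, and nothing stops you from binding more than one service to the same queue:

CREATE QUEUE dbo.OrderTargetQueue;

CREATE SERVICE [//Acme/Order/TargetService]
    ON QUEUE dbo.OrderTargetQueue ([//Acme/Order/Contract]);

-- a second service on the same queue: the relationship does not have to be 1:1
CREATE SERVICE [//Acme/Order/PriorityService]
    ON QUEUE dbo.OrderTargetQueue ([//Acme/Order/Contract]);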

Service Broker Demystified - Services: How can you determine if a service is an initiator or target? How does the DEFAULT contract change all of that?  Why should we even care if a service is an initiator or target?

Service Broker Demystified - [Initiators] and 'Targets': Why are initiators always placed in identifier brackets and Targets are specified as quoted string literals?  Once you understand this you'll start to see the bigger possibilities of Service Broker.  
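
For example (hypothetical names), the initiating service is a local identifier while the target service is just a string that is resolved at delivery time, possibly on another database or instance:

DECLARE @handle UNIQUEIDENTIFIER;

BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [//Acme/Order/InitiatorService]   -- identifier: must exist locally
    TO SERVICE   '//Acme/Order/TargetService'      -- string literal: resolved at delivery time
    ON CONTRACT  [//Acme/Order/Contract]
    WITH ENCRYPTION = OFF;

SEND ON CONVERSATION @handle
    MESSAGE TYPE [//Acme/Order/Request] (N'<Order><Id>42</Id></Order>');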

Service Broker Demystified - Case-Sensitivity: Some SB objects are case-sensitive. Which ones?  Shouldn't they follow the case sensitivity of my server or database?  No.  Only "exposed" SB objects are case-sensitive.  This post will cover why.  

Service Broker Demystified - CLOSED conversations: yeah, they seem to hang around for a while. Why is that?  
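
You can watch them linger yourself with something like this (the retention window is by design, not a leak):

SELECT conversation_handle, state_desc, far_service
FROM sys.conversation_endpoints
WHERE state_desc = 'CLOSED';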

Service Broker Demystified - Must my SQL Server be wired to the internet? I've heard this question at least 3 times. People think their SQL Servers must have internet connectivity in order to use Message Types. Huh? It's the goofy SB syntax that is confusing them.  No internet access required.  This post will cover this confusing topic.  
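
The names just look like URLs; they are arbitrary identifiers and nothing is ever fetched over the network.  For example (hypothetical name):

CREATE MESSAGE TYPE [http://schemas.acme.com/orders/OrderRequest]
    VALIDATION = WELL_FORMED_XML;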

Service Broker Demystified - How to Radically Simplify SB: Service Broker can be confusing. I've found that most implementations of SB I've worked on followed a handful of design patterns. If you create a "harness" that handles YOUR unique design patterns, you can simplify your life greatly.  I'll show you my harness that has saved me countless hours and headaches.  

Upgrading Vertica

In my blog posts Installing Vertica 7 and Vertica Installation Prereqs I covered installing Vertica 7.0.1 Community Edition.  Since then Vertica 7.0.2 has been released, as well as 7.1 EE (Dragline), which has some compelling features such as Live Aggregate Projections: these are great for "pre-materializing" queries that rely on aggregations (SUM, COUNT, MIN/MAX) that can otherwise cause excessive IO in Vertica.  Unfortunately there is no CE edition of 7.1 yet.  Regardless, you can still upgrade to 7.0.2.  
 
I have been tasked with upgrading our 7.0.1 installations to 7.0.2, so I thought I would document the process.  I'm going to give the abridged version of the upgrade steps; please refer to my previous posts Installing Vertica 7 and Vertica Installation Prereqs, which should fill in the gaps if anything is unclear.  If you are not sure what version of Vertica you are running, you can run select version().  
 
Prerequisites
  1. Download Vertica 7.0.2.1 (which is the latest CE as of this writing) from my.vertica.com.  You will need to do this from your laptop.  wget and curl likely won't work.  
  2. Copy the .deb file (assuming Ubuntu) to your sudoer's (root or equivalent) home directory.  Here's an example command using pscp: pscp <location of .deb file> <user>@<ip>:<path on ubuntu>
  3. Take a backup.  
  4. Plan for 2-4x as much catalog space during the upgrade as you currently utilize.  You can check current usage with du -h /home/dbadmin/<db>/<node>_catalog/Catalog.  
  5. Find the options you used when you last installed or upgraded Vertica.  nano /opt/vertica/config/admintools.conf.  Search for "install_opts" which will detail the options you used previously.   This is an important step...if you accidentally choose a different set of options when you upgrade then you may find that Vertica stops working due to mismatched components.  Note the install_opts for later.  You do not need to note the -s or --hosts parameters (more on why later).  

Upgrade Process

  1. As dbadmin, stop the database.  One way is the following:  /opt/vertica/bin/admintools --tool stop_db -d VMart -p <password> 
  2. As a sudoer, from any host in the cluster run the new .deb file (Vertica will take care of ensuring it is copied and installed to every other host for you):  sudo dpkg -i /home/dbadmin/<file>.deb 
  3. As a sudoer, run update_vertica with the install_opts noted earlier, substituting the new .deb name for the old one.  You do not need to pass the --hosts or -s parameters...those are automatically determined by the installer based on the current hosts in the cluster, which may differ from the hosts that existed when you last installed/upgraded Vertica.  Here's an example command:  sudo /opt/vertica/sbin/update_vertica --hosts <IPs> --deb /home/<deb file> -L CE -accept-eula
  4. As dbadmin, start your database /opt/vertica/bin/admintools --tool start_db -d VMart -p <password>

Similar to SQL Server, a database is upgraded when it is first started in Vertica...so expect the initial startup to take a bit longer than normal.  Remember, this needs to upgrade the data files and catalog on ALL nodes in your cluster.  At this point you may want to verify that you actually DID upgrade your database...note that my db is now reporting 7.0.2-1.  
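
A quick sanity check from vsql (output will vary with your build; the nodes system table lives in v_catalog):

select version();                          -- should now report the new build, e.g. 7.0.2-1
select node_name, node_state from nodes;   -- every node should be UP once recovery finishes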

Now would be a good time to take another backup.  

Additional Notes

If you encounter errors that you feel you can safely skip then you can always use the --failure-threshold option.  I recently wrote a blog post on this.  

Lastly, you may have to upgrade supporting packages as well, such as MC.  Check your documentation.  

 



Workaround: Vertica Error: This version of Debian or Ubuntu is unsupported

I googled around for hours and couldn't find an answer to this problem, so hopefully this post will help someone else.  

If you use Ubuntu and Vertica then you probably know that the only supported release of Ubuntu is 12.04 LTS.  I have no idea why 14 isn't supported or when it will be supported.  This is problematic if your company has standardized on 14, if you have Puppet scripts that are geared toward 14, or if you just have an affinity for 14.  If you try to install/upgrade Vertica on a 14 box you'll see the following error: 

Error:  failed to get system information for all hosts
This version of Debian or Ubuntu is unsupported.  
Hint: additional failures may be hidden.
Installation FAILED with errors.

I've yet to find a good way to get around this via Google.  Here are the two methods I've used.  Hint: use the second method...it's safer.  Do not use either on a production Vertica installation.  That wouldn't be very wise.  

Use --failure-threshold NONE

If you use this switch then install_vertica (or upgrade_vertica...which calls install_vertica under-the-covers) will not fail for ANY errors.  You'll get a nice list of failures that you can then go and fix manually if desired.  

If you use --failure-threshold NONE then don't be surprised if Vertica won't start.  (Take your backups first if upgrading).  

Change SystemProfileFactory.py to ignore the Ubuntu OS check

This is much safer, but still...don't do it on your prod cluster.  Do all of this as root or equivalent.  

  1. Determine your OS version as reported by Ubuntu: nano /etc/debian_version.  You'll see something similar to the screenshot.  Note the value returned.  It should be "jessie" (as in jessie/sid) for Ubuntu 14.  
  2. Modify the python script.  sudo nano /opt/vertica/oss/python/lib/python2.7/site-packages/vertica/network/SystemProfileFactory.py 
  3. Ctrl+W (for WhereIs), then search for "wheezy" (wheezy is the Debian codename that Ubuntu 12.04 LTS reports in this file).  
  4. Replace "wheezy" with "jessie".  See the screenshot.  
  5. Ctrl+O (for WriteOut)
  6. Ctrl+X (to exit)
  7. Rerun update_vertica or install_vertica as required.  

Again, these tips should not be done on prod.  I like all of my Ubuntu boxes to be on 14, which makes my Puppet scripts easier to control.  I've found no issues using Vertica with Ubuntu 14 (in testing envs).  I've also found no issues using LVM or ext3, both of which likewise cause the profile checks to fail.  See above.  

YMMV.



Basic Vertica DBA Tasks

In this post in my Vertica series I'm going to cover some basic DBA tasks that you need to handle.  This is not a comprehensive list, just the absolutely critical items.  I've noticed that in many shops running "newer" database managers (especially NoSQL variants) the developers are also responsible for administration tasks.  There just aren't that many full-time DBAs for these newer products.  The mindset of many NoSQL vendors is that you don't need a DBA...the system is self-healing and auto-tunable.  Vertica doesn't spew that mantra (that I've seen anyway)...however, there is no good reading material geared solely toward the administration of Vertica either.  

Basic DBA tasks for any DBMS

These are the tasks that I feel are critical for any DBA, regardless of database platform:  

  1. Installing and upgrading the DBMS.  I already covered installing in Installing Vertica 7.  We'll cover upgrading Vertica and the underlying OS in a future post. 
  2. Backup/Restore.  I handled this in my post Vertica Backups.  
  3. Recovery.  This is slightly different from backup/restore in my mind.  When a DBMS comes up after a crash it will likely need to perform some sort of "recovery".  Every DBMS handles this differently.  Some are completely automated, some require manual intervention.  Vertica handles this automatically.  If you've handled k-safety properly then whenever a node comes back online (for whatever reason) Vertica will automatically recover it by copying data from other nodes.  This largely eliminates the need to restore data due to hardware or software failure...you only really need to restore a db when you need to create a new, copied environment or when you've accidentally committed bad data.  
  4. Reviewing the Logs.  Any good DBA should be reviewing the available DBMS logs periodically.  You can see partial logs in the Message Center in MC.  However, if you read the logs directly you can begin to think up methods to automate log management.  The two most important logs are vertica.log and dbLog.  To view these in Linux you use the tail command.  tail -f <catalog_path>/vertica.log and tail -f <cdb_path>/dbLog.  vertica.log will log almost any activity that the database manager performs.  dbLog will list just those messages specific to the database.  The former is almost unusably verbose (automation is your friend)...so the latter is what you really should be checking frequently.  
  5. Authentication and Authorization.  Vertica follows the DCL syntax of the ANSI standards and most DBMS vendors.  You can always use MC to handle your authentication and authorization needs.  
  6. Auditing.  Auditing is very specific to every organization.  Some have strict requirements, others don't care.  Good DBAs, even if auditing is not a functional requirement, audit various events if for no other reason than trending.  Vertica has rich auditing facilities...I simply know very little about them.  Sorry.  
  7. Performance Management.  I'll cover this in the next blog post Basic Vertica Performance Management Tasks.  
  8. Resolving connectivity issues.  Vertica is similar to SQL Server in this regard.  You must know the servername (clustername), database name, port, and user/password (if not using OS authentication).  Vertica clients connect to a built-in load balancer via a VIP (virtual IP) that is shared across Vertica cluster nodes.  This ensures that if a node fails clients can still work.  This is similar to a SQL Server cluster.  The IP Virtual Server (IPVS) is the load balancer in Vertica and it even allows adding and removing nodes on a running cluster without interrupting users.  This type of load balancer is akin to "round robin" which means that any single node can be handling inordinately more activity just through dumb luck.  Vertica also has "native connection load balancing" that is disabled by default but can actually ensure that CPU and memory overhead is honored across nodes too.  The caveat is that you have to use Vertica's client libraries and it takes a bit more work to set up.  
  9. License Management.  HP licenses Vertica by the size of the raw (uncompressed) data in the tables.  This doesn't include projection data, temp tables, the WOS (we'll cover that in a future post), or data not yet purged by TM (covered in the future too).  If your db's size nears the licensed limit you'll see warnings in admintools and vsql.  You need to then either upgrade your license or purge some data.  The beauty is that Vertica never just shuts down.  The most visible problem is that your queries will return a status code of SUCCESS_WITH_INFO instead of just SUCCESS.  If you code for that scenario then you can save yourself some grief.  You can run SELECT get_compliance_status(); to determine your licensing compliance (see the sketch after this list).  
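
Here's a minimal sketch of checking license usage from vsql (AUDIT('') re-estimates the raw data size of the entire database and can be expensive on a large cluster):

SELECT GET_COMPLIANCE_STATUS();   -- how close you are to the licensed raw-data limit
SELECT AUDIT('');                 -- trigger a fresh raw-size estimate for the whole database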

There are certainly other DBA tasks that could be added to this list, but this list comprises the absolute basics to be an entry-level Vertica DBA.  



Vertica Concurrency and Isolation Semantics

This is a brief blog post in my Vertica series covering how Vertica handles concurrency and isolation.  

Vertica supports the four standard SQL isolation levels (READ COMMITTED, READ UNCOMMITTED, REPEATABLE READ, and SERIALIZABLE).  However, it really only honors READ COMMITTED and SERIALIZABLE.  READ UNCOMMITTED is mapped to READ COMMITTED and REPEATABLE READ is up-sized to SERIALIZABLE.  So, if your app is, for some reason, sensitive to isolation levels, beware. This shouldn't affect the query workloads that Vertica targets, which is read-intensive data warehouse workloads (with a high amount of concurrent data loading). In a future post I'll cover how Vertica is architected such that it can handle massive data loads without affecting online query concurrency.  

Vertica supports SQL92-style implicit transactions.  This means that you don't really need a BEGIN TRAN clause, although you can certainly do that.  It supports SAVEPOINTs and ROLLBACKs in a manner that is semantically identical to SQL Server.
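
A minimal sketch (public.t is a throwaway example table):

INSERT INTO public.t VALUES (1);
SAVEPOINT before_second_row;
INSERT INTO public.t VALUES (2);
ROLLBACK TO SAVEPOINT before_second_row;   -- discards row 2, keeps row 1 (still uncommitted)
COMMIT;                                    -- commits row 1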

Snapshot Isolation
Vertica has the ability to run queries in a snapshot isolation mode similar to SQL Server's. It is still READ COMMITTED under the covers. The trick is to preface your query with AT EPOCH LATEST (or a given epoch or time). When you do this you instruct Vertica to fetch the committed data as of the time the query started. Essentially you are querying a snapshot, although you don't need to create or declare the snapshot like you do in SQL Server. These queries should have the fastest possible execution: you hold no locks and won't block write ops.  EPOCHs are advanced when new data is committed.  

Vertica queries written with AT EPOCH clauses are called "historical queries".  

More on Epochs

Let's take a deeper dive into epochs.  

Vertica's SQL dialect doesn't have a name (such as T-SQL or PL/SQL) that I am aware of.  Vertica's SQL is based on Postgres but it is a very limited subset of that functionality.  It's syntactically similar to TSQL and follows most of the ANSI SQL standards.  It has CTEs (although not recursive), derived tables in FROM clauses, subqueries (correlated and non-correlated), etc.  You terminate all SQL statements with a semicolon.  

Vertica's SQL does not support variables.  The vsql utility supports shell-style variables, but that is useless if you are using DBeaver or embedding SQL in Java.  Instead you have to use a native programming language and handle your variables there.  Because there are no variables, I have this script mostly hardcoded for my test environment.  

select current_epoch from system; will show you the equivalent of the LSN for your system.  Whenever the next commit occurs it will, in this case, be 957.  This is a system-wide, not connection-specific, value.  

The epochs table will show you the current epochs that you can query against.  The oldest listed, in this case 956, is the oldest that can be queried and is called the Ancient History Mark (AHM).  

Note that creating a table does not advance the current_epoch.  However, when we INSERT a row into a table, and commit it, we do advance the current_epoch.  

The epochs table now shows that we are able to query two different epochs.  The AHM has not advanced.  

Just a quick note: similar to the bash shell or the autocomplete feature in SSMS, if you type the first few letters of a command or object and then hit Tab, vsql will either autocomplete the command or give you a list of suggestions to continue typing if there are multiple possibilities.  In this case I typed "pub" and then Tab, and vsql showed me all contextual values available for autocompletion...in this case objects in the public schema.  

After entering another row my current_epoch has again advanced and I can now query 3 different epochs.  

A basic SELECT against our table will show two rows.  

Now I can use the AT EPOCH statement, prepended to the SELECT to see the data at various snapshots in the past.  

If I choose EPOCH 955 I generate an error since that EPOCH is no longer available.  

EPOCH 956 shows me no rows.

EPOCH 957 shows me one row.

EPOCH 958 shows me two rows.  

If I try to query the current_epoch (959) I also generate an error.  

I forgot to grab a screenshot, but AT EPOCH LATEST in this example would be functionally equivalent to EPOCH 958.  AT EPOCH LATEST is assumed when the AT EPOCH clause is omitted.  It is functionally equivalent to saying, "show me all committed data as of the start of this query".  
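
Putting the whole demo together (the epoch numbers and table name are illustrative; yours will differ):

SELECT current_epoch FROM system;                     -- e.g. 957; the epoch the next commit will use

CREATE TABLE public.epoch_demo (id INT);              -- does not advance the epoch
INSERT INTO public.epoch_demo VALUES (1); COMMIT;     -- advances the epoch
INSERT INTO public.epoch_demo VALUES (2); COMMIT;     -- advances it again

SELECT * FROM epochs;                                 -- the epochs still available to historical queries

AT EPOCH 956 SELECT * FROM public.epoch_demo;         -- no rows
AT EPOCH 957 SELECT * FROM public.epoch_demo;         -- one row
AT EPOCH 958 SELECT * FROM public.epoch_demo;         -- two rows
AT EPOCH LATEST SELECT * FROM public.epoch_demo;      -- same as omitting AT EPOCH entirely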

To reiterate, I can now query 3 epochs.  If you try this on your own you have to run your queries quickly...

...because if you wait about a minute you'll notice that your list of available epochs is automatically reduced.  I show the current_epoch to demonstrate that this is a quiet system: I have done no backups, delete statements, or anything else to advance the current_epoch.  

Epochs, by default, will be available for about a minute until the Tuple Mover process comes along and cleans up older epochs.  This is totally configurable and we'll cover it more in a future post.  

Epochs are the mechanism that gives Vertica its READ COMMITTED SNAPSHOT behavior.  You are always querying against the latest epoch if you don't specify one.  Any currently executing INSERTs, UPDATEs, or DELETEs will be in the current_epoch (and there may be multiple statements in it if you are running concurrent data modifications) and are not available for querying outside of the current session scope.  

If you think about it, epochs are functionally equivalent to READ COMMITTED SNAPSHOT; it's simply a different implementation.  This is one reason why I enjoy Vertica...it helps me understand other database managers and their implementations of things such as concurrency and isolation management.  



Vertica Backups

Today's blog post in my Vertica series is about Vertica backups.  Vertica stores its data in a series of many files.  Once a file is written it is never modified.  Data updates are handled as deletes-followed-by-inserts and the deleted data is "logically deleted" via delete vectors.  Deleted data is removed when the final results from each node are merged.  This makes backups very simple...you just back up the files in the data directory.  

Backing up Vertica databases is roughly equivalent to a "hot backup" in Oracle (there are separate configuration backup utilities for MC and even admintools).  Vertica provides you with a python script, vbr.py, that gives you a bunch of options to make life simple.  In order to use the Vertica backup tools your db must be up and running.  Essentially vbr.py makes copies of your Vertica files to another storage area.  The scripts are written such that the files are transactionally consistent and have minimal locking overhead on the running Vertica system.  

There are no transaction logs in Vertica.  There is nothing equivalent to rollback segments or redo logs in Oracle.  In a later blog post I'll cover how Vertica maintains transactional consistency without something akin to transaction logs.  Also remember that your data is sharded across many nodes.  Vertica manages all of that for you as well.  In the worst case scenario Vertica will require you to run recovery on a node if it gets out-of-sync for whatever reason.  Even this is an automated process.  Even though data is sharded across nodes Vertica is not "eventually consistent".  All queries are transactionally-consistent.  Again, I'll cover how this is handled in a later post.  

Vertica backup utilities

As mentioned before, you can always roll your own scripts to copy the data directory on your node to somewhere else.  rsync is great for this.  Newer versions of Vertica provide better utilities to handle this for you.  

  • backup.sh and restore.sh are the original Vertica-supplied bash scripts.
    • these can be found in /opt/vertica/scripts
    • these use ssh and rsync to copy the data files, catalog, and configuration files to another area.  
    • the restore script takes a similar set of arguments but shuts down your db first, copies the data, then bootstraps your db.  
    • All backups are always done by node so any data that changed since backup.sh was run is recovered automatically from other nodes (assuming the db is k-safe).  
    • With Release 5.1 Vertica deprecated these scripts which, although they still work, are unsupported.  
  • vbr.py is the replacement and is fully-supported by HP.  This is a Python script located in /opt/vertica/bin.
    • Fundamentally backups and restores are still handled via rsync and ssh.  
  • Roll your own scripts.  None of these utilities are documented as being available on Community Edition but they are there.  I'm not sure if HP changed its mind and simply forgot to update the documentation to reflect this fact.  (This is also the case with FlexTables which we'll cover in a future blog post).  How do you roll your own scripts?
    • Create a tarball of your /opt/vertica directory (google can help with this).  Then copy the file to a safe place.  The script I use (which I'm reluctant to share just in case my testing is lax...I don't want to be responsible for your data loss) does something like this:  tar -czvf /tmp/vertica.tgz /opt/vertica

More on vbr.py

There are lots of nifty features in vbr.py other than simple node backups and restores.  For instance, besides full backups we can also take incremental and snapshot backups, as well as backups of specific schemas and tables (helpful if you do multi-tenancy in Vertica).  You can also use vbr.py to copy a database to another cluster (a non-prod staging cluster for example).  Let's look at some of these.  

vbr.py --setupconfig

This launches a pseudo-wizard that generates a configuration file for subsequent backup/restore tasks.  The benefit of taking backups this way is that the restore occurs using the same configuration file.  This makes it harder to screw things up.  

/opt/vertica/bin/vbr.py --setupconfig

This will provide you with the following options...  

...but it will not actually take a backup.  Rather it saves the backup configuration to the ini file.  Let's take a look at that.  Note that all of the options chosen are saved.  Note that I am backing up to another Linux-based host (or really any host that supports ssh, rsync, and scp).  When I actually run the backup I must ensure that the path exists on the remote backup host, in this case /home/dbadmin/backup.  

We can now run the backup using the config file:  

vbr.py --task backup --config-file MyBackup.ini

As mentioned earlier this is essentially copying all of the relevant data from your data and catalog directories.  

 

Here you'll note that Vertica wants you to verify that you really trust this remote backup host.  


Again, all vbr.py is doing is copying all files and directories under data and catalog to your backup location under the MyBackup folder which corresponds to the .ini file you used for your backup:  

You can have many different configuration files for different backup purposes, but hopefully you get the general idea.  

Database Restores

You can restore using your config file as well.  To restore a full db:  

  • the "target" database must be down.  
  • The cluster "target" must also be down.  
  • The cluster "target" must also have the same number of nodes as the "source" of the backup.  In other words, you can't back up a 5 node cluster and restore it on a 4 node cluster.  You have to have the same number of nodes...the restore process will not juggle segmentation for you or reduce k-safety.  Remember, the restore process is merely copying files from the backup host to the cluster nodes.  
  • Node names and IP addresses must be the same on the target cluster.  

To copy a database to another cluster

vbr.py --task copycluster --config-file

This is essentially a backup/restore in one atomic operation.  The data and catalog paths must be identical.  In the configuration file you specify the mappings of nodes in the source/target clusters.  The target cluster must be down.  

So, this is just like a standard restore except you are mapping old node names to new node names.  

 



Developing with Vertica

When you first begin developing with any new database manager the hardest part is establishing connectivity.  Let's assume you just want to connect to Vertica from your laptop using something a little bit nicer than vsql over ssh.  As I mentioned previously, Vertica has no native GUI query tool.  In this post we'll cover everything you need to know to start exploring VMart from a modern, GUI-based query tool.  

GUI-based universal database clients

There are a ton of FOSS (free and open source) universal database clients.  A universal db client can connect to any database that has a driver for the data access platform that the universal client was written for.  Most universal db clients are java-based so if there is an available JDBC driver then you can connect.  Using a universal db client is a great way to learn NoSQL since most NoSQL products have JDBC drivers.  The theory is that if you learn the universal db client you should be able to learn any underlying database manager.  

I prefer to use a universal db client that is similar in commands and functionality to SSMS.  I think the best tools are DBeaver and SQuirreL.  Both are java-based and are, frankly, easier to set up than SSMS.  Initially I used SQuirreL but it seemed to be more prone to crashing and was a bit of a memory hog.  It's a fine tool but I'm not going to cover it in this blog post.  Feel free to explore it on your own.  

DBeaver

I then tried DBeaver (pronounced "dee-beaver"; the name is a play on SQuirreL).  I'm hooked.  In 2 minutes I was connecting to SQL Servers, Oracle, Vertica, and Cassandra in different DBeaver windows.  How cool is that?  

DBeaver is completely non-commercial and open-source; however, I can't find the exact type of open source license that is used.  You can download the source or a standard Windows installer.  It is best to download the Enterprise Edition, which also includes modules designed for some NoSQL products.  EE is also free.  

Once downloaded and installed you need to tell DBeaver which database manager you want to connect to.  The screenshot to the right shows all of the different database managers that DBeaver can work with.  Obviously we are going to choose Vertica on this screen.  

If you haven't installed Vertica drivers on your laptop yet DBeaver gives you an opportunity to automatically do this.  DBeaver automatically detects that you have no JDBC Vertica drivers installed and offers to install them for you.  

 

You should now see the following screen:  

Similar to SQL Server, you need to know the host, database name, and user/password that you want to connect to.  And just as with SQL Server when the SQL Browser service isn't running, you will need to know the listening port.  By default Vertica uses 5433.  

If you are unsure of your listening port you can run nano VMART/port.dat as dbadmin.  

Once you have that you should be presented with the Database Navigator which is similar to SSMS's Object Explorer.  

From here you can run queries and visually explore your Vertica database.  Within 5 minutes you can install DBeaver and connect to VMart and issue your first query.  I don't know how it could be any easier than that.  
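
A first sanity check once you're connected (the table name assumes the standard VMart sample schema):

SELECT version();
SELECT COUNT(*) FROM public.inventory_fact;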



Vertica Management Tools

This blog post in my Vertica series is about the various management tools available to a Vertica user.  We briefly surveyed some of these tools in previous posts.  If you've been following this series, we now have Vertica 7 installed, we've created the VMart database using admintools, and we've installed the Management Console GUI tool.  Let's cover these tools and a few others in greater detail.  

admintools

I already briefly covered admintools when we looked at creating a Vertica database.  The Administration Tools provide a GUI-like experience (on Linux you can get a GUI-like appearance...if you don't mind your tool looking like WordPerfect 5.1 for DOS with no mouse support) without requiring a windowing manager like KDE or Gnome.  Most Linux folks hate running GUIs on servers...it's just more stuff that will crash the OS and drain performance.  Admintools can be launched from a shell (ssh, telnet, xterm) by running the following commands:

  1. Ensure you are logged in as dbadmin.  If you attempt to use your root-level account you'll most likely get the error noted to the right.  
  2. /opt/vertica/bin/admintools

However, admintools itself is scriptable.  Running admintools -a will show you everything you can do with admintools.  Here are some interesting things you can script:  

  1. create a new database (perhaps as part of a CI loop):   /opt/vertica/bin/admintools --tool create_db -s (nodeIPs) -d <dbname> 
  2. start a database:   /opt/vertica/bin/admintools --tool start_db -d<dbname>
  3. connect to a database (this essentially starts vsql):   /opt/vertica/bin/admintools --tool connect_db -d <dbname>

Management Console (MC)

I already covered quite a bit on MC in Installing Vertica Management Console.  MC doesn't do everything admintools does (just like SSMS doesn't do everything TSQL does), but it is a great GUI tool where GUI tools are really needed and aren't cumbersome, such as detailed performance monitoring graphs.  

Database Designer (DD)

Database Designer can run from MC, admintools, or can be scripted stand-alone. It can be run by admins all the way down to basic reporting users.  A report developer may want to run this, for instance, to ensure that a new query has the correct projections available for performance reasons, just like a good SQL Server developer would ensure that their query plans are using the correct indexes.  DD functionality will be different based on the login. For instance, you may be able to create a new design, but not execute it.  DD's purpose is to analyze your logical schema, sample data, and your sample queries and then create a physical schema design (basically projections) that can be deployed or scripted. These designs are optimized for column-oriented storage.

Your designs can create new projections for your tables based on new sample data or you can pass it queries and it will create designs for you based on that workload. The deployment scripts can even be designed on a dev instance that is k-safe=0 and deployed on a higher k-safety instance.  

In the next post we'll take a deeper dive into Database Designer.  

vsql

vsql is Vertica's shell-based query tool.  It's functionally similar to osql or sqlcmd in SQL Server, or sqlplus in Oracle.  It does not run on Windows.  If you need a query tool for Windows the suggestion is to use ssh or a third party querying tool.  There is also no GUI-based querying tool such as SSMS.  This is a foreign concept to SQL Server DBAs but is a standard practice for most other RDBMSs.  For instance, Oracle shipped no GUI-based query tool until Enterprise Manager in 8.1.  Prior to that most people used Toad.  Here's how you use ssh and vsql.  

  1. connect to your server with ssh as dbadmin or an equivalent user
  2. /opt/vertica/bin/vsql
  3. enter your VMart password
  4. use \q to quit

Even though you can't install vsql on Windows you can install it quickly on another Linux box.  

  1. Ensure your Linux box is the same version as your Vertica box...same distro...etc.  
  2. scp <remotehost>:/opt/vertica/bin/vsql . and then run it with ./vsql

You can also download the vertica .deb file and install it, WITHOUT running install_vertica.  

There are many other management tools such as install_vertica and update_vertica that we've covered already.  There is also vbr.py which is the native backup utility we'll cover in a future post.  

One tool I wish Vertica had was a graphical, web-based querying tool integrated directly into MC.  Unfortunately this doesn't exist yet.  In a future post I'll show you my favorite 3rd party open source tool that covers this shortcoming and makes the Vertica developer's life much easier.  

Scrutinize and Diagnostics

/opt/vertica/bin/scrutinize and /opt/vertica/bin/diagnostics

These are both diagnostic tools with similar goals; I would've merged them into one general diagnostic reporting tool.  They work much like pssdiag in SQL Server.  They both collect a broad range of information from all nodes in the cluster.  The files are zipped and can be uploaded automatically by setting a switch in the call.  



Installing Vertica Management Console

In the previous post in my Vertica series, Creating a Vertica Database, we created the sample VMart database that HP provides.  By doing this we were able to do rudimentary tasks using AdminTools.  In this post we are going to install Management Console (MC) which is HP's web-based GUI tool.  Unlike AdminTools, which runs on the Vertica instance and is invoked via a telnet or ssh session, MC is a web server that can be invoked by any browser with network connectivity to the Vertica cluster.  

Unlike SQL Server, where the SSMS GUI is not required (all SSMS tasks can be carried out via TSQL commands), there are some tasks that only MC can handle and some that only AdminTools can perform.  And then there are some tasks that both tools can handle, but that are much easier in MC (the Database Designer being one).  MC can graphically manage the layout of your schema for you; AdminTools cannot.  

In Linux when you REALLY need a GUI admin tool the solution is usually to run an httpd process somewhere that accepts requests from a client web browser and constructs the necessary Linux commands under-the-covers. That's exactly how MC works.  Almost anything you do in MC can be scripted to SQL commands and saved for later.  

One MC instance can manage many Vertica clusters, each with different security, k-safety etc. Just like SSMS. Your MC version must match your Vertica version however.  Your MC users can be different from your database users and you can provide fine-grained permissions to various features in MC.  You can install MC before or after you install your first Vertica node. MC will auto-discover your existing db clusters if you install it on an existing cluster node.  This is why I've waited until now to cover MC installation.  

MC uses ports 22, 5444, and 5450 for cluster communications.   5450 is used for browser communications over https.  

Installing MC

  1. From your laptop  (remember, Ubuntu server has no GUI) login to my.vertica.com (accts are free) and download the Vertica console .deb package for your OS and processor architecture (for instance,  vertica-console_7.0.2-1_amd64.deb).  
  2. My advice is to install MC on one of the nodes of your cluster.  This isn't necessary and in a prod environment may not even be a smart thing to do.  So far we have only one node of our development cluster so we will install directly on that.  When you do install on a cluster node Vertica will recognize that fact and auto-discover that node and add it to MC automatically.  
  3. I assume you are connecting to your Ubuntu box using PuTTY with SSH. If so you can use pscp to copy your downloaded .deb file to your server. You can google to download pscp if you are running stand-alone PuTTY.
  4. Open a DOS prompt (elevated) and cd to the location of pscp
  5. pscp <location of .deb file> <user>@<ip>:<path on ubuntu>
  6. Login to your Ubuntu server as a sudoer (not dbadmin, which is not a superuser)
  7. sudo dpkg -i <pathname> 
  8. be patient.  This is installing misc java files and any missing dependencies.  
  9. Eventually you will be asked to open https://ip:5450/webui to complete the installation.  Leave your terminal/ssh session open; do not close it.
  10. You should first be presented with a license screen in your browser.
  11. In step 2 you will provide a "superuser" MC user.  My advice is to just use the default (see screenshot), which is dbadmin from the underlying Vertica installation on your node.  
  12. You will then pick an authentication mechanism for non-superuser access to MC.  I'm not even going to attempt configuring LDAP.  
  13. At this point you likely won't see anything happening in your browser window.  Don't refresh or close your browser.  If you switch back to your terminal session you can watch config messages scroll through the screen while Vertica builds the MC application given your configuration preferences.  
  14. Eventually the terminal window will flash a message that the installation is complete and will give you the URL to log in to.  At this point you can close the terminal session and the configuration browser window. 
  15. Open the URL and you should see a login screen.  Log in with the superuser account that you previously created.  
  16. You should be presented with the following screenshot.  

 

Using Management Console

You can learn a lot about Vertica and MC by just clicking around.  "MC Settings" allows you to modify the config values you specified above.  This is independent of any clusters you may wish to manage using MC.  

The first task you'll want to do is import your existing (one node) cluster.  Click "Database and Clusters" and then Import.  Since (if) you installed MC onto an existing cluster node, the database will be auto-discovered.  You simply have to supply your username/password.  If you supply your dbadmin password you will be able to manage all aspects of your cluster.  If you only have a limited user account (say, because you only need access to Database Designer) then add that account here and you will see only the limited functionality you have permission to perform.  

You should receive a success message.  You can now discover MC and lots of Vertica features on your own.  

I'll point out some interesting MC features that are worth exploring next.  

Message Center

The Message Center will display errors (like the Error Log in SSMS) as well as alert conditions.  Here you can see that my cluster is not k-safe (we'll fix that in a later blog post).  One interesting feature is that you have the ability to mark messages as "read", similar to email.  You can basically dismiss messages that you don't wish to see again.  This cuts down on the noise messages.  What is slightly confusing to noobs is how to "expand" a message to read the details.  You use the triangle to the left of the priority indicator.  

Database Overview Page

You access this page by clicking on the "Recent Databases" option for the cluster you just configured.  

You will be presented with the dashboard/overview page.  

There are tabs along the bottom of the page to access additional functionality.  Most interesting is the Design Page, which launches the graphical Database Designer wizard.  We'll cover this in detail in a future blog post.  In a nutshell, this utility lets us tweak our physical designs for performance and k-safety.  

Finally, the Explain tab allows you to execute SQL statements and view performance metrics.  I quickly typed a simple query and generated an EXPLAIN plan.  

An EXPLAIN plan is equivalent to "Display Estimated Execution Plan" in that the query is not executed but an approximate query plan is returned.  The PROFILE option actually executes the query and shows detailed CPU and memory usage.  
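
For example (again assuming the VMart sample schema), both are simply prepended to an ordinary query:

EXPLAIN SELECT COUNT(*) FROM public.inventory_fact;   -- estimated plan only; the query is not run
PROFILE SELECT COUNT(*) FROM public.inventory_fact;   -- runs the query and records actual CPU/memory usage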

You can find interesting VMart queries in /opt/vertica/examples/VMart_Schema.  Use nano to examine these files or copy them to MC and look at the EXPLAIN plans.  

We'll cover some additional management tools in the next post.  



On Certifications

"Should I get a certification?"

That's a common question in IT and my opinion on certifications has changed over the years.  I recently got my Vertica certification and in the process I've been re-evaluating the value I find in certifications.   

My First Certification

...was my Microsoft Certified Professional (MCP) in 1997.  I worked at a large "shop from home" television channel as a help desk guy.  The company allowed me to take a Windows 95 class and they paid for the certification.  I already had my B.A. and I didn't find a lot of value in having an MCP.  

Microsoft Certified Trainer

I didn't stay at that company for long.  I went to college to be a high school teacher but realized I hated teaching kids.  So I thought I could teach adults SQL Server, databases, and programming...something I did to help put myself through college.  I was hired by a MS Gold Partner and soon became an MCT.  Within a matter of a few months I held my MCSE and was working toward my MCSD.  Being an MCT for various products requires that you pass the exam for those products.  This usually meant learning just enough to pass the test so I could teach classes and earn my pay.  My breadth of knowledge was diverse, but had very little depth.  

As the aphorism goes, "those who can't do, teach".  I was teaching Exchange Server after just barely passing the exam the week previous.  As long as I stuck to the courseware I was fine and students loved me.  I even won "Instructor of the Month" for all of the United States in April 1999.  But if I veered from the courseware, or a student asked a moderately difficult question, I looked foolish.  This really wasn't what I wanted out of my career.  I felt like a shallow fraud.  

"Paper MCSEs"

It was around 1999 when I first noticed the ads on the radio touting, "Get your MCSE and make $70,000 per year your first year.  We can help you!"  These were basically bootcamps where you were given the answers to the exam questions and were allowed to briefly work with the technology.  They had similar bootcamps for Java Jocks.  This was the Dot Com Era after all.  The market demand for technologists was never higher.  This disgusted me.  My training company was also a Prometric Testing Center and I saw a lot of test takers come in to take the tests to get their certifications.  It was common to see people in the parking lot studying their "brain dumps" that they downloaded from the nascent Internet.  TroyTec was becoming hugely popular.  I saw tons of old TroyTecs lying around near the exam area that testers left behind as they were trying to cram just before taking a test.  

These people were "Paper MCSEs" and it put a sour taste in my mouth for a decade regarding certifications.  

One Particularly Disgusting Experience

Every Prometric exam room had a security camera on the wall plainly in sight of every test taker.  Our office administrator had a CCTV sitting prominently on her desk that she used to monitor the examinees.  Obviously this was to discourage cheating.  In fact, if you took an exam you had to sign a little waiver indicating that you would not cheat or sell the questions to a brain dump site.  

I came back from lunch one day and our office admin asked me to look at the TV with her.  Right there, clearly, was a test taker thumbing through a folder of notes looking for answers.  And there was the camera right over his head.  We often saw people cheating...you could see them looking at note cards they had up their sleeves for instance.  The office admin asked me what she should do in a case this brazen.  

My response still sickens me.  

I told her to ignore it.  

If you accuse someone of cheating it can wreck their career.  What if, somehow, he wasn't really cheating?  What if we accuse him of cheating and somehow this blows up in OUR face?  I would prefer not to be involved AT ALL in these scenarios.  As I saw it at the time, if you cheated and got away with it...good for you.  You're a fraud and you won't last long at your job.  

These cheaters were the rule, not the exception, during this era.  I viewed a lot of it.  It disgusted me and jaded my opinion of anyone who had a certification but no real-world experience.   

Networking Essentials

You could successfully cheat on just about any MS exam, but Networking Essentials was the hardest.  It was the widowmaker.  I saw guys take that test 4 or 5 times and still not pass.  "Net Ess", as we called it, required you to actually DO networking.  You couldn't just memorize brain dump answers.  You had to conceptually understand routing.  And you were asked to do math involving subnetting and supernetting.  You even needed to know about "plenum wiring", which I had run for 4 years as an intern in college.  The test was still multiple choice so it was possible to cheat, but it would likely take you a few tries.   

If you wanted to know if an MCSE was of the "Paper" variety, simply ask them how many times it took them to pass Net Ess.  Or ask them a question from the test.  

MCDBA

In 1999 MS released SQL Server 7.  This was a radical departure from SQL 6.5 on many levels I won't rehash here.  I had limited experience with 6 and 6.5 and IMHO the industry scuttlebutt was that SQL Server was only a "departmental" database...not able to scale and too prone to crashing on Windows NT.  With SQL 7 the industry's views seemed to change overnight.  SQL Server was now considered a suitable database for these new e-commerce apps that were becoming popular, and mission-critical.  

Shortly after SQL Server 7 was released MS announced the availability of the MCDBA certification.  Since I worked for a Gold Partner I was privy to "inside information".  I learned that the two SQL test requirements (Administration and Design) for the MCDBA wouldn't be quite so easy anymore.  Was Microsoft finally going to have stringent testing?  Would there be no Paper MCDBAs?  Although still multiple choice, the questions would be more scenario-based.  Gone would be pedestrian questions like "Where is the data for a view stored?" 

I begged my employer to let me take the tests before they were GA.  I studied and passed both exams on the first try sometime around August 1999.  I was probably one of the first MCDBAs outside of Redmond.  The tests were not easy and I remember seeing many candidates fail the exams for the next few months.  Around this time MCSEs were everywhere but did not command much of a premium in the market.  But an MCDBA did.  

The Windows 2000 "Upgrade" MCSE

I soon tired of being a technical trainer and became a full-time data professional in late 1999.  It was easy to forget my former career.  

When Windows 2000 was announced MS also announced that the Windows NT MCSE would be retired.  If you held that "version" of the MCSE (which I did) you could no longer call yourself an MCSE without taking an upgrade exam.  This pissed me off.  This was like having my B.A. removed...I earned it I should get to keep it.  I wasn't using any of the new features of Win2K, why should I upgrade?  I still felt that there was a bit of value in having the MCSE logo on my resume so I decided I would at least attempt the upgrade exam.  The upgrade exam was available until January 2002 and I scheduled my exam for the last week of December.  I didn't study and I barely passed.  

What really pissed me off is that MS reversed their decision a year later and let everyone keep their MCSE designation even without the upgrade exam.  This kind of attitude is why I'll likely never hold another MS certification again in my life.  They compounded bad decisions upon bad decisions.  I talk to a lot of veteran MCSEs who still are not really sure if they are allowed to use the MCSE logo or not based on these ridiculous rule changes.  

Oracle OCA/OCP

In 2002 I was using Oracle every day designing ETL packages.  My employer sent me to training so I really just needed to pass a few tests to become an OCP.  I passed each exam easily.  But OCP exams are much like MS exams...very pedestrian questions with obvious answers.  I'm competent with Oracle, but frankly my skill level isn't great.  I guess that makes me a Paper OCP.  The fact that I aced every exam kinda indicates to me that an OCP certification, like an MCSE certification, is not a good measure of a potential new hire.  

The worst aspect of the OCP tests is that you don't need to know ANY PL/SQL.  Every day I encounter Oracle DBAs that cannot write even rudimentary PL/SQL packages.  I strongly believe this is because the OCP exams do not stress this.  

SNIA SCSP

It was another 4 years until I got my next certification.  I was working as a data architect at an ISV and our hosted environments were having HUGE I/O problems.  I knew at least the basics of SANs and storage and I/O but I was having problems communicating my observations with the SAN engineers.  They were denying their SANs had any problems...and I knew MOST of our problems were indeed the SAN.  I began to read anything I could about EMC, IBM Sharks, or anything storage-related.  After a few months I was able to articulate, with irrefutable proof, that we had SAN problems.  I even got the SAN engineers to admit it.  It took a few more months before our problems were fixed.  

Maybe I could pass the SNIA SCSP.  I took the test and aced it the first time.  No one paid me this time, I simply wanted the certification to prove to myself that I knew the technologies as well as I thought I did.  Even though I aced it, I guessed at every question involving tape libraries and NAS...I didn't feel those subjects were germane to a data architect.  But apparently they are very important to the fine folks at SNIA.  

I think the SNIA SCSP is the most valuable certification I have.  It is an industry-group certification, not a vendor certification.  This means that it is vendor-agnostic.  You are tested on concepts and theory vs tools.  With MS and Oracle exams there is clearly a push to have candidates learn the newest features of the newest release, even if those features have a limited applicability to most practitioners.  

Vertica, MySQL, etc

I recently took my Vertica exam and passed which makes me an HP ATP.  I've been using non-traditional DBMSs a lot lately, especially Vertica.  I had no formal training in Vertica, I merely picked it up by reading and building my own test servers.  I even have a Vertica series of blog posts where I'm disseminating some of the key Vertica concepts that initially tripped me up.  I took the test to prove to myself that I knew Vertica and to determine what areas I needed improvement.  In other words, I took the exam for ME.  I'll put it on my resume not because I value the certification, but because I do have valuable Vertica experience.  

Unfortunately I think the Vertica exam is a complete waste of time.  Here's why:  

  1. It is an online Pearson exam.  This means that an examinee can have the test open in one browser and can do google searches for answers in another.  
  2. The questions are about totally esoteric aspects of Vertica such as what screens in Management Console show you CPU utilization.  This is tool-based knowledge.  A better question might be how do you decrease CPU utilization for a given query.  
  3. There are questions clearly geared towards older releases of Vertica.  Terms have changed in the newer releases but the tests have not been updated to reflect these changes.  
  4. There are too many questions about installing Vertica and not enough about performance and k-safety.  In fact, there were no questions on k-safety.  Installing Vertica is a task done, at most, a few times a year.  The latter is done possibly every day.  Further, there were ZERO questions on querying data.  

This is just another vendor exam that focuses too much on the tool and not enough on the concepts.  

Parting Thoughts

I find little value in most certification exams.  Vendor exams are almost worthless.  There is too much emphasis on the newest features which are likely not being used, yet, in most shops.  And there is too little emphasis on foundational knowledge.  Industry certifications (SNIA, PMI/PMP) are much more valuable.  A thorough understanding of the concepts is required, and those concepts are transferable to most vendor implementations.  

I especially like the approach that EMC is taking.  They offer a Data Science Associate certification.  This is a holistic "architect" certification.  You must know a little bit of everything in data lifecycle management.  Design is covered, relational concepts are covered, querying in multiple languages is covered, as well as storage (this is an EMC certification after all).  Even though this is offered by a vendor there is nothing vendor-specific about it.  

