Lab 1
Admin
There are three parts to this lab: setting up an account, connecting to the class database (plus running a quick query), and critiquing a terminology.
You’re free to do this lab in whatever format and using whatever tools you think are reasonable. Just submit a PDF on Canvas.
Instructions
Set up UMLS account
Example of what you should see after logging in.
This gives you access to tools we’ll use later in the semester.
Go to https://uts.nlm.nih.gov, and click Sign Up (upper right corner). Accept the license, fill in your information, and check that you can log in to the UMLS Metathesaurus Browser.
Nothing to submit for this.
Connect to the class database
I’ve created a database that we’ll use for several labs this semester. It’s a MySQL 8.0 database hosted on AWS RDS. Currently, it contains several OMOP-formatted vocabulary tables downloaded from Athena.
For now, let’s just be sure everyone can access this database.
You’re free to use whatever software you like See software if you want recommendations. . Shoot me a message if you’re not able to get something installed.
The address, port, username, and password you’ll use for the database will be posted on Canvas.Couple notes:
The student account only has read access.
The DB is on a t2.micro
free tier machine and may be a bit slow.
Please be respectful and don’t run excessive queries that interfere with your classmates.
Finally, please don’t share the access information for this database outside our class.
There’s no PHI on there, but there’s not much bandwith as is.
Once you have access, you’re looking for the clinical_vocabulary
database.
Your task is to identify your favorite condition and tell me how many concepts use that word in their name. For example, if you pick hypertension Pick something else. , you could run:
SELECT COUNT(*)
FROM CONCEPT
WHERE concept_name LIKE '%hypertension%'
The real point of this is to be sure everyone can access an online SQL database, so there’s no need to do a fancy query.
Just tell me what condition you chose, what query you ran, and how many concepts you found.
Terminology example
Imagine we wanted to make a terminology about pizza. Here are two such terminologies.I’m only going to include the first 9 of 40 types of pizza from a diagram by Jess Kapadia.
Please answer the following questions in your submission. A couple sentences for each point could be sufficient.
- Which terminology is more understandable to you, and why?
- Which terminology do you think would be easier to add the remaining 31 types of pizza into, and why?
- Can you name one advantage and one disadvantage of each terminology vs the other?
-
Can you think of any ways to improve Terminology 2?Don’t stress too much on this one. This is one of the overall topics of the course.
- Can you think of another way you might organize these concepts?
That’s all for this lab!
◼️
Software suggestions
An easy way to test access is via the MySQL command line interface. This is available for Linux/Mac package managers, and is easily installed with Homebrew, APT, pacman, and probably any other package manager you may be using. Windows users could install the MySQL shell.
A more familiar alternative is to use a GUI application like Sequel Pro, DataGrip, DBeaver, or MySQL Workbench.
For programmatic access, I’d suggest any tool like R’s dbplyr
, DBI
and RMariaDB
, Python’s sqlalchemy
and mysqlclient
.
Choose whatever you find most comfortable. You can complete every lab with any of these (or other) tools.
MySQL CLI example
The command line client is probably the easiest thing to get installed.
Once installed, use the address, port, username and password I posted on Canvas to run the following, substituting things directly.For example, substitute $ADDRESS
directly to ABC123.rds.amazonaws.com
.
mysql -h $ADDRESS -P $PORT -u $USERNAME -p
You’ll be prompted to enter the password.
If your access was successful, you’ll see a MySQL prompt like mysql>
.
Select the database using
mysql> USE clinical_vocabulary;
Then run your query.