CSC483 HW05

CSC 483: Homework 5, Due Feb 10, 2026, at 1 pm

1. Begin your project notebook (2 pts + 2 pts in PR)

Make a copy of the project guidelines:

https://colab.research.google.com/drive/1xqpoZnPbJjISUX3gsE59iw5CO-XQ6nNc?usp=sharing

Read through the guidelines. You can then delete the markdown cells that contain instructions, since they remain available on the course website.
Load whichever dataset and metadata from first_data or second_data you are most interested in. Be careful of data sizes!
Add an exploratory plot, such as a PCA or t-SNE, at roughly the level of finesse of the plots we made in class (it does not need to be perfect).
Add a markdown cell with notes that do your best to describe why and how we make PCA and t-SNE plots.
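
As a starting point for the exploratory plot and the notes, here is a minimal sketch of both techniques. It assumes scikit-learn is available and uses a synthetic matrix in place of the course data; the matrix size, the two-group structure, and the perplexity value are all made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix: 40 samples (rows) x 200 genes (columns).
# Two hypothetical groups of 20 samples differ by a shift in the first 20 genes.
expr = rng.normal(size=(40, 200))
expr[20:, :20] += 3.0

# PCA: a linear projection onto the directions of greatest variance.
pca = PCA(n_components=2)
pca_coords = pca.fit_transform(expr)

# t-SNE: a nonlinear embedding that tries to keep neighboring points close.
# The perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
tsne_coords = tsne.fit_transform(expr)

print(pca_coords.shape, tsne_coords.shape)
print(pca.explained_variance_ratio_)
```

Each row of pca_coords and tsne_coords is one point to plot. PCA axes have a direct interpretation (share of variance explained), while distances between far-apart clusters in a t-SNE plot are generally not meaningful, which is worth mentioning in your notes.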

Note: This may seem repetitive of what we did in class, so feel free to copy and paste your own work from notebook to notebook. Starting a new notebook gives you the chance to create a more organized analysis that combines elements of first_data and second_data without some of the extra bits I added as examples. Most notably, you only have to load the datasets you are actually examining, rather than the multiple metadata files and multiple RNA-seq files I’ve loaded in first_data and second_data. You do not need to complete all of the following, but for reference, recall the in-class instructions from the second_data notebook:

Decision points

use smaller rnaseq datasets (filtered 6k x 16k) OR use gene P/A
reduce dim of genes OR dim of samples (transpose or not to transpose)
connect to metadata of samples OR of genes
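
The transpose decision above can be sketched as follows. This is an illustration only: it assumes a table with genes as rows and samples as columns, and the gene names, SRX labels, and random values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Hypothetical toy table: 50 genes (rows) x 8 samples (columns).
genes = [f"gene_{i}" for i in range(50)]
samples = [f"SRX{i:03d}" for i in range(8)]
df = pd.DataFrame(rng.normal(size=(50, 8)), index=genes, columns=samples)

# Rows are what get plotted; columns are the dimensions being reduced.
per_sample = PCA(n_components=2).fit_transform(df.T)  # reduce genes: one point per sample
per_gene = PCA(n_components=2).fit_transform(df)      # reduce samples: one point per gene

print(per_sample.shape)  # (8, 2)
print(per_gene.shape)    # (50, 2)
```

Whichever axis ends up as the rows of the result is the one whose metadata you will need for coloring.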

reducing by genes means plotting samples, and vice versa
merge rnaseq with sample metadata, use SRX numbers
merge gene P/A with annotation, use first_name_comp
merge rnaseq with gene annotation, use first_name_comp
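
A minimal sketch of the sample-metadata merge, assuming pandas. The tables here are invented, and the column names "SRX" and "tissue" are hypothetical; check the real column names in your metadata file before merging.

```python
import pandas as pd

# Hypothetical PCA coordinates, one row per sample, indexed by SRX number.
pca_df = pd.DataFrame(
    {"PC1": [1.2, -0.4, 0.3], "PC2": [0.1, 0.9, -1.1]},
    index=["SRX001", "SRX002", "SRX003"],
)

# Hypothetical sample metadata; "SRX" and "tissue" are made-up column names.
meta = pd.DataFrame(
    {"SRX": ["SRX001", "SRX002", "SRX004"], "tissue": ["root", "leaf", "stem"]}
)

# Move the index into a column so both tables share a key, then left-merge
# so that every plotted sample is kept even if its metadata is missing.
pca_df = pca_df.rename_axis("SRX").reset_index()
merged = pca_df.merge(meta, on="SRX", how="left", indicator=True)

print(merged)
print(merged["_merge"].value_counts())
```

The indicator column shows which samples found a metadata match, which is a quick way to catch mismatched identifiers. The gene-level merges follow the same pattern with first_name_comp as the key.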

Choose a color scheme
Does DR technique separate out groups, as seen by color?

Mini-steps

color all points on the PCA plot a color other than blue
color half the points one color, half another
color the first sample a unique color
load metadata (as we did in first_data)
turn the PCA output from a NumPy array into a pandas DataFrame
figure out rownames for the PCA df, based on the column names of df4
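
Several of the mini-steps above can be sketched together as follows. This assumes pandas and matplotlib, and it assumes the PCA was run on the transpose of df4, so that the PCA output has one row per column of df4. The stand-in data, SRX labels, and color choices are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Stand-in for df4: genes as rows, samples as columns (hypothetical SRX labels).
df4 = pd.DataFrame(rng.normal(size=(30, 6)),
                   columns=[f"SRX{i:03d}" for i in range(6)])

# Stand-in for the PCA output: a NumPy array with one row per sample.
pca_array = rng.normal(size=(6, 2))

# NumPy array -> DataFrame; the row names come from the column names of df4.
pca_df = pd.DataFrame(pca_array, columns=["PC1", "PC2"], index=df4.columns)

# Half the points one color, half another, then a unique color for the first sample.
n = len(pca_df)
colors = ["tab:orange"] * (n // 2) + ["tab:green"] * (n - n // 2)
colors[0] = "tab:red"
pca_df["color"] = colors

fig, ax = plt.subplots()
ax.scatter(pca_df["PC1"], pca_df["PC2"], c=pca_df["color"].tolist())
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
```

Once the row names match the sample identifiers, the hard-coded color list can be replaced by a column derived from the merged metadata.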

2. Update Check-in Notebook (1 pt)

Move your reflections from HW4 into your check-in notebook.
Review your check-in notebook and scores for all previous HW assignments.
Include a Markdown cell that either confirms my notes are accurate or requests a correction/update.