CSC483 HW05

CSC 483: Homework 5, Due Feb 10, 2026, at 1 pm

1. Begin your project notebook (2 pts + 2 pts in PR)

  1. Make a copy of the project guidelines:

https://colab.research.google.com/drive/1xqpoZnPbJjISUX3gsE59iw5CO-XQ6nNc?usp=sharing

  2. Read through the guidelines. You can then delete the markdown cells containing instructions, since they are available on the course website.

  3. Load whichever dataset and metadata from first_data or second_data you are most interested in. Be careful of data sizes!

  4. Add an exploratory plot, such as a PCA or t-SNE, with a level of finesse similar to what we plotted in class (it does not need to be perfect).

  5. Add a markdown cell with notes that do your best to describe why and how we make PCA and t-SNE plots.
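
One possible way to act on the data-size warning above is to preview a file before loading all of it. This is only a sketch: the in-memory CSV text stands in for a real file path, and the `gene` column and `SRX001`-style sample names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large RNA-seq table on disk (genes as rows,
# samples as columns); in a notebook this would be a file path instead.
csv_text = "gene,SRX001,SRX002,SRX003\n" + "\n".join(
    f"gene{i},{i},{i * 2},{i * 3}" for i in range(1000)
)

# Peek at the first few rows only, so the whole file is not read into memory.
preview = pd.read_csv(io.StringIO(csv_text), index_col=0, nrows=5)
print(preview.shape)

# Load the full table, then report how much memory it occupies.
full = pd.read_csv(io.StringIO(csv_text), index_col=0)
size_mb = full.memory_usage(deep=True).sum() / 1e6
print(full.shape, f"{size_mb:.3f} MB")
```

If the preview looks right but the full table is too large for your session, the smaller filtered datasets are likely the safer choice.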

Note: This may seem repetitive of what we did in class; feel free to copy and paste your own work from notebook to notebook. Starting a new notebook gives you the chance to create a more organized analysis that combines elements of first_data and second_data without some of the extra bits I added as examples. Most notably, you only have to load the datasets you are actually examining, rather than the multiple metadata files and multiple RNA-seq files I've loaded in first_data and second_data. You do not need to complete all of the following, but for reference, recall the in-class instructions from the second_data notebook:

Decision points

  1. use smaller rnaseq datasets (filtered 6k x 16k) OR use gene P/A
  2. reduce dim of genes OR dim of samples (transpose or not to transpose)
  3. connect to metadata of samples OR of genes
  • reducing by genes means plotting samples, and vice versa
  • merge rnaseq with sample metadata, use SRX numbers
  • merge gene P/A with annotation, use first_name_comp
  • merge rnaseq with gene annotation, use first_name_comp
  4. Choose a color scheme
  5. Does the DR technique separate out groups, as seen by color?
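
The transpose and merge decisions above can be sketched on toy data. Everything here is an assumption for illustration: the table is random, the `SRX` and `condition` column names are made up, and the `pca_2d` helper is a hypothetical plain-NumPy PCA (via SVD) standing in for whatever PCA routine we used in class.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy expression table: genes as rows, samples (SRX IDs) as columns.
samples = ["SRX1", "SRX2", "SRX3", "SRX4", "SRX5", "SRX6"]
genes = [f"gene{i}" for i in range(50)]
expr = pd.DataFrame(rng.normal(size=(50, 6)), index=genes, columns=samples)

# Toy sample metadata keyed by SRX number ("condition" is a made-up column).
meta = pd.DataFrame({"SRX": samples, "condition": ["ctrl"] * 3 + ["treated"] * 3})


def pca_2d(table):
    """Project the rows of `table` onto their first two principal components."""
    values = table.to_numpy()
    centered = values - values.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * s[:2]


# Reducing the gene dimension: transpose so that each row (point) is a sample.
sample_scores = pd.DataFrame(
    pca_2d(expr.T), index=expr.columns, columns=["PC1", "PC2"]
)

# Attach sample metadata by SRX number so points can be colored by group.
sample_plot = sample_scores.merge(meta, left_index=True, right_on="SRX")

# Reducing the sample dimension instead: no transpose, each point is a gene.
gene_scores = pd.DataFrame(pca_2d(expr), index=expr.index, columns=["PC1", "PC2"])

print(sample_plot.shape, gene_scores.shape)
```

Which table you merge with depends on what the points are: sample points pair with sample metadata, while gene points would pair with a gene annotation table instead.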

Mini-steps

  1. color all points on PCA NOT blue
  2. color half the points one color, half another
  3. color the first sample a unique color
  4. load metadata (as we did in first_data)
  5. turn pca from numpy array to pd df
  6. figure out rownames for PCA df, based on the column names of df4
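
A minimal sketch of the mini-steps above, assuming `df4` holds genes as rows and samples as columns. The data are random, the SRX names are placeholders, and the PCA is a plain-NumPy stand-in (via SVD) for the routine used in class; the color column names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Stand-in for df4: genes as rows, samples as columns (values are random).
df4 = pd.DataFrame(
    rng.normal(size=(30, 4)), columns=["SRX1", "SRX2", "SRX3", "SRX4"]
)

# Stand-in for the PCA output: a plain NumPy array with one row per sample.
values = df4.T.to_numpy()
centered = values - values.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pca_array = u[:, :2] * s[:2]

# Mini-steps 5 and 6: wrap the array in a DataFrame whose row names are the
# column names of df4, since each PCA row corresponds to one sample.
pca_df = pd.DataFrame(pca_array, index=df4.columns, columns=["PC1", "PC2"])

n = len(pca_df)
half = n // 2

# Mini-step 1: every point gets the same color, and it is not blue.
pca_df["all_one_color"] = ["orange"] * n

# Mini-step 2: the first half of the points one color, the rest another.
pca_df["half_and_half"] = ["orange"] * half + ["purple"] * (n - half)

# Mini-step 3: the first sample gets a unique color.
pca_df["first_unique"] = ["red"] + ["gray"] * (n - 1)

print(pca_df)
```

Any of the color columns can then be passed to the `c` argument of matplotlib's `scatter` along with `pca_df["PC1"]` and `pca_df["PC2"]`. Once metadata is loaded (mini-step 4), a color column can be built from a metadata group instead of by position.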

2. Update Check-in Notebook (1 pt)

  1. Move your reflections from HW4 into your check-in notebook.

  2. Review your check-in notebook and scores for all previous HW assignments.

  3. Include a Markdown chunk that either attests to the accuracy of my notes or requests a correction/update.