Note: You can get a lot of this information from Hadley Wickham’s “R for Data Science” book (Chapter 8) and the Happy Git with R tutorials at https://happygitwithr.com/.
Do this before you start this section!
- Click on the following link to download this file: SampleData.csv
- Save this file somewhere on your computer where you can find it.
RStudio projects support an important “best practice” for developing research code: for each project, create one folder that contains all of your data, scripts, outputs (e.g., plots), and other assets.
Additionally, I highly recommend that this “project directory” folder includes a subfolder called “data” that contains all of the data files, and a “figures” folder (or some other name) that will contain all of the output plots, etc.
Setting up a project directory in this way allows you to keep everything organized and up to date, and it also helps with collaborations or sharing code, because it is easy to follow where things are. It is also nice for your “future self”–if you come back to a project after some time, it is easy to pick up where you left off without wondering where you left all of the relevant files.
Finally, the most important reason to set up an RStudio project folder is that it allows anyone to run the code you develop in this folder, because all of your pathnames will be “relative” to the parent directory. Using relative pathnames makes your code much more reproducible and enables collaboration.
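The folder layout described above can also be created from R rather than by hand. The following is a minimal sketch, not a required step: the helper name make_project_skeleton() is made up for illustration, and the project is created in a temporary location so that nothing on your computer is changed.

```r
# Hypothetical helper (not part of any package): create a project folder
# with the recommended subfolders.
make_project_skeleton <- function(root, subdirs = c("data", "figures")) {
  for (d in file.path(root, subdirs)) {
    # recursive = TRUE also creates the project folder itself if needed
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # Return the names of the subfolders that now exist in the project folder
  invisible(list.dirs(root, full.names = FALSE, recursive = FALSE))
}

# Throwaway location for the demo; in practice, use your own project folder
project_dir <- file.path(tempdir(), "trial")
created <- make_project_skeleton(project_dir)
print(created)
```

For a real project, replace project_dir with the folder you want to use and add any other subfolders (e.g., “scripts”) to the subdirs argument.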
Here is what I recommend for every project you start:
Try this out by creating a “trial” folder somewhere on your computer.
Next, in RStudio, click on “File” > “New Project”.
Click on “Existing Directory”, click “Browse” and select the “trial” folder you created, then click “Create Project”.
Now, you will see that there is a new “trial.Rproj” file in your folder.
Open that .Rproj file. This will open up a new RStudio window.
Now, whenever you create any file within this R Project, the files will automatically go into this folder. Also, if you want to import any files, you can just put in the file name instead of the whole path.
Let’s try this out:
Go find the “SampleData.csv” file that you downloaded earlier and drop it into a subfolder called “data” in your “trial” project folder.
Now click on the “refresh” button on the far right side of your “Files” window on the bottom-right part of your RStudio window.
You should now see that there is a “data” folder in your project directory, and that “SampleData.csv” is in that folder.
Now, you can use read.csv() again to read your file. But this time, you only need a relative pathname: that is, you only need to put in "data/SampleData.csv" to tell R to look for “SampleData.csv” inside the “data” folder, instead of the whole pathname on your computer.
Let’s call this version of the data dat2:
# Read the file using a path relative to the project folder
dat2 <- read.csv("data/SampleData.csv")
dat2
## Indiv.ID sex age size weight
## 1 20-01 male adult 29.5 66.5
## 2 20-02 male juvenile 28.0 58.5
## 3 20-03 female adult 26.0 57.0
## 4 20-04 female adult 25.0 55.5
## 5 20-05 female juvenile 25.0 62.0
## 6 20-06 male juvenile 28.0 61.0
## 7 20-07 female adult 26.0 58.0
## 8 20-08 male adult 28.5 65.0
## 9 20-09 male juvenile 27.5 60.0
## 10 20-10 female juvenile 26.0 59.0
## 11 20-11 male adult 25.5 62.0
## 12 20-12 female adult 27.0 59.0
## 13 20-13 female adult 26.5 60.0
The beauty of this system is that, as long as the project directory structure is the same, you can run this exact same code to import the data on any computer. In turn, that means that this entire project and code are portable in multiple ways. You could:
send a collaborator your whole project directory, or
copy the project directory on a thumbdrive, or
sync the project directory on DropBox, OneDrive, or other cloud storage that syncs with multiple computers…
and you should be able to run the code as-is.
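To see this portability in action, here is a small sketch that builds a project in one temporary folder, copies it to a second location, and runs the same relative-path code in both. The two-row data frame is only a stand-in copied from the first rows of the printed output above, and the helper read_sample() and the folder names trial_a and trial_b are made up for illustration.

```r
# Hypothetical helper: run the same relative-path import from inside a project
read_sample <- function(project_dir) {
  old <- setwd(project_dir)   # pretend we opened the project here
  on.exit(setwd(old))         # restore the original working directory
  read.csv("data/SampleData.csv")
}

# Stand-in for SampleData.csv (first two rows of the output shown above)
stand_in <- data.frame(
  Indiv.ID = c("20-01", "20-02"),
  sex      = c("male", "male"),
  age      = c("adult", "juvenile"),
  size     = c(29.5, 28.0),
  weight   = c(66.5, 58.5)
)

# Project copy A: create the folder structure and save the data file
proj_a <- file.path(tempdir(), "trial_a")
dir.create(file.path(proj_a, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(stand_in, file.path(proj_a, "data", "SampleData.csv"), row.names = FALSE)

# Project copy B: same structure, different location (like another computer)
proj_b <- file.path(tempdir(), "trial_b")
dir.create(proj_b, showWarnings = FALSE)
file.copy(file.path(proj_a, "data"), proj_b, recursive = TRUE)

# The identical code works in both places
dat_a <- read_sample(proj_a)
dat_b <- read_sample(proj_b)
same_data <- identical(dat_a, dat_b)
print(same_data)
```

The import line never mentions where the project lives on disk, which is why it keeps working after the folder is moved or copied.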
However, the most elegant way to have your project run on multiple computers across multiple collaborators is to use a version control system such as Git, together with a hosting service such as GitHub…
Version control is a system by which you can keep track of changes to collaborative projects. One common example is Google Docs, where multiple people can collaborate on an online document. Importantly, Google Docs automatically saves versions of the document when changes are made, and users can revert to earlier versions if they want.
Git is open-source software that facilitates version control of files in repositories (which is another way of saying project directories).
GitHub is a service that hosts Git-based projects. There are other popular similar services, such as Bitbucket and GitLab.
For practical purposes, this system will help me teach this class. If we do this right, it will help tremendously with the process of troubleshooting your code when we get to independent projects! It will allow me to access your projects on my computer, help make edits, and keep track of those edits.
The larger reason is that this will help you with your research. You will no doubt be using R (and perhaps other coding languages) for your research, and this enables a workflow that is portable from one computer/collaborator to another.
It will expand the scope of your work. Once you make a project repository, you can make it private or public. If you make a public repository, then you can share it with colleagues. Likewise, you can fork any other public repository–this is increasingly the way people disseminate new software or packages for cutting-edge analysis techniques.
It could be a nice thing to add to your CV
Here, we are following the directions in Chapters 6-13 of the Happy Git with R website (https://happygitwithr.com/).
Here, I highly recommend following the directions on Chapter 6 of Happy Git with R: https://happygitwithr.com/install-git
To take from their directions:
If you are using a Mac, download Xcode from the App Store or here: https://developer.apple.com/xcode/. This includes Git and also may come in handy later.
If you are using Windows, download Git for Windows.
When linking a GitHub repository with RStudio, you will need to be able to authenticate your connection. To increase security, GitHub no longer supports access using a simple password. Instead, it requires you to authenticate using a personal access token. You need to generate a token, and then you will use this later to connect RStudio with your repository.
(The longer way to go is to sign in to github, Click on your avatar on the top-right corner, then go to Settings > Developer settings (at the very bottom) > Personal access tokens > Tokens (classic))
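If you prefer to start from R, the Happy Git with R tutorials describe a route through the usethis and gitcreds packages. The sketch below assumes both packages are installed; the variable can_set_up_token is just an illustrative guard so that nothing runs in a non-interactive session or when a package is missing.

```r
# Sketch only: both calls below are interactive, so they are skipped when
# this runs non-interactively or when the packages are not installed.
can_set_up_token <- interactive() &&
  requireNamespace("usethis", quietly = TRUE) &&
  requireNamespace("gitcreds", quietly = TRUE)

if (can_set_up_token) {
  # Opens GitHub's "new personal access token" page in your browser
  usethis::create_github_token()

  # Prompts you to paste the token and stores it for later Git operations
  gitcreds::gitcreds_set()
}
```

Copy the token from the browser page before closing it, because GitHub shows it only once.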
Go to the BIOS967 organization on GitHub: https://github.com/BIOS967
At the top of the page, click on “Repositories”
Click on “BIOS967_Fall2023” (for the Fall 2023 semester)
Now, click on the green button that says “<>Code”
Copy the clone URL to your clipboard. Use the HTTPS URL.
In RStudio, start a new project: File > New Project > Version Control > Git. In “Repository URL”, paste the URL of your new GitHub repository.
Determine where your project will be saved locally. Be intentional. I have all of my GitHub repositories saved in my Documents folder in a subfolder called “GitHub”.
Check “Open in new session”
Click “Create Project”.
At some point in this process, you will be asked to authenticate. At that point, you will enter your GitHub account name, and then enter the Personal Access Token that you created in step 3.2.4.
Congratulations! Now you have successfully cloned a GitHub repository on your computer.
This means that, whenever I make an update or create new files on this repository, you can download those changes by “pulling” this repository.
Let’s try that out. I’m going to make some change on the repository and upload (“commit” and “push”) those changes up to GitHub. Then, I want you to “pull” those changes.
To pull changes from a GitHub repository, click on the “Git” tab on the top right window in RStudio, and click the “Pull” button.
Once you have finished the setup process, you are ready to start your workflow!
First, notice that you should now have a tab called “Git” in the “Environment” window in RStudio (typically upper-right). If you click on that tab, it’ll look like this:
The Git workflow may feel a bit painful for you in the beginning because it seems a lot more manual and tedious than automated syncing that you may have gotten used to with Dropbox, Google Drive, etc. But really, it’s no different than those systems–it’s just that Git makes you much more intentional about when you update “local” vs. “remote” versions of your project files.
Here are the 4 main actions that you will take whenever you work with your GitHub repository:
Pull: When you open your project on your computer, I highly recommend that you always first click “Pull”. This takes the version of your repository that is stored remotely on GitHub and pulls it to your local folder. Thus, you have synced your computer with whatever changes other people (e.g., your instructor) have made to your repository. Get used to pulling from your repository when you first open your project, even if you haven’t made any changes since last time, because I may have suggested changes for you!
Stage: Whenever you make any changes to files in your project directory (e.g., add, delete, or edit a file), Git detects those changes and lists them in the Git tab. Staging a change means marking it for inclusion in your next commit; in RStudio, you do this by checking the “Staged” box next to the file.
Commit: When you are done making a set of changes and you are ready to upload those changes to GitHub, you first need to commit those changes. When you do this, you will write a “commit message”, i.e., a brief description of the changes that you have made. Git will make you do this: it won’t let you commit unless you’ve written something. This message will go in your change history, and it can be a nice way for you (and your collaborators) to understand what changes you have made.
Push: Once you have committed your changes, you are ready to push those changes onto your remote repository–i.e., the copy that is stored in GitHub.
First, click “Pull” in the Git window. This will pull any changes that are stored in the remote repository (i.e., on GitHub) that you don’t already have in your local version.
It will most likely say “Already up to date.” But again, get in the habit of doing this when you open your project.
Now you can make any change on your project. Let’s just start by editing the readme file.
On the bottom right window, click on the “Files” tab and select the readme.txt file.
Add some text–e.g., “Here is the first change in my repository”.
Save the readme.txt file.
Now go to the Git window (top right window). You should see the change reflected in the window.
Click on “Commit” in the Git window
A new window will pop out. Check all the boxes that represent each change you’ve made.
Check the “Staged” box. Then, write a commit message. Something like, “add a line to readme file”.
Press “Commit”.
Click “Push” to upload the changes to the remote repository on GitHub.
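The buttons in the Git window run ordinary Git commands behind the scenes. As a rough sketch of the same stage, commit, push, and pull cycle, the code below calls Git from R with system2() on a throwaway repository. Everything here is an assumption made for illustration: the helper run_git(), the folder names, the demo user name and email, and a local “bare” repository standing in for GitHub. It only runs if Git is installed.

```r
# Hypothetical helper: run a git command inside a given folder and
# capture its messages instead of printing them.
run_git <- function(dir, ...) {
  system2("git", c("-C", shQuote(dir), ...), stdout = TRUE, stderr = TRUE)
}

n_remote_commits <- NA_character_

if (nzchar(Sys.which("git"))) {
  base       <- tempfile("git_demo_")
  remote_dir <- file.path(base, "remote.git")  # stand-in for the copy on GitHub
  local_dir  <- file.path(base, "local")       # stand-in for your project folder
  dir.create(base)

  # One-time setup: create the "remote" repository and clone it
  system2("git", c("init", "--bare", "--quiet", shQuote(remote_dir)),
          stdout = TRUE, stderr = TRUE)
  system2("git", c("clone", "--quiet", shQuote(remote_dir), shQuote(local_dir)),
          stdout = TRUE, stderr = TRUE)

  # Make a change, as in the readme.txt step above
  writeLines("Here is the first change in my repository",
             file.path(local_dir, "readme.txt"))

  # Stage: mark the change for inclusion in the next commit
  run_git(local_dir, "add", "readme.txt")

  # Commit: record the staged change with a message
  # (the -c options supply a demo identity for this one command only)
  run_git(local_dir,
          "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
          "-c", "commit.gpgsign=false",
          "commit", "--quiet", "-m", shQuote("add a line to readme file"))

  # Push: upload the commit to the remote copy
  run_git(local_dir, "push", "--quiet", "-u", "origin", "HEAD")

  # Pull: ask the remote for anything new (nothing, in this demo)
  print(run_git(local_dir, "pull"))

  # Check how many commits the remote copy now holds
  n_remote_commits <- run_git(remote_dir, "rev-list", "--count", "--all")
}
```

With a real GitHub repository, the clone step replaces the local “bare” repository with the HTTPS URL you copied, and the push and pull steps ask for your personal access token.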
There are many Git clients out there. Personally, I just use GitHub Desktop.
Git clients provide a graphical user interface (GUI) that facilitates committing, pushing, pulling, and other tasks. They also allow you to see the history of changes to your repository.
To install GitHub Desktop, go to https://desktop.github.com/ and follow their directions.
Let’s open a repository on GitHub Desktop. Here, we’ll start by using a repository that you’ve already cloned onto your computer. Then, on the right-hand side, you will be able to push the commit(s) to GitHub.
Open GitHub Desktop and go to File > Add Local Repository.
When it asks for the local path, click Choose and find the folder for BIOS967_Fall2023 repository that you downloaded earlier (or equivalent for the semester you’re in).
Once the local repository is added, any changes you make to your local files that are not yet pushed to GitHub will appear on the right-hand side of the window, and a list of changes will appear on the left-hand side. You can choose all or some of the changes and write a commit message and commit.
If you want to pull from GitHub, click “Fetch origin” on the top right-hand corner. If there are any changes to pull, then it will give you that option.
If you try it, you might find that you like this interface better than pulling, committing, and pushing from within RStudio. You are free to do it this way. It just requires you to have both the RStudio and GitHub Desktop apps open to update your code. Ultimately, you should choose the way that makes the most sense to you.
Log in to your GitHub account
Click on “Repositories” on the top ribbon
Then click the big green “New repository” button
Repository name: BIOS967_YourLastName
Please follow this template for naming your repository; it allows me to easily find your repo on my end.
Preferably, select “Public”. But if you select “Private”, you will then have to add me to the repository once you’ve created it!
Initialize this repository with: Check Add a README file.
Click on “Create Repository”
Now, clone the repository on your computer following the directions in section 3.2.5 above.
Finally, add me as a collaborator on your project.
Once you’ve created and cloned your repository, I suggest you make a “data” subfolder and “scripts” subfolder so that you can get organized with your files from the beginning!