Getting Started with Version Control

Introduction

In the previous post I provided an overview of version control for managing computer files. In this post I’ll continue the brief hiatus from topics in data management to show readers how to get started with version control. Since version control and data management are best learned by doing, the objective of this and future posts is to give readers hands-on experience with version control and with managing and analyzing data in PostgreSQL. Future posts will build on this one, so if you’re interested in using this blog as a learning tool, please follow along on your personal computer. I’ll provide instructions for both Windows and MacOS.

Getting Started

Below are instructions for quickly getting set up with Git and GitHub and cloning a project. In the examples provided I use GitHub. However, the steps involved are generally the same for other online repositories like GitLab. Chapter 2 “Git Basics” of Pro Git by Chacon and Straub (2020) provides detailed information about getting started with version control.

If you have trouble successfully completing any of the steps outlined below, or notice any errors on this page, please leave a comment on this post, or send me a message on the Contacts page.

The full documentation and other reference materials are available on the Git and GitHub websites at the following links:

Text Editor

A good text editor is invaluable for a data scientist. Basic text editors come with both Windows (Notepad) and Mac OS (TextEdit and Nano). However, there are a number of freely available advanced text editors. For Windows I recommend Notepad++, and for Mac OS I recommend Atom. Both are very user-friendly and easy to learn. The slick thing about Atom is that it was created by the folks who manage GitHub, and thus integrates seamlessly with it. If you really want to get serious about a text editor, and are willing to put in the time to learn, then check out Vim.

Installing Git

To begin you’ll need to install git on your personal computer. To install git go to https://git-scm.com/downloads and then follow the instructions below.

MacOS X

If you’re running MacOS X, you may already have git on your machine. To check, open a terminal window, type “git --version”, and press Enter. If the result is something like “git version 2.14.2”, then you already have git.
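As a rough sketch, the same check can be written as a small shell function that should work in both the MacOS terminal and Git Bash. The function name check_git and the “git not found” message are made up for this illustration; only “git --version” comes from git itself.

```shell
# Hypothetical helper: report the installed git version, if any.
check_git() {
  if command -v git >/dev/null 2>&1; then
    # git is on the PATH, so print its version string.
    git --version
  else
    echo "git not found"
  fi
}

check_git
```

A line starting with “git version” means git is ready to use; “git not found” means it still needs to be installed.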

If git is not installed, do the following:

  1. Go to https://git-scm.com/downloads
  2. Click on “Mac OS X”
  3. Follow the instructions on the Download for macOS page.

Windows

Two options for installing git on Windows are provided below. For this and future posts I’ll be showing examples from the command line using Git Bash, so if you want to follow along, please use option 1 below.

Option 1: Git for Windows

  1. Go to https://gitforwindows.org/
  2. Click on “Download” and under assets click on the .exe file for your operating system (e.g., Git-2.28.0-32-bit.exe)
  3. On your computer double click on the .exe file and follow the instructions.

Option 2: Git SCM

  1. Go to https://git-scm.com/downloads
  2. Click on “Windows”; the download should begin automatically.
  3. Double click on the .exe file that downloaded (it should look something like: Git-2.28.0-32-bit.exe) and follow the instructions that are provided.
  4. Install one of the many Graphical User Interface (GUI) options.

Setting up GitHub

Figure 1. HelloWorld repository on the elfinwood-data-sci GitHub page. Note that I’ve set this up as a public repository so you can all see it without any user permissions.

Next, you’ll create an account on GitHub and create a new repository to use for learning Git.

  1. Go to https://github.com/
  2. Either log in if you already have an account, or sign up.
  3. Click on “Repositories” in the upper middle portion of the screen
  4. Click on “New” on the far right side of the screen
  5. Enter “HelloWorld” under Repository Name
  6. Enter the following description “A repository for learning git.”
  7. Select Private, and then check the checkboxes next to “Add a README file” and “Add .gitignore”
  8. Select “R” in the .gitignore template dropdown list
  9. Click “Create Repository”, and voilà, you have just created your first Git repository
  10. Your new repository page should look something like Figure 1.

Cloning a Project

Figure 2: Cloning a project from GitHub using HTTPS.

Now that you have a remote repository set up, the next step is to copy that repository to your local machine. The process of copying a repository is called “cloning”. Start by creating a folder on your local machine called “learning_data_science”. Save this folder wherever it’s convenient for you. To clone the HelloWorld repository, follow the instructions below.
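If you’d rather create the folder from a terminal (or Git Bash on Windows), a minimal sketch follows. It assumes the folder lives in your home directory, which is an arbitrary choice for the example and not a requirement; the PROJECT_DIR variable is likewise just an illustrative name.

```shell
# Assumption: keep the folder in your home directory; any location works.
PROJECT_DIR="$HOME/learning_data_science"

# -p avoids an error if the folder already exists.
mkdir -p "$PROJECT_DIR"

# Move into the folder so that later git commands run from here.
cd "$PROJECT_DIR"
pwd
```

With the terminal already in this folder, “git clone” will place the repository inside it.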

Figure 3: Cloning a repository using Git Bash command line in Windows.

MacOS X

  1. Open Finder and navigate to the folder above learning_data_science.
  2. Right click on the learning_data_science folder and then select “New Terminal at Folder”. A terminal window will open.
  3. Go to your HelloWorld repository on GitHub, and on the “<>Code” tab click on the green “Code” button (Figure 2, green arrow). A drop down will appear.
  4. Click on the copy icon (Figure 2, red arrow) to copy the URL under HTTPS.
  5. Go back to the terminal, type “git clone” followed by a space, paste in the URL, and then hit Enter.
  6. A local copy of the HelloWorld repository will be cloned into the learning_data_science folder.

Windows

  1. In Windows Explorer navigate into the learning_data_science folder.
  2. Right click in the folder, and then click on “Git Bash Here”. A command line interface will open.
  3. Go to your HelloWorld repository on GitHub, and on the “<>Code” tab click on the green “Code” button (Figure 2, green arrow). A drop down will appear.
  4. Click on the copy icon (Figure 2, red arrow) to copy the URL under HTTPS.
  5. In the command line window, type “git clone” followed by a space, paste in the URL, and then hit Enter (Figure 3).
  6. A local copy of the HelloWorld repository will be cloned into the learning_data_science folder.

# Cloning a repository in a Mac OS terminal
learning_data_science % git clone https://github.com/elfinwood-data-sci/HelloWorld.git
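
Once the clone finishes, it can be worth confirming what was copied. The sketch below wraps the clone and a few inspection commands in a shell function; the name clone_and_inspect is hypothetical, and the folder name is assumed to be the last part of the URL without “.git”, which is git’s default when no folder is given.

```shell
# Hypothetical helper: clone a repository, then show what was copied.
# The parentheses run the body in a subshell, so the "cd" below does
# not change the folder of the terminal you call it from.
clone_and_inspect() (
  url="$1"
  # Folder name git uses by default: last part of the URL minus ".git".
  dir="$(basename "$url" .git)"

  git clone "$url" || exit 1
  cd "$dir" || exit 1

  git remote -v   # the URL the local copy is linked to
  git status      # current branch and any uncommitted changes
  ls -a           # files in the folder, including the hidden .git folder
)

# Example call, using the public repository from this post:
# clone_and_inspect https://github.com/elfinwood-data-sci/HelloWorld.git
```

For the HelloWorld repository created above, the listing should include the README and .gitignore files that GitHub added.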

Recommended Reading

As a supplement to this post, I encourage you to read section 2.1 of Chapter 2 in Pro Git: Everything you need to know about Git.

Next Time on Elfinwood Data Science Blog

In this post I provided a quick start guide to get up and running with version control. In the next post I’ll continue the brief hiatus from data management and focus on managing files and versioning using git, including adding, committing, pushing, pulling, and viewing diffs. If you like this post then please consider subscribing to this blog (see below) or following me on social media.

Literature Cited

Chacon S. and B. Straub. 2020. Pro Git: Everything you need to know about Git. Version 2.1.264. Apress. New York, NY. 521 pp. Online here: https://git-scm.com/book/en/v2 (accessed 2020-09-26).



Copyright © 2020, Aaron Wells
