Version Control (Git)
Version control systems (VCSs) are tools used to track changes to source code (or other collections of files and folders). As the name implies, these tools help maintain a history of changes; furthermore, they facilitate collaboration. VCSs track changes to a folder and its contents in a series of snapshots, where each snapshot encapsulates the entire state of the files and folders within a top-level directory. VCSs also maintain metadata such as who created each snapshot, the messages associated with each snapshot, and so on.
Why is version control useful? Even when you’re working by yourself, it lets you look at old snapshots of a project, keep a log of why certain changes were made, work on parallel branches of development, and much more. When working with others, it’s an invaluable tool for seeing what other people have changed, as well as for resolving conflicts in concurrent development.
Modern VCSs also let you easily (and often automatically) answer questions like:
- Who wrote this module?
- When was this particular line of this particular file edited? By whom? Why was it edited?
- Over the last 1000 revisions, when/why did a particular unit test stop working?
While other VCSs exist, Git is the de facto standard for version control. This XKCD comic captures Git’s reputation:
Because Git’s interface is obscure, learning Git top-down, by memorizing its commands, can lead to a lot of confusion. It’s possible to learn a small set of commands and use them, and, whenever you hit a situation you can’t handle, to fall back on the approach in the comic.
While Git admittedly has an ugly interface, its underlying design and ideas are beautiful. While an ugly interface has to be memorized, a beautiful design can be understood. For this reason, we give a bottom-up explanation of Git, starting with its data model and later covering the command-line interface. Once the data model is understood, the commands can be better understood in terms of how they manipulate the underlying data model.
Git’s data model
There are many approaches you could take to version control. Git has a well-thought-out model that enables all the nice features of version control, like maintaining history, supporting branches, and enabling collaboration.
Snapshots
Git models the history of a collection of files and folders within some top-level directory as a series of snapshots. In Git terminology, a file is called a “blob”, and it’s just a bunch of bytes. A directory is called a “tree”, and it maps names to blobs or trees (so directories can contain other directories). A snapshot is the top-level tree that is being tracked. For example, we might have a tree as follows:
<root> (tree)
|
+- foo (tree)
| |
| + bar.txt (blob, contents = "hello world")
|
+- baz.txt (blob, contents = "git is wonderful")
The top-level tree contains two elements, a tree “foo” (that itself contains one element, a blob “bar.txt”), and a blob “baz.txt”.
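As a rough illustration (not Git’s actual storage format), the example tree can be modeled in Python with nested dictionaries: a dict plays the role of a tree and a bytes value plays the role of a blob. The helper name list_paths is hypothetical and only exists for this sketch.

```python
# Sketch only: a dict stands in for a Git "tree", bytes for a "blob".
root = {
    "foo": {
        "bar.txt": b"hello world",
    },
    "baz.txt": b"git is wonderful",
}


def list_paths(tree, prefix=""):
    """Return (path, kind) pairs for every entry reachable from a tree."""
    entries = []
    for name, child in sorted(tree.items()):
        path = prefix + name
        if isinstance(child, dict):
            # A nested dict is a sub-tree, so recurse into it.
            entries.append((path, "tree"))
            entries.extend(list_paths(child, path + "/"))
        else:
            entries.append((path, "blob"))
    return entries


for path, kind in list_paths(root):
    print(kind, path)
```

Because trees map names to either blobs or other trees, the traversal is naturally recursive.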
Modeling history: relating snapshots
How should a version control system relate snapshots? One simple model would be to have a linear history. A history would be a list of snapshots in time-order. For many reasons, Git doesn’t use a simple model like this.
In Git, a history is a directed acyclic graph (DAG) of snapshots. That may sound like a fancy math word, but don’t be intimidated. All this means is that each snapshot in Git refers to a set of “parents”, the snapshots that preceded it. It’s a set of parents rather than a single parent (as would be the case in a linear history) because a snapshot might descend from multiple parents, for example, due to combining (merging) two parallel branches of development.
Git calls these snapshots “commit”s. Visualizing a commit history might look something like this:
o <-- o <-- o <-- o
            ^
             \
              --- o <-- o
In the ASCII art above, the o’s correspond to individual commits (snapshots).
The arrows point to the parent of each commit (it’s a “comes before” relation,
not “comes after”). After the third commit, the history branches into two
separate branches. This might correspond to, for example, two separate features
being developed in parallel, independently from each other. In the future,
these branches may be merged to create a new snapshot that incorporates both of
the features, producing a new history that looks like this, with the newly
created merge commit as the rightmost node in the top row:
o <-- o <-- o <-- o <---- o
            ^            /
             \          v
              --- o <-- o
Commits in Git are immutable. This doesn’t mean that mistakes can’t be corrected, however; it’s just that “edits” to the commit history are actually creating entirely new commits, and references (see below) are updated to point to the new ones.
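To make the graph structure concrete, here is a minimal Python sketch of the merged history from the diagram above. The Commit class, the history helper, and the commit names (c1 to c4, f1, f2, merge) are all hypothetical; real commits also carry a snapshot and metadata. The frozen dataclass mirrors the fact that commits are immutable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a Git commit: a message plus parent commits."""
    message: str
    parents: tuple = ()


def history(commit):
    """Collect the messages of a commit and all of its ancestors (each once)."""
    seen, stack, messages = set(), [commit], []
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue  # reachable through more than one path in the DAG
        seen.add(id(current))
        messages.append(current.message)
        stack.extend(current.parents)
    return messages


# Top row of the diagram: four commits in a line.
c1 = Commit("c1")
c2 = Commit("c2", (c1,))
c3 = Commit("c3", (c2,))
c4 = Commit("c4", (c3,))
# Bottom row: a branch that starts from the third commit.
f1 = Commit("f1", (c3,))
f2 = Commit("f2", (f1,))
# The merge commit has two parents, one from each branch.
merge = Commit("merge", (c4, f2))

print(sorted(history(merge)))
```

Note that each commit only knows its parents, which is why the arrows in the diagrams point backwards in time.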
Data model, as pseudocode
It may be instructive to see Git’s data model written down in pseudocode:
// a file is a bunch of bytes
type blob = array<byte>

// a directory contains named files and directories
type tree = map<string, tree | blob>

// a commit has parents, metadata, and the top-level tree
type commit = struct {
    parents: array<commit>
    author: string
    message: string
    snapshot: tree
}
It’s a clean, simple model of history.
Objects and content-addressing
An “object” is a blob, tree, or commit:
type object = blob | tree | commit
In Git’s data store, all objects are content-addressed by their SHA-1 hash.
objects = map<string, object>

def store(object):
    id = sha1(object)
    objects[id] = object

def load(id):
    return objects[id]
Blobs, trees, and commits are unified in this way: they are all objects. When they reference other objects, they don’t actually contain them in their on-disk representation, but have a reference to them by their hash.
For example, the tree for the example directory structure above (visualized
using git cat-file -p 698281bc680d1995c5f4caaf3359721a5a58d48d), looks like
this:
100644 blob 4448adbf7ecd394f42ae135bbeed9676e894af85 baz.txt
040000 tree c68d233a33c5c06e0340e4c224f0afca87c8ce87 foo
The tree itself contains pointers to its contents, baz.txt (a blob) and foo
(a tree). If we look at the contents addressed by the hash corresponding to
baz.txt with git cat-file -p 4448adbf7ecd394f42ae135bbeed9676e894af85, we get
the following:
git is wonderful
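The store/load pseudocode above can be turned into a small runnable Python sketch for blobs. Git does not hash the raw bytes alone: for a blob it hashes a short header ("blob", the content length, and a NUL byte) followed by the content. The in-memory objects dict and the helper name blob_id are assumptions made for this sketch, and so is the trailing newline in the example content, so we make no claim that the resulting ID matches the hash shown above.

```python
import hashlib

objects = {}  # in-memory stand-in for Git's object store


def blob_id(content: bytes) -> str:
    """Hash a blob as header + NUL + content, the scheme Git uses for blobs."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def store(content: bytes) -> str:
    """Save content under its own hash and return that hash."""
    object_id = blob_id(content)
    objects[object_id] = content
    return object_id


def load(object_id: str) -> bytes:
    return objects[object_id]


first = store(b"git is wonderful\n")
second = store(b"git is wonderful\n")
print(first == second)  # identical content yields the same ID
print(len(objects))     # so the store holds a single object
print(blob_id(b""))     # ID of the empty blob
```

Content-addressing means that storing the same bytes twice costs nothing extra, and that an ID always refers to exactly the same content.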
References
Now, all snapshots can be identified by their SHA-1 hashes. That’s inconvenient, because humans aren’t good at remembering strings of 40 hexadecimal characters.
Git’s solution to this problem is human-readable names for SHA-1 hashes, called
“references”. References are pointers to commits. Unlike objects, which are
immutable, references are mutable (can be updated to point to a new commit).
For example, the master reference usually points to the latest commit in the
main branch of development.
references = map<string, string>

def update_reference(name, id):
    references[name] = id

def read_reference(name):
    return references[name]

def load_reference(name_or_id):
    if name_or_id in references:
        return load(references[name_or_id])
    else:
        return load(name_or_id)
With this, Git can use human-readable names like “master” to refer to a particular snapshot in the history, instead of a long hexadecimal string.
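Here is a runnable Python version of the reference pseudocode, as a sketch. The object store is faked with two placeholder entries, and the abbreviated ID a1b2c3d is hypothetical. The point is that updating a reference changes only the name-to-ID mapping; the objects themselves are untouched.

```python
# Placeholder object store: IDs map to descriptions instead of real commits.
objects = {
    "5d83f9e": "older commit",
    "a1b2c3d": "newer commit",  # hypothetical ID, for illustration only
}
references = {}


def update_reference(name, object_id):
    references[name] = object_id


def load_reference(name_or_id):
    # A known name is resolved to an ID first; anything else is treated as an ID.
    object_id = references.get(name_or_id, name_or_id)
    return objects[object_id]


update_reference("master", "5d83f9e")
before = load_reference("master")
update_reference("master", "a1b2c3d")  # the reference moves; objects do not change
after = load_reference("master")
print(before, "->", after)
```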
One detail is that we often want a notion of “where we currently are” in the
history, so that when we take a new snapshot, we know what it is relative to
(how we set the parents field of the commit). In Git, that “where we
currently are” is a special reference called “HEAD”.
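The role of HEAD when taking a new snapshot can be sketched as follows. This is a toy model under stated assumptions: HEAD simply names the current branch, commit IDs are computed from a made-up serialization rather than Git’s real commit format, and the commit function is hypothetical.

```python
import hashlib

objects = {}
references = {"master": None}  # branch name -> commit ID (None: no commits yet)
HEAD = "master"                # HEAD names the branch we are currently on


def commit(message):
    """Create a commit whose parent is whatever HEAD currently resolves to."""
    parent = references[HEAD]
    parents = [] if parent is None else [parent]
    body = repr((parents, message)).encode("utf-8")
    commit_id = hashlib.sha1(body).hexdigest()  # toy ID, not Git's real format
    objects[commit_id] = {"parents": parents, "message": message}
    references[HEAD] = commit_id                # the current branch advances
    return commit_id


first = commit("initial commit")
second = commit("add feature")
print(objects[second]["parents"] == [first])
```

Each new commit records the previous position as its parent, and then the branch that HEAD names moves forward to the new commit.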
Repositories
Finally, we can define what (roughly) is a Git repository: it is the data
objects and references.
On disk, all Git stores are objects and references: that’s all there is to Git’s
data model. All git commands map to some manipulation of the commit DAG by
adding objects and adding/updating references.
Whenever you’re typing in any command, think about what manipulation the
command is making to the underlying graph data structure. Conversely, if you’re
trying to make a particular kind of change to the commit DAG, e.g. “discard
uncommitted changes and make the ‘master’ ref point to commit 5d83f9e”, there’s
probably a command to do it (e.g. in this case, git checkout master; git reset
--hard 5d83f9e).
Staging area
This is another concept that’s orthogonal to the data model, but it’s a part of the interface to create commits.
One way you might imagine implementing snapshotting as described above is to have a “create snapshot” command that creates a new snapshot based on the current state of the working directory. Some version control tools work like this, but not Git. We want clean snapshots, and it might not always be ideal to make a snapshot from the current state. For example, imagine a scenario where you’ve implemented two separate features, and you want to create two separate commits, where the first introduces the first feature, and the next introduces the second feature. Or imagine a scenario where you have debugging print statements added all over your code, along with a bugfix; you want to commit the bugfix while discarding all the print statements.
Git accommodates such scenarios by allowing you to specify which modifications should be included in the next snapshot through a mechanism called the “staging area”.
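The two-features scenario can be sketched in Python. This is a toy model, not how Git implements its index: working_dir, index, snapshots, and the add and commit helpers are hypothetical names. What matters is that a commit records the staging area, not the whole working directory.

```python
working_dir = {
    "feature_a.py": "print('feature A')",
    "feature_b.py": "print('feature B')",
}
index = {}      # the staging area: what the next snapshot will contain
snapshots = []  # committed snapshots, oldest first


def add(filename):
    """Stage the current contents of one file."""
    index[filename] = working_dir[filename]


def commit():
    """Record a snapshot of the staging area, not of the working directory."""
    snapshots.append(dict(index))


add("feature_a.py")
commit()  # the first commit contains only feature A
add("feature_b.py")
commit()  # the second commit adds feature B
print([sorted(s) for s in snapshots])
```

Both files existed in the working directory the whole time, yet the history still shows the two features arriving in separate commits.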
Git command-line interface
To avoid duplicating information, we’re not going to explain the commands below in detail. See the highly recommended Pro Git for more information, or watch the lecture video.
Basics
- git help <command>: get help for a git command
- git init: creates a new git repo, with data stored in the .git directory
- git status: tells you what’s going on
- git add <filename>: adds files to staging area
- git commit: creates a new commit
  - Write good commit messages!
  - Even more reasons to write good commit messages!
- git log: shows a flattened log of history
- git log --all --graph --decorate: visualizes history as a DAG
- git diff <filename>: show changes you made relative to the staging area
- git diff <revision> <filename>: shows differences in a file between snapshots
- git checkout <revision>: updates HEAD (and the current branch, if checking out a branch)
Branching and merging
- git branch: shows branches
- git branch <name>: creates a branch
- git checkout -b <name>: creates a branch and switches to it
  - same as git branch <name>; git checkout <name>
- git merge <revision>: merges into current branch
- git mergetool: use a fancy tool to help resolve merge conflicts
- git rebase: rebase set of patches onto a new base
Remotes
- git remote: list remotes
- git remote add <name> <url>: add a remote
- git push <remote> <local branch>:<remote branch>: send objects to remote, and update remote reference
- git branch --set-upstream-to=<remote>/<remote branch>: set up correspondence between local and remote branch
- git fetch: retrieve objects/references from a remote
- git pull: same as git fetch; git merge
- git clone: download repository from remote
Undo
- git commit --amend: edit a commit’s contents/message
- git reset HEAD <file>: unstage a file
- git checkout -- <file>: discard changes
Advanced Git
- git config: Git is highly customizable
- git clone --depth=1: shallow clone, without entire version history
- git add -p: interactive staging
- git rebase -i: interactive rebasing
- git blame: show who last edited which line
- git stash: temporarily remove modifications to working directory
- git bisect: binary search history (e.g. for regressions)
- .gitignore: specify intentionally untracked files to ignore
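To give a feel for the “binary search history” idea behind git bisect, here is a minimal Python sketch over a simplified, purely linear history. It is not how Git implements bisection (real histories are DAGs). The function first_bad, the commit names, and the is_bad check are all hypothetical; in practice the check would be something like running a unit test.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad is true.

    Assumes commits are ordered oldest to newest, the first is good,
    the last is bad, and once a commit is bad all later ones are too.
    """
    low, high = 0, len(commits) - 1  # low is known good, high is known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid
        else:
            low = mid
    return commits[high]


commits = [f"c{i}" for i in range(1, 1001)]  # hypothetical linear history
regression = 637                              # pretend c637 introduced the bug
checked = []                                  # records which commits were tested


def is_bad(commit):
    checked.append(commit)
    return int(commit[1:]) >= regression


print(first_bad(commits, is_bad), len(checked))
```

With 1000 revisions, the search needs only about ten checks instead of up to a thousand, which is why this approach is practical for hunting regressions.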
Miscellaneous
- GUIs: there are many GUI clients out there for Git. We personally don’t use them and use the command-line interface instead.
- Shell integration: it’s super handy to have a Git status as part of your shell prompt (zsh, bash). Often included in frameworks like Oh My Zsh.
- Editor integration: similarly to the above, handy integrations with many features. fugitive.vim is the standard one for Vim.
- Workflows: we taught you the data model, plus some basic commands; we didn’t tell you what practices to follow when working on big projects (and there are many different approaches).
- GitHub: Git is not GitHub. GitHub has a specific way of contributing code to other projects, called pull requests.
- Other Git providers: GitHub is not special: there are many Git repository hosts, like GitLab and BitBucket.
Resources
- Pro Git is highly recommended reading. Going through Chapters 1–5 should teach you most of what you need to use Git proficiently, now that you understand the data model. The later chapters have some interesting, advanced material.
- Oh Shit, Git!?! is a short guide on how to recover from some common Git mistakes.
- Git for Computer Scientists is a short explanation of Git’s data model, with less pseudocode and more fancy diagrams than these lecture notes.
- Git from the Bottom Up is a detailed explanation of Git’s implementation details beyond just the data model, for the curious.
- How to explain git in simple words
- Learn Git Branching is a browser-based game that teaches you Git.
Exercises
- If you don’t have any past experience with Git, either try reading the first couple chapters of Pro Git or go through a tutorial like Learn Git Branching. As you’re working through it, relate Git commands to the data model.
- Clone the repository for the class website.
  - Explore the version history by visualizing it as a graph.
  - Who was the last person to modify README.md? (Hint: use git log with an argument).
  - What was the commit message associated with the last modification to the collections: line of _config.yml? (Hint: use git blame and git show).
- One common mistake when learning Git is to commit large files that should not be managed by Git, or to add sensitive information. Try adding a file to a repository, making some commits, and then deleting that file from history (you may want to look at this).
- Clone some repository from GitHub, and modify one of its existing files. What happens when you do git stash? What do you see when running git log --all --oneline? Run git stash pop to undo what you did with git stash. In what scenario might this be useful?
- Like many command line tools, Git provides a configuration file (or dotfile) called ~/.gitconfig. Create an alias in ~/.gitconfig so that when you run git graph, you get the output of git log --all --graph --decorate --oneline. Information about git aliases can be found here.
- You can define global ignore patterns in ~/.gitignore_global after running git config --global core.excludesfile ~/.gitignore_global. Do this, and set up your global gitignore file to ignore OS-specific or editor-specific temporary files, like .DS_Store.
. - Fork the repository for the class website, find a typo or some other improvement you can make, and submit a pull request on GitHub (you may want to look at this).
Licensed under CC BY-NC-SA.