diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6fb83d0c6ebf22..5f4acfacbdb6f0 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -52,6 +52,7 @@ MAN7_TXT += gitcli.adoc
 MAN7_TXT += gitcore-tutorial.adoc
 MAN7_TXT += gitcredentials.adoc
 MAN7_TXT += gitcvs-migration.adoc
+MAN7_TXT += gitdatamodel.adoc
 MAN7_TXT += gitdiffcore.adoc
 MAN7_TXT += giteveryday.adoc
 MAN7_TXT += gitfaq.adoc
diff --git a/Documentation/gitdatamodel.adoc b/Documentation/gitdatamodel.adoc
new file mode 100644
index 00000000000000..b54ff0e52b27ed
--- /dev/null
+++ b/Documentation/gitdatamodel.adoc
@@ -0,0 +1,302 @@
+gitdatamodel(7)
+===============
+
+NAME
+----
+gitdatamodel - Git's core data model
+
+SYNOPSIS
+--------
+gitdatamodel
+
+DESCRIPTION
+-----------
+
+It's not necessary to understand Git's data model to use Git, but it's
+very helpful when reading Git's documentation so that you know what it
+means when the documentation says "object", "reference" or "index".
+
+Git's core operations use 4 kinds of data:
+
+1. <<objects,Objects>>: commits, trees, blobs, and tag objects
+2. <<references,References>>: branches, tags,
+   remote-tracking branches, etc.
+3. <<index,The index>>, also known as the staging area
+4. <<reflogs,Reflogs>>: logs of changes to references ("ref log")
+
+[[objects]]
+OBJECTS
+-------
+
+All of the commits and files in a Git repository are stored as "Git objects".
+Git objects never change after they're created, and every object has an ID,
+like `1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
+
+This means that if you have an object's ID, you can always recover its
+exact contents as long as the object hasn't been deleted.
+
+Every object has:
+
+[[object-id]]
+1. an *ID* (aka "object name"), which is a cryptographic hash of its
+   type and contents.
+   It's fast to look up a Git object using its ID.
+   This is usually represented in hexadecimal, like
+   `1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
+2. a *type*. There are 4 types of objects:
+   <<commit,commits>>, <<tree,trees>>, <<blob,blobs>>,
+   and <<tag-object,tag objects>>.
+3. *contents*. The structure of the contents depends on the type.
+
+Here's how each type of object is structured:
+
+[[commit]]
+commit::
+	A commit contains these required fields
+	(though there are other optional fields):
++
+1. The full directory structure of all the files in that version of the
+   repository and each file's contents, stored as the *<<tree,tree>>* ID
+   of the commit's base directory
+2. Its *parent commit ID(s)*. The first commit in a repository has 0 parents,
+   regular commits have 1 parent, merge commits have 2 or more parents
+3. An *author* and the time the commit was authored
+4. A *committer* and the time the commit was committed
+5. A *commit message*
++
+Here's how an example commit is stored:
++
+----
+tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
+parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
+author Maya 1759173425 -0400
+committer Maya 1759173425 -0400
+
+Add README
+----
++
+Like all other objects, commits can never be changed after they're created.
+For example, "amending" a commit with `git commit --amend` creates a new
+commit with the same parent.
++
+Git does not store the diff for a commit: when you ask Git to show
+the commit with linkgit:git-show[1], it calculates the diff from its
+parent on the fly.
+
+[[tree]]
+tree::
+	A tree is how Git represents a directory.
+	It can contain files or other trees (which are subdirectories).
+	It lists, for each item in the tree:
++
+1. The *filename*, for example `hello.py`
+2. The *file mode*. These are all of the file modes in Git.
+   They're only spiritually related to Unix file modes.
++
+  - `100644`: regular file (with <<object-id,type>> `blob`)
+  - `100755`: executable file (with type `blob`)
+  - `120000`: symbolic link (with type `blob`)
+  - `040000`: directory (with type `tree`)
+  - `160000`: gitlink, for use with submodules (with type `commit`)
+
+3. The <<object-id,object ID>> with the contents of the file or directory
++
+For example, this is how a tree containing one directory (`src`) and one file
+(`README.md`) is stored:
++
+----
+100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
+040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
+----
+
+[[blob]]
+blob::
+	A blob object contains a file's contents.
++
+When you make a commit, Git stores the full contents of each file that
+you changed as a blob.
+For example, if you have a commit that changes 2 files in a repository
+with 1000 files, that commit will create 2 new blobs, and use the
+previous blob ID for the other 998 files.
+This means that commits can use relatively little disk space even in a
+very large repository.
+
+[[tag-object]]
+tag object::
+	Tag objects contain these required fields
+	(though there are other optional fields):
++
+1. The *ID* of the object it references
+2. The *type* of the object it references
+3. The *tagger* and tag date
+4. A *tag message*, similar to a commit message
+
+Here's how an example tag object is stored:
+
+----
+object 750b4ead9c87ceb3ddb7a390e6c7074521797fb3
+type commit
+tag v1.0.0
+tagger Maya 1759927359 -0400
+
+Release version 1.0.0
+----
+
+NOTE: All of the examples in this section were generated with
+`git cat-file -p <object-id>`.
+
+[[references]]
+REFERENCES
+----------
+
+References are a way to give a name to a commit.
+It's easier to remember "the changes I'm working on are on the `turtle`
+branch" than "the changes are in commit bb69721404348e".
+Git often uses "ref" as shorthand for "reference".
+
+References can either refer to:
+
+1. An object ID, usually a <<commit,commit>> ID
+2. Another reference. This is called a "symbolic reference"
+
+References are stored in a hierarchy, and Git handles references
+differently based on where they are in the hierarchy.
+Most references are under `refs/`. Here are the main types:
+
+[[branch]]
+branches: `refs/heads/<name>`::
+	A branch refers to a commit ID.
+	That commit is the latest commit on the branch.
++
+To get the history of commits on a branch, Git will start at the commit
+ID the branch references, and then look at the commit's parent(s),
+the parent's parent, etc.
+
+[[tag]]
+tags: `refs/tags/<name>`::
+	A tag refers to a commit ID, tag object ID, or other object ID.
+	There are two types of tags:
+	1. "Annotated tags", which reference a <<tag-object,tag object>> ID
+	   which contains a tag message
+	2. "Lightweight tags", which reference a commit, blob, or tree ID
+	   directly
++
+Even though branches and tags both refer to a commit ID, Git
+treats them very differently.
+Branches are expected to change over time: when you make a commit, Git
+will update your <<HEAD,current branch>> to point to the new commit.
+Tags are usually not changed after they're created.
+
+[[HEAD]]
+HEAD: `HEAD`::
+	`HEAD` is where Git stores your current <<branch,branch>>,
+	if there is a current branch. `HEAD` can either be:
++
+1. A symbolic reference to your current branch, for example `ref:
+   refs/heads/main` if your current branch is `main`.
+2. A direct reference to a commit ID. In this case there is no current branch.
+   This is called "detached HEAD state"; see the DETACHED HEAD section
+   of linkgit:git-checkout[1] for more.
+
+[[remote-tracking-branch]]
+remote-tracking branches: `refs/remotes/<remote>/<branch>`::
+	A remote-tracking branch refers to a commit ID.
+	It's how Git stores the last-known state of a branch in a remote
+	repository. `git fetch` updates remote-tracking branches. When
+	`git status` says "you're up to date with origin/main", it's looking at
+	this.
++
+`refs/remotes/<remote>/HEAD` is a symbolic reference to the remote's
+default branch. This is the branch that `git clone` checks out by default.
+
+[[other-refs]]
+Other references::
+	Git tools may create references anywhere under `refs/`.
+	For example, linkgit:git-stash[1], linkgit:git-bisect[1],
+	and linkgit:git-notes[1] all create their own references
+	in `refs/stash`, `refs/bisect`, etc.
+	Third-party Git tools may also create their own references.
++
+Git may also create references other than `HEAD` at the base of the
+hierarchy, like `ORIG_HEAD`.
+
+NOTE: Git may delete objects that aren't "reachable" from any reference
+or <<reflogs,reflog>>.
+An object is "reachable" if we can find it by following tags to whatever
+they tag, commits to their parents or trees, and trees to the trees or
+blobs that they contain.
+For example, if you amend a commit with `git commit --amend`,
+there will no longer be a branch that points at the old commit.
+The old commit is recorded in the current branch's <<reflogs,reflog>>,
+so it is still "reachable", but when the reflog entry expires it may
+become unreachable and get deleted.
+
+Objects that are no longer reachable may eventually be deleted.
+Reachable objects will never be deleted.
+
+[[index]]
+THE INDEX
+---------
+The index, also known as the "staging area", is a list of files and
+the contents of each file, stored as a <<blob,blob>>.
+You can add files to the index or update the contents of a file in the
+index with linkgit:git-add[1]. This is called "staging" the file for commit.
+
+Unlike a <<tree,tree>>, the index is a flat list of files.
+When you commit, Git converts the list of files in the index to a
+directory <<tree,tree>> and uses that tree in the new <<commit,commit>>.
+
+Each index entry has 4 fields:
+
+1. The *file mode*, which must be one of:
+   - `100644`: regular file (with <<object-id,type>> `blob`)
+   - `100755`: executable file (with type `blob`)
+   - `120000`: symbolic link (with type `blob`)
+   - `160000`: gitlink, for use with submodules (with type `commit`)
+2. The *<<blob,blob>>* ID of the file,
+   or (rarely) the *<<commit,commit>>* ID of the submodule
+3. The *stage number*, either 0, 1, 2, or 3. This is normally 0, but if
+   there's a merge conflict there can be multiple versions of the same
+   filename in the index.
+4. The *file path*, for example `src/hello.py`
+
+It's extremely uncommon to look at the index directly: normally you'd
+run `git status` to see a list of changes between the index and <<HEAD,HEAD>>.
+But you can use `git ls-files --stage` to see the index.
+Here's the output of `git ls-files --stage` in a repository with 2 files:
+
+----
+100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md
+100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py
+----
+
+[[reflogs]]
+REFLOGS
+-------
+
+Every time a branch, remote-tracking branch, or HEAD is updated, Git
+updates a log called a "reflog" for that <<references,reference>>.
+This means that if you make a mistake and "lose" a commit, you can
+generally recover the commit ID by running `git reflog <reference>`.
+
+A reflog is a list of log entries. Each entry has:
+
+1. The *commit ID*
+2. *Timestamp* when the change was made
+3. *Log message*, for example `pull: Fast-forward`
+
+Reflogs only log changes made in your local repository.
+They are not shared with remotes.
+
+You can view a reflog with `git reflog <reference>`.
+For example, here's the reflog for a `main` branch which has changed twice:
+
+----
+$ git reflog main --date=iso --no-decorate
+750b4ea main@{2025-09-29 15:17:05 -0400}: commit: Add README
+4ccb6d7 main@{2025-09-29 15:16:48 -0400}: commit (initial): Initial commit
+----
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/glossary-content.adoc b/Documentation/glossary-content.adoc
index e423e4765b71b0..20ba121314b9a4 100644
--- a/Documentation/glossary-content.adoc
+++ b/Documentation/glossary-content.adoc
@@ -297,8 +297,8 @@ This commit is referred to as a "merge commit", or sometimes just a
 	identified by its <<def_object_name,object name>>. The objects usually
 	live in `$GIT_DIR/objects/`.
 
-[[def_object_identifier]]object identifier (oid)::
-	Synonym for <<def_object_name,object name>>.
+[[def_object_identifier]]object identifier, object ID, oid::
+	Synonyms for <<def_object_name,object name>>.
 
 [[def_object_name]]object name::
 	The unique identifier of an <<def_object,object>>. The
diff --git a/Documentation/meson.build b/Documentation/meson.build
index e34965c5b0e236..ace0573e8272e0 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -192,6 +192,7 @@ manpages = {
   'gitcore-tutorial.adoc' : 7,
   'gitcredentials.adoc' : 7,
   'gitcvs-migration.adoc' : 7,
+  'gitdatamodel.adoc' : 7,
   'gitdiffcore.adoc' : 7,
   'giteveryday.adoc' : 7,
   'gitfaq.adoc' : 7,