Raju Gandhi

Author
since Mar 09, 2022
Biography
Raju is the founder of DefMacro Software, LLC. He lives in Columbus, Ohio, along with his wonderful wife, Michelle; their sons, Mason and Micah; and their three furry family members—their two dogs, Buddy and Skye, and Princess Zara, their cat.
Raju is a consultant, author, teacher, and regularly invited speaker at conferences around the world. In his career as both a software developer and a teacher, he believes in keeping things simple. His approach is always to understand and explain the “why,” as opposed to the “how.”
Raju blogs at https://www.looselytyped.com, and can be found on Twitter as @looselytyped. He’s always looking to make new friends, and you can find his contact information at https://www.rajugandhi.com.

Recent posts by Raju Gandhi

It seems to me you want to create a new file, but have Git treat it like the "parent" (in that it maintains the Git history). The only (easy) way I can think of doing this is to create a branch, use `git mv` to rename the "parent" (Git now knows you've renamed the file and you can trace its history), and make your modifications. However, this means that you'll lose the "parent" in that branch.
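Roughly, something like this (just a sketch; the branch name and file names are made up):

```
git switch -c rename-experiment   # a new branch, so the original stays untouched
git mv parent child               # Git records this as a rename
# ...edit "child" as needed...
git commit -am "Rename parent to child, plus tweaks"

git log --follow -- child         # --follow traces the history across the rename
```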

If you simply want to compare performance, then look into `git worktree`, which allows you to have two branches checked out at the same time—then you can run one branch, measure performance, and repeat with the other branch.
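For example (again a sketch; the path and branch name are placeholders):

```
# check out a second branch into a sibling directory, without switching away
git worktree add ../perf-compare other-branch

# run your benchmarks in each directory, then clean up
git worktree remove ../perf-compare
```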

Now if you want to keep _both_ later on, that might be a tad tricky—b/c Git knows that the "parent" was renamed to "child", so when you merge, Git will attempt to rename "parent" to "child". Not sure how you'd get around that.

Reference: See https://stackoverflow.com/questions/2314652/is-it-possible-to-move-rename-files-in-git-and-maintain-their-history

Hope this helps. Feel free to reach out if you have any other questions.

Raju

...so following a policy of "squash everything that goes to master" blindly is just nonsense.



If I'm working on or overseeing a larger feature, I perform an interactive rebase before I make a pull request to the main branch. During the rebase, I will squash strongly related commits together, and I will give all resulting commits descriptive names for the changes that they introduce. Finally, the feature branch containing the squashed and rebased commits is merged to main.



Great points. FWIW, when I say I disagree with "squashing", I should have been more explicit—what I disagree with is the use of the "squash" button that GitLab and GitHub offer. I am completely on board with doing interactive rebases and, if need be, squashing commits to make atomic commits.

The bottom line is—make your history clean, explicit, and decipherable. And interactive rebasing buys you the best of all worlds—you can make ad-hoc, WIP commits as you are working, so you never have to break flow just to organize commits, while still making sure you have good reset points in case things go awry, or a branch point if you want to try something different.
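In practice, that looks something like this (a sketch; substitute whatever your base branch is called):

```
# on your feature branch, before opening the PR
git rebase -i master

# in the editor that opens, mark the WIP commits as "squash" (or "fixup")
# against the commit they belong to, and reword the survivors so that each
# one describes a single, atomic change
```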

Now, if I've read Raju's answers in this topic correctly, his point is that you don't need branches for this, you just branch off of a tagged commit.



Thank you for the credit, and that is usually my stance. My reasoning is simple—if you put a bug fix on a "stable" branch, you have to bring that fix back everywhere else, and that usually means cherry-picking. I prefer to branch off the tag, make the hotfix, tag again, deploy, then merge that branch back into the mainline and throw the branch away.
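Concretely, that flow looks something like this (a sketch; the tag and branch names are made up):

```
git switch -c hotfix-1.2.1 v1.2.0    # branch off the tag that is in production
# ...fix the bug, commit...
git tag -a v1.2.1 -m "Hotfix 1.2.1"
# deploy from v1.2.1, then fold the fix back into the mainline
git switch main
git merge hotfix-1.2.1
git branch -d hotfix-1.2.1           # and throw the branch away
```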

I have said this before, and I'll say it again—I am not one to dismiss a workflow, b/c if you've conceived it, you did so b/c you had good reasons. If your environment forces you to manage multiple versions simultaneously, perhaps having long-running branches pointing to "stable versions" in production makes sense.

My biggest fear with long-lived branches is that it's super easy to switch to one and accidentally commit to it. And usually, I've observed that with most long-running branches there is always something in one of those branches that isn't in the integration branch—in other words, diffing a "stable" branch against the main development pipeline shows differences on both sides, when technically only the development pipeline should have "more", or be "ahead" of, the stable branches. However, if you can keep up the discipline, then so be it.

Appreciate the thoughts @Stephan, and the shout out.
@Biswajit

@Junilu brings up some very valuable points—workflows vary because the way teams communicate, integrate, and test their work varies. Leaving aside the vagaries of each individual team, I'd say that multiple team members should never work on the same branch. The confusion starts because we think of master, main, etc. as "special integration" branches—but to me, ANY branch that "integrates" the work of multiple developers is an integration branch.

Suppose you and I are working on a big feature—we create a branch off master—call it feat-a. You and I then create our own feature branches off feat-a—we work, we create PRs, and merge. Occasionally we merge (or rebase) the master branch into feat-a (our "integration" branch) so we don't fall too far behind, and when we are done with our feature, we issue a PR from feat-a into master. Done.
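In commands, the shape of it looks something like this (a sketch; the personal branch name is made up):

```
git switch -c feat-a master         # shared "integration" branch for the big feature
git push -u origin feat-a

git switch -c feat-a-raju feat-a    # my own branch off feat-a
# ...work, commit, open a PR into feat-a, merge...

git fetch origin
git switch feat-a
git merge origin/master             # keep feat-a from falling too far behind
# when the feature is done: open a PR from feat-a into master
```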

At the end of the day, both need to check in their respective code and preferably do a squash commit in the 'master branch'. We then cherry-pick the individual squash commits for each feature into a short-lived release branch for a release.
The problem we face is, when we have multiple MRs (from multiple feature branches) for the same feature, it sometimes gets complicated to cherry-pick into the release branch. It seems easier if everyone works on the same feature branch and submits just one MR when the work is ready to be merged into the master branch.



I wrote a book on Git, and I am confused by your workflow. Not to get on a soapbox, but I really don't understand this fascination developers seem to have with squashing and cherry-picking. To me, those are two smells in any workflow. In my book, I barely mentioned cherry-picking—b/c I feel the minute developers find out about it, there we are—looking at cherry-picking as a viable alternative to a reasonable flow built around merging branches.

Your problem, at least from the way you describe it, does not arise from how you are integrating work into feature (or integration) branches. Your problem arises from your choosing cherry-picking as a way to merge code. Any change you make to a reasonable integration workflow will most likely fail b/c cherry-picking is, IMO, a hack, not a solution.

I have said this in this thread, and I have said it in other threads—the point of an integration branch is much more than to "bring work together". It's a way to verify that the work that just came together actually works together (via testing). It's a way to produce an artifact that goes from DEV to PROD.

And to further highlight how important I feel thinking through a good workflow is, I am going to quote myself from this thread

The thing to really understand is this—how you branch, integrate, tag, and build code in Git affects your entire release engineering process. This is far too often overlooked in many teams. Get the Git workflow right and suddenly the release engineering process comes into sharp focus.



If I were you, I'd ask myself, and the team—what can we do to stop using cherry-picking as the way to get code into the release branch? B/c in essence that's your problem—multiple folks issuing multiple MRs against master, and you aren't sure if you are getting ALL the ones you really want into the release branch. Ergo, your question.

Hope this helps. Feel free to reach out if you have any other questions.

Update: Edited for formatting
Ah! Damn! @Liutauras posted their response, so feel free to ignore mine as well. Apologies to all!

Regards,
Hi Liutauras,

Thanks for your response. I just wanted to clarify a few things:

- You quoted some of my statements, and responded to those. Those were not my suggestions—rather, they were me making sure I understood the OP's (@Ketan's) workflow. In other words, I wasn't describing GitFlow, nor was I describing trunk-based development, but elaborating on what Ketan said in their initial question so I could respond appropriately.

And actually I am starting to find Git Flow similar to Agile vs. Waterfall, in the sense that one is a more modern way of thinking, while the other is more stubborn and dated.



I know many teams reach for GitFlow b/c it works, so I am hesitant to dismiss it as stubborn and dated. That's not to say you are wrong, but I am reluctant to suggest one over the other till I understand the constraints that teams operate in.

That means that the master branch should always be relatively stable, which requires quite good test coverage (integration tests where applicable)—what Junilu might have alluded to as the required discipline.



In trunk-based development I don't think the master needs to be stable, especially if you reach for tools like feature toggles.

Other than that I agree with you, and you can find my thoughts (which align with yours) here—https://coderanch.com/t/750134/ide/Git-workflows#3483540

Hope this helps. Feel free to reach out if you have any other questions.
Hi Ketan,

Based on my understanding, if I put it in different words, a single branch will have all the features all team members are working on. It is some configuration that will make a feature active/inactive. And based on that decision, it is made active in the pre-production environment.



What you are referring to are "feature toggles"—a mechanism that allows you to turn features on and off in a system. This, IMO, is a great way to develop software, but it also requires rather disciplined engineering practices.

Which will be tested and the same artifact will move forward to production.



This is absolutely on point! Once you create an artifact[1] from a branch (say "release"), that is the artifact that gets promoted from QA to Staging to Production. In other words, you do not rebuild the artifact once QA signs off. You deploy the very same artifact everywhere, be it pre-prod or even production, because that is the only artifact that has been verified. Rebuilding can introduce changes (if you are building off different branches you might inadvertently introduce new and potentially untested features), but that's not all—simply rebuilding may change the dependency graph.

Hope this helps. Feel free to reach out if you have any other questions.

[1]: Some ecosystems, like frontend applications in Vue or Angular, make it harder to produce one build that can be easily used across different environments. I am just putting this out there in case you are working with one of those.
Hi Geoff,

Git is a decentralized version control system. This, for an absolute beginner, is a huge advantage, b/c you don't need any setup to start working with Git. You don't need a server to serve up repositories, you don't need an admin to create branches, yada yada yada. Like @Geoff and @Stephan already said, you can literally do
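something as simple as this (a minimal sketch; the directory name is made up):

```
mkdir my-notes     # any directory will do
cd my-notes
git init           # no server, no admin, no sign-up
```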



And boom! You have a fully functional Git repository. You can do almost everything right here—you don't need GitHub or GitLab. The only piece you are missing out on is the "collaboration" piece, which, since you want something "private," doesn't matter at all.

There are tons of guides (text, video) that you can use to get some hands-on practice. And you should be able to follow all of them without ever leaving your machine. This is as private and safe as it can get.

To give you a sense of how I work, there are times when I am composing an email (say to a potential client) or writing a proposal—step 1—create a Git repository. I then put my files in that directory and manage my edits in there. That repository will never leave my machine! Once I am done composing the email or the proposal, I send it off, and then I might just delete the folder!

Hope this helps. Feel free to reach out if you have any other questions.
Thanks @Tim!

I recently looked into cherry-picking as a possible solution to a problem I had



I don't encourage cherry-picking (in my book I listed it in the "Things we didn't cover" chapter) b/c it can often be abused. Excessive use of cherry-picking is usually symptomatic of bad Git workflows (that's not to say I think that was your problem—it's just an observation). I can certainly help answer any questions you might have, though.

Regards,
Thank you Sije! Much appreciated.

Please reach out if you have any questions about the book, its contents, and what I cover or don't cover. I have listed the ToC here: https://i-love-git.com/ and you can see the detailed ToC in the Amazon preview (also linked on that site).

Regards,
@Paul

I thought that when I did a checkout, Git just overwrote my local copy with the entire file(s) that I checked out.



You are absolutely correct. But my point was that you are checking out a "commit", and not files. Since Git stores the tree pointer in the commit, it knows which files (a.k.a. blobs) and sub-directories (which are stored as nested trees inside other trees) made up the tree that was created when you made that commit.

So it unpacks the root "tree"—which tells it which "files" (blobs) are at the root level, and which sub-directories (nested trees) it needs to create at the root level. It then recurses into the sub-trees, and so on and so forth, recreating the entire working directory just as it looked when that commit was made.
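You can watch Git do this walk yourself (a quick sketch; run it in any repository, and note that the "src" directory at the end is just an example):

```
git cat-file -p HEAD            # the commit: note the "tree" line at the top
git cat-file -p HEAD^{tree}     # the root tree: blobs (files) and trees (sub-directories)
git cat-file -p HEAD:src        # one of the nested trees (assuming a sub-directory named "src")
```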

This is also the reason why Git will sometimes not let you switch branches—b/c it essentially rewrites the entire working directory, but if you have modified or staged files, it can't do so without losing your changes. So it prompts you to either commit or stash them.
Hey, thanks for the kind words @Marco! I've been rather enamored with Git for a long time now, so if someone wants to talk Git, DevOps, release engineering, containers, versioning, feedback loops, monitoring, eventing—well, let's just say I won't shut up.

Thanks again!
@Campbell Ritchie

And maybe the blobs are compressed? That would make it economical on memory space.



Yes. Blobs/trees/commits are compressed using zlib. Pro Git has a great chapter on this as well (https://git-scm.com/book/en/v2/Git-Internals-Git-Objects), and at the very bottom they describe how you can unpack those objects using Ruby.
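If you don't feel like reaching for Ruby, Git will happily do the zlib decompression for you (a sketch; run it in any repository):

```
ls .git/objects/                          # two-character directories full of compressed loose objects
git cat-file -t $(git rev-parse HEAD)     # the object's type (here, "commit")
git cat-file -s $(git rev-parse HEAD)     # its uncompressed size in bytes
git cat-file -p $(git rev-parse HEAD)     # its decompressed contents
```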

@Paul Nisset

Especially if you have a lot of people checking out files but not pushing changes that often.



Not to nitpick, but you don't check out files in Git. You check out commits—be that directly via git checkout or a git switch or what-have-you. Furthermore, since blobs/trees/commits are immutable, when you push or pull, Git traces the commit graph and only fetches/pushes the "new" things.

Just for info, I looked at my own git archives to see what I could trim to save some space



In my experience, almost always when I've seen a large Git repository, it's because of binaries (which Git is terrible at managing). That's not to say that there isn't a reason to do this—assets like images for web projects, proprietary executables that do not lend themselves well to a package management system—all of these are good reasons to add and commit binaries. But if it's just plain old source code, I'd never worry about large repos, because Git compresses a lot of the objects, and the immutable nature of its data structure makes sharing easy.

One final thing to note—Git has another format called "Packfiles" that further improves disk usage. You can find more info here: https://git-scm.com/book/en/v2/Git-Internals-Packfiles
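You can see packing in action (a sketch; run it in any repository):

```
git count-objects -v     # loose objects vs. objects already in packs
git gc                   # repack: loose objects get folded into packfiles
ls .git/objects/pack/    # the resulting .pack and .idx files
```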

Quoting @Tim Holloway

So git doesn't carry much guilt.



Words to live by.

Hope this helps. Feel free to reach out if you have any other questions.

Regards,

[Edited for formatting]
Thank you @Campbell Ritchie!

I most certainly will! Can't keep me away!

Hi Paul,

Not the author of the Git Merge Rebase book, but I thought I'd pitch in with a few thoughts—one of the biggest misconceptions about Git is the belief that commits store diffs (or deltas). In actuality, every commit in Git stores an entire version of your project. This makes Git extremely efficient—b/c when you move from one commit to another (say you switch branches, or you check out a commit or a tag), Git simply recreates your working directory to look like the version stored in that commit.

Another misconception is that Git stores files—this (as you might have concluded from the original article) is not true. Git stores the contents of files in blobs—and stores the metadata of the files themselves (name, path, type) somewhere else, namely the tree. The (root) tree represents the state of the index at the time you made the commit, so the commit simply stores a reference to that tree. This is another great trick—separating the contents of files (blobs) from the metadata about the files themselves (trees).

All of this leads to how Git is so efficient. This efficiency comes from Git's internal data structure—which is composed of blobs/trees/commits (as the article describes), except these are all immutable. This is a powerful idea, leveraged by functional programming languages like Clojure, wherein, if something is immutable, it can be shared indiscriminately. That is, even if the same file "blob" is stored in multiple commits, Git does not have to make multiple copies of it. It simply stores a reference to the blob in a tree, and that tree is recorded in a commit.

In other words, multiple commits can share the same blob!
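You can verify this for yourself (a sketch; README.md here stands in for any file that did not change between two commits):

```
git rev-parse HEAD:README.md      # the blob ID recorded in the latest commit
git rev-parse HEAD~1:README.md    # the same ID in the previous commit, if the file didn't change
```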

I've written about this on my blog, if you'd like a more in-depth read:

https://looselytyped.com/blog/2014/08/31/gits-guts-part-i/
https://looselytyped.com/blog/2014/10/31/gits-guts-part-ii/

Here is an image that might explain what I've been trying to say: https://looselytyped.com/posts/2014-08-31-gits-guts-part-i/drawing.svg (the rectangles represent blobs, the triangles represent trees, and the circles represent commits)—notice how multiple commits can reference the same blob.

What happens if the disk storing the repo runs out of space?



The same thing that would happen to any other system when you run out of disk space! FWIW, if you ran out of disk space, it would probably be the OS and other critical pieces of software that would break first.

Hope this helps. Feel free to reach out if you have any other questions.

Regards,