JS monorepos in prod 5: merging Git repositories and preserving commit history

At Adaltas, we maintain several open-source Node.js projects organized as Git monorepos and published on NPM. We shared our experience of working with Lerna monorepos in a series of articles.

Now it is the turn of our popular open-source Node CSV project to be migrated to a monorepo. This article walks you through the available approaches, techniques, and tools used to migrate several Node.js projects hosted on GitHub into a Lerna monorepo. At the end, we provide the bash script we used to migrate the Node CSV project. This script can be applied to a different project with only minor modifications.

Requirements for migration

The Node CSV project combines 4 NPM packages to work with CSV files in Node.js, wrapped by the umbrella csv package. Each NPM package has its own rich commit history, and we wanted to keep as much information as possible from the old repositories. These are our requirements for the migration:

  • preserve the commit history with maximum information (such as tags, their messages, and merge commits)
  • improve commit messages to follow the Conventional Commits specification
  • preserve GitHub issues

Monorepo structure

Well, we have 5 NPM packages to migrate to the Lerna monorepo: csv, csv-generate, csv-parse, csv-stringify, and stream-transform.

We want to achieve a directory structure that looks like this:

packages/
  csv/
  csv-generate/
  csv-parse/
  csv-stringify/
  stream-transform/
lerna.json
package.json

Choosing a Git log strategy

When migrating repositories into a monorepo, you merge their commit logs. There are 3 suggested strategies, illustrated in the figure below.


[Figure: Git log strategies]

  • Single branch
    It provides a simple log containing only the commits on the default (master) branches of all packages. The different logs are joined sequentially by adding the latest commit of the previous package as the parent of the first commit of the next package. This strategy breaks the ordering of the log by commit date.
  • Multiple branches with a common parent
    This improves the visual perception of the log by splitting the branches of the different repositories. A new parent commit is added to the first commit of each branch. In the end, all the branches are merged into the default branch.
  • Multiple branches with different parents
    This strategy doesn't rewrite the first commits of the old repositories. It requires minimal intervention in the commit history and is logically more accurate because, originally, the repositories didn't have a common parent.

Merging commit logs

Lerna has a built-in mechanism for gathering existing standalone NPM packages into a monorepo while preserving their commit history. The lerna import command imports a package from an external repository into packages/. The sequence of commands is quite simple: you initialize the Git and Lerna repositories, make the first commit, and then import the packages from locally cloned Git repositories. You can find basic usage instructions in the documentation here.

With lerna import, you can only follow the 1st or the 2nd Git log strategy described above. For the 2nd one, you need to create a separate branch for each imported repository, like this:


git checkout -b package-1
lerna import /path/to/package-1

git checkout master

git checkout -b package-2
lerna import /path/to/package-2

lerna import is an easy-to-use tool for migrating repositories to a Lerna monorepo. However, it flattens the commit history by removing merge commits, and it doesn't migrate tags and their messages. These limitations didn't meet our requirement to keep as much information as possible from the existing repositories, so we had to use a different tool.

The native git merge command can merge unrelated histories using the --allow-unrelated-histories option. It preserves the full commit history of the targeted branch with its tags. In this case, you get the 3rd Git log strategy.

Merging the commit history of an external repository into the current one with --allow-unrelated-histories is as simple as running 2 commands:


git remote add -f <external-repo-name> <external-repo-path>

git merge --allow-unrelated-histories <external-repo-name>/<branch-name>
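To try the 3rd strategy without touching real repositories, the sketch below builds two throwaway repositories and merges them into a new one with the two commands above. The repository names pkg-a and pkg-b, their tags, and the demo author identity are placeholders; the sketch only assumes that Git is installed.

```shell
# Placeholder identity so the demo commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Two standalone repositories (hypothetical pkg-a and pkg-b), each with one
# commit touching a distinct file and one annotated tag.
for name in pkg-a pkg-b; do
  git init -q "$work/$name"
  echo "$name" > "$work/$name/$name.txt"
  git -C "$work/$name" add "$name.txt"
  git -C "$work/$name" commit -q -m "initial commit of $name"
  git -C "$work/$name" tag -a "$name-v1.0.0" -m "release $name 1.0.0"
done

# The monorepo starts empty: the first merge only points the unborn branch
# at pkg-a's history, the second one creates a real merge commit.
git init -q "$work/mono"
for name in pkg-a pkg-b; do
  # Default branch of the source repository (master or main, depending on config).
  branch=$(git -C "$work/$name" symbolic-ref --short HEAD)
  # Register the repository as a remote and fetch its branches and tags (-f).
  git -C "$work/mono" remote add -f "$name" "$work/$name"
  git -C "$work/mono" merge -q --allow-unrelated-histories --no-edit "$name/$branch"
done

# One root commit per original repository: the 3rd Git log strategy.
git -C "$work/mono" rev-list --max-parents=0 HEAD
git -C "$work/mono" tag
```

The last two commands list two root commits, one per original repository, and both tags, which were fetched along with the commits they point to.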

Rewriting commit messages

To bring more order and transparency to the combined commit log, we prefix all commit messages with their package names. Additionally, we make them compatible with the Conventional Commits specification, which we follow in our latest projects. This specification standardizes commit messages, making them more readable and easier to automate.

To implement this, we need to rewrite all commit messages by prefixing them with a string like chore(<package name>): .

We chose the chore type just to be compatible with the specification, as we didn't want to write complex regular expressions to fully support it.
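The rule is easy to express in shell. The two helpers below are hypothetical and only illustrate it: prefix_message builds the header we use, and is_conventional applies a deliberately loose check (a word, an optional scope in parentheses, an optional !, then ": " and a description). It approximates the specification rather than fully parsing it.

```shell
# Hypothetical helper: prepend the "chore(<package>): " prefix to a message.
prefix_message() {
  printf 'chore(%s): %s\n' "$1" "$2"
}

# Hypothetical helper: loose check that the first line of a message looks like
# a Conventional Commits header "type(scope)!: description" (scope and ! optional).
is_conventional() {
  printf '%s\n' "$1" | head -n 1 | grep -Eq '^[A-Za-z]+(\([^()]+\))?!?: .+'
}

prefix_message csv-parse "Fix quote parsing"
# chore(csv-parse): Fix quote parsing
```

Because the prefix is prepended unconditionally, a message that was already conventional, such as "fix: handle BOM", becomes "chore(csv-parse): fix: handle BOM". That is still a valid header, which is the trade-off of not parsing the original messages.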

There are 2 tools to rewrite commit messages: git filter-branch and git filter-repo.

Following the Git recommendation, we chose git filter-repo. After installing the tool using these instructions, the command to rewrite the commit messages of the current repository is:

git filter-repo --message-callback 'return b"chore(): " + message'

For more examples of rewriting repository history with git filter-repo, refer to this documentation.

Transferring GitHub issues

After migrating the repositories and publishing the new monorepo to GitHub, we want to transfer the existing GitHub issues from the old repositories. Issues can be transferred from one repository to another using the GitHub interface. You can follow this guide for instructions.

Unfortunately, at the time of this writing, there is no way to transfer issues in bulk. Issues must be transferred one by one. But this gives you an excuse to "forget" to transfer annoying pending issues created by the project community ;)

What about GitHub pull requests? They will be lost, and we have to live with it. The good news is that links to issues written in comments, as well as linked pull requests, are preserved thanks to redirects.

Migration script

The migration bash script uses the approaches and tools chosen above. It generates the ./node-csv directory containing the Node CSV project files reorganized as a Lerna monorepo.

#!/bin/bash
set -e

# 1. Configure
REPOS=(
  https://github.com/adaltas/node-csv
  https://github.com/adaltas/node-csv-generate
  https://github.com/adaltas/node-csv-parse
  https://github.com/adaltas/node-csv-stringify
  https://github.com/adaltas/node-stream-transform
)
OUTPUT_DIR=node-csv
PACKAGES_DIR=packages
# Temporary clones go to $TMPDIR (set on macOS), falling back to /tmp
TMP=${TMPDIR:-/tmp}

# 2. Initialize a new repository
rm -rf "$OUTPUT_DIR" && mkdir "$OUTPUT_DIR" && cd "$OUTPUT_DIR"
git init .
git remote add origin "${REPOS[0]}"

# 3. Migrate repositories
for repo in "${REPOS[@]}"; do
  # 3.1. Get package name: last URL segment without the "node-" prefix
  splited=(${repo//\// })
  package=${splited[${#splited[@]}-1]/node-/}

  # 3.2. Rewrite commit messages via a temporary repository
  rm -rf "$TMP/$package" && mkdir "$TMP/$package" && git clone "$repo" "$TMP/$package"
  git filter-repo \
    --source "$TMP/$package" \
    --target "$TMP/$package" \
    --message-callback "return b'chore($package): ' + message"

  # 3.3. Merge the repository into the monorepo
  git remote add -f "$package" "$TMP/$package"
  git merge --allow-unrelated-histories "$package/master" -m "chore($package): merge branch 'master' of $repo"

  # 3.4. Move repository files (including dotfiles) to the packages folder
  mkdir -p "$PACKAGES_DIR/$package"
  find . -mindepth 1 -maxdepth 1 ! -name .git ! -name "$PACKAGES_DIR" \
    -exec mv {} "$PACKAGES_DIR/$package/" \;
  git add .
  git commit -m "chore($package): move all package files to $PACKAGES_DIR/$package"

  # 3.5. Create a branch that keeps the original history in the monorepo
  git branch "init/$package" "$package/master"
done

# 4. Cleanup and remove outdated files (-f: not every package has them)
rm -f "$PACKAGES_DIR"/*/CONTRIBUTING.md
rm -f "$PACKAGES_DIR"/*/CODE_OF_CONDUCT.md
rm -rf "$PACKAGES_DIR"/*/.github
git add .
git commit -m "chore: remove outdated packages files"

To run this script, save its content in a file, for example migrate.sh, make it executable, and run it:

chmod u+x ./migrate.sh
./migrate.sh

Note: don't forget to install git-filter-repo before running the script.
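One way to avoid a half-finished migration is to check the prerequisites before anything is deleted or cloned. The require_cmd function below is a hypothetical helper, not part of the original script. It relies on git filter-repo being installed as a git-filter-repo executable on the PATH, which Git then runs as the filter-repo subcommand.

```shell
# Hypothetical helper: fail with a clear message when a command is missing.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found in PATH, install it before running the migration" >&2
    return 1
  fi
}

# Possible use at the top of migrate.sh, right after "set -e":
#   require_cmd git
#   require_cmd git-filter-repo
```

Combined with set -e, a failing check stops the script before the output directory is removed and recreated.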

Notes for each step of the script:

  • 1. Configure
    Configuration variables define the list of repositories to migrate, the destination directory of the new Lerna monorepo, and the packages folder inside it. You can modify these variables to reuse this script for your project.
  • 2. Initialize a new repository
    We initialize a new repository. The first repository is also registered as the remote origin repository.
  • 3. Migrate repositories
    • 3.1. Get the package name
      It extracts the package names from the repository links. In our case, the repositories are prefixed with node-, which we don't want to keep.
    • 3.2. Rewrite commit messages via a temporary repository
      To prefix the commits of each package with the pattern chore(<package name>): , we need to process every repository separately. This is done on a clone of the repository in a temporary folder.
    • 3.3. Merge the repository into the monorepo
      First, we add the locally cloned repository as a remote of the monorepo. Then, we merge its commit history, specifying a merge commit message.
    • 3.4. Move repository files to the packages folder
      After merging, the files of the merged repository appear in the monorepo root directory. Following the structure we want to achieve, we move these files to the packages directory and commit the change.
    • 3.5. Create a new branch
      The commit history is now linked to our monorepo through a remote repository. The history would be lost if the original repository were erased. To store the history in the monorepo, we create a branch that tracks the remote repository and prefix its name with init/.
  • 4. Cleanup and remove old files
    For the sake of illustration, we clean up some package files that are outdated because of the migration. Some of these files will be moved to the repository root directory.
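Step 3.1 relies on bash arrays to split the URL. If you adapt the script to a plain POSIX shell, the same result can be obtained with parameter expansion alone. The package_name function below is a hypothetical sketch; it also tolerates a trailing slash or a .git suffix, which the original URLs don't have.

```shell
# Hypothetical helper: derive the package name from a repository URL.
# The body runs in a subshell ( ... ) so the temporary variable doesn't leak.
package_name() (
  name=${1%/}          # drop a trailing slash, if any
  name=${name##*/}     # keep the last path segment
  name=${name%.git}    # drop an optional .git suffix
  printf '%s\n' "${name#node-}"   # drop the leading "node-" prefix
)

package_name https://github.com/adaltas/node-csv-parse
# csv-parse
```

Inside the loop, it would be used as package=$(package_name "$repo"). Unlike ${var/node-/}, which removes the first occurrence anywhere in the name, ${var#node-} only removes the prefix at the start.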

Further steps

The Git repository is now ready and, as such, qualifies as a monorepo. To make it usable, additional files must be created, such as a root package.json file, the lerna.json configuration file if you use Lerna, and a README file. Refer to the first article of our series to apply the required changes and initialize your monorepo with Lerna.
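As a starting point, the hypothetical write_root_files function below writes a minimal root package.json and lerna.json. The contents are assumptions, not the files we actually use: a private root package declaring packages/* as workspaces, and Lerna in independent versioning mode so that each package keeps its own version number. Adjust them following the first article of the series.

```shell
# Hypothetical helper: write a minimal root package.json and lerna.json
# into the directory given as first argument (default: current directory).
write_root_files() (
  dir=${1:-.}
  cat > "$dir/package.json" <<'EOF'
{
  "private": true,
  "workspaces": [
    "packages/*"
  ]
}
EOF
  cat > "$dir/lerna.json" <<'EOF'
{
  "packages": [
    "packages/*"
  ],
  "version": "independent"
}
EOF
)

# Usage from the directory that contains the migrated repository:
#   write_root_files node-csv
```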

Conclusion

Migrating existing open-source projects requires you to be tidy and meticulous, because a small mistake can break your users' work. All the steps must be carefully analyzed and well tested. In this article, we have covered the work needed to migrate several Node.js projects to a Lerna monorepo. We have considered different approaches, techniques, and available tools to automate the migration, using our Node CSV open-source project as an example.