My observations from experimenting with model merging, evaluation, and fine-tuning
Let’s continue our learning journey through Maxime Labonne’s llm-course, which is pure gold for the community. This time, we will focus on model merging and evaluation.
Maxime has a great article titled Merge Large Language Models with mergekit. I highly recommend you check it out first. We won’t repeat the steps he has already laid out in his article, but we will explore some details I came across that might be helpful to you.
We are going to experiment with model merging and model evaluation in the following steps:
- Using LazyMergekit, merge two models from the Hugging Face Hub, mistralai/Mistral-7B-Instruct-v0.2 and jan-hq/trinity-v1 (a sample SLERP config is sketched after this list).
- Run AutoEval on the base model mistralai/Mistral-7B-Instruct-v0.2.
- Run AutoEval on the merged model MistralTrinity-7b-slerp.
- Fine-tune the merged model with a customized instruction dataset.
- Run AutoEval on the fine-tuned model.
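To make the first step concrete, here is a minimal sketch of the kind of SLERP configuration mergekit (and therefore LazyMergekit) accepts for these two models. The layer ranges and interpolation values `t` below are illustrative assumptions, not necessarily the exact settings used later in this walkthrough.

```yaml
# Sketch of a mergekit SLERP config: interpolate between the two models layer by layer
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [0, 32]
      - model: jan-hq/trinity-v1
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-Instruct-v0.2
parameters:
  t:
    # t = 0 keeps the base model's weights, t = 1 keeps the other model's
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5   # default for all remaining tensors
dtype: bfloat16
```

LazyMergekit is essentially a Colab wrapper around this: you paste a config like the one above, it runs the merge, and it can push the resulting model to the Hub under the name you choose (here, MistralTrinity-7b-slerp).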
Let’s dive in.
First, how do we select which models to merge?
Determining whether two or more models can be merged involves evaluating several key attributes and considerations:
- Model Architecture: Model architecture is a crucial consideration when merging models. Ensure the models share a compatible architecture (e.g., both transformer-based). Merging dissimilar architectures is generally challenging. The Hugging Face model card usually details a model’s architecture. If you can’t find the architecture information, you can resort to trial and error with Maxime’s LazyMergekit, which we will explore later. If you encounter an error, it’s usually because the model architectures are incompatible.
- Dependencies and Libraries: Ensure that…