{"id":195793,"date":"2025-03-15T11:33:03","date_gmt":"2025-03-15T16:33:03","guid":{"rendered":"https:\/\/narcolepticnerd.com\/2025\/03\/15\/introducing-paligemma-2-mix-a-vision-language-model-for-multiple-tasks\/"},"modified":"2025-03-15T11:33:03","modified_gmt":"2025-03-15T16:33:03","slug":"introducing-paligemma-2-mix-a-vision-language-model-for-multiple-tasks","status":"publish","type":"post","link":"https:\/\/narcolepticnerd.com\/2025\/03\/15\/introducing-paligemma-2-mix-a-vision-language-model-for-multiple-tasks\/","title":{"rendered":"Introducing PaliGemma 2 mix: A vision-language model for multiple tasks"},"content":{"rendered":"
This past December, we launched PaliGemma 2<\/a>, an upgraded vision-language model in the Gemma<\/a> family. The release included pretrained checkpoints in three sizes (3B, 10B, and 28B parameters) that can be easily fine-tuned for high performance on a wide range of vision-language tasks and domains, such as image segmentation, short video captioning, scientific question answering, and text-related tasks.<\/p>\n

Now, we\u2019re thrilled to announce the launch of PaliGemma 2 mix checkpoints. PaliGemma 2 mix models are tuned to a mixture of tasks, so you can directly explore their capabilities and use them out of the box for common use cases.<\/p>\n

What\u2019s new in PaliGemma 2 mix?<\/b><\/h2>\n