LLM Archives - SD Times
https://sdtimes.com/tag/llm/

IBM releases next generation of Granite LLMs
https://sdtimes.com/ai/ibm-releases-next-generation-of-granite-llms/ (Mon, 21 Oct 2024)

IBM has announced the third generation of its open source Granite LLM family, which features a number of models suited to various use cases.

“Reflecting our focus on the balance between powerful and practical, the new IBM Granite 3.0 models deliver state-of-the-art performance relative to model size while maximizing safety, speed and cost-efficiency for enterprise use cases,” IBM wrote in a blog post.

The Granite 3.0 family includes general-purpose models, guardrail- and safety-focused models, and mixture-of-experts models. 

The main model in this family is Granite 3.0 8B Instruct, an instruction-tuned, dense decoder-only model that offers strong performance in RAG, classification, summarization, entity extraction, and tool use. It matches open models of similar sizes on academic benchmarks and exceeds them for enterprise tasks and safety, according to IBM.

“Trained using a novel two-phase method on over 12 trillion tokens of carefully vetted data across 12 different natural languages and 116 different programming languages, the developer-friendly Granite 3.0 8B Instruct is a workhorse enterprise model intended to serve as a primary building block for sophisticated workflows and tool-based use cases,” IBM wrote.

This release also includes new Granite Guardian models that safeguard against social bias, hate, toxicity, profanity, violence, and jailbreaking, as well as perform RAG-specific checks like groundedness, context relevance, and answer relevance.  

There are also a number of other models in the Granite 3.0 family, including: 

  • Granite-3.0-8B-Base, Granite-3.0-2B-Instruct, and Granite-3.0-2B-Base, which are general-purpose LLMs
  • Granite-3.0-3B-A800M-Instruct and Granite-3.0-1B-A400M-Instruct, which are Mixture of Experts models that minimize latency and cost
  • Granite-3.0-8B-Instruct-Accelerator, a speculative decoder that offers better speed and efficiency

All of the models are available under the Apache 2.0 license on Hugging Face, and Granite 3.0 8B and 2B and Granite Guardian 3.0 8B and 2B are available for commercial use on watsonx. 
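
For developers who want to try the models, a minimal way to load an open Granite 3.0 checkpoint is through the Hugging Face transformers library. The sketch below assumes the ibm-granite/granite-3.0-8b-instruct model ID and a bundled chat template; neither detail comes from IBM's announcement, so adjust as needed.

```python
# Minimal sketch: load an open Granite 3.0 checkpoint from Hugging Face.
# Assumption: the checkpoint ID "ibm-granite/granite-3.0-8b-instruct" and a
# bundled chat template; swap in whichever Granite model you actually pull.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize our Q3 support tickets in three bullets."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```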

The company also revealed that by the end of 2024, it plans to expand all model context windows to 128K tokens, further improve multilingual support, and introduce multimodal image-in, text-out capabilities. 

In addition to releasing these new Granite models, the company revealed the upcoming availability of the newest version of the watsonx Code Assistant, as well as plans to release new tools for developers building, customizing, and deploying AI through watsonx.ai.

Onymos unveils enhanced OCR component DocKnow with LLM API
https://sdtimes.com/software-dev/onymos-unveils-enhanced-ocr-component-docknow-with-llm-api/ (Wed, 09 Oct 2024)

Onymos, developer of solutions transforming Software-as-a-Service (SaaS) for software and application development, today announced the release of an enhanced version of its intelligent document processing component, DocKnow. The latest version revolutionizes document processing with its new ability to integrate customer-specific large language models (LLMs), enabling enterprises to extract, process, and validate data from documents with unmatched precision and speed.

Onymos DocKnow eliminates the need for time-intensive and error-prone manual data processing by using enhanced optical character recognition (OCR) to extract information from both structured and unstructured data. This includes printed and handwritten text, numbers, dates, checkboxes, barcodes, QR codes, and more from any document, including personal identification, intake forms, and health and immunization records. DocKnow can also be easily integrated with any third-party back-end information management system – such as Salesforce, AWS, Azure, and Google – or health record system.
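
Onymos has not published DocKnow's API in this announcement, so the following is only a generic sketch of the extract-then-validate pattern described above, using the open source pytesseract OCR library and a hypothetical field-lookup helper; the file names, field label, and consistency check are all illustrative assumptions.

```python
# Generic illustration (not Onymos's API): OCR a scanned form, then run a
# simple consistency check of the kind an LLM-backed validator would automate.
from PIL import Image
import pytesseract  # open source OCR; assumes the Tesseract binary is installed

def extract_text(path: str) -> str:
    """Pull raw text from a scanned page."""
    return pytesseract.image_to_string(Image.open(path))

def find_field(text: str, label: str) -> str | None:
    """Hypothetical helper: grab the value following a 'Label:' line."""
    for line in text.splitlines():
        if line.lower().startswith(label.lower() + ":"):
            return line.split(":", 1)[1].strip()
    return None

pages = [extract_text(p) for p in ["intake_form_p1.png", "intake_form_p2.png"]]  # example files
dates_of_birth = set()
for page in pages:
    value = find_field(page, "Date of birth")
    if value:
        dates_of_birth.add(value)

if len(dates_of_birth) > 1:
    print("Inconsistent date of birth across pages:", dates_of_birth)
```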

In this latest version, DocKnow is strengthened by:

  • A new customer-specific LLM API: This new API enables enterprises to train their own LLMs using their specific data, resulting in more accurate and domain-specific document processing. For instance, DocKnow reliably and instantly identifies inconsistent data across hundreds of pages.
  • A new, helpful AI assistant: “Doc,” the Onymos AI agent, enables document processing teams – which could include healthcare professionals, legal teams, university registrars, and more – to search through specific documents and hundreds of pages for immediate access to particular information and records.
  • An upgraded, customizable user interface (UI): The new, simple UI includes bounding boxes, automatic zoom-in/zoom-out, image enhancement, and skew correction, which dramatically improves readability for human reviewers. It allows full customization to match an enterprise’s brand, required functionality, and back-end systems. This gives enterprise software engineering and IT teams the ability to modify the component to meet their specific needs as if they had built it from the ground up themselves.

“We understand that many enterprises struggle with time-consuming and error-prone processes like document entry, validation, and retrieval, whether it’s for patient care, student registration, or case file review. While these enterprises have started integrating AI tools powered by LLMs like ChatGPT to help with these activities, they often encounter hallucinations and outdated training data issues,” shared Shiva Nathan, Founder and CEO of Onymos. “Our enhanced DocKnow addresses these challenges by streamlining document processing and empowering enterprises to train their own LLM models tailored to their specific needs, all while ensuring privacy and security.”

As with all Onymos software components, DocKnow is designed with a no-data architecture. This means that all data passing through the solution and used to train the LLM remains securely with the enterprise using the API – no bit or byte of data flows through any Onymos systems or clouds.

For more information on Onymos and DocKnow, visit onymos.com. You can also learn more about Onymos’ no-data architecture by downloading the white paper here.

GraphRAG – SD Times Open Source Project of the Week
https://sdtimes.com/ai/graphrag-sd-times-open-source-project-of-the-week/ (Fri, 12 Jul 2024)

GraphRAG is an open source research project out of Microsoft for creating knowledge graphs from datasets that can be used in retrieval-augmented generation (RAG).

RAG is an approach in which external data is supplied to an LLM so it can give more accurate responses. For instance, a company might use RAG to bring its own private data into a generative AI app so that employees can get responses specific to their company’s data, such as HR policies or sales data. 

GraphRAG works by having the LLM build the knowledge graph: it processes the private dataset and extracts the entities and relationships found in the source data. The knowledge graph is then used to create a bottom-up clustering that organizes the data into semantic clusters. At query time, both the knowledge graph and the clusters are provided to the LLM’s context window. 
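
As a rough, library-agnostic illustration of that flow (not GraphRAG's actual API), the toy sketch below builds a small entity graph from pre-extracted triples, clusters it into communities, and pulls the community around a query entity to use as LLM context. The triples are placeholders standing in for what the LLM would extract from the source documents.

```python
# Toy illustration of the GraphRAG idea (not the project's API):
# 1) build a knowledge graph from LLM-extracted (entity, relation, entity) triples,
# 2) cluster it into semantic communities,
# 3) at query time, hand the relevant community to the LLM as context.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Placeholder triples standing in for what an LLM would extract from the dataset.
triples = [
    ("Novorossiya", "claimed_by", "Separatist movement"),
    ("Separatist movement", "active_in", "Eastern Ukraine"),
    ("Novorossiya", "mentioned_in", "VIINA report"),
    ("Grain exports", "affected_by", "Port blockade"),
    ("Port blockade", "located_in", "Black Sea"),
]

graph = nx.Graph()
for head, relation, tail in triples:
    graph.add_edge(head, tail, relation=relation)

# Bottom-up clustering of entities into communities.
communities = list(greedy_modularity_communities(graph))

def context_for(entity: str) -> str:
    """Return the edges of the community containing the query entity."""
    for community in communities:
        if entity in community:
            sub = graph.subgraph(community)
            return "\n".join(f"{u} --{d['relation']}--> {v}" for u, v, d in sub.edges(data=True))
    return ""

print(context_for("Novorossiya"))  # this text would be prepended to the LLM prompt
```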

According to Microsoft researchers, it performs well in two areas that baseline RAG typically struggles with: connecting the dots between information and summarizing large data collections. 

As a test of GraphRAG’s effectiveness, the researchers used the Violent Incident Information from News Articles (VIINA) dataset, which compiles information from news reports on the war in Ukraine. This was chosen because of its complexity, presence of differing opinions and partial information, and its recency, meaning it wouldn’t be included in the LLM’s training dataset. 

Both the baseline RAG and GraphRAG were able to answer the question “What is Novorossiya?” Only GraphRAG was able to answer the follow-up question “What has Novorossiya done?”

“Baseline RAG fails to answer this question. Looking at the source documents inserted into the context window, none of the text segments discuss Novorossiya, resulting in this failure. In comparison, the GraphRAG approach discovered an entity in the query, Novorossiya. This allows the LLM to ground itself in the graph and results in a superior answer that contains provenance through links to the original supporting text,” the researchers wrote in a blog post.  

The second area where GraphRAG succeeds is summarizing large datasets. Using the same VIINA dataset, the researchers asked the question “What are the top 5 themes in the data?” Baseline RAG returned five items about Russia in general with no relation to the conflict, while GraphRAG returned much more detailed answers that more closely reflect the themes of the dataset. 

“By combining LLM-generated knowledge graphs and graph machine learning, GraphRAG enables us to answer important classes of questions that we cannot attempt with baseline RAG alone. We have seen promising results after applying this technology to a variety of scenarios, including social media, news articles, workplace productivity, and chemistry. Looking forward, we plan to work closely with customers on a variety of new domains as we continue to apply this technology while working on metrics and robust evaluation. We look forward to sharing more as our research continues,” the researchers wrote.


Elastic launches low-code interface for experimenting with RAG implementation
https://sdtimes.com/ai/elastic-launches-low-code-interface-for-experimenting-with-rag-implementation/ (Fri, 28 Jun 2024)

Elastic has just released a new tool called Playground that will enable users to experiment with retrieval-augmented generation (RAG) more easily.

RAG is a practice in which local data is added to an LLM, such as private company data or data that is more up-to-date than the LLM’s training set. This allows it to give more accurate responses and reduces the occurrence of hallucinations.  

Playground offers a low-code interface for adding data to an LLM for RAG implementations. Developers can use any data stored in an Elasticsearch index for this. 

It also allows developers to A/B test LLMs from different model providers to see what suits their needs best. 

The platform can utilize transformer models in Elasticsearch and also makes use of the Elasticsearch Open Inference API that integrates with inference providers, such as Cohere and Azure AI Studio. 
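
Under the hood, the pattern Playground exposes is ordinary retrieval from an Elasticsearch index followed by a grounded prompt. Below is a minimal sketch of that pattern with the official Elasticsearch Python client; the index name, field name, and the stubbed call_llm function are assumptions for illustration, not Playground's own code.

```python
# Minimal RAG-over-Elasticsearch sketch (not Playground's own code).
# Assumptions: a local cluster, an index named "docs" with a "body" text field,
# and a stubbed call_llm() standing in for whichever model provider you use.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def retrieve(question: str, size: int = 3) -> list[str]:
    """Fetch the top matching passages for the question."""
    resp = es.search(index="docs", query={"match": {"body": question}}, size=size)
    return [hit["_source"]["body"] for hit in resp["hits"]["hits"]]

def call_llm(prompt: str) -> str:
    """Stub: swap in Cohere, Azure AI Studio, or any other provider here."""
    return "<model answer>"

question = "What is our refund policy for annual plans?"
context = "\n\n".join(retrieve(question))
answer = call_llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer)
```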

“While prototyping conversational search, the ability to experiment with and rapidly iterate on key components of a RAG workflow is essential to get accurate and hallucination-free responses from LLMs,” said Matt Riley, global vice president and general manager of Search at Elastic. “Developers use the Elastic Search AI platform, which includes the Elasticsearch vector database, for comprehensive hybrid search capabilities and to tap into innovation from a growing list of LLM providers. Now, the playground experience brings these capabilities together via an intuitive user interface, removing the complexity from building and iterating on generative AI experiences, ultimately accelerating time to market for our customers.”


RAG is the next exciting advancement for LLMs
https://sdtimes.com/ai/rag-is-the-next-exciting-advancement-for-llms/ (Tue, 04 Jun 2024)

One of the challenges with generative AI models has been that they tend to hallucinate responses. In other words, they will present an answer that is factually incorrect, but will be confident in doing so, sometimes even doubling down when you point out that what they’re saying is wrong.

“[Large language models] can be inconsistent by nature with the inherent randomness and variability in the training data, which can lead to different responses for similar prompts. LLMs also have limited context windows, which can cause coherence issues in extended conversations, as they lack true understanding, relying instead on patterns in the data,” said Chris Kent, SVP of marketing for Clarifai, an AI orchestration company. 

Retrieval-augmented generation (RAG) is picking up traction because when applied to LLMs, it can help to reduce the occurrence of hallucinations, as well as offer some other additional benefits.

“The goal of RAG is to marry up local data, or data that wasn’t used in training the actual LLM itself, so that the LLM hallucinates less than it otherwise would,” said Mike Bachman, head of architecture and AI strategy at Boomi, an iPaaS company.

He explained that LLMs are typically trained on very general, and often older, data. Additionally, because it takes months to train these models, by the time a model is ready, its training data has become even older.  

For instance, the free version of ChatGPT uses GPT-3.5, whose training data cuts off in January 2022, nearly 28 months ago at this point. The paid version, which uses GPT-4, is a bit more up to date, but still only has information from up to April 2023.

“You’re missing all of the changes that have happened from April of 2023,” Bachman said. “In that particular case, that’s a whole year, and a lot happens in a year, and a lot has happened in this past year. And so what RAG will do is it could help shore up data that’s changed.”

As an example, in 2010 Boomi was acquired by Dell, but in 2021 Dell divested the company and now Boomi is privately owned again. According to Bachman, earlier versions of GPT-3.5 Turbo were still making references to Dell Boomi, so they used RAG to supply the LLM with up-to-date knowledge of the company so that it would stop making those incorrect references to Dell Boomi. 

RAG can also be used to augment a model with private company data to provide personalized results or to support a specific use case. 

“I think where we see a lot of companies using RAG, is they’re just trying to basically handle the problem of how do I make an LLM have access to real-time information or proprietary information beyond the time period or data set under which it was trained,” said Pete Pacent, head of product at Clarifai.

For instance, if you’re building a copilot for your internal sales team, you could use RAG to be able to supply it with up-to-date sales information, so that when a salesperson asks “how are we doing this quarter?” the model can actually respond with updated, relevant information, said Pacent.  
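
Stripped down, that pattern is: embed the fresh documents, retrieve the few most similar to the question, and prepend them to the prompt. The sketch below shows the loop with plain NumPy and stubbed embed and generate functions standing in for whatever embedding model and LLM an organization actually uses.

```python
# Bare-bones RAG loop: embed, retrieve by cosine similarity, prompt with context.
# embed() and generate() are stubs for a real embedding model and LLM.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stub embedding: replace with a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(384)
    return vec / np.linalg.norm(vec)

def generate(prompt: str) -> str:
    """Stub LLM call: replace with a real model."""
    return "<model answer grounded in the retrieved context>"

documents = [
    "Q2 pipeline: $4.1M, up 12% quarter over quarter.",
    "Q2 closed-won: $2.3M across 41 deals.",
    "FY23 holiday schedule and PTO policy.",
]
doc_vectors = np.stack([embed(d) for d in documents])

question = "How are we doing this quarter?"
scores = doc_vectors @ embed(question)            # cosine similarity (unit vectors)
top_docs = [documents[i] for i in np.argsort(scores)[::-1][:2]]

prompt = "Context:\n" + "\n".join(top_docs) + f"\n\nQuestion: {question}"
print(generate(prompt))
```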

The challenges of RAG

Given the benefits of RAG, why hasn’t it seen greater adoption so far? According to Clarifai’s Kent, there are a couple of factors at play. First, in order for RAG to work, it needs access to multiple different data sources, which can be quite difficult, depending on the use case. 

RAG might be easy for a simple use case, such as conversational search across text documents, but much more complex when you apply that use case across patient records or financial data. At that point you’re going to be dealing with data from different sources, with different sensitivity, classification, and access levels. 

It’s also not enough to just pull in that data from different sources; that data also needs to be indexed, requiring comprehensive systems and workflows, Kent explained. 

And finally, scalability can be an issue. “Scaling a RAG solution across maybe a server or small file system can be straightforward, but scaling across an org can be complex and really difficult,” said Kent. “Think of complex systems for data and file sharing now in non-AI use cases and how much work has gone into building those systems, and how everyone is scrambling to adapt and modify to work with workload intensive RAG solutions.”

RAG vs fine-tuning

So, how does RAG differ from fine-tuning? With fine-tuning, you are providing additional information to update or refine an LLM, but it’s still a static model. With RAG, you’re providing additional information on top of the LLM. “They enhance LLMs by integrating real-time data retrieval, offering more accurate and current/relevant responses,” said Kent. 

Fine-tuning might be a better option for a company dealing with the above-mentioned challenges, however. Generally, fine-tuning a model is less infrastructure-intensive than running a RAG system. 

“So performance vs cost, accuracy vs simplicity, can all be factors,” said Kent. “If organizations need dynamic responses from an ever-changing landscape of data, RAG is usually the right approach. If the organization is looking for speed around knowledge domains, fine-tuning is going to be better. But I’ll reiterate that there are a myriad of nuances that could change those recommendations.”

 

Safe AI development: Integrating explainability and monitoring from the start
https://sdtimes.com/ai/safe-ai-development-integrating-explainability-and-monitoring-from-the-start/ (Wed, 29 May 2024)

As artificial intelligence advances at breakneck speed, using it safely while also increasing its workload is a critical concern. Traditional methods of training safe AI have focused on filtering training data or fine-tuning models post-training to mitigate risks. However, in late May, Anthropic created a detailed map of the inner workings of its Claude 3 Sonnet model, revealing how neuron-like features affect its output. These interpretable features, which can be understood across languages and modalities like sound or images, are crucial for improving AI safety. Features inside the AI can highlight, in real time, how the model is processing prompts and images. With this information, it is possible to ensure that production-grade models avoid bias and unwanted behaviors that could put safety at risk.

Large language models, such as Claude 3 alongside its predecessor, Claude 2, and rival model GPT-4, are revolutionizing how we interact with technology. As all of these AI models gain intelligence, safety becomes the critical differentiator between them. Taking steps to increase interpretability sets the stage to make AI actions and decisions transparent, de-risking the scaled-up use of AI for the enterprise.

Explainability Lays the Foundation for Safe AI

Anthropic’s paper acts like an fMRI for the “Sonnet” AI model, providing an unprecedented view into the intricate layers of language models. Neural networks are famously complicated. As Emerson once said, “If our brains were so simple that we could understand them, we would not be able to understand them!”

Considerable research has focused on understanding how self-taught learning systems operate, particularly unsupervised or auto-encoder models that learn from unlabelled data without human intervention. Better understanding could lead to more efficient training methods, saving time and energy while enhancing precision, speed, and safety.

Historical studies on visual models, some of the earliest and largest before the advent of language models, visually demonstrated how each subsequent layer in the model adds complexity. Initial layers might identify simple edges, while deeper layers could discern corners and even complete features like eyes.

By extending this understanding to language models, research shows how layers evolve from recognizing basic patterns to integrating complex contexts. This creates AI that responds consistently to a wide variety of related inputs—an attribute known as “invariance.” For example, a chart showing how a business’ sales increase over time might trigger the same behavior as a spreadsheet of numbers or an analyst’s remarks discussing the same information. Thought impossible just two years ago, this “intelligence on tap” will have an impact on business that cannot be overstated, so long as it is reliable, truthful, and unbiased…in a word, safe.

Anthropic’s research lays the groundwork for integrating explainability from the outset. This proactive approach will influence future research and development in AI safety.

The Promise of Opus: Demonstrating Scalability

Building on the success of Sonnet’s interpretability work, Anthropic’s Opus is poised to scale these principles to a much larger model, testing whether the same features hold at an even grander scale. Key questions include whether higher levels in Opus are more abstract and comprehensive, and whether these features remain understandable to us or surpass our cognitive capabilities.

With evolutions in AI safety and interpretability, competitors will be compelled to follow suit. This could usher in a new wave of research focused on creating transparent and safe AI systems across the industry.

This comes at an important time. As LLMs continue to advance in speed, context windows, and reasoning, their potential applications in data analysis are expanding. The integration of models like Claude 3 and GPT-4 exemplifies the cutting-edge possibilities in modern data analytics by simplifying complex data processing and paving the way for customized, highly effective business intelligence solutions.

Whether you’re a data scientist, part of an insights and analytics team, or a Chief Technology Officer, understanding these language models will be advantageous for unlocking their potential to enhance business operations across various sectors. 

Guidance for Explainable Models

A practical approach to achieving explainability is to have language models articulate their decision-making processes. While this can lead to rationalizations, sound logic will ensure these explanations are robust and reliable. One approach is to ask a model to generate step-by-step rules for decision-making. This method, especially for ethical decisions, ensures transparency and accountability, filtering out unethical attributes while preserving standards.
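
In practice, that "articulate the rules first" approach is a prompting pattern: ask the model for its decision rules before the decision itself, then keep both for audit. A minimal sketch is below; the prompt wording and the stubbed ask function are illustrative assumptions.

```python
# Sketch of the "explain your rules, then decide" prompting pattern.
# ask() is a stub standing in for any chat-completion API.
def ask(prompt: str) -> str:
    return "<model response>"

request = "Should this loan application be escalated for manual review?"

rules = ask(
    "Before deciding, list the step-by-step rules you will apply, "
    "including any fairness constraints. Do not decide yet.\n\n" + request
)
decision = ask(
    "Apply exactly these rules and state the decision, with a one-line reason "
    f"for each rule:\n{rules}\n\n{request}"
)
# Both the rules and the decision are kept for transparency and accountability.
audit_record = {"request": request, "rules": rules, "decision": decision}
print(audit_record)
```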

For non-language models, explainability can be achieved by identifying “neighbors.” This involves asking the model to provide examples from its training data that are similar to its current decision, offering insight into the model’s thought process. A similar concept known as “support vectors” asks the model to choose examples that it believes separate the best options for a decision that it has to make.

In the context of unsupervised learning models, understanding these “neighbors” helps clarify the model’s decision-making path, potentially reducing training time and power requirements while enhancing precision and safety.
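
The mechanics of the "neighbors" technique are straightforward once embeddings of the training data are available: index them, then report which training examples sit closest to the input being explained. The sketch below uses scikit-learn's NearestNeighbors on toy vectors; the embeddings and labels are placeholders rather than any real model's representation.

```python
# "Neighbors" explainability sketch: show which training examples a model's
# current input is closest to in embedding space.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy stand-ins for embeddings of training examples and their descriptions.
train_embeddings = np.array([
    [0.9, 0.1, 0.0],   # "steady revenue growth"
    [0.8, 0.2, 0.1],   # "seasonal sales spike"
    [0.1, 0.9, 0.2],   # "one-off accounting anomaly"
])
train_descriptions = ["steady revenue growth", "seasonal sales spike", "one-off accounting anomaly"]

index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(train_embeddings)

new_input = np.array([[0.85, 0.15, 0.05]])  # embedding of the input being explained
distances, neighbor_ids = index.kneighbors(new_input)

for dist, idx in zip(distances[0], neighbor_ids[0]):
    print(f"nearest training example: {train_descriptions[idx]!r} (cosine distance {dist:.2f})")
```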

The Future of AI Safety and Large Language Models

Anthropic’s recent approach to safe AI not only paves the way for more secure AI systems but also sets a new industry standard that prioritizes transparency and accountability from the ground up.

As for the future of enterprise analytics, large language models should begin moving towards specialization of tasks and clusters of cooperating AIs. Imagine deploying an inexpensive and swift model to process raw data, followed by a more sophisticated model that synthesizes these outputs. A larger context model then evaluates the consistency of these results against extensive historical data, ensuring relevance and accuracy. Finally, a specialized model dedicated to truth verification and hallucination detection scrutinizes these outputs before publication. This layered strategy, known as a “graph” approach, would reduce costs while enhancing output quality and reliability, with each model in the cluster optimized for a specific task, thus providing clearer insights into the AI’s decision-making processes.
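
A toy version of that layered "graph" strategy can be sketched as a chain of stage functions, each standing in for a differently sized or specialized model, with a verification gate before anything is published. The stages below are stubs, not any vendor's actual models.

```python
# Sketch of the layered "graph" strategy: cheap extraction -> synthesis ->
# historical consistency check -> truth/hallucination gate before publishing.
# Each stage function is a stub for a differently sized or specialized model.
def extract(raw: str) -> str:
    return f"extracted facts from: {raw[:40]}..."

def synthesize(facts: str) -> str:
    return f"draft insight based on ({facts})"

def check_consistency(draft: str, history: list[str]) -> str:
    return draft if history else draft + " [no historical baseline]"

def verify(draft: str) -> bool:
    """Specialized truth-verification stage; returns False to block publication."""
    return "hallucination" not in draft

def run_pipeline(raw_data: str, history: list[str]) -> str | None:
    draft = check_consistency(synthesize(extract(raw_data)), history)
    return draft if verify(draft) else None

print(run_pipeline("Q3 regional sales export, 120k rows", history=["Q2 summary"]))
```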

Incorporating this into a broader framework, language models become an integral component of infrastructure—akin to storage, databases, and compute resources—tailored to serve diverse industry needs. Once safety is a core feature, the focus can be on leveraging the unique capabilities of these models to enhance enterprise applications that will provide end-users with powerful productivity suites.

Apple releases eight new open LLMs
https://sdtimes.com/ai/apple-releases-eight-new-open-llms/ (Fri, 26 Apr 2024)

Apple has released eight new small LLMs as part of CoreNet, which is the company’s library for training deep neural networks. 

The models, called OpenELM (Open-source Efficient Language Models), come in eight different options: four are pretrained models and four are instruction-tuned, and each comes in sizes of 270M, 450M, 1.1B, and 3B parameters.

Because of the smaller model size, the models should be able to run directly on devices instead of having to connect back to a server to do calculations. 

According to Apple, the goal of OpenELM is to “empower and enrich the open research community by providing access to state-of-the-art language models.” 

The models are currently only available on Hugging Face and the source code was made available by Apple. 

“The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model …  This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors,” the Apple researchers wrote in a paper.

Snowflake releases Arctic, a cost-effective LLM for enterprise intelligence
https://sdtimes.com/ai/snowflake-releases-arctic-a-cost-effective-llm-for-enterprise-intelligence/ (Wed, 24 Apr 2024)

The database company Snowflake is adding another large language model (LLM) into the AI ecosystem.

Snowflake Arctic is an LLM designed for complex enterprise workloads, with cost-effectiveness as a key highlight. 

It can efficiently complete enterprise intelligence tasks like SQL generation, coding, and instruction following, meeting or exceeding benchmarks in those areas compared to models trained with a much higher budget. According to Snowflake, these metrics are important to businesses because those are the capabilities needed to build generative AI copilots. 

Arctic is on par in enterprise intelligence with Llama 3 8B and Llama 2 70B, but used less than half the training budget of those models. It is also on par with Llama 3 70B, which had a training budget seventeen times higher.

According to Snowflake, the low training cost will enable companies to train custom models without needing to spend excessive amounts of money. 

“Building top-tier enterprise-grade intelligence using LLMs has traditionally been prohibitively expensive and resource-hungry, and often costs tens to hundreds of millions of dollars,” the Snowflake AI Research team wrote in a blog post.

In addition to being cost-effective, Arctic is open source under the Apache 2.0 license. The company also made its data recipes and research insights available to the public. 

“By delivering industry-leading intelligence and efficiency in a truly open way to the AI community, we are furthering the frontiers of what open source AI can do,” said Sridhar Ramaswamy, CEO of Snowflake. “Our research with Arctic will significantly enhance our capability to deliver reliable, efficient AI to our customers.”

The Arctic LLM is part of the Arctic family of AI models, which includes a text-embedding model as well. 

Arctic LLM is now available through Hugging Face and will be coming soon to other model gardens like Snowflake Cortex, Amazon Web Services (AWS), Microsoft Azure, NVIDIA API catalog, Lamini, Perplexity, Replicate and Together.

Databricks releases new open LLM
https://sdtimes.com/data/databricks-releases-new-open-llm/ (Wed, 27 Mar 2024)

Databricks has just launched a new LLM designed to enable customers to build and fine-tune their own custom LLMs.

The company hopes that by releasing this model, it will further democratize access to AI and enable its customers to build their own models based on their own data. 

According to Databricks, the new model, DBRX, outperforms current open source LLMs on standard benchmarks. It also beats GPT-3.5 on several benchmarks. 

It was created by Mosaic AI, trained on NVIDIA DGX Cloud, and built on the MegaBlocks open source project. 

“At Databricks, our vision has always been to democratize data and AI. We’re doing that by delivering data intelligence to every enterprise — helping them understand and use their private data to build their own AI systems. DBRX is the result of that aim,” said Ali Ghodsi, co-founder and CEO at Databricks.

Dirk Groeneveld, principal software engineer at Allen Institute for Artificial Intelligence (AI2), added: “We’re at an important inflection point for AI that requires a community of researchers, engineers and technologists to better understand it and drive meaningful innovation. This is why our team at AI2 is deeply committed to advancing the science of Generative AI through open model development and are excited to see new models like DBRX bringing greater transparency, accessibility and collaboration to the industry.”

Predibase launches 25 fine-tuned LLMs
https://sdtimes.com/ai/predibase-launches-25-fine-tuned-llms/ (Tue, 20 Feb 2024)

Predibase has just announced a new collection of fine-tuned LLMs in a suite called LoRA Land. LoRA Land contains just over 25 LLMs that have been optimized for specific purposes and that perform as well as or better than GPT-4. 

The available models cover tasks such as code generation, customer support automation, SQL generation, and more. 

According to the company, many organizations are beginning to realize the benefit of having smaller models that work really well for one specific purpose. Its own research shows that 65% of organizations are planning on deploying two or more fine-tuned LLMs over the next 12 months.

Predibase believes the release of LoRA Land will help companies implement these models without having to pay to build them from scratch, which often is not feasible within many companies’ budgets. 

There are also cost savings in running fine-tuned LLMs: typically each model would require a dedicated GPU, whereas multiple LoRA Land models can be run from a single GPU. This is because the models were developed and trained using the company’s open source LoRAX project, which allows up to 100 fine-tuned models to be served from a single GPU. 
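
The reason so many adapters can share one GPU is that a LoRA fine-tune is only a small set of low-rank weight deltas applied on top of a frozen base model, which serving stacks like LoRAX can swap per request. Below is a minimal single-adapter sketch using the Hugging Face peft library; the base model and adapter IDs are placeholders, not Predibase's published checkpoints.

```python
# Sketch: a LoRA adapter is a small delta loaded on top of a shared base model.
# Model and adapter IDs below are placeholders, not specific LoRA Land releases.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"          # frozen base weights (placeholder)
adapter_id = "your-org/sql-generation-lora"    # hypothetical fine-tuned adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the task-specific LoRA weights; the base parameters stay untouched,
# which is what lets many adapters share one GPU in multi-adapter servers.
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Write SQL to list customers inactive for 90 days.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=120)[0], skip_special_tokens=True))
```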

“We built LoRA Land to provide a real world example of how smaller, task-specific fine-tuned models can cost-effectively outperform leading commercial alternatives. Predibase makes it much faster and far more resource efficient for organizations to fine-tune and serve their own LLMs, and we’re happy to provide the tools and infrastructure for teams that want to start deploying specialized LLMs to power their business,” the company wrote in a blog post.
