Owners of RTX 40- and 30-series graphics cards can now setup their own personalised large language model (LLM) on their own PC. It's one that's eminently capable of sifting through old documents or distilling down the essence of YouTube videos.
Chat with RTX is now available to download from Nvidia's website for free from today, February 13. It works with any current or last generation graphics card with at least 8GB or more VRAM, which includes every desktop card bar the RTX 3050 6GB and excludes a few mid- to low-end laptop GPUs. It also requires 50—100GB of storage space on your PC, depending on the AI models downloaded.
There are two models to choose from: Mistral or Llama 2. The default is Mistral, and I'd recommend sticking with that.
The key parts of Chat with RTX are retrieval-augmented generation (RAG) and TensorRT-LLM. The former means you're able to give the LLM information and it which it will use alongside its internal training to generate accurate responses to your queries. The latter builds TensorRT engines that can exploit the silicon in Nvidia's GeForce GPUs to more efficiently run AI applications.
The result is an LLM you can feed your own data to (.txt, .pdf, and .doc filetypes) and which you can then query on that data.
For example, I've been playing around with the tool these past few days and since I create a lot of documents as a part of this job this feels like the prime dataset to stuff into its gaping maw. So I've set up Chat with RTX on my RTX 4080 powered PC (install size of 61.7GB) and fed the Mistral model over 1,300 of wonderful prose (ahem, or rather my news article drafts). I then set about asking it some questions.
First, I asked 'Could you name the articles where I mention Nvidia'
Out comes the above response listing three articles with their file path. Now, I've definitely spoken about Nvidia more than three times in 1,300 articles, so let's give that another try.
I ask again, rewording the query a little, 'Could you list every article in which I mention Nvidia'
This time eight articles are listed, this time with the Google Doc titles listed. I've mentioned Nvidia many more times than that, but you get the general idea of how this all works. Each response does appear grounded in truth, with each response citing the data used to generate it, if not always the whole truth. Simply using the Windows search function within the article dataset brings up 128 drafts including the term 'Nvidia' in the title, let alone the body copy.
Another example is if I ask Chat with RTX to tell me how many times I've used the word cheese, it tells me I've never used the phrase, citing an untitled and unrelated document as the source for the information. Nevertheless, it's probably right about the cheese thing. Until now, anyways.
Yet the tool is more exciting once you start asking it to summarise large quantities of information down into single bite-sized responses.
I asked Chat with RTX whether I should buy an Intel Core i9 14900K, and it came back to me with a trimmed version of my own 14900K review, which succinctly summarised it to “Based on the review, it appears that the Intel Core i9 14900K may not be worth the extra cost compared to the Core i9 13900K”.
Couldn't put it better myself.
I also asked Chat with RTX to summarise an article I wrote a while back on Alpine's F1 esports team, which it explained succinctly, and then to tell me about Intel's Meteor Lake processors, which I knew were covered a few times in the articles within the dataset.
Oh, and I asked it who I was. This was more to make myself feel important as the LLM returned a description of me in near-enough the same words as I used to describe myself for my site bio. Theoretically, you could just feed Chat with RTX a thousands documents on how great you are and create a narcissist's dream software.
Not that I'd do that, nope.
It's the summarising of large datasets that I could see this tool being useful for. Though I doubt everyone has such a need for that. The average PC user might not fancy a 100GB app to tell them what they already know. But, say you're working with a huge amount of responses to a survey and you want to quickly get an idea of the general thoughts and feelings of those who answered, this is one easy way to do it. But it is best used with caution and as only a guide to the inputted dataset, not as a way to accurately analyse it fully.
The other people it might appeal to are those that prefer to keep their content out of the cloud. The idea of asking an AI hosted God-knows-where to handle files that could contain sensitive information, or manuscripts for your big action movie idea, isn't that appealing to many. We've already seen what this looks like when it goes wrong courtesy of Samsung's employees. That's why a locally-run tool such as this might instead appeal.
The other use for Chat with RTX is to feed it YouTube videos then query it on the contents. I grabbed an episode of Chat Log, a podcast hosted by my colleagues Lauren Morton and Mollie Taylor, and fed it into the machine. The episode is titled 'Does the Steam Deck suit out PC gaming lifestyle so far?'
I asked, 'Is the Steam Deck easy to use day-to-day?' and a response prints out that sums up Lauren and Mollie's conversation with Tyler Colp on the matter.
I also then asked the obvious question, 'Does the Steam Deck suit their PC gaming lifestyle so far?' The response:
This feature works by downloading the transcript of the YouTube video, ingesting it, and using RAG to respond appropriately to a user's questions. It definitely appears to generate good summaries of a YouTube videos with plenty of talking, though due to the reliance of transcripts, you can't feed it anything that relies on visual information. Feed it the Grand Theft Auto VI trailer with next to no words throughout and you'll get nothing out of it.
I'm not so sure about the YouTube usage. On the one hand, I could see it being useful for a summary of a long live stream or event which you don't have time to watch yourself, though it's a chunky application to have around for those few instances where that's a thing. Similarly, the YouTube creator doesn't appear to get a view out of this, and I tend to fall in the 'AIs scraping information from creators online and offering nothing in exchange will break the very core of the internet as we know it' camp. This application alone might not make much of a difference, but I do strongly believe if you want the information provided by someone, you should at least support them creating more like it.
Anyways, the YouTube stuff takes a backseat for me with Chat with RTX. It's the mass local text file digestion that feels the most important piece of the software. As an application, it's pretty snappy. It generates responses swiftly once you hit send on a query. Though it does appear to gobble up around 85% of my VRAM—you need to be sure to close it properly with the off switch to release that back to the PC once you're done with it.
Chat with RTX is a fun concept, and a good way for Nvidia to show what's inference locally on its GeForce cards can do, but I'm not sure I'm going to keep it around on my PC. For one, it's absolutely massive due to the huge model data, but more so because the actual practical uses are pretty limited for me, personally.
Perhaps some clever clogs will come up with new and exciting ways to put it into practice now that it's available to the world. That could be you, providing you have the proper hardware. You can download Chat with RTX to try for yourself today.