EDGE AI POD
Discover the cutting-edge world of energy-efficient machine learning, edge AI, hardware accelerators, software algorithms, and real-world use cases with this podcast feed covering all things from the world's largest edge AI community.
These are shows like EDGE AI Talks and EDGE AI Blueprints, as well as EDGE AI FOUNDATION event talks on a range of research, product, and business topics.
Join us to stay informed and inspired!
The Future of Domain-Specific AI Search Lies in Targeted Agent Systems
Imagine your edge device having the ability to search for exactly what you need, exactly when you need it, without hallucinations or irrelevant information. That's the promise of Snipe Search's agent orchestration system, presented by co-founder Wassim Kezai in this eye-opening EDGE AI TALKS session.
Most organizations struggle when implementing RAG systems with their corporate data. The truth is, unstructured corporate knowledge is often messy and inconsistent, leading to unreliable AI responses. Semantic matching issues in traditional retrieval systems further compound these problems, especially when deployed at the edge where specific, accurate information is crucial.
Wassim unveils an innovative approach that deploys specialized AI "detective" agents to search for information from authoritative sources. Unlike brute-force search methods, these agents intelligently target reliable information based on hierarchical importance. Web agents crawl and cross-reference websites, image agents find relevant visuals, scholar agents specialize in academic information, and video agents can even pinpoint the exact timestamp in video content that answers your query.
What sets this approach apart is its adaptability to domain-specific knowledge and verification frameworks. Companies can customize how information is validated based on their standards, ensuring relevance and accuracy. While traditional RAG systems respond in seconds, Snipe Search's 30-second average response time delivers significantly higher quality information – a worthwhile trade-off for mission-critical applications.
The platform integrates easily with any LLM or chatbot through Docker, API, or direct integration, making it accessible for organizations of all sizes. As edge computing continues to grow, having efficient, accurate search capabilities becomes increasingly important for reducing cloud dependencies, enhancing privacy, and delivering better user experiences.
Ready to transform how your edge devices access and utilize knowledge? Explore Snipe Search's platform launching in the coming weeks and discover how intelligent search can enhance your edge AI deployments.
Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org
Introduction and Updates
Speaker 1All right, we're back. Talks are back. We need to update our logo wall there; we have a bunch of new partners that have joined the foundation recently, so that thing needs a little updating. So, note to self. What's happening, Davis? How's Canada?
Speaker 2It's all good. Another day, another live stream. I'm actually dialing in from a new laptop, an Apple MacBook, so I'm testing out some of the features on this one. Lots of new stuff happening, studio lighting, I suppose. Hopefully that holds up.
Speaker 1Okay, cool, cool. Well, yeah, were you in Barcelona?
Speaker 2No. Were you at Embedded World in Nuremberg last week? Unfortunately not, but lots of the team was, I know.
Speaker 1There's a huge presence there. You guys had a huge booth there. It's like a little mini city.
Speaker 2Yes, yes, a glimpse of the future, I've heard. I've heard good things. One of these days I'll see one.
Speaker 1I've been at previous ones, but yeah. I mean, how was the show? It was great. It was packed with people. We had a really nice EDGE AI Foundation booth, lots of walk-up folks, and we were packed with meetings. We hosted the IoT Stars party on Tuesday night, like 200 people there. I had a panel with Zach Shelby and folks from Blues, hosted a panel with Wind River, and then we did like six tech talks on the last day.
Speaker 2Oh, wow Mostly academic.
Speaker 1Yeah, which were pretty cool, pretty cool things.
Upcoming EDGE AI Foundation Events and Gear
Speaker 1Some interesting stuff around aerospace too, a lot of interesting edge AI happening in aerospace and radiation-hardened systems. So yeah, it was a pretty full week, and like I said, a lot of new partners joining, which was pretty cool. Hey, let me do a couple of PSAs before we get into our topic of the day. I know we've got a lot of people coming on the line here; I think we had a world-record registration for this thing, so no pressure. So we've got a couple of things going on.
Speaker 1The next place we're going to be as a community is at Computex and InnoVEX in May in Taipei. If you haven't been to Computex before, it's another one of those gigantic shows that really rallies the whole Taiwan and Asia ecosystem. We're going to be there May 20th to 23rd. There's going to be an EDGE AI pavilion, we're going to have a bunch of partners there, and we're also sponsoring yet another party called, appropriately, the Night Party. So if you find yourself in that part of the world, or want to go to that part of the world, you should go. That's coming up May 20th to 23rd in Taipei. And then also, at our Austin event, we launched our gear. People keep asking, when do I get the gear? So here's a nice hoodie, and the back looks like this. You know, it's premium. It's evolving.
Speaker 2Yeah, I like it. I saw them in Austin. I haven't worn one yet, but if they feel as good as they look, I'm sure it's worth it, so yeah.
Speaker 1Then we have the coffee mugs and all this stuff. But if you go to hobe forward slash edgi, that's where we have the gear. We call it scholarship gear because the money made off the gear goes into our scholarship fund, which supports fellowships, travel grants, and underwriting all kinds of educational programs. So it's a good way to help develop future edge AI expertise through the scholarship fund, and also to look cool, to be the coolest person in the room.
Speaker 2Or on the live stream, yeah, or on the next Blueprints show.
Speaker 1I'll make sure I do. Actually, my room is a little warm here, otherwise I would wear it. Maybe in Canada we should be selling these; they'd be flying off the shelves. And tariff-free, by the way.
Speaker 2So you can buy them in Canada. No extra tariffs, too kind of you.
Speaker 1No problem, okay, cool.
Speaker 2Well, let's bring on our guest, our special guest.
Speaker 1Wasim.
Speaker 3The man of the hour. There he is.
Speaker 1Hi, let me move you to the space of honor here, which is the big screen. So maybe you can kind of give us the five-minute intro of yourself and all that stuff and then we can kind of get into it.
Speaker 3Yes, sure. Yeah, like you said, my name is Wassim. I am a software engineer and also the co-founder of Snipesearch. To give you some insight into my background and career: I've worked for diverse companies. I'm ex-Audi, and I also worked at Oracle as a software engineer, where I mainly specialized in cloud and AI. And it's been maybe four or five years now that I've been involved in the edge community, developing edge AI components and trying to bring innovation to that field and its use cases.
Speaker 2This is the right place for edge AI innovations. All of those are safe to discuss here. Yeah.
Speaker 1And you're dialing in from Barcelona, you mentioned. Is that where you're at?
Speaker 3Yeah, I'm originally from Brussels, Belgium, but at the moment I'm working in Barcelona, and I quite enjoy it. Let's say it could be worse. Could be worse.
Speaker 1so, uh, the other thing just to take note is we have lots of folks on the live stream today asking, going to be asking lots of questions and things. We will collect those. We'll, Davis and I, will bucketize those, quantize those and at appropriate points we'll kind of cut in and we'll have Wasim address those and stuff. But I think what you're going to talk about today and actually we'll talk about the name Snipesearch- We'll talk about that.
Speaker 1But one of the things that people have been talking about and we have this working group for generative AI on the edge and we talk about use cases and scenarios and one of the things that a lot of companies have been trying to do is use like kind of rag architecture to train kind of language models with their own corporate data and which, you know, I would say mixed results is being generous. It's been kind of a real disaster for a lot of folks and part of it is because the corporate data is very unstructured and kind of messy and for those that have worked in big tech companies, we all know that our corporate data is kind of garbage most of it anyway. So garbage in, garbage out. But the RAG architecture itself has some challenges in terms of semantic kind of confusion.
Speaker 1I would say Right, and so one of the things is, when people bring language models to the edge, the scenarios are quite different, Right, they're much more specific. To the edge, the scenarios are quite different, right, they're much more specific and like, if you want to talk to your car engine about what's wrong with you, or your washing machine, about how to clean your clothes or things like that, there, you know, we don't want these things to write haikus and do cat poems. We want them to answer the questions right. So I think this was an interesting. For me this was a really interesting kind of exploration into how do you develop better training and kind of search models. You know what's the right word to?
Speaker 1to add the right intelligence in a very specific way to a to a language model instance yeah, the name of the game is domain specialization.
Speaker 2I think you'll see me I mean you'll you have? You have the floor to tell us all about this. But I think pete highlighted some good problems and I think you're you're think you're going to give us some solutions too. But, yeah, context-aware domain specialization. You don't need the whole history of the world in your edge. Ai chatbot yeah that's completely true.
Speaker 3I think it's how, actually one of the biggest constraints that we have with SLMs when they run on the edge, it's the context window. We can't give them a huge amount of data because the context side is small, because the number of parameters is small, and it's inherently from that, I think, a rag and search engine or other tool can bring the gap and bring, let's say, capability to the kind of SLM that they didn't have before just relying on what they are pre-trained on.
Speaker 1Right. So if you're walking up to a kiosk in Home Depot and asking a question, you're not going to say what's the meaning of life. You're going to ask where's the superglue?
Speaker 2If you are, there's a better place. That's the only question I ask.
Speaker 3Dial 26.
Speaker 1So yeah, yeah, yeah, so yeah, that's kind of important. I think, if these are really going to get deployed practically in these environments, we need a better way of kind of harvesting this kind of corporate knowledge, but we're harvesting the right knowledge in the right place at the right time. So I think that's what you're going to talk about today, which is pretty cool.
Speaker 3Yeah, exactly, I think like we have a lot of one of the ways that.
Speaker 3AI will enter in companies will be through these kind of tools because like you said it makes them domain-specific, and what we want as a business is to answer your needs and give your information that you want to your customer or to your user meaning it will be more, or less a big entry user, meaning it will be more or less, and there's a lot of research in that, in that area, and one of the application that we are seeing is seeing on the slm is how to enable like the slm with more capability, as he is in the constraint zone, meaning that he cannot have access, and we can also link it to not to only the internet but only to databases and so on, and to access and to search in that kind of knowledge that we have.
Speaker 1that will help him make better decisions on the edge Right a little more curated it's a curated training than the wide open web, because we know the internet's half garbage and half interesting 90% maybe, but that 10%.
Speaker 2it gets a lot done, and I mean I that's right, but a good segue.
Speaker 2Segue to your talk is you know, with with different tool chains, some rag works well, some doesn't. Depends on the data, depends on the model. I think that the the whole picture and this changing landscape. I think that's where snip search, snipe search, whatever you end up calling it. I think this is it's a. I got just a glimpse before, but it's it's a, it's a fresh look at, like you said, how this stuff will enter company life, how this will enter our actual, actual workflows, cause not all tools are equal.
Speaker 1Not all data is equal, but you got to get the job done somehow, right. Yeah, no that's really cool. Do you have a screen to share? Yeah, all right, we'll bring up the screen here. So actually, wasim and I saw each other. He was in austin at our event in austin, texas, which is good we enjoyed some.
Speaker 3Is it like a korean barbecue? Yeah, korean barbecue. I remember that everything.
Speaker 1There is barbecue something. Yeah, that's true. Well, you went one night. We went to a japanese restaurant with barbecue. It was like the korean restaurant barbecue. Yeah, cool, okay. So do you want me to share that? Let me know when you're ready to share it. Okay, we're going to share a screen. We see lots of people coming in here. Good morning from Germany, france, cambridge.
Speaker 2We've already had our first question, Pete. Here's the first question. Well, while you're working on your screen, here's the question.
RAG Limitations and Search Challenges
Speaker 1Why should CIOs pay attention to the edge? So from Paula? I don't know, paula, maybe you're a CIOs pay attention to the edge. So from Paula? I don't know, paula, maybe you're a CIO, but the edge is edge. Ai is all about running AI workloads where the data is created right, and that's kind of the impetus behind that. The gravitational pull toward the edge means lower cost, lower power, more impact typically, and that can also mean things like privacy and latency and flexibility, agreed. So CIOs you know I mean who are in charge of kind of figuring out the information strategy. You know you want to move your compute as close to where the data is created as possible. That's kind of generally what you want to do. You want to avoid ingress and egress to clouds and OpEx costs and have more control over your processing in general. So I don't know, davis, you want to put your two cents in on that.
Speaker 2You covered the main ones. I mean, when I was at a previous role at a startup short, condensed pitch, it was cost like operational cost. You can't have 4G LTE high resolution bandwidth streaming all the time. Forget the AI part. It's just a no-brainer from a cost perspective to do as much as you can all the time. Forget the AI part. It's just a no brainer from a cost perspective to do as much as you can at the edge. That was my elevator pitch.
Speaker 1Cool Sounds good. All right, Waseem, let us know when you're ready to share.
Speaker 2Yeah, I'm ready. Have you presented?
Speaker 1yet you need to hit the present plus button because it's not quite showing up. I didn't hit it, that's all right.
Speaker 2It's live we're good, we still got plenty of time. I did set a reminder for questions, whatever platform or chat, and Pete and I will filter them to Asim, I think.
Speaker 1I'm going to bring it up. I think you might do some infinity screen here, by the way, asim, because you have our pictures up on your screen, so watch this Now you have the presentation there.
Speaker 3You go there you go. Okay, we are good.
Speaker 2Beautiful.
Speaker 3Perfect.
Speaker 1Now by the way, can we just spend two seconds on the name Snipesearch, Because I know in your demo it's called Snipesearch.
Speaker 3Yeah, exactly, we have like a location now. This is a work in progress.
Speaker 1The naming is still a work in progress and maybe we should throw it open to the, to the community here, for some ideas, as you, as you go through it, if you're inspired with a name. Uh, snip search, snip search. I don't know what, you know what we're gonna, what?
Speaker 2was not to be confused with strip search, because that's that's so. We want to go there.
Speaker 1But uh, anyway, just an interesting. This is the process, right, when you come up with an idea and a product name and you know someone else has the same name or whatever, so so, uh, just you know we'll cut. We'll see him some slack here, but it's an opportunity for the community to maybe get creative on it.
Speaker 3Yeah but and a funny fact is like all this project and this company is created on use case that you bring with the AJI Foundation actually.
Speaker 1That's right yeah.
Speaker 3It's. That's what is nice to have.
Speaker 1Maybe you should put some AJI snippets. Yeah, I'll think about it, cool. Why don't we give you the floor and take it away and then, like I said, Davis and I will call some questions in the background and we'll get on with it, Okay perfect, let me know when I start and we can go, okay. Hi everyone.
Speaker 3How's it going to all our gents? It's a pleasure to have you with us today. I will present to you Snipesource. It's our agent orchestration system that performs as an OSINT, swarm engine, um agents and swarm engine collaboration. And actually what? What is snipe search?
Speaker 3nature is only like some agent that we we have more or less trained, that we have give some tools that will go on the internet or go on your internal databases and actually gather the information in order to respond to your initial question, and actually it's something that integrates inside lm shot ports, your own website. I will showcase more of this just to showcase you the team that that is working on this, ahmed and me we are the two co-founders of Snipesource and just to give a bit of a roadmap of today's presentation, I will first introduce the AGI Foundation knowledge base. It's actually the knowledge that we gather in order to create the base, information to search on, in order to create the chatbot and the LLM for the HIL Foundation.
Agent Orchestration System Architecture
Speaker 3After this I will go through the Snipesource solution and how actually this agentic orchestration system can search efficiently and also accurately about the information on the defined scope, meaning like that is, web or databases, or on your YouTube channel or your website, depending on what scope that you give him. I will highlight the application that we did in order on this and then we will go to a live demo of Snipes Search and also the Snipes Search platform that will. That's also our platform that we have created in order to generate and to wrap current chatbot and LLM with our search capability. Like first of all, starting with this project, like said, it was a project that has been proposed by Pete last year that he wanted to create a knowledge base in order to leverage the access to the presentation, the PDFs, also the publication of the AHAI community and to create an LLM in order to upskill people, give access to information and also give access to people that are like involved in this, and it's more or less an entry point to leverage your skills in everything that is related to.
Speaker 3AI on the edge and we came up, like, with this solution that I thought the first, if I give a little bit of timeline about the project.
Speaker 3We first start with the retrieval of multi-generation system that we have actually linked to a tiny model that we have fine-tuned on. On. On q a answers response of of age ai content. That's what was the first project. Then, actually, we saw a lot of gaps that we needed to fill. We didn't have real-time information, meaning that everything that is published today or after the training of the model, we don't have access to it.
Speaker 3And also there was quite an issue on how we source references and also how we fact check information, meaning that we have trained our data, but when we train our data, we always train on the data that we have is not always 100 accurate, meaning that there was always some some garbage inside, and this actually will make our model sometimes hallucinate, and we want to bring a solution to this, and the solution that we found out is to to orchestrate, to orchestrate it ai agent in order to perform the search and to use actually search engines like google, like bing, like more other to use.
Speaker 3YouTube search engines to use also RAC, retrieve Logomotivity Generation Workflows. To use also Knowledge of Meta-Digital Workflows, all these techniques that actually have all good points and also bad points, and by concatenating all these tools the LLM is able to fill the majority of his gaps.
Speaker 3The knowledge base that we constructed and that everyone will have access in the future on the. The website of the AGI foundation is constructed about web pages that are relevant to it's constructed about web pages that are relevant to HAI and blog posts also that are relevant to HAI. We have created more or less a scrapper and a crawler that go on the net and target exactly only.
Speaker 3HAI knowledge content and we also have a transcript from the YouTube, daily Motion and other platform around more than 650 videos and around 1,000 documents that are composed about PDFs, presentations and also conference presentations, event presentations and actually yeah mainly all the events that happened in the NJF Foundation in the last three or four years were inside After how we have created that base knowledge.
Speaker 3we created some mappings in order to create some logic between people, organization, research topics and technologies, and also about collaboration, meaning that we did this using a knowledge graph. I will go in detail, more or less. It's more or less to create a relation between our base knowledge in order to understand who is linked with who, who is working with who on what, what company is working on what subject and so on. It's really to create it's more really an entity relationship matching model, like one of the limitations that we saw when we like I said, like we saw when we started developing this project at the beginning and we wanted to create this knowledge base and give it access through NLLM to the user of the AGI Foundation website website was the first one was the information overload, meaning that more you have information, more the context is huge, more it's difficult to find relevant information.
Speaker 3And knowing that with all, let's not say all, because like in the. Ai world when we say all it was six months ago. But yeah, we saw like, like rack systems and all this have like. An inherently issue is about semantic matching. We are using Cosinus similarity. Even thought we use a hybrid search or even thought we use a keyword search matching, there will be always a gap because there is probably norms.
Speaker 3There is there is some mathematical inherited issues that are not yet solved. We hope it will be solved in the future. But yeah, it's what we saw. Mainly, there is some mismatch and also there is a lot of work to be done in behind in order to have a rack system, like the effort that you have to put in behind to have a very nice rack system is huge, like pete said, and it was a very good point.
Speaker 3generally, when you are in companies, the data are not really well structured and it's more or less a mess, let's say, and the effort that you have to put in order to structure all this and to create your vector database with the correct chunking, with the correct link-segment, link-in-between chunk, it's a huge effort, it's achievable, but it demands a lot of resources, and that's what we saw and was like the motivation to bring a new solution.
Speaker 3that is a snipe search that we call snipe search that we say okay, we need to overcome more or less the gap of semantics and also the gap of the fact that you don't have this information accessible in real time, and that was one of the big blockers that made us think about bringing a new solution and we created SnapSearch. Like I said, it's an orchestration of A8 agents and each A8 agent behaves like an OSINT agent. Like you, have several tools that you can use in order to crawl the web efficiently or crawl the source of data that he's targeting.
Speaker 3Like I said, it's not only the web, but you can define your own scope, meaning that if you want to search in the whole web, it's your choice. If you want to choose to search only on the website that are related to your business score, you can also do it. And if you want to search inside the database or vector database, SQL database, all this is possible. Database or vector database, SQL database all this is possible, and we gave to this agent actually the tools in order to him to search efficiently inside this data source and also to match correctly and pertinently the information that are needed in order to answer the root question that you have and actually like there is a workflow that is going on on Snipesource, like the first
Speaker 3part is, like, the first thing that we do is like the user is asking his question and we will understand what is the intent of the question and we will do some search planning, meaning that we will, depending on the question that the user is asking, we will do some search planning, meaning that we will, depending on the question that usually is asking, we will do a search plan, a plan, a search plan that will actually we will plan how, what, we will plan what information we need in order to respond to that root query and we do it through an agent that will do this, and there is different way of creating that plan.
Speaker 3There is a tree of thought, there is a using chain of thought. There is also using a year shaker planner. There is a different solution in the letter.
Search Planning and Reasoning Methods
Speaker 3Each one have his good point and his bad point and it's really a tradeoff to use between each of them and actually we launched this OSINT agent on the web in Parallel or on the sources in Parallel, and they will go and grab all the information that we need. It's like they have a mission that's why we call them detectives and they will create a report at the end. That's what. That's what they will produce on all the information that they have read, because they will read all the information. They will crawl the web and read PDFs and so on, but they use, like I said, they use a targeted philosophy meaning that they will not search everywhere.
Speaker 3They will really target authoritative sources. Meaning that the Authentic Agent has the capability to narrow down his scope. Meaning that if you ask him about an overview of the A100, he will go and check on NVIDIA website and then go down.
Speaker 3Meaning that he knows the agent knows the hierarchy of what is an alternative source and where more or less he needs to start his crawling and search and actually this is quite efficient and if you compare it to other solutions that are on the market that do more brute force or exhaustive search, here we are doing more and we are now without search, and it takes less time than normal.
Speaker 3I can give you an example like deep search or on open AI or other kind of solution of public city. There are four different type of agents, or Synth agents, that we have inside that orchestration. Each one is actually personalized and customized in order to target a type of source. To target a type of source, meaning that we have some web agents that are like that, have the tools and the capability to go on the web, crowd the web, read the web pages, cross, reference the different web pages, check the author of the web pages, the source of the web pages, the links of the web pages, the source of the web pages, the links of the web pages, the number of views, the citation, and all this and with all this information, he will create a final report with all the information, with the source of the information that answers the root question that you give him and here we go, doing this all the way.
Speaker 3We have an image one that will focus on gathering image, more or less. You are asking about comparing two concepts or the performance of a concept. He will go and search for some diagram that illustrates that concept. It's more or less his train on. This agent is more or less customized to do this. We have the score agent. This one will go only in academic databases, meaning Scoda, archive, ea and more, and he will search inside that content and that kind of score agent was really designed for the AJA Foundation in order to give the users the ability to have really scientific reports with really detailed reports.
Speaker 3That was really the purpose to create that score of agents.
Speaker 3And the results were quite nice, to be honest. And there is also the video agent. The video agent will go on the video search that you give him for example, you gave him youtube or the emotion and he will go search for the video that that answer your question. Meaning if you ask him about how to prepare, how to mount an mcu or specific mcu, he will go and find you the tutorial that explained to you. He will read the video and he will give you even top the minutes where the video is talking about answering your question.
Speaker 1Here is like the overall workflow.
Speaker 3to just summarize everything that I said what's happening?
Speaker 3we take the user query we perform an AI reasoning in order to do our search plan. Once we have our search plan, we launch, depending on which type of agent we are using. We launch the swarm of agents on the web and our target source. We grab all the data. Then there is a layer of verification where we fact check information. Actually and I will talk more in detail about the fact checking system that we put it's quite how you buy data, how you buy the source and the relevance of your data. It's really personal of the domain is very personal and domain specific. That's why you need to customize it depending on the final user that you are aiming to serve.
Speaker 1And then at the end we have our verified answer.
Speaker 3There is different way of planning your search, meaning that there is the most used one let's say, is the year-shakel planning.
Speaker 3It's more or less. I have a root question and I decompose this question in different sub-questions and I go in the sub-question and I also decompose, and I do this recursively until I am in the list, creating a tree structure from the root to the end. And actually I give like just an example if you want to compare two entities, the second level of the tree will be a find, the specification of the entity half and specification of the entity B, and so on. If you go one level down, it will be a find this kind of specification, this kind of specification, this kind of specification, this kind of specification. You can go logically more down and it's cool.
Speaker 3This kind of planning fits search very well; it matches the mindset of how we look for information. You also have chain of thought, which is more or less step-by-step reasoning that you put into your LLM to teach it how it needs to plan the search. It's very customizable, but it's also, how do you say it, less deterministic than the other one. There are also tree of thought and program of thought.
Speaker 3Those are other reasoning techniques. Tree of thought is more or less generating different chains of thought, essentially different hierarchical plans; you score them, compare them, evaluate them dynamically, and at the end choose the final plan that is best. But it consumes a lot, meaning there is a real trade-off: it's more or less the hierarchical plan, but built dynamically on the fly, so it consumes a lot of tokens and can be quite expensive. Like I said before, we do this reasoning and planning, then we launch the parallel information retrieval.
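The tree-of-thought trade-off can be sketched as: generate several candidate plans, score each one, and pay the cost of evaluating them all. The plan generator and the scorer here are stand-ins, not the scoring Snipe Search actually uses.

```python
def generate_plans(query, n=3):
    # Stub: propose n candidate plans of increasing thoroughness
    return [[f"{query} step {i}" for i in range(k + 1)] for k in range(n)]

def score(plan):
    # Reward coverage, penalize token cost (characters as a crude proxy)
    coverage = len(plan)
    cost = sum(len(step) for step in plan)
    return coverage - 0.01 * cost

def best_plan(query):
    plans = generate_plans(query)
    # Every candidate gets scored, which is where the token expense comes from
    return max(plans, key=score)
```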
Speaker 3We do our evaluation. Like I said, each agent comes back with a report containing all the information and the sources that answer the mission you gave it. Once you have this, there is a layer that takes the information from the different agents and runs an algorithm, and that algorithm is customizable depending on how you interpret, how you define, the truth, because everyone doesn't define the truth the same way. That's something quite nice in Snipe Search: it gives a lot of consistency, plus a domain-specific validation framework to validate the information. The last layer constructs a coherent response for the user, and that is also customizable, meaning that depending on the output format you want to give to the user, you can adapt it quite quickly.
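That customizable "definition of truth" can be pictured as a trust-weighted vote over the agents' claims. The weights and threshold below are hypothetical; they are exactly the knob a company would tune to its own domain.

```python
from collections import defaultdict

# Hypothetical per-source trust weights; this is the customizable part
TRUST = {"vendor-datasheet": 1.0, "blog": 0.4, "forum": 0.2}

def consolidate(reports, threshold=1.0):
    # Accept a claim once the trust mass behind it passes the threshold
    mass = defaultdict(float)
    for report in reports:
        for claim in report["claims"]:
            mass[claim] += TRUST.get(report["source_type"], 0.1)
    return [claim for claim, m in mass.items() if m >= threshold]

reports = [
    {"claims": ["A100 has 80 GB HBM"], "source_type": "vendor-datasheet"},
    {"claims": ["A100 has 80 GB HBM", "A100 is obsolete"], "source_type": "forum"},
]
accepted = consolidate(reports)
```

Here the datasheet-backed claim passes while the forum-only claim is filtered out.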
Speaker 3Snipe Search has, like I said, a lot of benefits compared to traditional RAG systems. It excels with long, complex context. What is cool with Snipe Search is that it's an orchestration: we have the ability to choose the agent we want for each step, so we can leverage whatever Transformer architecture suits our final target. To give an example: if 300 pages of PDF were found during the information-gathering process, you need a long-context LLM to be able to read all of that information, so putting a long-context model inside as the agent that processes PDFs is a good idea.
Speaker 3On the other side, if you want to generate the queries, the targeted queries for the sources you want to read, you need a reasoning model more than a long-context model, and you can swap it in.
Speaker 3So you can create more or less the best orchestration, where each agent excels at its function because it leverages the architecture that lets it achieve this. And one more thing with Snipe Search is latency. Traditional RAG is generally quite fast.
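The idea of matching each orchestration step to the model architecture that suits it can be captured in a small registry. The model names and context sizes here are placeholders, not the models Snipe Search actually uses.

```python
# Map each agent role to the model class that fits its job
ROLES = {
    "query_generation": {"model": "reasoning-model", "context": 8_000},
    "pdf_synthesis":    {"model": "long-context-model", "context": 200_000},
}

def pick_model(role, payload_tokens):
    # Refuse a role whose context window cannot hold the payload
    cfg = ROLES[role]
    if payload_tokens > cfg["context"]:
        raise ValueError(f"{role}: payload exceeds {cfg['context']}-token window")
    return cfg["model"]
```

So a 300-page PDF batch routes to the long-context model, while short query-planning prompts go to the reasoning model.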
Speaker 1We are between 0 and 5 seconds per answer.
Speaker 3But if you compare that to Snipe Search, Snipe Search is around 30 seconds per answer. That is quite expected, because we are going out on the net, searching in parallel, and covering a much bigger amount of data than with traditional RAG. Even though in Snipe Search we do use RAG, we use it to interpret information. Now, one thing that is also super nice with Snipe Search is that it's a search engine that is agnostic, meaning that...
Speaker 1Just a quick interrupt.
Speaker 1We had a couple of questions bubbling up, so before we get into it a little farther, I thought we would throw those on. Is that cool?
Speaker 3Yeah, no issue.
Speaker 1Here's one from Malik. Malik asks: will the engine target the internet or closed databases? I think you might have covered that one, but can you clarify?
Speaker 3Yeah, it depends. Actually, Snipe Search will target a source of data.
Speaker 2okay, Now the source of data.
Speaker 3It could be the web, meaning a URL, meaning a crawling system; it has the ability to crawl the web. It also has the ability to go into a database, meaning it can create an SQL query.
Speaker 3It can also create something like a Cypher query to go into graph databases, or query vector databases. It has the ability to use whatever tools the LLM is able to produce, and it has access to any type of data. Your use case, and where the data you want to access lives, is what defines the tool you will use.
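Routing a mission to the right retrieval tool (SQL, Cypher, or a crawler) is a simple dispatch on the source type. The query templates below are illustrative only, not Snipe Search's actual tool calls.

```python
def build_tool_call(source, topic):
    # Dispatch on the kind of data source the agent must reach
    kind = source["kind"]
    if kind == "sql":
        return f"SELECT * FROM {source['table']} WHERE topic = '{topic}'"
    if kind == "graph":
        return f"MATCH (n:{source['label']} {{topic: '{topic}'}}) RETURN n"
    if kind == "web":
        return f"CRAWL {source['url']} QUERY '{topic}'"
    raise ValueError(f"unknown source kind: {kind}")

call = build_tool_call({"kind": "sql", "table": "docs"}, "mcu")
```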
Speaker 2So how does that influence, let's call it, integration time or search time? I mean, if you're obviously looking at a smaller, closed vector database that's LLM-ready, versus, you know, a large unstructured source, have you seen a lot of variability in implementation time or search time because of that?
Speaker 3there is like really a trade-off to do If you are doing like, if you are focusing on vector databases or SQL databases. There is, it's super fast, it's really super fast to interact.
Speaker 2Right makes sense.
Speaker 3Yeah, with these tools, but the integration and the construction of these databases take a lot of effort.
Speaker 2Also true. Yeah, Also true.
Snipe Search Benefits and Platform Demo
Speaker 3If you compare that to a targeted search on the web, it's quite different. In that case it will take more time, but you don't need to prepare any data up front.
Speaker 2It's already there, yeah. I like what you said: it depends where the source is; your approach targets the source.
Speaker 3Right, exactly.
Speaker 1Cool. Let's try another one here from Paolo. Yeah, okay. Paolo asks: how can the agent tell whether a paper is authoritative? So, paper citations, index, like how do you know what you're pulling?
Speaker 3Actually, there are different ways of doing it. You can create few-shot prompts in your LLM where you show it what is related to what, and it can make the link. But this technique will not fully work; you will get 90% accuracy or something like that. I don't know exactly, but there is a proportion where it will fail if you give it something too specific, in a domain that is too specific.
Speaker 3It has not been trained on that, the information was not prominent on the web, meaning it will not be evident to the LLM; it will not have a reliable way to understand it. A good way to do this is to create a list of authoritative sources by domain, and you can tell the LLM: OK, each time you have a query, go and check it semantically.
Speaker 3You only have to say: OK, if I have a question about the EDGE AI Foundation, or about one of your partners working on a specific chip, and I am asking for information about that specific chip, we know this is about microcontrollers. Then we can make a list of authoritative links for microcontrollers, and the agent can recognize that the intent of the query is about microcontrollers, go check those sources, and infer from there.
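The per-domain whitelist idea might look like the following sketch. The domain lists are invented, and intent detection is reduced to keyword matching here, where the real system would match semantically.

```python
# Hypothetical authoritative-source lists, one per domain
AUTHORITATIVE = {
    "microcontrollers": ["vendor datasheets", "manufacturer app notes"],
    "gpus": ["NVIDIA documentation"],
}

# Crude intent detection; a production system would compare embeddings instead
KEYWORDS = {"mcu": "microcontrollers", "microcontroller": "microcontrollers",
            "gpu": "gpus", "a100": "gpus"}

def sources_for(query):
    for word in query.lower().split():
        if word in KEYWORDS:
            return AUTHORITATIVE[KEYWORDS[word]]
    return []   # no trusted list for this domain: fall back to open search
```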
Speaker 2So it's the semantic part that gives you that ability the semantic understanding. Okay, yeah, that's cool.
Speaker 3I will not go further, to be honest. Offline, yeah. There is some stuff, a solution that we developed, but I will not go further on this.
Speaker 2Yeah, but there is a solution to that.
Speaker 1Okay, cool, let's see. Should we do one more Davis? What do you think?
Speaker 2There was a good one. I mean, it depends how much more you have. Do you have a lot more slides?
Speaker 3We have four slides.
Speaker 2We have time to answer the question, okay. I think this one about the response time was also from Rajagopal; I had two. I mean, edge is often about real time, about quick response times. Vector databases, RAG databases, offer quick responses, but with the limitations you mentioned. So this is a good question: is 30 seconds too long for a response time in the context of edge AI?
Speaker 3It's too much, so we are working on shortening the response time.
Speaker 2I mean, there's always optimization, there's quantization, models get smaller. This response time is probably not one-size-fits-all; it ranges. What do you think?
Speaker 3I think it comes down to the amount of data you are scraping and handling. Like I said, one of the innovations we bring with Snipe Search is that we target information, meaning we are already drastically reducing the amount of data being processed. Now, if you are able to create super specific queries that target exactly what you are looking for, it will be super fast. But let's be realistic; to be honest, it's a big challenge to reach that efficiency of targeting.
Speaker 2But there's some tolerance, right? I think that's the nugget of this point. Maybe, if you're an operator, or a machine is running some of these queries where you want reasoning like a human's, wouldn't you take some time to think about it?
Speaker 3Yeah, yeah.
Speaker 2I think one of the big applications of edge AI.
Speaker 3Like you said, I think you mentioned a good point, and I think the trend is going that way.
Speaker 1The edge will be more or less a lever to lower the cost and enhance the privacy of everything that is done on the cloud.
Speaker 2Me, I'm seeing that every application running on the cloud will need some edge.
Speaker 3Let's call it infrastructure that leverages AI agents, LLMs or SLMs, you can name it whatever you want, in order to reduce the cost we have and also enhance the privacy and accuracy of what we send to the final endpoint that takes the final decision. Now, there are a lot of other use cases that are more constrained and sensitive: if you go to space, you don't have access to the network, or it's very expensive; there is no cloud computing, or there is, but limited. I think the edge will integrate into what we call the cloud world in that way.
Speaker 2Okay, nice. Well, I know you have a few slides left. Let's get back in the groove.
Speaker 3Yeah. Just to highlight a bit about Snipe Search: from the project we did here with the EDGE AI community, we built this product called Snipe Search. We noticed that no one in the market lets you bring search capability into your chatbot or your LLM in a very easy way that you can integrate directly.
Speaker 3That's something we noticed, and we saw an opportunity. We said: OK, we have developed search capability, we have developed an infrastructure with an agent orchestration architecture and logic that targets information very well. How can we make this usable by anyone? You can create search capability using the Snipe Search framework and integrate it directly inside your website page, your company search engine, your LLM, your chatbot, and so on. And one of the cool features is that it can integrate with any LLM you want, with any chatbot you want.
Speaker 3It's more or less a Docker Compose image that you can download and deploy, and you can also serve it as an API. I will present the different features of the platform at the end, but one of the big features we bring is universal integration into different kinds of infrastructure. We saw that no one is doing this at the moment, there is a big gap in the market, and we want to bring a solution for it. And like I said, I was talking about all the data sources we have set up; one of the branches the LLM can go and check is the TinyRAG system.
Speaker 3Meaning it has the ability to go on the web, to search every YouTube video, every publication on this. Here is more or less what our internal knowledge system looks like. If you want to interact with the internal knowledge, it takes the query and, depending on the query, routes it to the different node structures we have, because we have two different structures.
Speaker 3One is a graph, the other is a vector database, and one fills the gaps of the other, because we have a big problem with first and last names in vector similarity embeddings, which we compensate for with the graph store. That's why we bring two different types of data here. Then we have the retrieval process, and we give the result back to Snipe Search to interpret. This TinyRAG system is fully built on all the data we gathered from the knowledge edge.
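The graph-plus-vector split can be sketched as two indexes with a fallback: exact proper-name lookups go to the graph side, everything else to (stubbed) vector similarity. All data and names here are invented placeholders.

```python
# Exact-name index standing in for the graph store
NAME_INDEX = {"wassim kezai": "doc-speaker-bio"}

# Topic index standing in for the vector store; similarity is stubbed
# as word overlap, where a real system would compare embeddings
VECTOR_INDEX = {"edge ai power budgets": "doc-power", "mcu tutorials": "doc-mcu"}

def retrieve(query):
    q = query.lower()
    # Graph side first: embeddings are unreliable on first/last names
    hits = [doc for name, doc in NAME_INDEX.items() if name in q]
    if not hits:
        # Fall back to the vector side for topical matches
        hits = [doc for topic, doc in VECTOR_INDEX.items()
                if set(topic.split()) & set(q.split())]
    return hits
```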
Speaker 3Now, like I said, one of the big advantages of using the Snipe Search framework is that it's parallel. It brings a lot of precision and reliability, because it has robust citation tracking. You can, like I said, plug in your personal, customized verification layer, meaning you can adapt it to the specific domain of your company, your business, your foundation, or whatever. It's adaptable.
Speaker 3It's also very easily integrated into any infrastructure that already exists, meaning you don't need to build anything from scratch; you just integrate it into your existing LLM and so on.
Speaker 3Something else Snipe Search brings is privacy and security, because it gives any developer the possibility to build from scratch the model he wants, with search capability, and to host it on his own cloud or on premise as he wants. That can greatly enhance the privacy and also the security of your data, and we know that's a big point coming in the future.
Speaker 3It's already a big point, but it's one a lot of people will be looking at over the next couple of years. Now, let's do the live demo.
Speaker 2Let's do the demo.
Speaker 3Yeah, let's do the demo, everyone's asking for it.
Speaker 2Yeah, exactly.
Q&A and Platform Features
Speaker 3Let me show, more or less, what the demo does. Here we have our EDGE AI Foundation LLM. Here are the different types of sources we can go to; here we are using the web one. We have asked for an overview of the A100. The output you see here is a custom output: we wanted to emulate a search engine and at the same time synthesize an answer, and you can see the answer here.
Speaker 3I will give you just a little example.
Speaker 3It will go and check the different sources. You can see that, yeah, we are comparing the A100 and H100, and it found the three documents it will base its answer on. The first document it found was actually the official NVIDIA document for the A100. It found another document on GPU positioning, which is also relevant, and there is a comparative analysis of the A100 and H100 across various workloads. It can synthesize a direct answer, meaning you have the details here, with the sources that justify the answer.
Speaker 2Sorry, yeah. So you've already put this query in; you're showing us outputs that were produced. And what's unique about your search is this relevance and authority-level update. You've created some kind of ontology, some set of sources that you've made accessible to the end user, so it's not just happening behind the scenes; you're trying to make it more transparent as well, it seems.
Speaker 3Yeah, exactly. You can see what we did. Like you said, the first innovation we bring is that, as you can see here, it's targeting exactly what we want. We are comparing the A100 and H100. You can see that it gathered the NVIDIA Hopper architecture in-depth technical blog, and it gathered the NVIDIA A100 datasheet. It found these. I'm sorry, the interface is a bit rough.
Speaker 2How long would it take to produce an output like this? Thirty seconds? Okay, we can launch it.
Speaker 3What about, can you ask something?
Speaker 1A more general question, you know: give me some examples of using edge AI for pedestrian safety.
Speaker 2Yeah, yes, let's try it. I think it's a little ironic we're talking about A100s and Edge AI.
Speaker 1I think the keynote's about to start by the way yeah, true, okay, I see, I see.
Speaker 3We can ask a more broader question what do you want to?
Speaker 1ask yeah, just ask about pedestrian safety.
Speaker 2I don't know what would come up. Can you use edge AI for pedestrian safety? Yeah, cool, let's try this query. And then there's a couple of questions, Pete.
Speaker 3We can also bring it down. Let's see.
Speaker 2You need to change that a little bit. Yeah, none of those.
Speaker 3I-A-N yeah.
Speaker 2Okay, and then you need an E in "safety". Hey, maybe it's that good, it knows what you mean anyway. Okay, let's launch that.
Speaker 3Yeah, we can maybe answer some questions in the time he looks for the information.
Speaker 2Yeah, okay fair.
Speaker 1Here's a question from Rajagopal: what's the major difference between Snipe Search and Open WebUI?
Speaker 3From my knowledge, Open WebUI is only the web interface. It's an open-source framework to construct LLM interfaces.
Speaker 2It doesn't interact with the search part? No, it's only the UI, an open-source interface. Okay, that makes sense. There's another one from Mohamed: can Snipe Search be adapted to private information systems? If you see that one there.
Speaker 1There you go.
Speaker 3It is. It is actually targeted at, and built for, private information.
Speaker 2Yeah, it's for that. And a follow-up question: how scalable are the agents? I know you talked about a lot of different dimensions, agnosticity, longer context. How scalable is the platform, is the follow-up question.
Speaker 3It depends on your budget, to be honest. It really depends on the budget. It could be highly scalable, because you're launching a lot of agents in parallel and you do it asynchronously. But if you have a lot of money and you want to send 1,000 agents, there is a token consumption that you have to scale by 1,000.
Speaker 2It scales with the complexity of the problem and the available resources. And, yeah, that makes sense.
Speaker 1How's our query doing?
Speaker 2I think there was a challenge with it. It might have been because of the inputs. I think Pete and I put you in the hot seat; we gave you a random query.
Speaker 2But yeah, well, I mean you have the link available, right? I think you actually put it in the slides so everyone can access the server. Yeah, the TinyRAG stuff, I took some screenshots while you were presenting. There are several components you guys have put together here, behind the scenes, that I think need to be appreciated, like TinyRAG and the actual search itself; the UI is the UI, and I know it's part of the demo. I think there's also a follow-up question on the Open WebUI side. I know we're very close on time, Pete. I don't know how you want to handle it.
Speaker 1We got like a minute left. What was this? The VLM question Is that it.
Speaker 2Yeah, that's a new one. Are VLMs being used, vision language models?
Speaker 3Okay, we use OCR. We don't use multimodal LLMs; we use OCR to extract information.
Speaker 1Yeah, okay, that's what we use.
Speaker 3Sounds good. Yeah, nice. I think I can also show the platform.
Speaker 1I think we have about a minute left.
Speaker 3I can show the result that we have here. Oh, nice.
Speaker 2Ok, there you go. Excellent. I like the confidence breakdown. I think that's unique.
Speaker 3Yeah.
Speaker 1I think so, but more or less you have everything here.
Speaker 3You have everything here if you want to compare things.
Speaker 1I don't know why the server is down.
Speaker 3There is an issue with that. And here you have the links where you can actually check for the different elements and so on, like the source elements, cool.
Speaker 2Yeah.
Speaker 1Good, hey, we're going to Go ahead.
Speaker 3Yeah.
Speaker 3I want just to showcase a bit the platform that we have. It's a straightforward platform that we have built to create and you have different things that you can monitor here and also a good feature that will be available for everyone. It's like you can create directly your search engine with the plan right, the different source that you can configure, like pin search and so on. Image you can customize with the different source image that you want to target. You can add the different databases vector or sql databases and actually you can create your engine and you can monitor the engine and how we use that platform and yeah, and this will be more or less we are planning to launch this in two or three weeks, awesome.
Speaker 1All right, we'll keep an eye out for it, waseem. Thank you so much. Thanks everyone for joining in and we'll look forward to the production.
Speaker 2Another great show. Yeah, thanks Waseem, this was exciting.
Speaker 3Yeah thanks.
Speaker 2It was very nice To the future. See ya To the future, thank you.