EDGE AI POD
Discover the cutting-edge world of energy-efficient machine learning, edge AI, hardware accelerators, software algorithms, and real-world use cases with this podcast feed covering all things from the world's largest edge AI community.
These are shows like EDGE AI Talks and EDGE AI Blueprints, as well as EDGE AI FOUNDATION event talks on a range of research, product, and business topics.
Join us to stay informed and inspired!
The Future of Domain-Specific AI Search Lies in Targeted Agent Systems
Imagine your edge device having the ability to search for exactly what you need, exactly when you need it, without hallucinations or irrelevant information. That's the promise of Snipe Search's agent orchestration system, presented by co-founder Wassim Kezai in this eye-opening EDGE AI TALKS session.
Most organizations struggle when implementing RAG systems with their corporate data. The truth is, unstructured corporate knowledge is often messy and inconsistent, leading to unreliable AI responses. Semantic matching issues in traditional retrieval systems further compound these problems, especially when deployed at the edge where specific, accurate information is crucial.
Wassim unveils an innovative approach that deploys specialized AI "detective" agents to search for information from authoritative sources. Unlike brute-force search methods, these agents intelligently target reliable information based on hierarchical importance. Web agents crawl and cross-reference websites, image agents find relevant visuals, scholar agents specialize in academic information, and video agents can even pinpoint the exact timestamp in video content that answers your query.
What sets this approach apart is its adaptability to domain-specific knowledge and verification frameworks. Companies can customize how information is validated based on their standards, ensuring relevance and accuracy. While traditional RAG systems respond in seconds, Snipe Search's 30-second average response time delivers significantly higher quality information – a worthwhile trade-off for mission-critical applications.
The platform integrates easily with any LLM or chatbot through Docker, API, or direct integration, making it accessible for organizations of all sizes. As edge computing continues to grow, having efficient, accurate search capabilities becomes increasingly important for reducing cloud dependencies, enhancing privacy, and delivering better user experiences.
Ready to transform how your edge devices access and utilize knowledge? Explore Snipe Search's platform launching in the coming weeks and discover how intelligent search can enhance your edge AI deployments.
Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org
Introduction and Updates
Speaker 1All right, we're back. Talks are back. We need to update our logo wall there; we have a bunch of new partners that have joined the foundation recently, so that thing needs a little updating. So, note to self. What's happening, Davis? How's Canada?
Speaker 2It's all good. Another day, another live stream. I'm actually dialing in from a new laptop, an Apple MacBook, so I'm testing out some of the features on this one. Lots of new stuff happening, studio lighting, I suppose. Hopefully that holds up.
Speaker 1Okay, cool, cool. Well, yeah, were you in Barcelona?
Speaker 2No. Were you at Embedded World in Nuremberg last week? Unfortunately not, but lots of the team was, I know.
Speaker 1There's a huge presence there. You guys had a huge booth there. It's like a little mini city.
Speaker 2Yes, yes, a glimpse of the future, I've heard. I've heard good things. One of these days I'll see one.
Speaker 1I've been at previous ones, but yeah. I mean, how was the show? It was great. It was packed with people. We had a really nice EDGE AI Foundation booth, lots of walk-up folks, and we were packed with meetings. We hosted the IoT Stars party on Tuesday night, like 200 people there. I had a panel with Zach Shelby and folks from Blues, hosted a panel with Wind River, and then we did like six tech talks on the last day.
Speaker 2Oh, wow Mostly academic.
Speaker 1Yeah, which were pretty cool, pretty cool things.
Upcoming EDGE AI Foundation Events and Gear
Speaker 1Some interesting stuff around aerospace too, a lot of interesting edge AI happening in aerospace and radiation-hardened systems. So yeah, it was a pretty full week, and like I said, a lot of new partners joining, which was pretty cool. Hey, let me do a couple of PSAs before we get into our topic of the day. I know we've got a lot of people coming on the line here; I think we had a world-record registration for this thing, so no pressure. So we've got a couple of things going on.
Speaker 1The next place we're going to be as a community is at Computex and InnoVEX in May in Taipei. If you haven't been to Computex before, it's another one of those gigantic shows that really rallies the whole Taiwan and Asia ecosystem. We're going to be there May 20th to 23rd. There's going to be an EDGE AI pavilion, we're going to have a bunch of partners there, and we're also sponsoring yet another party called, appropriately, the Night Party. So if you find yourself in that part of the world, or want to go to that part of the world, you should go. That's coming up May 20th to 23rd in Taipei. And then also, at our Austin event, we launched our gear. People keep asking, when do I get the gear? So here's a nice hoodie, and the back looks like this. You know, it's premium. It's evolving.
Speaker 2Yeah, I like it. I saw them in Austin. I haven't worn one yet, but if they feel as good as they look, I'm sure it's worth it, so yeah.
Speaker 1Then we have the coffee mugs and all this stuff. But if you go to hobe forward slash edgi, that's where we have the gear. We call it scholarship gear because the money made off the gear goes into our scholarship fund, which supports fellowships, travel grants, and underwriting all kinds of educational programs. So it's a good way to help develop future edge AI expertise through the scholarship fund, and also to look cool, to be the coolest person in the room.
Speaker 2Or on the live stream, yeah, or on the next Blueprints show.
Speaker 1I'll make sure I do. Actually, my room is a little warm here, otherwise I would wear it. Maybe in Canada we should be selling these; they'd be flying off the shelves. And tariff-free, by the way.
Speaker 2So you can buy them in Canada. No extra tariffs, too kind of you.
Speaker 1No problem, okay, cool.
Speaker 2Well, let's bring on our guest, our special guest.
Speaker 1Wasim.
Speaker 3The man of the hour. There he is.
Speaker 1Hi, let me move you to the space of honor here, which is the big screen. So maybe you can kind of give us the five-minute intro of yourself and all that stuff and then we can kind of get into it.
Speaker 3Yes, sure. Yeah, like you said, my name is Wassim. I am a software engineer and also the co-founder of Snipesearch. To give you some insight into my background and career: I've worked for diverse companies. I'm ex-Audi, and I also worked at Oracle as a software engineer, where I mainly specialized in cloud and AI. And it's been maybe four or five years now that I've been involved in the edge community, developing edge AI components and trying to bring innovation to that field and its use cases.
Speaker 2This is the right place for edge AI innovations. All of those are safe to discuss here. Yeah.
Speaker 1And you're dialing in from Barcelona, you mentioned. Is that where you're at?
Speaker 3Yeah, I'm originally from Brussels, Belgium, but at the moment I'm working in Barcelona, and I quite enjoy it. Let's say it could be worse. Could be worse.
Speaker 1so, uh, the other thing just to take note is we have lots of folks on the live stream today asking, going to be asking lots of questions and things. We will collect those. We'll, Davis and I, will bucketize those, quantize those and at appropriate points we'll kind of cut in and we'll have Wasim address those and stuff. But I think what you're going to talk about today and actually we'll talk about the name Snipesearch- We'll talk about that.
Speaker 1But one of the things that people have been talking about and we have this working group for generative AI on the edge and we talk about use cases and scenarios and one of the things that a lot of companies have been trying to do is use like kind of rag architecture to train kind of language models with their own corporate data and which, you know, I would say mixed results is being generous. It's been kind of a real disaster for a lot of folks and part of it is because the corporate data is very unstructured and kind of messy and for those that have worked in big tech companies, we all know that our corporate data is kind of garbage most of it anyway. So garbage in, garbage out. But the RAG architecture itself has some challenges in terms of semantic kind of confusion.
Speaker 1I would say Right, and so one of the things is, when people bring language models to the edge, the scenarios are quite different, Right, they're much more specific. To the edge, the scenarios are quite different, right, they're much more specific and like, if you want to talk to your car engine about what's wrong with you, or your washing machine, about how to clean your clothes or things like that, there, you know, we don't want these things to write haikus and do cat poems. We want them to answer the questions right. So I think this was an interesting. For me this was a really interesting kind of exploration into how do you develop better training and kind of search models. You know what's the right word to?
Speaker 1to add the right intelligence in a very specific way to a to a language model instance yeah, the name of the game is domain specialization.
Speaker 2I think you'll see me I mean you'll you have? You have the floor to tell us all about this. But I think pete highlighted some good problems and I think you're you're think you're going to give us some solutions too. But, yeah, context-aware domain specialization. You don't need the whole history of the world in your edge. Ai chatbot yeah that's completely true.
Speaker 3I think it's how, actually one of the biggest constraints that we have with SLMs when they run on the edge, it's the context window. We can't give them a huge amount of data because the context side is small, because the number of parameters is small, and it's inherently from that, I think, a rag and search engine or other tool can bring the gap and bring, let's say, capability to the kind of SLM that they didn't have before just relying on what they are pre-trained on.
Speaker 1Right. So if you're walking up to a kiosk in Home Depot and asking a question, you're not going to say what's the meaning of life. You're going to ask where's the superglue?
Speaker 2If you are, there's a better place. That's the only question I ask.
Speaker 3Dial 26.
Speaker 1So yeah, yeah, yeah, so yeah, that's kind of important. I think, if these are really going to get deployed practically in these environments, we need a better way of kind of harvesting this kind of corporate knowledge, but we're harvesting the right knowledge in the right place at the right time. So I think that's what you're going to talk about today, which is pretty cool.
Speaker 3Yeah, exactly, I think like we have a lot of one of the ways that.
Speaker 3AI will enter in companies will be through these kind of tools because like you said it makes them domain-specific, and what we want as a business is to answer your needs and give your information that you want to your customer or to your user meaning it will be more, or less a big entry user, meaning it will be more or less, and there's a lot of research in that, in that area, and one of the application that we are seeing is seeing on the slm is how to enable like the slm with more capability, as he is in the constraint zone, meaning that he cannot have access, and we can also link it to not to only the internet but only to databases and so on, and to access and to search in that kind of knowledge that we have.
Speaker 1that will help him make better decisions on the edge Right a little more curated it's a curated training than the wide open web, because we know the internet's half garbage and half interesting 90% maybe, but that 10%.
Speaker 2it gets a lot done, and I mean I that's right, but a good segue.
Speaker 2Segue to your talk is you know, with with different tool chains, some rag works well, some doesn't. Depends on the data, depends on the model. I think that the the whole picture and this changing landscape. I think that's where snip search, snipe search, whatever you end up calling it. I think this is it's a. I got just a glimpse before, but it's it's a, it's a fresh look at, like you said, how this stuff will enter company life, how this will enter our actual, actual workflows, cause not all tools are equal.
Speaker 1Not all data is equal, but you got to get the job done somehow, right. Yeah, no that's really cool. Do you have a screen to share? Yeah, all right, we'll bring up the screen here. So actually, wasim and I saw each other. He was in austin at our event in austin, texas, which is good we enjoyed some.
Speaker 3Is it like a korean barbecue? Yeah, korean barbecue. I remember that everything.
Speaker 1There is barbecue something. Yeah, that's true. Well, you went one night. We went to a japanese restaurant with barbecue. It was like the korean restaurant barbecue. Yeah, cool, okay. So do you want me to share that? Let me know when you're ready to share it. Okay, we're going to share a screen. We see lots of people coming in here. Good morning from Germany, france, cambridge.
Speaker 2We've already had our first question, Pete. Here's the first question. Well, while you're working on your screen, here's the question.
RAG Limitations and Search Challenges
Speaker 1Why should CIOs pay attention to the edge? So from Paula? I don't know, paula, maybe you're a CIOs pay attention to the edge. So from Paula? I don't know, paula, maybe you're a CIO, but the edge is edge. Ai is all about running AI workloads where the data is created right, and that's kind of the impetus behind that. The gravitational pull toward the edge means lower cost, lower power, more impact typically, and that can also mean things like privacy and latency and flexibility, agreed. So CIOs you know I mean who are in charge of kind of figuring out the information strategy. You know you want to move your compute as close to where the data is created as possible. That's kind of generally what you want to do. You want to avoid ingress and egress to clouds and OpEx costs and have more control over your processing in general. So I don't know, davis, you want to put your two cents in on that.
Speaker 2You covered the main ones. I mean, when I was at a previous role at a startup short, condensed pitch, it was cost like operational cost. You can't have 4G LTE high resolution bandwidth streaming all the time. Forget the AI part. It's just a no-brainer from a cost perspective to do as much as you can all the time. Forget the AI part. It's just a no brainer from a cost perspective to do as much as you can at the edge. That was my elevator pitch.
Speaker 1Cool Sounds good. All right, Waseem, let us know when you're ready to share.
Speaker 2Yeah, I'm ready. Have you presented?
Speaker 1yet you need to hit the present plus button because it's not quite showing up. I didn't hit it, that's all right.
Speaker 2It's live we're good, we still got plenty of time. I did set a reminder for questions, whatever platform or chat, and Pete and I will filter them to Asim, I think.
Speaker 1I'm going to bring it up. I think you might do some infinity screen here, by the way, asim, because you have our pictures up on your screen, so watch this Now you have the presentation there.
Speaker 3You go there you go. Okay, we are good.
Speaker 2Beautiful.
Speaker 3Perfect.
Speaker 1Now by the way, can we just spend two seconds on the name Snipesearch, Because I know in your demo it's called Snipesearch.
Speaker 3Yeah, exactly, we have like a location now. This is a work in progress.
Speaker 1The naming is still a work in progress and maybe we should throw it open to the, to the community here, for some ideas, as you, as you go through it, if you're inspired with a name. Uh, snip search, snip search. I don't know what, you know what we're gonna, what?
Speaker 2was not to be confused with strip search, because that's that's so. We want to go there.
Speaker 1But uh, anyway, just an interesting. This is the process, right, when you come up with an idea and a product name and you know someone else has the same name or whatever, so so, uh, just you know we'll cut. We'll see him some slack here, but it's an opportunity for the community to maybe get creative on it.
Speaker 3Yeah but and a funny fact is like all this project and this company is created on use case that you bring with the AJI Foundation actually.
Speaker 1That's right yeah.
Speaker 3It's. That's what is nice to have.
Speaker 1Maybe you should put some AJI snippets. Yeah, I'll think about it, cool. Why don't we give you the floor and take it away and then, like I said, Davis and I will call some questions in the background and we'll get on with it, Okay perfect, let me know when I start and we can go, okay. Hi everyone.
Speaker 3How's it going to all our gents? It's a pleasure to have you with us today. I will present to you Snipesource. It's our agent orchestration system that performs as an OSINT, swarm engine, um agents and swarm engine collaboration. And actually what? What is snipe search?
Speaker 3nature is only like some agent that we we have more or less trained, that we have give some tools that will go on the internet or go on your internal databases and actually gather the information in order to respond to your initial question, and actually it's something that integrates inside lm shot ports, your own website. I will showcase more of this just to showcase you the team that that is working on this, ahmed and me we are the two co-founders of Snipesource and just to give a bit of a roadmap of today's presentation, I will first introduce the AGI Foundation knowledge base. It's actually the knowledge that we gather in order to create the base, information to search on, in order to create the chatbot and the LLM for the HIL Foundation.
Agent Orchestration System Architecture
Speaker 3After this I will go through the Snipesource solution and how actually this agentic orchestration system can search efficiently and also accurately about the information on the defined scope, meaning like that is, web or databases, or on your YouTube channel or your website, depending on what scope that you give him. I will highlight the application that we did in order on this and then we will go to a live demo of Snipes Search and also the Snipes Search platform that will. That's also our platform that we have created in order to generate and to wrap current chatbot and LLM with our search capability. Like first of all, starting with this project, like said, it was a project that has been proposed by Pete last year that he wanted to create a knowledge base in order to leverage the access to the presentation, the PDFs, also the publication of the AHAI community and to create an LLM in order to upskill people, give access to information and also give access to people that are like involved in this, and it's more or less an entry point to leverage your skills in everything that is related to.
Speaker 3AI on the edge and we came up, like, with this solution that I thought the first, if I give a little bit of timeline about the project.
Speaker 3We first start with the retrieval of multi-generation system that we have actually linked to a tiny model that we have fine-tuned on. On. On q a answers response of of age ai content. That's what was the first project. Then, actually, we saw a lot of gaps that we needed to fill. We didn't have real-time information, meaning that everything that is published today or after the training of the model, we don't have access to it.
Speaker 3And also there was quite an issue on how we source references and also how we fact check information, meaning that we have trained our data, but when we train our data, we always train on the data that we have is not always 100 accurate, meaning that there was always some some garbage inside, and this actually will make our model sometimes hallucinate, and we want to bring a solution to this, and the solution that we found out is to to orchestrate, to orchestrate it ai agent in order to perform the search and to use actually search engines like google, like bing, like more other to use.
Speaker 3YouTube search engines to use also RAC, retrieve Logomotivity Generation Workflows. To use also Knowledge of Meta-Digital Workflows, all these techniques that actually have all good points and also bad points, and by concatenating all these tools the LLM is able to fill the majority of his gaps.
Speaker 3The knowledge base that we constructed and that everyone will have access in the future on the. The website of the AGI foundation is constructed about web pages that are relevant to it's constructed about web pages that are relevant to HAI and blog posts also that are relevant to HAI. We have created more or less a scrapper and a crawler that go on the net and target exactly only.
Speaker 3HAI knowledge content and we also have a transcript from the YouTube, daily Motion and other platform around more than 650 videos and around 1,000 documents that are composed about PDFs, presentations and also conference presentations, event presentations and actually yeah mainly all the events that happened in the NJF Foundation in the last three or four years were inside After how we have created that base knowledge.
Speaker 3we created some mappings in order to create some logic between people, organization, research topics and technologies, and also about collaboration, meaning that we did this using a knowledge graph. I will go in detail, more or less. It's more or less to create a relation between our base knowledge in order to understand who is linked with who, who is working with who on what, what company is working on what subject and so on. It's really to create it's more really an entity relationship matching model, like one of the limitations that we saw when we like I said, like we saw when we started developing this project at the beginning and we wanted to create this knowledge base and give it access through NLLM to the user of the AGI Foundation website website was the first one was the information overload, meaning that more you have information, more the context is huge, more it's difficult to find relevant information.
Speaker 3And knowing that with all, let's not say all, because like in the. Ai world when we say all it was six months ago. But yeah, we saw like, like rack systems and all this have like. An inherently issue is about semantic matching. We are using Cosinus similarity. Even thought we use a hybrid search or even thought we use a keyword search matching, there will be always a gap because there is probably norms.
Speaker 3There is there is some mathematical inherited issues that are not yet solved. We hope it will be solved in the future. But yeah, it's what we saw. Mainly, there is some mismatch and also there is a lot of work to be done in behind in order to have a rack system, like the effort that you have to put in behind to have a very nice rack system is huge, like pete said, and it was a very good point.
Speaker 3generally, when you are in companies, the data are not really well structured and it's more or less a mess, let's say, and the effort that you have to put in order to structure all this and to create your vector database with the correct chunking, with the correct link-segment, link-in-between chunk, it's a huge effort, it's achievable, but it demands a lot of resources, and that's what we saw and was like the motivation to bring a new solution.
Speaker 3that is a snipe search that we call snipe search that we say okay, we need to overcome more or less the gap of semantics and also the gap of the fact that you don't have this information accessible in real time, and that was one of the big blockers that made us think about bringing a new solution and we created SnapSearch. Like I said, it's an orchestration of A8 agents and each A8 agent behaves like an OSINT agent. Like you, have several tools that you can use in order to crawl the web efficiently or crawl the source of data that he's targeting.
Speaker 3Like I said, it's not only the web, but you can define your own scope, meaning that if you want to search in the whole web, it's your choice. If you want to choose to search only on the website that are related to your business score, you can also do it. And if you want to search inside the database or vector database, SQL database, all this is possible. Database or vector database, SQL database all this is possible, and we gave to this agent actually the tools in order to him to search efficiently inside this data source and also to match correctly and pertinently the information that are needed in order to answer the root question that you have and actually like there is a workflow that is going on on Snipesource, like the first
Speaker 3part is, like, the first thing that we do is like the user is asking his question and we will understand what is the intent of the question and we will do some search planning, meaning that we will, depending on the question that the user is asking, we will do some search planning, meaning that we will, depending on the question that usually is asking, we will do a search plan, a plan, a search plan that will actually we will plan how, what, we will plan what information we need in order to respond to that root query and we do it through an agent that will do this, and there is different way of creating that plan.
Speaker 3There is a tree of thought, there is a using chain of thought. There is also using a year shaker planner. There is a different solution in the letter.
Search Planning and Reasoning Methods
Speaker 3Each one have his good point and his bad point and it's really a tradeoff to use between each of them and actually we launched this OSINT agent on the web in Parallel or on the sources in Parallel, and they will go and grab all the information that we need. It's like they have a mission that's why we call them detectives and they will create a report at the end. That's what. That's what they will produce on all the information that they have read, because they will read all the information. They will crawl the web and read PDFs and so on, but they use, like I said, they use a targeted philosophy meaning that they will not search everywhere.
Speaker 3They will really target authoritative sources. Meaning that the Authentic Agent has the capability to narrow down his scope. Meaning that if you ask him about an overview of the A100, he will go and check on NVIDIA website and then go down.
Speaker 3Meaning that he knows the agent knows the hierarchy of what is an alternative source and where more or less he needs to start his crawling and search and actually this is quite efficient and if you compare it to other solutions that are on the market that do more brute force or exhaustive search, here we are doing more and we are now without search, and it takes less time than normal.
Speaker 3I can give you an example like deep search or on open AI or other kind of solution of public city. There are four different type of agents, or Synth agents, that we have inside that orchestration. Each one is actually personalized and customized in order to target a type of source. To target a type of source, meaning that we have some web agents that are like that, have the tools and the capability to go on the web, crowd the web, read the web pages, cross, reference the different web pages, check the author of the web pages, the source of the web pages, the links of the web pages, the source of the web pages, the links of the web pages, the number of views, the citation, and all this and with all this information, he will create a final report with all the information, with the source of the information that answers the root question that you give him and here we go, doing this all the way.
Speaker 3We have an image one that will focus on gathering image, more or less. You are asking about comparing two concepts or the performance of a concept. He will go and search for some diagram that illustrates that concept. It's more or less his train on. This agent is more or less customized to do this. We have the score agent. This one will go only in academic databases, meaning Scoda, archive, ea and more, and he will search inside that content and that kind of score agent was really designed for the AJA Foundation in order to give the users the ability to have really scientific reports with really detailed reports.
Speaker 3That was really the purpose to create that score of agents.
Speaker 3And the results were quite nice, to be honest. And there is also the video agent. The video agent will go on the video search that you give him for example, you gave him youtube or the emotion and he will go search for the video that that answer your question. Meaning if you ask him about how to prepare, how to mount an mcu or specific mcu, he will go and find you the tutorial that explained to you. He will read the video and he will give you even top the minutes where the video is talking about answering your question.
Speaker 1Here is like the overall workflow.
Speaker 3to just summarize everything that I said what's happening?
Speaker 3we take the user query we perform an AI reasoning in order to do our search plan. Once we have our search plan, we launch, depending on which type of agent we are using. We launch the swarm of agents on the web and our target source. We grab all the data. Then there is a layer of verification where we fact check information. Actually and I will talk more in detail about the fact checking system that we put it's quite how you buy data, how you buy the source and the relevance of your data. It's really personal of the domain is very personal and domain specific. That's why you need to customize it depending on the final user that you are aiming to serve.
Speaker 1And then at the end we have our verified answer.
Speaker 3There is different way of planning your search, meaning that there is the most used one let's say, is the year-shakel planning.
Speaker 3It's more or less. I have a root question and I decompose this question in different sub-questions and I go in the sub-question and I also decompose, and I do this recursively until I am in the list, creating a tree structure from the root to the end. And actually I give like just an example if you want to compare two entities, the second level of the tree will be a find, the specification of the entity half and specification of the entity B, and so on. If you go one level down, it will be a find this kind of specification, this kind of specification, this kind of specification, this kind of specification. You can go logically more down and it's cool.
Speaker 3This kind of planning fits search very well; it matches the mindset of how we look for information. You also have chain of thought, which is more or less step-by-step reasoning that you put into your LLM to teach it how it needs to plan the search. It's very customizable, but it's also, how do you say it, less deterministic than the other one. There are also tree of thought and program of thought.
Speaker 3Those are other reasoning techniques. Tree of thought is more or less generating different chains of thought, essentially different hierarchical plans; you score them, compare them, evaluate them dynamically, and at the end choose the final plan that is best. But it consumes a lot, meaning there is a real trade-off: it's more or less the hierarchical plan, but built dynamically on the fly, so it consumes a lot of tokens and can be quite expensive. Like I said before, we do this reasoning and planning, then we launch the parallel information retrieval.
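The tree-of-thought trade-off can be sketched as: generate several candidate plans, score each one, and pay the cost of evaluating them all. The plan generator and the scorer here are stand-ins, not the scoring Snipe Search actually uses.

```python
def generate_plans(query, n=3):
    # Stub: propose n candidate plans of increasing thoroughness
    return [[f"{query} step {i}" for i in range(k + 1)] for k in range(n)]

def score(plan):
    # Reward coverage, penalize token cost (characters as a crude proxy)
    coverage = len(plan)
    cost = sum(len(step) for step in plan)
    return coverage - 0.01 * cost

def best_plan(query):
    plans = generate_plans(query)
    # Every candidate gets scored, which is where the token expense comes from
    return max(plans, key=score)
```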
Speaker 3We do our evaluation. Like I said, each agent comes back with a report containing all the information and the sources that answer the mission you gave it. Once you have this, there is a layer that takes the information from the different agents and runs an algorithm, and that algorithm is customizable depending on how you interpret, how you define, the truth, because everyone doesn't define the truth the same way. That's something quite nice in Snipe Search: it gives a lot of consistency, plus a domain-specific validation framework to validate the information. The last layer constructs a coherent response for the user, and that is also customizable, meaning that depending on the output format you want to give to the user, you can adapt it quite quickly.
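That customizable "definition of truth" can be pictured as a trust-weighted vote over the agents' claims. The weights and threshold below are hypothetical; they are exactly the knob a company would tune to its own domain.

```python
from collections import defaultdict

# Hypothetical per-source trust weights; this is the customizable part
TRUST = {"vendor-datasheet": 1.0, "blog": 0.4, "forum": 0.2}

def consolidate(reports, threshold=1.0):
    # Accept a claim once the trust mass behind it passes the threshold
    mass = defaultdict(float)
    for report in reports:
        for claim in report["claims"]:
            mass[claim] += TRUST.get(report["source_type"], 0.1)
    return [claim for claim, m in mass.items() if m >= threshold]

reports = [
    {"claims": ["A100 has 80 GB HBM"], "source_type": "vendor-datasheet"},
    {"claims": ["A100 has 80 GB HBM", "A100 is obsolete"], "source_type": "forum"},
]
accepted = consolidate(reports)
```

Here the datasheet-backed claim passes while the forum-only claim is filtered out.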
Speaker 3Snipe Search has, like I said, a lot of benefits compared to traditional RAG systems. It excels with long, complex context. What is cool with Snipe Search is that it's an orchestration: we have the ability to choose the agent we want for each step, so we can leverage whatever Transformer architecture suits our final target. To give an example: if 300 pages of PDF were found during the information-gathering process, you need a long-context LLM to be able to read all of that information, so putting a long-context model inside as the agent that processes PDFs is a good idea.
Speaker 3On the other side, if you want to generate the queries, the targeted queries for the sources you want to read, you need a reasoning model more than a long-context model, and you can swap it in.
Speaker 3So you can create more or less the best orchestration, where each agent excels at its function because it leverages the architecture that lets it achieve this. And one more thing with Snipe Search is latency. Traditional RAG is generally quite fast.
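The idea of matching each orchestration step to the model architecture that suits it can be captured in a small registry. The model names and context sizes here are placeholders, not the models Snipe Search actually uses.

```python
# Map each agent role to the model class that fits its job
ROLES = {
    "query_generation": {"model": "reasoning-model", "context": 8_000},
    "pdf_synthesis":    {"model": "long-context-model", "context": 200_000},
}

def pick_model(role, payload_tokens):
    # Refuse a role whose context window cannot hold the payload
    cfg = ROLES[role]
    if payload_tokens > cfg["context"]:
        raise ValueError(f"{role}: payload exceeds {cfg['context']}-token window")
    return cfg["model"]
```

So a 300-page PDF batch routes to the long-context model, while short query-planning prompts go to the reasoning model.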
Speaker 1We are between 0 and 5 seconds per answer.
Speaker 3But if you compare that to Snipe Search, Snipe Search is around 30 seconds per answer. That is quite expected, because we are going out on the net, searching in parallel, and covering a much bigger amount of data than with traditional RAG. Even though in Snipe Search we do use RAG, we use it to interpret information. Now, one thing that is also super nice with Snipe Search is that it's a search engine that is agnostic, meaning that...
Speaker 1Just a quick interrupt.
Speaker 1We had a couple of questions bubbling up, so before we get into it a little farther, I thought we would throw those on. Is that cool?
Speaker 3Yeah, no issue.
Speaker 1Here's one from Malik. Malik asks: will the engine target the internet or closed databases? I think you might have covered that one, but can you clarify?
Speaker 3Yeah, it depends. Actually, Snipe Search will target a source of data.
Speaker 2okay, Now the source of data.
Speaker 3It could be the web, meaning a URL, meaning a crawling system; it has the ability to crawl the web. It also has the ability to go into a database, meaning it can create an SQL query.
Speaker 3It can also create something like a Cypher query to go into graph databases, or query vector databases. It has the ability to use whatever tools the LLM is able to produce, and it has access to any type of data. Your use case, and where the data you want to access lives, is what defines the tool you will use.
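Routing a mission to the right retrieval tool (SQL, Cypher, or a crawler) is a simple dispatch on the source type. The query templates below are illustrative only, not Snipe Search's actual tool calls.

```python
def build_tool_call(source, topic):
    # Dispatch on the kind of data source the agent must reach
    kind = source["kind"]
    if kind == "sql":
        return f"SELECT * FROM {source['table']} WHERE topic = '{topic}'"
    if kind == "graph":
        return f"MATCH (n:{source['label']} {{topic: '{topic}'}}) RETURN n"
    if kind == "web":
        return f"CRAWL {source['url']} QUERY '{topic}'"
    raise ValueError(f"unknown source kind: {kind}")

call = build_tool_call({"kind": "sql", "table": "docs"}, "mcu")
```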
Speaker 2So how does that influence, let's call it, integration time or search time? I mean, if you're obviously looking at a smaller, closed vector database that's LLM-ready, versus, you know, a large unstructured source, have you seen a lot of variability in implementation time or search time because of that?
Speaker 3there is like really a trade-off to do If you are doing like, if you are focusing on vector databases or SQL databases. There is, it's super fast, it's really super fast to interact.
Speaker 2Right makes sense.
Speaker 3Yeah, with these tools, but the integration and the construction of these databases take a lot of effort.
Speaker 2Also true. Yeah, Also true.
Snipe Search Benefits and Platform Demo
Speaker 3If you compare that to a targeted search on the web, it's quite different. In that case it will take more time, but you don't need to prepare any data up front.
Speaker 2It's already there, yeah. I like what you said: it depends where the source is; your approach targets the source.
Speaker 3Right, exactly.
Speaker 1Cool. Let's try another one here from Paolo. Yeah, okay. Paolo asks: how can the agent tell whether a paper is authoritative? So, paper citations, index, like how do you know what you're pulling?
Speaker 3Actually, there are different ways of doing it. You can create few-shot prompts in your LLM where you show it what is related to what, and it can make the link. But this technique will not fully work; you will get 90% accuracy or something like that. I don't know exactly, but there is a proportion where it will fail if you give it something too specific, in a domain that is too specific.
Speaker 3It has not been trained on that, the information was not prominent on the web, meaning it will not be evident to the LLM; it will not have a reliable way to understand it. A good way to do this is to create a list of authoritative sources by domain, and you can tell the LLM: OK, each time you have a query, go and check it semantically.
Speaker 3You only have to say: OK, if I have a question about the EDGE AI Foundation, or about one of your partners working on a specific chip, and I am asking for information about that specific chip, we know this is about microcontrollers. Then we can make a list of authoritative links for microcontrollers, and the agent can recognize that the intent of the query is about microcontrollers, go check those sources, and infer from there.
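The per-domain whitelist idea might look like the following sketch. The domain lists are invented, and intent detection is reduced to keyword matching here, where the real system would match semantically.

```python
# Hypothetical authoritative-source lists, one per domain
AUTHORITATIVE = {
    "microcontrollers": ["vendor datasheets", "manufacturer app notes"],
    "gpus": ["NVIDIA documentation"],
}

# Crude intent detection; a production system would compare embeddings instead
KEYWORDS = {"mcu": "microcontrollers", "microcontroller": "microcontrollers",
            "gpu": "gpus", "a100": "gpus"}

def sources_for(query):
    for word in query.lower().split():
        if word in KEYWORDS:
            return AUTHORITATIVE[KEYWORDS[word]]
    return []   # no trusted list for this domain: fall back to open search
```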
Speaker 2So it's the semantic part that gives you that ability the semantic understanding. Okay, yeah, that's cool.
Speaker 3I will not go further, to be honest. Offline, yeah. There is some stuff, a solution that we developed, but I will not go further on this.
Speaker 2Yeah, but there is a solution to that.
Speaker 1Okay, cool, let's see. Should we do one more Davis? What do you think?
Speaker 2There was a good one. I mean, it depends how much more you have. Do you have a lot more slides?
Speaker 3We have four slides.
Speaker 2We have time to answer the question, okay. I think this one about the response time was also from Rajagopal; I had two. I mean, edge is often about real time, about quick response times. Vector databases, RAG databases, offer quick responses, but with the limitations you mentioned. So this is a good question: is 30 seconds too long for a response time in the context of edge AI?
Speaker 3It's too much, so we are working on shortening the response time.
Speaker 2I mean, there's always optimization, there's quantization, models get smaller. This response time is probably not one-size-fits-all; it ranges. What do you think?
Speaker 3I think it comes down to the amount of data you are scraping and handling. Like I said, one of the innovations we bring with Snipe Search is that we target information, meaning we are already drastically reducing the amount of data being processed. Now, if you are able to create super specific queries that target exactly what you are looking for, it will be super fast. But let's be realistic; to be honest, it's a big challenge to reach that efficiency of targeting.
Speaker 2But there's some tolerance, right? I think that's the nugget of this point. Maybe, if you're an operator, or a machine is running some of these queries where you want reasoning like a human's, wouldn't you take some time to think about it?
Speaker 3Yeah, yeah.
Speaker 2I think one of the big applications of edge AI.
Speaker 3Like you said, I think you mentioned a good point, and I think the trend is going that way.
Speaker 1The edge will be more or less a lever to lower the cost and enhance the privacy of everything that is done on the cloud.
Speaker 2Me, I'm seeing that every application running on the cloud will need some edge.
Speaker 3Let's call it infrastructure that leverages AI agents, LLMs or SLMs, you can name it whatever you want, in order to reduce the cost we have and also enhance the privacy and accuracy of what we send to the final endpoint that takes the final decision. Now, there are a lot of other use cases that are more constrained and sensitive: if you go to space, you don't have access to the network, or it's very expensive; there is no cloud computing, or there is, but limited. I think the edge will integrate into what we call the cloud world in that way.
Speaker 2Okay, nice. Well, I know you have a few slides left. Let's get back in the groove.
Speaker 3Yeah. Just to highlight a bit about Snipe Search: from the project we did here with the EDGE AI community, we built this product called Snipe Search. We noticed that no one in the market lets you bring search capability into your chatbot or your LLM in a very easy way that you can integrate directly.
Speaker 3That's something we noticed, and we saw an opportunity. We said: OK, we have developed search capability, we have developed an infrastructure with an agent orchestration architecture and logic that targets information very well. How can we make this usable by anyone? You can create search capability using the Snipe Search framework and integrate it directly inside your website page, your company search engine, your LLM, your chatbot, and so on. And one of the cool features is that it can integrate with any LLM you want, with any chatbot you want.
Speaker 3It's more or less a Docker Compose image that you can download and deploy, and you can also serve it as an API. I will present the different features of the platform at the end, but one of the big features we bring is universal integration into different kinds of infrastructure. We saw that no one is doing this at the moment, there is a big gap in the market, and we want to bring a solution for it. And like I said, I was talking about all the data sources we have set up; one of the branches the LLM can go and check is the TinyRAG system.
Speaker 3Meaning it has the ability to go on the web, to search every YouTube video, every publication on this. Here is more or less what our internal knowledge system looks like. If you want to interact with the internal knowledge, it takes the query and, depending on the query, routes it to the different node structures we have, because we have two different structures.
Speaker 3One is a graph, the other is a vector database, and one fills the gaps of the other, because we have a big problem with first and last names in vector similarity embeddings, which we compensate for with the graph store. That's why we bring two different types of data here. Then we have the retrieval process, and we give the result back to Snipe Search to interpret. This TinyRAG system is fully built on all the data we gathered from the knowledge edge.
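The graph-plus-vector split can be sketched as two indexes with a fallback: exact proper-name lookups go to the graph side, everything else to (stubbed) vector similarity. All data and names here are invented placeholders.

```python
# Exact-name index standing in for the graph store
NAME_INDEX = {"wassim kezai": "doc-speaker-bio"}

# Topic index standing in for the vector store; similarity is stubbed
# as word overlap, where a real system would compare embeddings
VECTOR_INDEX = {"edge ai power budgets": "doc-power", "mcu tutorials": "doc-mcu"}

def retrieve(query):
    q = query.lower()
    # Graph side first: embeddings are unreliable on first/last names
    hits = [doc for name, doc in NAME_INDEX.items() if name in q]
    if not hits:
        # Fall back to the vector side for topical matches
        hits = [doc for topic, doc in VECTOR_INDEX.items()
                if set(topic.split()) & set(q.split())]
    return hits
```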
Speaker 3Now, like I said, one of the big advantages of using the Snipe Search framework is that it's parallel. It brings a lot of precision and reliability, because it has robust citation tracking. You can, like I said, plug in your personal, customized verification layer, meaning you can adapt it to the specific domain of your company, your business, your foundation, or whatever. It's adaptable.
Speaker 3It's also very easily integrated into any infrastructure that already exists, meaning you don't need to build anything from scratch; you just integrate it into your existing LLM and so on.
Speaker 3Something else Snipe Search brings is privacy and security, because it gives any developer the possibility to build from scratch the model he wants, with search capability, and to host it on his own cloud or on premise as he wants. That can greatly enhance the privacy and also the security of your data, and we know that's a big point coming in the future.
Speaker 3It's already a big point, but it's one a lot of people will be looking at over the next couple of years. Now, let's do the live demo.
Speaker 2Let's do the demo.
Speaker 3Yeah, let's do the demo, everyone's asking for it.
Speaker 2Yeah, exactly.
Q&A and Platform Features
Speaker 3Let me show, more or less, what the demo does. Here we have our EDGE AI Foundation LLM. Here are the different types of sources we can go to; here we are using the web one. We have asked for an overview of the A100. The output you see here is a custom output: we wanted to emulate a search engine and at the same time synthesize an answer, and you can see the answer here.
Speaker 3I will give you just a little example.
Speaker 3It will go and check the different sources. You can see that, yeah, we are comparing the A100 and H100, and it found the three documents it will base its answer on. The first document it found was actually the official NVIDIA document for the A100. It found another document on GPU positioning, which is also relevant, and there is a comparative analysis of the A100 and H100 across various workloads. It can synthesize a direct answer, meaning you have the details here, with the sources that justify the answer.
Speaker 2Sorry, yeah. So you've already put this query in; you're showing us outputs that were produced. And what's unique about your search is this relevance and authority-level update. You've created some kind of ontology, some set of sources that you've made accessible to the end user, so it's not just happening behind the scenes; you're trying to make it more transparent as well, it seems.
Speaker 3Yeah, exactly. You can see what we did. Like you said, the first innovation we bring is that, as you can see here, it's targeting exactly what we want. We are comparing the A100 and H100. You can see that it gathered the NVIDIA Hopper architecture in-depth technical blog, and it gathered the NVIDIA A100 datasheet. It found these. I'm sorry, the interface is a bit rough.
Speaker 2How long would it take to produce an output like this? Thirty seconds? Okay, we can launch it.
Speaker 3What about, can you ask something?
Speaker 1A more general question, you know: give me some examples of using edge AI for pedestrian safety.
Speaker 2Yeah, yes, let's try it. I think it's a little ironic we're talking about A100s and Edge AI.
Speaker 1I think the keynote's about to start by the way yeah, true, okay, I see, I see.
Speaker 3We can ask a more broader question what do you want to?
Speaker 1ask yeah, just ask about pedestrian safety.
Speaker 2I don't know what would come up. Can you use edge AI for pedestrian safety? Yeah, cool, let's try this query. And then there's a couple of questions, Pete.
Speaker 3We can also bring it down. Let's see.
Speaker 2You need to change that a little bit. Yeah, none of those.
Speaker 3I-A-N yeah.
Speaker 2Okay, and then you need an E in "safety". Hey, maybe it's that good, it knows what you mean anyway. Okay, let's launch that.
Speaker 3Yeah, we can maybe answer some questions in the time he looks for the information.
Speaker 2Yeah, okay fair.
Speaker 1Here's a question from Rajagopal: what's the major difference between Snipe Search and Open WebUI?
Speaker 3From my knowledge, Open WebUI is only the web interface. It's an open-source framework to construct LLM interfaces.
Speaker 2It doesn't interact with the search part? No, it's only the UI, an open-source interface. Okay, that makes sense. There's another one from Mohamed: can Snipe Search be adapted to private information systems? If you see that one there.
Speaker 1There you go.
Speaker 3It is. It is actually targeted at, and built for, private information.
Speaker 2Yeah, it's for that. And a follow-up question: how scalable are the agents? I know you talked about a lot of different dimensions, agnosticity, longer context. How scalable is the platform, is the follow-up question.
Speaker 3It depends on your budget, to be honest. It really depends on the budget. It could be highly scalable, because you're launching a lot of agents in parallel and you do it asynchronously. But if you have a lot of money and you want to send 1,000 agents, there is a token consumption that you have to scale by 1,000.
Speaker 2It scales with the complexity of the problem and the available resources. And, yeah, that makes sense.
Speaker 1How's our query doing?
Speaker 2I think there was a challenge with it. It might have been because of the inputs. I think Pete and I put you in the hot seat; we gave you a random query.
Speaker 2But yeah, well, I mean you have the link available, right? I think you actually put it in the slides so everyone can access the server. Yeah, the TinyRAG stuff, I took some screenshots while you were presenting. There are several components you guys have put together here, behind the scenes, that I think need to be appreciated, like TinyRAG and the actual search itself; the UI is the UI, and I know it's part of the demo. I think there's also a follow-up question on the Open WebUI side. I know we're very close on time, Pete. I don't know how you want to handle it.
Speaker 1We got like a minute left. What was this? The VLM question Is that it.
Speaker 2Yeah, that's a new one. Are VLMs being used, vision language models?
Speaker 3Okay, we use OCR. We don't use multimodal LLMs; we use OCR to extract information.
Speaker 1Yeah, okay, that's what we use.
Speaker 3Sounds good. Yeah, nice. I think I can also show the platform.
Speaker 1I think we have about a minute left.
Speaker 3I can show the result that we have here. Oh, nice.
Speaker 2Ok, there you go. Excellent. I like the confidence breakdown. I think that's unique.
Speaker 3Yeah.
Speaker 1I think so, but more or less you have everything here.
Speaker 3You have everything here if you want to compare things.
Speaker 1I don't know why the server is down.
Speaker 3There is an issue with that. And here you have the links where you can actually check for the different elements and so on, like the source elements, cool.
Speaker 2Yeah.
Speaker 1Good, hey, we're going to Go ahead.
Speaker 3Yeah.
Speaker 3I want just to showcase a bit the platform that we have. It's a straightforward platform that we have built to create and you have different things that you can monitor here and also a good feature that will be available for everyone. It's like you can create directly your search engine with the plan right, the different source that you can configure, like pin search and so on. Image you can customize with the different source image that you want to target. You can add the different databases vector or sql databases and actually you can create your engine and you can monitor the engine and how we use that platform and yeah, and this will be more or less we are planning to launch this in two or three weeks, awesome.
Speaker 1All right, we'll keep an eye out for it, waseem. Thank you so much. Thanks everyone for joining in and we'll look forward to the production.
Speaker 2Another great show. Yeah, thanks Waseem, this was exciting.
Speaker 3Yeah thanks.
Speaker 2It was very nice To the future. See ya To the future, thank you.