That's unironically amazing. In the same time period, I almost learned to adjust the wig angle on a Second Life avatar.
In re: nothing, your use of both "cuppa" and "hillbilly" in the same thread makes you seem culturally and geographically ambiguous.
Nice
keeping it vague for now 'cause he's gonna try and market it or something
assuming that means open-sourcing it on GitHub etc. is likely off the table, at least for the foreseeable future. sad but i get it
anyone got feature ideas?
I don't know enough about it to be able to tell if I'm in your target demographic but I have used some speech recognition stuff in the past (been awhile... mostly it was Windows Speech Recognition aka WSR and some 3rd party programs that relied on it way back in the Windows 7 days). I am aware of Whisper and keep meaning to dive into AI tech, but I haven't actually studied or even played around with it yet. That said, I think I am familiar enough as a user to at least suggest some potential features (again take or leave 'em as needed for whatever fits for you).
You're pretty spot on with a few things I have looked at.
Push to talk is already implemented; I just haven't mapped a key to it yet. I looked into PTT because I currently have it set to a 5 second talk input time. That time can be lengthened, but for testing purposes 5 seconds is enough. I was having issues implementing PTT properly. Depending on the toggled key, it wasn't allowing Whisper to close out and start transcription, so it would almost idle out as if still waiting for input. Some keys worked, some didn't. So I stopped and figured I'll fix it up later. Let's say it's half implemented lol
Outputting the spoken text to a file already happens. It currently saves text logs, terminal logs and all the audio files. The audio files are saved in chunks; also, with a tick of a box in the settings, it will save the completely built WAV file, piecing all chunks together.
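For anyone curious what "piecing all chunks together" can look like, here is a minimal sketch using only Python's standard `wave` module. The function name `join_wav_chunks` is made up for illustration, and it assumes every chunk shares the same channel count, sample width and sample rate; it is not necessarily how the app does it.

```python
import wave


def join_wav_chunks(chunk_paths, out_path):
    """Concatenate WAV chunks that share one audio format into a single file."""
    if not chunk_paths:
        raise ValueError("no chunks to join")
    fmt = None
    with wave.open(str(out_path), "wb") as out:
        for path in chunk_paths:
            with wave.open(str(path), "rb") as chunk:
                # (channels, sample width in bytes, frame rate)
                chunk_fmt = tuple(chunk.getparams()[:3])
                if fmt is None:
                    # First chunk decides the output format.
                    fmt = chunk_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif chunk_fmt != fmt:
                    raise ValueError(f"{path} has a different audio format")
                # Append this chunk's raw frames; the header is fixed up on close.
                out.writeframes(chunk.readframes(chunk.getnframes()))
    return out_path
```

Passing the chunk paths in recording order matters here, since the frames are appended exactly in the order given.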
I'm not making anything OS-specific, but that's more because the one who asked me to make it was too vague and couldn't specify what OS he was looking at using. So right now it could be packaged for almost anything.
As for remapping words, I haven't really looked too far into this. I have considered making it always listening. Could do wake words similar to the whole "Hey Google", "Alexa" etc. So remapping words for certain functions shouldn't be hard, especially considering it's literally a transcription defining a named item or action etc.
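A rough sketch of that idea, assuming the transcription arrives as a plain string. The wake phrase, the command names and the two placeholder actions are all hypothetical; nothing here comes from the actual app.

```python
import re

WAKE_WORD = "hey assistant"  # hypothetical wake phrase


def open_settings():
    return "settings opened"


def new_conversation():
    return "new conversation started"


# Spoken phrase -> function to run. Remapping a word is just editing this dict.
COMMANDS = {
    "open settings": open_settings,
    "new conversation": new_conversation,
}


def normalise(text):
    """Lowercase, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", "", text.lower())
    return " ".join(cleaned.split())


def handle_transcript(text):
    """Run the matching command, or return None if nothing matched."""
    spoken = normalise(text)
    if not spoken.startswith(WAKE_WORD):
        return None  # always listening, but ignore speech without the wake word
    request = spoken[len(WAKE_WORD):].strip()
    action = COMMANDS.get(request)
    return action() if action else None
```

Exact matching like this is brittle against transcription errors, so a real version would probably want fuzzy matching on the request text.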
Right now it uses FFmpeg, Whisper, LLaMA3, and Coqui TTS
I have no idea what the hell this means, but have a bump
oh my train bump. this is long.
I have no suggestions for AI since I barely use it, though I probably should when I see how impressive some stuff is... Only thing that's stopped me from using it more for work is that the internal AI that we're allowed to use doesn't make it easy to find the section of a webpage it's citing from to give its (probably made-up) answer.
I do use a spicy AI chat, and tried a few others of the like, including RPG ones, and ChatGPT for deeper topics. The number 1 thing I need is not JUST relying on chat history for memory. The number of times it's forgotten details because it's relied on chat logs only... This is especially important for RPGs. So yeah, a really decent, integrated memory where it can move important information into a permanent repository until it's no longer required, but also something the user can manipulate/edit/delete/add to as needed.
The problem you have with that aspect is every AI will lose itself as the conversations lengthen. Preventing that for as long as possible will simply boost the minimum system requirements well and truly past most people's system specs. Quite literally, you could have memory retention and attention to detail along with the capability for specific detailed functions, i.e. you could theoretically program the AI to perform as a D&D GM while retaining all information regarding characters, locations, rolls and more. However, I almost guarantee a standalone computer wouldn't be able to meet the requirements if I was to do that, and even then it will get confused the moment you try to implement it with Magic: The Gathering details, for example.
This is one of the reasons why every AI program, even things like ChatGPT, including for those who pay for it, will inevitably lose itself in conversation. Just imagining the hardware requirements puts my recent $32,000-odd dental bill to shame ^^
I'm not suggesting everything needs to be stored that way. You have a lot of fluff and world-building stuff that can more easily stay in chat logs, but things like location, inventory, powers/skills/abilities, important decisions, etc. can be moved into permanent memory. ChatGPT has memory, even though it doesn't seem to be handled well IMHO.
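To make the idea concrete, here is a minimal sketch of a "pinned facts" store kept apart from the chat log. The class name `PinnedMemory`, the JSON file layout and the prompt wording are all assumptions made up for illustration; it only shows the shape of a small, user-editable memory that gets prepended to every prompt.

```python
import json
from pathlib import Path


class PinnedMemory:
    """Small key-value store for facts that should outlive the chat log."""

    def __init__(self, path):
        self.path = Path(path)
        # Load earlier facts if the file already exists.
        self.facts = json.loads(self.path.read_text()) if self.path.exists() else {}

    def set(self, key, value):
        """Add or edit a fact, then persist it."""
        self.facts[key] = value
        self._save()

    def delete(self, key):
        """Remove a fact that is no longer required."""
        self.facts.pop(key, None)
        self._save()

    def _save(self):
        self.path.write_text(json.dumps(self.facts, indent=2))

    def as_prompt(self):
        """Render the facts as a block to prepend to each model prompt."""
        lines = [f"- {key}: {value}" for key, value in sorted(self.facts.items())]
        return "Known facts:\n" + "\n".join(lines)
```

Because the file is plain JSON, the user can also open it and edit, add or delete entries by hand, and the pinned block stays small no matter how long the chat log gets.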
You have somewhat piqued my curiosity. May I ask what RPG it is you are trying to play with AI? I feel like I could make an AI-driven program that would allow this. I honestly don't know what limitations I would be facing, and that's the curiosity part. However, I need to know what RPG specifically it is. More could be added to it later I suppose, but I would start with just one if I was going to look into it. ^^
So, a mate of mine starts rambling about this project idea of his — keeping it vague for now 'cause he's gonna try and market it or something — but basically it’s gonna involve AI, voice, the works. Since I’ve been off work recovering from some delightful dental surgery (10/10 don’t recommend), he asked if I could whip up a basic offline AI to help with his prototype.
One week later, in between games and wrangling the kids, I’ve somehow ended up knee-deep in a full-on desktop AI assistant. I’m calling it Version 0.8 for now, with my “MVP” version being 1.0.
Right now it uses FFmpeg, Whisper, LLaMA3, and Coqui TTS. It handles both text and voice input/output, caches WAVs, convos, user settings, and has a few colour themes 'cause who doesn’t love a bit of flair. Currently working on per-conversation caching and trying to make convos reference each other — which is as fun as it sounds.
Also, the AI voice? Sounds like a half-baked call centre operator. Absolutely cooked. I’m adding more voice options soon so it stops sounding like a robo-Karen trying to upsell me internet plans.
Performance-wise, I’ve managed to take voice response from "go make a cuppa" times down to about 6–8 seconds, thanks to streaming chunked WAVs and throwing the GPU at it. Still not lightning, but hey, it’s no longer yelling into the void and waiting for enlightenment.
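One common way to get that kind of speed-up, sketched here as an assumption rather than a description of this app: split the reply text into sentence-sized chunks so each one can be synthesised and played while the next is still being generated. The function name `sentence_chunks` and the `max_chars` limit are hypothetical.

```python
import re


def sentence_chunks(text, max_chars=200):
    """Yield speakable chunks: whole sentences, merged up to max_chars."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunk = ""
    for sentence in sentences:
        if chunk and len(chunk) + 1 + len(sentence) > max_chars:
            yield chunk  # current chunk is full, hand it to the TTS step
            chunk = sentence
        else:
            chunk = f"{chunk} {sentence}".strip()
    if chunk:
        yield chunk
```

Each yielded chunk would then go to the TTS engine on its own, which is where the per-chunk WAV files come from. A single sentence longer than `max_chars` is passed through whole rather than cut mid-sentence.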
Anyway, point is — since I was putting together a train anyway, thought I’d ask: anyone got feature ideas? Already blown past what my mate expected, so I’ve got a pretty hefty roadmap going. But I’m all ears for wild suggestions, practical or ridiculous.
Here is your entry to a progressive train. Good Luck and Enjoy ^^
Just finished adding support for creating separate conversations, user-defined conversation titles, conversation tabs, persistent/cached conversations, and deleting conversations ^^ Currently the entire app is 755 megabytes. Let's watch that expand >.<