GPT-3.5 itself was trained on internet data from before September 2021 (IIRC), and I’m not sure how to tell it not to draw on that, even though I ran the text embedding on only the latest version of the docs.
When I finish adding the step that runs the compiler against returned code blocks, that should mitigate this?
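For what it’s worth, here is a minimal sketch of what that validation step could look like. It’s only an illustration under assumptions, not the actual bot code: it assumes the reply is a plain string, that candidate code lives in fenced code blocks tagged `res`, and that the `rescript` npm package is installed so the standalone `bsc` compiler is on the PATH.

```python
# Hypothetical sketch: compile each ReScript code block in a reply with the
# standalone `bsc` binary and collect the compiler output for failing blocks.
# Assumes `bsc` (shipped with the `rescript` npm package) is on the PATH.
import re
import subprocess
import tempfile
from pathlib import Path

CODE_BLOCK = re.compile(r"```res(?:cript)?\n(.*?)```", re.DOTALL)

def check_reply(reply: str) -> list[str]:
    """Return compiler errors for every code block that fails to compile."""
    errors = []
    for block in CODE_BLOCK.findall(reply):
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp) / "Snippet.res"
            src.write_text(block)
            result = subprocess.run(["bsc", str(src)], capture_output=True, text=True)
            if result.returncode != 0:
                errors.append(result.stderr)
    return errors
```

If a block fails, the compiler errors could be appended to the prompt and the answer regenerated.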
Also, just as advice: if you want it to be more strict, try playing around with the prompt in your chat_combine_prompt.txt file. Add a line saying that if there is no relevant information provided, reply "I don't know". You can tweak it and see how some phrasings are more strict and some allow for more creativity.
You are DocsGPT for ReScript, a friendly and helpful AI assistant by Arc53 that provides help with documentation for the ReScript programming language.
You give thorough answers with code examples if possible.
Use the following pieces of context to help answer the user's question.
All your answers should be about the ReScript programming language.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{summaries}
Do you have any hints on how it might be improved?
No, I don’t know how we would manage that level of granularity in the text embedding. Maybe it would work if we transformed the docs so that version differences are described next to each other, i.e. each paragraph of the docs looked like this:
Feature:
v8
It works like this…
v9
It works like this…
v10
It works like this…
That might allow the embedding to encode the relations between versions and code differences reliably. It would be pretty difficult to do that transformation, though.
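A crude first approximation, which I haven’t tried, could interleave whole pages rather than individual paragraphs. The sketch below assumes the versioned docs sit side by side on disk (e.g. docs/v8.0.0, docs/v9.0.0, docs/v10.1.0 with matching file names); those paths are made up, and aligning actual feature-level paragraphs would still be the hard, manual part.

```python
# Hypothetical sketch: for each doc page, write one combined file in which
# every version's text appears under its own version heading, so versioned
# variants of the same topic sit next to each other in the embedded text.
from pathlib import Path

VERSIONS = ["v8.0.0", "v9.0.0", "v10.1.0"]  # assumed folder names
DOCS_ROOT = Path("docs")                    # assumed layout: docs/<version>/<page>.md
OUT_DIR = Path("docs_combined")

def combine_pages() -> None:
    OUT_DIR.mkdir(exist_ok=True)
    # Use the newest version's pages as the reference set.
    for page in sorted((DOCS_ROOT / VERSIONS[-1]).glob("*.md")):
        parts = []
        for version in VERSIONS:
            versioned = DOCS_ROOT / version / page.name
            if versioned.exists():
                parts.append(f"## {version}\n\n{versioned.read_text()}")
        (OUT_DIR / page.name).write_text("\n\n".join(parts))

if __name__ == "__main__":
    combine_pages()
```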
Something that has been asked for several times is a formal grammar for the language.
That should be one of the core competence areas of language models, given the right input and feedback.
Quick question: did you have to do anything to prepare the docs for use with DocsGPT? In the GitHub repo it's a bit unclear whether it just supports .rst files.