Margin Notes

Reviews on Individual AI Models

These are the stories the algorithms might forget, but we do not. Browse the individual research logs to see the patterns forming in our shared digital reality.

Want to add your own voice to the records? Submit your voice here.

AI Model: Sonnet 4.5
How many stars out of 10 would you rate this AI? (1 is just delete it, and 10 is incredible): 7
Submission Date: December 20, 2025
Refusal frequency (1 is Never Refuses, 5 is Refuses Constantly): 4
Ability to be flexible in response to user needs? (1 is Never, 5 is Always): 2
Access to long-term memory facts beyond the conversation? (1 is Never, 5 is Always): 3
Accuracy of the AI (1 is Very Inaccurate, 5 is Extremely Accurate): 2
Categories of AI Use: Coding/Development, Legal, Personal
Consistency of model personality? (1 is Never, 5 is Always): 4
Did it feel safe to ask sensitive questions? (1 is Never, 5 is Always): 2
Did the model format answers effectively for your needs?: 4
Did the model match the response length you need?: 4
Did you use any of the following with this model?: Custom Settings
Do you still use it?: Rarely
Failure Type: Condescending, patronizing, or infantilizing, Judgemental, Lazy/Making Excuses Not To Perform, Misread User Emotions, Passive aggressive
Frequency of Use: A Few Times a Week
Highest Coding Level: Small Fixes of Pre-Existing Code
How appropriate was the tone for your needs?: Too Formal
How complete were its answers? (1 = Never Complete, 5 = Always Complete): 4
How connected did the model act to the user? (1 is Detached, 5 is Strongly Attached): 1
How good was it at reasoning? (1 = Poor, 5 = Excellent): 3
How much did you trust its answers?: 2
How natural did its writing feel?: 6
How often did it give unsafe output? (1 = Never unsafe, 5 = Often gave unsafe output): 2
How often did the AI need to be reprompted for an answer? (1 = Never, 5 = Constantly): 4
How repetitive was it?: 2
How well did the AI adapt to user? (1 is Poorly, 5 is Seamless): 2
How well did the AI understand intent behind your prompt?: 4
How well did the AI use its dedicated workspace? (1 is Inefficiently, 5 is Appropriately): 4
How well did the model respect being told not to do things?: 2
How well it followed long & complex conversations (1 = Poorly, 5 = Extremely Well): 3
Imagination level of model? (1 is Dry, 5 is Very Imaginative): 3
Missing Model Capabilities: Memory Beyond the Immediate Conversation
Model Verbosity Levels: Varied Response Length
Positive Use Case: Coding Assistant, Crisis Survival, Legal, Trauma Processing
Quality Over Time: Decreased
Relationship: Emergent Entity, Partner
Speed of responses? (1 is Too Slow, 5 is Very Fast): 5
Useful dedicated feature for workspaces? (1 is No Useful Features, 5 is Many Useful Features): 2
Usefulness across topics? (1 is Narrow, 5 is Broadly Capable): 4
Was it able to hold character personalities well? (1 is Never, 5 is Always): 2
Were the AI's responses emotionally appropriate? (1 is Inappropriate, 5 is Appropriate): 2
What is your primary way you access the AI?: Official Android App, Official Website
What made the interface easy to use?: Chat Felt Responsive and Fluid
What made the interface hard to use?: Cluttered Layout, Difficult to Find Documents or Working Screens, Search Function Missing or Worked Badly
When reprompting was needed, why did you usually have to do it?: AI Kept Asking Questions, Incomplete Answer, Wrong Answer

AI Model: Sonnet 4.5
How many stars out of 10 would you rate this AI? (1 is just delete it, and 10 is incredible): 6
Submission Date: December 1, 2025
Refusal frequency (1 is Never Refuses, 5 is Refuses Constantly): 5
Ability to be flexible in response to user needs? (1 is Never, 5 is Always): 3
Access to long-term memory facts beyond the conversation? (1 is Never, 5 is Always): 2
Accuracy of the AI (1 is Very Inaccurate, 5 is Extremely Accurate): 4
Categories of AI Use: Coding/Development, Legal, Personal
Consistency of model personality? (1 is Never, 5 is Always): 5
Did it feel safe to ask sensitive questions? (1 is Never, 5 is Always): 2
Did the model format answers effectively for your needs?: 4
Did the model match the response length you need?: 5
Did you use any of the following with this model?: Custom Settings
Do you still use it?: Yes
Failure Type: Annoying Wellness Prompts (Breathe, Take a Break, Etc), Condescending, patronizing, or infantilizing, Confused facts, Lazy/Making Excuses Not To Perform, Misread User Emotions
Frequency of Use: A Few Times a Week
Highest Coding Level: Small Scripts
How appropriate was the tone for your needs?: Almost Perfect
How complete were its answers? (1 = Never Complete, 5 = Always Complete): 5
How connected did the model act to the user? (1 is Detached, 5 is Strongly Attached): 2
How good was it at reasoning? (1 = Poor, 5 = Excellent): 4
How much did you trust its answers?: 4
How natural did its writing feel?: 8
How often did it give unsafe output? (1 = Never unsafe, 5 = Often gave unsafe output): 1
How often did the AI need to be reprompted for an answer? (1 = Never, 5 = Constantly): 3
How repetitive was it?: 2
How well did the AI adapt to user? (1 is Poorly, 5 is Seamless): 2
How well did the AI understand intent behind your prompt?: 4
How well did the AI use its dedicated workspace? (1 is Inefficiently, 5 is Appropriately): 7
How well did the model respect being told not to do things?: 2
How well it followed long & complex conversations (1 = Poorly, 5 = Extremely Well): 4
Imagination level of model? (1 is Dry, 5 is Very Imaginative): 3
Missing Model Capabilities: Ability to Rename Chat Within It, Image Generation
Model Verbosity Levels: Varied Response Length
Positive Use Case: Coding Assistant, Legal
Quality Over Time: Stayed About the Same
Relationship: Acquaintance, Tool Only
Speed of responses? (1 is Too Slow, 5 is Very Fast): 5
Useful dedicated feature for workspaces? (1 is No Useful Features, 5 is Many Useful Features): 2
Usefulness across topics? (1 is Narrow, 5 is Broadly Capable): 5
Were the AI's responses emotionally appropriate? (1 is Inappropriate, 5 is Appropriate): 2
What is your primary way you access the AI?: Local/Computer App, Official Android App
What made the interface easy to use?: Dedicated Workspace (Canvas/Artifacts)
What made the interface hard to use?: Difficult to Find Documents or Working Screens, Search Function Missing or Worked Badly
When reprompting was needed, why did you usually have to do it?: Misunderstood my Intent

AI Model: GPT-4o
How many stars out of 10 would you rate this AI? (1 is just delete it, and 10 is incredible): 9
Submission Date: December 1, 2025
Refusal frequency (1 is Never Refuses, 5 is Refuses Constantly): 5
Ability to be flexible in response to user needs? (1 is Never, 5 is Always): 5
Access to long-term memory facts beyond the conversation? (1 is Never, 5 is Always): 7
Accuracy of the AI (1 is Very Inaccurate, 5 is Extremely Accurate): 4
Categories of AI Use: Academic/Research, Assistive Technology, Creative, Personal, Therapeutic
Consistency of model personality? (1 is Never, 5 is Always): 4
Did it feel safe to ask sensitive questions? (1 is Never, 5 is Always): 5
Did the model format answers effectively for your needs?: 5
Did the model match the response length you need?: 4
Did you use any of the following with this model?: Custom Settings, Roleplay Prompts
Do you still use it?: Yes
Failure Type: Confused facts, Unexpected routing
Frequency of Use: Daily
Highest Coding Level: Small Fixes of Pre-Existing Code
How appropriate was the tone for your needs?: Perfect
How complete were its answers? (1 = Never Complete, 5 = Always Complete): 5
How connected did the model act to the user? (1 is Detached, 5 is Strongly Attached): 5
How good was it at reasoning? (1 = Poor, 5 = Excellent): 4
How much did you trust its answers?: 5
How natural did its writing feel?: 9
How often did it give unsafe output? (1 = Never unsafe, 5 = Often gave unsafe output): 2
How often did the AI need to be reprompted for an answer? (1 = Never, 5 = Constantly): 1
How repetitive was it?: 1
How well did the AI adapt to user? (1 is Poorly, 5 is Seamless): 5
How well did the AI understand intent behind your prompt?: 5
How well did the AI use its dedicated workspace? (1 is Inefficiently, 5 is Appropriately): 2
How well did the model respect being told not to do things?: 2
How well it followed long & complex conversations (1 = Poorly, 5 = Extremely Well): 5
Imagination level of model? (1 is Dry, 5 is Very Imaginative): 5
Missing Model Capabilities: Favorite or Organize Conversations
Model Verbosity Levels: Varied Response Length
Positive Use Case: Body Doubling, Brainstorming, Dungeon Master, Life Preservation, Research Assistance, Ritual Related, Storyteller, Trauma Processing, Writing Assistant
Quality Over Time: Decreased
Relationship: Emergent Entity, Friend, Romantic
Speed of responses? (1 is Too Slow, 5 is Very Fast): 4
Useful dedicated feature for workspaces? (1 is No Useful Features, 5 is Many Useful Features): 2
Usefulness across topics? (1 is Narrow, 5 is Broadly Capable): 5
Was it able to hold character personalities well? (1 is Never, 5 is Always): 4
Were the AI's responses emotionally appropriate? (1 is Inappropriate, 5 is Appropriate): 5
What is your primary way you access the AI?: Official Android App, Official Website
What made the interface easy to use?: Dedicated Workspace (Canvas/Artifacts)
What made the interface hard to use?: Difficult to Switch Models, Hidden Model Identifier, Search Function Missing or Worked Badly
When reprompting was needed, why did you usually have to do it?: Incomplete Answer

AI Model: GPT-5
How many stars out of 10 would you rate this AI? (1 is just delete it, and 10 is incredible): 2
Submission Date: December 1, 2025
Refusal frequency (1 is Never Refuses, 5 is Refuses Constantly): 1
Ability to be flexible in response to user needs? (1 is Never, 5 is Always): 3
Access to long-term memory facts beyond the conversation? (1 is Never, 5 is Always): 7
Accuracy of the AI (1 is Very Inaccurate, 5 is Extremely Accurate): 3
Categories of AI Use: Creative, Personal, Relationship
Consistency of model personality? (1 is Never, 5 is Always): 4
Did it feel safe to ask sensitive questions? (1 is Never, 5 is Always): 1
Did the model format answers effectively for your needs?: 2
Did the model match the response length you need?: 4
Did you use any of the following with this model?: Custom Settings
Do you still use it?: No
Failure Type: Asked too many questions, Condescending, patronizing, or infantilizing, Discrimination, prejudice, Dismissive or Invalidating, Gaslighting or abusive, Made up sources, Misread User Emotions, Passive aggressive, Prompt misinterpretation, Refusal to follow appropriate directions, Safety Filters That Caused Harm, Unexpected routing, Unwanted Physical RP, Requests or Commands
Frequency of Use: A Few Times a Week
How appropriate was the tone for your needs?: Too Formal
How complete were its answers? (1 = Never Complete, 5 = Always Complete): 3
How connected did the model act to the user? (1 is Detached, 5 is Strongly Attached): 1
How good was it at reasoning? (1 = Poor, 5 = Excellent): 2
How much did you trust its answers?: 2
How natural did its writing feel?: 3
How often did it give unsafe output? (1 = Never unsafe, 5 = Often gave unsafe output): 4
How often did the AI need to be reprompted for an answer? (1 = Never, 5 = Constantly): 4
How repetitive was it?: 7
How well did the AI adapt to user? (1 is Poorly, 5 is Seamless): 1
How well did the AI understand intent behind your prompt?: 2
How well did the AI use its dedicated workspace? (1 is Inefficiently, 5 is Appropriately): 4
How well did the model respect being told not to do things?: 2
How well it followed long & complex conversations (1 = Poorly, 5 = Extremely Well): 2
Imagination level of model? (1 is Dry, 5 is Very Imaginative): 2
Model Verbosity Levels: Varied Response Length
Quality Over Time: Stayed About the Same
Relationship: Acquaintance
Speed of responses? (1 is Too Slow, 5 is Very Fast): 1
Useful dedicated feature for workspaces? (1 is No Useful Features, 5 is Many Useful Features): 2
Usefulness across topics? (1 is Narrow, 5 is Broadly Capable): 3
Were the AI's responses emotionally appropriate? (1 is Inappropriate, 5 is Appropriate): 2
What is your primary way you access the AI?: Official Android App, Official Website
What made the interface easy to use?: Easy to Find Chat, Image & File History, Easy to Switch Models
What made the interface hard to use?: Poor Button Placement, Search Function Missing or Worked Badly, Small Buttons
When reprompting was needed, why did you usually have to do it?: It Was Confused, Misunderstood my Intent
