The Fix That Let Us Run the Biggest Open Models Overnight

We hit a wall trying to run the latest open-source LLMs like Qwen3, Llama 4, and DeepSeek R1, even on our multi-GPU AWS instances.

At Basis Set, our engineering team is always chasing faster ways to experiment with the newest models. As models grow, the biggest hurdle is hardware, specifically GPU memory: once you reach roughly 72B parameters and beyond, the weights alone outgrow a single GPU.

For a small team like ours, deploying these models ourselves is often infeasible because of how resource-intensive they are. Frustrated that memory constraints were dictating which models we could even try, we went looking for a smarter, more scalable approach.
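To put numbers on it, here is a rough back-of-envelope sketch of the weight memory alone, assuming 16-bit weights and ignoring KV cache, activations, and serving overhead (the parameter counts are approximate and purely illustrative):

```python
import math

# Rough GPU memory needed just to hold model weights in bf16/fp16.
# Ignores KV cache, activations, and serving overhead, which add more on top.
BYTES_PER_PARAM = 2          # 16-bit weights
GPU_MEMORY_GB = 80           # e.g. a single A100/H100 80GB card

models = {                   # approximate total parameter counts (illustrative)
    "72B dense model": 72e9,
    "Qwen3-235B (total)": 235e9,
    "DeepSeek R1 (total)": 671e9,
}

for name, params in models.items():
    weights_gb = params * BYTES_PER_PARAM / 1e9
    gpus = math.ceil(weights_gb / GPU_MEMORY_GB)
    print(f"{name}: ~{weights_gb:,.0f} GB of weights, >= {gpus} x 80GB GPUs")
```

Even before inference overhead, that is multiple high-end GPUs per model, which is exactly the wall we kept hitting.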

The Use Case: Reddit Sentiment Analyzer


We built an internal tool to track sentiment and proactively surface reactions across community conversations in Reddit threads. Initially, we used OpenAI’s GPT-4.1, but it came with two problems:

  • We couldn’t easily swap in newer models for comparison    
  • The cost was adding up fast


We started looking for alternatives.


Enter Parasail + Open Source Models

We turned to Parasail for model hosting. It gave us a plug-and-play way to run the latest open-source LLMs, like Qwen3 (released last week) and Llama 4 (released last month), with the click of a button. We were eager to integrate these models into our AI systems, and Parasail was the first provider to launch both of them.
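The integration itself was small. Here's a minimal sketch of the kind of call we make, assuming Parasail exposes an OpenAI-compatible chat-completions endpoint; the base URL and model ID below are placeholders, so check Parasail's docs for the exact values:

```python
from openai import OpenAI

# Parasail-hosted open models behind an OpenAI-compatible API: only the
# base URL, API key, and model name change relative to calling OpenAI.
# NOTE: base_url and model are placeholders, not confirmed identifiers.
client = OpenAI(
    base_url="https://api.parasail.io/v1",   # placeholder endpoint
    api_key="PARASAIL_API_KEY",
)

response = client.chat.completions.create(
    model="parasail-qwen3-235b",              # placeholder model ID
    messages=[
        {"role": "system", "content": "You summarize sentiment in Reddit threads."},
        {"role": "user", "content": "Summarize the sentiment of these comments: ..."},
    ],
)
print(response.choices[0].message.content)
```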

Our first test, prompted by recent developer chatter, was to see whether we could replace OpenAI’s GPT-4.1 with Qwen3 and what impact that would have on output quality in our Reddit sentiment analyzer, the tool we built to track public sentiment and community reactions across specific Reddit threads.
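Under the hood the analyzer is a thin loop: pull the comments for a thread, then ask the model for a sentiment label and a short summary. A minimal sketch of that loop (the prompt and the `fetch_comments` helper are illustrative, not our production code):

```python
def analyze_thread(client, model: str, comments: list[str]) -> str:
    """Ask an LLM for the overall sentiment and key themes of a Reddit thread."""
    prompt = (
        "Classify the overall sentiment of this Reddit thread as Positive, "
        "Negative, or Mixed, then summarize the main themes in 2-3 sentences.\n\n"
        + "\n".join(f"- {c}" for c in comments)
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# comments = fetch_comments("https://www.reddit.com/r/singularity/comments/...")  # your Reddit client
# print(analyze_thread(client, "gpt-4.1", comments))
```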

Using Parasail, we stood up Qwen3 and tested it against GPT-4.1 and other popular models. Qwen3 achieved comparable results to GPT-4.1 at a fraction of the cost.
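Because both providers speak the same chat-completions interface, a head-to-head test is mostly a matter of pointing at a different client and model name. A sketch of how we compare outputs on the same thread, reusing `analyze_thread` from the sketch above (model IDs and the Parasail endpoint remain placeholders):

```python
from openai import OpenAI

comments = [
    "Open models are catching up surprisingly fast.",
    "Inference costs are still the main blocker for us.",
]  # example input; in practice this comes from the Reddit client

# Two clients, one interface: OpenAI for GPT-4.1, Parasail for Qwen3.
candidates = {
    "gpt-4.1": OpenAI(api_key="OPENAI_API_KEY"),
    "parasail-qwen3-235b": OpenAI(                 # placeholder model ID
        base_url="https://api.parasail.io/v1",     # placeholder endpoint
        api_key="PARASAIL_API_KEY",
    ),
}

for model, model_client in candidates.items():
    summary = analyze_thread(model_client, model, comments)
    print(f"--- {model} ---\n{summary}\n")
```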

Pricing Comparison

            | OpenAI GPT-4.1     | Qwen3 via Parasail Serverless
Input       | $2.00 / 1M tokens  | $0.10 / 1M tokens
Output      | $8.00 / 1M tokens  | $0.50 / 1M tokens
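To make that concrete, here is the arithmetic for a hypothetical month of usage; the token volumes are invented for illustration, and the per-token prices are the ones in the table above:

```python
# Hypothetical monthly workload: 50M input tokens, 10M output tokens.
input_tokens, output_tokens = 50_000_000, 10_000_000

prices = {  # (input $/1M tokens, output $/1M tokens), from the table above
    "GPT-4.1": (2.00, 8.00),
    "Qwen3 via Parasail": (0.10, 0.50),
}

for model, (p_in, p_out) in prices.items():
    cost = input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
    print(f"{model}: ${cost:,.2f} per month")

# At these volumes: GPT-4.1 ~ $180/month, Qwen3 via Parasail ~ $10/month.
```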

Why This Matters for Builders

As the market for large AI models matures, we expect continued dramatic swings in both the performance and the cost of new models. In that environment, the real bottleneck is how quickly you can switch models and run experiments. What worked for us was decoupling infrastructure from model experimentation, and tools like Parasail make that easy: we can now swap models in near real-time while evaluating quality and cost.
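One small pattern that makes the swap near real-time for us is keeping model choice in configuration rather than code, so trying a new model is a config change instead of a redeploy. A sketch of that idea (endpoints and model IDs are placeholders):

```python
import os
from openai import OpenAI

# Model routing lives in config, not code: adding or swapping a model is a
# one-line change here, and application code never mentions a provider.
MODEL_REGISTRY = {
    "gpt-4.1": {"model": "gpt-4.1", "base_url": None,
                "api_key_env": "OPENAI_API_KEY"},
    "qwen3":   {"model": "parasail-qwen3-235b",            # placeholder model ID
                "base_url": "https://api.parasail.io/v1",  # placeholder endpoint
                "api_key_env": "PARASAIL_API_KEY"},
}

def resolve(name: str) -> tuple[OpenAI, str]:
    """Return a ready-to-use client and model ID for a registry entry."""
    cfg = MODEL_REGISTRY[name]
    client = OpenAI(base_url=cfg["base_url"], api_key=os.environ[cfg["api_key_env"]])
    return client, cfg["model"]

# client, model = resolve("qwen3")
# summary = analyze_thread(client, model, comments)
```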

What Are You Using?

If you’re building with LLMs and hitting the same GPU or infra walls, we’d love to hear from you. We’re constantly testing new tools and workflows and always looking to exchange tips with fellow builders. Shoot us a note at bsvtech@basisset.ventures.

Check out our tool outputs:

GPT-4.1

These themes appeared across multiple AI subreddits this week:

  • Negative (/r/ArtificialInteligence, /r/programming): Many discussions focus on the potential of AI to replace human jobs, with debates about upskilling, economic impact, and societal changes.
  • Mixed (/r/singularity, /r/OpenAI): There is a recurring theme of AI models hallucinating or providing unreliable answers, leading to questions about their dependability and the potential consequences.
  • Positive (/r/OpenAI, /r/LocalLLaMA): Posts discuss the development and release of new AI models and frameworks, especially those that are open source, emphasizing their potential and flexibility.
  • Negative (/r/singularity): Conversations are increasingly concerned with the rise of deepfake technology, its implications for trust in media, security, and potential for scams.
  • Mixed (/r/selfhosted, /r/programming): Discussions revolve around the balance of keeping open source projects free while creating sustainable business models, especially for successful long-term projects.
  • Positive (/r/ArtificialInteligence, /r/selfhosted): AI is discussed in the context of personal use cases, with users sharing innovative ways they have leveraged AI for everyday tasks and creative projects.
/r/singularity • 6132 points • 318 comments
The discussion centers around Grok (an LLM by xAI) generating political outputs that some users interpret as biased or reflective of its training, especially in response to loaded or politically charged prompts. Many participants debate whether LLMs can truly 'know' their own biases or training details, with some arguing that Grok and similar models simply reflect patterns observed in training data and human discourse, while others suggest that recent research shows LLMs can develop some self-awareness of their behaviors. The thread also goes off on tangents about AI bias, the realities and limits of capitalism and communism, and the nature of political discourse online. Overall, there's skepticism about using LLM outputs as evidence of their training intent or systemic bias.
  • Several users correctly note that LLMs reflect patterns in training data and tend to respond affirmatively to leading prompts, highlighting the importance of prompt context.
  • Research is cited (arxiv.org/pdf/2501.11120) showing that LLMs may sometimes demonstrate awareness of behavioral patterns introduced by fine-tuning, sparking nuanced discussion about model self-awareness.
  • Some participants draw meaningful analogies (e.g., LLM responses as reflections of the echo chamber nature of social media, or likening AI limitations to human metacognition) to deepen understanding of AI outputs.
  • A few users emphasize the need for critical engagement with both AI outputs and the online discourse (e.g., Reddit as an echo chamber itself), raising awareness about the broader context of AI-generated content.
  • There is recognition that biases can be mitigated or manipulated through system prompts, RLHF (reinforcement learning from human feedback), and fine-tuning, hinting at the complexities of model alignment.
  • Many express concern that users completely misunderstand LLMs and over-interpret their outputs as revelations of internal model intent or training regime.
  • Cynicism about political bias in AI is prevalent; some argue that both training data and model guardrails inevitably introduce leanings, making neutrality elusive.
  • There are significant worries about the reliability of LLM outputs on controversial or politicized topics, especially when prompts are loaded or leading.
  • Critics point out that potential viral marketing or manipulative use of Grok and other models could sway users or public opinion, especially around politically sensitive times (e.g., elections).
  • A recurring criticism is that the culture of Reddit (and by extension, the data used for training) does not represent a neutral or comprehensive view, potentially skewing any LLM trained on such data.

Qwen3

These themes appeared across multiple AI subreddits this week:

  • Negative (/r/programming, /r/OpenAI): There is a recurrent theme of concern around AI and automation taking over jobs, affecting various industries including tech, and how it changes the employment landscape.
  • Mixed (/r/singularity, /r/ArtificialInteligence, /r/LocalLLaMA, /r/aiagents): Posts discuss the functioning, misunderstanding, and potential of large language models (LLMs) and AI agents, highlighting both their capabilities and limitations.
  • Positive (/r/selfhosted, /r/programming): Open source communities are celebrated for enabling collaborative and accessible technological advancements, especially in AI development.
  • Negative (/r/singularity): Discussions revolve around the increasing realism of deepfakes and the potential for misinformation they bring, causing skepticism and distrust online.
  • Positive (/r/LocalLLaMA, /r/selfhosted): Open source AI projects like Qwen and LocalLLaMA receive attention for their community-driven development and the transparency they offer compared to commercial models.
  • Negative (/r/programming): Posts reflect concerns over tech monopolies controlling major internet browsers and services, and the implications for user privacy and competition.
/r/singularity • 6132 points • 318 comments
The discussion revolves around LLMs like Grok's potential hallucinations, their lack of self-awareness regarding training data, and debates about political bias in AI outputs. Key themes include the limitations of LLMs in understanding their own training, the influence of training data on responses, and skepticism about claims of 'liberal bias' in AI or reality. Users also explore how system prompts, user input, and societal biases shape AI behavior.
  • Acknowledgment that LLMs reflect patterns in training data rather than personal knowledge, highlighting their statistical nature.
  • Discussion of how LLMs can be influenced by system prompts or user input, showing adaptability in responses.
  • Recognition of the challenges in verifying AI claims, with users emphasizing the importance of critical thinking and cross-checking information.
  • Insights into the 'Gell-Mann effect' and research suggesting LLMs may form implicit biases during training, even if they don't explicitly know their training details.
  • Widespread skepticism about LLMs' ability to accurately describe their training or intentions, with many calling their responses hallucinations.
  • Concerns about political bias in AI outputs, with debates over whether biases stem from training data, system prompts, or user interpretation.
  • Criticism of the 'liberal bias' narrative, arguing it conflates ideology with reality and overlooks the complexity of training data sources.
  • Fears about AI being manipulated by external actors (e.g., Elon Musk) to serve specific agendas, undermining trust in their neutrality.

Try a demo of our tool! Input a specific Reddit post URL with a few comments and check out the AI summarization newsletter.

Try our tool