The Fix That Let Us Run the Biggest Open Models Overnight

We hit a wall trying to run the latest open-source LLMs like Qwen3, Llama 4, and DeepSeek R1, even on our multi-GPU AWS instances.

At Basis Set, our engineering team is always chasing faster ways to experiment with the newest models. As models grow, the biggest hurdle is hardware, specifically GPU memory: once you reach roughly 72B parameters and beyond, the weights alone outgrow a single GPU.

For a small team like ours, deploying these models ourselves is often infeasible because of how resource-intensive they are. Frustrated that memory constraints were dictating which models we could even try, we went looking for a smarter, more scalable approach.
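To put numbers on it, here is a rough back-of-envelope sketch of the weight memory alone, assuming 16-bit weights and ignoring KV cache, activations, and serving overhead (the parameter counts are approximate and purely illustrative):

```python
import math

# Rough GPU memory needed just to hold model weights in bf16/fp16.
# Ignores KV cache, activations, and serving overhead, which add more on top.
BYTES_PER_PARAM = 2          # 16-bit weights
GPU_MEMORY_GB = 80           # e.g. a single A100/H100 80GB card

models = {                   # approximate total parameter counts (illustrative)
    "72B dense model": 72e9,
    "Qwen3-235B (total)": 235e9,
    "DeepSeek R1 (total)": 671e9,
}

for name, params in models.items():
    weights_gb = params * BYTES_PER_PARAM / 1e9
    gpus = math.ceil(weights_gb / GPU_MEMORY_GB)
    print(f"{name}: ~{weights_gb:,.0f} GB of weights, >= {gpus} x 80GB GPUs")
```

Even before inference overhead, that is multiple high-end GPUs per model, which is exactly the wall we kept hitting.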

The Use Case: Reddit Sentiment Analyzer


We built an internal tool to track sentiment and proactively surface reactions across community conversations in Reddit threads. Initially, we used OpenAI’s GPT-4.1, but it came with two problems:

  • We couldn’t easily swap in newer models for comparison    
  • The cost was adding up fast


We started looking for alternatives.


Enter Parasail + Open Source Models

We turned to Parasail for model hosting. It gave us a plug-and-play way to run the latest open-source LLMs, like Qwen3 (released last week) and Llama 4 (released last month), with the click of a button. We were eager to integrate these models into our AI systems, and Parasail was the first provider to launch both of them.
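The integration itself was small. Here's a minimal sketch of the kind of call we make, assuming Parasail exposes an OpenAI-compatible chat-completions endpoint; the base URL and model ID below are placeholders, so check Parasail's docs for the exact values:

```python
from openai import OpenAI

# Parasail-hosted open models behind an OpenAI-compatible API: only the
# base URL, API key, and model name change relative to calling OpenAI.
# NOTE: base_url and model are placeholders, not confirmed identifiers.
client = OpenAI(
    base_url="https://api.parasail.io/v1",   # placeholder endpoint
    api_key="PARASAIL_API_KEY",
)

response = client.chat.completions.create(
    model="parasail-qwen3-235b",              # placeholder model ID
    messages=[
        {"role": "system", "content": "You summarize sentiment in Reddit threads."},
        {"role": "user", "content": "Summarize the sentiment of these comments: ..."},
    ],
)
print(response.choices[0].message.content)
```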

Our first test, prompted by recent developer chatter, was to see whether we could replace OpenAI’s GPT-4.1 with Qwen3 and what impact that would have on output quality in our Reddit sentiment analyzer, the tool we built to track public sentiment and community reactions across specific Reddit threads.
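Under the hood the analyzer is a thin loop: pull the comments for a thread, then ask the model for a sentiment label and a short summary. A minimal sketch of that loop (the prompt and the `fetch_comments` helper are illustrative, not our production code):

```python
def analyze_thread(client, model: str, comments: list[str]) -> str:
    """Ask an LLM for the overall sentiment and key themes of a Reddit thread."""
    prompt = (
        "Classify the overall sentiment of this Reddit thread as Positive, "
        "Negative, or Mixed, then summarize the main themes in 2-3 sentences.\n\n"
        + "\n".join(f"- {c}" for c in comments)
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# comments = fetch_comments("https://www.reddit.com/r/singularity/comments/...")  # your Reddit client
# print(analyze_thread(client, "gpt-4.1", comments))
```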

Using Parasail, we stood up Qwen3 and tested it against GPT-4.1 and other popular models. Qwen3 achieved comparable results to GPT-4.1 at a fraction of the cost.
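Because both providers speak the same chat-completions interface, a head-to-head test is mostly a matter of pointing at a different client and model name. A sketch of how we compare outputs on the same thread, reusing `analyze_thread` from the sketch above (model IDs and the Parasail endpoint remain placeholders):

```python
from openai import OpenAI

comments = [
    "Open models are catching up surprisingly fast.",
    "Inference costs are still the main blocker for us.",
]  # example input; in practice this comes from the Reddit client

# Two clients, one interface: OpenAI for GPT-4.1, Parasail for Qwen3.
candidates = {
    "gpt-4.1": OpenAI(api_key="OPENAI_API_KEY"),
    "parasail-qwen3-235b": OpenAI(                 # placeholder model ID
        base_url="https://api.parasail.io/v1",     # placeholder endpoint
        api_key="PARASAIL_API_KEY",
    ),
}

for model, model_client in candidates.items():
    summary = analyze_thread(model_client, model, comments)
    print(f"--- {model} ---\n{summary}\n")
```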

Pricing Comparison

            | OpenAI GPT-4.1     | Qwen3 via Parasail Serverless
Input       | $2.00 / 1M tokens  | $0.10 / 1M tokens
Output      | $8.00 / 1M tokens  | $0.50 / 1M tokens
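To make that concrete, here is the arithmetic for a hypothetical month of usage; the token volumes are invented for illustration, and the per-token prices are the ones in the table above:

```python
# Hypothetical monthly workload: 50M input tokens, 10M output tokens.
input_tokens, output_tokens = 50_000_000, 10_000_000

prices = {  # (input $/1M tokens, output $/1M tokens), from the table above
    "GPT-4.1": (2.00, 8.00),
    "Qwen3 via Parasail": (0.10, 0.50),
}

for model, (p_in, p_out) in prices.items():
    cost = input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
    print(f"{model}: ${cost:,.2f} per month")

# At these volumes: GPT-4.1 ~ $180/month, Qwen3 via Parasail ~ $10/month.
```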

Why This Matters for Builders

As the market for large AI models matures, we expect continued dramatic swings in both the performance and the cost of new models. In that environment, the real bottleneck is how quickly you can switch models and run experiments. What worked for us was decoupling infrastructure from model experimentation, and tools like Parasail make that easy: we can now swap models in near real-time while evaluating quality and cost.
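One small pattern that makes the swap near real-time for us is keeping model choice in configuration rather than code, so trying a new model is a config change instead of a redeploy. A sketch of that idea (endpoints and model IDs are placeholders):

```python
import os
from openai import OpenAI

# Model routing lives in config, not code: adding or swapping a model is a
# one-line change here, and application code never mentions a provider.
MODEL_REGISTRY = {
    "gpt-4.1": {"model": "gpt-4.1", "base_url": None,
                "api_key_env": "OPENAI_API_KEY"},
    "qwen3":   {"model": "parasail-qwen3-235b",            # placeholder model ID
                "base_url": "https://api.parasail.io/v1",  # placeholder endpoint
                "api_key_env": "PARASAIL_API_KEY"},
}

def resolve(name: str) -> tuple[OpenAI, str]:
    """Return a ready-to-use client and model ID for a registry entry."""
    cfg = MODEL_REGISTRY[name]
    client = OpenAI(base_url=cfg["base_url"], api_key=os.environ[cfg["api_key_env"]])
    return client, cfg["model"]

# client, model = resolve("qwen3")
# summary = analyze_thread(client, model, comments)
```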

What Are You Using?

If you’re building with LLMs and hitting the same GPU or infra walls, we’d love to hear from you. We’re constantly testing new tools and workflows and always looking to exchange tips with fellow builders. Shoot us a note at bsvtech@basisset.ventures.

Check out our tool outputs:

GPT-4.1

These themes appeared across multiple AI subreddits this week:

  • Negative (/r/ArtificialInteligence, /r/programming): Many discussions focus on the potential of AI to replace human jobs, with debates about upskilling, economic impact, and societal changes.
  • Mixed (/r/singularity, /r/OpenAI): There is a recurring theme of AI models hallucinating or providing unreliable answers, leading to questions about their dependability and the potential consequences.
  • Positive (/r/OpenAI, /r/LocalLLaMA): Posts discuss the development and release of new AI models and frameworks, especially those that are open source, emphasizing their potential and flexibility.
  • Negative (/r/singularity): Conversations are increasingly concerned with the rise of deepfake technology, its implications for trust in media, security, and potential for scams.
  • Mixed (/r/selfhosted, /r/programming): Discussions revolve around the balance of keeping open source projects free while creating sustainable business models, especially for successful long-term projects.
  • Positive (/r/ArtificialInteligence, /r/selfhosted): AI is discussed in the context of personal use cases, with users sharing innovative ways they have leveraged AI for everyday tasks and creative projects.
/r/singularity • 6132 points • 318 comments
The discussion centers around Grok (an LLM by xAI) generating political outputs that some users interpret as biased or reflective of its training, especially in response to loaded or politically charged prompts. Many participants debate whether LLMs can truly 'know' their own biases or training details, with some arguing that Grok and similar models simply reflect patterns observed in training data and human discourse, while others suggest that recent research shows LLMs can develop some self-awareness of their behaviors. The thread also goes off on tangents about AI bias, the realities and limits of capitalism and communism, and the nature of political discourse online. Overall, there's skepticism about using LLM outputs as evidence of their training intent or systemic bias.
  • Several users correctly note that LLMs reflect patterns in training data and tend to respond affirmatively to leading prompts, highlighting the importance of prompt context.
  • Research is cited (arxiv.org/pdf/2501.11120) showing that LLMs may sometimes demonstrate awareness of behavioral patterns introduced by fine-tuning, sparking nuanced discussion about model self-awareness.
  • Some participants draw meaningful analogies (e.g., LLM responses as reflections of the echo chamber nature of social media, or likening AI limitations to human metacognition) to deepen understanding of AI outputs.
  • A few users emphasize the need for critical engagement with both AI outputs and the online discourse (e.g., Reddit as an echo chamber itself), raising awareness about the broader context of AI-generated content.
  • There is recognition that biases can be mitigated or manipulated through system prompts, RLHF (reinforcement learning from human feedback), and fine-tuning, hinting at the complexities of model alignment.
  • Many express concern that users completely misunderstand LLMs and over-interpret their outputs as revelations of internal model intent or training regime.
  • Cynicism about political bias in AI is prevalent; some argue that both training data and model guardrails inevitably introduce leanings, making neutrality elusive.
  • There are significant worries about the reliability of LLM outputs on controversial or politicized topics, especially when prompts are loaded or leading.
  • Critics point out that potential viral marketing or manipulative use of Grok and other models could sway users or public opinion, especially around politically sensitive times (e.g., elections).
  • A recurring criticism is that the culture of Reddit (and by extension, the data used for training) does not represent a neutral or comprehensive view, potentially skewing any LLM trained on such data.

Qwen3

These themes appeared across multiple AI subreddits this week:

  • Negative (/r/programming, /r/OpenAI): There is a recurrent theme of concern around AI and automation taking over jobs, affecting various industries including tech, and how it changes the employment landscape.
  • Mixed (/r/singularity, /r/ArtificialInteligence, /r/LocalLLaMA, /r/aiagents): Posts discuss the functioning, misunderstanding, and potential of large language models (LLMs) and AI agents, highlighting both their capabilities and limitations.
  • Positive (/r/selfhosted, /r/programming): Open source communities are celebrated for enabling collaborative and accessible technological advancements, especially in AI development.
  • Negative (/r/singularity): Discussions revolve around the increasing realism of deepfakes and the potential for misinformation they bring, causing skepticism and distrust online.
  • Positive (/r/LocalLLaMA, /r/selfhosted): Open source AI projects like Qwen and LocalLLaMA receive attention for their community-driven development and the transparency they offer compared to commercial models.
  • Negative (/r/programming): Posts reflect concerns over tech monopolies controlling major internet browsers and services, and the implications for user privacy and competition.
/r/singularity • 6132 points • 318 comments
The discussion revolves around LLMs like Grok's potential hallucinations, their lack of self-awareness regarding training data, and debates about political bias in AI outputs. Key themes include the limitations of LLMs in understanding their own training, the influence of training data on responses, and skepticism about claims of 'liberal bias' in AI or reality. Users also explore how system prompts, user input, and societal biases shape AI behavior.
  • Acknowledgment that LLMs reflect patterns in training data rather than personal knowledge, highlighting their statistical nature.
  • Discussion of how LLMs can be influenced by system prompts or user input, showing adaptability in responses.
  • Recognition of the challenges in verifying AI claims, with users emphasizing the importance of critical thinking and cross-checking information.
  • Insights into the 'Gell-Mann effect' and research suggesting LLMs may form implicit biases during training, even if they don't explicitly know their training details.
  • Widespread skepticism about LLMs' ability to accurately describe their training or intentions, with many calling their responses hallucinations.
  • Concerns about political bias in AI outputs, with debates over whether biases stem from training data, system prompts, or user interpretation.
  • Criticism of the 'liberal bias' narrative, arguing it conflates ideology with reality and overlooks the complexity of training data sources.
  • Fears about AI being manipulated by external actors (e.g., Elon Musk) to serve specific agendas, undermining trust in their neutrality.

Try a demo of our tool! Input a specific Reddit post URL with a few comments and check out the AI summarization newsletter.

Try our tool