
@nastya236 (Contributor):
QQLinear layer

@nastya236 changed the title from "Qq linear" to "QQ linear" on Dec 18, 2025
```python
    return ql


class QQLinear(Module):
```
Member (review comment on the hunk above):

Does it make sense to hijack Module::train() and Module::eval() to switch the weights between quantized and unquantized?

I think at the very least it would be nice to have a method which turns it into the inference version, since I expect that would be a common use case.

CC @angeloskath, who might have some ideas.
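
For illustration, here is a minimal sketch of the train/eval hijack idea, assuming mlx.nn's Module.eval() can be overridden and using mx.quantize / mx.quantized_matmul. The class name QQLinearSketch, the group_size/bits defaults, and the one-way (quantize-on-eval only) behavior are assumptions, not the PR's actual implementation:

```python
import mlx.core as mx
import mlx.nn as nn


class QQLinearSketch(nn.Module):
    """Toy linear layer that quantizes its weight when switched to eval."""

    def __init__(self, input_dims, output_dims, group_size=64, bits=4):
        super().__init__()
        scale = (1.0 / input_dims) ** 0.5
        self.weight = mx.random.uniform(-scale, scale, (output_dims, input_dims))
        self.group_size = group_size
        self.bits = bits
        self._quantized = False  # plain attribute, not a tracked parameter

    def eval(self):
        # Quantize the weight in place the first time we enter eval mode.
        if not self._quantized:
            self.weight, self.scales, self.biases = mx.quantize(
                self.weight, group_size=self.group_size, bits=self.bits
            )
            self._quantized = True
        return super().eval()

    def __call__(self, x):
        if self._quantized:
            # Inference path: matmul against the packed quantized weight.
            return mx.quantized_matmul(
                x, self.weight, self.scales, self.biases,
                transpose=True, group_size=self.group_size, bits=self.bits,
            )
        # Training path: regular full-precision matmul.
        return x @ self.weight.T
```

A symmetric train() override could call mx.dequantize to restore the floating-point weight, which is what switching "between quantized and unquantized" in both directions would need.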

@nastya236 (Contributor, Author):

I added QQLinear.train() and QQLinear.eval(). Let me know if you had something different in mind.
For inference, I think we should support two cases: (1) the weights are loaded already quantized (with loading handled in mlx-lm, or something similar for CUDA), or (2) the weights are bf16/fp32, in which case we call .eval() to quantize them.
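
A hypothetical end-to-end use of case (2), reusing the QQLinearSketch class from the earlier comment (the layer name and shapes are illustrative, not the merged API):

```python
import mlx.core as mx

layer = QQLinearSketch(512, 512)  # full-precision bf16/fp32 weights
x = mx.random.normal((1, 512))

y_train = layer(x)   # regular matmul while training
layer.eval()         # case (2): quantize the weights in place
y_infer = layer(x)   # quantized matmul at inference

# Case (1) would instead assign already-quantized weight/scales/biases
# (e.g. loaded by mlx-lm) and skip the eval()-time quantization.
```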
