What matters most for inference is memory bandwidth and capacity (compute matters too for prompt processing), but the main factor is memory bandwidth. That's what gives Apple M-series chips an edge (the Pro/Max/Ultra variants, not the base models): the high memory bandwidth of unified memory. It's also why GPUs with fast VRAM perform even better. Beyond that you can still run things, just slower, especially on an older system with DDR4. Figure out the memory bandwidth you have and that'll help inform what you can realistically run at acceptable speeds.
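A quick back-of-envelope sketch of why bandwidth dominates: during decoding, every generated token has to stream (roughly) all the model weights from memory once, so bandwidth divided by model size gives a ceiling on tokens per second. The bandwidth and model-size numbers below are illustrative assumptions, not benchmarks:

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on decode tokens/sec for memory-bound inference:
    each token reads ~all weights once, so speed <= bandwidth / model size."""
    return bandwidth_gb_s / model_size_gb

# Assumed example: an 8B-parameter model at 4-bit quantization is ~5 GB of weights.
model_gb = 5.0

# Ballpark bandwidth figures (GB/s) for a few common setups -- illustrative only.
for name, bw in [("dual-channel DDR4", 50),
                 ("dual-channel DDR5", 90),
                 ("M-series Max unified memory", 400),
                 ("high-end GPU VRAM", 1000)]:
    print(f"{name}: ~{max_tokens_per_sec(bw, model_gb):.0f} tok/s ceiling")
```

Real throughput lands below this ceiling, but the ratio between systems is about right, which is why DDR4 boxes feel so much slower than unified memory or a GPU.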
Yes, Linux is great for running models. For most hobbyists, and for people working in the field, using Linux is fairly standard.