I definitely think that you can start from some high-level, loose (but still quantitative) goals:
"The user wants to see the web page fully rendered on average no more than 500ms after the first byte has been received"
That's a performance requirement / story that can be written very early on. Once enough of the system has been written (e.g. at integration test or system test), it can be measured, and you can see how close you are to the goal. If you have sufficient instrumentation, you can see where the latency budget is being spent, and do some quick tidy-up performance work before the system goes live.
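As a rough illustration of checking measurements against a goal like the one above, here is a minimal Java sketch. Everything in it is made up for the example: the class name `LatencyBudget`, the stage names and the sample timings are all hypothetical, and in a real system the numbers would come from whatever instrumentation you already have.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class LatencyBudget {
    // Budget from the example requirement: 500 ms on average.
    static final double BUDGET_MILLIS = 500.0;

    /** Arithmetic mean of the measured render times, in milliseconds. */
    static double meanMillis(List<Double> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        return samples.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .getAsDouble();
    }

    /** True if the average render time is within the budget. */
    static boolean withinBudget(List<Double> samples) {
        return meanMillis(samples) <= BUDGET_MILLIS;
    }

    /** Name of the stage that consumes the largest share of the budget. */
    static String slowestStage(Map<String, Double> stageMillis) {
        return Collections.max(
                stageMillis.entrySet(),
                Map.Entry.<String, Double>comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // Hypothetical measurements, e.g. collected during a system test.
        List<Double> samples = List.of(420.0, 510.0, 480.0, 530.0);
        // Hypothetical per-stage breakdown of a single request.
        Map<String, Double> stages = Map.of(
                "parse", 60.0,
                "layout", 140.0,
                "script", 230.0,
                "paint", 55.0);

        System.out.println("mean ms: " + meanMillis(samples));
        System.out.println("within budget: " + withinBudget(samples));
        System.out.println("slowest stage: " + slowestStage(stages));
    }
}
```

The point is only that the goal is quantitative enough to be turned into a pass/fail check, and that a per-stage breakdown tells you where to look first if the check fails.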
However, beware of over-optimization, of benchmarking on systems that aren't similar to PROD, and of expecting too much of the overall performance to depend on code choices.
I sometimes say: "For most applications, it doesn't matter at all whether you choose a foreach loop or a stream". You might be able to concoct a microbenchmark where you think you can see a difference, but in a real application, that difference will be swamped by aggregation effects and noise in the system.
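To make the foreach-versus-stream choice concrete, here is a small sketch, assuming Java. The class and method names are invented for the example. Both methods compute the same sum of squares, and the sketch deliberately does not time them, because a naive timing loop would be exactly the kind of misleading microbenchmark described above.

```java
import java.util.List;

public class LoopVsStream {
    /** Sum of squares using a foreach loop. */
    static long sumOfSquaresLoop(List<Integer> values) {
        long total = 0;
        for (int v : values) {
            total += (long) v * v;
        }
        return total;
    }

    /** The same computation expressed as a stream pipeline. */
    static long sumOfSquaresStream(List<Integer> values) {
        return values.stream()
                .mapToLong(Integer::longValue)
                .map(v -> v * v)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3, 4);
        System.out.println(sumOfSquaresLoop(values));
        System.out.println(sumOfSquaresStream(values));
    }
}
```

Pick whichever form reads better to you and your team; if the goal-level measurement later shows this code on the critical path, that is the time to look at it more closely.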