An amplified zener is a bit of a sneaky trick. A zener is a diode that has been selected so it breaks down in reverse at a very specific voltage, the zener voltage. It still conducts like a diode the other way. All by itself a (for example) 5V zener does not conduct much at all until it gets to just about 5V reverse bias, then it conducts all the current fed to it until it burns up. So they make decent voltage references and shunt regulators. But they do generate heat equal to the zener voltage times whatever current is going through them. A 1W 5V zener can be expected to dissipate 1W when the current through it is 1W/5V = 200mA. There are high power zeners, but not as many as there used to be.
You can use a lower power, maybe 1/2W zener by connecting it between the collector and base of an NPN transistor, and connecting a resistor between the base and emitter. The resistor value is fairly arbitrary, as all it has to conduct is the leakages of the zener and collector base junction before the zener starts really conducting. This is generally microamps. When the zener breaks over and conducts, most of the current goes into the base of the transistor, and is amplified by the current gain of the transistor. So for a transistor with gain of maybe 50, the transistor conducts fifty times as much current as the zener feeding its base. The total breakover voltage of this composite zener is the actual zener voltage (5V in this imaginary example) plus the voltage needed to make the base-emitter conduct, generally 0.6 to 0.7V. A composite amplified zener with a 5V zener, and an NPN with a gain of 50 will have a "zener voltage" of 5V + 0.6V as an estimate, and the transistor will both conduct 50 times the current and produce roughly 50 times the heat of the zener.
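As a rough sketch of that arithmetic, assuming an ideal zener, a fixed 0.6V base-emitter drop, a constant current gain, and negligible current in the base-emitter resistor (the function and parameter names are made up for illustration):

```python
def amplified_zener(v_zener, total_current, gain, v_be=0.6):
    """Estimate breakover voltage and the current/power split.

    Assumptions: ideal zener, fixed Vbe, constant gain, and the
    base-emitter resistor current is small enough to ignore.
    """
    v_total = v_zener + v_be                # composite "zener voltage"
    i_zener = total_current / (gain + 1)    # zener current is the base current
    i_collector = total_current - i_zener   # the rest flows in the collector
    p_zener = v_zener * i_zener             # heat in the zener itself
    # Transistor heat: collector current across the full voltage,
    # plus the base current across the base-emitter junction.
    p_transistor = v_total * i_collector + v_be * i_zener
    return v_total, i_zener, i_collector, p_zener, p_transistor
```

For a 5V zener, a gain of 50 and 1.02A total, this works out to 5.6V with 20mA in the zener and 1A in the collector, so the zener sees about 0.1W while the transistor handles about 5.6W.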
A TO-220 can dissipate roughly 2W with just its tab sticking in the air, but with proper heat sinking, a TO-220 can dissipate 50W or so. So you can make a 20W-50W zener with a 1W zener feeding the base of the NPN. And TO-220s are definitely designed to get heat out of the transistor and into a heat sink.
One result of that line of thinking is that you want a TO-220 with enough dissipation capability for the job, and a reasonably high current gain. I went to the Mouser link you provided (thank you!) and sorted them by price, least expensive first. Then I started looking at the datasheets for power and Hfe.
The cheapest one is listed as having a current gain maybe as low as 8 or 9. AAAACK! We want more gain. I didn't find one I really liked until I got down to the MJE15030G, $1.59 each, but with a typical gain of 100 or over for currents up to 2A. We're talking about something under an amp or two here, so with a current gain of 100-ish, the transistor would be eating nearly all the power dissipation. That gives a much wider selection of lower power zeners. It's a good place to start. As a possible alternate, I changed the current specification to 5A and greater and found the D44H11, $0.90 each and with a current gain of 100-200 typically over the expected current ranges. Voltage capability isn't needed much in this application, as you're likely to see well under 30V collector base even considering transients.
The necessary base emitter resistor is ... well, arbitrary. It just has to hold the base-emitter down well under 0.6V until you definitely want the zener to conduct. So it's time to pick a zener, then pick the resistor. A BZX79-3V3 will give you 3.3V nominal before conducting, plus 0.6V nominal on the transistor base-emitter for a combined 3.9V or so. The BZX79-3V3 has a nominal leakage of 5uA. Either NPN will have Iceo of about 10uA, so the resistor needs to keep 15uA under 0.5V or so; pick a resistor smaller than 0.5V/15uA, which is about 33K. A 1K to 3K there will work fine. Actually, you could probably use anything between 1K and 10K.
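The resistor limit is just Ohm's law on the summed leakage. A minimal sketch, where the helper name and the 0.5V default are illustrative choices rather than anything from a datasheet:

```python
def max_base_emitter_resistor(leakage_currents, v_be_limit=0.5):
    """Largest base-emitter resistor (ohms) that keeps Vbe under
    v_be_limit while carrying the sum of the leakage currents (amps)."""
    total_leakage = sum(leakage_currents)
    return v_be_limit / total_leakage

# 5uA zener leakage plus 10uA transistor leakage, as in the text
r_max = max_base_emitter_resistor([5e-6, 10e-6])
print(f"R must be below about {r_max / 1000:.1f}K")
```

Anything in the 1K to 10K range sits comfortably below that limit, which is why the exact value does not matter much.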
The actual breakover voltage will be somewhat sloppy. A BZX79-3V3 doesn't break at exactly 3.3000V, and a bipolar does not conduct exactly at 0.6V. The composite will have a zener voltage of maybe 3.9 to 4.1 depending on how the devices happen to come in. This is one reason not to use a MOSFET for the transistor. They have huge "current gains" OK, but there is usually one to two volts of tolerance on exactly when they'll start conducting. Bipolars are less variable about their cut-in voltage.
One more fringe kind of thing you can do is to use a Vbe multiplier. This eliminates the zener entirely, but at the cost of more thinking and worrying about the cut-in voltage of a bipolar. If you use an NPN and put a resistor (R1) from collector to base and a resistor (R2) from base to emitter, then start increasing the voltage from collector to emitter, the transistor does not conduct until Vbe gets to about 0.6V, the sloppy diode drop of a silicon junction. What does happen is that current leaks through the two resistors in series as a voltage divider. So the base-emitter voltage rises according to the voltage divider. For a total voltage from collector to emitter of "V", then Vbe is
V * R2/(R1+R2)
But when the Vbe hits 0.6V, the transistor starts conducting. It does this just enough to keep the base voltage from rising more. So the total voltage on the collector to emitter rises until it's Vbe times (R1+R2)/R2. That is, it looks like a zener that's a multiplied version of Vbe. R1 and R2 have to be picked so they let through enough base current to make this all come true, so there's some math work to do there, but this all works, and is the preferred way of biasing linear power amplifiers, as the temperature dependence of Vbe is also multiplied by it. Since it's just a resistor ratio, you can make one of the resistors a pot and adjust the total breakover voltage.
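Here is a small sketch of the Vbe multiplier arithmetic, assuming a fixed 0.6V Vbe and ignoring base current (so it is only as good as those assumptions; the function names are invented for the example):

```python
def vbe_multiplier_voltage(r1, r2, v_be=0.6):
    """Breakover voltage of a Vbe multiplier.

    r1 is collector-to-base, r2 is base-to-emitter.
    Assumes base current is negligible next to the divider current.
    """
    return v_be * (r1 + r2) / r2

def r1_for_target(v_target, r2, v_be=0.6):
    """R1 needed to hit a target breakover voltage for a chosen R2."""
    return r2 * (v_target / v_be - 1)

r1 = r1_for_target(3.9, 1000)             # R2 = 1K, aiming for 3.9V
print(round(r1), vbe_multiplier_voltage(r1, 1000))
```

With R2 at 1K, a 3.9V target needs R1 of about 5.5K. In practice part of R1 or R2 would be a pot, since the real Vbe will not be exactly 0.6V.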
Sigh. So many designs to do, so little time.

"It's not what we don't know that gets us in trouble. It's what we know for sure that just ain't so"
Mark Twain