Skip to content

Conversation

@glenn-slayden
Copy link
Contributor

Currently, the status of the PC register value is inconsistent during any callback, depending on the precise details of the conditions that generate the callback.

This PR changes this behavior so that the PC value observed during any FETCH, READ or WRITE callback has a consistent interpretation, which is that it always still reflects the unadvanced PC value from the beginning of the instruction that is enacting that callback operation.

Of course the actual effective address for the operation (as passed in through the callback argument) is unaffected.

Because this PR inherently addresses the outwards visibility of a stored state value, it is obviously a breaking change relative to the prior behavior of your code. For this reason, I am offering this PR as a courtesy only, which you may feel free to ignore or adopt as you wish.


Legal notice (do not delete)

Contributors are required to assign copyright to the original author of the project so that he retains full ownership. This ensures that other entities can use the software more easily, as they only need to deal with a single copyright holder. It also provides the original author with the assurance that he can make decisions in the future without needing to consult or obtain consent from all contributors.

By submitting this pull request (PR), you agree to the following:

You hereby assign the copyright in the code included in this pull request to the original author of the Z80 library, Manuel Sainz de Baranda y Goñi, to be licensed under the same terms as the rest of the code. You also agree to relinquish any and all copyright interest in the software, to the detriment of your heirs and successors.

@glenn-slayden
Copy link
Contributor Author

I just noticed that your current code attempts to fix this issue—for IMO only, where it matters more substantially for correctness—through the use of a post-hoc correction value, as listed in the im0_pc_decrement_table.

I did not test this PR under IM0 conditions, since my use case does not use IM0. But it is likely that this PR introduces a severe correctness bug that would be exposed in IM0 mode, since double adjustment would be occurring.

I believe this PR change of properly fixing the PC value to have a consistent interpretation under all interrupt modes is cleaner than doing the post-hoc adjustment for the IM0 case. If this cursory analysis is correct, adopting this PR would allow the im0_pc_decrement_table to simply be removed altogether, giving predicable PC interpretation in all modes.

However, there may be further complications that I do not understand, and I do not have the ability to fully test the necessary further code modification of removing your PC adjustment, as is probably required for correct IM0 operation.

I'm sure you realized all this right away as soon as you saw this PR, but I just thought I'd suggest that perhaps getting rid of im0_pc_decrement_table, if simple enough, might be appealing to you.

@glenn-slayden
Copy link
Contributor Author

Temporarily retracted.

@agaxia
Copy link
Collaborator

agaxia commented Mar 15, 2025

There is a strong reason to increment PC before calling callbacks that receive a memory address as an argument, which depends on PC, for example, when reading instruction bytes. The reason is that when using the value of PC to calculate the memory address argument passed to the callback, we are already performing a memory access to (uint16_t)((uint8_t)self + offsetof(Z80, pc)). The callback, obviously, can modify the value of Z80::pc, so if we increment PC after returning from the callback, the compiler will generate code to read Z80::pc again after returning from the callback in order to increment it.

In other words, if we increment PC before calling the callbacks, we will only need to read its value once, whereas if we increment it afterward, we will need to read it twice.

Regarding im0_pc_decrement_table, it will always be necessary because the code emulating the IM0 needs to call the function that emulates the instruction to be executed, and that function, in most cases, will increment the value of PC. There are special cases for jumps and the CB prefix that are handled differently, but in all others, regardless of where PC is incremented within the function, the fact remains that it will be incremented. The correction of PC must be made after returning from the function that emulates the instruction, which is why the decrement table is needed.

However, there is an optimization that could be applied in certain cases, for example, in push, out (c),J, and possibly some others: we could increment PC before calling the callback that receives SP as an argument instead of PC. Incrementing PC before calling the callback would allow, on some architectures such as RISC-V, where the caller function is responsible for saving self, this step to be skipped. After returning from the callback, the function could simply return the number of clock cycles (without needing to restore self).

Regarding the purpose of this change: You should know that our intention has never been to guarantee the coherence of registers during the execution of an instruction. In other words, the correctness of the PC value and other registers is not ensured until the instruction has finished executing. As a result, we do not guarantee that a callback can access either the initial or final value of a register. Keep in mind that this emulator is used in projects involving very slow microcontrollers with very limited resources, and we have always prioritized speed when optimizing it.

That said, if what you need is to know when an instruction starts, you might want to take a look at my Z80InsnClock library. It could be useful to you.

@redcode
Copy link
Owner

redcode commented Mar 15, 2025

Thank you for the PR.

As Sofía has explained, PC is incremented before being used to compute the memory address passed to the callback for speed reasons.

In any case, perhaps it would be a good idea, as far as possible, to always increment PC at the beginning of the function, and not at the end as is done in some places, whenever the function calls a callback or another non-inline function to eliminate the need for the caller to restore its argument on some architectures.

@agaxia
Copy link
Collaborator

agaxia commented Mar 15, 2025

In any case, perhaps it would be a good idea, as far as possible, to always increment PC at the beginning of the function, and not at the end as is done in some places, whenever the function calls a callback or another non-inline function to eliminate the need for the caller to restore its argument on some architectures.

This would only be necessary in some cases. For example, it would not be here, since it is already necessary to restore self in order to access A:

INSN(ld_a_vbc) {Q_0 MEMPTR = BC + 1; A = READ(BC); PC++; return 7;}

On the other hand, I don't think it is worthwhile to make callbacks always able to access the final value of PC, because it would complicate block instructions like this one:

#define LDXR(operator)							   \
	zuint8 t = READ(HL operator);					   \
									   \
	WRITE(DE operator, t);						   \
	t += A;								   \
									   \
	if (--BC)							   \
		{			    /* HF, NF = 0		*/ \
		FLAGS = F_SZC		  | /* SF, ZF, CF unchanged	*/ \
			((PC >> 8) & YXF) | /* YF = PCi.13; XF = PCi.11 */ \
			PF;		    /* PF = 1			*/ \
									   \
		MEMPTR = PC + 1;					   \
		return 21;						   \
		}							   \
									   \
	FLAGS = (zuint8)(	 /* HF, PF, NF = 0	 */		   \
		F_SZC	       | /* SF, ZF, CF unchanged */		   \
		((t & 2) << 4) | /* YF = (A + [HLi]).1	 */		   \
		(t & XF));	 /* XF = (A + [HLi]).3	 */		   \
									   \
	PC += 2;							   \
	return 16

@agaxia
Copy link
Collaborator

agaxia commented Mar 15, 2025

I'm going to make the optimizations I said, and improve some other things a little bit. I'm going to modify quite a few lines.

@glenn-slayden
Copy link
Contributor Author

glenn-slayden commented Mar 15, 2025

Thank you both for the reply. I assume that you briefly examined the code that I naively proposed. While I largely agree with your use-case, API design, and functional (domain-relevant) arguments against making the PC register more predictable, I think the performance related arguments are not as strong (but again I defer to your expertise)…

My thought here is that excellent performance could be maintained, possibly even improved, and at least any losses mitigated through careful use of a saved local const pointer variable that allows computing the final target only once. The compiler could ultimately be convinced to optimize across the callback if all goes well.

The prevailing, unadjusted scalar PC value can also be preserved as a private local—and optionally even incremented as the instruction proceeds (like now), without affecting the published value, and this scalar is used to store back at the end. In this way, the number of main memory fetches is exactly the same as it is now.

Also, the various details and edge cases of the so-called predictable PC behavior would need to be formally nailed down in an API statement, with explicit rules or caveats about when the new guarantees are active, and when they are not.

@agaxia
Copy link
Collaborator

agaxia commented Mar 15, 2025

Also, the various details and edge cases of the so-called predictable PC behavior would need to be formally nailed down in an API statement, with explicit rules or caveats about when the new guarantees are active, and when they are not.

One question: why do you need PC to be predictable after the opcode fetch? take into account that PC points to the first byte of the instruction during the invocation of Z80::fetch_opcode.

@glenn-slayden
Copy link
Contributor Author

My understanding is that with multiple prefixes, it is not the case that Z80::fetch|_opcode exclusively identifies exactly-and-only once per instruction, on its first byte…?

@agaxia
Copy link
Collaborator

agaxia commented Mar 15, 2025

My understanding is that with multiple prefixes, it is not the case that Z80::fetch|_opcode exclusively identifies exactly-and-only once per instruction, on its first byte…?

  • For unprefixed instructions and instructions with the prefix ED, DD or FD, PC will always point to the first byte of the instruction during Z80::fetch_opcode.
  • For instructions with the CB prefix, PC will point to the first byte of the instruction ONLY during the 1st invocation of Z80::fetch_opcode. However, during the 2nd invocation, PC will point to the address passed to the callback + 1, which still makes it possible to know the initial value of PC.

@agaxia
Copy link
Collaborator

agaxia commented Mar 15, 2025

You can get the T-state of the current M-cycle relative to the start of the instruction using my Z80InsnClock library, and know wether the opcode fetch corresponds to the first byte of the insn:

#include <Z80.h>
#include <Z80IncnClock.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
	uint8_t *memory;
	Z80 cpu;
	Z80InsnClock insn_clock;
} Machine;


uint8_t machine_cpu_fetch_opcode(Machine *self, uint16_t address)
	{
	uint8_t opcode = self->memory[address];
	uint8_t insn_cycles = z80_insn_clock_m1(&self->insn_clock, opcode);

	if (!insn_cycles && Z80_PC(self->cpu) == address)
		{
		/* this is first opcode fetch of the instruction */
		}

	else	{
		/* this is not the first opcode fetch of the instruction */
		}

	return opcode;
	}


uint8_t machine_cpu_read(Machine *self, uint16_t address)
	{
	uint8_t insn_cycles = z80_insn_clock_m(&self->insn_clock);
	/* ... */
	return self->memory[address];
	}


void machine_cpu_write(Machine *self, uint16_t address, uint8_t value)
	{
	uint8_t insn_cycles = z80_insn_clock_m(&self->insn_clock);
	/* ... */
	self->memory[address] = value;
	}


uint8_t machine_cpu_in(Machine *self, uint16_t address)
	{
	uint8_t insn_cycles = z80_insn_clock_m(&self->insn_clock);
	/* ... */
	return 0xFF /* read byte from device */;
	}


void machine_cpu_out(Machine *self, uint16_t address, uint8_t value)
	{
	uint8_t insn_cycles = z80_insn_clock_m(&self->insn_clock);
	/* ... */
	/* write value to device */
	}


uint8_t machine_insn_clock_read(Machine *self, uint16_t address)
	{
	return self->memory[address];
	}


void machine_initialize(Machine *self)
	{
	self->memory           = malloc(65536);
	self->cpu.context      = self;
	self->cpu.fetch_opcode = (Z80Read)machine_cpu_fetch_opcode;
	self->cpu.nop          = nullptr;
	self->cpu.fetch        =
	self->cpu.read         = (Z80Read )machine_cpu_read;
	self->cpu.write        = (Z80Write)machine_cpu_write;
	self->cpu.in           = (Z80Read )machine_cpu_in;
	self->cpu.out          = (Z80Write)machine_cpu_out;
	self->cpu.halt         = nullptr;
	self->cpu.ld_i_a       =
	self->cpu.ld_r_a       =
	self->cpu.reti         =
	self->cpu.retn         = nullptr;
	self->cpu.hook         = nullptr;
	self->cpu.illegal      = nullptr;

	z80_insn_clock_initialize(
		&self->insn_clock,
		&self->cpu.af, &self->cpu.bc, &self->cpu.hl,
		self, (Z80InsnClockRead)machine_insn_clock_read);
	}


void machine_power_on(Machine *self)
	{
	memset(self->memory, 0, 65536);
	z80_power(&self->cpu, true);
	z80_insn_clock_start(&self->insn_clock, self->cpu.resume);
	}


void machine_reset(Machine *self)
	{
	z80_reset(&self->cpu);
	z80_insn_clock_start(&self->insn_clock, self->cpu.resume);
	}

@glenn-slayden
Copy link
Contributor Author

This is exactly what I was looking for, thanks so much!

@redcode
Copy link
Owner

redcode commented Mar 16, 2025

@agaxia's example is great, although figuring out wether the opcode fetch corresponds to the first byte of the instruction is trivial even when you don't need to know the intra-instruction T-states.

Example:

uint8_t machine_cpu_fetch_opcode(Machine *self, uint16_t address)
	{
	uint8_t opcode = self->memory[address];

	if (Z80_PC(self->cpu) == address)
		{
		/* this is the first opcode fetch of the instruction */
		}

	else	{
		/* this is not the first opcode fetch of the instruction */
		}

	return opcode;
	}

PC is updated only once per instruction. Therefore, PC == address will only be true during the first opcode fetch. As @agaxia has noted, CB instructions update PC early:

INSN(cb_prefix)
	{
	R++;
	return cb_insn_table[DATA[1] = FETCH_OPCODE((PC += 2) - 1)](self);
	}

but during the second opcode fetch PC will not be equal to the fetch address.

As a final note, DD and FD prefixes cause the emulator to increment Z80::cycles by 4 T-states. This is necessary to be able to stop the emulation during long sequences of these prefixes if we run out clock cycles, but in this case, PC still points to the first byte of the instruction, that is, to the last DD or FD prefix fetched.

@agaxia
Copy link
Collaborator

agaxia commented Mar 16, 2025

@agaxia's example is great, although figuring out wether the opcode fetch corresponds to the first byte of the instruction is trivial even when you don't need to know the intra-instruction T-states.

Example:

uint8_t machine_cpu_fetch_opcode(Machine *self, uint16_t address)
	{
	uint8_t opcode = self->memory[address];

	if (Z80_PC(self->cpu) == address)
		{
		/* this is the first opcode fetch of the instruction */
		}

	else	{
		/* this is not the first opcode fetch of the instruction */
		}

	return opcode;
	}

That will not work in instructions with the DD/FD prefix when the second opcode fetch reads another DD/FD prefix. In this case, the new prefix becomes the first byte of the instruction once the callback returns the prefix but PC still points to the previous prefix during the execution of the callback, not to the one fetched in the current invocation of the callback.

That's why the condition checks !insn_cycles. The Z80IncnClock object process the opcode before the callback returns it and, if it is a 2nd DD/FD prefix, it resets the instruction clock cycle counter.

@redcode
Copy link
Owner

redcode commented Mar 16, 2025

That will not work in instructions with the DD/FD prefix when the second opcode fetch reads another DD/FD prefix. In this case, the new prefix becomes the first byte of the instruction once the callback returns the prefix but PC still points to the previous prefix during the execution of the callback, not to the one fetched in the current invocation of the callback.

That's why the condition checks !insn_cycles. The Z80IncnClock object process the opcode before the callback returns it and, if it is a 2nd DD/FD prefix, it resets the instruction clock cycle counter.

I think we are both wrong. !insn_cycles && Z80_PC(self->cpu) == address will be false during the 2nd opcode fetch if the opcode is another DD or FD prefix.

It should be:

#define IS_XY_PREFIX(opcode) (((opcode) & 0xDF) == 0xDD)

if (!insn_cycles && (Z80_PC(self->cpu) == address || IS_XY_PREFIX(opcode))
	{
	/* this is the first opcode fetch of the instruction */
	}

Your instruction clock takes into account that the Z80 emulator treats the DD and FD prefixes as execution units that increment PC, hence it zeroes the instruction clock cycle counter when you pass one of these prefixes to z80_insn_clock_m1. So, for a DD or FD prefix and for the opcode after either of these prefixes, insn_cycles will always be 0. However, if the opcode after either of these prefixes is another DD or FD prefix, once Z80::fetch_opcode returns, xy_xy() will increment PC so that this new prefix becomes the first byte of the instruction and, thefore, this case also needs to be checked in the callback. Otherwise, we couldn't detect that a double DD/FD prefix is discarding the previous prefix and making the current prefix the first byte of the instruction.

If the previous opcode fetch was a DD/FD prefix, !insn_cycles will evaluate to true but Z80_PC(self->cpu) == address will evaluate to false. This will work unless the current opcode is another DD/FD that must be treated as the new first byte of the instruction. So, if Z80_PC(self->cpu) == address evaluates to false (which means that the previous opcode was a DD/FD prefix) we check if the current opcode is another DD/FD to decide wether or not to treat the current opcode fetch as the first byte.

If not using Z80InsnClock, the condition would be:

if (	Z80_PC(self->cpu) == address ||
	(IS_XY_PREFIX(self->cpu.data.uint8_array[0]) && IS_XY_PREFIX(opcode))
)
	{
	/* this is the first opcode fetch of the instruction */
	}

@agaxia
Copy link
Collaborator

agaxia commented Mar 16, 2025

Oh... I see, I forgot that in my example 😅. Yes, you are right 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants