[PATCH] Add --skip option similar to dd skip=N option

Artem Bityutskiy dedekind1 at gmail.com
Sun Mar 1 23:06:27 PST 2015


On Sat, 2015-02-28 at 10:00 -0800, Eric Seifert wrote:
> Hi Artem, thanks for the comments, I will look at re-implementing as
> you suggest. One thing though is that since skip can be any number of
> bytes, the start of where we write can be mid-block, so you may have
> one block sent to be written where only part of it will be used. So I
> think at least some change in the write path will be needed to split
> that first block up if needed and adjust the seek offset. 
> 
Hi,

yes, the whole thing is block-based. I'd start by limiting --skip to
block-size-aligned values, to keep things simple at the beginning.
Supporting unaligned values could be the second step, done separately.
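
To illustrate the aligned-only first step, here is a minimal sketch of
how a byte-valued --skip could be validated and converted to a block
number. The name skip_to_blocks() is hypothetical, it is not an existing
bmap-tools function.

```python
def skip_to_blocks(skip_bytes, block_size):
    """Convert a byte offset into a block number (hypothetical helper).

    Rejects offsets that are not a multiple of the block size, which is
    the simplification proposed for the first step.
    """
    if skip_bytes < 0:
        raise ValueError("skip must not be negative")
    if skip_bytes % block_size != 0:
        raise ValueError("skip (%d) is not a multiple of the block size (%d)"
                         % (skip_bytes, block_size))
    return skip_bytes // block_size
```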


Also, now I realize that what I suggested has one small complication.

If you look at the bmap file, you'll see that it basically lists ranges
of mapped blocks, and the SHA256 of the range.

So basically, you need to skip all the ranges before "skip". However, if
"skip" falls in the middle of a bmap file range, we have a small
complication. Suppose the block range from the bmap file is A-B, and A
< skip < B, so skip is somewhere in the middle.

What we need to do then is:

1. Read A-B, check SHA256, just as we already do now.

2. But instead of passing range A-B to the writer thread, pass
range skip-B.
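
The two steps above could be sketched roughly like this. This is not the
actual BmapCopy code: read_with_skip() and the (first, last, sha256)
tuple layout are assumptions for illustration, ranges are taken to be
inclusive, and skip is assumed to be given in blocks already. How the
writer maps the trimmed range to a destination offset is left out.

```python
import hashlib


def read_with_skip(image, block_size, ranges, skip_block):
    """Yield (first, last, data) tuples for the writer (hypothetical sketch).

    'ranges' is an iterable of (first, last, sha256_hex) with inclusive
    block numbers. Ranges entirely before 'skip_block' are dropped. A
    range that straddles 'skip_block' is read and verified in full, but
    only its tail, starting at 'skip_block', is passed on.
    """
    for first, last, expected in ranges:
        if last < skip_block:
            continue  # the whole range lies before "skip"

        # Step 1: read all of A-B and verify the checksum as usual.
        image.seek(first * block_size)
        data = image.read((last - first + 1) * block_size)
        if hashlib.sha256(data).hexdigest() != expected:
            raise ValueError("checksum mismatch for blocks %d-%d"
                             % (first, last))

        # Step 2: hand only skip-B to the writer.
        start = max(first, skip_block)
        yield start, last, data[(start - first) * block_size:]
```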


And then the second step will be to teach the code to not assume A-B are
block-aligned.

Maybe a few words on how it works, in case it is helpful.

BmapCopy starts by creating a separate thread - the reader thread. The
reader reads the image, verifies SHA256, and puts the data the writer
has to write into a queue. The writer picks data from the queue and
writes it. Writer and reader are 2 separate threads.
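
A stripped-down sketch of this reader/writer arrangement, using only the
standard queue and threading modules. The function names, the
(offset, data) item layout, the None end marker and the queue size are
all assumptions for illustration, not what BmapCopy actually does, and
error handling between the threads is left out.

```python
import queue
import threading


def _reader(chunks, q):
    """Reader thread: push (offset, data) items, then an end marker."""
    for item in chunks:
        q.put(item)
    q.put(None)  # tells the writer there is nothing more to write


def copy(chunks, out):
    """Write every (offset, data) item from 'chunks' to the file 'out'."""
    q = queue.Queue(maxsize=4)  # bounded, so the reader cannot run far ahead
    thread = threading.Thread(target=_reader, args=(chunks, q))
    thread.start()

    # The writer runs in the calling thread and drains the queue.
    while True:
        item = q.get()
        if item is None:
            break
        offset, data = item
        out.seek(offset)
        out.write(data)

    thread.join()
```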

The reader is implemented in _get_data().

It iterates over every block range: reads the range, verifies it, and
puts it on the queue for the writer. And I think the skip logic should
be added here, not in _get_block_ranges().


_get_block_ranges() is a helper function which reads the bmap file,
picks block ranges from there, and yields them. So it just tells which
block ranges need to be read. Well, there is a small additional
complication - we split overly long ranges into smaller "batches", just
to lessen the peak memory consumption.
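
The batching idea could look something like the sketch below.
split_range() and its batch_blocks parameter are hypothetical names, and
block numbers are assumed to be inclusive. How checksum verification is
combined with batching in the real code is not shown here.

```python
def split_range(first, last, batch_blocks):
    """Split inclusive block range first-last into smaller batches.

    Each yielded (start, end) pair covers at most 'batch_blocks' blocks,
    so a reader never has to hold more than that much data at once.
    """
    if batch_blocks < 1:
        raise ValueError("batch_blocks must be at least 1")
    start = first
    while start <= last:
        end = min(start + batch_blocks - 1, last)
        yield start, end
        start = end + 1
```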

HTH.





