this post was submitted on 23 Apr 2024


https://codeberg.org/cyber-luna/lunas

Archlinux: yay -S lunas

I made a versatile syncing CLI program, lunas, that can sync local to local, local to remote, remote to local, and remote to remote, all at the same time across many input directories, optionally along with their file attributes, and more. It syncs both ways, but it has src/dest options that can be assigned to individual input directories.

It runs locally; when remote syncing is used, it runs peer-to-peer using libssh/SFTP.

It can do sync removal between different input directories. If you want to remove a file/directory and don't want it synced back from the other directories, you can run "lunas -rm file" for a local path or "lunas -rrm user@ip:/path/to/dir" for a remote one. Then, while syncing, use the option "-cr Y" to remove it from every other directory. If you simply don't use this option, the removal should be ignored and nothing is removed. With "-cr S" it should be synced back to the directory it was removed from, IF it was found in one of the other directories.

It has an optional config file for defining presets, which makes syncing easier than typing everything out on the command line each time.

There are more options, which can be found in --help or, in more detail, in the man page.

A simple usage of lunas can look like this:

lunas -p dir1 -p dir2 --dry-run

lunas -s dir1 -d dir2 -d dir3 -rd user@ip:dir4

lunas -r user@ip:dir1 -d dir2 -dr

lunas -rs user@ip:dir1 -d dir2 -cr Y

lunas -rd user@ip:dir1 -s dir2

lunas -p dir1 -p dir2 -p dir3 -p user@ip:dir4 -p user@ip:dir5

-p: local path; -r: remote path (both of them act as source and destination)

-s: local source path; -d: local destination path

-rs: remote source path; -rd: remote destination path

-dr/--dry-run: prints what would be synced without actually syncing anything

-cr/--confirm-remove Y: confirms the sync removal, as explained above

[–] wewbull@feddit.uk 8 points 6 months ago (1 children)

None of that really matters.

What's your sync algorithm? How are you detecting when a file changes? How do you resolve conflicts? How do you guarantee against data loss?

These are the reasons people use rsync.

[–] cyberluna@programming.dev 4 points 6 months ago

quick overview of the syncing algo

  • a simple overview

1- List the contents of all input directories first.

2- Build a table as a map: the rows are the files/dirs and the columns are the input directories in a specific order. If the -diff option is used, each cell holds the mtime (modification time) of that file in that input directory.

3- Loop through the table to find a SRC that has a newer mtime than a DEST; if there is one, the dest copy is removed and resynced.

Without the -diff option, each cell instead holds true/false for whether that file exists in that input directory. Syncing is then based on file name alone, and whichever SRC dir is found first to contain the file becomes the src for what is missing in the DEST.

The input directories are ordered in the table's columns as the user entered them, except that local ones have priority and get listed first.
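A tiny Python sketch of that column ordering, only to illustrate the idea (lunas itself is C++). The `(path, is_remote)` tuples and the `column_order` name are made up for this example:

```python
def column_order(inputs):
    """inputs: (path, is_remote) tuples in the order given on the command line.

    Python's sort is stable, so sorting on the boolean alone puts local
    directories (False) first and keeps the user's order within each group.
    """
    return sorted(inputs, key=lambda item: item[1])
```

For example, `column_order([("user@ip:dir4", True), ("dir1", False)])` puts `dir1` in the first column.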

So, with the -diff option, conflicts are resolved based on newer/older mtime and src/dest: the newer src updates the older dest. Without it, resolution is based only on file names, which is more arbitrary, as explained.
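The -diff case can be sketched roughly like this in Python. This is an illustration under assumptions, not the actual lunas code (which is C++): the role labels, the function names, and the "newest source wins, earliest column on a tie" rule are all made up for the example.

```python
# Hypothetical role labels: "src" (-s/-rs), "dest" (-d/-rd), "both" (-p/-r).
SENDERS = ("src", "both")
RECEIVERS = ("dest", "both")


def build_table(listings):
    """listings[i] maps file name -> mtime for input directory i.

    Returns {name: [mtime or None per directory]}; None means "missing".
    """
    names = sorted(set().union(*listings))
    return {name: [listing.get(name) for listing in listings] for name in names}


def plan_sync(table, roles):
    """Return (name, src_column, dest_column) copies for the -diff case."""
    ops = []
    for name, row in table.items():
        # Candidate sources: columns that may send and actually hold the file.
        candidates = [
            (mtime, -col)
            for col, mtime in enumerate(row)
            if mtime is not None and roles[col] in SENDERS
        ]
        if not candidates:
            continue
        # Newest mtime wins; on a tie the earliest column wins (assumption).
        newest, neg_col = max(candidates)
        src = -neg_col
        for col, mtime in enumerate(row):
            if col == src or roles[col] not in RECEIVERS:
                continue
            # A receiver is updated when it lacks the file or holds an older copy.
            if mtime is None or mtime < newest:
                ops.append((name, src, col))
    return ops
```

Each returned tuple means "copy this file from that source column to that destination column"; a real tool would also have to handle directories, removals, and equal mtimes with different content.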

  • a more technical overview

1- List the contents of all input directories first.

2- For each input directory listing, push the path to vector A 'all_content', the type to vector B 'types', the file's input_dir_number to vector C 'track_existence', and, if the -diff option is enabled, the mtime to vector D 'track_existence_mtime'.

I'm going to explain it with the -diff option first, which lets it check the mtime (modification time) difference between files and sync based on that.

The -diff option enables '--attributes mtime' by default, which makes sure that on a re-run files are only resynced if they were changed.

'--mtime off' can be used, as mentioned in the man page, to avoid syncing the mtime.

3- all_content gets sorted using quicksort, and the other vectors follow its sorting order.

4- A 2D vector (a table) is built as an existence map: the rows are the files/dirs and the columns are the input directories in a specific order. Each cell holds the mtime of that file in that input directory.

5- The track_existence vector should be cleared after that.

6- Loop through the 2D vector to find a SRC that has a newer mtime than a DEST; if there is one, the dest copy is removed and resynced.
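Steps 2 to 5 could look roughly like this in Python. It is only a sketch of the data layout described above, not the real C++ code: the vector names come from the description, but the function name and the tuple shapes are assumptions, and Python's built-in sort is used in place of quicksort.

```python
def build_existence_map(listings):
    """listings[i] is a list of (path, type, mtime) tuples for input directory i.

    Returns rows of (path, type, [mtime or None per input directory]).
    """
    all_content, types, track_existence, track_existence_mtime = [], [], [], []
    for dir_number, listing in enumerate(listings):
        for path, kind, mtime in listing:
            all_content.append(path)
            types.append(kind)
            track_existence.append(dir_number)
            track_existence_mtime.append(mtime)

    # Sort by path; the other vectors "follow" by being read through `order`.
    order = sorted(range(len(all_content)), key=lambda i: all_content[i])

    rows = []
    for i in order:
        # After sorting, equal paths are neighbours, so they share one row.
        if not rows or rows[-1][0] != all_content[i]:
            rows.append((all_content[i], types[i], [None] * len(listings)))
        rows[-1][2][track_existence[i]] = track_existence_mtime[i]

    track_existence.clear()  # no longer needed once the map is built
    return rows
```

A cell left as `None` means the file was not listed in that input directory.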

Without the -diff option, vector D 'track_existence_mtime' doesn't get filled, and each cell instead holds true/false for whether that file exists in that input directory. Syncing is then based on file name alone, and whichever SRC dir is found first to contain the file becomes the src for what is missing in the DEST.
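A minimal Python sketch of that name-only mode, again just an illustration and not the lunas source. The role labels ("src", "dest", "both") and the function name are assumptions made for the example:

```python
def plan_by_name(table, roles):
    """table maps name -> [True/False per input directory].

    Returns (name, src_column, dest_column) copies: the first column that
    holds the file and may act as a source supplies every receiving column
    where the file is missing.
    """
    ops = []
    for name, row in table.items():
        src = next(
            (col for col, present in enumerate(row)
             if present and roles[col] in ("src", "both")),
            None,
        )
        if src is None:
            continue  # no directory is allowed to send this file
        for col, present in enumerate(row):
            if not present and roles[col] in ("dest", "both"):
                ops.append((name, src, col))
    return ops
```

Because nothing is compared besides the name, a file that exists everywhere is never touched in this mode, even if its contents differ.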

The input directories are ordered in the existence map's columns as the user entered them, except that local ones have priority and get listed first.

So, with the -diff option, conflicts are resolved based on newer/older mtime and src/dest: the newer src updates the older dest. Without it, resolution is based only on file names, which is more arbitrary, as explained.

In the copying functions, and the filesystem functions in general, C++ provides ways to check whether an operation failed. I use these, combined with checking whether the remote reads/writes returned successfully. If a read or write to a buffer produces an error, syncing of that file stops and lunas moves on to the next file. That file stays named file.ls.part.
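The general pattern can be sketched in Python like this. It is not the lunas implementation: the rename-on-success step is an assumption inferred from the ".ls.part" name, and the function names and chunk size are invented for the example.

```python
import os

PART_SUFFIX = ".ls.part"


def copy_file(src, dest, chunk_size=64 * 1024):
    """Copy src to dest through a temporary "<dest>.ls.part" file.

    Returns True on success. On a read/write error it returns False and
    leaves whatever was written under the .ls.part name.
    """
    part = dest + PART_SUFFIX
    try:
        with open(src, "rb") as reader, open(part, "wb") as writer:
            while chunk := reader.read(chunk_size):
                writer.write(chunk)
        os.replace(part, dest)  # only a complete copy gets the real name
    except OSError:
        return False
    return True


def sync_all(pairs):
    """Copy every (src, dest) pair; a failure skips to the next file."""
    failed = []
    for src, dest in pairs:
        if not copy_file(src, dest):
            failed.append(src)
    return failed
```

The point of the temporary name is that a half-written file can never be mistaken for a finished one.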

lunas doesn't have a checksum option, not yet at least; I might add it later. So if that is a problem for someone, they could avoid using lunas for now.

But I made a separate program that recursively checks the checksums of many input directories, which I usually use when I need to check whether lunas is working correctly.
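For anyone who wants a similar check, here is a minimal Python sketch of a recursive checksum comparison. It is not the separate program mentioned above; SHA-256 and all function names are assumptions chosen for the example.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root):
    """Map each file's path relative to root -> its SHA-256 hex digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            sums[os.path.relpath(full, root)] = file_sha256(full)
    return sums


def find_mismatches(roots):
    """Return relative paths whose checksum differs between the roots.

    A file missing from some root counts as a mismatch too.
    """
    per_root = [tree_checksums(root) for root in roots]
    all_paths = sorted(set().union(*per_root))
    return [p for p in all_paths if len({sums.get(p) for sums in per_root}) > 1]
```

An empty result from `find_mismatches([dir1, dir2])` means every file exists in both trees with identical content.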

btw, just to be clear, as mentioned in the license: This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License version 3 for more details: https://www.gnu.org/licenses/gpl-3.0.en.html