Odd fail when input data server and DIN_LOC_ROOT are the same. #4480

Open
rljacob opened this issue Aug 28, 2023 · 5 comments

@rljacob
Member

rljacob commented Aug 28, 2023

While our main input data server was down for a week, we quickly made the NERSC DIN_LOC_ROOT (/global/cfs/cdirs/e3sm/inputdata) a server, since it is accessible via the web at https://portal.nersc.gov/project/e3sm/inputdata. We then left it in place as a backup.

This led to odd behavior when check_input_data, run at NERSC, tried to download a file that wasn't available on any server: it created a 0-size file in DIN_LOC_ROOT with the name of the file being downloaded. Subsequent runs asking for the same file thought it was there. It wasn't until someone looked at the file, removed it, and saw it come back that this was found (see E3SM-Project/E3SM#5899).

I knew there was a reason why DIN_LOC_ROOT and a data server can't be the same directory, but this is still an odd failure mode.
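
For anyone wanting to check whether an input data directory already contains such placeholders, here is a minimal sketch that walks a directory tree and lists zero-length files. The function name `find_empty_files` is hypothetical and not part of CIME; the root path would be whatever DIN_LOC_ROOT points to on the machine in question.

```python
import os


def find_empty_files(root):
    """Return a sorted list of zero-length regular files under root.

    Hypothetical helper for illustration; not part of CIME.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip anything that is not a regular file (e.g. broken symlinks).
            if os.path.isfile(full) and os.path.getsize(full) == 0:
                empty.append(full)
    return sorted(empty)
```

A zero-length file is not necessarily a failed download, so the list is best treated as candidates to inspect rather than files to delete automatically.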

@jedwards4b
Contributor

This happens under other circumstances as well. I think it would be good to add a check, after the download completes, that the file has a non-zero length.
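
A minimal sketch of what such a post-download check could look like, assuming the download step has already written (or failed to write) the file at a known local path. The name `verify_nonempty` is hypothetical and is not the function CIME actually uses.

```python
import os


def verify_nonempty(path):
    """Return True if path is a regular file with non-zero length.

    A zero-length file is removed so that later runs do not mistake
    it for a successfully downloaded input file.
    Hypothetical helper for illustration; not part of CIME.
    """
    if not os.path.isfile(path):
        return False
    if os.path.getsize(path) == 0:
        os.remove(path)
        return False
    return True
```

The caller would treat a `False` return as a failed download and move on to the next server, or report the file as missing.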

@ndkeen
Contributor

ndkeen commented Aug 28, 2023

It might be that the --output-document argument behaves differently (?) on Perlmutter.

```
login22% rm foo ; wget https://portal.nersc.gov/foo -nc --output-document foo ; ls -l foo
--2023-08-28 14:46:37--  https://portal.nersc.gov/foo
Resolving portal.nersc.gov (portal.nersc.gov)... 128.55.206.113, 128.55.206.107, 128.55.206.109, ...
Connecting to portal.nersc.gov (portal.nersc.gov)|128.55.206.113|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-08-28 14:46:37 ERROR 404: Not Found.

-rw-rw-r-- 1 ndk e3sm 0 Aug 28 14:46 foo

login22% rm foo ; wget https://portal.nersc.gov/foo -nc  ; ls -l foo
--2023-08-28 14:46:44--  https://portal.nersc.gov/foo
Resolving portal.nersc.gov (portal.nersc.gov)... 128.55.206.113, 128.55.206.107, 128.55.206.109, ...
Connecting to portal.nersc.gov (portal.nersc.gov)|128.55.206.113|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-08-28 14:46:44 ERROR 404: Not Found.

ls: cannot access 'foo': No such file or directory
login22% wget --version
GNU Wget 1.20.3 built on linux-gnu.
```

@jedwards4b
Contributor

Why isn't this working? It should remove the 0 length file.

@ndkeen
Contributor

ndkeen commented Aug 28, 2023

Ah, I see: it is fully expected that wget writes a 0-length file when the file is not found and the --output-document arg is used. Python will then test for the empty file and remove it. Yeah, not sure why that may not be working.
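
One way to sidestep the placeholder entirely is to download to a temporary name in the same directory and rename only on success, so the final file name never exists in an empty state. The sketch below is an assumption about how that could be done, not how CIME does it: `fetch_atomically` is a hypothetical name, and `download` stands in for whatever callable actually runs wget (or another transfer tool) and writes to the path it is given.

```python
import os
import tempfile


def fetch_atomically(target, download):
    """Download to a temporary file, then rename it to target on success.

    download is a caller-supplied callable taking a destination path
    and returning True on success. Hypothetical helper; not part of CIME.
    """
    directory = os.path.dirname(os.path.abspath(target))
    # Create the temporary file next to the target so the rename stays
    # on the same filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".partial-")
    os.close(fd)
    try:
        ok = download(tmp)
        if ok and os.path.getsize(tmp) > 0:
            os.replace(tmp, target)
            return True
        return False
    finally:
        # Clean up the temporary file if it was not renamed.
        if os.path.exists(tmp):
            os.remove(tmp)
```

Note that `tempfile.mkstemp` creates the file with owner-only permissions, so a shared input data directory would likely need the mode adjusted after the rename.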

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.


3 participants